Data in Brief. 2024 Apr 23;54:110444. doi: 10.1016/j.dib.2024.110444

An innovative 12-lead resting electrocardiogram dataset in professional football

Adolfo Antonio Munoz-Macho a,b, Manuel Jesus Dominguez-Morales a, Jose Luis Sevillano-Ramos a
PMCID: PMC11070232  PMID: 38708304

Abstract

This paper provides a comprehensive and innovative 12-lead electrocardiogram (ECG) dataset tailored to the unique needs of professional football players. Other ECG datasets are available, but they were collected from the general population, usually from individuals with confirmed disease, whereas it is well known that ECG characteristics change in athletes and elite players as a result of their intense long-term physical training. This initiative is part of a broader research project employing machine learning (ML) to analyse ECG data in this athlete population and explore them according to the International criteria for ECG interpretation in athletes. The dataset was generated by establishing a prospective observational cohort of 54 male football players from La Liga, representing a UEFA Pro-level team.

Named the Pro-Football 12-lead Resting Electrocardiogram Database (PF12RED), it comprises 163 10-s ECG recordings, offering a detailed examination of the at-rest heart activity of professional football athletes. Data collection spans five phases over multiple seasons: the 2018–2019 postseason, the 2019–20 preseason, the 2019–20 postseason, the 2020–21 preseason, and the 2021–22 preseason. Athletes undergo medical evaluations that include a 10-s resting 12-lead ECG performed with General Electric's USB-CAM 14 module (https://co.services.gehealthcare.com/gehcstorefront/p/900995–002), with data saved using General Electric's CardioSoft V6.73 12SL V21 ECG Software (https://www.gehealthcare.es/products/cardiosoft-v7).

The data collection adheres to ethical principles, with clearance granted by the Autonomous Community of Andalusia Ethics Committee (Spain) under protocol number 1573-N-19 in December 2019. Participants provide informed consent, and data sharing is permitted following anonymization. The study aligns with the Declaration of Helsinki and adheres to the recommendations of the International Committee of Medical Journal Editors (ICMJE).

The generated dataset serves as a valuable resource for research in sports cardiology and cardiac health. Its potential for reuse encompasses:

  • 1. International Comparison: Enabling cross-regional comparisons of cardiac characteristics among elite football players, enriching international studies.

  • 2. ML Model Development: Facilitating the development and refinement of machine learning models for arrhythmia detection, serving as a benchmark dataset.

  • 3. Validation of Diagnostic Methods: Allowing the validation of automatic diagnostic methods, contributing to enhanced accuracy in detecting cardiac conditions.

  • 4. Research in Sports Cardiology: Supporting future investigations into specific cardiac adaptations in elite athletes and their relation to cardiovascular health.

  • 5. Reference for Athlete Protection Policies: Influencing athlete protection policies by providing data on cardiac health and suggesting guidelines for medical assessments.

  • 6. Health Professionals Training: Serving as a training resource for health professionals interested in interpreting ECGs in sports contexts.

  • 7. Tool and Application Development: Facilitating the development of tools and applications related to the visualization, simulation and analysis of ECG signals in athletes.

Keywords: Electrocardiographic data, Elite players health, Sports cardiology, Arrhythmia diagnosis, Cardiology screening, Sudden death prevention


Specifications Table

Subject: Signal processing, sports health, sports and exercise medical sciences.
Specific subject area: Evaluation of 10-s 12-lead electrocardiogram in sports cardiology.
Data format: Raw, XML, CSV/Excel, PDF
Type of data: Table, XML
Data collection: With the participant resting in the supine position, each 12-lead electrocardiogram (ECG) was captured with General Electric's (GE) USB-CAM 14 for a duration of 10 s at 500 Hz using the GE CardioSoft software. Filtering was performed on the raw data with the help of the ECG Visualizer software [1].
Data source location: The data were gathered from professional football players in La Liga, Spain. For confidentiality reasons, the specific football team or geographical location is not disclosed in this study. Country: Spain
Data accessibility: Repository name: PF12RED - Pro-Football 12-lead Resting Electrocardiogram Database
Direct URL to data: https://github.com/dradolfomunoz/PF12RED [2]
Instructions for accessing these data: Access to the data is public and freely available to anyone with internet access and the dataset's web address.

1. Value of the Data

  • Specialized Athlete ECG Database: This dataset addresses a critical gap by focusing on professional football players, offering unique insights into their cardiac activity during cardiological screenings at rest compared to the general population [3], [4], [5], [6], [7].

  • Advancing Sports Cardiology: Researchers and practitioners in sports cardiology can utilize this dataset as a reference for understanding and diagnosing cardiac conditions specific to elite football athletes [8,9].

  • Machine Learning Applications: The dataset, paired with the ECG Visualizer tool [1], provides a foundation for developing and testing machine learning models, potentially automating diagnostic processes for arrhythmias in athletes [10].

  • Influencing Athlete Protection Policies: Sports organizations, including UEFA and FIFA, can benefit from this dataset to enhance pre-competitive health screenings and contribute to the formulation of athlete protection policies [11,12].

  • Multicentric Studies: The dataset's innovative nature encourages collaboration, supporting multicentric studies and data sharing in various sports, promoting a comprehensive understanding of cardiac health in athletes.

2. Background

As the first public database of its kind, this innovative 12-lead resting electrocardiogram dataset in professional football was assembled to fill a gap in the availability of ECG data from professional athletes. The dataset endeavours to provide UEFA and FIFA with a fundamental resource related to professional players, potentially influencing forthcoming rules and protocols in the field of sports cardiology.

The dataset was designed as an evolving platform. Its creation was guided by the limited but relevant literature, such as the work by Bohm et al. [9] and the consensus statement by Drezner et al. [8], which provide context for interpreting ECGs in athletes. The methodology for data collection adhered to UEFA's pre-participation screening recommendations [12], ensuring relevance and applicability to current sports health practices.

With its forthcoming expansion to encompass athletes of all genders, whether professional or not, the dataset is positioned to serve as a crucial resource for sports cardiology research and development, particularly in the areas of methodology and clinical practice. The dataset is unique and is undergoing analysis in an effort to make it available to the wider research community and to organisations tasked with formulating health policies in the sports industry.

3. Data Description

The dataset described in this paper is available in a GitHub repository. The repository comprises several files and folders (see Fig. 1):

Fig. 1. The electronic repository structure.

The repository has an organised structure with a Readme.md file, which contains a reference and a brief explanation of the repository, and a License file that provides details on the Creative Commons Legal Code and the CC0 1.0 Universal license used.

The “XML&PDF Table Description.xlsx” file can also be found in the repository. It includes a chronogram and details the type of data stored in each session. Yellow highlights indicate the XML and PDF references for each player.

The “LabelData.xlsx” file is a table with critical electrocardiogram (ECG) data and descriptions of clinical labels for the 54 professional football players. These data are derived from lead II (see Tables 1 and 2).

Table 1.

Extract and distribution of recordings in LabelData.xlsx showing relevant findings in relation to the International Criteria for ECG Interpretation [8]: Sinus Rhythm (SR), Sinus Bradycardia (SB), Incomplete Right Bundle Branch Block (iRBBB), T Wave Inversion (TWI). N is the player number.

SR SB iRBBB N T Wave Inversion (TWI)
I II III aVR aVL aVF V1 V2 V3 V4 V5 V6
X X 1 X X
X 2 X
X X 3 X X
X 4 X
X 5 X X
X 6 X X X
X 7 X
X X 8 X X
X X 9 X X X
X 10 X
X 11 X X
X 12 X X
X 13 X
X 14 X X X
X 15 X
X 16 X X X
X 17 X X X
X 18 X X
X 19 X
X 20 X X X
X 21 X
X 22 X X
X 23 X X X
X 24 X
X 25 X X
X X 26 X X
X X 27 X X X
X 28 X X
X 29 X X
X 30 X X
X 31 X X
X 32 X X
X X 33 X X X
X 34 X
X X 35 X X
X 36
X 37 X X
X X 38 X
X 39 X X
X 40 X X
X 41 X
X 42 X
X 43 X X
X 44 X X
X 45 X
X X 46 X X
X 47 X X
X 48 X X
X X 49 X X
X 50 X X
X 51 X X
X 52 X X X X X
X 53 X
X 54 X X
19 35 11 N 0 0 5 50 0 0 38 4 1 2 2 1
35,19 64,81 20,37 % Total 0,00 0,00 9,26 92,59 0,00 0,00 70,37 7,41 1,85 3,70 3,70 1,85
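The percentage row of Table 1 follows from dividing each count in the totals row by the 54 players. A minimal sketch of that arithmetic is given here; the counts are copied from the totals row, while the dictionary keys and function name are labels chosen for illustration only.

```python
# Counts copied from the totals row of Table 1 (54 players in the cohort).
N_PLAYERS = 54

COUNTS = {
    "SR": 19,
    "SB": 35,
    "iRBBB": 11,
    "TWI III": 5,
    "TWI aVR": 50,
    "TWI V1": 38,
    "TWI V2": 4,
    "TWI V3": 1,
    "TWI V4": 2,
    "TWI V5": 2,
    "TWI V6": 1,
}


def percentage(count: int, total: int = N_PLAYERS) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


if __name__ == "__main__":
    for label, count in COUNTS.items():
        print(f"{label}: {count}/{N_PLAYERS} = {percentage(count)} %")
```

Because every player is labelled with either sinus rhythm or sinus bradycardia, the SR and SB counts add up to 54.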

Table 2.

Extract and distribution of recordings in LabelData.xlsx showing relevant findings in relation to Age (years), Weight (kg), Height (cm), Race, SysBP (mmHg), DiaBP (mmHg), Ventricular Rate (bpm), PQInterval (ms), QRSDuration (ms), QTInterval (ms), QTCInterval (ms), RRInterval (ms), PPInterval (ms), and P and R Axis (°).

N Age (years) Weight (kg) Height (cm) Race SysBP (mmHg) DiaBP (mmHg) Ventricular Rate (bpm) PQInterval (ms) QRSDuration (ms) QTInterval (ms) QTCInterval (ms) RRInterval (ms) PPInterval (ms) PAxis (°) RAxis (°)
1 37 84 183 Caucasian 105 70 55 196 110 448 428 1084 1090 65 84
2 23 77 184 Caucasian 107 75 62 280 92 414 420 960 965 36 82
3 29 75 183 Caucasian 105 70 55 158 110 432 413 1082 1090 44 88
4 32 82 187 Caucasian 123 83 47 152 100 430 380 1262 1275 −18 74
5 35 72 184 Caucasian 106 79 38 194 106 494 392 1590 1575 77 72
6 32 77 181 Latin 102 73 63 194 96 426 435 958 950 37 41
7 24 76 184 Caucasian 127 73 69 184 92 400 428 862 865 48 71
8 28 77 180 Caucasian 118 71 48 142 112 462 412 1250 1250 51 87
9 31 64 171 Caucasian 112 71 61 176 112 432 434 978 980 47 16
10 35 74 182 Caucasian 122 81 52 166 92 428 398 1144 1150 70 75
11 26 74 185 African 109 79 42 180 104 482 402 1444 1425 62 76
12 21 75 178 Caucasian 120 83 66 148 100 392 410 910 905 40 63
13 28 75 178 Caucasian 115 82 50 230 94 428 390 1196 1200 80 26
14 35 82 188 Latin 132 81 45 146 78 464 401 1348 1330 38 69
15 30 74 182 Caucasian 126 80 54 182 108 426 403 1112 1110 34 53
16 23 68 170 Caucasian 115 78 55 162 84 440 420 1084 1090 61 38
17 23 67 178 Caucasian 105 80 38 150 104 474 376 1592 1575 41 35
18 30 79 186 Latin 135 75 63 206 98 428 437 946 950 42 88
19 20 70 172 African 110 72 43 192 102 460 388 1396 1395 51 55
20 22 80 177 Latin 135 83 64 166 94 432 445 940 935 55 90
21 34 70 168 Caucasian 115 70 48 210 86 440 393 1242 1250 37 12
22 30 88 189 Caucasian 126 78 49 154 112 430 388 1234 1220 41 70
23 33 88 190 Caucasian 115 71 47 126 108 460 407 1278 1275 75 84
24 30 79 185 Caucasian 133 83 39 202 102 480 386 1530 1535 18 56
25 23 70 177 Caucasian 128 66 69 152 102 396 424 864 865 57 74
26 23 63 169 Latin 117 76 58 178 110 416 408 1028 1030 43 63
27 22 82,5 192 Caucasian 121 81 63 150 108 386 395 956 950 69 60
28 23 76 182 Latin 124 76 73 154 104 374 412 816 820 72 80
29 33 85 190 Caucasian 112 80 63 110 104 426 435 958 950 42 45
30 27 65 170 Caucasian 120 80 56 130 98 420 405 1078 1070 70 70
31 21 72 176 Caucasian 113 85 77 164 96 390 441 780 775 67 63
32 22 73 180 Caucasian 109 75 51 124 96 422 388 1166 1175 103 90
33 22 75 181 Caucasian 128 69 64 160 120 410 422 930 935 68 −24
34 28 65 171 Caucasian 127 68 53 144 90 426 399 1122 1130 69 70
35 19 62 172 Caucasian 122 79 51 124 112 442 407 1166 1175 35 77
36 21 80 180 Caucasian 115 79 95 154 104 376 472 628 630 72 75
37 20 81 191 Caucasian 111 81 56 162 104 470 453 1072 1070 75 93
38 18 85 190 Caucasian 128 85 65 132 110 412 428 918 920 74 59
39 26 70,5 175 Caucasian 118 70 46 140 106 464 406 1298 1300 72 69
40 24 79,3 185 African 113 81 50 162 98 436 397 1196 1200 6 68
41 22 74 175 Caucasian 128 82 46 164 94 426 372 1316 1300 70 100
42 24 64 167 Caucasian 107 74 60 164 104 414 414 1000 1000 58 82
43 25 68,6 174 Caucasian 101 85 59 172 100 422 417 1020 1015 63 95
44 22 68 175 Caucasian 129 76 55 170 96 416 397 1098 1090 60 62
45 25 75 188 African 127 84 53 164 108 424 397 1132 1130 7 65
46 19 60 170 Asian 106 78 54 146 120 466 441 1120 1110 48 59
47 31 73 180 Caucasian 115 77 46 176 94 436 381 1294 1300 57 82
48 19 83 190 Caucasian 115 76 65 336 102 386 401 926 920 26 100
49 24 76 182 Caucasian 106 79 58 196 114 420 412 1026 1030 48 54
50 18 77 186 African 128 84 42 250 106 462 385 1440 1425 31 83
51 21 79 185 Caucasian 109 72 45 186 94 414 358 1348 1330 77 84
52 29 75 176 African 107 79 67 192 104 464 490 902 895 48 63
53 29 86 189 Latin 128 79 70 170 92 388 419 858 855 61 72
54 19 75 190 Caucasian 106 75 58 150 90 418 410 1032 1030 32 97
Age Weight Height Race SysBP DiaBP Ventricular Rate PQInterval QRSDuration QTInterval QTCInterval RRInterval PPInterval PAxis RAxis
Average 25,74 74,91 180,61 n/a 118,45 77,18 54,33 169,98 101,32 434,01 408,85 1.136,43 1.129,32 52,09 67,63
SD 5,16 6,78 6,92 n/a 9,86 5,02 9,60 38,71 8,01 28,08 23,00 192,04 191,01 25,21 25,08
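Table 2 writes numbers with a decimal comma (for example 82,5), uses a dot as thousands separator in the summary rows (for example 1.136,43), and marks negative axes with a typographic minus sign (for example −18). Anyone copying these values into code may find a small conversion helper useful. The function below is a hypothetical sketch, not part of the published repository.

```python
def parse_european_number(text: str) -> float:
    """Convert a decimal-comma number such as '1.136,43' to a float.

    Assumes dots are only ever thousands separators and commas are only
    ever decimal marks, which matches how Table 2 is printed.
    """
    cleaned = text.strip().replace("\u2212", "-")  # typographic minus to ASCII
    cleaned = cleaned.replace(".", "")              # drop thousands separators
    cleaned = cleaned.replace(",", ".")             # decimal comma to point
    return float(cleaned)


if __name__ == "__main__":
    for raw in ["82,5", "1.136,43", "\u221218", "74"]:
        print(raw, "->", parse_european_number(raw))
```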

“163XML” Folder: Contains individual anonymized XML files, each holding basic player data and the raw signal as a block of 5000 samples per lead (10 s at 500 Hz). The file naming format is 0_0000Xxx.XML, where '0' denotes the subject number, '0000' represents the season, and 'Xxx' indicates whether the ECG was recorded at the beginning or end of the season.

“51_Individual_PDF” Folder: Follows the same naming structure as the 163XML folder. It includes reference ECGs in PDF graphic format, with two sheets presenting the 12 leads as commonly evaluated by medical professionals.
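The naming scheme can be split into its three parts with a regular expression. The sketch below assumes a numeric subject number, a four-digit season code and a three-letter phase tag, as the placeholder 0_0000Xxx.XML suggests; the example file name and the exact tag values are hypothetical and should be checked against the repository.

```python
import re
from typing import NamedTuple, Optional


class RecordingName(NamedTuple):
    subject: int   # subject number before the underscore
    season: str    # four-character season code, kept as text
    phase: str     # three-letter tag for start or end of season


# Assumed pattern: <subject>_<4 digits><3 letters>.XML (extension case-insensitive).
_NAME_PATTERN = re.compile(r"^(\d+)_(\d{4})([A-Za-z]{3})\.xml$", re.IGNORECASE)


def parse_recording_name(filename: str) -> Optional[RecordingName]:
    """Return the parsed parts of a recording file name, or None if it does not match."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        return None
    subject, season, phase = match.groups()
    return RecordingName(int(subject), season, phase)


if __name__ == "__main__":
    # "12_1920Pre.XML" is an invented example name, not a file from the dataset.
    print(parse_recording_name("12_1920Pre.XML"))
```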

Fig. 2 illustrates a comparison between the ECG representation in PDF and XML in the ECG Visualizer.

Fig. 2. a) Excerpt of 10-s 12-lead ECG in PDF format. b) Excerpt of 10-s lead II in ECG Visualizer from XML format.

4. Experimental Design, Materials and Methods

4.1. Data acquisition and processing

Participants: The data was gathered from 54 male La Liga UEFA Pro-level football players. The characteristics of the population are shown in Table 3.

Table 3.

Population characteristics.

Age (y) Height (cm) Weight (Kg)
Average 25.74 180.61 74.90
±SD 5.16 6.92 6.78

Data Acquisition: The data was extracted with the Medical Stress Acquisition CAM-14 Module Kit from General Electric, which was connected to a personal computer that was operating the CardioSoft V6.73 12SL V21 software from General Electric. The ECGs were archived in both XML and PDF formats. Diagnostic data was exported to XML/Excel files from the local server. A CSV file was utilised to store the reference ECG selections for each participant.

Procedure: A prospective observational cohort study was performed across five phases: the 2018–2019 postseason, 2019–20 preseason, 2019–20 postseason, 2020–21 preseason, and 2021–22 preseason.

A professional football squad comprises an average of 25–30 players per season. Cardiovascular screening is a requirement outlined in the UEFA regulations [12]. A 10-second, 12-lead electrocardiogram (ECG) was obtained during the data collection process using the GE Medical Stress Acquisition CAM-14 Module Kit in conjunction with General Electric's CardioSoft V6.73 12SL V21 software. The sample also includes end-of-season electrocardiograms.

The cardiological screening proceeded as follows:

  • (1) The players arrived at the medical center and read and signed the informed consent for the cardiological screening and the research project.

  • (2) With the athlete in the supine position on a table, the electrodes corresponding to the 12 leads were attached.

  • (3) The USB-CAM 14 was connected to the 12 leads.

  • (4) The CardioSoft software started continuous data acquisition at 500 Hz.

  • (5) AAMM, a sport and exercise physician, selected the appropriate 10-s ECG.

  • (6) The data acquisition was stopped.
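Steps (4) to (6) amount to cutting a 10-s window out of a continuous 500 Hz stream, which gives 5000 samples per lead. The sketch below illustrates that arithmetic on a plain list of samples. The function names are hypothetical, and in the study the window was chosen by the physician in CardioSoft, not by code.

```python
from typing import List, Sequence

SAMPLING_RATE_HZ = 500  # acquisition rate stated in the procedure
WINDOW_SECONDS = 10     # length of each stored recording


def samples_per_window(rate_hz: int = SAMPLING_RATE_HZ,
                       seconds: int = WINDOW_SECONDS) -> int:
    """Number of samples in one window for a single lead."""
    return rate_hz * seconds


def extract_window(signal: Sequence[float], start_sample: int) -> List[float]:
    """Return one 10-s window of a continuous single-lead signal."""
    length = samples_per_window()
    end_sample = start_sample + length
    if start_sample < 0 or end_sample > len(signal):
        raise ValueError("window does not fit inside the recorded signal")
    return list(signal[start_sample:end_sample])


if __name__ == "__main__":
    continuous = [0.0] * 15000           # 30 s of placeholder samples
    window = extract_window(continuous, start_sample=2500)  # starts at 5 s
    print(len(window))
```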

Data Validation, Labelling and Characterization: The cardiological screening was the responsibility of the principal investigator, AAMM, a qualified sport and exercise physician with expertise in sports cardiology. He supervised the acquisition of each ECG and XML file in person and manually labelled the features and results. Every ECG and PDF report underwent a data validation check. The GE ECG software was used to store the final diagnosis, and a summary CSV file was posted to the repository (https://github.com/dradolfomunoz/PF12RED).

To validate the data and provide free-to-use software for extracting and displaying the ECG, the second author, MJDM, created the ECG Visualizer tool [1] (https://github.Com/Mjdominguez/ECGVisualizer), which also performs data translation and conversion to CSV format. The tool supports several filtering methods: fixed window average, sliding window average, sliding window median, and band rejection filter.

In addition, a noise reduction process, such as the sequential noise reduction method, is applied to the raw ECG data to address power line interference, electrode contact noise, motion artefacts, muscle contraction, and baseline noise. Finally, signal filtering and peak detection were used to eliminate noise and automatically detect the P, Q, R, S, and T peaks.
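Three of the named filters are simple window operations and can be illustrated in a few lines. The sketch below is a generic, standard-library interpretation of fixed window average, sliding window average and sliding window median. It is not the ECG Visualizer implementation, whose window sizes and edge handling are not described here, and the band rejection filter is left out.

```python
from statistics import mean, median
from typing import Callable, List, Sequence


def fixed_window_average(signal: Sequence[float], width: int) -> List[float]:
    """Average consecutive non-overlapping blocks of `width` samples.

    The output is shorter than the input; trailing samples that do not
    fill a whole block are dropped.
    """
    return [mean(signal[i:i + width])
            for i in range(0, len(signal) - width + 1, width)]


def _sliding(signal: Sequence[float], width: int,
             reducer: Callable[[Sequence[float]], float]) -> List[float]:
    """Apply `reducer` to a window centred on each sample (truncated at the edges)."""
    half = width // 2
    result = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        result.append(reducer(signal[lo:hi]))
    return result


def sliding_window_average(signal: Sequence[float], width: int) -> List[float]:
    """Moving average; smooths noise but also flattens sharp peaks."""
    return _sliding(signal, width, mean)


def sliding_window_median(signal: Sequence[float], width: int) -> List[float]:
    """Moving median; removes isolated spikes while keeping edges."""
    return _sliding(signal, width, median)


if __name__ == "__main__":
    spike = [0, 0, 9, 0, 0]  # a single-sample artefact
    print(sliding_window_average(spike, 3))
    print(sliding_window_median(spike, 3))
```

On a single-sample spike the moving average spreads the artefact over its neighbours, whereas the moving median removes it, which is one reason both options are commonly offered side by side.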

To characterise the dataset and explore possible initial uses, machine learning models were developed from the ECG Visualizer's reports. Three classifiers were trained: Random Forest, Support Vector Machine, and Artificial Neural Network. The hold-out technique was applied with a 70–30 split into training and testing subsets, and the classifiers were configured as follows:

  • Random Forest: 10 estimators, unlimited maximum features.

  • Support Vector Machine: Parameters obtained from the optimization process.

  • Artificial Neural Network: Multilayer perceptron network with specific hyperparameters.
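The hold-out step can be sketched without any ML library: shuffle the records once, then cut at 70 %. The function below is a minimal illustration under that assumption. The seed, the function name and the use of recording indices as records are choices made here, not details reported by the authors.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def hold_out_split(records: Sequence[T],
                   train_fraction: float = 0.7,
                   seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the records reproducibly and split them into train and test lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # local generator, global state untouched
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    recording_ids = list(range(163))  # one index per ECG recording in PF12RED
    train, test = hold_out_split(recording_ids)
    print(len(train), len(test))
```

Since the 163 recordings come from 54 players, splitting by recording can place the same player in both subsets; splitting by player would avoid that, and the text does not state which was done. In scikit-learn, the Random Forest setting listed above would correspond to `RandomForestClassifier(n_estimators=10, max_features=None)`, although the library actually used is not named.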

Limitations

The proposed dataset has certain limitations. Firstly, the data collection was conducted during regular seasons, which made it challenging to organise comprehensive data collection. The dataset might lack diversity due to the limited availability of individuals of different genders and ages. Despite efforts to seek more diversity, this constraint may affect the generalizability of the findings of studies based on our dataset. Additionally, we acknowledge missing data and potential loss to follow-up, which affect the overall completeness of the dataset. While these limitations do not negate the dataset's significance, they highlight considerations for researchers interpreting and using the data in further investigations.

Ethics Statement

The data collection in this study involved human subjects. Relevant informed consent was obtained from the subjects. Ethical committee approval was obtained from the Autonomous Community of Andalusia Ethics Committee (Spain), with protocol number 1573-N-19. The study aligns with the Declaration of Helsinki and adheres to the recommendations of the International Committee of Medical Journal Editors (ICMJE).

CRediT authorship contribution statement

Adolfo Antonio Munoz-Macho: Conceptualization, Methodology, Investigation, Resources, Data curation, Writing – original draft. Manuel Jesus Dominguez-Morales: Software, Formal analysis, Resources, Validation, Writing – review & editing. Jose Luis Sevillano-Ramos: Supervision, Writing – review & editing.

Acknowledgments

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  • 1. M. Dominguez-Morales, A.A. Munoz-Macho, J. Sevillano-Ramos, ECG Visualizer Software Tool, https://Github.Com/Mjdominguez/ECGVisualizer (2023).
  • 2. A.A. Munoz-Macho, M. Domínguez-Morales, J.L. Sevillano, An Innovative 12-Lead Resting Electrocardiogram Dataset in Professional Football, https://Github.Com/Dradolfomunoz/PF12RED (2024). https://zenodo.org/badge/DOI/10.5281/zenodo.10864238.svg.
  • 3. Moody G.B., Mark R.G. The impact of the MIT-BIH arrhythmia database. IEEE Eng. Med. Biol. Mag. 2010;20:45–50. doi: 10.1109/51.932724. http://www.ncbi.nlm.nih.gov/pubmed/11446209
  • 4. Taddei A., Distante G., Emdin M., Pisani P., Moody G.B., Zeelenberg C., Marchesi C. The European ST-T database: standard for evaluating systems for the analysis of ST-T changes in ambulatory electrocardiography. Eur. Heart J. 1992;13. doi: 10.1093/oxfordjournals.eurheartj.a060332.
  • 5. J. Zheng, J. Zhang, S. Danioko, H. Yao, H. Guo, C. Rakovski, A 12-lead electrocardiogram database for arrhythmia research covering more than 10,000 patients, Sci. Data 7 (2020) 1–8. doi: 10.1038/s41597-020-0386-x.
  • 6. M. Elgendi, Fast T-wave detection with annotation of P and T waves in the MIT-BIH arrhythmia database, n.d. http://www.elgendi.net/databases.htm (accessed January 27, 2019).
  • 7. Duarte R.P., Marinho F.A., Bastos E.S., Pinto R.J., Silva P.M., Fermino A., Denysyuk H.V., Gouveia A.J., Gonçalves N.J., Coelho P.J., Zdravevski E., Lameski P., Tripunovski T., Garcia N.M., Pires I.M. Extraction of notable points from ECG data: a description of a dataset related to 30-s seated and 30-s stand up. Data Br. 2023;46. doi: 10.1016/J.DIB.2022.108874.
  • 8. Drezner J.A., Sharma S., Baggish A., Papadakis M., Wilson M.G., Prutkin J.M., La Gerche A., Ackerman M.J., Borjesson M., Salerno J.C., Asif I.M., Owens D.S., Chung E.H., Emery M.S., Froelicher V.F., Heidbuchel H., Adamuz C., Asplund C.A., Cohen G., Harmon K.G., Marek J.C., Molossi S., Niebauer J., Pelto H.F., Perez M.V., Riding N.R., Saarel T., Schmied C.M., Shipon D.M., Stein R., Vetter V.L., Pelliccia A., Corrado D. International criteria for electrocardiographic interpretation in athletes: consensus statement. Br. J. Sports Med. 2017;51:704–731. doi: 10.1136/bjsports-2016-097331.
  • 9. Bohm P., Ditzel R., Ditzel H., Urhausen A., Meyer T. Resting ECG findings in elite football players. J. Sports Sci. 2013;31:1475–1480. doi: 10.1080/02640414.2013.796067.
  • 10. P. Rajpurkar, A.Y. Hannun, M. Haghpanahi, C. Bourn, A.Y. Ng, Cardiologist-level arrhythmia detection with convolutional neural networks, https://Arxiv.Org/Pdf/1707.01836.Pdf (2017). https://stanfordmlgroup.
  • 11. Egger F., Scharhag J., Kästner A., Dvořak J., Bohm P., Meyer T. FIFA Sudden Death Registry (FIFA-SDR): a prospective, observational study of sudden death in worldwide football from 2014 to 2018. Br. J. Sports Med. 2022;56:80–87. doi: 10.1136/bjsports-2020-102368.
  • 12. II - Medical examination of players • UEFA Medical Regulations • Lector • Documents UEFA, (n.d.). https://documents.uefa.com/r/e_a_0zs~8Ut55Hay0CW8yQ/ir0ZJfBWq_wZK2aVPEea7A.
