Abstract
Blood pressure (BP) is one of the most prominent indicators of potential cardiovascular disorders. Traditionally, BP measurement relies on inflatable cuffs, which are inconvenient and limit the acquisition of such important health-related information in the general population. Given large amounts of well-collected and annotated data, deep-learning approaches have shown generalization potential and have emerged as an alternative that enables more pervasive BP monitoring. However, most existing work in this area currently uses datasets with limitations, such as lack of subject identification and severe data imbalance, that can result in data leakage and algorithm bias. Thus, to offer a more properly curated source of information, we propose a derivative dataset composed of 380 hours of the most common biomedical signals, including arterial blood pressure, photoplethysmography, and electrocardiogram, for 1,524 anonymized subjects, each having 30 segments of 30 seconds of those signals. We also validated the proposed dataset through experiments using state-of-the-art deep-learning methods, and we highlight the importance of standardized benchmarks for calibration-free blood pressure estimation scenarios.
Subject terms: Predictive markers, Quality of life, Signal processing
Background & Summary
The majority of the calibration-free blood pressure estimation methods in the literature use publicly available datasets, of which the most commonly cited are MIMIC (Medical Information Mart for Intensive Care, previously Multiparameter Intelligent Monitoring in Intensive Care) and the Cuff-Less Blood Pressure Estimation Dataset available in the University of California Irvine (UCI) machine learning repository. MIMIC is a publicly available database provided by the PhysioNet organization1. It comprises anonymized health data associated with thousands of intensive care unit admissions and is available in several versions (MIMIC, MIMIC-II, MIMIC-III, MIMIC-IV)2–5. Our work is based on the MIMIC-III Waveform Database Matched Subset6 because it is the version that allows demographic information about the patients to be retrieved, if required. This subset contains 22,317 waveform records and 22,247 numerics records for 10,282 distinct intensive care unit patients. These recordings typically include digitized signals such as electrocardiogram (ECG) (leads I, II, V), arterial blood pressure (ABP), respiration (RESP), and photoplethysmography (PPG), as well as periodic measurements such as heart rate, oxygen saturation, and systolic and diastolic blood pressures. Several other works have already used the MIMIC database as a reference to propose solutions in the area7–12.
On the other hand, the Cuff-Less Blood Pressure Estimation Dataset13 available in the UCI repository (BP-UCI) is a subset generated from the MIMIC-II database created by Kachuee et al.14. This dataset contains 12,000 records of 942 patients, with simultaneous PPG, ABP, and ECG data processed with several signal processing techniques, such as smoothing all signals with a moving average filter and removing blood pressure (BP) and heart rate (HR) outliers and severe discontinuities detected using the autocorrelation of PPG signals. However, the BP-UCI dataset suppresses the patient identification information and, consequently, the possibility of tracking whether two given records belong to the same subject, which could lead to data leakage in machine learning pipelines. Additionally, the number of records in the BP-UCI dataset (12,000 records)13 differs from the number of records reported in the original study (4,254)14. These inconsistencies make it difficult to use this dataset without train-test contamination and misleading results. There is also no information related to the number of subjects and amount of data collected per subject. Although the number of records and signals is relatively large, there is only a reported guarantee that if a subject has more than one record, the records will be presented sequentially in the dataset, with no boundary indication between subjects. Thus, it remains unclear how many subjects exist in the dataset and which recordings belong to which subject. This lack of information can result in problems such as data imbalance and leakage, since it is impossible to prevent train-test contamination without knowing which data belong to which patient. From the perspective of a robust machine learning solution, data leakage and imbalance can affect the results substantially by hindering generalization and producing considerably better results than when testing on a disjoint set of subjects.
Thus, considering the extensive use of the BP-UCI dataset in the literature15–20, we present an alternative dataset that mitigates the issues pointed out above. The performance of any machine learning (ML) or deep learning (DL) model depends on the quantity, quality, and technical details of the dataset used for training and testing. Differences in size, number of subjects, and applied pre-processing steps would make model performance comparisons among different works unfair.
Published work on BP estimation relies on the combination of PPG and ECG signals or solely on PPG. Therefore, these are our main signals of interest plus the ABP signal, from which systolic (SBP) and diastolic (DBP) blood pressure values can be extracted. Figure 1 presents the components of the proposed dataset. This work highlights the need for a unified benchmark dataset that facilitates reliable comparison of model performance and evaluation of model generalization.
Fig. 1.
Overview of the constituent parts of the proposed dataset.
Methods
A waveform record from MIMIC-III Waveform Database Matched Subset6 is a binary file including, typically, digitized signals such as PPG, ECG, ABP and respiration (RESP), sampled at 125 Hz with 8-, 10-, or 12-bit analog-to-digital resolution. The whole database has a total uncompressed size of 2.4 TB. The data was collected from patients who stayed in critical care units of the Beth Israel Deaconess Medical Center, Boston, USA, between 2001 and 2012.
We justify creating a curated dataset because the MIMIC-III Waveform Database Matched Subset has significant data imbalance among patients, as well as records with absent or poor-quality signals. Not all 10,282 patients have records that simultaneously contain all the signals of interest, namely PPG, ECG-II, and ABP. Only 2,825 patients have at least one record with all the signals of interest. Thus, we focus only on these 2,825 patients, who have different numbers of records with different sizes (time durations). For example, 244 patients have just one record; at the other extreme, the patient p099383 has 654 records. Figure 2(a) presents the distribution of the number of records per patient among the 2,825 patients.
Fig. 2.
Distributions of number of records and time per patient from the original MIMIC-III database6. The y-axis is on a logarithmic scale. We can see a significant imbalance in both cases.
Patients also have different recording durations. Patient p099383, with the largest number of records, has 592 hours of signals, while patient p018846, with a total of 266 records, has 1,154 hours. Conversely, the patient p088186 has just one record of 8 minutes. Figure 2(b) illustrates the distribution of recording time among patients. Thus, to make the data from the MIMIC-III database even more helpful in the machine learning scenario, some work must be done to select the largest possible number of patients while producing data that is as balanced as possible.
The purpose of the MIMIC-BP dataset proposed in this paper is to provide an organized and cleaned subset from the MIMIC-III database so that
every patient has a unique identification,
every patient has the same amount of data,
all signals of interest are present,
most noisy and deformed signals are avoided,
the original raw data is available (no pre-processing),
the dataset is easily accessible and downloadable.
In the original MIMIC-III database, there is no guarantee that all records simultaneously contain all signals of interest. Also, some of the signals frequently present problems such as noise, saturation, and severe irregularities. Therefore, we developed some criteria for selecting acceptable MIMIC-III record segments. For example, pulse arrival time (PAT) must be consistent along the segment. The choice of 30 segments of 30 seconds per patient was the best combination to allow the total number of patients to be over 1,500. Additionally, segments of 30 seconds provide flexibility and room for experimentation of techniques that may require a more extended and continuous sequence of data samples. A 30-second segment is present in the proposed dataset if
ECG and PPG have the same fundamental frequency,
PAT is consistent along the segment,
ABP values are within reasonable boundaries,
pulse pressure is within the acceptable limits.
The above requirements are met through the following sequence of steps.
Fundamental frequency from ECG and PPG can be easily estimated from the raw signals via discrete Fourier transform (DFT), since the data is not affected by motion artifacts. At every 10-second time window, the DFT is applied to ECG and PPG windows and the estimated fundamental frequencies should not be apart by more than 0.3 Hz. Note that a 10-second window at sampling frequency 125 Hz results in 1,250 samples per window and a frequency resolution of 0.1 Hz.
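The frequency check described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the 0.5 to 4.0 Hz search band, and the choice of the strongest bin as the fundamental are assumptions made for illustration (on real ECG, a harmonic may carry more energy than the fundamental). Only the 125 Hz sampling rate, the 10-second window, and the 0.3 Hz tolerance come from the text.

```python
import cmath
import math

FS = 125            # sampling rate in Hz (from the text)
WINDOW_S = 10       # analysis window in seconds (from the text)
MAX_DIFF_HZ = 0.3   # allowed ECG/PPG frequency difference (from the text)


def fundamental_frequency(window, fs=FS, f_min=0.5, f_max=4.0):
    """Return the frequency (Hz) of the strongest DFT bin in [f_min, f_max].

    The search band is an assumption of this sketch, not a value from the paper.
    """
    n = len(window)
    mean = sum(window) / n
    centered = [x - mean for x in window]   # remove the DC offset
    k_lo = math.ceil(f_min * n / fs)        # first bin inside the band
    k_hi = math.floor(f_max * n / fs)       # last bin inside the band
    best_k, best_power = k_lo, -1.0
    for k in range(k_lo, k_hi + 1):
        # Naive DFT coefficient for bin k
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k * fs / n                  # bin spacing is fs / n = 0.1 Hz


def same_fundamental(ecg_window, ppg_window, fs=FS):
    """True when the two estimates differ by at most MAX_DIFF_HZ."""
    f_ecg = fundamental_frequency(ecg_window, fs)
    f_ppg = fundamental_frequency(ppg_window, fs)
    return abs(f_ecg - f_ppg) <= MAX_DIFF_HZ
```

In practice an FFT routine would replace the naive DFT loop; the loop is used here only to keep the sketch free of external dependencies.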
Once the fundamental frequency of the signals is determined, beat-to-beat analysis detects the peaks of ECG and PPG at every heartbeat. PAT was determined as the time difference between the ECG R-peak and the corresponding PPG peak, and it was estimated at every cycle of these waveforms along the 30-second duration of every segment. Consistent PAT values guarantee not only the simultaneous presence of ECG and PPG signals but also their synchronicity and absence of noise contamination. Noisy signals introduce variations in PAT and are therefore excluded by this criterion.
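A minimal sketch of the PAT computation follows, assuming the R-peak and PPG-peak times (in seconds) have already been detected. The pairing rule, the function names, and the 0.05 s tolerance around the median are hypothetical; the text does not state the threshold used to decide that PAT is consistent.

```python
import statistics


def pulse_arrival_times(r_peaks_s, ppg_peaks_s):
    """Pair each ECG R-peak with the first PPG peak that follows it and
    precedes the next R-peak; return the delays in seconds."""
    pats = []
    j = 0
    for i, r in enumerate(r_peaks_s):
        next_r = r_peaks_s[i + 1] if i + 1 < len(r_peaks_s) else float("inf")
        # Skip PPG peaks that occur at or before this R-peak
        while j < len(ppg_peaks_s) and ppg_peaks_s[j] <= r:
            j += 1
        if j < len(ppg_peaks_s) and ppg_peaks_s[j] < next_r:
            pats.append(ppg_peaks_s[j] - r)
    return pats


def pat_is_consistent(r_peaks_s, ppg_peaks_s, max_spread_s=0.05):
    """Hypothetical rule: all beats except possibly the last one have a PAT,
    and every PAT lies within max_spread_s of the median PAT."""
    pats = pulse_arrival_times(r_peaks_s, ppg_peaks_s)
    if not pats or len(pats) < len(r_peaks_s) - 1:
        return False
    median_pat = statistics.median(pats)
    return all(abs(p - median_pat) <= max_spread_s for p in pats)
```

For example, R-peaks at 0.0, 0.8, 1.6 and 2.4 s with PPG peaks at 0.25, 1.06, 1.84 and 2.65 s give PAT values near 0.25 s and pass the check, whereas moving the second PPG peak to 1.30 s makes the segment fail.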
Next, the corresponding ABP signal is analyzed to check that it conforms to acceptable DBP and SBP values. The maximum and minimum blood pressure should not fall outside the extreme21 boundaries of 200 and 30 mmHg. More specifically, systolic blood pressure should not be lower than 60 mmHg, while diastolic blood pressure should not exceed 120 mmHg.
Finally, pulse pressure, the difference between SBP and DBP, should not exceed a high pulse pressure reference of 100 mmHg, and should not be lower than the narrow pulse pressure limit, taken as a quarter of the SBP value22. This criterion helped to exclude patients with conditions too far from normality, for example, a low amplitude variation between SBP and DBP. We also removed from the dataset ABP waveforms with constant values lasting longer than 0.04 s (5 samples), eliminating signal saturation issues.
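The pressure limits above can be collected into a small sketch. The function names are illustrative, and reading the saturation rule as "a run of more than 5 identical consecutive samples" is an assumption, since the text does not say whether the limit is inclusive.

```python
def bp_values_acceptable(sbp, dbp, abp_min, abp_max):
    """Apply the pressure limits listed in the text (all values in mmHg)."""
    pulse_pressure = sbp - dbp
    return (
        abp_min >= 30 and abp_max <= 200   # extreme ABP boundaries
        and sbp >= 60                      # lower limit for SBP
        and dbp <= 120                     # upper limit for DBP
        and pulse_pressure <= 100          # high pulse pressure reference
        and pulse_pressure >= sbp / 4      # narrow pulse pressure limit
    )


def longest_flat_run(samples):
    """Length, in samples, of the longest run of identical consecutive values."""
    if not samples:
        return 0
    longest = current = 1
    for prev, cur in zip(samples, samples[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest


def is_saturated(samples, max_flat_samples=5):
    """Assumed interpretation: flag runs longer than 5 samples (0.04 s at 125 Hz)."""
    return longest_flat_run(samples) > max_flat_samples
```

With these limits, a segment with SBP 120 mmHg and DBP 70 mmHg is accepted (pulse pressure 50 mmHg, above the 30 mmHg quarter-of-SBP limit), while SBP 120 mmHg with DBP 100 mmHg is rejected for narrow pulse pressure.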
Table 1 specifies the main characteristics of MIMIC-BP. It is worth emphasizing that the signal segments are disjoint (there is no temporal overlap). Figure 3 presents an excerpt of 5 seconds from a segment. It is noteworthy to see in this figure the correlation among the envelopes of the ABP and PPG waveforms with the RESP signal. RESP waveform presents complementary information, but not all patient records contain this signal. Additionally, although ABP is considered a reliable method for blood pressure assessment, it may be affected by drifts over time. Such drifts are usually corrected by the hospital's medical staff, using a non-invasive device, which results in a more trustworthy measurement. In order to analyze if such corrections are being made, and further guarantee the quality of the ABP signal, we compared the concordance between ABP signals and non-invasive BP (NBP) measurements, available in the MIMIC-III Matched Subset, and found that ABP and NBP are well correlated (ρ = 0.8, with an average absolute difference of 2 mmHg), indicating good concordance between both sources of measurement.
Table 1.
Main characteristics of the proposed MIMIC-BP.
Fig. 3.
First 5 seconds of waveforms from segment 29 of patient p093833. Left y-axis is ABP (green, solid line style), varying from 90 to 150 mmHg and right y-axis the measurements of ECG (blue, dotted), PPG (orange, dashed) and RESP (black, dash-dotted).
The blood pressure distribution of the resulting dataset was analyzed next. Every 30-second segment from the ABP signal was further subdivided into 10-second analysis windows, and the SBP and DBP were taken. The corresponding histograms are presented in Fig. 4. Each histogram contains 1,524 × 30 × 3 = 137,160 values from 1,524 patients, each having 30 segments of 30 seconds.
Fig. 4.

MIMIC-BP histograms of DBP and SBP taken from 137,160 ABP windows of 10 seconds.
It can be seen, mainly in the DBP histogram, that the values are lower than expected for a healthier population, because the original MIMIC database was collected from patients in intensive care units. Nevertheless, the histogram variability conforms to the requirements of ISO 81060-2:201823 if the histograms are shifted to the right by specific amounts. If 21.2 mmHg is added to the DBP values and 11.8 mmHg is added to the SBP values, the blood pressure distribution nearly meets the requirements of the ISO standard, as shown in Table 2. The histograms presented in Fig. 4, as well as the data in the MIMIC-BP dataset, were not shifted or modified in any way from the original data in the MIMIC-III database. The shift of the SBP and DBP values in Table 2 was applied solely to show that the variability in ABP values conforms to the ABP distribution expected by the ISO standard for the purposes of testing and clinical equipment certification. ISO 81060-2 is the only standard recognized by the US Food and Drug Administration (FDA) for the clinical investigation of automated measurement type non-invasive sphygmomanometers. This standard requires the proportions of the hypotensive and hypertensive groups to be evaluated as a representative sample. Therefore, the MIMIC-BP dataset makes it possible to develop more general algorithms, which should be more helpful when applied in real life.
Table 2.
BP counts for the shifted DBP and SBP and the ISO 81060-2:2018 requirements.
| BP (mmHg) | Count | ISO |
|---|---|---|
| DBP ≤ 60 | 6,417 (4.7%) | 5% |
| DBP ≥ 85 | 33,142 (24.2%) | 20% |
| DBP ≥ 100 | 6,515 (4.7%) | 5% |
| SBP ≤ 100 | 6,403 (4.7%) | 5% |
| SBP ≥ 140 | 33,806 (24.6%) | 20% |
| SBP ≥ 160 | 6,345 (4.6%) | 5% |
We highlight that the shift was applied to the values in this table just to show the adherence of our proposed dataset to the ISO norm, which is important to obtain certification in blood pressure estimation. The data in our dataset is not shifted.
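As an illustration of how the proportions in Table 2 could be reproduced, the sketch below shifts lists of per-window DBP and SBP values and counts the fraction falling in each ISO group. The function names are hypothetical and the input lists are assumed to hold one value per 10-second window; only the shift amounts and the thresholds come from the text and Table 2.

```python
DBP_SHIFT = 21.2   # mmHg added to DBP (from the text)
SBP_SHIFT = 11.8   # mmHg added to SBP (from the text)
TOTAL_WINDOWS = 1524 * 30 * 3   # patients x segments x windows per segment


def fraction(values, predicate):
    """Fraction of values for which predicate(value) is true."""
    return sum(1 for v in values if predicate(v)) / len(values)


def iso_group_fractions(dbp_values, sbp_values):
    """Shift copies of the values and return the fraction in each ISO group.

    The input lists are left untouched, mirroring the fact that the dataset
    itself is not shifted.
    """
    dbp = [v + DBP_SHIFT for v in dbp_values]
    sbp = [v + SBP_SHIFT for v in sbp_values]
    return {
        "DBP <= 60": fraction(dbp, lambda v: v <= 60),
        "DBP >= 85": fraction(dbp, lambda v: v >= 85),
        "DBP >= 100": fraction(dbp, lambda v: v >= 100),
        "SBP <= 100": fraction(sbp, lambda v: v <= 100),
        "SBP >= 140": fraction(sbp, lambda v: v >= 140),
        "SBP >= 160": fraction(sbp, lambda v: v >= 160),
    }
```

The percentages in Table 2 are the corresponding counts divided by the 137,160 windows, for example 33,142 / 137,160 ≈ 24.2%.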
The dataset also provides labels with the median values of the SBP and DBP for each 30-second segment of ABP. That is, the SBP and DBP may vary up to 25 mmHg along a 30-second segment and the provided labels correspond to the respective median values for the entire segment. The median was chosen over the average value as it is a statistical parameter less severely affected by the presence of outliers, thus better representing the segments’ SBP and DBP values. If needed, demographics and clinical records of the patients are available in the MIMIC-III Clinical Database24. Figure 5 illustrates the inclusion criteria flow and Table 3 presents a cohort summary of patients’ descriptive statistics by care unit, following a similar structure to Table 1 in the MIMIC-III data descriptor4. The care units described are Cardiac Care Unit (CCU), Cardiac Surgery Recovery Unit (CSRU), Surgical Intensive Care Unit (SICU), Medical Intensive Care Unit (MICU), Trauma Surgical Intensive Care Unit (TSICU) and Neonatal Intensive Care Unit (NICU). For Age, Weight, Length of Stay (LOS) and 24H Sequential Organ Failure Assessment (SOFA) score, we provide the median and, in parentheses, the first and third quartiles. Furthermore, we note the existence of a newborn in the subset, patient p028331, as their data may strongly differ from the general population of the database. Additionally, Table 4 presents a comparison between the MIMIC-III Clinical database and the MIMIC-BP subset with respect to the distribution of the four most common ethnic groups/races, while Table 5 shows the disease distribution along the database (grouped using ICD-9 coding, the same as Table 2 in the MIMIC-III data descriptor4), by care unit. Additional clinical information, such as the SOFA score, can be obtained from the mimiciii_derived database, available on Google BigQuery.
Fig. 5.
Cohort summary: patient inclusion diagram.
Table 3.
Cohort summary: MIMIC-BP patients descriptive statistics by critical care unit.
| Critical care unit | CCU | CSRU | SICU | MICU | TSICU | NICU | Total |
|---|---|---|---|---|---|---|---|
| N (% of total) | 131 (8.6%) | 82 (5.4%) | 591 (38.8%) | 517 (33.9%) | 202 (13.2%) | 1 (0.1%) | 1,524 (100%) |
| Age, years | 68 (60-77) | 70 (61-76) | 60 (50-70) | 60 (50-70) | 57 (41-70) | - (-) | 61 (50-72) |
| Weight, kg | 84 (69-102) | 81 (68-92) | 77 (66-92) | 79 (67-79) | 80 (68-95) | - (-) | 79 (67-94) |
| N males (relative %) | 95 (72.5%) | 47 (57.3%) | 291 (49.2%) | 268 (51.8%) | 115 (56.9%) | - (-) | 816 (53.5%) |
| ICU LOS, days | 5.0 (3.1-10.7) | 3.1 (1.6-5.9) | 4.1 (2.0-8.4) | 4.8 (2.7-8.7) | 3.7 (2.0-8.8) | 53.6 (-) | 4.2 (2.3-8.6) |
| Hosp. LOS, days | 9.1 (5.4-14.5) | 8.6 (5.8-12.2) | 10.4 (6.5-17.9) | 11.7 (7.7-19.8) | 9.9 (6.3-16.4) | 53.6 (-) | 10.6 (6.5-17.6) |
| SOFA score | 5.9 (4.1-8.3) | 5.7 (4.0-7.2) | 4.2 (2.8-5.9) | 5.6 (4.2-7.7) | 4.9 (3.6-6.3) | - (-) | 4.9 (3.5-6.9) |
Table 4.
MIMIC-BP database predominant ethnic groups/races compared with MIMIC-III population.
| Ethnicity/Race | White | Black | Hispanic | Asian | Other/not specified |
|---|---|---|---|---|---|
| (MIMIC-BP) Count (% of total) | 1,037 (68.1%) | 129 (8.5%) | 66 (4.3%) | 48 (3.1%) | 244 (16.0%) |
| (MIMIC-III Clinical) Count (% of total) | 32,313 (69.5%) | 3,848 (8.3%) | 1,641 (3.5%) | 1,686 (3.6%) | 7,032 (15.1%) |
Table 5.
Cohort summary: Distribution of diseases along the database. Patients can have multiple conditions at once.
| Critical care unit | CCU | CSRU | SICU | MICU | TSICU | NICU | Total (% relative to N) |
|---|---|---|---|---|---|---|---|
| Infectious/parasitic diseases | 8 (5.5%) | 5 (3.4%) | 41 (28.3%) | 75 (51.7%) | 15 (10.3%) | 1 (0.7%) | 145 (9.5%) |
| Neoplasms of digestive/intrathoracic organs | 38 (6.8%) | 15 (2.7%) | 234 (41.7%) | 222 (39.6%) | 52 (9.3%) | 0 (0%) | 106 (7.0%) |
| Endocrine, nutritional, metabolic | 2 (3.2%) | 0 (0%) | 18 (28.6%) | 42 (66.7%) | 1 (1.6%) | 0 (0%) | 63 (4.1%) |
| Diseases of the circulatory system | 33 (6.3%) | 22 (4.2%) | 232 (44.4%) | 206 (39.4%) | 29 (5.5%) | 1 (0.2%) | 523 (34.3%) |
| Pulmonary diseases | 38 (9.2%) | 15 (3.6%) | 102 (24.7%) | 236 (57.1%) | 22 (5.3%) | 0 (0%) | 413 (27.1%) |
| Diseases of the digestive system | 15 (10.1%) | 5 (3.4%) | 38 (25.5%) | 79 (53.0%) | 12 (8.1%) | 0 (0%) | 149 (9.8%) |
| Diseases of the genitourinary system | 0 (0%) | 0 (0%) | 9 (26.5%) | 22 (64.7%) | 3 (8.8%) | 0 (0%) | 34 (2.2%) |
| Trauma | 14 (7.2%) | 2 (1.0%) | 45 (23.1%) | 96 (49.2%) | 38 (19.5%) | 0 (0%) | 195 (12.8%) |
| Poisoning (by drugs/biological substances) | 0 (0%) | 0 (0%) | 1 (25.0%) | 0 (0%) | 3 (75.0%) | 0 (0%) | 4 (0.3%) |
| Other combined causes | 131 (8.6%) | 82 (5.4%) | 591 (38.8%) | 517 (33.9%) | 202 (13.3%) | 1 (0.1%) | 1,524 (100%) |
The MIMIC-III Clinical Database stores a diverse range of data regarding patients’ stay at the hospital. Information such as demographics, reports, chart notes, bedside monitoring notes, laboratory tests and many more can be retrieved and associated with a single, de-identified patient. Although this database can be an invaluable resource in the problem of blood pressure estimation, it also contains sensitive information from several patients, and consequently, the data should be handled carefully. Thus, in order to gain access to and use the data, the PhysioNet website requires the completion of a three-step checklist, found at the Files section of the MIMIC-III Clinical database web repository24. This requirement comes as a way of minimizing the erroneous use of the database.
Data Records
The dataset is available at Harvard Dataverse, as MIMIC-BP25, publicly offered as a collection of binary data files, split data files (for training, validation and test experiments) and source code to help users read the data. We adopted the NumPy standard binary file format (NPY) due to the open access to Python language resources and the ease of using it for reading data correctly on machines with different architectures.
The provided data has not gone through any kind of processing and is stored exactly as read by the package wfdb26 directly from the original MIMIC-III database. Therefore, any additional processing and conditioning (filtering, detrending, normalization, etc.) is left to the choice of the researcher, according to the specific needs and requisites of their work.
The repository consists of 10 files:
{abp,ppg,ecg,resp}.zip: each with 30 waveform segments of 30 seconds each for every one of the 1,524 patients
labels.zip: SBP and DBP values for every 30-second segment of the ABP signals
read_data.{py,ipynb}: supporting code to help users to handle the above data
{train,val,test}_subjects.txt: lists of patients used in the Technical Validation section
The 7,620 binary files that compose the dataset are grouped into compressed files (.zip) by signal type, which makes data retrieval more practical while keeping signals of the same class together. To help ensure the integrity of the downloaded files, we present a list of files, sizes, and partial checksums:
| size, kB | file name | SHA-256 (partial) |
|---|---|---|
| 665 | labels.zip | 57660...3c970 |
| 245288 | abp.zip | 45f39...58d1c |
| 155846 | ecg.zip | 09f23...cae97 |
| 276435 | ppg.zip | 12fa8...8cafa |
| 243000 | resp.zip | c7917...2a3e3 |
Each of the 1,524 patients is associated with 5 data files: four corresponding to the ABP, ECG, PPG and RESP signals and, additionally, one containing the median SBP and DBP values for each of the 30 segments. All files present the same name structure: the subject’s anonymized ID followed by the name of the signal, e.g., for patient p093833, the name of the file containing their ECG data is p093833_ecg.npy. To allow for reproducibility and comparability with other methods, we supply lists (in .txt format) indicating which subjects were used in the training, validation and test sets for the experiments reported in this work. Finally, we provide a Python script, read_data.py, and a Python notebook, read_data.ipynb, to help users access and manipulate the data files. Below, we exemplify the expected data files corresponding to patient p093833. The patient number ID, 093833, is kept from the original MIMIC-III Waveform Database Matched Subset6:
| size, byte | file name |
|---|---|
| 608 | p093833_labels.npy |
| 900128 | p093833_abp.npy |
| 900128 | p093833_ecg.npy |
| 900128 | p093833_ppg.npy |
| 900128 | p093833_resp.npy |
Data file p093833_labels.npy contains the SBP and DBP median values for each of the 30 waveform segments of ABP in p093833_abp.npy. The size of this file can be easily inferred: two blood pressure values, SBP and DBP (in double precision, 8 bytes), for 30 segments plus the header size for the Python NumPy binary format (.npy): 2 × 8 × 30 + 128 = 608 bytes. The waveforms were sampled at 125 Hz and were read from the original MIMIC-III database with the Python Waveform Database package, wfdb26, which converts the waveform samples to double precision. Thus, a waveform data file contains 30 segments of 30 seconds each, and its size is 8 × 125 × 30 × 30 + 128 = 900,128 bytes. The total size of the dataset therefore amounts to 5.49 GB.
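The size arithmetic above can be checked with a few lines of Python. The variable names are illustrative; the 128-byte header is the value quoted in the text, and the total assumes 1 GB = 10^9 bytes.

```python
N_PATIENTS = 1524
N_SEGMENTS = 30        # segments per patient
SEGMENT_S = 30         # seconds per segment
FS = 125               # samples per second
BYTES_PER_VALUE = 8    # double precision
NPY_HEADER = 128       # .npy header size in bytes, as stated in the text

# Labels file: SBP and DBP for each of the 30 segments
labels_bytes = 2 * BYTES_PER_VALUE * N_SEGMENTS + NPY_HEADER        # 608

# Waveform file: 30 segments of 30 s sampled at 125 Hz
waveform_bytes = BYTES_PER_VALUE * FS * SEGMENT_S * N_SEGMENTS + NPY_HEADER  # 900128

# Four waveform files (ABP, ECG, PPG, RESP) plus one labels file per patient
per_patient_bytes = 4 * waveform_bytes + labels_bytes
total_bytes = N_PATIENTS * per_patient_bytes                        # 5488106880
total_gb = total_bytes / 1e9                                        # about 5.49
```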
Figure 3 presents the first 5 seconds of the four waveforms for this patient’s segment of index 29. The ABP signal, in green solid line style, varies from 90 mmHg to almost 150 mmHg, as can be seen by the left vertical axis. The right vertical axis contains the actual values of the signals as collected initially, with no processing, precisely as they appear in the MIMIC-III database.
Technical Validation
In order to test this new dataset, we performed a series of experiments with multiple neural network (NN) architectures (e.g., ResNet, LSTM, Transformer) and classical machine learning (ML) algorithms (e.g., Support Vector Regression (SVR), Random Forest (RF), Gradient Boosting Regressor (GBR), Least Absolute Shrinkage and Selection Operator (LASSO), Ridge Regression (RR)). The outcomes are summarized in Table 6. All models were trained and tested following the same calibration-free protocol, consisting of a training set (1,100 of the 1,524 subjects) and validation and test sets (195 and 229 subjects, respectively). It is important to emphasize that all the segments of a given subject are present in only one of the sets, avoiding data leakage (i.e., guaranteeing the calibration-free protocol). We report the blood pressure estimation performance in terms of Mean Absolute Error (MAE), Mean Error (ME), and their respective standard deviations (STD). We highlight that, according to ISO 81060-2:2018, medical equipment should not produce a mean error (ME) outside the range ±5 mmHg or a standard deviation (STD) greater than 8 mmHg.
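A subject-disjoint split of this kind can be sketched as follows. This only illustrates the principle; to reproduce the reported results, the subject lists distributed with the dataset should be used instead. The function name, the seed, and the placeholder subject IDs are hypothetical.

```python
import random


def split_subjects(subject_ids, n_train=1100, n_val=195, seed=0):
    """Shuffle subject IDs and cut them into disjoint train/val/test lists.

    Splitting by subject, not by segment, keeps all segments of a given
    subject in a single set.
    """
    ids = sorted(subject_ids)              # deterministic starting order
    random.Random(seed).shuffle(ids)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


# Placeholder IDs for illustration only; real IDs look like "p093833"
example_ids = [f"p{i:06d}" for i in range(1524)]
train_ids, val_ids, test_ids = split_subjects(example_ids)
```

With 1,524 subjects, the remaining 229 subjects form the test set, matching the sizes given in the text.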
Table 6.
Evaluation of neural network architectures and classic machine learning algorithms in the proposed dataset.
| NN Architecture | SBP MAE | SBP MAE STD | SBP ME | SBP ME STD | DBP MAE | DBP MAE STD | DBP ME | DBP ME STD |
|---|---|---|---|---|---|---|---|---|
| LSTM | 13.91 | 10.68 | 2.10 | 17.41 | 9.14 | 7.28 | 0.54 | 11.67 |
| Bidirectional LSTM | 13.78 | 10.76 | 2.72 | 17.27 | 9.13 | 7.26 | 0.51 | 11.65 |
| ResNet152 | 12.98 | 10.30 | 0.37 | 16.56 | 8.78 | 7.70 | 2.02 | 11.50 |
| ResNet152 + LSTM | 14.10 | 11.29 | −1.45 | 18.00 | 10.14 | 7.94 | −3.50 | 12.39 |
| Transformer | 14.45 | 12.96 | −2.57 | 19.23 | 9.82 | 7.08 | −3.19 | 11.68 |

| Classic ML | SBP MAE | SBP MAE STD | SBP ME | SBP ME STD | DBP MAE | DBP MAE STD | DBP ME | DBP ME STD |
|---|---|---|---|---|---|---|---|---|
| SVR | 14.06 | 10.62 | −1.79 | 17.53 | 9.33 | 7.47 | −2.70 | 11.64 |
| RF | 13.95 | 10.47 | −0.03 | 17.45 | 8.87 | 7.09 | −0.75 | 11.33 |
| GBR | 14.08 | 10.46 | −0.29 | 17.54 | 9.03 | 7.13 | −1.09 | 11.45 |
| LASSO | 14.10 | 10.46 | −0.35 | 17.55 | 9.24 | 7.20 | −1.23 | 11.65 |
| RR | 14.08 | 10.43 | −0.35 | 17.52 | 9.24 | 7.20 | −1.24 | 11.65 |
| Avg. Guessing (baseline) | 13.96 | 10.42 | 0.00 | 17.42 | 9.22 | 7.21 | 0.00 | 11.70 |
We adopted ReLU activation for all NN architectures with a dropout regularization ratio of 0.2 and the Root-Mean-Squared Error (RMSE) as the loss function. We also set the learning rate to 0.0003, the batch size to 64, and the training stopping criterion to 100 epochs. The inputs are windows of 10 seconds of ECG, PPG and its corresponding first and second derivatives.
For the classic machine learning approaches, the squared magnitude of the discrete Fourier transform is calculated over 10 seconds of signal. At a sampling rate of 125 Hz, 10 seconds correspond to 1,250 samples, resulting in a frequency resolution of 0.1 Hz in the spectrum. To capture most of the spectral energy content, our analysis is limited to frequencies up to 6 Hz, allowing us to extract the first 60 spectral components as features.
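A minimal sketch of this feature extraction is given below. Taking bins 1 to 60 (0.1 Hz to 6.0 Hz) and skipping the DC bin is an assumption; the text only states that the first 60 spectral components are used. The function name is illustrative, and the naive DFT loop stands in for the FFT routine that would normally be used.

```python
import cmath
import math


def spectral_features(window, n_features=60):
    """Squared DFT magnitude for bins 1..n_features.

    For a 1,250-sample window at 125 Hz the bins are 0.1 Hz apart, so
    60 bins cover 0.1 Hz to 6.0 Hz. Excluding bin 0 (DC) is an assumption.
    """
    n = len(window)
    features = []
    for k in range(1, n_features + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(window))
        features.append(abs(coeff) ** 2)
    return features
```

For a pure 1.5 Hz sinusoid of unit amplitude sampled over 10 seconds, the energy concentrates in bin 15 (the 15th feature), with squared magnitude (1250 / 2)^2 = 390,625.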
As a baseline, we also included a dummy method (avg. guessing), which always estimates the dataset’s average SBP and DBP values. As most of the human population presents BP values concentrated in a typical range, BP estimation approaches can present reduced errors even if they only learn to estimate values around this range. The ResNet152 architecture achieved the best SBP results, with an MAE of 12.98 ± 10.30 mmHg. For DBP, the best results were also obtained with the ResNet152 architecture, with an MAE of 8.78 ± 7.70 mmHg. The fact that the errors of the best models are close to those of the average guessing baseline illustrates the importance of establishing better benchmarks for evaluating BP estimation approaches and of avoiding misleading conclusions from experimental results.
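The reported metrics and the average-guessing baseline can be sketched as below. The sign convention for the error (prediction minus reference), the use of the population standard deviation, and the function names are assumptions of this sketch; the ±5 mmHg and 8 mmHg limits are the ISO 81060-2:2018 values quoted in the text.

```python
import statistics


def error_metrics(y_true, y_pred):
    """MAE, ME and their standard deviations (all in mmHg)."""
    errors = [p - t for t, p in zip(y_true, y_pred)]   # assumed sign convention
    abs_errors = [abs(e) for e in errors]
    return {
        "MAE": statistics.fmean(abs_errors),
        "MAE_STD": statistics.pstdev(abs_errors),
        "ME": statistics.fmean(errors),
        "ME_STD": statistics.pstdev(errors),
    }


def average_guessing(reference_labels, n_predictions):
    """Dummy baseline: always predict the mean of the reference labels."""
    return [statistics.fmean(reference_labels)] * n_predictions


def meets_iso_limits(me, me_std):
    """ISO 81060-2:2018 limits quoted in the text: |ME| <= 5 and STD <= 8 mmHg."""
    return abs(me) <= 5 and me_std <= 8
```

For instance, predicting 120 mmHg for reference values of 110, 120 and 130 mmHg gives a mean error of 0 mmHg but an error standard deviation of about 8.2 mmHg, which exceeds the 8 mmHg limit.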
We also analyzed each DBP range separately. In Table 7, we can notice that as the BP range increases, complex NN architectures (e.g., ResNet152 + LSTM and Transformer) achieve smaller MAE values for each range. As can be noted, (50, 60] and (60, 70] are the ranges that produce the lowest error values. Such findings result from data imbalance in the dataset in favor of these BP ranges (see Fig. 4), which causes overfitting towards these values. A comparison of box plots of ResNet152 + LSTM and Avg. Guessing for each DBP range is illustrated in Fig. 6.
Table 7.
Mean absolute errors (MAE) of Neural Network architectures in the proposed dataset considering DBP by range.
| DBP range (mmHg) | LSTM | Bidirectional LSTM | ResNet152 | ResNet152 + LSTM | Transformer | Avg. Guessing |
|---|---|---|---|---|---|---|
| (30, 40] | 20.07 | 20.11 | 14.11 | 17.79 | 23.72 | 20.71 |
| (40, 50] | 11.40 | 11.57 | 7.67 | 12.03 | 15.22 | 12.19 |
| (50, 60] | 3.34 | 3.24 | 4.98 | 8.39 | 6.37 | 3.52 |
| (60, 70] | 6.89 | 6.84 | 7.87 | 7.85 | 3.61 | 6.37 |
| (70, 80] | 16.41 | 16.44 | 15.21 | 8.86 | 12.63 | 16.16 |
| (80, 90] | 26.12 | 26.34 | 25.93 | 19.12 | 22.53 | 26.00 |
| (90, 100] | 35.98 | 35.92 | 33.03 | 31.58 | 31.76 | 35.38 |
| (100, 110] | 44.96 | 44.66 | 42.41 | 37.80 | 40.87 | 44.26 |
Fig. 6.
Comparison between ResNet152 + LSTM and Avg. Guessing in the proposed dataset considering DBP by range. The (50, 60] and (60, 70] ranges produce the lowest error values due to data imbalance in the dataset in favor of these BP ranges.
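The per-range analysis can be reproduced with a short sketch that groups absolute errors by the reference BP value. The function name and the fixed 10 mmHg width are illustrative choices that match the half-open (lo, hi] intervals used in Tables 7 and 8.

```python
import math
from collections import defaultdict


def mae_by_range(y_true, y_pred, width=10):
    """Mean absolute error grouped by (lo, lo + width] ranges of the reference."""
    buckets = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        hi = math.ceil(t / width) * width      # upper bound is inclusive
        buckets[(hi - width, hi)].append(abs(p - t))
    return {rng: sum(errs) / len(errs) for rng, errs in sorted(buckets.items())}
```

A constant predictor shows why central ranges look easy: with reference values 45, 55, 60, 65 and 95 mmHg and a constant prediction of 60 mmHg, the MAE is 2.5 mmHg in (50, 60] and 5 mmHg in (60, 70], but 35 mmHg in (90, 100].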
Analyzing the SBP ranges separately (Table 8), we can notice a behavior similar to that of the aforementioned DBP analysis. However, the ranges that produce the lowest error values are (100, 110] and (110, 120]. Once more, as can be seen in Fig. 4, the dataset presents much more data in those BP ranges. Figure 7 compares the box plots of ResNet152 and the Avg. Guessing for each SBP range.
Table 8.
Mean absolute errors (MAE) of Neural Network architectures in the proposed dataset considering SBP by range.
| SBP range (mmHg) | LSTM | Bidirectional LSTM | ResNet152 | ResNet152 + LSTM | Transformer | Avg. Guessing |
|---|---|---|---|---|---|---|
| (60, 70] | 43.20 | 42.88 | 45.68 | 54.83 | 47.78 | 45.30 |
| (70, 80] | 34.43 | 33.03 | 31.85 | 35.13 | 38.75 | 36.53 |
| (80, 90] | 24.66 | 23.50 | 22.83 | 25.61 | 28.99 | 26.76 |
| (90, 100] | 15.19 | 14.33 | 14.05 | 16.35 | 19.44 | 17.29 |
| (100, 110] | 5.88 | 5.28 | 7.62 | 10.14 | 10.13 | 7.98 |
| (110, 120] | 4.12 | 4.80 | 6.64 | 8.00 | 3.32 | 2.86 |
| (120, 130] | 13.91 | 14.29 | 11.03 | 10.45 | 10.33 | 11.82 |
| (130, 140] | 23.73 | 23.96 | 17.59 | 15.89 | 19.47 | 21.64 |
| (140, 150] | 33.43 | 33.39 | 25.19 | 23.55 | 28.64 | 31.34 |
| (150, 160] | 43.06 | 43.17 | 36.67 | 34.23 | 38.24 | 40.97 |
| (160, 170] | 52.76 | 54.38 | 49.36 | 51.44 | 48.47 | 50.66 |
| (170, 180] | 62.75 | 64.01 | 61.21 | 65.34 | 58.71 | 60.65 |
| (180, 190] | 73.97 | 72.65 | 59.26 | 61.58 | 68.84 | 71.92 |
Fig. 7.
Comparison between ResNet152 and Avg. Guessing in the proposed dataset considering SBP by range. The (100, 110] and (110, 120] ranges produce the lowest error values due to data imbalance in the dataset in favor of these BP ranges.
In summary, in most cases, and even in many extreme BP ranges, the best methods are not much better than average guessing, highlighting the importance of better benchmarks.
Acknowledgements
This work was funded by Samsung Eletrônica da Amazônia Ltda., under the terms of Brazilian Informatics Law 8.248/91.
Author contributions
I.S. and V.G. constructed the dataset, C.C. and I.S. conducted the experiments, C.C., I.S., L.C., O.P., S.B., T.B., V.C., V.G. and W.L. produced and reviewed the manuscript.
Code availability
The supporting code for processing the dataset is openly available and is placed together with the binary data files25.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Ivandro Sanches, Email: ivandro.s@samsung.com
Otávio A. B. Penatti, Email: o.penatti@samsung.com
References
- 1. Goldberger, A. L. et al. Physiobank, physiotoolkit, and physionet: Components of a new research resource for complex physiologic signals. 10.1161/01.CIR.101.23.e215 (2000).
- 2. Moody, G. B. & Mark, R. G. A database to support development and evaluation of intelligent intensive care monitoring. In Computers in Cardiology 1996, 657–660, 10.1109/CIC.1996.542622 (IEEE, 1996).
- 3. Saeed, M. et al. Multiparameter intelligent monitoring in intensive care II: a public-access intensive care unit database. Critical Care Medicine 39, 952–960, 10.1097/CCM.0b013e31820a92c6 (2011).
- 4. Johnson, A. et al. MIMIC-III, a freely accessible critical care database. Scientific Data 3, 10.1038/sdata.2016.35 (2016).
- 5. Johnson, A. et al. MIMIC-IV, a freely accessible electronic health record dataset. Scientific Data 10, 10.1038/s41597-022-01899-x (2023).
- 6. Moody, B., Moody, G., Villarroel, M., Clifford, G. D. & Silva, I. MIMIC-III waveform database matched subset (version 1.0). PhysioNet 10.13026/c2294b (2020).
- 7. Tanveer, S. & Hasan, K. Cuffless blood pressure estimation from electrocardiogram and photoplethysmogram using waveform based ANN-LSTM network, https://arxiv.org/abs/1811.02214 (2018).
- 8. Hsu, Y.-C., Li, Y.-H., Chang, C.-C. & Harfiya, L. N. Generalized deep neural network model for cuffless blood pressure estimation with photoplethysmogram signal only. Sensors 20, 10.3390/s20195668 (2020).
- 9. Baker, S., Xiang, W. & Atkinson, I. A hybrid neural network for continuous and non-invasive estimation of blood pressure from raw electrocardiogram and photoplethysmogram waveforms. Computer Methods and Programs in Biomedicine 207, 106191, 10.1016/j.cmpb.2021.106191 (2021).
- 10. Athaya, T. & Choi, S. An estimation method of continuous non-invasive arterial blood pressure waveform using photoplethysmography: A U-Net architecture-based approach. Sensors 21, 10.3390/s21051867 (2021).
- 11. Lin, W., Demirel, B. U., Al Faruque, M. A. & Li, G. P. Energy-efficient blood pressure monitoring based on single-site photoplethysmogram on wearable devices. In 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 504–507, 10.1109/EMBC46164.2021.9630488 (IEEE, 2021).
- 12. Leitner, J., Chiang, P.-H. & Dey, S. Personalized blood pressure estimation using photoplethysmography: A transfer learning approach. IEEE Journal of Biomedical and Health Informatics 26, 218–228, 10.1109/JBHI.2021.3085526 (2022).
- 13. Kachuee, M., Kiani, M., Mohammadzade, H. & Shabany, M. Cuff-Less Blood Pressure Estimation. UCI Machine Learning Repository 10.24432/C5B602 (2015).
- 14. Kachuee, M., Kiani, M. M., Mohammadzade, H. & Shabany, M. Cuff-less high-accuracy calibration-free blood pressure estimation using pulse transit time. In 2015 IEEE International Symposium on Circuits and Systems (ISCAS), 1006–1009, 10.1109/ISCAS.2015.7168806 (2015).
- 15. Baek, S., Jang, J. & Yoon, S. End-to-end blood pressure prediction via fully convolutional networks. IEEE Access 7, 185458–185468, 10.1109/ACCESS.2019.2960844 (2019).
- 16. Panwar, M., Gautam, A., Biswas, D. & Acharyya, A. PP-Net: A deep learning framework for PPG-based blood pressure and heart rate estimation. IEEE Sensors Journal 20, 10000–10011, 10.1109/JSEN.2020.2990864 (2020).
- 17. Fati, S. M., Muneer, A., Akbar, N. A. & Taib, S. M. A continuous cuffless blood pressure estimation using tree-based pipeline optimization tool. Symmetry 13, 10.3390/sym13040686 (2021).
- 18. Wang, W., Mohseni, P., Kilgore, K. L. & Najafizadeh, L. Cuff-less blood pressure estimation from photoplethysmography via visibility graph and transfer learning. IEEE Journal of Biomedical and Health Informatics 26, 2075–2085, 10.1109/JBHI.2021.3128383 (2022).
- 19. Ibtehaz, N. et al. PPG2ABP: Translating photoplethysmogram (PPG) signals to arterial blood pressure (ABP) waveforms. Bioengineering 9, 10.3390/bioengineering9110692 (2022).
- 20. Mahmud, S. et al. NABNet: A nested attention-guided BiConvLSTM network for a robust prediction of blood pressure components from reconstructed arterial blood pressure waveforms using PPG and ECG signals. Biomedical Signal Processing and Control 79, 104247, 10.1016/j.bspc.2022.104247 (2023).
- 21. Ritchie, J. et al. Extreme elevations in blood pressure and all-cause mortality in a referred CKD population: Results from the CRISIS study. International Journal of Hypertension 2013, 10.1155/2013/597906 (2013).
- 22. Homan, T., Bordes, S. & Cichowski, E. Physiology, Pulse Pressure (StatPearls Publishing LLC, https://www.ncbi.nlm.nih.gov/books/NBK482408, 2021).
- 23. ISO 81060-2:2018(E). Non-invasive sphygmomanometers - Part 2: Clinical investigation of intermittent automated measurement type. Standard, ISO - International Organization for Standardization, Geneva, CH https://www.iso.org/standard/73339.html (2018).
- 24. Johnson, A., Pollard, T. & Mark, R. MIMIC-III clinical database (version 1.4). PhysioNet 10.13026/C2XW26 (2016).
- 25. Samsung R&D Institute Brazil, SRBR et al. MIMIC-BP dataset. 10.7910/DVN/DBM1NF (2023).
- 26. Xie, C. et al. Waveform database software package (WFDB) for python. PhysioNet 10.13026/egpf-2788 (2021).