Author manuscript; available in PMC: 2020 Dec 27.
Published in final edited form as: Physiol Meas. 2019 Dec 27;40(12):125002. doi: 10.1088/1361-6579/ab5b84

Deep Learning approaches for Plethysmography Signal Quality Assessment in the Presence of Atrial Fibrillation

Tania Pereira 1, Cheng Ding 1, Kais Gadhoumi 1, Nate Tran 1, Rene A Colorado 2, Karl Meisel 2, Xiao Hu 1,3,4,5
PMCID: PMC7198064  NIHMSID: NIHMS1582199  PMID: 31766037

Abstract

Objective:

Photoplethysmography (PPG) monitoring has been implemented in many portable and wearable devices we use daily for health and fitness tracking. Its simplicity and cost-effectiveness have enabled a variety of biomedical applications, such as continuous long-term monitoring of heart arrhythmias, fitness and sleep tracking, and hydration monitoring. One major issue that can hinder PPG-based applications is movement artifacts, which can lead to false interpretations. In many implementations, noisy PPG signals are discarded. Misinterpreted or discarded PPG signals pose a problem in applications where the goal is to increase the yield of detecting physiological events, as in the case of paroxysmal atrial fibrillation (AF), a common episodic heart arrhythmia and a leading risk factor for stroke. In this work, we compared traditional machine learning and deep learning approaches to PPG quality assessment in the presence of AF, in order to find the most robust method for PPG quality assessment.

Approach:

The training data set was composed of 78278 30-second PPG recordings from 3764 patients, acquired using bedside patient monitors. Two different representations of the PPG signals were employed: time-series based (1D) and image based (2D). Trained models were tested on an independent set of 2683 30-second PPG signals from 13 stroke patients.

Main results:

ResNet18 showed higher performance (0.985 accuracy, 0.979 specificity, and 0.988 sensitivity) than the SVM and the other deep learning approaches. 2D-based models were generally more accurate than 1D-based models.

Significance:

2D representation of the PPG signal enhances the accuracy of PPG signal quality assessment.

Keywords: Photoplethysmography, Atrial Fibrillation, Signal Quality Assessment, Deep Learning, Convolutional Neural Networks

I. INTRODUCTION

Atrial fibrillation (AF) is a heart rhythm disorder characterized by an irregular and often rapid heartbeat. AF is strongly associated with ischemic stroke; patients with AF have a 3- to 5-fold higher risk of stroke than normal subjects [1][2], and there is increasing evidence that stroke may trigger future AF episodes [1]. Early AF detection helps in primary and secondary prevention of stroke by guiding antithrombotic and antiarrhythmic therapeutic management [3]. AF is recognized by the absence of P-waves and an irregularly irregular rhythm on the ECG. In the ambulatory context, long-term ECG monitoring remains the primary tool for AF diagnosis [4]. However, PPG sensors are becoming ubiquitous in many wearable devices and have the potential to enable a more convenient and affordable way of monitoring cardiac rhythm, including AF, in ambulatory settings [5]–[7].

A major limitation of PPG signals is their susceptibility to movement artifacts [8]. PPG contaminated with movement artifacts is extremely hard to clean and recover due to a low signal-to-noise ratio [9]. Thus, most current applications rely on quality assessment algorithms to identify and reject segments with movement artifacts [10]. The recognition of poor-quality PPG can often be complicated by the presence of AF: the irregularity of these rhythms can create waveforms with characteristics similar to those induced by movement artifacts. A body of studies has proposed approaches for pulsatile signal quality assessment that are applicable to PPG signals. Early approaches were based on measures of similarity between consecutive pulses [12], [13]; however, they fail to correctly assess signal quality in the case of AF because consecutive PPG pulses become morphologically different [14]. Machine learning (ML) techniques can overcome this problem since they are trained with AF cases [12]. In our previous work, we proposed a two-class SVM-based quality assessment model with 0.95 accuracy that proved more accurate than approaches reported in the literature based on either traditional statistics [10], [11], [13]–[16] or other ML methods [17]. Deep learning (DL) approaches have achieved promising results for biomedical problems and can be a more powerful tool for this challenge. Deep learning benefits tremendously from large datasets and is thus suitable for a variety of biomedical problems where annotated data are abundant [18]–[20].

In this work, we applied DL approaches to build a signal quality assessment model for PPG-based applications. We compared the performance of these models with that of the SVM-based model we previously developed, to evaluate the importance of learning from large training sets and the limitations of conventional ML approaches. The training dataset consisted of 78278 30-second PPG segments acquired from the fingertip in 3764 Intensive Care Unit (ICU) patients at the University of California San Francisco (UCSF) Medical Center. For testing, we used a dataset of 2683 30-second PPG segments acquired in a cohort of 13 stroke patients from the Neuro ICU. We investigated two representations of the data as input to the models, time-series based (1D) and image based (2D), to enable testing different DL architectures and learning different features from PPG waveforms. For 1D signal classification we used the attention long short-term memory fully convolutional network (ALSTM-FCN) [21] and the fully convolutional network (FCN) [22]; for image classification we used VGG19 [23], ResNet18, ResNet50 [24], and Xception [25]. Performance results from all methods were compared with our previously proposed SVM model.

II. Material and Methods

A. Data Collection and Study Population

Continuous PPG waveform data were collected from patients admitted to the Intensive Care Units (ICUs) at the University of California San Francisco (UCSF) Medical Center. In the first cohort of 3764 patients admitted to the ICUs between March 2013 and December 2016, referred to hereafter as group A, we randomly sampled 25 PPG segments per patient, each lasting 30 seconds, as described in our previous study [26], for a total of 78278 PPG segments. AF prevalence was estimated by analyzing electronic health record data (prescribed medications and International Classification of Diseases (ICD) codes for unspecified AF, paroxysmal AF, persistent AF, and chronic AF). 500 of the 3764 patients in group A (13%) were identified as having AF, a prevalence comparable to what is reported in ICU patients [27], [28].

Another set of pulse oximetry data was collected in a second cohort of 13 stroke patients (age range 19 to 91 years, median = 73.5) admitted to the UCSF Medical Center Neuro ICU between October 2016 and January 2018, referred to hereafter as group B. These patients were diagnosed with acute ischemic stroke, were at least 18 years old, and spoke English. Patients with significant problems related to attention, alertness, cognitive function, or communication were excluded unless a legally authorized representative could consent on their behalf. All enrolled patients provided written consent after being informed of the protocols approved by UCSF's Institutional Review Board. Between 3 h and 22 h of continuous PPG recordings (median = 10.5 h) were collected per patient in group B. Each PPG recording was split into non-overlapping 30-second segments for analysis. Eight of the 13 patients had AF episodes, as documented by the clinicians at the time of recording.

A summary of the number of patients in each cohort is depicted in Table I.

TABLE I.

AF prevalence in the analyzed patient cohorts.

         Group A (Training + Validation)   Group B (Testing)
Total    3764                              13
AF       500                               8
Non-AF   3264                              5

All 30-second PPG time-series data in both groups were normalized between zero and one. 2D image files were generated from the PPG waveforms by converting signal plots into RGB images. These images were then adjusted to a common size of 224 × 224 × 3 and a density of 96 dots per inch (DPI), as shown in figure 1. Segmentation, normalization, and image generation were carried out using Matlab™ 2018b (Mathworks Inc, USA). 7-lead ECG recordings were acquired simultaneously with the PPG recordings by the BedMasterEx® system (Excel Medical Inc, USA) and used as a reference for AF annotation.
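As an illustration of this conversion step, the following is a minimal Python sketch (the study used Matlab; the helper name `ppg_segment_to_image` and the 125 Hz sampling rate are assumptions for this example):

```python
# Hypothetical re-implementation of the plot-to-image step; not the authors' code.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

def ppg_segment_to_image(segment, size=224, dpi=96):
    """Normalize a 1D PPG segment to [0, 1] and render it as an RGB image."""
    seg = np.asarray(segment, dtype=float)
    seg = (seg - seg.min()) / (seg.max() - seg.min() + 1e-12)  # min-max normalization
    fig = plt.figure(figsize=(size / dpi, size / dpi), dpi=dpi)
    ax = fig.add_axes([0, 0, 1, 1])  # fill the entire canvas, no axes or margins
    ax.plot(seg, linewidth=0.8)
    ax.set_axis_off()
    fig.canvas.draw()
    img = np.asarray(fig.canvas.buffer_rgba())[..., :3]  # drop alpha -> size x size x 3
    plt.close(fig)
    return img

# Example: a 30-second segment at an assumed 125 Hz sampling rate -> (224, 224, 3)
image = ppg_segment_to_image(np.random.randn(30 * 125))
```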

Figure 1.

Example of (a) a good-quality 30-second PPG segment with AF and (b) a bad-quality 30-second PPG segment, the corresponding single-lead ECG signal (other leads used for the annotation are not shown), and the respective image representations generated from the 1D signals.

B. Annotation Process

Signal Quality Annotation

Each PPG segment was labeled as “Good”, “Bad”, or “Not Sure” by a group of three expert annotators. Guided by the morphological changes of the ECG, the quality of a PPG signal was assessed based on a set of rules. A good quality PPG signal must: 1) reflect the response of the blood volume pulse to the underlying pathophysiological characteristics of the cardiovascular system, irrespective of the shape of the pulse; 2) show a consistent number of inflection points; 3) be free of artifacts, including irregular shapes, that cannot be explained by changes in the ECG. Segments labeled “Not Sure” were not included in subsequent analysis.

The criteria used in the definition of “good quality PPG” are arbitrary and are based on the observation of hundreds of PPG segments with ECG signals. Clinicians drove these rules based on physiologically interpretable elements, and the rules were defined in a way that enables visual assessment. Using thresholds against certain signal characteristics to judge signal quality is a theoretically appealing approach because of its objectivity; however, this approach was tried in several previous works and failed for AF cases [23]. To support visual inspection, we plotted multi-lead ECG signals during the annotation process, as partially shown in figure 2. In this way, an annotator can readily use the ECG to ascertain whether a cardiac origin could explain observed PPG pulse morphological changes.

Figure 2.

PPG segments with movement artifacts (marked in gray). a) PPG segment with (4.9 + 2.6)/30 × 100 = 25% movement noise; b) PPG segment with 4.5/30 × 100 = 15% movement noise.

Training Set

PPG segments extracted from group A recordings were randomly assigned to annotators to avoid the bias of evaluating a block of consecutive segments from one patient, which would have included some level of similarity among one another. To assess inter-rater variability, a random subset of 100 PPG segments was assigned to all annotators, and Cohen's kappa coefficient was calculated. The remaining PPG segments were split among annotators without overlap.
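A brief sketch of the agreement computation for one pair of annotators, using scikit-learn's `cohen_kappa_score` (the label vectors below are placeholders, not study data):

```python
# Cohen's kappa for two annotators on the shared subset of segments.
from sklearn.metrics import cohen_kappa_score

annotator_1 = ["Good", "Bad", "Good", "Not Sure", "Good"]  # placeholder labels
annotator_2 = ["Good", "Bad", "Bad", "Not Sure", "Good"]
kappa = cohen_kappa_score(annotator_1, annotator_2)
print(kappa)  # 1.0 = perfect agreement, 0 = agreement expected by chance
```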

Testing Set

From group B, 3000 segments were randomly selected and assigned to all annotators, of which 317 were excluded. The test set was built from the remaining 2683 segments, for which congruent annotations across annotators were obtained.

Proportion of noise

Additionally, we labeled PPG segments from group B (the test set) with respect to the proportion of movement noise in the segment. For example, a 30-second PPG segment in which the total duration of noisy pulses was 15 s was assigned the label “50%”, the proportion of noise in the signal. Figure 2 shows examples of noise labeling.

AF Annotation

Segments in group B were annotated with respect to AF presence by seven clinicians. Segments were labeled as “AF”, “Not AF”, or “Not Sure”, using the ECG recordings as a reference for AF annotation according to guidelines [4]. Cohen's kappa coefficient was calculated for inter-rater agreement after annotation of the 100 randomly selected segments. Of note, the test set (group B) was labeled in both annotation tasks (signal quality and AF). Segments labeled “Not Sure” were not included in subsequent analysis.

III. Classification

A. SVM-based classifier

We applied our previous SVM-based approach using forty-two features extracted from each 30-second PPG segment. These features comprise metrics from the temporal domain (mean, median, standard deviation, variance, interquartile range, skewness, kurtosis, root mean square, Shannon entropy, and the mean and standard deviation of the first derivative); the spectral domain (first- to fourth-order moments in the frequency domain, median frequency, spectral entropy, total spectral power, and peak amplitude in the 0 to 10 Hz frequency band); non-linear dynamic analysis (SD1, the standard deviation of the short-term beat-to-beat interval variability; SD2, the standard deviation of the long-term beat-to-beat interval variability along the major axis; and the SD1/SD2 ratio); statistics from the cross-correlation of each pulse with four templates based on Gaussian waves (mean, standard deviation, and range of the list of cross-correlation maxima); and simple statistical summaries of each pulse in a segment (mean area under the curve, and the minimum and maximum beat period in the segment) and between consecutive pulses (mean, median, standard deviation, variance, interquartile range, range, skewness, kurtosis, root mean square, sample entropy, Shannon entropy, and the mean and standard deviation of the first derivative of the signal) [29]. We used a Radial Basis Function (RBF) kernel in the SVM model and tuned hyper-parameters, including C and sigma, using a cross-validation-based approach. A detailed description of the features and the parameter optimization of the classifier can be found in our previous work [29].
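To make the pipeline concrete, here is a minimal scikit-learn sketch of a feature-based RBF-SVM of this kind; it computes only a handful of the temporal-domain features listed above, and the placeholder data, grid values, and helper name `temporal_features` are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def temporal_features(seg):
    """A few of the temporal-domain features described above."""
    d1 = np.diff(seg)  # first derivative
    return [seg.mean(), np.median(seg), seg.std(), seg.var(),
            np.subtract(*np.percentile(seg, [75, 25])),          # interquartile range
            skew(seg), kurtosis(seg), np.sqrt(np.mean(seg ** 2)),  # root mean square
            d1.mean(), d1.std()]

segments = [np.random.randn(30 * 125) for _ in range(100)]  # placeholder 30 s segments
labels = np.random.randint(0, 2, size=100)                  # placeholder 0 = Bad, 1 = Good

X = np.array([temporal_features(s) for s in segments])
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))    # RBF kernel, as in the paper
grid = GridSearchCV(svm, {"svc__C": [1, 10, 100],
                          "svc__gamma": [0.001, 0.01, 0.1]},  # tunes C and sigma
                    cv=5)
grid.fit(X, labels)
```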

B. DL Models – Time-series based

Two DL network models were used with the PPG time-series data (1D signals): an attention long short-term memory fully convolutional network (ALSTM-FCN) [21] and a fully convolutional network (FCN) [22]. ALSTM-FCN is an ensemble model that includes a fully convolutional block (made of three stacked temporal convolutional blocks) and a second block composed of a dimension-shuffle layer and an attention LSTM layer. The outputs of both blocks are concatenated and passed into a softmax classification layer [21]. A grid search was done to optimize the parameters of ALSTM-FCN and is reported in the supplementary material. The grid search was applied only to this model, as its number of tunable parameters is particularly large.
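The sketch below shows a Keras model in the LSTM-FCN family [21], [22], with the three-block convolutional branch and the dimension-shuffled LSTM branch, assuming 30-second segments sampled at 125 Hz (3750 samples); the attention mechanism of ALSTM-FCN is omitted for brevity, and the filter sizes and LSTM width follow the reference architectures rather than the configuration found by our grid search:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lstm_fcn(timesteps=3750, n_classes=2):
    inp = keras.Input(shape=(timesteps, 1))
    # Fully convolutional block: three stacked temporal convolutional blocks
    x = inp
    for filters, width in [(128, 8), (256, 5), (128, 3)]:
        x = layers.Conv1D(filters, width, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    conv_out = layers.GlobalAveragePooling1D()(x)
    # LSTM block applied to the dimension-shuffled input ("shuffle" layer)
    y = layers.Permute((2, 1))(inp)  # (timesteps, 1) -> (1, timesteps)
    y = layers.LSTM(8)(y)
    y = layers.Dropout(0.5)(y)
    # Concatenate both blocks and classify with softmax
    out = layers.Dense(n_classes, activation="softmax")(layers.concatenate([conv_out, y]))
    return keras.Model(inp, out)

model = build_lstm_fcn()
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```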

C. DL Models – Image based

Convolutional neural networks (CNNs) are multi-layer neural networks designed to recognize visual patterns from pixel values with minimal or no preprocessing of the input images. A CNN learns features from the input images that capture the differences between classes. The creation of ImageNet, a public dataset of over a million images with corresponding human annotations for 1000 classes [30], has greatly facilitated the development of deep neural network models that have achieved human-level accuracy in object recognition [23].

The different architectures take different approaches to increasing the depth of the model, using shortcuts or parallel blocks of convolutional filters. Given the strong results of these architectures for image classification, we decided to apply them in this work and compare the results. VGG19 is a CNN with 19 layers whose depth is increased using 3 × 3 convolution filters [23]. ResNet models [24] are built with residual blocks, in which each layer feeds into the next layer and directly into layers 2–3 hops away. Based on this architecture, very deep convolutional neural networks can be trained efficiently. In this work, we used ResNet18 and ResNet50, with 18 and 50 layers, respectively. Xception is an adaptation of the Inception model that replaces the standard Inception modules with depthwise separable convolutions and shows improved performance on image databases such as ImageNet [25], [30].

PPG images were first normalized by subtracting the mean value across all training segments, thus centering the data around zero [31]. The mean value calculated on the training set was later used to normalize the test set. Each network was trained from scratch. An Adam optimizer with default parameters and a batch size of 32 was used. 50 training epochs with a weight decay of 1e-6 and a base learning rate of 0.0001 were set for each model. Models were built using the Keras library [32] with the TensorFlow backend [33].
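A sketch of this training configuration in Keras follows; since the Keras applications module does not include ResNet18, ResNet50 stands in as the backbone here, and `weights=None` gives training from scratch:

```python
from tensorflow import keras

# ResNet50 from keras.applications as a stand-in backbone, trained from scratch.
model = keras.applications.ResNet50(weights=None, input_shape=(224, 224, 3), classes=2)
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),  # base learning rate 0.0001;
    # the 1e-6 weight decay was passed via the `decay` argument of the Keras 2.x Adam
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(x_train, y_train, batch_size=32, epochs=50,
#           validation_data=(x_val, y_val))  # placeholders for the annotated images
```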

D. Performance Metrics

The models were assessed based on training, validation, and testing accuracy. Training and validation accuracy and loss were monitored in each training epoch. The model parameters achieving the lowest validation loss during the optimization process were selected for the best model. Test performance was evaluated using accuracy, sensitivity, and specificity. We finally evaluated the distribution of output probabilities for each class (AF and Non-AF) for the best model.
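These test metrics follow directly from the confusion matrix; a small sketch with placeholder label vectors is given below:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])  # placeholder: 1 = Good, 0 = Bad
y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 0])  # placeholder model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)  # true positive rate
specificity = tn / (tn + fp)  # true negative rate
print(accuracy, sensitivity, specificity)
```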

E. Statistical Analysis

To assess whether there is a statistically significant difference between the best DL model and the SVM classifier we previously built, we used the McNemar test to compare accuracies. The McNemar test statistically assesses the accuracies of two classification models by comparing their predicted labels against the true labels, and detects whether the difference between the misclassification rates is statistically significant [34]. Differences were considered statistically significant at p-value < 0.05.
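A sketch of this comparison using statsmodels follows; the 2 × 2 disagreement counts below are illustrative, not the study's actual counts:

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Rows: model A correct / incorrect; columns: model B correct / incorrect.
table = np.array([[2580, 45],
                  [20, 38]])  # illustrative counts only
result = mcnemar(table, exact=False, correction=True)
print(result.statistic, result.pvalue)  # significant if p-value < 0.05
```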

F. Experimental Design

For each model, we analyzed the impact of the size of the training set on the classification performance using a learning curve. The method with the highest accuracy evaluated on the test set was selected for further analysis (figure 3).

Figure 3.

Study design: overview of the methodology followed in this work and the datasets. Data were split into training and validation sets (group A) and a testing set (group B).

Finally, to evaluate the impact of the percentage of movement noise in the test set, we carried out a correlation analysis between predicted scores and the percentage of noise in the test set.

IV. Results

A. Annotation Agreement

A kappa coefficient of 0.83 indicated very good inter-rater agreement for signal quality annotation, while agreement for AF annotation was moderate (kappa = 0.64). The distributions of PPG segment labels for signal quality and AF annotation are shown in Table II.

TABLE II.

Distribution of the two datasets (groups A and B) with respect to annotation labels.

                 Group A (Training + Validation)   Group B (Testing)
Total segments   78278                             2683
Good             54397                             1868
Bad              23881                             815
AF               -                                 1216
Non-AF           -                                 1467

B. Performance Results

Learning curves

Learning curves showing the variation of model accuracy as a function of training data size are depicted in figure 4. A trend of increasing accuracy can be observed for most models. SVM accuracy did not seem to benefit from larger training sets. Performance results for all models are presented in Table III.

Figure 4.

Learning curves showing test accuracy for all classifiers as a function of the proportion of training data used in each model.

TABLE III.

Performance results on the test set (2683 segments) for all selected models, by metric and training set size (number of training segments and percentage of the training data).

             Accuracy                   Specificity                Sensitivity
             7828    39139   78278      7828    39139   78278      7828    39139   78278
             (10%)   (50%)   (100%)     (10%)   (50%)   (100%)     (10%)   (50%)   (100%)
ResNet18     0.9642  0.9773  0.9851     0.9583  0.9534  0.9791     0.9668  0.9877  0.9877
ResNet50     0.9661  0.9687  0.9724     0.9436  0.9656  0.9656     0.9759  0.9700  0.9754
VGG19        0.9430  0.9765  0.9836     0.9374  0.9861  0.9845     0.9558  0.9546  0.9816
Xception     0.9363  0.9687  0.9814     0.9347  0.9936  0.9872     0.9399  0.9117  0.9681
ALSTM-FCN    0.9620  0.9612  0.9739     0.9166  0.8773  0.9178     0.9818  0.9979  0.9984
FCN          0.9318  0.9489  0.9657     0.9387  0.8908  0.9141     0.9288  0.9743  0.9882
SVM          0.9765  0.9773  0.9769     0.9374  0.9448  0.9337     0.9936  0.9914  0.9957

Numbers in bold designate the best performance among the DL models and for the SVM model.

While ResNet18, Xception, and VGG19 showed comparable results, ResNet18 achieved the highest accuracy, and for this reason, we selected this model for the following studies.

ResNet18 showed better accuracy than SVM (results in bold in Table III) when the entire training set was used. The McNemar test showed a significant difference between the performance of SVM and ResNet18 (p-value = 0.0117), confirming that the improvement in accuracy of the ResNet18 predictions is significant. Additionally, ResNet18 showed more balanced specificity (0.9791) and sensitivity (0.9877) than SVM, which had a specificity of 0.9337 and a sensitivity of 0.9957.

PPG quality assessment in the presence of AF

The classification performance of ResNet18 evaluated separately on AF and Non-AF labeled segments is shown in figure 5. The uncertainty of the model, depicted through the model class probability, was slightly higher for AF data than for Non-AF data. High dispersion of the prediction scores is indicative of high model uncertainty; in Non-AF data, prediction scores were concentrated at the two extreme values (0 and 1). Overall classification performance was markedly high in both subsets (AF data: 0.9770 accuracy, 0.9742 specificity, and 0.9835 sensitivity; Non-AF data: 0.9918 accuracy, 0.9990 specificity, and 0.9757 sensitivity).

Figure 5.

Histograms of the score predictions from ResNet18 for: a) the entire test set; b) the subset of AF cases; and c) the subset of Non-AF cases.

Proportion of noise

Figure 6 illustrates the impact of noise on classification performance for the ResNet18 classifier. A monotonic relationship between scores (probability predictions from ResNet18) and the proportion of noise in a PPG segment can be observed. The best fit is a one-term exponential function, with an R-square of 0.9388 and an RMSE of 0.1087, showing the goodness of the fit. Figure 6 shows that the trained ResNet18 model incorrectly classified 14 cases as having good quality, corresponding to 0.5% of the 2683 test cases. These 14 cases all have less than 15% artifact but should still have been classified as poor quality based on the rules used in annotation (dashed line in figure 6b). Nevertheless, this result makes sense, as classifiers naturally have more difficulty with cases containing smaller amounts of artifact. In other words, we can state with confidence that PPG signals classified as good quality are expected to contain no more than 15% artifact in duration. Scores near the 0.5 threshold used for this binary classification occurred for noise levels below 10%.

Figure 6.

Percentage of noise in a PPG segment versus the probability predictions from ResNet18 trained on the full original dataset. a) Scatter plot of the percentage of noise against ResNet18 prediction scores, with the fitted exponential curve. b) Distribution of the percentage of noise by class. Red lines represent mean values, red areas the mean ± 1.96 × standard error, and blue areas the mean ± one standard deviation.

Table IV reports values from the curve fits to the data plotted in figure 6 for different percentages of noise in a PPG segment. The values in Table IV show that ResNet18 is more stringent than SVM in identifying good quality segments according to our rules: at a 2% noise proportion, a segment is classified as “Bad” by ResNet18 (score = 0.41) while it is classified as “Good” by SVM (score = 0.64).

TABLE IV.

Percentage of noise and the corresponding score from the curve fits for ResNet18 and SVM.

Percentage of noise   ResNet18 fit score   SVM fit score
0.1                   0.9240               0.9505
1                     0.6272               0.7853
2                     0.4078               0.6352
4                     0.1724               0.4156
25                    2.0432e-05           0.0048
100                   1.9417e-19           5.9678e-10
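A sketch of this one-term exponential fit with SciPy follows, using the (noise, score) pairs from Table IV as sample points; the initial guess `p0` is an assumption:

```python
import numpy as np
from scipy.optimize import curve_fit

def one_term_exp(x, a, b):
    return a * np.exp(b * x)

noise_pct = np.array([0.1, 1, 2, 4, 25])               # percentage of noise (Table IV)
scores = np.array([0.924, 0.627, 0.408, 0.172, 2e-5])  # ResNet18 fit scores (Table IV)
params, _ = curve_fit(one_term_exp, noise_pct, scores, p0=(1.0, -0.5))

pred = one_term_exp(noise_pct, *params)
rmse = np.sqrt(np.mean((scores - pred) ** 2))  # goodness of fit, as reported above
print(params, rmse)
```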

V. Discussion and future work

In this study, we demonstrated the utility of deep learning for PPG quality assessment using image-based data. Considering the learning curves, a general tendency for performance to increase with training set size was verified for the DL models, but not for the SVM classifier. The SVM used hand-crafted features, whereas DL simplified the multi-step pipeline by learning classification features automatically from images or time series. Unsurprisingly, when training data are limited, SVM achieves the best performance compared to the DL models. However, SVM performance quickly plateaued, within only 10% of the training data used in this study. On the other hand, the performance of the deep learning models continued to improve as we incrementally increased the training dataset. Both findings confirm our hypothesis that, when annotated data are abundant, deep learning models often exceed the performance of other machine learning models and can be confidently applied when working with medical waveform data. Nevertheless, for small datasets, SVM proved to be the best approach, achieving higher performance than the other methods with less data.

In general, the largest change in performance occurred between 10% and 50% of the full training set. We also observed that accuracy did not increase for training proportions beyond 50% for models that already performed well at the start (with 10% of the original training data).

Another objective of this work was to compare classification performance across data representations, images and time series. ResNet18, which used 2D images as input, showed the best performance, with a statistically significant improvement. Additionally, we verified that very good results can be achieved using CNNs to classify biomedical signals represented as 2D images. These results show that image representation of time-series data can be an interesting and effective option for biomedical signal classification, in addition to typical medical images such as MRI [35], CT [36], or ultrasound [37]. As future work, we will explore other image representations that can capture information beyond the raw PPG plot, such as FFT or wavelet spectrograms [38], [39].

We tested the tendency of the ResNet18 model to “mistake” an AF case for a Bad one. Similar performance in classifying the AF and Non-AF subsets demonstrates the robustness of ResNet18 in assessing the quality of PPG segments regardless of whether AF is present. It remains the case, however, that segments with AF are more challenging than non-AF cases, as shown by the marginal difference in the distribution of prediction probabilities between the two classes. Larger training data from the AF class may help further improve performance.

The analysis of the impact of noise percentage on overall performance showed a monotonic relationship between score predictions and noise level: the higher the noise level, the higher the prediction certainty (i.e., the more extreme the probability). High prediction uncertainty was observed for noise levels below 10%. Interestingly, when comparing the curve fits of the ResNet18 and SVM models, only the ResNet18 model was found to meet the rather strict rules we used to define good quality PPG: it correctly classified (with prediction scores below 0.5) segments with noise percentages as low as 2%, whereas the SVM model failed to attain similar performance. A particular and unique goal of this study was to analyze the impact of the proportion of noise in a PPG signal on the performance of the quality assessment model. The results suggest that DL approaches may benefit from a dedicated processing block for noise instance segmentation to identify and quantify the contribution of noise. This would help minimize the amount of data discarded, by removing only the noisy samples in a processed segment rather than the segment in its entirety.

Supervised DL approaches require large labeled datasets, and labeled data are generally costly and difficult to obtain. To address this limitation, we explored data augmentation to generate a more extensive and diverse dataset for the DL models to learn from. Synthetic data are generated by transforming existing labeled data samples to help the model learn the range of intra-class invariances one could observe. In this work, we tested a few basic techniques, such as introducing salt-and-pepper noise and flipping the original PPG image plots; however, these techniques did not improve the results. Other data augmentation techniques were not applicable in our case. For example, cropping a poor-quality PPG signal could select a part of the segment containing only good pulses and result in a new segment carrying an incorrect label. Since we did not see any improvement in test performance when using these augmented data to increase the training size, more studies are needed to investigate new data augmentation approaches for PPG-based noise detection.
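For reference, a minimal sketch of the two basic augmentations mentioned above, applied to a PPG plot image (the image here is a placeholder array, and the helper names are illustrative):

```python
import numpy as np

def salt_and_pepper(img, amount=0.01, seed=0):
    """Set a random fraction of pixels to black (pepper) or white (salt)."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    mask = rng.random(img.shape[:2])
    out[mask < amount / 2] = 0          # pepper
    out[mask > 1 - amount / 2] = 255    # salt
    return out

def horizontal_flip(img):
    """Mirror the plot left-right (a time reversal of the plotted PPG)."""
    return img[:, ::-1]

image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)  # placeholder image
augmented = [salt_and_pepper(image), horizontal_flip(image)]
```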

VI. Conclusion

This work compared PPG quality assessment approaches based on machine learning techniques. Image representation of raw PPG data was introduced to enable the application of powerful, well-established deep learning image classification approaches. High accuracy was obtained using the ResNet18 model, outperforming our previously proposed SVM-based PPG quality assessment model. Refined deep learning approaches may further benefit from additional large datasets and provide robust tools for PPG quality assessment. Accurate identification of useful PPG signals in continuous monitoring is crucial for the detection of AF episodes as well as for other applications.

Supplementary Material

Supplementary table

Acknowledgment

This work is partially funded by NIH funds R01NHLBI128679, R18HS022860, R01GM111378, UCSF Middle-Career scientist award, and UCSF RAP award A127552. The authors would like to thank colleagues Michele Pelter, Duc Do, Jason Yang, David Lee, Xiuyun Liu, Ran Xiao, Richard Fidler, David Pickham, and Kevin Keenan for their help in annotating AF and PPG signal quality.

References

  • [1].Kamel H, Okin PM, Elkind MSV, and Iadecola C, “Atrial Fibrillation and Mechanisms of Stroke: Time for a New Model,” Stroke, vol. 47, no. 3, pp. 895–900, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [2].Wolf PA, Dawber TR, Thomas HE, and Kannel WB, “Epidemiologic assessment of chronic atrial fibrillation and risk of stroke: The Framingham Study,” Neurology, vol. 28, no. 10, pp. 973–973, 1978. [DOI] [PubMed] [Google Scholar]
  • [3].Gladstone DJ et al. , “Atrial Fibrillation in Patients with Cryptogenic Stroke,” N. Engl. J. Med, vol. 370, no. 26, pp. 2467–2477, 2014. [DOI] [PubMed] [Google Scholar]
  • [4].Kirchhof P et al. , “2016 ESC Guidelines for the management of atrial fibrillation developed in collaboration with EACTS,” Eur. Heart J, vol. 37, no. 38, pp. 2893–2962, 2016. [DOI] [PubMed] [Google Scholar]
  • [5].Caulfield B, Reginatto B, and Slevin P, “Not all sensors are created equal: A framework for evaluating human performance technologies,” npj Digit. Med., 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [6].Speier W et al. , “Evaluating utility and compliance in a patient-based eHealth study using continuous-Time heart rate and activity trackers,” J. Am. Med. Informatics Assoc, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [7].Naeini EK, Azimi I, Rahmani AM, Liljeberg P, and Dutt N, “A Real-time PPG Quality Assessment Approach for Healthcare Internet-of-Things,” Procedia Comput. Sci, 2019. [Google Scholar]
  • [8].Yu C, Liu Z, McKenna T, Reisner AT, and Reifman J, “A Method for Automatic Identification of Reliable Heart Rates Calculated fromECG and PPG Waveforms,” J. Am. Med. Informatics Assoc, 2006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [9].Gambarotta N, Aletti F, Baselli G, and Ferrario M, “A review of methods for the signal quality assessment to improve reliability of heart rate and blood pressures derived parameters,” Med. Biol. Eng. Comput, vol. 54, no. 7, pp. 1025–1035, 2016. [DOI] [PubMed] [Google Scholar]
  • [10].Sukor JA, Redmond SJ, and Lovell NH, “Signal quality measures for pulse oximetry through waveform morphology analysis,” Physiol. Meas, vol. 32, no. 3, pp. 369–384, 2011. [DOI] [PubMed] [Google Scholar]
  • [11].Li Q and Clifford GD, “Dynamic time warping and machine learning for signal quality assessment of pulsatile signals,” Physiol. Meas, vol. 33, no. 9, pp. 1491–1501, 2012. [DOI] [PubMed] [Google Scholar]
  • [12].Pereira T, Gadhoumi K, Ma M, Colorado R, Keenan K, and Meisel K, “Robust Assessment of Photoplethysmogram Signal Quality in The Presence of Atrial Fibrillation,” in Computing in Cardiology, 2018. [Google Scholar]
  • [13].Karlen W, Kobayashi K, Ansermino JM, and Dumont GA, “Photoplethysmogram signal quality estimation using repeated Gaussian filters and cross-correlation,” Physiol. Meas,vol. 33, no. 10, pp. 1617–1629, 2012. [DOI] [PubMed] [Google Scholar]
  • [14].Asgari S, Bergsneider M, and Hu X, “A robust approach toward recognizing valid arterial-blood-pressure pulses,” IEEE Trans. Inf. Technol. Biomed, vol. 14, no. 1, pp. 166–172, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [15].Orphanidou C, Bonnici T, Charlton P, Clifton D, Vallance D, and Tarassenko L, “Signal-quality indices for the electrocardiogram and photoplethysmogram: Derivation and applications to wireless monitoring,” IEEE J. Biomed. Health Informatics, vol. 19, no. 3, pp. 832–838, 2015. [DOI] [PubMed] [Google Scholar]
  • [16].Liu C, Li Q, and Clifford GD, “Evaluation of the accuracy and noise response of an open-source pulse onset detection algorithm on pulsatile waveform databases,” Comput. Cardiol. (2010)., vol. 43, pp. 913–916, 2016. [Google Scholar]
  • [17].Chong JW et al. , “Photoplethysmograph Signal Reconstruction Based on a Novel Hybrid Motion Artifact Detection-Reduction Approach. Part I: Motion and Noise Artifact Detection,” Ann. Biomed. Eng, vol. 42, no. 11, pp. 2238–2250, 2014. [DOI] [PubMed] [Google Scholar]
  • [18].Bakator M and Radosav D, “Deep Learning and Medical Diagnosis: A Review of Literature,” Multimodal Technol. Interact, vol. 2, no. 3, p. 47, 2018. [Google Scholar]
  • [19].Xiao R et al. , “Monitoring significant ST changes through deep learning,” J. Electrocardiol, vol. 51, no. 6, pp. S78–S82, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [20].Tajbakhsh N et al. , “Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning?,” IEEE Trans. Med. Imaging, vol. 35, no. 5, pp. 1299–1312, 2016. [DOI] [PubMed] [Google Scholar]
  • [21].Karim F, Majumdar S, Darabi H, and Chen S, “LSTM Fully Convolutional Networks for Time Series Classification,” IEEE Access, vol. 6, pp. 1662–1669, 2017. [Google Scholar]
  • [22].Wang Z, Yan W, and Oates T, “Time series classification from scratch with deep neural networks: A strong baseline,” arXiv:1611.06455v4, 2016. [Google Scholar]
  • [23].Simonyan K and Zisserman A, “Very deep convolutional networks for large-scale image recognition,” arXiv:1409.1556, pp. 1–14, 2014. [Google Scholar]
  • [24].He K, Zhang X, Ren S, and Sun J, “Deep Residual Learning for Image Recognition,” arXiv:1512.03385v1, 2015. [Google Scholar]
  • [25].Chollet F, “Xception: Deep Learning with Depthwise Separable Convolutions,” arXiv:1610.02357v3, 2017. [Google Scholar]
  • [26].Drew BJ et al. , “Insights into the problem of alarm fatigue with physiologic monitor devices: A comprehensive observational study of consecutive intensive care unit patients,” PLoS One, vol. 9, no. 10, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [27].Wei W, Teixeira PL, Mo H, Cronin RM, Warner JL, and Denny JC, “Combining billing codes, clinical notes, and medications from electronic health records provides superior phenotyping performance,” J. Am. Med. Inform. Assoc, vol. 23, no. e1, pp. 20–27, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [28].Tseng Y-H, Ko H-K, Tseng Y-C, Lin Y-H, and Kou YR, “Atrial Fibrillation on Intensive Care Unit Admission Independently Increases the Risk of Weaning Failure in Nonheart Failure Mechanically Ventilated Patients in a Medical Intensive Care Unit,” Medicine (Baltimore), vol. 95, no. 20, pp. 1–9, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [29].Pereira T, Gadhoumi K, Ma M, Liu X, Xiao R, and Colorado RA, “A Supervised Approach to Robust Photoplethysmography Quality Assessment,” IEEE J. Biomed. Health Inform, pp. 1–9, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [30].Russakovsky O et al. , “ImageNet Large Scale Visual Recognition Challenge,” Int. J. Comput. Vis, vol. 115, no. 3, pp. 211–252, 2015. [Google Scholar]
  • [31].Sandino CM and Cheng JY, “Deep convolutional neural networks for accelerated dynamic magnetic resonance imaging,” Stanford Univ. CS231N, Course Proj, 2017. [Google Scholar]
  • [32].Chollet F, “Keras,” https://github.com/keras-team/keras.
  • [33].Abadi M, “TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems,” https://www.tensorflow.org/. [Google Scholar]
  • [34].Mcnemar Q, “Note on the sampling error of the difference between correlated proportions or percentages,” Psychometrika, vol. 12, no. 2, pp. 153–157, 1947. [DOI] [PubMed] [Google Scholar]
  • [35].Lundervold AS and Lundervold A, “An overview of deep learning in medical imaging focusing on MRI,” Zeitschrift fur Medizinische Physik. 2019. [DOI] [PubMed] [Google Scholar]
  • [36].Chilamkurthy S et al. , “Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study,” Lancet, 2018. [DOI] [PubMed] [Google Scholar]
  • [37].Liu S et al. , “Deep Learning in Medical Ultrasound Analysis: A Review,” Engineering. 2019. [Google Scholar]
  • [38].Shashikumar SP, Shah AJ, Li Q, Clifford GD, and Nemati S, “A deep learning approach to monitoring and detecting atrial fibrillation using wearable technology,” 2017 IEEE EMBS Int. Conf. Biomed. Health Informatics (BHI 2017), pp. 141–144, 2017. [Google Scholar]
  • [39].Liang Y, Chen Z, Ward R, and Elgendi M, “Photoplethysmography and Deep Learning: Enhancing Hypertension Risk Stratification,” Biosensors, vol. 8, no. 4, pp. 1–13, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
