Author manuscript; available in PMC: 2022 Feb 1.
Published in final edited form as: Comput Biol Med. 2020 Nov 28;129:104150. doi: 10.1016/j.compbiomed.2020.104150

Multimodal Spatio-Temporal Deep Learning Approach for Neonatal Postoperative Pain Assessment

Md Sirajus Salekin a,*, Ghada Zamzmi a, Dmitry Goldgof a, Rangachar Kasturi a, Thao Ho b, Yu Sun a
PMCID: PMC7856028  NIHMSID: NIHMS1652419  PMID: 33348218

Abstract

The current practice for assessing neonatal postoperative pain relies on bedside caregivers. This practice is subjective, inconsistent, slow, and discontinuous. Several automated approaches have been proposed to enhance the current practice and provide more reliable assessments. These approaches are unimodal and focus mainly on assessing neonatal procedural (acute) pain. As pain is often expressed through multiple modalities, multimodal assessment is necessary, especially in the case of postoperative (acute prolonged) pain. Additionally, spatio-temporal analysis is more stable over time and has proven highly effective at minimizing misclassification errors. In this paper, we present a novel multimodal spatio-temporal approach that integrates visual and vocal signals and uses them for assessing neonatal postoperative pain. We conduct comprehensive experiments to investigate the effectiveness of the proposed approach. We compare the performance of multimodal and unimodal postoperative pain assessment, and measure the impact of temporal information integration. The experimental results, on a real-world dataset, show that the proposed multimodal spatio-temporal approach achieves the highest AUC (0.87) and accuracy (79%), which are on average 6.67% and 6.33% higher than those of the unimodal approaches. The results also show that integrating temporal information markedly improves performance compared to the non-temporal approach, as it captures changes in the pain dynamics. These results demonstrate that the proposed approach can be used as a viable alternative to manual assessment, paving the way toward fully automated pain monitoring in clinical settings, point-of-care testing, and homes.

Keywords: Postoperative pain, Acute prolonged pain, Neonatal pain classification, Infant monitoring, Neonatal Intensive Care Unit (NICU), Multimodal, Facial expression, Body movement, Crying sound

1. Introduction

Postoperative pain [43] affects a large number of patients across the world, with an estimated 234 million surgical procedures performed each year [46]. In the case of neonates, more than 1.5 million anesthetics are administered every year in the United States for surgical procedures such as gastrostomy tube placement and circumcision [46, 8]. This has led to the publication, in recent years, of a large body of research articles and guidelines discussing optimal approaches for assessing and managing postoperative pain [22, 41, 2, 7]. Despite this significant attention, the management of postoperative pain remains inadequate [48, 28, 19]. Poor management is the main cause of delayed hospital discharge, which leads to substantial emotional and financial burden [10, 11]. In addition, it has been found [10] that poor management of postoperative pain can lead to serious short-term complications and long-term physiological, behavioral, and cognitive sequelae [12, 44]. As accurate pain assessment is the cornerstone of adequate management [9], it is critical to develop accurate pain assessment tools to enable optimal interventions.

Broadly, pain in neonates can be categorized into three types [39]: acute procedural, acute prolonged, and chronic. Acute prolonged pain (also known as postoperative pain) typically occurs after a major surgery (e.g., omphalocele repair), lasts longer than acute procedural pain, and recurs at a decreasing rate after the surgery. The current practice [45] for assessing neonatal pain after a major surgery is manual and requires caregivers to observe specific behavioral (e.g., facial expression and body movement) and physiological (e.g., heart rate) indicators. Each of these indicators is assigned a score, and the total pain score is generated by summing all the scores together. There are at least 29 validated score-based tools [23] for manually assessing procedural and postoperative pain in neonates, and more than half of these scales are multidimensional. Multidimensional pain assessment is necessary because pain manifests itself in various behavioral and physiological signals. Several studies (e.g., [40]) reported that pain has at least two dimensions and suggested the use of multidimensional scales for effective assessment.

In addition, the multidimensional approach to assessment makes it possible to 1) detect pain when a specific pain indicator cannot be recorded due to developmental (e.g., facial nerve palsy), clinical (e.g., sedation), or environmental (e.g., background noise) factors, and 2) capture individual differences in pain reactions. Score-based multidimensional scales for procedural pain have a narrower range of scores (pain vs. no-pain) as this type of pain tends to be intense for a short period of time and disappears as soon as its cause (e.g., heel lancing) is gone. In contrast, acute prolonged (postoperative) pain, or pain after any major surgery, continues long after its cause is gone, tends to fluctuate in intensity, and evolves in a more complex pattern over time. Fig. 1 and Fig. 2 present examples of crying sounds and facial expressions captured during procedural and postoperative pain, respectively. As can be seen, postoperative pain is less intense and occurs at different time intervals as compared to procedural pain (e.g., heel lancing). Hence, we believe that assessing postoperative pain frequently and consistently is critical to generate effective intervention plans.

Figure 1: Audio signals from procedural (top) and postoperative (bottom) pain. In both cases, the pain score of crying is 2. [Sample rate = 44.1 kHz].

Figure 2: Examples from neonatal procedural (left) and postoperative (right) pain. In both cases, the score of facial expression is 1.

The current practice for pain assessment using multidimensional score-based scales is discontinuous, inconsistent, and suffers from high inter- and intra-observer variation. To mitigate these limitations, several artificial intelligence-based methods [37, 5, 49, 36] have been published in the literature. Of these published works, very few focus on assessing postoperative pain. For example, in [37], an automated method for assessing children's postoperative pain based on the analysis of facial expression was proposed. Specifically, the method extracts facial features around different facial action units (AUs) using handcrafted descriptors. The extracted features are then used to train a Support Vector Machine (SVM) to detect different levels of pain. Recently, a deep learning-based approach was proposed in [34] to assess the postoperative pain of neonates based on the analysis of facial expression. Instead of using a single-indicator (unimodal) approach for the automated analysis of pain, a multimodal approach that integrates facial expression, body movement, crying sound, and vital signs to assess procedural (short-term) pain of neonates was proposed in [49]. That approach used different handcrafted descriptors to extract pain-relevant features, trained machine learning classifiers, and fused the outputs of these classifiers to obtain the pain label. Other works that propose automated methods for assessing neonatal procedural, or short-term, pain can be found in [48, 5, 33, 36].

To summarize, the majority of existing machine learning approaches for pain assessment focus on procedural pain, are unimodal, and do not take into account temporal information or the dynamic pattern of pain. A recent multimodal approach [49] was proposed to assess acute procedural pain using handcrafted methods, but it does not integrate temporal information. In this work, we propose the first spatio-temporal and multimodal AI-based approach for assessing neonatal postoperative pain. The main contributions of this paper can be summarized as follows.

  • We propose a novel temporal multimodal deep learning approach to assess neonatal postoperative pain. Existing works focus on assessing procedural, or short-term, pain based on the spatial analysis of a single pain indicator or traditional approaches using multiple pain indicators.

  • We investigate and compare the performance of unimodal and multimodal approaches for assessing neonatal postoperative pain. We also compare the performance of the proposed multimodal approach with the state-of-the-art.

  • We present a multimodal pain dataset that includes video, audio, and physiological signals recorded from neonates during their hospitalization in the Neonatal Intensive Care Unit (NICU). The dataset is recorded during the normal state (baseline) as well as procedural (short-term) and postoperative (long-term) pain states.

The rest of the paper is organized as follows. Section 2 presents technical background needed to understand the rest of the paper. Section 3 presents the neonatal pain dataset. Our approach is presented in Section 4 followed by the experimental results in Section 5. Finally, Section 6 concludes the paper and discusses directions for future research.

2. Technical Background

2.1. VGG-Net and LSTM

VGG-Net [38] is a state-of-the-art Convolutional Neural Network (CNN) for visual feature extraction. Although several versions of VGG-Net exist, VGG-16 [38] has been widely and successfully used [26, 4]. VGG-16 [38] consists of 13 convolution layers followed by 3 fully connected layers. Each convolution layer uses 3×3 filters, and the convolutional blocks are followed by pooling layers. The network starts with a depth of 64 filters, which gradually increases by a factor of 2 until it reaches 512. The depth of the network and the use of small kernels allow robust visual features to be extracted. In this paper, we used the VGG-16 [38] network to extract visual features from face images, body images, and spectrogram images of crying sounds.

Long Short-Term Memory (LSTM) [14] is a type of Recurrent Neural Network (RNN) capable of learning the temporal information in a given sequence. Although RNNs can handle long-term dependencies in theory, these networks fail to learn such dependencies in practice. To solve this issue, the LSTM [14] network was introduced and has since been used in a wide range of applications. LSTM [14] addresses the long-term dependency and vanishing gradient problems using a cell state, which is controlled by three gates: the input, forget, and output gates. The input gate controls which information should be saved to the cell state. The forget gate controls which information should be ignored or forgotten from the previous cell state. Finally, the output gate controls which information should be sent to the next state. In this paper, we used an LSTM [14] with the deep features extracted by VGG-Net to learn the temporal pattern and dynamics of postoperative pain.

2.2. Bilinear CNN

Bilinear CNN [21] was introduced to address fine-grained image classification. It uses two CNN streams to extract features from two different regions of the same image, and the final bilinear vector is generated by combining the features of the two CNN streams. Mathematically, given two CNN streams X and Y with a pooling layer P and classification layer C, the bilinear model can be represented as B = (X, Y, P, C). For a location L within the image I, if the feature functions are F_X and F_Y, then the bilinear feature b can be represented as follows.

b(I, L, F_X, F_Y) = F_X(I, L)^T F_Y(I, L) (1)

Finally, a sum-pooling is applied to collect all the bilinear features from the entire image. To improve the performance, the final bilinear vector u = ∑b(I, L) is forwarded to the following steps.

v = sign(u) · sqrt(|u|) (2)
w = v / ||v||_2 (3)

The bilinear feature vector provides orderless features, which give a better texture representation than order-sensitive features in the fine-grained image classification setting. As discussed in [21], this network is capable of extracting robust features under varying pose, lighting, and background conditions [34], which resembles the real-world NICU environment. In this paper, we used two VGG-16 [38] models as the CNN streams of the Bilinear CNN.
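As a concrete illustration of Eqs. (1)–(3), the following NumPy sketch computes a normalized bilinear vector from a pair of convolutional feature maps; the 14×14×512 feature-map size in the example is an illustrative assumption and is not taken from the paper.

```python
import numpy as np

def bilinear_pool(fx, fy, eps=1e-12):
    """Bilinear pooling of two CNN feature maps (Eqs. 1-3).

    fx, fy: feature maps of shape (H, W, Cx) and (H, W, Cy) from the two
    CNN streams, on the same spatial grid. Returns a normalized bilinear
    feature vector of length Cx * Cy.
    """
    h, w_, cx = fx.shape
    cy = fy.shape[-1]
    # Eq. (1): outer product of the two feature vectors at each location,
    # sum-pooled over all locations L of the image.
    u = np.einsum('hwi,hwj->ij', fx, fy).reshape(cx * cy)
    # Eq. (2): signed square-root normalization.
    v = np.sign(u) * np.sqrt(np.abs(u))
    # Eq. (3): L2 normalization.
    return v / (np.linalg.norm(v) + eps)

# Example with two VGG-16-like feature maps (illustrative sizes only).
fx = np.random.rand(14, 14, 512).astype(np.float32)
fy = np.random.rand(14, 14, 512).astype(np.float32)
print(bilinear_pool(fx, fy).shape)  # (262144,)
```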

3. Neonatal Pain Dataset

To evaluate our temporal multimodal approach, we used a dataset containing data of procedural (acute) and postoperative (acute prolonged) neonatal pain. The dataset, known as USF-MNPAD-I (University of South Florida Multimodal Neonatal Pain Assessment Dataset), was collected at the NICU of Tampa General Hospital, FL, USA. The dataset consists of 45 neonates with gestational ages ranging from 30 to 41 weeks. It has an ethnically and racially diverse population, including Asian, African American, and Caucasian neonates. The data collection was approved by the USF Ethics Review Board (IRB # Pro00014318).

3.1. Setup and Painful Procedures

The USF-MNPAD-I dataset has video, audio, and physiological data. A GoPro Hero 5 Black camera was used to collect the video and audio data. The camera was set up on a camera stand facing the infant's incubator to capture the neonate's face and body. A bedside Philips MP-70 vital signs monitor was used to collect the physiological data, including heart rate, blood pressure, and oxygen saturation. All these data were recorded from neonates experiencing either short-term procedural or postoperative pain during their NICU hospitalization. The dataset contains multimodal data for 36 neonates (17 female) recorded during baseline, during a procedural pain stimulus (i.e., heel lancing and immunization), and immediately after the completion of the stimulus. In the case of postoperative pain, 9 neonates (5 male) were recorded prior to major surgery (e.g., omphalocele repair) to capture their baseline state and monitored for three hours after the surgery to capture their postoperative pain state. Note that in the current dataset, we only monitored the neonates up to three hours after the surgery due to clinical constraints.

3.2. Ground Truth Labels

The ground truth labels for both types of pain were documented independently by trained nurses using NIPS (Neonatal Infant Pain Scale) [15] for procedural pain and N-PASS (Neonatal Pain, Agitation and Sedation Scale) [16] for postoperative pain. The NIPS [15] score-based pain scale has a total pain score that ranges from 0 to 7 and three levels of pain: no-pain (total score of 0–2), moderate pain (total score of 3–4), and severe pain (total score > 4). The final score is generated by summing the individual scores of the following pain indicators: facial expression (score of 0 or 1), crying sound (score of 0, 1, or 2), breathing patterns (score of 0 or 1), arms movement (score of 0 or 1), legs movement (score of 0 or 1), and state of arousal (score of 0 or 1).

The N-PASS [16] score-based pain scale has a total score that ranges from −10 to +10 and five levels: deep sedation (score −10 to −5), light sedation (score −5 to −2), normal (score 0–2), moderate pain (score 3–5), and severe pain (score > 5). This total score is generated by summing the individual scores of the following pain indicators: crying/irritability, behavior state, facial expression, extremities/tone, and vital signs (heart rate, blood pressure, oxygen saturation). Each of these indicators has a score that ranges from −2 to +2, where negative (−), 0, and positive (+) scores indicate the sedation, normal, and pain states, respectively. In our dataset, we have 109, 33, and 76 samples for the normal state, moderate pain, and severe pain, respectively.
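For illustration, the sketch below maps an N-PASS total score to the levels quoted above; the handling of boundary scores is an assumption, as the paper only lists the ranges.

```python
def npass_level(total_score):
    """Map an N-PASS total score (-10..+10) to the levels quoted above.

    The cut-offs follow the description in this section; the treatment
    of boundary values (e.g., exactly -5) is an assumption.
    """
    if total_score <= -5:
        return "deep sedation"
    if total_score <= -2:
        return "light sedation"
    if total_score <= 2:
        return "normal"
    if total_score <= 5:
        return "moderate pain"
    return "severe pain"

# e.g., indicator scores 1 + 0 + 1 + 1 + 1 sum to 4 -> "moderate pain"
print(npass_level(4))
```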

Our dataset was labeled manually by independent trained nurses. The agreement between the nurses is measured using Kappa coefficient (0.85) and Pearson correlation (0.89). We include all the cases of agreement and exclude the cases of disagreement from further analysis. Fig. 3 shows examples from neonates recorded during postoperative pain. The images were randomly selected and masked to ensure confidentiality.

Figure 3: Examples from our real-world neonatal postoperative dataset.

4. Methodology

In this paper, we investigate the use of a temporal multimodal approach for assessing postoperative pain. Our approach combines facial expression, body movement, and crying sound. We used the procedural and postoperative pain data (see Section 3) to separately train different models, one for each pain indicator. For each pain indicator, spatio-temporal features are extracted and used to generate the score of that specific indicator. Then, we fuse the scores of all indicators to generate the final pain level. Fig. 4 presents an overview of the proposed temporal multimodal approach for assessing postoperative pain.

Figure 4: Flowchart of the proposed spatio-temporal multimodal approach for neonatal postoperative pain assessment.

4.1. Facial Expression Analysis

4.1.1. Pre-processing and Augmentation

The first pre-processing step involves extracting key-frames from all videos using the FFmpeg library. We then detected the face region in each frame using a pre-trained YOLO-based [29] face detector, which was pre-trained on the WIDER face dataset [47] containing around 393,703 faces. We empirically decided to fix the total number of key-frames extracted from each video segment to 32. Using a fixed number of frames is important because the number of key-frames varies across videos, and the face detector fails on key-frames in which the face region is occluded; a fixed number of key-frames therefore facilitates the training process. We randomly dropped key-frames when the number of frames was larger than 32 and used resampling to generate more frames when it was lower than 32. To enlarge the dataset prior to CNN training, we performed image augmentation on the key-frames using a random composition of 30° rotation, ±25% brightness change, and horizontal flipping.
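The sketch below illustrates the fixed-length key-frame sampling and the augmentation described above; FFmpeg key-frame extraction and YOLO face detection are omitted, and the function names and the use of Pillow are illustrative assumptions.

```python
import random
import numpy as np
from PIL import Image, ImageEnhance

def sample_fixed_frames(frames, n_frames=32):
    """Force a variable-length list of key-frames to a fixed length.

    Randomly drops frames when there are more than n_frames and
    resamples (repeats) frames when there are fewer, as described above.
    """
    if len(frames) > n_frames:
        keep = sorted(random.sample(range(len(frames)), n_frames))
        return [frames[i] for i in keep]
    if len(frames) < n_frames:
        extra = [random.choice(frames) for _ in range(n_frames - len(frames))]
        return frames + extra
    return frames

def augment(image_array, max_rotation=30, max_brightness=0.25):
    """Random composition of rotation (up to 30 degrees), brightness change
    (up to +/-25%), and horizontal flipping, applied to one key-frame."""
    img = Image.fromarray(image_array)
    img = img.rotate(random.uniform(-max_rotation, max_rotation))
    img = ImageEnhance.Brightness(img).enhance(
        1.0 + random.uniform(-max_brightness, max_brightness))
    if random.random() < 0.5:
        img = img.transpose(Image.Transpose.FLIP_LEFT_RIGHT)
    return np.asarray(img)
```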

4.1.2. Facial Feature Extraction

Deep learning-based architectures (e.g., VGG-Net) have been successfully used for detecting a wide range of emotions, including pain [31, 5, 13, 50, 34]. In this paper, we fine-tuned a pre-trained VGG-16 [38] CNN architecture to extract visual features from images captured during postoperative pain. Table 1 shows the details of the fine-tuned VGG-16 [38] architecture. Since empirical evidence showed that a Bilinear CNN (Section 2) can better capture subtle changes, we used a Bilinear CNN with two VGG-16 [38] streams to learn pain-related features. As shown in Fig. 4, the features extracted by both streams are combined to generate the bilinear vector, followed by two Fully Connected (FC) layers (64 units) and a dense layer (1 unit, linear activation). Dropout layers (0.5) are added after each FC layer to prevent over-fitting. We used two VGG-16 networks, pre-trained on the VGGFace2 [4] and ImageNet [6] datasets, as the streams of the Bilinear CNN. We then fine-tuned the entire Bilinear CNN model using our procedural and postoperative datasets.

Table 1.

Details of fine-tuned VGG-16 architecture.

Layer Type Configuration
Base model Before FC layer without Pooling
FC Dense 512, Relu
Dropout Dropout (0.5)
FC Dense 512, Relu
Dropout Dropout (0.5)
FC Dense 1, Activation = Linear
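A minimal Keras sketch of one plausible realization of the fine-tuned VGG-16 regressor in Table 1 is shown below; the Flatten layer, the MSE loss, and the use of ImageNet weights are assumptions (the face streams are pre-trained on VGGFace2, which is not shown here), and the layer sizes follow the table.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Base model: VGG-16 convolutional layers, no top FC layers, no global pooling.
base = tf.keras.applications.VGG16(include_top=False, weights='imagenet',
                                   input_shape=(224, 224, 3), pooling=None)

x = layers.Flatten()(base.output)
x = layers.Dense(512, activation='relu')(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(512, activation='relu')(x)
x = layers.Dropout(0.5)(x)
out = layers.Dense(1, activation='linear')(x)   # indicator-level pain score

model = models.Model(base.input, out)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss='mse')
```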

4.1.3. Temporal Information Integration

Pain is a dynamic event that evolves in a particular pattern over time. Hence, it is necessary to integrate temporal information to obtain an accurate assessment of pain [31, 13, 33]. After extracting the features with the Bilinear CNN, the deep features are fed to an RNN to learn the pain dynamics. Specifically, we used an LSTM [14] network with the configuration shown in Table 2. We used two LSTM layers followed by two FC layers. Finally, a dense layer with sigmoid activation was used to classify the signal as pain or no-pain. To prevent over-fitting, dropout layers were used as shown in Table 2.

Table 2.

Details of LSTM architecture.

Layer Type Configuration
RNN LSTM 16, Activation = Tanh, Recurrent Activation = Hard Sigmoid, Dropout (0.2)
RNN LSTM 16, Activation = Tanh, Recurrent Activation = Hard Sigmoid, Dropout (0.2)
FC Dense 16, Relu
Dropout Dropout (0.3)
FC Dense 16, Relu
Dropout Dropout (0.3)
FC Dense 1, Activation = Sigmoid
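A Keras sketch of the LSTM classifier in Table 2 is given below, assuming each video segment contributes a sequence of 32 per-frame deep feature vectors; the feature dimensionality and the binary cross-entropy loss are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

feat_dim = 512   # assumed length of the per-frame deep feature vector

model = models.Sequential([
    layers.Input(shape=(32, feat_dim)),          # 32 key-frames per segment
    layers.LSTM(16, activation='tanh', recurrent_activation='hard_sigmoid',
                dropout=0.2, return_sequences=True),
    layers.LSTM(16, activation='tanh', recurrent_activation='hard_sigmoid',
                dropout=0.2),
    layers.Dense(16, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(16, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(1, activation='sigmoid'),       # pain vs. no-pain
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss='binary_crossentropy', metrics=['accuracy'])
```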

4.2. Body Movement Analysis

4.2.1. Pre-processing and Augmentation

Similar to the facial expression analysis (Section 4.1.1), we extracted key-frames from the video segments using the FFmpeg library. We used a YOLO detector, pre-trained on the COCO dataset [20] containing around 330K images from 80 object categories, to detect the body regions of neonates. As with the facial expression, we fixed the number of key-frames from each video segment to 32. The resampling technique allows us to generate an equal number of frames in case of any detection failure. To enlarge the dataset for CNN training, we applied a random composition of rotation (30°), brightness change (±25%), and horizontal flipping.

4.2.2. Feature Extraction

The state-of-the-art methods for extracting pain-relevant features from body regions are handcrafted (e.g., motion image) and deep learning-based (e.g., VGG-16 [38]). Therefore, we used two types of methods, namely the motion image and VGG-16 [38], to assess neonatal postoperative pain from body movement.

The motion image captures the changes in pixels between consecutive frames, and it is calculated by subtracting consecutive frames followed by thresholding. Pixels of the motion image have a value of 1 (movement) or 0 (no movement). To calculate the total motion in each frame, all the pixels are summed and divided by the frame's dimensions. The calculated total motion is then used as the main feature [49] to train traditional classifiers such as Gaussian Naive Bayes [30], Random Forest [3], and K-Nearest Neighbors [1]. For deep learning, we trained VGG-16 [38] networks using both the motion image and the original body image. The configuration of the fine-tuned VGG-16 [38] network is presented in Table 1. Fig. 5 shows different ROIs (Regions of Interest) of a sample subject.
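A minimal OpenCV sketch of the motion-image feature described above; the binarization threshold is an assumption, as the paper does not report its value.

```python
import cv2
import numpy as np

def motion_feature(prev_frame, curr_frame, threshold=25):
    """Total motion between two consecutive body-ROI frames.

    Subtracts consecutive grayscale frames, thresholds the absolute
    difference into a binary motion image (1 = movement, 0 = no movement),
    and returns the fraction of moving pixels.
    """
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(curr_gray, prev_gray)
    motion_image = (diff > threshold).astype(np.uint8)
    return motion_image.sum() / motion_image.size
```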

Figure 5: Region of Interest (ROI) from a sample input image.

4.2.3. Temporal Information Integration

To capture the temporal changes of body movement, we integrated an RNN (i.e., LSTM [14]) network with VGG-16 [38]. We used the same LSTM [14] network architecture (Table 2) that was used for the facial expression analysis. The integration of VGG-16 [38] and LSTM [14] allows the model to learn body movement dynamics over time.

4.3. Crying Sound Analysis

4.3.1. Pre-processing and Augmentation

When a visual pain indicator cannot be recorded due to occlusion or swaddling, the crying sound can be used to assess pain. The state-of-the-art methods for extracting pain-relevant features from crying sounds are handcrafted (e.g., MFCC [49]) and deep learning-based (e.g., spectrogram image [35]). Therefore, we extracted two types of features, MFCC and deep features, and used them to assess neonatal postoperative pain.

MFCC, which stands for Mel Frequency Cepstral Coefficient, is a popular cepstral-domain [25] method that has been successfully used to extract a useful and representative set of features (i.e., coefficients) from an audio signal while discarding noise and non-useful components. Taking the Inverse Fourier Transform (IFT) of the logarithm of the signal's spectrum converts the audio signal to the cepstral domain. We extracted 20 MFCCs over all frames of each audio segment (approx. 9 seconds). We then calculated the mean of the 20 MFCCs, which leads to a mean MFCC feature vector of length 388.
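A possible librosa realization of this MFCC feature extraction is sketched below; interpreting the 388-dimensional vector as one mean value per frame (roughly 388 frames for a 9-second clip at librosa's default 22.05 kHz sampling rate and 512-sample hop length) is an assumption.

```python
import librosa
import numpy as np

def mfcc_feature_vector(wav_path, n_mfcc=20):
    """Mean MFCC feature vector for a ~9-second audio segment.

    Computes 20 MFCCs per frame and averages across the coefficient axis,
    giving one value per frame; the resulting vector length depends on the
    segment duration, sampling rate, and hop length.
    """
    y, sr = librosa.load(wav_path)                           # default 22.05 kHz
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (20, n_frames)
    return np.mean(mfcc, axis=0)                             # (n_frames,)
```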

In addition to the MFCC features, we converted the raw audio signal (approx. 9 seconds) to a spectrogram image. The spectrogram image [24] is a visual representation of a given audio signal. It represents the change of the frequency components with respect to time and suppresses noise. Brighter pixels in the spectrogram image represent higher energy and vice versa. After generating the spectrogram image for each audio segment, we extracted deep features from these images using a VGG-16 network.

To train the network, we enlarged our set of spectrogram images by applying signal augmentation techniques to the original audio signals. Each audio signal is augmented by changing the raw frequency f at 3 different levels (f/3, f/2, 2f/3) and adding 6 different levels of noise (0.001, 0.003, 0.005, 0.01, 0.03, 0.05). Further, combinations of frequency change and noise are also applied to create more variants. This process generates a total of 27 (3 + 6 + 3×6) augmented images for each audio signal. Fig. 6 and Fig. 7 show examples of the raw audio signals and their corresponding spectrogram images during the no-pain and pain states of the same subject.
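The sketch below illustrates one plausible way to generate the 27 augmented variants; realizing the frequency change via pitch shifting and treating the noise levels as amplitudes of added white noise are assumptions, since the paper only lists the levels.

```python
import numpy as np
import librosa

def augment_audio(y, sr):
    """Generate 27 augmented variants of an audio signal (a sketch).

    3 frequency-shifted variants + 6 noisy variants + 18 combinations
    of the two, matching the 3 + 6 + 3*6 = 27 count described above.
    """
    freq_factors = [1 / 3, 1 / 2, 2 / 3]
    noise_levels = [0.001, 0.003, 0.005, 0.01, 0.03, 0.05]

    def shift(signal, factor):
        # scale all frequency content by `factor`, expressed in semitones
        return librosa.effects.pitch_shift(signal, sr=sr,
                                           n_steps=12 * np.log2(factor))

    freq_variants = [shift(y, f) for f in freq_factors]
    noise_variants = [y + n * np.random.randn(len(y)) for n in noise_levels]
    combined = [fv + n * np.random.randn(len(fv))
                for fv in freq_variants for n in noise_levels]
    return freq_variants + noise_variants + combined
```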

Figure 6: Audio signal (top) and its corresponding spectrogram image (bottom) of a neonate during the no-pain state.

Figure 7: Audio signal (top) and its corresponding spectrogram image (bottom) of a neonate during the pain state.

4.3.2. Feature Extraction

Following the state-of-the-art methods [49, 35], we used both traditional machine learning classifiers and a deep learning-based classifier. For the traditional classifiers, we trained Gaussian Naive Bayes [30], Random Forest [3], and K-Nearest Neighbors (KNN) [1] classifiers using the extracted MFCC features.

For the deep learning-based classification, we used a pre-trained (ImageNet [6]) VGG-16 [38] CNN network and fine-tuned this network (similar to Table 1) using our postoperative pain dataset. The VGG-16 [38] CNN network was trained using the spectrogram images extracted as described above. The last classification layer of the VGG-16 [38] CNN has a sigmoid activation function instead of the linear activation.

4.4. Multimodal Approach

To generate a multimodal assessment of postoperative pain, we combined the pain scores generated by all indicator-specific models using decision fusion, as shown in Fig. 4. The multimodal pain assessment is necessary because pain manifests itself in different signals [48, 49]. In addition, the multimodal approach allows pain to be detected when some pain indicators cannot be recorded, as discussed in the next section and shown in Table 3. To combine the labels or scores of facial expression, crying sound, and body movement, we used an unweighted majority voting [27] scheme in which we choose the majority label in a given combination of labels as the final label. If the combination has a tie, we use the class probability (confidence score) to break the tie.
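A minimal sketch of this decision fusion, assuming each indicator model outputs a label together with a confidence score:

```python
from collections import Counter

def fuse_labels(predictions):
    """Unweighted majority voting over the available indicator predictions.

    `predictions` maps indicator name -> (label, confidence), e.g.
    {'face': (1, 0.82), 'body': (0, 0.61), 'sound': (1, 0.90)}.
    Missing indicators are simply omitted. Ties are broken by the highest
    class probability (confidence score), as described above.
    """
    votes = Counter(label for label, _ in predictions.values())
    top = votes.most_common()
    if len(top) > 1 and top[0][1] == top[1][1]:     # tie between classes
        label, _ = max(predictions.values(), key=lambda p: p[1])
        return label
    return top[0][0]

print(fuse_labels({'face': (1, 0.82), 'body': (0, 0.61), 'sound': (1, 0.90)}))  # -> 1
```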

Table 3.

Unimodal and Multimodal assessment of neonatal postoperative pain using different traditional and deep learning approaches.

Modality Approach Accuracy Precision Recall F1-Score TPR FPR AUC
Face VGG16 + LSTM 0.6203 0.6195 0.6203 0.6197 0.6634 0.4302 0.7300
Bilinear VGG16 + LSTM 0.6952 0.7084 0.6952 0.6834 0.8614 0.5000 0.8196
Body Motion + Gaussian NB 0.6330 0.6562 0.6330 0.6189 0.4404 0.1743 0.5001
Motion + Random Forest 0.5872 0.5874 0.5872 0.5868 0.5596 0.3853 0.3382
Motion + KNN 0.5688 0.5697 0.5688 0.5675 0.5138 0.3761 0.3899
Motion Image + VGG16 + LSTM 0.6835 0.6906 0.6835 0.6805 0.7799 0.4128 0.7323
Body ROI Image + VGG16 + LSTM 0.7050 0.7047 0.7050 0.7047 0.7333 0.3263 0.7786
Sound MFCC + Gaussian NB 0.6296 0.6328 0.6296 0.6267 0.5421 0.2844 0.4194
MFCC+ KNN 0.6991 0.7001 0.6991 0.6988 0.7290 0.3303 0.3592
MFCC + Random Forest 0.7269 0.7362 0.7269 0.7245 0.8224 0.3670 0.4459
Spectrogram Image + VGG16 0.7963 0.7964 0.7966 0.7963 0.7850 0.1927 0.8690
Multimodal (F+B+S) + Decision Fusion 0.7936 0.8028 0.7936 0.7920 0.8807 0.2936 0.9010
* Precision, Recall, and F-1 score are weighted by both classes.
* TPR, FPR, and AUC are calculated for the pain class.
* Bold text indicates our approaches and bold values indicate superiority.
* Bold text (F+B+S) represents the best of the unimodal (bold text) approaches.

5. Experimental Results and Discussion

In this section, we present the performance of assessing neonatal postoperative pain using a single pain indicator at a time (unimodal) and multiple pain indicators together (multimodal). Before presenting the results, we describe the process of extracting and preparing the videos followed by our training and evaluation protocols.

5.1. Dataset Preparation

We used the aforementioned (Section 3) neonatal pain dataset to evaluate the proposed temporal multimodal approach. The dataset consists of both procedural (202 videos) and postoperative (218 videos) pain. We used the procedural dataset (a balanced set of 116 samples) for pre-training the model (in the case of the face only), and used the postoperative dataset for fine-tuning and evaluation. After performing the pre-processing steps (see Section 4), the total number of video segments (each 9 seconds long) for each pain indicator in the postoperative dataset was 187, 218, and 216 for face, body, and sound, respectively. Note that the face was missing in 31 videos (187/218) and the sound was missing in 2 videos (216/218).

5.2. Training and Evaluation Protocol

We used two types of training techniques: traditional classifier training and deep learning. In both cases, we used the leave-one-subject-out protocol for training and testing, as this protocol is more realistic for clinical applications (see [32, 18]) and captures the differences between patients. In the case of the traditional classifiers, we used a KNN [1] classifier (K = 3, determined empirically) and a Random Forest [3] classifier (N = 100, determined empirically). For deep learning, we used images (face image, body image, motion image, and spectrogram) of size 224 × 224 as input to individual VGG-16 [38] models to extract deep features from each indicator, as shown in Fig. 4. The extracted features are then fed to RNN networks to learn pain patterns and dynamics. We used the Adam [17] optimizer with a learning rate of 0.0001 to train the CNN and RNN models. Batch sizes of 16 and 1 were used for the CNN and RNN models, respectively, for up to 100 epochs. All training minimized the validation loss with an early stopping strategy.
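A sketch of the leave-one-subject-out protocol using scikit-learn's LeaveOneGroupOut is shown below; the placeholder arrays stand in for the segment-level features, labels, and subject IDs.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

X = np.random.rand(20, 10)           # placeholder feature vectors
y = np.random.randint(0, 2, 20)      # placeholder pain / no-pain labels
groups = np.repeat(np.arange(5), 4)  # placeholder subject IDs

logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(X, y, groups):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # Fit the indicator-specific model on X_train / y_train,
    # then evaluate on the held-out subject's segments.
```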

We performed two levels of training in the case of deep learning. In the first level, we used the pain scores of each indicator (i.e., score 0 or 1 for face and body, and score 0, 1, or 2 for sound) to train the CNN models. In the second level, we used the final pain labels, which are no-pain, moderate pain, and severe pain, to train the RNN models. As discussed in Section 3, these final pain labels are generated by summing the individual scores and thresholding. Note that we combined the labels of moderate and severe pain into a single pain class when training the RNN models because the number of instances with a moderate pain label is relatively small (33 examples).

To evaluate the performance of the trained models, we used the weighted accuracy, weighted precision, weighted recall, and weighted F-1 score. Weighted metrics reflect the performance of each class because each class's score is weighted by its number of instances before computing the fraction of correct predictions over the total number of samples. In addition, we calculated the True Positive Rate (TPR), False Positive Rate (FPR), and Area Under the Curve (AUC) for the pain class.
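One plausible scikit-learn implementation of the reported metrics, assuming binary labels with 1 denoting pain, is sketched below.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

def report(y_true, y_pred, y_score):
    """Weighted metrics plus pain-class TPR/FPR/AUC, in the spirit of Tables 3-5."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        'accuracy':  accuracy_score(y_true, y_pred),
        'precision': precision_score(y_true, y_pred, average='weighted'),
        'recall':    recall_score(y_true, y_pred, average='weighted'),
        'f1':        f1_score(y_true, y_pred, average='weighted'),
        'tpr':       tp / (tp + fn),    # pain-class sensitivity
        'fpr':       fp / (fp + tn),    # pain-class false alarm rate
        'auc':       roc_auc_score(y_true, y_score),
    }
```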

5.3. Unimodal Postoperative Pain Assessment

We evaluated the performance of using a single pain indicator at a time for postoperative pain assessment, using both traditional machine learning-based and deep learning-based approaches. Table 3 shows the performance of both traditional and deep learning approaches with a single pain indicator for assessing postoperative pain. For all indicators and in most cases, our approaches outperformed the state-of-the-art methods [49] by a large margin. As can be seen from Table 3, the crying sound indicator achieved the highest accuracy (79.63%), outperforming the accuracies of body (70.50%) and face (69.52%). Similarly, the crying sound indicator achieved the highest AUC (0.87), outperforming the AUCs of body (0.78) and face (0.82).

To understand these results, we examined the data and found that sound has less noise than face and body in our dataset of postoperative neonates. Specifically, neonates' faces in the NICU are usually partially or completely occluded by oxygen masks, tapes, or a prone sleeping position. In the case of the body, some neonates are swaddled while others show weak movements due to sedation or exhaustion. In summary, we can conclude from Table 4 that crying sound can better assess postoperative pain as compared to facial expression and body movement. In addition, we can conclude that our proposed approaches for analyzing facial expression, sound, and body show better performance, in terms of accuracy, precision, recall, TPR, FPR, and AUC, as compared to the traditional approaches.

Table 4.

Unimodal and Multimodal neonatal assessment of postoperative pain (all pain indicators are present).

Metric Face Body Sound Face + Body Body + Sound Sound + Face Face + Body + Sound
Accuracy 0.7076 0.6667 0.7661 0.7076 0.7719 0.6901 0.7895
Precision 0.7119 0.6645 0.7682 0.8071 0.8274 0.7032 0.7913
Recall 0.7076 0.6667 0.7661 0.7076 0.7719 0.6901 0.7895
F-1 Score 0.6970 0.6650 0.7667 0.6630 0.7522 0.6703 0.7863
TPR 0.8557 0.7320 0.7732 1.0000 0.9897 0.8866 0.8761
FPR 0.4865 0.4189 0.2432 0.6757 0.5135 0.5676 0.3243
AUC 0.8082 0.7778 0.8239 0.8353 0.8763 0.8396 0.8791
* Precision, Recall, and F-1 score are weighted by both classes.
* TPR, FPR, and AUC are calculated for the pain class.
* Bold values indicate superiority.

In addition, it can be observed that temporal information integration markedly improves the performance. Existing works [49, 50] considered features only frame by frame, whereas we integrate temporal information across frames, which leads to better performance for all approaches. In the case of body, the inclusion of the LSTM network yields AUCs of 0.78 and 0.73, up from 0.50. Similarly, in the case of sound, the spectrogram image shows better performance than the MFCC features due to better temporal information integration.

5.4. Multimodal Postoperative Pain Assessment

The unimodal approach uses a single indicator at a time to predict the pain class. In practice, there are cases where the face and body are not visible. For example, the baby's face can be covered with tape and the body can be swaddled. In such cases, the multimodal assessment provides a reliable solution [42]. To investigate the impact of the multimodal approach on postoperative pain assessment, we combined the scores or labels of the different pain indicators, which are generated using the best approach for each indicator (best approaches are bolded in the second column of Table 3). Table 3 shows the results of fusing (at the decision level) the labels of face, body, and sound. Recall that the numbers of video instances for face, body, and sound are 187, 218, and 216, respectively; this means that some indicators are missing when all of them are combined to generate the multimodal assessment. As shown in Table 3, the multimodal approach achieved better overall performance than the unimodal approaches. The high performance of sound can be attributed to the fact that this indicator has less noise and a larger number of instances compared to other indicators (e.g., facial expression). Although crying sound has performance similar to the multimodal approach, we believe that the multimodal approach is necessary because pain manifests itself in different signals. In addition, the multimodal approach allows pain to be assessed when sound signals are missing due to noise, sedation, or individual differences (e.g., some neonates do not cry but move their arms/legs during pain). Fig. 8 visualizes the ROC curves corresponding to Table 3. It can be observed that the multimodal approach achieves a better curve compared to the individual modalities.

Figure 8: ROC curves of different approaches.

To make a more reliable and fair comparison, we further extended our experiments by ensuring that there are no missing indicators; i.e., we selected 171 samples from our dataset in which all pain indicators are present. Table 4 presents the performance of the multimodal approach when all indicators are present. The table also presents the performance of the unimodal approach (a single indicator at a time) and of different combinations of pain indicators using these 171 samples. It can be observed that, in most cases, the multimodal approach achieved the best performance. In the final experiment, we randomly dropped 25% of the samples from each indicator to assess the robustness of our multimodal approach. We performed this random dropping ten times and report the average performance in Table 5. From Table 5, we can conclude that the multimodal results are consistent over all indicators and better than the unimodal results. These results are consistent with previous clinical findings [15, 16] and suggest that the automated multimodal approach for assessing postoperative pain is more efficient, in terms of performance and robustness, than the unimodal approach.

Table 5.

Unimodal and Multimodal assessment of neonatal postoperative pain (randomly dropping 25% samples from each indicator 10 times).

Metric Face Body Sound
Unimodal Multimodal Unimodal Multimodal Unimodal Multimodal
Accuracy 0.7124 ± 0.03 0.7913± 0.01 0.6610± 0.02 0.7649± 0.01 0.7742 ± 0.01 0.7784 ± 0.01
Precision 0.7218 ± 0.03 0.7988± 0.01 0.6596± 0.02 0.7692± 0.01 0.7764 ± 0.01 0.7908 ± 0.01
Recall 0.7124 ± 0.03 0.7913± 0.01 0.6610± 0.02 0.7650± 0.01 0.7742 ± 0.01 0.7784 ± 0.01
F-1 Score 0.7035 ± 0.03 0.7859± 0.01 0.6591± 0.02 0.7593± 0.01 0.7746 ± 0.01 0.7705 ± 0.01
TPR 0.8563 ± 0.03 0.9052± 0.02 0.7282± 0.03 0.8784± 0.00 0.7819 ± 0.03 0.9155 ± 0.02
FPR 0.4612 ± 0.04 0.3581± 0.03 0.4250± 0.03 0.3838± 0.02 0.2358 ± 0.03 0.4014 ± 0.03
AUC 0.8093 ± 0.02 0.8724± 0.01 0.7739± 0.02 0.8675± 0.01 0.8288 ± 0.02 0.8682 ± 0.01
* Precision, Recall, and F-1 score are weighted by both classes.
* TPR, FPR, and AUC are calculated for the pain class.
* Bold values indicate superiority.

6. Conclusion and Future Work

In this paper, a temporal multimodal AI-based system is proposed for assessing postoperative pain in neonates. The proposed system uses video (face, body) and audio (crying sound) signals individually to generate pain scores. These scores are then combined using decision fusion to predict the final pain assessment. We compared the proposed multimodal approach with traditional machine learning approaches and found that our approach achieved superior performance. We also found that the multimodal approach is better than the unimodal approach for assessing postoperative pain in neonates. The experimental results suggest that the multimodal approach is more reliable for assessing postoperative pain in a real-world clinical environment. We believe that the proposed approach can significantly enhance the current assessment practice, which is discontinuous, inconsistent, highly dependent on the nurses' experience and subjectivity, and often limited by the lack of medical resources. In the future, we plan to integrate other signals, such as vital signs, into our multimodal system. We also plan to investigate other fusion methods, such as feature-level fusion.

Highlights.

  • Analysis of spatio-temporal approach for neonatal postoperative pain classification

  • Performance comparison of unimodal and multimodal machine learning approach

  • Design of deep learning-based approach using Convolutional Neural Network and Recurrent Neural Network

  • Classification performance is boosted by temporal information integration

  • The multimodal approach is found to be reliable and shows better performance compared to the unimodal approaches

Acknowledgment

We are grateful to the entire neonatal staff at Tampa General Hospital for their help and cooperation in the data collection. This research is partially supported by a University of South Florida Nexus Initiative (UNI) Grant and a National Institutes of Health (NIH), United States, Grant (NIH R21NR018756).

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Competing interests

The authors declare no competing interests.

References

  • [1].Altman NS, 1992. An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician 46, 175–185. doi: 10.1080/00031305.1992.10475879. [DOI] [Google Scholar]
  • [2].Brasher C, Gafsous B, Dugue S, Thiollier A, Kinderf J, Nivoche Y, Grace R, Dahmani S, 2014. Postoperative pain management in children and infants: an update. Pediatric Drugs 16, 129–140. doi: 10.1007/s40272-013-0062-0. [DOI] [PubMed] [Google Scholar]
  • [3].Breiman L, 2001. Random forests. Machine learning 45, 5–32. doi: 10.1023/A:1010933404324. [DOI] [Google Scholar]
  • [4].Cao Q, Shen L, Xie W, Parkhi OM, Zisserman A, 2018. Vggface2: A dataset for recognising faces across pose and age, in: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), IEEE; pp. 67–74. doi: 10.1109/FG.2018.00020. [DOI] [Google Scholar]
  • [5].Celona L, Manoni L, 2017. Neonatal facial pain assessment combining hand-crafted and deep features, in: International Conference on Image Analysis and Processing, Springer; pp. 197–204. doi: 10.1007/978-3-319-70742-6_19. [DOI] [Google Scholar]
  • [6].Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L, 2009. Imagenet: A large-scale hierarchical image database, in: 2009 IEEE conference on computer vision and pattern recognition, Ieee; pp. 248–255. doi: 10.1109/CVPR.2009.5206848. [DOI] [Google Scholar]
  • [7].Dmytriiev D, 2019. Assessment and treatment of postoperative pain in children. Anaesthesia, Pain & Intensive Care, 392–400. [Google Scholar]
  • [8].Dowell D, Arias E, Kochanek K, Anderson R, Guy GP, Losby JL, Baldwin G, 2017. Contribution of opioid-involved poisoning to the change in life expectancy in the united states, 2000–2015. Jama 318, 1065–1067. doi: 10.1001/jama.2017.9308. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [9].Fortier MA, Chung WW, Martinez A, Gago-Masague S, Sender L, 2016. Pain buddy: A novel use of m-health in the management of children’s cancer pain. Computers in biology and medicine 76, 202–214. doi: 10.1016/j.compbiomed.2016.07.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [10].Gan TJ, 2017. Poorly controlled postoperative pain: prevalence, consequences, and prevention. Journal of pain research 10, 2287. doi: 10.2147/JPR.S144066. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [11].Grosse SD, Waitzman NJ, Yang N, Abe K, Barfield WD, 2017. Employer-sponsored plan expenditures for infants born preterm. Pediatrics 140, e20171078. doi: 10.1542/peds.2017-1078. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].Grunau RE, Holsti L, Peters JW, 2006. Long-term consequences of pain in human neonates, in: Seminars in Fetal and Neonatal Medicine, Elsevier; pp. 268–275. doi: 10.1016/j.siny.2006.02.007. [DOI] [PubMed] [Google Scholar]
  • [13].Haque MA, Bautista RB, Noroozi F, Kulkarni K, Laursen CB, Irani R, Bellantonio M, Escalera S, Anbarjafari G, Nasrollahi K, et al. , 2018. Deep multimodal pain recognition: a database and comparison of spatio-temporal visual modalities, in: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), IEEE; pp. 250–257. doi: 10.1109/FG.2018.00044. [DOI] [Google Scholar]
  • [14].Hochreiter S, Schmidhuber J, 1997. Long short-term memory. Neural computation 9, 1735–1780. doi: 10.1162/neco.1997.9.8.1735. [DOI] [PubMed] [Google Scholar]
  • [15].Hudson-Barr D, Capper-Michel B, Lambert S, Palermo TM, Morbeto K, Lombardo S, 2002. Validation of the pain assessment in neonates (pain) scale with the neonatal infant pain scale (nips). Neonatal Network 21, 15–22. doi: 10.1891/0730-0832.21.6.15. [DOI] [PubMed] [Google Scholar]
  • [16].Hummel P, Puchalski M, Creech S, Weiss M, 2008. Clinical reliability and validity of the n-pass: neonatal pain, agitation and sedation scale with prolonged pain. Journal of Perinatology 28, 55. doi: 10.1038/sj.jp.7211861. [DOI] [PubMed] [Google Scholar]
  • [17].Kingma DP, Ba J, 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. [Google Scholar]
  • [18].Koul A, Becchio C, Cavallo A, 2018. Cross-validation approaches for replicability in psychology. Frontiers in Psychology 9, 1117. doi: 10.3389/fpsyg.2018.01117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [19].Kulshrestha A, Bajwa SJS, et al. , 2019. Management of acute postoperative pain in pediatric patients. Anaesthesia, Pain&Intensive Care. [Google Scholar]
  • [20].Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL, 2014. Microsoft coco: Common objects in context, in: European conference on computer vision, Springer; pp. 740–755. doi: 10.1007/978-3-319-10602-1_48. [DOI] [Google Scholar]
  • [21].Lin TY, RoyChowdhury A, Maji S, 2015. Bilinear cnn models for fine-grained visual recognition, in: Proceedings of the IEEE international conference on computer vision, pp. 1449–1457. doi: 10.1109/ICCV.2015.170. [DOI] [Google Scholar]
  • [22].McNair C, Ballantyne M, Dionne K, Stephens D, Stevens B, 2004. Postoperative pain assessment in the neonatal intensive care unit. Archives of Disease in Childhood-Fetal and Neonatal Edition 89, F537–F541. doi: 10.1136/adc.2003.032961. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [23].Melo G.M.d., Lélis A.L.P.d.A., Moura A.F.d., Cardoso MVLML, Silva V.M.d., 2014. Pain assessment scales in newborns: integrative review. Revista paulista de pediatria 32, 395–402. doi: 10.1590/S0103-05822014000400017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [24].Oppenheim AV, 1999. Discrete-time signal processing. Pearson Education India. [Google Scholar]
  • [25].Oppenheim AV, Schafer RW, 2004. From frequency to quefrency: A history of the cepstrum. IEEE signal processing Magazine 21, 95–106. doi: 10.1109/MSP.2004.1328092. [DOI] [Google Scholar]
  • [26].Parkhi OM, Vedaldi A, Zisserman A, 2015. Deep face recognition, 41.1–41.12doi: 10.5244/C.29.41. [DOI] [Google Scholar]
  • [27].Penrose LS, 1946. The elementary statistics of majority voting. Journal of the Royal Statistical Society 109, 53–57. doi: 10.2307/2981392. [DOI] [Google Scholar]
  • [28].Quinlan J, Lobo D, Levy N, 2020. Postoperative pain management: time to get back on track. Anaesthesia 75, e10–e13. doi: 10.1111/anae.14886. [DOI] [PubMed] [Google Scholar]
  • [29].Redmon J, Farhadi A, 2018. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767. [Google Scholar]
  • [30].Rish I, et al. , 2001. An empirical study of the naive bayes classifier, in: IJCAI 2001 workshop on empirical methods in artificial intelligence, pp. 41–46. [Google Scholar]
  • [31].Rodriguez P, Cucurull G, Gonzàlez J, Gonfaus JM, Nasrollahi K, Moeslund TB, Roca FX, 2017. Deep pain: Exploiting long short-term memory networks for facial expression classification. IEEE transactions on cybernetics, 1–11doi: 10.1109/TCYB.2017.2662199. [DOI] [PubMed] [Google Scholar]
  • [32].Saeb S, Lonini L, Jayaraman A, Mohr DC, Kording KP, 2017. The need to approximate the use-case in clinical machine learning. Gigascience 6, gix019. doi: 10.1093/gigascience/gix019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [33].Salekin MS, Zamzmi G, Goldgof D, Kasturi R, Ho T, Sun Y, 2019a. Multi-channel neural network for assessing neonatal pain from videos, in: 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), IEEE; pp. 1551–1556. doi: 10.1109/SMC.2019.8914537. [DOI] [Google Scholar]
  • [34].Salekin MS, Zamzmi G, Goldgof D, Kasturi R, Ho T, Sun Y, 2020. First investigation into the use of deep learning for continuous assessment of neonatal postoperative pain, in: 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG), IEEE; pp. 529–533. doi: 10.1109/FG47880.2020.00082. [DOI] [Google Scholar]
  • [35].Salekin MS, Zamzmi G, Paul R, Goldgof D, Kasturi R, Ho T, Sun Y, 2019b. Harnessing the power of deep learning methods in healthcare: Neonatal pain assessment from crying sound, in: 2019 IEEE Healthcare Innovations and Point of Care Technologies, (HIPOCT), IEEE; pp. 127–130. doi: 10.1109/HI-POCT45284.2019.8962827. [DOI] [Google Scholar]
  • [36].Seok HS, Choi BM, Noh GJ, Shin H, 2019. Postoperative pain assessment model based on pulse contour characteristics analysis. IEEE journal of biomedical and health informatics 23, 2317–2324. doi: 10.1109/JBHI.2018.2890482. [DOI] [PubMed] [Google Scholar]
  • [37].Sikka K, Ahmed AA, Diaz D, Goodwin MS, Craig KD, Bartlett MS, Huang JS, 2015. Automated assessment of children’s postoperative pain using computer vision. Pediatrics 136, e124–e131. doi: 10.1542/peds.2015-0029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [38].Simonyan K, Zisserman A, 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. [Google Scholar]
  • [39].Stevens BJ, Pillai Riddell R, Oberlander TE, Gibbins S, 2007. Assessment of pain in neonates and infants. Pain in neonates and infants 3, 67–90. [Google Scholar]
  • [40].Talbot K, Madden V, Jones S, Moseley G, 2019. The sensory and affective components of pain: are they differentially modifiable dimensions or inseparable aspects of a unitary experience? a systematic review. British journal of anaesthesia doi: 10.1016/j.bja.2019.03.033. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [41].Taylor BJ, Robbins JM, Gold JI, Logsdon TR, Bird T, Anand K, 2006. Assessing postoperative pain in neonates: a multicenter observational study. Pediatrics 118, e992–e1000. doi: 10.1542/peds.2005-3203. [DOI] [PubMed] [Google Scholar]
  • [42].Temko A, Doyle O, Murray D, Lightbody G, Boylan G, Marnane W, 2015. Multimodal predictor of neurodevelopmental outcome in newborns with hypoxic-ischaemic encephalopathy. Computers in biology and medicine 63, 169–177. doi: 10.1016/j.compbiomed.2015.05.017. [DOI] [PubMed] [Google Scholar]
  • [43].Tiippana E, Hamunen K, Heiskanen T, Nieminen T, Kalso E, Kontinen VK, 2016. New approach for treatment of prolonged postoperative pain: Aps out-patient clinic. Scandinavian Journal of Pain 12, 19–24. doi: 10.1016/j.sjpain.2016.02.008. [DOI] [PubMed] [Google Scholar]
  • [44].Walker SM, 2014. Neonatal pain. Pediatric Anesthesia 24, 39–48. doi: 10.1111/pan.12293. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [45].Walter-Nicolet E, Annequin D, Biran V, Mitanchez D, Tourniaire B, 2010. Pain management in newborns. Pediatric Drugs 12, 353–365. doi: 10.2165/11318900-000000000-00000. [DOI] [PubMed] [Google Scholar]
  • [46].Weiser TG, Gawande A, 2015. Excess surgical mortality: strategies for improving quality of care, in: Essential Surgery: Disease Control Priorities, Third Edition (Volume 1). The International Bank for Reconstruction and Development/The World Bank. doi: 10.1596/978-1-4648-0346-8_ch16. [DOI] [PubMed] [Google Scholar]
  • [47].Yang S, Luo P, Loy CC, Tang X, 2016. Wider face: A face detection benchmark, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5525–5533. doi: 10.1109/CVPR.2016.596. [DOI] [Google Scholar]
  • [48].Zamzmi G, Kasturi R, Goldgof D, Zhi R, Ashmeade T, Sun Y, 2018. A review of automated pain assessment in infants: Features, classification tasks, and databases. IEEE reviews in biomedical engineering 11, 77–96. doi: 10.1109/RBME.2017.2777907. [DOI] [PubMed] [Google Scholar]
  • [49].Zamzmi G, Pai CY, Goldgof D, Kasturi R, Ashmeade T, Sun Y, 2019a. A comprehensive and context-sensitive neonatal pain assessment using computer vision. IEEE Transactions on Affective Computing doi: 10.1109/TAFFC.2019.2926710. [DOI] [Google Scholar]
  • [50].Zamzmi G, Paul R, Salekin MS, Goldgof D, Kasturi R, Ho T, Sun Y, 2019b. Convolutional neural networks for neonatal pain assessment. IEEE Transactions on Biometrics, Behavior, and Identity Science 1, 192–200. doi: 10.1109/TBIOM.2019.2918619. [DOI] [Google Scholar]
