PNAS Nexus. 2025 Nov 5;4(11):pgaf352. doi: 10.1093/pnasnexus/pgaf352

Automated pain assessment based on facial expression of free-moving mice

Koji Kobayashi 1, Naoaki Sakamoto 2,#, Yusuke Miyazaki 3,#, Masahito Yamamoto 4, Takahisa Murata 5,6,7,c,
Editor: Yannis Yortsos
PMCID: PMC12617409  PMID: 41245089

Abstract

Pain is a basic sensation associated with tissue injury. Although facial expression is a useful indicator of pain in mammals, its assessment in rodents requires expertise and experience. Here, we aimed to establish an automated pain assessment method using the facial images of free-moving mice. A convolutional neural network (CNN) was trained with the facial images of untreated mice and those subjected to acetic acid (AC)-induced pain. The trained CNN successfully predicted pain from facial images of AC-, capsaicin-, and calcitonin gene-related peptide-induced pain models that had not been used for training. It also detected the analgesic effect of diclofenac, a nonsteroidal anti-inflammatory drug, against AC-induced pain. We used dimensionality reduction algorithms to select images with similar compositions and visualized the regions focused on by the CNN during predictions. The CNN focused on the head, forehead, ear, eye, cheek, and nose to predict pain or no pain. In conclusion, we established a method for automated pain assessment using the facial images of free-moving mice.

Keywords: pain, facial expression, machine learning, neural network


Significance Statement.

Pain assessment in rodents depends on researchers' manual observation, which is not only labor-intensive but also often produces subjective and unreproducible results. Here, we developed an automated method to assess pain from facial expressions in free-moving mice. Mouse facial images were automatically extracted from video files, and a convolutional neural network (CNN) was trained on them. The trained CNN accurately predicted pain induced by three different stimuli. In addition, we showed that the CNN focused on the head, forehead, ear, eye, cheek, and nose to predict pain or no pain. This study provides an objective, reproducible, and interpretable method for assessing pain in free-moving mice.

Introduction

We empirically identify several emotional states of animals from their behavior, such as their motion, activity, and facial expression. For example, we can estimate to some extent whether dogs and cats are sick, angry, joyful, or happy. Experimental animals, such as rodents, also experience emotions. Understanding their emotions is expected to promote the ethical treatment of these animals and provide fundamental insights into various scientific fields, including neuroscience, ethology, and physiology. However, as experimental animals are small compared with companion and industrial animals, it is difficult to recognize subtle changes in their behavior and facial expressions with the naked eye to assess their emotions.

Pain is among the basic sensations associated with tissue injury (1). As pain significantly impairs the quality of life of animals and humans, many researchers have intensely investigated the mechanism of pain induction and strategies to relieve it using experimental animal models. Animals experiencing pain exhibit typical expressions, such as decreased locomotor activity, facial grimace, and rough hair coat. However, a precise evaluation of pain in experimental animals has been challenging owing to the minute nature of the pain-induced changes in these animals. Various methods have been developed to assess experimental pain in animals (2, 3). Classically, the writhing test, which counts the characteristic stretching behavior induced by the intraperitoneal injection of chemicals, is widely used in pain studies. The tail-flick test and von Frey test assess escape behavior from heat or mechanical stimulation. In 2010, Langford et al. (4) developed the mouse grimace scale (MGS) to quantify facial expressions of pain, focusing on eye tightening, nose/cheek bulging, ear position, and whisker position. This method has been extended to other animals, including rats and piglets (5, 6). However, the methods described above share a common drawback, i.e. they rely entirely on human observation, which requires expertise and experience for assessment, and the results often vary depending on the observer, resulting in a lack of reproducibility and objectivity. In addition, this approach is time-consuming, has a low throughput, and induces physical and mental fatigue in the observer.

Several studies have established automated and quantitative pain assessment methods using machine learning to address this problem. Abdus-Saboor et al. (7) analyzed mouse behavior induced by mechanical stimuli in the paws and calculated pain scores using a support vector machine algorithm. Recently, Dolensek et al. (8) demonstrated that feature extraction using a histogram of oriented gradients (HOG) can distinguish six emotion states, including pain, from mouse facial images. Although these methods seem sufficiently accurate and rapid, they require specific equipment. Moreover, the movement of the mice is restricted, which reduces experimental freedom and may cause stress in mice. In addition, several automated grimace scale scoring approaches have been reported (9, 10). In these methods, researchers manually scored the MGS for tens of thousands of images and trained classifiers to predict these scores. While accurate, such approaches still rely on labor-intensive MGS labeling, which limits their broader practical utility. Therefore, there remains a need to establish simple, efficient, and minimally invasive methods for automated pain detection in rodents.

Since Krizhevsky et al. (11) demonstrated that convolutional neural networks (CNNs) exhibit outstanding performance in image classification tasks, neural network technology has made remarkable progress. For animal behavior assessment, several CNN-based algorithms have been developed for animal pose estimation, including DeepLabCut and LEAP (12, 13). In recent years, we and others have shown that neural networks can also detect serial behaviors, such as scratching and grooming, recorded in videos (14–16). These results highlight the potential of neural networks in investigating animal behavior. However, only a few studies have used neural networks to assess animal emotions.

In this study, we established an automated and objective method for pain detection based on facial images, without the need for manual MGS scoring. Our results reveal that the CNN-based algorithm can successfully predict acetic acid (AC)-, capsaicin-, and calcitonin gene-related peptide (CGRP)-induced pain based on the facial images of free-moving mice.

Results

Establishment of pain classifier from facial expression

We collected the facial images of BALB/c mice (nine male, three female) before and after AC administration (“pre” and “AC” images, respectively) to establish an image classifier. All “pre” images were labeled as “no pain,” whereas all “AC” images were labeled as “pain.” We understand that these criteria may include errors, as there is no assurance that AC-treated mice invariably exhibit facial pain and vice versa (see also Supplementary Document S1, Fig. S3A and B). However, these “weak” labeling criteria (17) allowed us to take advantage of large-size datasets with minimal effort. A CNN was trained with these “pre” and “AC” training datasets (details are described in Supplementary Document S1). The trained CNN-based algorithm effectively classified images in the training dataset with a sensitivity of 97.0% and a specificity of 99.4% (Table 1). We note that the trained CNN demonstrated comparable sensitivity and specificity between female and male mice (sensitivity: female 98.0%, male 96.6%; specificity: female 99.5%, male 99.4%).

Table 1.

Confusion matrix of convolutional neural network prediction for training datasets.

                            Prediction
                      Pain         No pain
Label    Pain         269,497      8,437
         No pain      1,526        266,354

Prediction of AC-induced pain

We then investigated whether the trained CNN could predict first-look data. After 10 min of habituation on the recording stage, the behavior of male BALB/c mice (N = 5 for each treatment, 8–9 weeks old) was recorded for 20 min. Subsequently, 1% AC, 0.6% AC, or vehicle (saline, 100 µL) was intraperitoneally administered to the mice, and their behavior was recorded immediately after these treatments. The preprocessing of these videos was the same as that of the training dataset videos. The probability of pain for each image was predicted using the trained CNN. The results were pooled, cumulated every minute, and shown as pain probability, as described in the Materials and methods section and Fig. S4A and B. Before vehicle or AC administration, each group showed a low pain probability, and there was no significant difference among them (Fig. 1A). The vehicle-induced transient pain, probably caused by needle puncture and fluid injection into the abdominal cavity, disappeared within 5 min. In contrast, AC treatment increased pain probability, which started at 5 min post-treatment and lasted for 30 min (Fig. 1B). The 30-min average pain probability tended to increase with the administration of 0.6% AC and significantly increased with the administration of 1% AC (Fig. 1C). These results clearly indicate that the trained CNN has sufficient generalization ability to handle first-look data. We further assessed whether the trained CNN could detect the analgesic effect of a known pain-relieving drug. Female BALB/c mice (N = 8–9, 9–16 weeks old) received an intraperitoneal injection of either vehicle (saline, 100 µL) or diclofenac (10 mg/kg), a nonsteroidal anti-inflammatory drug. After 45 min, the mice were injected intraperitoneally with AC (0.6%, 100 µL), and their behavior was recorded for 30 min. As shown in Fig. 1D and E, pretreatment with diclofenac significantly reduced AC-induced pain compared with vehicle treatment. These results not only demonstrate the robustness of our system but also confirm that the CNN-based model can reliably detect "pain" from facial images.

Fig. 1.

Assessment of AC-, capsaicin-, and CGRP-induced pain with the trained CNN. A and B) Pain probability in each minute before (A) and after (B) intraperitoneal AC treatment. C) Thirty-minute average pain probability after treatment. D) Pain probability following the application of AC (0.6%, 100 μL) after a 45-min pretreatment with either vehicle (saline, 100 μL) or diclofenac (10 mg/kg). E) Thirty-minute average pain probability after AC treatment. F) Pain probability after intradermal vehicle or capsaicin treatment (10 μg). G) Thirty-minute average pain probability after treatment. H) Pain probability after intraperitoneal vehicle or CGRP treatment (0.1 mg/kg). I) Thirty-minute average pain probability after treatment. Veh, vehicle; Dic, diclofenac. *P < 0.05, **P < 0.01, ***P < 0.001. Data are shown as mean ± SEM or individually.

Prediction of capsaicin-induced pain

We then demonstrated the expandability of this method by investigating whether the trained CNN could predict other mouse pain models. We used a capsaicin-induced pain model, which is known to induce transient pain (18). After 10 min of habituation, vehicle (7% Tween 80 in saline, 10 µL) or capsaicin (10 μg) was intradermally injected into the nape of BALB/c mice (N = 6 for each treatment, male and female, 7–9 weeks old). Immediately after the treatment with vehicle or capsaicin, their behavior was recorded for 30 min. As shown in Fig. 1F, the vehicle treatment transiently induced pain, which declined within 5 min. In contrast, capsaicin treatment significantly increased the pain probability for 30 min (Fig. 1F and G), which returned to the basal level within an hour (not shown).

Prediction of CGRP-induced pain

The CGRP-induced pain was also evaluated using the trained CNN. Intraperitoneal treatment with vehicle (saline, 100 µL) only induced transient pain in BALB/c mice (N = 9, female, 9–11 weeks old) similar to the results described above. Treatment with CGRP (0.1 mg/kg) increased pain probability for 30 min (Fig. 1H and I). However, CGRP-induced pain probability was lower than that induced by 1% AC and capsaicin treatments.

The interpretation of CNN

Several methods, including gradient-weighted class activation mapping (Grad-CAM), can help visualize the regions that critically affect the output prediction of a CNN (19). Here, we attempted to identify the regions on which the trained CNN focused to predict "no pain" or "pain"; in other words, we aimed to understand where the CNN looked to make a prediction. Grad-CAM was applied to all "pre" and "AC" training datasets. An example of the Grad-CAM analysis is shown in Fig. S5A, which indicates that the CNN focused on the head, ear, eye, and mouth for prediction. However, there were many images with different angles, and it was challenging to compare these images directly. Therefore, mutually similar composition pairs were sought from these images.

Images in the training dataset were projected onto a 2D space using dimensionality reduction algorithms (Supplementary Document S2 and Fig. S5B–E). Each image was plotted as a single point within this space, and images with similar compositions were located close together. Typical nearest image pairs, which appear to have a similar composition, are shown in Fig. 2A. A total of 500 mutually similar composition pairs were randomly selected, and 13 specific regions (head, forehead, above-ear, ear top, ear base, eye, cheek, nose, mouth, hand, stage, underbody, and others) were examined to determine whether the CNN focused on them (Fig. 2B). The CNN primarily focused on the ear top, eye, cheek, nose, and mouth regions when predicting "no pain," whereas it emphasized the head, forehead, above-ear, and eye regions when predicting "pain" (Fig. 2C). These results were consistent when we compared 1,000 no pain and 1,000 pain images randomly chosen from all training datasets without considering similarities (Fig. S5F).
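As an illustration of this pairing procedure, the sketch below projects per-image features into a 2D space and, for each randomly chosen "pain" image, selects the nearest "no pain" image in the embedded space. The use of scikit-learn's t-SNE and nearest-neighbor search is an assumption made for illustration only; the actual embedding algorithms and feature extraction are described in Supplementary Document S2.

```python
# Sketch: 2D embedding of image features and nearest-neighbor pairing of
# "pain" images with "no pain" images of similar composition.
# t-SNE / NearestNeighbors are illustrative choices (see Supplementary Document S2).
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

def find_similar_pairs(feats_no_pain, feats_pain, n_pairs=500, seed=0):
    """feats_*: (N, D) arrays of per-image features (hypothetical inputs)."""
    emb = TSNE(n_components=2, random_state=seed).fit_transform(
        np.vstack([feats_no_pain, feats_pain]))
    emb_no_pain, emb_pain = emb[:len(feats_no_pain)], emb[len(feats_no_pain):]

    # For each randomly chosen "pain" image, take the closest "no pain" image
    # in the 2D space as its similar-composition partner.
    nn = NearestNeighbors(n_neighbors=1).fit(emb_no_pain)
    rng = np.random.default_rng(seed)
    pain_idx = rng.choice(len(emb_pain), size=n_pairs, replace=False)
    _, no_pain_idx = nn.kneighbors(emb_pain[pain_idx])
    return list(zip(no_pain_idx[:, 0].tolist(), pain_idx.tolist()))
```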

Fig. 2.

Visualization of Grad-CAM. A) Typical similar composition pair with Grad-CAM visualization. Original, Grad-CAM, and overlay images are shown. Each row indicates a mutually similar composition pair. B) Example of regions. C) Comparison of Grad-CAM positive regions between no pain and pain images. Five hundred random similar composition pairs were selected and analyzed. *P < 0.05.

Discussion

Evaluating the emotional states of experimental animals is inherently challenging due to the limited capacity of human observers to recognize subtle behavioral and facial cues. In this study, we established an automated method for pain assessment in mice based on facial expressions. Our approach offers advantages over existing methods: it eliminates the need for labor-intensive manual grimace scoring during classifier training, and it does not require immobilization of the mouse head on a platform. As a result, our system reduces stress for the animals while allowing for extended and more naturalistic experimental protocols.

Tissue injury following inflammation leads to the production of various damage-associated mediators, including bradykinin and ATP, which stimulate primary sensory nerves with Aδ- and C-fibers. The information detected by these stimulated sensory nerves is transmitted through secondary spinal neurons to the thalamus, which then relays these signals to various brain areas, such as the primary sensory cortex, anterior cingulate cortex, and prefrontal cortex, leading to pain perception and emotional reactions (20). Here, we used three mouse pain models, i.e. AC-, capsaicin-, and CGRP-induced pain models. AC and capsaicin activate TRPV1 channels in primary sensory neurons, whereas CGRP stimulates CGRP receptors in secondary neurons. Although the CNN was trained only on faces with AC-induced pain, it could assess the pain induced in all three pain models. Our results reveal that the "pain face" is uniform, at least from the CNN's perspective, even though it was induced by different stimuli. Moreover, our results support the hypothesis that various algogenic stimuli converge in the brain and induce a stereotyped and similar grimace facial expression in mice. Pain stimuli can cause emotions such as malaise, discomfort, and anxiety. Although the trained CNN successfully predicted pain in other pain models and detected the analgesic effect of diclofenac, it remains unclear whether the CNN truly distinguished pain. Thus, it is still possible that the trained CNN learned facial malaise as pain. Assessing facial expressions in other emotion models will provide further insights.

Pain changes facial expression in animals. Langford et al. (4) summarized five observable and quantifiable expressions, i.e. orbital tightening, nose bulge, cheek bulge, ear position, and whisker change, and established the MGS as a scoring method based on them. Although the grimace scale provides a useful method for assessing pain, this assessment requires the expertise and experience of an observer. Several studies have aimed at automating grimace scale scoring. Tuttle et al. (9) established a CNN that directly predicts grimace scores based on facial images. Moreover, several studies have proposed two-step methods involving (i) the manual or machine-learning-based identification of the eye, ear, and other facial parts and (ii) the calculation of grimace scale-related indices from the identified points (e.g. eye area for squint and ear angle for ear retraction) (21–23). More recently, McCoy et al. proposed an automated MGS scoring method that combined regional detection of the eyes, nose, ears, and whiskers with subsequent prediction of scores for each region (10). In addition, several studies have reported classifiers without predetermined rules that do not explicitly intend to mimic grimace scale scoring. Dolensek et al. demonstrated that feature extraction from facial images using HOG filters, followed by unsupervised clustering, classified six emotions, including tail-shock-induced pain (8). They also showed that the ear, nose, and cheek regions changed relative to neutral faces. Tanaka et al. successfully classified facial images into neutral, tail-pinch-induced pain, and brushing images using a CNN-based algorithm. Their trained CNN focused on the ears, eyes, cheeks, and mouth (24).

In this study, we analyzed the facial images of free-moving mice using Grad-CAM to visualize the critical region identified by the CNN-based algorithm to predict pain. A comparison of similar composition pairs revealed that the trained CNN focused on the ear, eye, mouth, and cheek to distinguish between pain and no pain, as in previous studies. In addition, the trained CNN examined the head and forehead to predict pain, which were newly identified as regions expressing pain. Previous studies used the facial images of head-constrained mice, which may have prevented examining these areas for prediction. In this study, we did not elucidate the mechanism underlying the changes in these regions because the interpretability of neural networks is still low. Although we considered that the quality of mouse fur and the head–forehead angle may have changed, quantifying these parameters is challenging and remains to be elucidated.

The annotation of a dataset is a laborious and challenging step in the training of machine learning models, preventing the use of large-scale datasets. In this study, the CNN was trained with "pre" and "AC" images, which were uniformly labeled as "no pain" and "pain," respectively. A few labels may be incorrect because AC-treated mice may not invariably exhibit facial pain, and vice versa (Supplementary Document S1, Fig. S3A and B). However, training the CNN with these weakly labeled datasets (17) converged successfully (Fig. S3D), and the trained CNN could predict a dose-dependent increase in AC-induced pain in the first-look dataset (Fig. 1A–C). These results indicate that we do not need to obtain strictly accurate labels and that a large dataset (∼545 K images in this study) can overcome the contamination caused by some incorrect labels. This implication is critical because it enables the expansion of this approach to other emotions, including depression, fear, and pleasure. For example, to establish a facial expression classifier for predicting depression, we can label the facial images of mice with depression (i.e. depression model mice) as the "depression face" without laborious annotation. Although further studies are required, our results will help establish machine-learning-based models for predicting different emotional states in free-moving mice.

Our method has one important limitation. In this study, we trained the CNN with facial images of white-coated BALB/c mice and confirmed that it was not applicable to pain prediction in black-coated C57BL/6 mice. Indeed, it could not detect facial pain induced by intraperitoneal AC (1%) administration (Fig. S6A). Grad-CAM analysis revealed that the CNN did not focus on the mouse region in some images (Fig. S6B). These findings suggest that separate models may be required for different strains or coat colors.

In conclusion, we established an automated pain assessment method for unconstrained free-moving mice in this study. The trained model accurately identified pain in the first-look data based on a previously unreported expression of pain in the head and forehead.

Materials and methods

Animals

BALB/c mice (male and female, 7–16 weeks old) were purchased from Japan SLC Inc. (Shizuoka, Japan). C57BL/6 mice (female, 12 weeks old) were purchased from Shiraishi Animals Co., Ltd (Saitama, Japan). All experiments were approved by the Institutional Animal Care and Use Committee at the University of Tokyo (P20-004, P21-106, P22-002, and P24-112). Animal care and treatments were performed in accordance with the guidelines outlined within the Guide to Animal Use and Care of the University of Tokyo.

Spontaneous locomotor activity measurement

Male BALB/c mice (7–9 weeks old) were habituated in a black square arena (50 × 50 × 50 cm) for 60 min. Vehicle (saline, 100 µL) or 1% AC was intraperitoneally injected into these mice (N = 6 for each treatment), and their behavior was recorded by a camera (HDR-CX720V, Sony Corporation, Tokyo, Japan) set at a height of 110 cm above the arena. Detailed recording conditions were as follows: frame rate, 30 Hz; resolution, 1,920 × 1,080 pixels. We calculated mouse movement in each 1-s interval as previously described and expressed it as moving distance (25).

Facial image recording apparatus

Mice were habituated on a black stage (8 × 8 × 16 cm) for 10 min, and their behavior was then recorded by two cameras (HXR-NX80, Sony Corporation, Tokyo, Japan) arranged at 90° and located 30 cm from the stage. Detailed recording conditions were as follows: frame rate, 30 Hz; resolution, 3,840 × 2,160 pixels.

Facial image recording for CNN training

After habituation for 10 min on the recording stage as described above, the behavior of mice (BALB/c, nine male and three female, 7–15 weeks old) was recorded for 10 min (defined as the "pre" training dataset). 0.6% AC (200 µL) was then intraperitoneally administered, and after the mice exhibited the first writhing (usually ∼5 min after injection), their behavior was recorded again for 10 min (defined as the "AC" training dataset, Fig. S2A). The recorded videos were then split into frame images.

Prediction of eyes and nose with DeepLabCut

To predict the positions of the nose, right eye, and left eye, we trained DeepLabCut with the movies of four mice selected from the training datasets. We first decreased the resolution of all movies from 3,840 × 2,160 to 1,280 × 720 pixels for fast DeepLabCut processing. Fifty images from each movie were selected, and the positions of the nose, right eye, and left eye were manually annotated. A ResNet-50-based network was trained for 50,000 iterations with default settings. After training, we analyzed all movies and obtained the coordinates and likelihoods of the three points. The coordinates were rescaled to fit the original 3,840 × 2,160-pixel videos.
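A minimal sketch of this DeepLabCut workflow is shown below. The project name, experimenter, and video paths are placeholders; the three body parts (nose, right eye, left eye) are configured in the generated config.yaml rather than passed to these calls.

```python
# Sketch of the DeepLabCut workflow described above (nose, right eye, left eye;
# ResNet-50; 50,000 iterations). Paths and project names are placeholders.
import deeplabcut

videos = ["mouse01_720p.mp4", "mouse02_720p.mp4",
          "mouse03_720p.mp4", "mouse04_720p.mp4"]   # downscaled training movies
config = deeplabcut.create_new_project("mouse_face", "lab", videos, copy_videos=True)

deeplabcut.extract_frames(config, mode="automatic", userfeedback=False)  # ~50 frames/video
deeplabcut.label_frames(config)             # manual annotation of nose and both eyes (GUI)
deeplabcut.create_training_dataset(config)  # default ResNet-50 backbone
deeplabcut.train_network(config, maxiters=50000)
deeplabcut.analyze_videos(config, videos, save_as_csv=True)  # coordinates + likelihoods
```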

Facial image cropping

Movies were split into frame images. After DeepLabCut prediction, each image has the coordinates (C) and likelihoods (L) of the nose, right eye, and left eye (hereinafter called Cnose, Lnose, Ceye_r, Leye_r, Ceye_l, and Leye_l). According to the following serial criteria, an 800 × 800 square facial region was cropped from the original frame images. (i) If Lnose ≥ 0.9, Leye_r ≥ 0.9, and Leye_l ≥ 0.9: cropped centering around the midpoint of Ceye_r and Ceye_l. (ii) If Lnose ≥ 0.9, Leye_r ≥ 0.9, and Leye_l < 0.9: cropped centering around Ceye_r. (iii) If Lnose ≥ 0.9, Leye_r < 0.9, and Leye_l ≥ 0.9: cropped centering around Ceye_l. (iv) Otherwise: not cropped because the mouse was not facing the camera (Fig. S2B).
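The serial criteria above can be summarized in a short function. This is a sketch, not the authors' code; the boundary handling (clamping the square to the image edge) is an assumption not specified in the text.

```python
# Sketch of the serial cropping criteria above (800 x 800 square centered on
# the eye midpoint or a single well-detected eye). Variable names are illustrative.
import numpy as np

def crop_face(frame, c_nose, l_nose, c_eye_r, l_eye_r, c_eye_l, l_eye_l, size=800):
    """frame: HxWx3 array; c_*: (x, y) coordinates; l_*: DeepLabCut likelihoods."""
    if l_nose >= 0.9 and l_eye_r >= 0.9 and l_eye_l >= 0.9:
        cx, cy = (np.array(c_eye_r) + np.array(c_eye_l)) / 2  # midpoint of both eyes
    elif l_nose >= 0.9 and l_eye_r >= 0.9:
        cx, cy = c_eye_r
    elif l_nose >= 0.9 and l_eye_l >= 0.9:
        cx, cy = c_eye_l
    else:
        return None  # mouse not facing the camera; frame is skipped

    half = size // 2
    x0, y0 = int(cx) - half, int(cy) - half
    # Clamp the square inside the image (assumed handling, not stated in the text).
    h, w = frame.shape[:2]
    x0 = min(max(x0, 0), w - size)
    y0 = min(max(y0, 0), h - size)
    return frame[y0:y0 + size, x0:x0 + size]
```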

CNN architecture

We constructed a CNN with six convolutional (CNN) blocks and three fully connected (FC) blocks (Fig. S3C). A CNN block was composed of a convolution layer (3 × 3 kernel size, 1 × 1 stride), rectified linear unit (ReLU) activation, batch normalization, and a MaxPooling layer (2 × 2 pooling size, 2 × 2 stride). An FC block was composed of a linear layer and a ReLU activation layer, except for the last block, whose activation was SoftMax. A global average pooling layer was located between the CNN and FC blocks. The detailed parameters and output tensor shape of each layer are shown in Table S1.
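A sketch of this architecture in PyTorch is shown below. The channel widths and hidden layer sizes are placeholders; the actual values are given in Table S1.

```python
# Sketch of the architecture described above: six convolutional blocks
# (3x3 conv, ReLU, batch norm, 2x2 max pooling), global average pooling, and
# three fully connected blocks ending in SoftMax. Widths are placeholders.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.ReLU(inplace=True),
        nn.BatchNorm2d(out_ch),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )

class PainCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        chs = [3, 32, 64, 128, 128, 256, 256]            # placeholder channel widths
        self.features = nn.Sequential(
            *[conv_block(chs[i], chs[i + 1]) for i in range(6)]
        )
        self.gap = nn.AdaptiveAvgPool2d(1)               # global average pooling
        self.classifier = nn.Sequential(
            nn.Linear(chs[-1], 128), nn.ReLU(inplace=True),
            nn.Linear(128, 64), nn.ReLU(inplace=True),
            nn.Linear(64, n_classes), nn.Softmax(dim=1),  # "no pain" / "pain"
        )

    def forward(self, x):
        x = self.gap(self.features(x)).flatten(1)
        return self.classifier(x)
```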

CNN training

We labeled the images as follows: all "pre" images were "no pain" and all "AC" images were "pain." The images and labels were combined as a training dataset, and the CNN was trained with the "pre" and "AC" training datasets. For each epoch, we selected 30,000 images at random, and these images were randomly rotated between −20° and 20° and flipped horizontally for data augmentation. The minibatch size was 128. An Adam optimizer with a learning rate of 10⁻³ and a cross-entropy loss function were used.
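A minimal PyTorch sketch of this training procedure is shown below. The dataset wiring (an ImageFolder of cropped faces) is illustrative, and PainCNN refers to the architecture sketch above; because that sketch ends in a SoftMax layer, the cross-entropy loss is written as a negative log-likelihood over the output probabilities.

```python
# Sketch of the training loop: 30,000 randomly sampled images per epoch,
# random rotation (±20°) and horizontal flip, minibatch size 128, Adam (lr 1e-3),
# cross-entropy loss. Dataset layout is an assumption for illustration.
import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=20),        # rotate between -20° and 20°
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])
# e.g. a folder with "no_pain/" and "pain/" subdirectories of cropped faces
dataset = datasets.ImageFolder("cropped_faces", transform=augment)

def train(model, dataset, epochs, device="cuda"):
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        idx = torch.randperm(len(dataset))[:30_000]          # 30,000 images/epoch
        loader = DataLoader(Subset(dataset, idx.tolist()), batch_size=128, shuffle=True)
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            probs = model(images)                            # SoftMax probabilities
            # cross-entropy written as NLL because the model already applies SoftMax
            loss = torch.nn.functional.nll_loss(torch.log(probs + 1e-8), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```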

MGS scoring

We randomly selected 1,000 images from the training dataset containing ∼545 K images and scored them using the MGS as previously described (4). In brief, eye tightening, nose bulging, cheek bulging, ear position, and whisker position were each scored as 0 (normal), 1 (moderate), or 2 (severe). The scores for each feature were summed to obtain the total MGS. All scoring was performed in a blinded manner. Blurry images were excluded from the analysis.

AC, capsaicin, and CGRP-induced pain model for trained CNN validation

For the AC-induced pain model, vehicle (saline, 100 μL), 0.6% AC, or 1% AC was intraperitoneally injected into BALB/c mice (15 male and 17 female, 8–16 weeks old), and their facial expressions were recorded immediately after the administration. In some experiments, vehicle (saline, 100 µL) or diclofenac (10 mg/kg), a nonsteroidal anti-inflammatory drug, was intraperitoneally injected 45 min prior to the AC (0.6%) injection. For the capsaicin-induced pain model, vehicle (7% Tween-80 in saline, 10 μL) or capsaicin (10 μg, FUJIFILM Wako Pure Chemical Corporation) was intradermally injected into the nape of BALB/c mice (N = 6 for each treatment, male and female, 7–9 weeks old), and their facial expressions were recorded. For the CGRP-induced pain model, vehicle (saline, 100 μL) or rat α-CGRP (0.1 mg/kg, FUJIFILM Wako Pure Chemical Corporation) was intraperitoneally injected at random into BALB/c mice (N = 9, female, 9–11 weeks old). Immediately after injection, their facial expressions and activity were recorded. After several days, the mice received the other treatment (i.e. CGRP for vehicle-treated mice and vehicle for CGRP-treated mice), and their facial expressions were similarly recorded.

Calculation of pain probability

The CNN outputs two decimal values for one input image, representing the probabilities of "no pain" and "pain," respectively. Since two cameras were used for recording, we sometimes obtained two facial images for one timepoint when the mouse face was captured by both cameras (Fig. S4A). The results from the two cameras were integrated as follows (Fig. S4B). (i) If facial images were obtained from both cameras, the pain and no pain probabilities were each summed across the two cameras, and the class with the higher summed value was used as the final prediction. (ii) If a facial image was obtained from only one camera, the prediction from this image was used. (iii) If no facial image was obtained, we skipped evaluation at that timepoint because the mouse did not face either camera. We finally tallied the counts of "no pain," "pain," and "skipped" frames for every 1 min (1,800 frames in total) and calculated the pain probability for each minute as follows:

Pain probability = (Number of pain frames) / (Number of no pain frames + Number of pain frames).
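A sketch of this camera-integration and tallying logic is shown below. The data structures (per-frame probability tuples from each camera) are illustrative assumptions.

```python
# Sketch of the per-frame camera integration and per-minute pain probability.
# Each element of cam1/cam2 is either None (no face detected) or a
# (p_no_pain, p_pain) tuple output by the CNN.
def framewise_prediction(pred1, pred2):
    if pred1 is not None and pred2 is not None:
        # sum each class probability over the two cameras, take the larger class
        no_pain = pred1[0] + pred2[0]
        pain = pred1[1] + pred2[1]
        return "pain" if pain > no_pain else "no pain"
    pred = pred1 if pred1 is not None else pred2
    if pred is None:
        return "skipped"                      # mouse faced neither camera
    return "pain" if pred[1] > pred[0] else "no pain"

def pain_probability_per_minute(cam1, cam2, fps=30):
    frames_per_min = fps * 60                 # 1,800 frames per minute
    probs = []
    for start in range(0, len(cam1), frames_per_min):
        labels = [framewise_prediction(a, b)
                  for a, b in zip(cam1[start:start + frames_per_min],
                                  cam2[start:start + frames_per_min])]
        pain = labels.count("pain")
        no_pain = labels.count("no pain")
        probs.append(pain / (pain + no_pain) if (pain + no_pain) else float("nan"))
    return probs
```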

The classification of Grad-CAM positive regions

The pytorch-gradcam library (https://github.com/vickyliin/gradcam_plus_plus-pytorch) was used to calculate Grad-CAM for the last convolutional layer, and the result was visualized after min–max normalization. Grad-CAM-positive regions were annotated from 13 regions (head, forehead, above-ear, ear top, ear base, eye, cheek, nose, mouth, hand, stage, underbody, and others; Fig. 2B) for each image. When positive areas spanned several regions, all of these regions were judged positive.
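To make the computation explicit, the sketch below implements plain Grad-CAM on the last convolutional block with min–max normalization, rather than reproducing the pytorch-gradcam library's API; the model.features[-1] handle refers to the architecture sketch above and is an assumption.

```python
# Generic Grad-CAM on the last convolutional block, followed by min-max
# normalization; shown only to make the computation explicit.
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_class, layer):
    """image: (3, H, W) tensor; layer: e.g. model.features[-1] in the sketch above."""
    acts, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    try:
        probs = model(image.unsqueeze(0))               # (1, 2) SoftMax output
        model.zero_grad()
        probs[0, target_class].backward()               # gradient of the target class
        weights = grads["g"].mean(dim=(2, 3), keepdim=True)   # GAP over gradients
        cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[1:], mode="bilinear",
                            align_corners=False)[0, 0]
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # min-max normalization
    finally:
        h1.remove()
        h2.remove()
    return cam    # heatmap in [0, 1], same spatial size as the input image
```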

The pain evaluation of C57BL/6 mice

The behavior of C57BL/6 mice (N = 2, female, 12 weeks old) was recorded for 10 min (pre). 1% AC (100 µL) was then intraperitoneally administered, and their behavior was recorded again for 10 min. DeepLabCut was newly trained to detect the nose, left eye, and right eye of C57BL/6 mice as described above. Facial images were cropped from each video file according to the predicted coordinates of the nose, left eye, and right eye. Finally, the pain probability of these images was evaluated using the CNN trained with the BALB/c mouse dataset as described above.

Statistics

The results are represented individually or as mean ± SEM. Statistical evaluation of the data was performed by unpaired Student's t-test for comparisons between two groups. Comparisons of multiple groups were performed using one-way ANOVA followed by Dunnett's test. Comparisons of frequencies were performed using the chi-squared test followed by Bonferroni correction. P < 0.05 was considered statistically significant.
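A minimal sketch of these tests using SciPy is shown below (scipy.stats.dunnett requires SciPy ≥ 1.11); the group values are placeholders, not data from the study.

```python
# Sketch of the statistical tests with SciPy; placeholder data only.
import numpy as np
from scipy import stats

vehicle = np.array([0.05, 0.08, 0.04, 0.07, 0.06])   # placeholder group values
ac_06   = np.array([0.18, 0.15, 0.22, 0.20, 0.17])
ac_10   = np.array([0.40, 0.35, 0.45, 0.38, 0.42])

# Two groups: unpaired Student's t-test
t_stat, p_two = stats.ttest_ind(vehicle, ac_10)

# Multiple groups: one-way ANOVA followed by Dunnett's test against the control
f_stat, p_anova = stats.f_oneway(vehicle, ac_06, ac_10)
dunnett_res = stats.dunnett(ac_06, ac_10, control=vehicle)   # SciPy >= 1.11

# Frequencies: chi-squared test on a 2x2 contingency table
chi2, p_chi, dof, expected = stats.chi2_contingency([[120, 380], [240, 260]])
```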

Software and hardware

The training of neural networks was conducted on the AI-COMPLIANT ADVANCED COMPUTER SYSTEM at the Information Initiative Center, Hokkaido University (Sapporo, Japan), equipped with an NVIDIA Tesla V100 GPU. Predictions and other processes were conducted on a desktop computer with an NVIDIA GeForce GTX 1080 Ti or RTX 4090 GPU. Training and predictions were conducted using the PyTorch library in Python. The other specific libraries and parameters used have been described above or in Supplementary Documents S1 and S2.


Contributor Information

Koji Kobayashi, Laboratory of Food and Animal Systemics, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan.

Naoaki Sakamoto, Laboratory of Animal Radiology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan.

Yusuke Miyazaki, Laboratory of Animal Radiology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan.

Masahito Yamamoto, Autonomous Systems Engineering Laboratory, Graduate School of Information Science and Technology, Hokkaido University, Sapporo 060-0814, Japan.

Takahisa Murata, Laboratory of Food and Animal Systemics, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan; Laboratory of Animal Radiology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan; Laboratory of Veterinary Pharmacology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan.

Supplementary Material

Supplementary material is available at PNAS Nexus online.

Funding

This work was supported by a Grant-in-Aid for Scientific Research from Japan Society for the Promotion of Science (20H05678 and 25H00430 to T.M.) and an Adaptable and Seamless Technology Transfer Program through Target-driven R&D (A-STEP) from the Japan Science and Technology Agency (JST, JPMJTR22UF, to T.M.). This work was partly achieved through the use of the AI-COMPLIANT ADVANCED COMPUTER SYSTEM at the Information Initiative Center, Hokkaido University (Sapporo, Japan).

Author Contributions

Koji Kobayashi (Conceptualization, Data curation, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing—original draft), Naoaki Sakamoto (Data curation, Investigation, Writing—review & editing), Yusuke Miyazaki (Data curation, Investigation, Writing—review & editing), Masahito Yamamoto (Resources, Supervision, Writing—review & editing), and Takahisa Murata (Conceptualization, Resources, Software, Supervision, Funding acquisition, Validation, Methodology, Project administration, Writing—review & editing)

Data Availability

All data are included in the manuscript and in the Supplementary material.

References

1. Raja SN, et al. 2020. The revised International Association for the Study of Pain definition of pain: concepts, challenges, and compromises. Pain. 161:1976–1982.
2. Gregory NS, et al. 2013. An overview of animal models of pain: disease models and outcome measures. J Pain. 14:1255–1269.
3. Barrot M. 2012. Tests and models of nociception and pain in rodents. Neuroscience. 211:39–50.
4. Langford DJ, et al. 2010. Coding of facial expressions of pain in the laboratory mouse. Nat Methods. 7:447–449.
5. Sotocinal SG, et al. 2011. The rat grimace scale: a partially automated method for quantifying pain in the laboratory rat via facial expressions. Mol Pain. 7:55.
6. Di Giminiani P, et al. 2016. The assessment of facial expressions in piglets undergoing tail docking and castration: toward the development of the piglet grimace scale. Front Vet Sci. 3:100.
7. Abdus-Saboor I, et al. 2019. Development of a mouse pain scale using sub-second behavioral mapping and statistical modeling. Cell Rep. 28:1623–1634.e4.
8. Dolensek N, Gehrlach DA, Klein AS, Gogolla N. 2020. Facial expressions of emotion states and their neuronal correlates in mice. Science. 368:89–94.
9. Tuttle AH, et al. 2018. A deep neural network to assess spontaneous pain from mouse facial expressions. Mol Pain. 14. doi: 10.1177/1744806918763658.
10. McCoy ES, et al. 2024. Development of PainFace software to simplify, standardize, and scale up mouse grimace analyses. Pain. 165:1793–1805.
11. Krizhevsky A, Sutskever I, Hinton GE. 2012. ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst. 25:1097–1105.
12. Mathis A, et al. 2018. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat Neurosci. 21:1281–1289.
13. Pereira TD, et al. 2019. Fast animal pose estimation using deep neural networks. Nat Methods. 16:117–125.
14. Geuther BQ, et al. 2021. Action detection using a neural network elucidates the genetics of mouse grooming behavior. Elife. 10:e63207.
15. Kobayashi K, et al. 2021. Automated detection of mouse scratching behaviour using convolutional recurrent neural network. Sci Rep. 11:658.
16. Sakamoto N, et al. 2022. Automated grooming detection of mouse by three-dimensional convolutional neural network. Front Behav Neurosci. 16:797860.
17. Ren Z, Wang S, Zhang Y. 2023. Weakly supervised machine learning. CAAI Trans Intell Technol. 8:549–580.
18. Frias B, Merighi A. 2016. Capsaicin, nociception and pain. Molecules. 21:797.
19. Selvaraju RR, et al. 2017. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: 2017 IEEE International Conference on Computer Vision (ICCV). IEEE, Venice, Italy. p. 618–626.
20. Mercer Lindsay N, Chen C, Gilam G, Mackey S, Scherrer G. 2021. Brain circuits for pain and its treatment. Sci Transl Med. 13:eabj7360.
21. Rea BJ, et al. 2022. Automated detection of squint as a sensitive assay of sex-dependent calcitonin gene-related peptide and amylin-induced pain in mice. Pain. 163:1511–1519.
22. Le Moëne O, Larsson M. 2023. A new tool for quantifying mouse facial expressions. eNeuro. 10:0349–22. doi: 10.1523/ENEURO.0349-22.2022.
23. Daruwalla K, et al. 2024. Cheese3D: sensitive detection and analysis of whole-face movement in mice. bioRxiv. doi: 10.1101/2024.05.07.593051, preprint: not peer reviewed.
24. Tanaka Y, Nakata T, Hibino H, Nishiyama M, Ino D. 2023. Classification of multiple emotional states from facial expressions in head-fixed mice using a deep learning-based image analysis. PLoS One. 18:e0288930.
25. Kobayashi K, Shimizu N, Matsushita S, Murata T. 2020. The assessment of mouse spontaneous locomotor activity using motion picture. J Pharmacol Sci. 143:83–88.


