PLOS ONE. 2020 Feb 11;15(2):e0228739. doi: 10.1371/journal.pone.0228739

Affect and exertion during incremental physical exercise: Examining changes using automated facial action analysis and experiential self-report

Sinika Timme 1, Ralf Brand 1,*
Editor: Dominic Micklewright
PMCID: PMC7012425  PMID: 32045442

Abstract

Recent research indicates that affective responses during exercise are an important determinant of future exercise and physical activity. Thus far, these responses have been measured with standardized self-report scales, but this study used biometric software for automated facial action analysis to analyze the changes that occur during physical exercise. A sample of 132 young, healthy individuals performed an incremental test on a cycle ergometer. During that test the participants’ faces were video-recorded and the changes were algorithmically analyzed at frame rate (30 fps). Perceived exertion and affective valence were measured every two minutes with established psychometric scales. Taking into account anticipated inter-individual variability, multilevel regression analysis was used to model how affective valence and ratings of perceived exertion (RPE) covaried with movement in 20 facial action areas. We found the expected quadratic decline in self-reported affective valence (more negative) as exercise intensity increased. Repeated measures correlation showed that the facial action mouth open was linked to changes in (highly intercorrelated) affective valence and RPE. Multilevel trend analyses were calculated to investigate whether facial actions were typically linked to either affective valence or RPE. These analyses showed that mouth open and jaw drop predicted RPE, whereas (additional) nose wrinkle was indicative of the decline in affective valence. Our results contribute to the view that negative affect, escalating with increasing exercise intensity, may be the body’s essential warning signal that physiological overload is imminent. We conclude that automated facial action analysis provides new options for researchers investigating feelings during exercise. In addition, our findings offer physical educators and coaches a new way of monitoring the affective state of exercisers, without interrupting and asking them.

1. Introduction

Exercise plays a significant role in reducing the risk of developing diseases and in improving health and wellbeing [1], yet despite knowing that exercise is good for them, most adults in Western countries are insufficiently active [2].

Exercise psychologists have spent the last 50 years developing and testing theories about why some people are more successful than others in changing their behavior to promote their own health and exercise more regularly. After decades of focusing on social-cognitive factors and the role of deliberate reasoning in motivation (e.g. goal-setting and self-efficacy), researchers began to focus on the role of more automatic and affective processes in promoting change in health-related behaviors [3, 4, 5].

Affect has been defined as a pleasant or unpleasant non-reflective feeling that is always accessible and is an inherent aspect of moods and emotional episodes, but can be experienced independently of these states as well [6]. Affect can be described along two orthogonal dimensions: ‘affective valence’ (how good or bad one feels) and ‘arousal’ (high vs. low) [7]. There is conclusive evidence that those who experience a more pleasant affective state during exercise are more likely to exercise again [8].

Dual-mode theory [9] explains how feelings during exercise are moderated by exercise intensity. According to the theory and supported by evidence [10], the affective response to moderate intensity exercise (below ventilatory threshold; VT) is mostly positive, but affective responses to heavy intensity exercise (approaching the VT) are more variable. Some individuals continue to report positive affect as exercise intensity increases, but others report more and more negative affect. When the intensity of exercise increases to the severe domain (when the respiratory compensation threshold, RCT, is exceeded), almost all individuals report a decline in pleasure [9, 10].

Ratings of affective valence above the VT are closely connected to the concept of perceived exertion. Borg [11, p. 8] defined perceived exertion as “… the feeling of how heavy and strenuous a physical task is”. A recent article in the Journal of Experimental Biology proposed that at high exercise intensities feelings of negative affect and perceived exertion may even merge into one, suggesting that the sensation of severe exertion enters consciousness via a decline in pleasure [12].

We believe that gaining a deeper understanding of the relationship between the affective response to exercise and perceived exertion is important not just from a research perspective, but also from a practical perspective. Practitioners (e.g. teachers and coaches) would greatly and immediately benefit from being able to assess an exerciser’s perceived exertion and his or her momentary affective state to increase the odds of further effective and pleasurable physical exercise.

1.1 Measurement of exercise-induced feelings during exercise

Thus far exercise-induced feelings have mostly been measured with exercisers’ self-reports [3]. The most commonly used psychometric measure of affective valence is the Feeling Scale (FS) [13], a single-item measure consisting of the question “How do you feel right now?” to which responses are given using an 11-point bipolar rating scale. Various studies have shown that displeasure increases with a quadratic trend under increasing exercise intensity, although with considerable inter-individual variability [10].

Perceived exertion, on the other hand, has often been measured with Borg’s rating of perceived exertion (RPE) scale [11]. In this test participants are asked to indicate their current state during exercise on a 15-point scale ranging from 6 no exertion to 20 maximal exertion. The scale is designed to reflect the heart rate of the individual before, during and after physical exercise. An RPE of 13, for example, is assumed to correspond approximately to a heart rate of 130 beats per minute [14].

Focusing on two tasks simultaneously (exercising and rating one’s own feelings at the same time) can bias the validity of the answers as well as the feeling states themselves. It is known that the act of labeling affect can influence the individual’s affective response [15]. Another limitation is that affective valence changes during exercise [10] and repeatedly asking people how they feel inevitably carries the risk that it will interrupt their experience and introduce additional bias to their answers. Monitoring changes in biometric data avoids these interruptions and can thereby provide an alternative way to learn about the feelings that occur during exercise.

1.2 Facial action (facial expression) analysis

Spectators and commentators on sport readily infer how athletes might feel from their facial movements during exercise. Some of these “expressions” might reveal information about an athlete’s inner state. However, it cannot universally be assumed that observed facial movements always reflect (i.e., are expressive of) an inner state [16]. Facial actions can also be related to perceptual, social, attentional, or cognitive processes [17, 18]. Therefore, we refer to facial expressions as facial actions in order to discourage the misunderstanding that subjective inner states are unambiguously expressed in the face.

The majority of studies conducted so far has quantified facial action by using either facial electromyographic activity (fEMG) or specific coding systems, of which the Facial Action Coding System (FACS) is probably the most widely known [19, 20].

fEMG involves measuring electrical potentials from facial muscles in order to infer muscular contractions. It requires the placement of electrodes on the face and thus can only measure the activity of a pre-selected set of facial muscles. Another limitation of using fEMG is that it is affected by crosstalk, meaning that surrounding muscles interfere with the signals from the muscles of interest, making fEMG signals noisy and ambiguous [21, 22]. A few fEMG studies have demonstrated that contraction of specific facial muscles (corrugator supercilii, zygomaticus, and masseter) is correlated with RPE during resistance training [21, 23] and bouts of cycling [20, 24].

The second approach relies on coding systems. Many of them are rooted in the FACS, an anatomy-based, descriptive system for manually coding all visually observable facial movements [19]. Trained coders view video-recordings of facial movements frame-by-frame in order to code facial movements into action units (AUs). FACS is time-consuming to learn and use (approximately 100 hours to learn FACS and one to two hours to analyze just one minute of video content) [20].

Recent progress has been made in building computer systems that identify facial actions and analyze them as a source of information about, for example, affective states [25]. Computer scientists have developed computer vision and machine learning models which automatically decode the content of facial movements to facilitate faster, more replicable coding. These systems display high concurrent validity with manual coding [26].

We are aware of only one study so far that has used automated facial feature tracking to describe how facial activity changed with exercise intensity [27]. The authors analyzed video-recordings of overall head movement and 49 facial points with the IntraFace software to classify movement in the upper and lower face. The study showed that facial activity in all areas differed between intensity domains. Facial movement increased from the lactate threshold until attainment of maximal aerobic power, with greater movement in the upper face than in the lower face at all exercise intensities.

1.3 This study

The aim of this study was to examine changes in a variety of discrete facial actions during an incremental exercise test, and relate them to changes in self-reported RPE and affective valence, i.e. feelings that typically occur during exercise. To the best of our knowledge it is the first study to involve the use of automated facial action analysis as a method of investigating the covariation of these variables.

We have used an automated facial action coding system with the Affectiva Affdex algorithm at its core [28]. It includes the Viola Jones Cascaded Classifier algorithm [29] to detect faces in digital videos, and then digitally tags and tracks the configuration of 34 facial landmarks (e.g., nose tip, chin tip, eye corners). Data are fed into a classification algorithm which translates the relative positions and movements of the landmarks into 20 facial actions (e.g., mouth open). Classification by Affectiva Affdex relies on a normative data set based on initial manual codings by human FACS coders and subsequent machine learning data enrichment, with more than 6.5 million faces analyzed [30]. Facial actions as detected by Affectiva Affdex are similar [31] but not identical to the AUs from the FACS. Facial actions consist of single facial movements or combinations of several movements (e.g., facial action mouth open: lower lip drops downwards as indicated by AU 25 lips part; facial action smile as indicated by AU 6 cheek raiser together with AU 12 lip corner puller).

Connecting with dual mode theory [9] and research pointing out the importance of positive affect during exercise for further exercising [8], facial action metrics might provide useful biometric indicators for evaluating feeling states during exercise at different intensities. We took a descriptive approach to analyze which facial actions co-occur with affective valence and perceived exertion during exercise. This approach enables us to contribute conceptually to the examination of the relationship between the constructs of perceived exertion and affective valence (e.g. to determine if they are one or two distinct constructs and whether this depends on physical load) [12], whilst avoiding bias caused by repeatedly interrupting subjects’ experience of exercise to obtain self-reports.

In order to account for the expected high inter- and intra-individual variability in both the affective response to exercise [3] and in facial actions [16], we used multilevel regression modeling to analyze our data; as far as we know, we are the first in this research area to use this method of data analysis.

2. Method and materials

The Research Ethics Committee of the University of Potsdam approved the study and all procedures complied with the Helsinki declaration. All participants gave their signed consent prior to partaking in the experiment. The individual in this manuscript has given written informed consent (as outlined in PLOS consent form) to publish these case details.

2.1 General setup

Study participants completed an exercise protocol involving exercising at increasing intensity on a cycle ergometer until they reached voluntary exhaustion. Whilst they were exercising their face was recorded continuously on video. Both affective valence and perceived exertion were measured repeatedly every two minutes. Changes in facial action were then evaluated with the help of software for automated facial action analysis and related to the self-report data. Advanced statistical methods were used for data analysis, accounting for the generally nested data structure (repeated measurements are nested within individuals).

2.2 Participants

We tested a group of 132 healthy individuals, aged between 18 and 36 years (Mage = 21.58, SDage = 2.93; 53 women). All of them were enrolled in a bachelor’s degree course in sport and exercise science. The group average of (self-reported) at least moderate physical activity was 337 minutes per week. Students with a beard or dependent on spectacles were not eligible to participate. Data from 19 participants were unusable due to recording malfunction (n = 6), poor video quality (n = 6; more than 10% missing values because the software did not detect the face) or due to disturbing external circumstances (n = 7; people entering the room unexpectedly; loud music played in the nearby gym). This resulted in a final sample of 113 study participants.

2.3 Treatment and measures

2.3.1 Exercise protocol

The participants performed an incremental exercise test on an indoor bike ergometer. Required power output was increased in 25-watt increments every two minutes, starting from 25 watts, until the participants indicated that they had reached voluntary exhaustion [32]. The protocol was stopped when the participant was unable to produce the required wattage any more. If a participant reached 300 watts, the final phase involved pedaling at this level for two minutes. Thus the maximum duration of the exercise was 26 minutes. All participants performed a five-minute cool-down consisting of easy cycling.

As a plausibility check of whether self-declared physical exhaustion was at least close to the participants’ physiological state, heart rate during exercise was monitored in about half of the participants (n = 54) using a Shimmer3 ECG device with a sampling rate of 512 Hz. These participants started with a one-minute heart rate baseline measurement before the exercise.

2.3.2 Affective valence and perceived exertion

The FS (a single item scale: response options range from -5 very bad to +5 very good) [13] was used to measure affective valence, and participants rated their level of exertion using Borg’s Rating of Perceived Exertion (RPE; a single-item scale; response options range from 6 no exertion to 20 maximal exertion) [11]. FS and RPE were assessed every two minutes during the exercise task, at the end of each watt level. For this purpose the two questionnaires (FS first and RPE second) were displayed on the monitor in front of the participants (see below) and they were asked to give their rating verbally to the experimenter.

2.3.3 Automated facial expression analysis

The participants’ facial actions during the exercise task were analyzed using the software Affectiva Affdex [28] as implemented in the iMotions™ platform for biometric research (Version 7.2). Faces were continuously recorded with a Logitech HD Pro C920 webcam at a sampling rate of 30 fps during performance of the exercise task. The camera was mounted on top of the ergometer screen (0.4 m in front of the face with an angle of 20 degrees from below) and connected to the investigator’s laptop.

Affectiva Affdex continuously analyzed the configuration of the 34 facial landmarks [31] during performance of the exercise task (Fig 1). It provided scores for 20 discrete facial actions (e.g., nose wrinkle, lip press) from all over the face (all facial actions detected by Affectiva Affdex are listed in Table 1, in the results section) [31]. The algorithm performs analysis and classification at frame rate. This means that at a time resolution of 30 picture frames per video second (30 fps), our analyses were based on 1,800 data points per facial action per minute. Recent research has shown that Affectiva Affdex facial action scores are highly correlated with fEMG-derived scores, and that Affectiva Affdex outperforms fEMG in recognizing affectively neutral faces [33].

Fig 1. Examples of facial actions during exercise.


Mouth open and nose wrinkle (left picture), jaw drop (right picture). The positions of the 34 analyzed facial landmarks are marked with yellow dots.

Table 1. Repeated measures correlation of all facial actions with FS and RPE.
Facial Action FS RPE
mouth open -0.55* 0.70*
jaw drop -0.40* 0.51*
nose wrinkle -0.34* 0.29*
lip pucker -0.32* 0.32*
upper lip raise -0.31* 0.27*
lid tighten -0.30* 0.26*
eye closure -0.29* 0.30*
smile -0.26* 0.25*
lip stretch -0.21* 0.18*
cheek raise -0.19* 0.21*
lip press -0.19* 0.15*
dimpler -0.17* 0.13*
brow furrow -0.14* 0.17*
eye widen 0.14* -0.27*
lip corner depressor -0.13* 0.13*
lip suck -0.10* 0.01
inner brow raise -0.10* 0.07
brow raise -0.07 0.16*
chin raise -0.07 0.03
smirk -0.07 -0.06

FS, Feeling Scale; RPE, Rating of perceived exertion

*p < .001

Each data point that Affectiva Affdex provides for a facial action is the probability of presence (0–100%) of that facial action. We aggregated these raw data, for each facial action separately, into facial action scores (time-percent scores) indicating for how long during a watt level on the ergometer (i.e., within 2 minutes) a facial action was detected with a value of 10 or higher. For example, a facial action score of 0 indicates that the facial action was not present during the watt level, whereas a score of 100 indicates that it was present all the time during that watt level.
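For illustration, this aggregation can be written in a few lines of base R. This is only a minimal sketch; the data frame and column names (frame_data, participant, watt_level, mouth_open) are hypothetical placeholders for the frame-level export and are not taken from the study’s analysis scripts.

```r
# Minimal sketch of the time-percent aggregation (hypothetical column names).
# frame_data holds one row per video frame with the Affdex probability of
# presence (0-100%) for one facial action.
set.seed(1)
frame_data <- data.frame(
  participant = 1,
  watt_level  = 25,
  mouth_open  = runif(30 * 120, min = 0, max = 100)  # 30 fps x 120 s = 3600 frames
)

# Facial action score: percentage of frames within a watt level in which the
# action was detected with a probability of 10 or higher
action_scores <- aggregate(
  mouth_open ~ participant + watt_level,
  data = frame_data,
  FUN  = function(p) mean(p >= 10) * 100
)
action_scores  # 0 = never present, 100 = present throughout the 2-minute level
```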

Fig 1 illustrates examples of facial actions and the analyzed facial landmarks.

2.4 Procedure

After the participants arrived at the laboratory they were informed about the exercise task and told that their face would be filmed during the task. They were also given a detailed description of the two scales (FS and RPE), what they were intended to measure and how they would be used in the study.

Participation was voluntary and all participants completed data protection forms and were checked for current health problems. Participants performed the exercise task on a stationary cycle ergometer in an evenly and clearly lit laboratory in single sessions. An external 22” monitor was positioned 1.5 m in front of the participant; this was used to display instructions during the exercise session (instruction on watt level for 100 s always at the beginning of each watt level; the two scales, FS and RPE, always for 10 s at the end of each level). Throughout the trial, no verbal encouragement or performance feedback was provided and the researcher followed a standard script of verbal interaction. During the exercise session the researcher remained out of the participants’ sight and noted the participant’s verbal responses when FS and RPE responses were solicited. The periods during which participants were reporting their ratings were cut from the video for the facial action analysis.

2.5 Statistical approach, modeling and data analysis

Multilevel models were used to assess the anticipated increase in negative affect during exercise and to examine the relationships between facial action, affective valence and perceived exertion. We had multiple observations for each participant (20 facial action scores, FS, RPE), so that these repeated measurements (level 1) were nested within individuals (level 2). The main advantages of multilevel models are that they separate between-person variance from within-person variance, so that estimates can be made at individual level as well as at sample level [34]. Because they use heterogeneous regression slopes (one regression model for each participant), multilevel statistics enable analysis of dependent data and of a potentially unbalanced design (series of measurements with different lengths), two conditions that would violate the assumptions of traditional regression and variance analysis.

Our first model tested whether affective valence (FS) showed the expected quadratic trend [10] with increasing perceived exertion (RPE; time-varying predictor). In this model, RPE and derived polynomials were centered at zero and used as a continuous covariate for prediction of change in affective valence (FS).
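A minimal sketch of how such a model can be specified with the lme function from the nlme package (see section 2.5). The data frame d and its columns id, FS and RPE are hypothetical stand-ins for the study data, and grand-mean centering is used here as one plausible reading of “centered at zero”; the published analysis may have differed in detail.

```r
library(nlme)

# d: long-format data, one row per participant x watt level,
# with columns id (participant), FS and RPE (hypothetical names)
d$RPE_0 <- d$RPE - mean(d$RPE, na.rm = TRUE)   # centered time-varying covariate

# Unconditional null model; intraclass correlation for FS
m_null <- lme(FS ~ 1, random = ~ 1 | id, data = d, method = "ML")
vc  <- as.numeric(VarCorr(m_null)[, "Variance"])
icc <- vc[1] / sum(vc)                          # share of between-person variance

# Linear vs. quadratic trend of FS over centered RPE, with random intercepts,
# slopes and curvatures; ML estimation allows likelihood-ratio comparison
m_lin  <- lme(FS ~ RPE_0,
              random = ~ RPE_0 | id, data = d, method = "ML")
m_quad <- lme(FS ~ RPE_0 + I(RPE_0^2),
              random = ~ RPE_0 + I(RPE_0^2) | id, data = d, method = "ML")
anova(m_lin, m_quad)   # chi-square test on the difference in -2 log likelihood
```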

To investigate which facial actions were associated with affective valence (FS) and with perceived exertion (RPE) we carried out separate analyses of the degree of covariation of FS and RPE with each facial action. First we looked at repeated measures correlations, which take the dependency of the data into account by analyzing common intra-individual associations whilst controlling for inter-individual variability [35]. Then we predicted affective valence (FS) from facial action whilst controlling for the influence of RPE, considering each facial action in a separate model. In parallel analyses we predicted RPE from facial action whilst controlling for the influence of FS. The significance of the fixed effects of facial actions was tested using chi-square tests for differences in -2 log likelihood values. A model with facial action as a predictor was compared with a reduced model without facial action. We compared all models in which FS or RPE was predicted by facial action, using the Akaike Information Criterion Corrected (AICC) and Weight of Evidence (W) [36]. Pseudo R² (within-subject level) was calculated to estimate the proportion of variance explained by the predictor [36].
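The sketch below illustrates these steps for a single facial action (nose wrinkle as an example). It assumes the same hypothetical data frame d as above, augmented with a facial action score column, and uses the rmcorr package [35] for the repeated measures correlation. For brevity the facial action is entered as a fixed effect only, and AICc and Akaike weights are computed by hand; this is an illustrative approximation, not the authors’ actual code.

```r
library(rmcorr)
library(nlme)

# (1) Repeated measures correlation between a facial action score and FS
rmcorr(participant = id, measure1 = nose_wrinkle, measure2 = FS, dataset = d)

# (2) Does the facial action explain FS over and above RPE?
m_reduced <- lme(FS ~ RPE,                random = ~ RPE | id, data = d, method = "ML")
m_action  <- lme(FS ~ RPE + nose_wrinkle, random = ~ RPE | id, data = d, method = "ML")
anova(m_reduced, m_action)     # chi-square test on -2 log likelihood

# Pseudo R^2 (within-subject): proportional reduction in level-1 residual variance
1 - m_action$sigma^2 / m_reduced$sigma^2

# AICc and Akaike weights (W) across candidate models
aicc <- function(m) {
  k <- attr(logLik(m), "df")   # number of estimated parameters
  n <- m$dims$N                # number of level-1 observations
  AIC(m) + (2 * k * (k + 1)) / (n - k - 1)
}
delta    <- c(reduced = aicc(m_reduced), nose_wrinkle = aicc(m_action))
delta    <- delta - min(delta)
akaike_w <- exp(-delta / 2) / sum(exp(-delta / 2))
```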

Finally, to test whether FS and RPE made unique contributions in explaining variance in facial action, we calculated separate multilevel models in which specific facial actions were predicted by FS and RPE. This allowed us to partial out the separate amounts of explained variance of FS and RPE in the respective facial action.

We used the lme function from the nlme package (version 3.1–139) [37] to estimate fixed and random coefficients. This package is supplied in the R system for statistical computing (version 3.6.0) [38].

3. Results

3.1 Manipulation checks

As expected, participants reached different maximum watt levels in the exercise session and so the number of observations varied between participants. In summary, we recorded 1102 data point observations for the 113 participants, derived from between 5 and 13 power levels per participant.

Mean maximum RPE in our sample was 19.29 (SD = 1.01) and the mean heart rate in the final stage before exhaustion was 174.61 bpm (SD = 16.08). This is similar to maximal heart rates previously reported for incremental cycling tasks (e.g. HRmax: 179.5 ± 20.2 bpm, in [39]). The correlation between heart rate and RPE was very high, r = .82, p < .001. We believe it is valid to assume that most of the participants were working at close to maximum capacity at the end of the incremental exercise session in our study.

3.2 Multilevel trend analysis of FS with RPE

An unconditional null model was estimated to calculate the intraclass correlation for affective valence (FS) (ρI = .33), supporting the rationale of conducting multilevel analysis [34]. Next we introduced centered RPE (RPE_0) as a time-varying covariate to test the trend of FS with increasing RPE_0.

The model with a quadratic trend (b1 = -0.01, p = .65; b2 = -0.02, p < .001) provided a significantly better fit to the data compared to the linear model, χ2 (1) = 93.39, p < .001. The inclusion of random slopes (χ2 (2) = 141.46, p < .001) and random curvatures further improved the model fit significantly, χ2 (3) = 28.39, p < .001. The full model, with RPE_0 and (RPE_0)2 as fixed effects and random intercepts and slopes, explained 67.12% of the variance in FS.

Thus our results confirm previous findings, indicating that FS showed the expected negative quadratic trend [11] with increasing intensity (RPE). Fig 2 illustrates this finding, which multilevel regression analysis makes particularly obvious: the high inter-individual variability in the decline of affective valence (toward more negative values) under increasing perceived exertion is striking.

Fig 2. Quadratic relationship between FS and RPE at individual level.


Data from a random selection of half of the participants (n = 56) are presented to illustrate the intra- and inter-individual variability in affective response to increasing exercise intensity. Intraclass correlation shows that 33% of the variance in affective valence (FS) is due to inter-individual variability.

3.3 Repeated measures correlations

3.3.1 Covariation of FS and RPE with facial action as intensity increases

First, correlations between each facial action and FS and RPE were calculated (Table 1). Repeated measures correlations revealed that mouth open (r = -.55, p < .001), jaw drop (r = -.40, p < .001) and nose wrinkle (r = -.34, p < .001) showed the highest correlations with affective valence (FS). Mouth open (r = .70, p < .001) and jaw drop (r = .51, p < .001) also showed the highest correlations with perceived exertion (RPE), followed by lip pucker (r = .32, p < .001). FS and RPE were highly correlated (r = -.74, p < .001). These results indicate that both FS and RPE were associated with mouth open and jaw drop.

3.4 Multilevel analyses

3.4.1 Predicting FS from facial action whilst controlling for RPE

To identify which facial action best explains variation in FS during an incremental exercise session we calculated separate multilevel models, one for each facial action (left column of Table 2). RPE was included in these models as a control variable with random intercepts and slopes. The model with nose wrinkle as the predictor showed the best fit (AICC = 2770.08, W = 1). Parameter estimates (b = -0.09, p < .001) indicate a linear decrease in FS with increasing nose wrinkle. Adding nose wrinkle as a fixed effect significantly improved the model fit (χ2(1) = 12.37, p < .001) compared to the reduced model (RPE predicting FS). Adding nose wrinkle to this model as a random effect further improved model fit significantly, χ2(3) = 32.89, p < .001. Nose wrinkle explained 15.51% of the within-subject variation in FS. Smile showed the next best fit (AICC = 2780.83, W = 0), with parameter estimates (b = -0.03, p < .001) indicating a linear decrease in FS as smile increased, explaining 3.47% of the within-subject variation in FS. All other facial actions showed an even worse model fit (left column of Table 2).

Table 2. Comparison of multilevel models in which one facial action predicts FS (left column) or RPE (right column).
FS RPE
model K AICC Delta AICC W K AICC Delta AICC W
reduceda 7 2807.19 37.11 0 7 4192.92 57.86 0
mouth open 11 2796.18 26.10 0 11 4135.06 0 0
jaw drop 11 2810.51 40.43 0 11 4184.02 48.96 0
nose wrinkle 11 2770.08 0 1 11 4197.21 62.14 0
lip pucker 11 2805.21 35.13 0 11 4197.64 62.58 0
upper lip raise 8b 2796.12 26.04 0 11 4199.05 63.99 0
lid tighten 11 2788.75 18.67 0 11 4200.74 65.68 0
eye closure 11 2810.74 40.66 0 11 4198.96 63.90 0
smile 11 2780.83 10.75 0 11 4200.38 65.32 0
lip stretch 11 2811.84 41.76 0 11 4197.89 62.82 0
cheek raise 11 2796.27 26.19 0 11 4199.80 64.74 0
lip press 11 2808.03 37.95 0 11 4198.29 63.23 0
dimpler 11 2813.96 43.88 0 11 4197.60 62.53 0
brow furrow 11 2803.04 32.96 0 11 4200.38 65.32 0
eye widen 11 2811.23 41.15 0 11 4188.29 53.23 0
lip corner depressor 8b 2807.60 37.52 0 11 4200.32 65.25 0
lip suck 11 2814.52 44.44 0 11 4201.08 66.02 0
inner brow raise 11 2809.52 39.44 0 11 4200.38 65.32 0
brow raise 11 2813.85 43.77 0 11 4200.81 65.75 0
chin raise 11 2813.14 43.06 0 11 4200.92 65.86 0
smirk 11 2804.41 34.33 0 11 4197.28 62.22 0

FS, Feeling Scale; RPE, Rating of perceived exertion; K, number of parameters; AICC, Akaike information criterion corrected; W, weight of evidence. Models predicted FS (left column) or RPE (right column) with each facial action as a fixed and random factor, while controlling for the influence of RPE or FS, respectively.

aThe reduced model describes the respective outcome variable predicted by the respective covariate (left column: RPE predicting FS, right column: FS predicting RPE).

bThe models with upper lip raise and lip corner depressor as a predictor of FS failed to converge. Therefore, a more parsimonious model without the facial action as a random factor was calculated, resulting in a smaller number of parameters (K).

All in all, these results indicate that when controlling for the effects of RPE, nose wrinkle explains a significant proportion of the variation in affective valence and more than any other of the facial actions.

3.4.2 Predicting RPE from facial action whilst controlling for FS

To determine which facial action explains the most variation in RPE during the incremental exercise session we calculated a series of analyses in which RPE was predicted by all different facial actions in separate multilevel models (right column of Table 2). FS was included in each model as a control variable with random intercepts and slopes.

Here mouth open showed the best model fit (AICC = 4135.06, W = 1), followed by jaw drop (AICC = 4184.02, W = 0). Parameter estimates for both mouth open (b = 0.03, p < .001) and jaw drop (b = 0.02, p < .001) indicate a linear increase in RPE with increasing facial action.

Adding mouth open as a fixed effect to the reduced model (FS predicting RPE) significantly improved model fit (χ2 (1) = 65.85, p < .001) and this model explained 16.28% of within-subject variance in RPE. Adding mouth open as a random effect did not further improve model fit, χ2 (3) = 0.15, p = .99.

Adding jaw drop as a fixed effect to the reduced model (FS predicting RPE) significantly improved the model fit (χ2 (1) = 16.84, p < .001) and this model explained 5.37% of within-subject variance in RPE; adding jaw drop as a random effect did not further improve the model fit, χ2 (3) = 0.20, p = .98.

All other facial actions showed a worse model fit (right column of Table 2); none explained more than 2.68% (eye widen) of the within-subject variation in RPE.

Taken together these results indicate that mouth open and jaw drop explained significant variation in perceived exertion, and more than all other facial actions. Both facial actions involve movements in the mouth region; jaw drop is the bigger movement, as the whole jaw drops downwards, whereas mouth open only involves a drop of the lower lip [31].

3.4.3 Predicting facial action from FS and RPE

In order to separate the proportion of variance in the above identified facial actions (i.e. mouth open and jaw drop; nose wrinkle) explained by RPE and FS we calculated three separate multilevel models with each of these facial actions as the dependent variable and RPE and FS as time-varying predictors.

Mouth open was significantly predicted by both RPE (b = 2.53, p < .001) and FS (b = -1.34, p = .003). Introducing random slopes for RPE significantly improved model fit, χ2 (2) = 7.12, p = .03. RPE accounted for 41.21% of the within-subject variance in mouth open and significantly improved the model compared to a reduced model without RPE as a predictor, χ2 (3) = 99.32, p < .001. FS accounted for 11.42% of the within-subject variance in mouth open and significantly improved model fit compared with the reduced model without FS as a predictor, χ2 (1) = 10.50, p = .001.

Nose wrinkle was significantly predicted by FS (b = -0.32, p = .003), but not by RPE (b = 0.06, p = .13). Introducing random slopes for FS and then RPE in separate steps significantly improved model fit; FS: χ2 (2) = 152.07, p < .001, and RPE: χ2 (3) = 21.78, p < .001. FS explained 21.10% of the within-subject variance in nose wrinkle and significantly improved model fit compared to the reduced model without FS as a predictor, χ2 (4) = 122.11, p < .001.

Jaw drop was significantly predicted by RPE (b = 1.06, p < .001), but not by FS (b = -0.49, p = .12). Introducing random slopes for RPE significantly improved model fit, χ2 (2) = 14.54, p < .001. RPE explained 35.83% of the within-subject variance in jaw drop and significantly improved model fit compared to a reduced model without RPE as a predictor, χ2 (3) = 59.02, p < .001.

4. Discussion

The aim of this study was to examine whether and how single facial actions change with exercise intensity and how they relate to affective valence and perceived exertion. The study is innovative with regard to at least two aspects. First, we used automated facial action analysis technology to observe change in 20 discrete facial areas covering the whole face in a large sample of study participants. Second, the use of multilevel models allowed us to account for differences in change across individuals (nested data structure). We found that both affective valence and perceived exertion were significantly associated with mouth open. After controlling for the influence of RPE, mouth open was no longer significantly associated with affective valence, but the relationship between mouth open and RPE remained significant after controlling for the effect of affective valence. All in all, during exercise nose wrinkle was specifically characteristic of negative affect (i.e., less pleasurable feelings with increasing perceived exertion) and jaw drop of higher RPE. Fig 1 illustrates examples of these relevant facial actions.

4.1 Affective responses at different levels of perceived exertion

Several studies have investigated changes in affective responses during exercise with repeated measurement designs [10]. We think this makes it essential to separate intra- and inter-individual variability in the data analysis. However, to the best of our knowledge there is currently no published study in which such trajectories have been analyzed using a corresponding multilevel regression approach. On the basis of dual-mode theory [9] and previous findings we hypothesized that there would be a negative quadratic trend [10, 40] of the affective response with increasing exercise intensity. Multilevel analysis confirmed this hypothesis and also demonstrated that there was high inter-individual variability in reported affective valence during exercise (Fig 2). In our view, this demonstrates the necessity of using multilevel analysis when examining the decline in affect (toward more negative valence) during exercise of increasing intensity.

Previous studies were able to demonstrate the existence of inter-individual variability in affective valence by reporting, for example, that 7% of participants showed an increase in affect ratings, 50% no change and 43% a decrease during exercise below the VT [41]. The statistical approach presented here extends this work and makes it possible to quantify the influence of moderators of the exercise intensity-affect relationship, in order to explain inter-individual differences in affective responses to exercise at a given intensity level.

4.2 Affective responses and facial action

In our study affective valence was most highly correlated with the facial action mouth open when using simple repeated measures correlations (Table 1). However, affective valence was highly correlated with RPE, which was in turn highly correlated with mouth open. In order to determine what facial actions account for components of variance in specific constructs it is necessary to take into account the multicollinearity of the constructs. We did this by controlling statistically for variance in one construct (e.g. RPE) when analyzing the effect of the other (e.g. affective valence). When the influence of perceived exertion was taken into account, affective valence was most strongly associated with the facial action nose wrinkle (Table 2). This is consistent with previous research showing that nose wrinkling may indicate negative affect. For example, newborns [42] and students [43] respond to aversive stimuli (e.g., a sour liquid [42] or offensive smells [43]) by wrinkling their nose. Perhaps pain is the context most relatable to high-intensity exercise. Studies of pain have identified nose wrinkling as an indicator of the affective dimension of pain [44], which is highly correlated with, but independent from, the sensory dimension [45].

Nose wrinkle has also been specifically associated with the emotion disgust [15]. However, the same facial action has been observed in various other situations (e.g. while learning) [46] and emotional states (e.g., anger) [47] and is not always observed concomitantly with reports of disgust [48]. Nose wrinkle may therefore be indicative of negative affect more generally, rather than of a specific emotional state.

Nose wrinkle explained more variance in affective valence than any other facial action, but given that this is the first study to have examined changes in facial action and affect during the course of an incremental exercise test and was performed with a sample of healthy adults, we suggest limiting the conclusion to the following: nose wrinkle is a facial action indicating negative affect in healthy adults during incremental exercise. To draw more general conclusions, for example, that nose wrinkle is the characteristic expression of negative affect during exercise, further research is needed. It should be demonstrated, for example, that this facial action reliably co-occurs with negative affect and that this co-occurrence prevails across several exercise modalities (e.g., running, resistance training).

4.3 Perceived exertion and facial action

The facial actions that were most highly correlated with perceived exertion, when controlling for the effect of affective valence, were mouth open and jaw drop (Table 2). On one hand, this is in line with research showing that activity in the jaw region is correlated with RPE [24]. At first sight, this may seem at odds with the findings of the fEMG study [22], which suggested that perceived exertion during physical tasks is mainly linked with corrugator muscle activity. It is important to note, however, that fEMG only measures activity in the muscles to which electrodes were attached (apart from noisy crosstalk), and that it cannot capture the dynamics of the whole face [49].

On the other hand, it is worth pointing out that we observed a correlation between RPE and brow furrow (which partly reflects corrugator activity). This correlation was smaller, however, than the correlations of RPE with jaw drop and mouth open (Table 1). First and foremost, it must be noted that as physical exertion increases, the exerciser is likely to breathe more heavily. The change from nose to mouth breathing should be interpreted in light of the fact that more air can flow, and flow faster, through the mouth. The observed change in facial action (i.e. increased mouth open and jaw drop) therefore most likely correlated with the physiological need for optimized gas exchange in the working organism. It is therefore particularly important to exploit the advantages of automated analysis of the whole face and its discrete facial actions to investigate the covariation of the various facial actions more closely.

4.4 Affective responses, perceived exertion and facial action

Both affective valence and perceived exertion were significantly associated with the facial action mouth open (Table 2). While nose wrinkle was specific in explaining significant amounts of variance in affective valence and jaw drop in perceived exertion, mouth open explained significant amounts of variance in both affective valence and physical exertion (the facial action mouth open is described as “lower lip dropped downwards” in the Affectiva developer portal; jaw drop is “the jaw pulled downwards” with an even wider and further opening of the mouth [31]). This pattern of results might be interesting for the conceptual differentiation of affective valence and perceived physical exertion.

The two concepts, affective valence and physical exertion, are certainly closely linked [12]. This is reflected in our finding that the two are significantly correlated with the same facial action, mouth open. However, when the relationship of affective valence with the facial actions was controlled for the influence of RPE, mouth open explained only 1.19% of the within-subject variance in affective valence, whereas nose wrinkle explained 15.51%. These results suggest that mouth opening can be seen as a sign of the physical-exertion component of the experienced affect, whereas nose wrinkle indicates negative affect specifically.

Jaw drop (as the more extreme mouth opening), on the other hand, appeared not to be related to affective valence. Jaw drop could thus be assumed to be the more specific sign of (excessive) perceived exertion. Both metabolic thresholds, VT and RCT, are related to perceived exertion. They are objective, individualized metabolic indicators of intensity, and are already associated with psychological transitions in dual-mode theory [9]. Linking them to transitions in facial actions could be a prospect for future research: exercising at the VT might mark the transition from nose to (predominantly) mouth breathing, and thus the transition to more mouth open, whereas exercising above the VT might mark a transition to more jaw drop. This kind of intensified breathing might covary with escalating negative affective valence, that is, the evolutionarily built-in warning signal that homeostatic perturbation is precarious and behavioral adaptation (reduction of physical strain) is necessary [12]. We did not analyze the dynamics of the different facial actions from this perspective, because we did not measure physiological markers of exercise intensity. But we suggest that future research should focus on exactly that.

4.5 Context- and individual-specific facial actions

This study can also be seen as a contribution to the current debate on what the face reveals about underlying affective states and whether universal, prototypical emotional facial expressions exist [16]. Our results support the notion that specific facial actions are associated with affective states in a context- and individual-dependent manner. For example, smile (AU 6 + AU 12) is typically associated with the emotion “happiness” [50] and with positive affective valence [51]. This does not match our finding, and that of another study in the context of exercise [21], that smile can also be correlated with negative affective valence.

The use of biometric indices of facial action to measure psychological states requires that one takes into consideration that facial action is subject to high intra- and inter-individual variability [16]. Using multilevel analyses allowed us to take this into account. Because some people show little or no movement in their faces, aggregated grand-mean analyses such as repeated measures ANOVA (which do not first model individual change) would be biased by this variation. Such analyses treat individual deviation from the grand mean as residual error, leading to the loss of important information about inter-individual differences. By taking individual trajectories into account, multilevel analyses allowed us to separate within-subject variance from between-subject variance and hence to adjust for obvious individual differences in facial action.

4.6 Limitations and recommendations for further research

Among the limitations of our study are the following. We argued that automated facial action analysis could offer a more unobtrusive way to measure feelings during exercise. It is important not to lose sight of the fact, however, that simply knowing that one is being filmed can also change one’s behavior [52]. Another point is that although this study primarily focused on the correlations between facial actions and ratings of affective valence and perceived exertion, it would be advantageous in future studies to determine exercise intensity physiologically at the level of individual participants (e.g., by the use of respiratory gas analysis in a pretest). This would have given us more confidence as to whether the majority of our participants had actually reached a state close to physical exhaustion at the end of the exercise protocol. Considering the participants’ average RPE at their maximum watt level and the comparison of the achieved heart rates with other cycle ergometer studies, we think this is likely, but we cannot be certain. We further suggest that future studies should use more heterogeneous participant samples and a greater variety of sports and exercises to assure higher generalizability of the findings. Different modalities and different exercise intensities might produce specific facial actions. More heterogeneous samples are likely to produce more variance in affective responses, which may lead to further insight into the variation in facial reactions to exercise.

5. Conclusion

We conclude that both affective valence and perceived exertion can be captured using automated facial action analysis. Escalating negative affect during physical exercise may be characterized by nose wrinkling, representing the ‘face of affect’ in this context. The ‘face of exertion’, on the other hand, may be characterized by jaw dropping.

From a practical perspective, these results suggest that observing the face of an exerciser can give instructors important insights into the exerciser’s momentary feelings. Facial actions can tell a lot about how an individual feels during exercise, and instructors could use individual facial cues to monitor instructed exercise intensity and to enhance exercisers’ affective experience during exercise, which, at least for those who are not keen on exercise, is an important variable for maintaining an otherwise disliked behavior.

Data Availability

The data underlying the results presented in the study are available from: https://osf.io/z8rv7/

Funding Statement

The authors received no specific funding for this work.

References

  • 1.Piercy KL, Troiano RP, Ballard RM, Carlson SA, Fulton JE, Galuska DA, et al. The physical activity guidelines for Americans. JAMA. 2018; 320(19): 2020–2028. 10.1001/jama.2018.14854 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Tudor-Locke C, Brashear MM, Johnson WD, Katzmarzyk PT. Accelerometer profiles of physical activity and inactivity in normal weight, overweight, and obese U.S. men and women. International Journal of Behavioral Nutrition and Physical Activity. 2010; 7: 60 10.1186/1479-5868-7-60 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Ekkekakis P, Brand R. Affective responses to and automatic affective valuations of physical activity: Fifty years of progress on the seminal question in exercise psychology. Psychology of Sport & Exercise. 2019; 42: 130–137. 10.1016/j.psychsport.2018.12.018 [DOI] [Google Scholar]
  • 4.Conroy DE, Berry TR. Automatic affective evaluations of physical activity. Exercise and Sport Sciences Reviews. 2017; 45(4): 230–237. 10.1249/JES.0000000000000120 . [DOI] [PubMed] [Google Scholar]
  • 5.Brand R, Ekkekakis P. Affective–reflective theory of physical inactivity and exercise. German Journal of Exercise and Sport Research. 2018; 48: 48–58. 10.1007/s12662-017-0477-9 [DOI] [Google Scholar]
  • 6.Russell JA, Feldman Barrett L. Core affect. The Oxford Companion to Emotion and the Affective Sciences; 2009; 104. [Google Scholar]
  • 7.Russell JA. A circumplex model of affect. Journal of Personality and Social Psychology. 1980; 39(6):1161–1178. 10.1037/h0077714 [DOI] [Google Scholar]
  • 8.Rhodes RE, Kates A. Can the affective response to exercise predict future motives and physical activity behavior? A systematic review of published evidence. Annals of Behavioral Medicine. 2015; 49(5): 715–731. 10.1007/s12160-015-9704-5 . [DOI] [PubMed] [Google Scholar]
  • 9.Ekkekakis P. Pleasure and displeasure from the body: Perspectives from exercise. Cognition and Emotion. 2003; 17(2): 213–239. 10.1080/02699930302292 . [DOI] [PubMed] [Google Scholar]
  • 10.Ekkekakis P, Parfitt G, Petruzzello SJ. The pleasure and displeasure people feel when they exercise at different intensities. Sports Medicine. 2011; 41(8): 641–671. 10.2165/11590680-000000000-00000 . [DOI] [PubMed] [Google Scholar]
  • 11.Borg G. Borg's perceived exertion and pain scales. US: Human Kinetics; 1998. [Google Scholar]
  • 12.Hartman ME, Ekkekakis P, Dicks ND, Pettitt RW. Dynamics of pleasure–displeasure at the limit of exercise tolerance: conceptualizing the sense of exertional physical fatigue as an affective response. Journal of Experimental Biology. 2019; 222(3): jeb186585 10.1242/jeb.186585 . [DOI] [PubMed] [Google Scholar]
  • 13.Hardy CJ, Rejeski WJ. Not what, but how one feels: The measurement of affect during exercise. Journal of Sport and Exercise Psychology. 1989; 11: 304–317. 10.1123/jsep.11.3.304 [DOI] [Google Scholar]
  • 14.Borg G. Psychophysical bases of perceived exertion. Medicine and Science in Sports and Exercise. 1982; 14(5): 377–381. [PubMed] [Google Scholar]
  • 15.Lieberman MD, Eisenberger NI, Crockett MJ, Tom SM, Pfeifer JH, Way BM. Putting feelings into words. Psychological Science. 2007;18: 421–428. 10.1111/j.1467-9280.2007.01916.x [DOI] [PubMed] [Google Scholar]
  • 16.Barrett LF, Adolphs R, Marsella S, Martinez AM, Pollak SD. Emotional expressions reconsidered: Challenges to inferring emotion from human facial movements. Psychological Science in the Public Interest. 2019; 20: 1–68. 10.1177/1529100619832930 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Overbeek TJ, van Boxtel A, Westerink JH. Respiratory sinus arrhythmia responses to cognitive tasks: effects of task factors and RSA indices. Biological Psychology. 2014; 99: 1–14. 10.1016/j.biopsycho.2014.02.006 [DOI] [PubMed] [Google Scholar]
  • 18.Stekelenburg JJ, van Boxtel A. Inhibition of pericranial muscle activity, respiration, and heart rate enhances auditory sensitivity. Psychophysiology. 2001; 38: 629–641. 10.1111/1469-8986.3840629 [DOI] [PubMed] [Google Scholar]
  • 19.Ekman P, Friesen WV. Facial Action Coding System. Palo Alto: Consulting Psychologists Press; 1978. [Google Scholar]
  • 20.Ekman P, Friesen WV, Hager JC. Facial action coding system: Manual and Investigator’s Guide. Salt Lake City: Research Nexus; 2002. [Google Scholar]
  • 21.Uchida MC, Carvalho R, Tessutti VD, Bacurau RFP, Coelho-Júnior HJ, Capelo LP, et al. Identification of muscle fatigue by tracking facial expressions. PLoS ONE. 2018; 13(12): e0208834 10.1371/journal.pone.0208834 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.de Morree HM, Marcora SM. Frowning muscle activity and perception of effort during constant-workload cycling. European Journal of Applied Physiology. 2012; 112(5): 1967–1972. 10.1007/s00421-011-2138-2 [DOI] [PubMed] [Google Scholar]
  • 23.de Morree HM, Marcora SM. The face of effort: frowning muscle activity reflects effort during a physical task. Biological Psychology. 2010; 85(3): 377–382. 10.1016/j.biopsycho.2010.08.009 . [DOI] [PubMed] [Google Scholar]
  • 24.Huang DH, Chou SW, Chen YL, Chiou WK. Frowning and jaw clenching muscle activity reflects the perception of effort during incremental workload cycling. Journal of Sports Science & Medicine. 2014; 13(4): 921–928. [PMC free article] [PubMed] [Google Scholar]
  • 25.Lien JJJ, Kanade T, Cohn JF, Li CC. Detection, tracking, and classification of action units in facial expression. Robotics and Autonomous Systems. 2000; 31(3): 131–146. 10.1016/S0921-8890(99)00103-7 [DOI] [Google Scholar]
  • 26.Cohn JF, Zlochower AJ, Lien J, Kanade T. Automated face analysis by feature point tracking has high concurrent validity with manual FACS coding. Psychophysiology. 1999; 36: 35–43. 10.1017/s0048577299971184 . [DOI] [PubMed] [Google Scholar]
  • 27.Miles KH, Clark B, Périard JD, Goecke R, Thompson KG. Facial feature tracking: a psychophysiological measure to assess exercise intensity? Journal of Sports Sciences. 2018; 36(8): 934–941. 10.1080/02640414.2017.1346275 . [DOI] [PubMed] [Google Scholar]
  • 28.McDuff D, Mahmoud A, Mavadati M, Amr M, Turcot J, Kaliouby RE. AFFDEX SDK: A cross-platform real-time multi-face expression recognition toolkit. CHI 2016: Conference Extended Abstracts on Human Factors in Computing Systems, 2016 May 07–12; San Jose, California: ACM; 2016. p. 3723–3726. [Google Scholar]
  • 29.Viola P, Jones MJ. Robust real-time face detection. International Journal of Computer Vision. 2004; 57(2): 137–154. 10.1023/B:VISI.0000013087.49260.fb [DOI] [Google Scholar]
  • 30.Affectiva. Emotion SDK. 2018 [cited 2 September 2019]. In: Product [Internet]. Available from: https://www.affectiva.com/product/emotion-sdk/
  • 31.Affectiva. Metrics. 2019 [cited 2 December 2019]. In: Developer [Internet]. Available from: https://developer.affectiva.com/metrics/.
  • 32.Trappe HJ, Löllgen H. Leitlinien zur Ergometrie. Herausgegeben vom Vorstand der Deutschen Gesellschaft für Kardiologie–Herz- und Kreislaufforschung. Z Kardiol. 2000; 89: 821–837. 10.1007/s003920070190 [DOI] [PubMed] [Google Scholar]
  • 33.Kulke L, Feyerabend D, Schacht A. Comparing the Affectiva iMotions Facial Expression Analysis Software with EMG for identifying facial expressions of emotion. PsyArXiv [preprint]. 2018. [cited 2019 Aug 20]. Available from: https://psyarxiv.com/6c58y/ [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Long JD. Longitudinal data analysis for the behavioral sciences using R. Thousand Oaks, CA: Sage; 2012. [Google Scholar]
  • 35.Bakdash JZ, Marusich LR. Repeated Measures Correlation. Frontiers in psychology. 2017; 8: 456 10.3389/fpsyg.2017.00456 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Singer JD, Willett JB. Applied longitudinal data analysis: Modeling change and event occurrence. New York: Oxford University Press; 2003. [Google Scholar]
  • 37.Pinheiro J, Bates D, DebRoy S, Sarkar D. nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1–139 [internet]. 2019. [cited 2019 Sep 2]. Available from: https://CRAN.R-project.org/package=nlme. [Google Scholar]
  • 38.R Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing [internet]. 2019. [cited 2019 Sep 2]. Available from: https://r-project.org [Google Scholar]
  • 39.Roecker K, Striege H, Dickhuth HH. Heart-rate recommendations: transfer between running and cycling exercise? International Journal of Sports Medicine. 2003; 24: 173–178. 10.1055/s-2003-39087 [DOI] [PubMed] [Google Scholar]
  • 40.Alvarez-Alvarado S, Chow GM, Gabana NT, Hickner RC, & Tenenbaum G. Interplay Between Workload and Functional Perceptual–Cognitive–Affective Responses: An Inclusive Model. Journal of Sport and Exercise Psychology. 2019; 41(2), 107–118. 10.1123/jsep.2018-0336 [DOI] [PubMed] [Google Scholar]
  • 41.Ekkekakis P, Hall EE, Petruzzello SJ. The relationship between exercise intensity and affective responses demystified: to crack the 40-year-old nut, replace the 40-year-old nutcracker!. Annals of Behavioral Medicine. 2008; 35(2): 136–149. 10.1007/s12160-008-9025-z . [DOI] [PubMed] [Google Scholar]
  • 42.Ganchrow JR, Steiner JE, Daher M. Neonatal facial expressions in response to different qualities and intensities of gustatory stimuli. Infant Behavior and Development. 1983; 6(4): 473–484. 10.1016/S0163-6383(83)90301-6 [DOI] [Google Scholar]
  • 43.Rozin P, Lowery L, Ebert R. Varieties of disgust faces and the structure of disgust. Journal of Personality and Social Psychology. 1994; 66(5): 870. 10.1037//0022-3514.66.5.870 . [DOI] [PubMed] [Google Scholar]
  • 44.Kunz M, Lautenbacher S, LeBlanc N, Rainville P. Are both the sensory and the affective dimensions of pain encoded in the face? Pain. 2012; 153(2): 350–358. 10.1016/j.pain.2011.10.027 . [DOI] [PubMed] [Google Scholar]
  • 45.Rainville P, Duncan GH, Price DD, Carrier B, Bushnell MC. Pain affect encoded in human anterior cingulate but not somatosensory cortex. Science. 1997; 277(5328): 968–971. 10.1126/science.277.5328.968 . [DOI] [PubMed] [Google Scholar]
  • 46.Vail AK, Grafsgaard JF, Boyer KE, Wiebe EN, Lester JC. Gender differences in facial expressions of affect during learning. In: UMAP 2016: Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization; 2016 July 13–17; Halifax, Nova Scotia. ACM; 2016. p. 65–73. [Google Scholar]
  • 47.Camras LA, Oster H, Bakeman R, Meng Z, Ujiie T, Campos JJ. Do infants show distinct negative facial expressions for fear and anger? Emotional expression in 11‐month‐old European American, Chinese, and Japanese infants. Infancy. 2007; 11(2): 131–155. 10.1111/j.1532-7078.2007.tb00219.x [DOI] [Google Scholar]
  • 48.Bennett DS, Bendersky M, Lewis M. Facial expressivity at 4 months: A context by expression analysis. Infancy. 2002; 3: 97–113. 10.1207/S15327078IN0301_5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.van Boxtel A. Facial EMG as a tool for inferring affective states. In: Spink A, Grieco F, Krips O, Loijens L, Noldus L, Zimmerman P. Proceedings of Measuring Behaviour 2010: 7th International Conference on Methods and Techniques in Behavioral Research; 2010 August 24–27; Eindhoven, Netherlands. Noldus; 2010. p. 104–108. [Google Scholar]
  • 50.Ekman P, Freisen WV, Ancoli S. Facial signs of emotional experience. Journal of personality and social psychology. 1980; 39(6): 1125–1134. 10.1037/h0077722 [DOI] [Google Scholar]
  • 51.Lang PJ, Greenwald MK, Bradley MM, Hamm AO. Looking at pictures: Affective, facial, visceral, and behavioral reactions. Psychophysiology. 1993; 30(3): 261–273. 10.1111/j.1469-8986.1993.tb03352.x . [DOI] [PubMed] [Google Scholar]
  • 52.Philippen PB, Bakker FC, Oudejans RR, Canal Bruland R. The effects of smiling and frowning on perceived affect and exertion while physically active. Journal of Sport Behavior. 2012; 35(3): 337–352. [Google Scholar]

Decision Letter 0

Dominic Micklewright

22 Nov 2019

PONE-D-19-26758

Affect and exertion during aerobic exercise: Examining changes using automated facial expression analysis and continuous experiential self-report.

PLOS ONE

Dear Mrs. Timme,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by Jan 06 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Dominic Micklewright, PhD CPsychol PFHEA FBASES FACSM

Academic Editor

PLOS ONE

Journal requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

* In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for inviting me to review ‘Affect and exertion during aerobic exercise: Examining changes using automated facial expression analysis and continuous experiential self-report.’ The primary aim of this study was to examine continuous changes in single facial actions of the whole face at various exercise intensities. The authors concluded that affective valence and perceived exertion can be captured using automated facial action analysis.

This study uses novel methods and technology to assess changes in facial actions during exercise. I commend the authors on doing this innovative project and obtaining substantial participant numbers. Overall, I found the paper quite hard to read due to superlative words and lack of information provided about the methods. At present, the work is not accessible to non-specialists because of this, and improvements need to be made to make it more comprehensible. I think this will also make the implications of this project more obvious to the reader.

I have provided specific line-by-line comments, as well as two general comments below:

- Please provide the raw data as supplementary material (not video but the numbers produced from the video analysis (i.e. percentage changes in different facial actions)).

- Please provide more detail about the methods used. It is a good opportunity to provide a methods section that others can use to do further research, and in a journal such as PLOS ONE I would expect some more detail on this. At present your methods could not be replicated by the reader in future research.

Abstract

- Page 2, Line 2-6 – This sentence is quite hard to read and long. Clarity of your point may be made better by reducing words or splitting into two sentences.

- Page 2, Line 21 – Not sure about using (e.g.) in brackets as part of a flowing sentence. Maybe find a better way to write this?

Introduction

- Page 4, Line 2-6 – Why have you chosen ventilatory threshold and respiratory compensation threshold for these? Also, both need a reference.

- Page 5, Line 9 – I am not sure I agree with the comment that biometric data is more unobtrusive. I think videoing someone’s face and facial analysis in general is just as obtrusive; it is just different than techniques researchers have traditionally used before. I think it is more powerful, but not more unobtrusive. Would like to see this statement rectified.

- Page 6, Line 7 – FEMG should be fEMG

- Page 6, Line 11 – I don’t think you need to add an abbreviation for facial feature tracking as FFT. It wasn’t done in the original paper, isn’t commonly abbreviated and you don’t talk about it much after this.

- Page 7, Line 18 – That first statement is quite strong, and I believe there are more than two processes that have been used. Maybe tone it down a bit and just say, “the most commonly used are…”

- Throughout introduction (and the whole paper), you use facial action, facial expression and facial configurations interchangeably. I would just choose one, in your case it seems to be facial actions (especially considering you define why you are using this, which is good), and be consistent throughout.

- Generally, I am not sure about the structure of the introduction. The reader must read through nearly eight pages of background information before you get to the project aim. I think that some of this background content is of minor consequence to the rest of the paper, quite repetitive and could be made significantly briefer. You want the reader to get straight into the sections that are really related to the content of your paper, like parts of 1.3 and all of 1.4. For example, the following sections don’t really relate strongly to the paper and could be removed or shortened significantly:

o Page 3, Section 1 – except for the definition of affect on lines 8-12

o Page 4, Section 1.1

o Page 6, Paragraph 4

o Page 7, Paragraph 1 and 2

Method and materials

- Page 10, Line 1 – the use of the term modern is unnecessary. It is just software.

- Page 10, Line 4 – what do you define as fit? Do you have any information on training history or sports played? Did you perform any baseline cycling assessments?

- Page 10, Section 2.2 – Can you report participant height and weight?

- Page 10, Line 7 – Why was the video quality poor?

- Page 10, Line 8 – What happened in the environment to cause this?

- Page 10, Line 10 – “after one minute of baseline measurement” Baseline measurement of what?

- Page 10, Line 10-12 - A reference for the power output selection would be good.

- Page 10, Line 13-15 – “If a participant reached 300 watts the final phase involved pedaling at this level for two minutes, thus the maximum duration of the exercise was 26 minutes.” Why did you do this? How many participants completed the final stage without reaching volitional exhaustion?

- Page 10, Line 17-19 – Why did only half of participants have heart rate monitors?

- Page 11, Section 2.3.3 – I think this would be a really good section for a diagram showing the points on the face or a technical schematic of how the software works.

- Page 11, Section 2.3.3 – I think it would be good to have at least two cameras to pick-up different face angles. Is there a reason why you didn’t implement this? Do you think this affected the data you obtained during higher intensities when the head can drop? I think it is also important to know the distance the camera was to the face, an idea of head to camera angle, if it was adjusted for different participants height on the bike etc.

- Page 12, Section 2.3.3 – I think you need to describe more about the time each video frame was analysed for (i.e. milliseconds, seconds, batched into bigger 10 second blocks). At present it is not clear exactly how this was done.

- Page 12, Section 2.3.3 – What is threshold analysis?

- Page 12, Section 2.4 – So the participants knew the purpose of the video camera? I have a bit of an issue with this as I think this substantially changes how a person reacts to the video camera. Please address this as a limitation in your discussion. Might also be worth adding a reference for this manipulation of facial action when the participant knows their expression is being monitored. For example, this reference: Philippen P, Bakker F, Oudejans R, Canal-Bruland R. The Effects of Smiling and Frowning on Perceived Affect and Exertion While Physically Active. J Sport Behav. 2012;35(3):337-352.

Results

- Page 14, Line 13 – You are introducing some commentary and interpretation into your results, which I would suggest removing or putting in your discussion

- Page 14, Lines 17-19 - Based upon the fact that HR was only recorded for half of the participants, who only reached a mean HR of 174.61, and that RPE was 19 with an SD over 1 (which suggests a large number of participants reported RPE well below maximum), I really don’t think you can say that for this age group they reached maximum capacity. Please alter statement.

- The image quality and clarity of Figure 1 is poor and does not convey the point that I can see you want to get across very well. Maybe think of an alternative way to represent this data?

Discussion and conclusion

- Page 23, Line 11 – you refer to the “aerobic exercise” here. This is the first time you refer to it as this, which is a bit odd. Maybe better to keep it consistent with what you said in the introduction, such as an incremental test.

- Page 23, Line 13 –you use e.g. mid-sentence again. Will flow better if written as for example or something similar in the sentence structure.

- Page 26, Section 4.6 – Generally, you don’t report any of the limitations that this study appears to have in your limitations section. Some of the main ones I have noted above are the fact participants knew the purpose of the video cameras so could change expressions, you didn’t obtain HR for all participants so cannot determine if they did actually reach the intensities you aimed for, and the final exercise intensity was absolute and fixed to 300W. Please rectify.

- Page 26, Lines 7-9 – This is a strong statement. I would just tone it down a bit…

- Page 26, Lines 10-11 – I like this section and agree with what you are saying, but I don’t feel the methods section you have provided in this paper would allow this research to be expanded upon by others. I think an improved methods section could be the strength of this paper

- General comment throughout discussion and conclusion – You discuss that mouth open and jaw drop are the “face of exertion”. I don’t disagree with this statement per se, and yes, it is shown in your results. However, I feel you need to more strongly note that as RPE increases, someone is likely to be breathing heavier, and therefore the jaw drops and mouth opens. At present you don’t really discuss this in any detail, which I feel is an oversight in the interpretation of your findings.

Reviewer #2: Review of manuscript PONE-D-19_26758 submitted to PLOS ONE

S. Timme, R. Brand. Affect and exertion during aerobic exercise: Examining changes using automated facial expression analysis and continuous experiential self-report.

In this study, facial expressions during aerobic exercise were recorded at fixed time intervals using an algorithm to detect specific facial actions and identify them as action units defined within the Facial Action Coding System (FACS). These expressions were related to subjective ratings of positive or negative affective state as well as ratings of perceived exertion. Covariations between facial expressions and subjective ratings were analyzed using multilevel regression or trend analysis, allowing investigation of these relationships at the level of individual participants.

The rationale behind this study is clear, the statistical analyses are sophisticated, and the manuscript is well written. I agree with the authors that a better insight into the exerciser's affective state and subjective feelings of exertion may contribute to stimulating people to engage in further pleasurable physical exercise. Although the results of this study seem clear, I have a few comments, questions, or (generally minor) concerns.

1. On pp. 6-7, and in the remainder of this manuscript, emphasis is laid on facial actions as indices of affective states or specific emotions. However, facial actions may also be related to perceptual, motivational, attentional, or cognitive processes (see, for example, studies mentioned in reference 58 of the current manuscript; see also Overbeek et al., 2014, Stekelenburg & van Boxtel, 2001). Although a lot of different facial actions were measured in the current study, it seems as if the authors a priori consider these actions as indices of emotional processes, whereas within FACS individual action units strictly do not refer to specific emotions.

2. Although the current software used for analyzing facial actions indeed detected elementary facial actions as indicated in Table 1, it is remarkable that one of these actions ("smile") does not represent an elementary action but a combination of actions (AU6, cheek raiser; AU12, lip corner puller). This combination is generally interpreted as signifying a smile. I find this confusing since emotions are strictly not measured in the current study. On p. 25, it is said that in this study smile was associated with a negative affective valence. In line 12 on this page, it is erroneously suggested that the detection of AU6 is synonymous with the occurrence of a smile.

3. Figure 2 illustrates relevant facial actions which were observed during the current physical exertion task. I find these examples somewhat confusing since for the reader it may be difficult to associate them with an aerobic exercise task. But particularly the illustration of jaw drop is confusing since this facial expression also depicts AU's 6 and 12 which are generally considered to represent happiness, suggesting that this person is overtly laughing. I have shown this picture to several colleagues asking them to indicate what they saw. They reported to see an overtly laughing person.

4. On p. 23, it is defended that nose wrinkle need not be specifically related to disgust and that it may also be indicative of other emotions. However, in this respect studies are mentioned which have been performed in infants. I am afraid that facial expressions of infants cannot directly be compared with those of adults.

5. Later on this page, it is concluded that mouth open and jaw drop are highly correlated with perceived exertion but that this does not agree with results from an EMG study which would suggest that perceived exertion during physical tasks is mainly linked with corrugator activity. This brings me to the general question whether discrepancies between different studies may (at least partially) be related to studying either aerobic or anaerobic exercise. This distinction is not really discussed in this manuscript. When suggesting on p. 26, third paragraph, that future studies should include a wider range of sports to assure a higher generalizability of the current results, I wonder whether types of anaerobic exercise shouldn't also be included.

Minor points

- P. 7, line 3: "action" > "actions"

- P. 9, line 10: "action" > "actions"

- P. 11, line 11: "Logitech HD Pro C920" > "Logitech HD Pro C920 webcam"

- P. 19, footnote to Table 2: "in less number of parameters" > "in a smaller number of parameters"

References

Overbeek, T.J.M., van Boxtel, A., & Westerink, J.H.D.M. (2014). Respiratory sinus arrhythmia responses to cognitive tasks: Effects of task factors and RSA indices. Biological Psychology, 99, 1-14.

Stekelenburg, J.J., & van Boxtel, A. (2001). Inhibition of pericranial muscle activity, respiration, and heart rate enhances auditory sensitivity. Psychophysiology, 38, 629-641.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Feb 11;15(2):e0228739. doi: 10.1371/journal.pone.0228739.r002

Author response to Decision Letter 0


8 Jan 2020

Reviewer 1

Reviewer #1: Thank you for inviting me to review ‘Affect and exertion during aerobic exercise: Examining changes using automated facial expression analysis and continuous experiential self-report.’ The primary aim of this study was to examine continuous changes in single facial actions of the whole face at various exercise intensities. The authors concluded that affective valence and perceived exertion can be captured using automated facial action analysis.

This study uses novel methods and technology to assess changes in facial actions during exercise. I commend the authors on doing this innovative project and obtaining substantial participant numbers. Overall, I found the paper quite hard to read due to superlative words and lack of information provided about the methods. At present, the work is not accessible to non-specialists because of this, and improvements need to be made to make it more comprehensible. I think this will also make the implications of this project more obvious to the reader.

Thank you very much for the overall positive evaluation and especially for your specific advice. We have tried to consider and revise all passages of the manuscript accordingly.

I have provided specific line-by-line comments, as well as two general comments below:

- Please provide the raw data as supplementary material (not video but the numbers produced from the video analysis (i.e. percentage changes in different facial actions)).

We made our data available on OSF (https://osf.io/z8rv7/) as requested by PLOS ONE.

- Please provide more detail about the methods used. It is a good opportunity to provide a methods section that others can use to do further research, and in a journal such as PLOS ONE I would expect some more detail on this. At present your methods could not be replicated by the reader in future research.

Many thanks for this advice. We tried to thoroughly revise the text to this effect (and present all changes made point by point below).

Abstract

Page 2, Line 2-6 – This sentence is quite hard to read and long. Clarity of your point may be made better by reducing words or splitting into two sentences.

We split the sentence into two.

Page 2, Line 21 – Not sure about using (e.g.) in brackets as part of a flowing sentence. Maybe find a better way to write this?

We corrected this.

Introduction

- Page 4, Line 2-6 – Why have you chosen ventilatory threshold and respiratory compensation threshold for these? Also, both need a reference.

We chose the ventilatory and respiratory compensation thresholds because the two are explicit aspects of dual-mode theory (Ekkekakis, 2003). Thank you for pointing out that this was not yet clear enough in the manuscript. We tried to improve this section by describing the theoretical claims concerning the ventilatory and respiratory compensation thresholds in more detail and added a reference (a review of the empirical evidence).

Page 3:

“Dual-mode theory [9] explains how feelings during exercise are moderated by exercise intensity. According to the theory and supported by evidence [10], the affective response to moderate intensity exercise (below ventilatory threshold; VT) is mostly positive, but affective responses to heavy intensity exercise (approaching the VT) are more variable”

- Page 5, Line 9 – I am not sure I agree with the comment that biometric data is more unobtrusive. I think videoing someone’s face and facial analysis in general is just as obtrusive; it is just different than techniques researchers have traditionally used before. I think it is more powerful, but not more unobtrusive. Would like to see this statement rectified.

We agree that videoing someone’s face can also be obtrusive in a more general sense. Therefore, we now specify what exactly we mean by “unobtrusive” (i.e., using this method, ongoing physical exercise is not interrupted, e.g., by asking questions). We have added a respective passage in the limitations section and revised all other passages in the manuscript where we had previously used the word “unobtrusive”.

e.g., Page 5:

“Monitoring changes in biometric data avoids these interruptions and can thereby provide an alternative way to learn about the feelings that occur during exercise.“

- Page 6, Line 7 – FEMG should be fEMG

We corrected this.

- Page 6, Line 11 – I don’t think you need to add an abbreviation for facial feature tracking as FFT. It wasn’t done in the original paper, isn’t commonly abbreviated and you don’t talk about it much after this.

We deleted the “FFT” throughout the manuscript and used the unabbreviated expression.

- Page 7, Line 18 – That first statement is quite strong, and I believe there are more than two processes that have been used. Maybe tone it down a bit and just say, “the most commonly used are…”

We toned the statement down by writing that “the majority of studies conducted so far has quantified facial actions by using…”

- Throughout introduction (and the whole paper), you use facial action, facial expression and facial configurations interchangeably. I would just choose one, in your case it seems to be facial actions (especially considering you define why you are using this, which is good), and be consistent throughout.

Thank you for pointing that out. We now consistently refer only to "facial actions" and no longer to “facial expression”.

- Generally, I am not sure about the structure of the introduction. The reader must read through nearly eight pages of background information before you get to the project aim. I think that some of this background content is of minor consequence to the rest of the paper, quite repetitive and could be made significantly briefer. You want the reader to get straight into the sections that are really related to the content of your paper, like parts of 1.3 and all of 1.4. For example, the following sections don’t really relate strongly to the paper and could be removed or shortened significantly:

o Page 3, Section 1 – except for the definition of affect on lines 8-12

o Page 4, Section 1.1

o Page 6, Paragraph 4

o Page 7, Paragraph 1 and 2

Many thanks for this advice. We agree. We have restructured the line of argument and completely revised the introductory part of the proposed article. In particular, we deleted all unnecessary and redundant passages (and devoted special attention to the text passages indicated by the reviewer above). We hope that the reviewer can follow how we have proceeded: we now try to guide the reader efficiently to the empirical part of our study while still tying in adequately with the psychological literature from that research area (exercise psychology, affective responses to exercise).

Method and materials

- Page 10, Line 1 – the use of the term modern is unnecessary. It is just software.

We removed the term “modern”.

- Page 10, Line 4 – what do you define as fit? Do you have any information on training history or sports played? Did you perform any baseline cycling assessments?

We no longer use the term "fit" and instead try to characterize the sample more accurately in other words. We added information about the self-reported weekly physical activity (e.g., “The group average of (self-reported) at least moderate physical activity was 337 minutes per week.”).

Unfortunately, no information beyond what is now reported in the manuscript is available to us, and we did not perform a baseline cycling assessment. Anticipating the concerns of the reviewer, we have added restrictive comments in different parts of the text (in the methods section and in the discussion section).

For example page 26:

“it would be advantageous to determine exercise intensity physiologically at the level of the individual participants in future studies (e.g., by the use of respiratory gas analysis in a pretest).”

- Page 10, Section 2.2 – Can you report participant height and weight?

Unfortunately, we did not measure height and weight of the participants.

- Page 10, Line 7 – Why was the video quality poor?

The video quality was poor when the face detection algorithm could not identify a face, thereby leading to a high number of missing values. Participants with more than 10% missing values were excluded from the analysis. We included this more detailed information now in the methods section.
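For illustration of this exclusion criterion, a minimal R sketch (this is not the code used in the study; the frame-level export and its column names are hypothetical):

library(dplyr)

# Hypothetical frame-level export: one row per video frame, NA where no face was detected
frames <- read.csv("affdex_frames.csv")

# Share of frames without a detected face, per participant
missing_share <- frames %>%
  group_by(participant) %>%
  summarise(pct_missing = mean(is.na(mouth_open)) * 100, .groups = "drop")

# Keep only participants with at most 10% missing frames
keep_ids <- missing_share$participant[missing_share$pct_missing <= 10]
frames_clean <- frames[frames$participant %in% keep_ids, ]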

- Page 10, Line 8 – What happened in the environment to cause this?

Other people, unrelated to the study, entered the room, and loud music from a gym next door disturbed a handful of measurements. We have now included these details in the method section.

Page 9:

(n = 6; more than 10% missing values because the software did not detect the face) or due to disturbing external circumstances (n = 7; people entering the room unexpectedly; loud music played in the gym).

- Page 10, Line 10 – “after one minute of baseline measurement” Baseline measurement of what?

Baseline measurement of heart rate. We have included this specification in the article now.

Page 9:

“A Shimmer3 ECG device with a sampling rate of 512 Hz was used for that. These participants started with a one-minute heart rate baseline measurement before the exercise.”

- Page 10, Line 10-12 - A reference for the power output selection would be good.

We selected this power output according to the recommendation of Trappe & Löllgen (2000). This reference is included in the text now.

- Page 10, Line 13-15 – “If a participant reached 300 watts the final phase involved pedaling at this level for two minutes, thus the maximum duration of the exercise was 26 minutes.” Why did you do this? How many participants completed the final stage without reaching volitional exhaustion?

Including an upper time limit was required by the ethics committee that finally approved this study, because the committee members thought that this would help to protect the physical safety of the participants. 11 participants reached the final stage of 300 watts, with n = 7 reporting an RPE of 20 and n = 4 an RPE of 19 at the end. According to their self-reports, all participants reached volitional exhaustion; but of course we cannot be sure that all of them (as hoped for) worked up to a state close to actual physical exhaustion. We have revised the manuscript in several places in order to describe this more carefully now. For example (page 26):

“it would be advantageous to determine exercise intensity physiologically at the level of the individual participants in future studies (e.g., by the use of respiratory gas analysis in a pretest). This would have given us more confidence as to whether the majority of our participants have actually reached a state close to physical exhaustion at the end of the exercise protocol.”

- Page 10, Line 17-19 – Why did only half of participants have heart rate monitors?

You are right that it could have been beneficial to record heart rate data for all participants. Unfortunately, we cannot change this anymore. We decided beforehand to record heart rate only as a control variable, to show that the exercise protocol elicited an increase in heart rate and that the heart rate data corresponded with the RPE ratings (r = .82). We now try to make this clearer in the respective passage in the manuscript.

“For a plausibility check of whether self-declared physical exhaustion would be at least close to the participants’ physiological state, heart rate during exercise was monitored in about half of the participants (n = 54). (...) The correlation between heart rate and RPE was very high, r = .82, p < .001.”
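The manuscript passage does not state how this heart rate–RPE correlation was computed. Purely for illustration, a repeated measures correlation of the kind implemented in the rmcorr package cited in the reference list could be obtained as follows; column names are hypothetical and this is not the authors' actual analysis code:

library(rmcorr)

# Hypothetical long-format data: one row per participant per watt level
d <- read.csv("exercise_long.csv")
d$participant <- factor(d$participant)

# Repeated measures correlation between heart rate and RPE across watt levels
hr_rpe <- rmcorr(participant = participant, measure1 = heart_rate,
                 measure2 = rpe, dataset = d)
print(hr_rpe)  # rmcorr coefficient with confidence interval and p value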

- Page 11, Section 2.3.3 – I think this would be a really good section for a diagram showing the points on the face or a technical schematic of how the software works.

We agree with the reviewer that a figure depicting Affectiva-landmarks on the face is helpful for a better understanding of how the software works. For this purpose, we replaced the pictures used in Figure 2 (now Figure 1), which now illustrates the algorithm’s identified and tracked landmarks as well as prototypical configurations of the facial actions “Nose Wrinkle”, “Jaw drop” and “Mouth Open” in an exercise-related context.

- Page 11, Section 2.3.3 – I think it would be good to have at least two cameras to pick-up different face angles. Is there a reason why you didn’t implement this? Do you think this affected the data you obtained during higher intensities when the head can drop? I think it is also important to know the distance the camera was to the face, an idea of head to camera angle, if it was adjusted for different participants height on the bike etc.

The face detection algorithm (Affectiva) as implemented in the iMotions biometric platform is technically restricted (but, by the same token, optimized) to process signals from only one video source (camera). We agree with the reviewer that two cameras might lead to even better results. However, our study illustrates that using only one camera works sufficiently well: only 6 participants had to be excluded from our study data set due to missing values (i.e., suboptimal face recognition and facial action analysis, according to the algorithm’s error reports).

We agree that some more information about the specific setup would be beneficial and added this specification to the methods section (p. 10):

“The camera was mounted 0.4 m in front of the face at an angle of 20 degrees from below, as recommended by iMotions.”

- Page 12, Section 2.3.3 – I think you need to describe more about the time each video frame was analysed for (i.e. milliseconds, seconds, batched into bigger 10 second blocks). At present it is not clear exactly how this was done.

We have revised section 2.3.3 and described more about the time each video was analyzed for.

“The algorithm performs analysis and classification at frame rate. This means that at a time resolution of 30 picture frames per video second (30 fps), our analyses were based on 1,800 data points per facial action per minute.”

Frame-by-frame raw data was aggregated by using threshold analysis, which is described in more detail in response to the next question.

- Page 12, Section 2.3.3 – What is threshold analysis?

Thank you for pointing out that this point had not been sufficiently clear in the manuscript so far. iMotions identifies, tracks and analyzes facial landmarks continuously at picture/video frame-level (i.e., based on this frame by frame analysis iMotions provides facial action scores by threshold analysis). We tried to improve the respective sections (on the whole process and threshold analysis specifically) in the manuscript as follows:

“Each data point that Affectiva Affdex provides for a facial action is the probability of presence (0–100%) of that facial action. We aggregated these raw data, for each facial action separately, into facial action scores (time-percent scores) indicating how long during a watt level on the ergometer (i.e., within 2 minutes) a facial action was detected with a value of 10 or higher. For example, a facial action score of 0 indicates that the facial action was not present during the watt level, whereas a score of 100 indicates that it was present all the time during that watt level.”
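For illustration, a minimal R sketch of this aggregation step (not the authors' actual code; the data frame and column names are hypothetical, continuing the frame-level export assumed above):

library(dplyr)

# Hypothetical frame-level Affdex output: probability of presence (0-100) per facial action
frames <- read.csv("affdex_frames.csv")

# Time-percent score per participant and watt level: share of frames in which
# the action probability was 10 or higher
action_scores <- frames %>%
  group_by(participant, watt_level) %>%
  summarise(
    mouth_open_score   = mean(mouth_open   >= 10, na.rm = TRUE) * 100,
    jaw_drop_score     = mean(jaw_drop     >= 10, na.rm = TRUE) * 100,
    nose_wrinkle_score = mean(nose_wrinkle >= 10, na.rm = TRUE) * 100,
    .groups = "drop"
  )
# 0 = action never detected during the two-minute watt level; 100 = detected in every frame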

- Page 12, Section 2.4 – So the participants knew the purpose of the video camera? I have a bit of an issue with this as I think this substantially changes how a person reacts to the video camera. Please address this as a limitation in your discussion. Might also be worth adding a reference for this manipulation of facial action when the participant knows their expression is being monitored. For example, this reference: Philippen P, Bakker F, Oudejans R, Canal-Bruland R. The Effects of Smiling and Frowning on Perceived Affect and Exertion While Physically Active. J Sport Behav. 2012;35(3):337-352.

For ethical reasons, the participants were informed that their faces would be filmed during the study. After the experiment, we asked them whether the camera had affected their behavior. Participants responded to this question on a scale from 1 (“do not agree at all”) to 5 (“fully agree”) with a mean of 1.57 (SD = 0.69), indicating that the camera did not influence their behavior substantially. We did not include this information in the manuscript because the participants could still have been biased by the presence of a camera. Therefore, of course, we agree with the reviewer’s comment that video observation can sometimes influence behavior without the participants noticing it themselves.

In order to address the reviewer’s comment we have added the following sentence with the mentioned reference in the limitations section of the manuscript and referenced the suggested study (thank you for bringing it to our attention):

Page 25:

“It is important not to lose sight of the fact, however, that simply knowing that you are being filmed can of course also change your behavior [52]. “

Results

- Page 14, Line 13 – You are introducing some commentary and interpretation into your results, which I would suggest removing or putting in your discussion

We agree with the reviewer that the interpretation of study results must be part of the discussion. We decided to delete this interpretation (that differences in the reached intensity stage were probably due to differences in physical fitness) from the results section.

- Page 14, Lines 17-19 - Based upon the fact that HR was only recorded for half participants and they only reached a mean HR of 174.61 and that RPE was 19 with an SD over 1 which suggests a large number of participants reported RPE well below maximum, I really don’t think you can say that for this age group they reached maximum capacity. Please alter statement.

It is indeed possible that not all of the participants reached maximum capacity, and we have therefore included a note on this in the limitations section (p. 26). However, with respect to previous findings, we still believe that our data and findings are plausible: for example, Roecker, Striegel and Dickhuth (2003) found an average HRmax of 179.5 ± 20.2 for 129 recreational sports participants, which is close to our reported average HRmax of 174.6 ± 16.1. Importantly, subjects usually reach lower maximal heart rates on cycling ergometers than when they are asked to run on a treadmill. We have included this reference in the manuscript.

- The image quality and clarity of Figure 1 is poor and does not convey the point that I can see you want to get across very well. Maybe think of an alternative way to represent this data?

We agree that the image quality and clarity of Figure 1 (now Figure 2) is poor in the submission pdf. PLOS One informed us that the compiled submission PDF only includes low-resolution preview images, to allow you to download the entire submission as quickly as possible. By clicking on the link at the top of each preview page, it is possible to download the high-resolution version of each figure. We submitted our figures according to journal requirements and double-checked with the PACE-tool (as suggested on the journal homepage).

We prefer to keep Figure 1 (now Figure 2) in the manuscript. Its purpose is to illustrate the considerable intra- and inter-individual variability in the participants’ affective responses to increasing exercise intensity. Including this figure underpins the (in our view) necessity of using multilevel regression models for analyzing data of this kind (previous studies had used standard regression analysis that ignores individual differences by aggregating individual slopes). In order to strengthen this point, we included the following passage in the results section with regard to Figure 2 (p. 14):

Fig 2 illustrates the finding, which can be made particularly obvious by means of multilevel regression analysis: The high interindividual variability in the decrease of affective valence (more negative) under increasing perceived exhaustion is striking.
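To make the modelling idea behind this argument concrete, a minimal sketch of such a random-slope model in nlme (the package cited in the reference list) is given below; variable names are hypothetical and this is not the authors' actual model specification:

library(nlme)

# Hypothetical long-format data: affective valence per participant and intensity stage
d <- read.csv("exercise_long.csv")

# Quadratic fixed-effect trend with participant-specific intercepts and linear slopes
m <- lme(fixed  = valence ~ intensity + I(intensity^2),
         random = ~ intensity | participant,
         data   = d,
         method = "ML")
summary(m)  # the random-slope variance reflects interindividual variability in the decline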

Discussion and conclusion

- Page 23, Line 11 – you refer to the “aerobic exercise” here. This I the first time you refer to it as this, which is a bit odd. Maybe better to keep it consistent to what you said in the introduction, such as an incremental test.

We corrected this term according to your recommendation as “incremental exercise test”.

- Page 23, Line 13 –you use e.g. mid-sentence again. Will flow better if written as for example or something similar in the sentence structure.

We corrected this and used “for example” instead.

- Page 26, Section 4.6 – Generally, you don’t report any of the limitations that this study appears to have in your limitations section. Some of the main ones I have noted above are the fact participants knew the purpose of the video cameras so could change expressions, you didn’t obtain HR for all participants so cannot determine if they did actually reach the intensities you aimed for, and the final exercise intensity was absolute and fixed to 300W. Please rectify.

Thank you for pointing that out. We have thoroughly revised the limitations section and included the points mentioned by the reviewer (p. 26; the influence of knowing to be filmed):

“It is important not to lose sight of the fact, however, that simply knowing that you are being filmed can of course change your behavior [52].

Limitation in statement about exercise intensity:

“It would be advantageous to determine exercise intensity physiologically at the level of the individual participants in future studies (e.g., by the use of respiratory gas analysis in a pretest). This would have given us more confidence as to whether the majority of our participants have actually reached a state close to physical exhaustion at the end of the exercise protocol.”

- Page 26, Lines 7-9 – This is a strong statement. I would just tone it down a bit…

We removed this strong statement.

- Page 26, Lines 10-11 – I like this section and agree with what you are saying, but I don’t feel the methods section you have provided in this paper would allow this research to be expanded upon by others. I think an improved methods section could be the strength of this paper

Thank you for expressing your appreciation. We tried to improve the methods section according to your comments and hope that it now allows other researchers better to replicate our work and findings (or build up on them).

- General comment throughout discussion and conclusion – You discuss that mouth open and jaw drop are the “face of exertion”. I don’t disagree with this statement per se, and yes, it is shown in your results. However, I feel you need to more strongly note that as RPE increases, someone is likely to be breathing heavier, and therefore the jaw drops and mouth opens. At present you don’t really discuss this in any detail, which I feel is an oversight in the interpretation of your findings.

Thank you for this valuable addition to a more adequate interpretation of our findings. We emphasized this issue in greater detail now on p. 24:

“Both the metabolic thresholds, VT and RCP, are related to perceived exertion. They are objective, individualized metabolic indicators of intensity, and are already associated with psychological transitions in dual-mode theory [9]. Linking them to transitions in facial actions could be a future prospect and might look something like this: exercising at the VT might mark the transition from nose to (predominantly) mouth breathing and thus also the transition to more mouth open, whereas exercising above the VT might mark a transition to more jaw drop. This kind of intensified breathing might covary with escalating negative affective valence – that is, the evolutionarily built-in warning signal that homeostatic perturbation is precarious and that behavioral adaptation (reduction of physical strain) is necessary [12].”

Reviewer 2

Reviewer #2: Review of manuscript PONE-D-19_26758 submitted to PLOS ONE

In this study, facial expressions during aerobic exercise were recorded at fixed time intervals using an algorithm to detect specific facial actions and identify them as action units defined within the Facial Action Coding System (FACS). These expressions were related to subjective ratings of positive or negative affective state as well as ratings of perceived exertion. Covariations between facial expressions and subjective ratings were analyzed using multilevel regression or trend analysis, allowing investigation of these relationships at the level of individual participants.

The rationale behind this study is clear, the statistical analyses are sophisticated, and the manuscript is well written. I agree with the authors that a better insight into the exerciser's affective state and subjective feelings of exertion may contribute to stimulating people to engage in further pleasurable physical exercise. Although the results of this study seem clear, I have a few comments, questions, or (generally minor) concerns.

Thank you for sharing your thoughts with us! We have tried to take up all points raised by the reviewer and to improve the manuscript accordingly. Before we start discussing the individual points, please let us politely correct one aspect of the summary above, because we believe that some of the reviewer's comments below will be easier to answer after having done so:

Facial actions during exercise were recorded continuously, and Affectiva’s Affdex algorithm was used to analyze them. This algorithm statistically compares observed changes with data from a normative database containing information from more than 6 million faces of participants from 75 countries. The software records and continuously analyzes the configuration of 34 facial landmarks, with a resolution of 30 frames per second (fps). The Affdex algorithm returns, at frame rate and in real time, a sequential stream of probabilistic scores (0–100%) that indicate the likelihood of occurrence of defined facial actions, and stores them for later analysis (Affectiva, 2018). Importantly, the Affdex algorithm involves elements of artificial intelligence (nonparametric machine learning), so that more detailed information about the transformation of facial action data into scores for facial expression is not available. This is a major difference from the use of FACS, which is based on parametrically defined coding rules.

In fact, the Affectiva algorithm was initially trained with data based on human FACS coding. As a result of the algorithm’s training, however, Affectiva deviates from FACS in a few aspects (e.g., the Affectiva “smile” score is formed from a combination of FACS AU 6 + AU 12). A detailed description of the facial actions provided by Affectiva can be found at https://developer.affectiva.com/metrics/. We have now included this reference in the relevant passages throughout the manuscript.

1. On pp. 6-7, and in the remainder of this manuscript, emphasis is laid on facial actions as indices of affective states or specific emotions. However, facial actions may also be related to perceptual, motivational, attentional, or cognitive processes (see, for example, studies mentioned in reference 58 of the current manuscript; see also Overbeek et al., 2014, Stekelenburg & van Boxtel, 2001). Although in the current study a lot of different facial actions were measured it seems as if the authors a priori consider these actions as indices of emotional processes whereas within FACS individual actions units strictly do not refer to specific emotions.

Thank you for this valuable remark. We agree that facial actions can be related to different processes and not just specific emotions. Therefore, we have incorporated the notion that facial actions may also be related to perceptual, motivational, attentional, or cognitive processes into the manuscript, together with your suggested references.

For example, see p. 5:

“However, it cannot universally be assumed that observed facial movements always reflect (i.e., are expressive of) an inner state [16]. Facial actions can also be related to perceptual, social, attentional, or cognitive processes [17, 18].”

2. Although the current software used for analyzing facial actions indeed detected elementary facial actions as indicated in Table 1, it is remarkable that one of these actions ("smile") does not represent an elementary action but a combination of actions (AU6, cheek raiser; AU12, lip corner puller). This combination is generally interpreted as signifying a smile. I find this confusing since emotions are strictly not measured in the current study. On p. 25, it is said that in this study smile was associated with a negative affective valence. In line 12 on this page, it is erroneously suggested that the detection of AU6 is synonymous with the occurrence of a smile.

We agree with your comment that the different labels in FACS and in the Affectiva scores are confusing and that our presentation of this was confusing in the first version of the manuscript. We have now tried to improve this (i.e., make the difference between FACS and Affectiva even clearer in the manuscript) and to avoid further confusion.

In conclusion, we think that we have to stick to the nomenclature and classification system in Affectiva (https://developer.affectiva.com/metrics/) because only this terminological system was used in this study. That is why we continue to use the labels (e.g. “smile”) provided by iMotions Affectiva so that further researchers working with the same system (Affectiva) can relate their findings to ours.

Last but not least, thank you for drawing to our attention that we erroneously wrote that smile is solely the occurrence of AU 6. We corrected this on p. 25.

3. Figure 2 illustrates relevant facial actions which were observed during the current physical exertion task. I find these examples somewhat confusing since for the reader it may be difficult to associate them with an aerobic exercise task. But particularly the illustration of jaw drop is confusing since this facial expression also depicts AU's 6 and 12 which are generally considered to represent happiness, suggesting that this person is overtly laughing. I have shown this picture to several colleagues asking them to indicate what they saw. They reported to see an overtly laughing person.

Thank you for bringing this to our attention! We agree that these pictures can be confusing. We have deleted them and included context-specific material now (see Figure 1), i.e. example pictures during exercise, which should allow a much better illustration of the points that are important to us (showing high Affectiva-scores for “Nose Wrinkle” and “Jaw Drop” in an exercise context).

4. On p. 23, it is defended that nose wrinkle need not be specifically related to disgust and that it may also be indicative of other emotions. However, in this respect studies are mentioned which have been performed in infants. I am afraid that facial expressions of infants cannot directly be compared with those of adults.

Thank you for mentioning this important point. We can well understand your concerns that studies with infants should not be generalized to our participants. Instead, in the manuscript we now focus on previous studies with adults and refer to the association between “nose wrinkle” and “pain” discussed there (references 54, 55). We have edited the respective passage as follows:

“Nose wrinkle has also been specifically associated with the emotion disgust [15]. However, the same facial action has been observed in various situations (e.g. while learning) [46] and different emotional states (e.g., anger) [48].”

5. Later on this page, it is concluded that mouth open and jaw drop are highly correlated with perceived exertion but that this does not agree with results from an EMG study which would suggest that perceived exertion during physical tasks is mainly linked with corrugator activity. This brings me to the general question whether discrepancies between different studies may (at least partially) be related to studying either aerobic or anaerobic exercise. This distinction is not really discussed in this manuscript. When suggesting on p. 26, third paragraph, that future studies should include a wider range of sports to assure a higher generalizability of the current results, I wonder whether types of anaerobic exercise shouldn't also be included.

Thank you for that point. We focused mainly on cardiovascular load in an incremental endurance test to study the affective response against the background of dual-mode theory, and we have now tried to explain that better in the manuscript. We agree with the reviewer that different forms of exercise could be associated with different facial action responses and that further research should address this. We have now written in the manuscript (p. 26):

“Future studies should extend the use of automated facial action analysis to a wider range of participants and sports to assure higher generalizability of the findings reported here. Different modalities and different exercise intensities might produce specific facial actions. More heterogeneous samples are likely to produce more variance in affective responses, which may lead to further insight into the variation in facial reactions to exercise.

The current study is limited in drawing conclusions regarding differences in the affective response to different intensity domains as exercise intensity was not measured physiologically.”

Minor points

- P. 7, line 3: "action" > "actions"

- P. 9, line 10: "action" > "actions"

- P. 11, line 11: "Logitech HD Pro C920" > "Logitech HD Pro C920 webcam"

- P. 19, footnote to Table 2: "in less number of parameters" > "in a smaller number of parameters"

Thank you for pointing these out; we have corrected all of them in the manuscript.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Dominic Micklewright

23 Jan 2020

Affect and exertion during incremental physical exercise: Examining changes using automated facial action analysis and experiential self-report.

PONE-D-19-26758R1

Dear Dr. Timme,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Dominic Micklewright, PhD CPsychol PFHEA FBASES FACSM

Academic Editor

PLOS ONE

Acceptance letter

Dominic Micklewright

28 Jan 2020

PONE-D-19-26758R1

Affect and exertion during incremental physical exercise: Examining changes using automated facial action analysis and experiential self-report.

Dear Dr. Timme:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Professor Dominic Micklewright

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Attachment

Submitted filename: Response to Reviewers.docx

Data Availability Statement

The data underlying the results presented in the study are available from: https://osf.io/z8rv7/

