Abstract
In competitive and cooperative scenarios, task difficulty should be dynamically adapted to suit people with different abilities. State-of-the-art difficulty adaptation methods for such scenarios are based on task performance, which conveys little information about user-specific factors such as workload. Thus, we present an exploratory study of automated affect recognition and task difficulty adaptation in a competitive scenario based on physiological linkage (covariation of participants’ physiological responses). Classification algorithms were developed in an open-loop study where 16 pairs played a competitive game while 5 physiological responses were measured: respiration, skin conductance, electrocardiogram, and 2 facial electromyograms. Physiological and performance data were used to classify four self-reported variables (enjoyment, valence, arousal, perceived difficulty) into two or three classes. The highest classification accuracies were obtained for perceived difficulty: 84.3% for two-class and 60.5% for three-class classification. As a proof of concept, the developed classifiers were used in a small closed-loop study to dynamically adapt game difficulty. While this closed-loop study found no clear advantages of physiology-based adaptation, it demonstrated the technical feasibility of such real-time adaptation. In the long term, physiology-based task adaptation could enhance competition and cooperation in many multi-user settings (e.g., education, manufacturing, exercise).
Keywords: Affective computing, competition, physiological measurements, physiological linkage, pattern recognition, dynamic difficulty adaptation
1. Introduction
Competition and cooperation between multiple interacting humans are popular in many areas of human-machine interaction. Perhaps most famously, computer games tend to include a multiplayer mode that lets several players compete or cooperate with each other (Chanel, Kivikangas, & Ravaja, 2012). However, there are also many serious examples of competitive and cooperative human-machine interaction scenarios. For example, students can cooperate or compete in educational scenarios to learn different topics (N. Zhou et al., 2020), patients in technology-assisted language therapy can work with each other to relearn words (Grechuta et al., 2016), and stroke survivors in technology-assisted motor rehabilitation (Baur et al., 2018; Goršič, Cikajlo, & Novak, 2017) as well as overweight adults in technology-assisted weight loss programs (Esakia, McCrickard, Harden, Horning, & Ramalingam, 2020) can compete or cooperate with each other for increased motivation and exercise intensity. While such multi-user scenarios have many potential benefits, they also present new challenges. One challenge is that such scenarios often include two users with different skills and abilities – for example, a severely impaired patient working with a mildly impaired patient. In such cases, the task difficulty should be intelligently balanced so that both users remain engaged by the task, usefully contribute to it, and derive maximum benefit from it. But how can this adaptation be performed most effectively?
In single-user scenarios, difficulty adaptation is usually performed based on either task performance or physiological responses. Performance is a task-specific concept – for example, the score in a computer game or the number of successfully completed actions in a rehabilitation exercise. It can be easily measured and can serve as the basis for simple, intuitive adaptation rules (e.g., if score is high, make the game harder). However, although performance is a good indicator of a person’s capabilities, it does not provide precise information regarding their internal, subjective state. For example, a person can exhibit acceptable performance but at the cost of high workload that may lead to stress and fatigue (Watson, Ntuen, & Park, 1996). Therefore, many studies have instead focused on physiological measurements such as the electrocardiogram (ECG), respiration and skin conductance, which provide an estimate of the person’s cognitive and affective (emotional) state, allowing difficulty to be adapted in a more personalized manner (e.g., if player workload is high, reduce game difficulty) (Novak, Mihelj, & Munih, 2012). Assessment of a user’s cognitive and affective states from physiological responses and scenario adaptation in response to these user states falls under the field of affective computing (Picard, Vyzas, & Healey, 2001) and is used for diverse purposes such as attention and workload recognition in drivers (Fan, Wade, Key, Warren, & Sarkar, 2018), computer game difficulty adaptation (C. Liu, Agrawal, Sarkar, & Chen, 2009), and adaptive automation in flight (Wilson & Russell, 2007). Studies of single-user scenarios have shown that physiology-based adaptation often outperforms performance-based adaptation (Bian et al., 2019; C. Liu et al., 2009; Xu et al., 2018); thus, physiology-based adaptation might also be promising for multi-user scenarios.
To date, task difficulty adaptation in competitive and cooperative scenarios has been almost exclusively based on task performance (Baldwin, Johnson, Wyeth, & Sweetser, 2013; Chih-Yueh, Shao-Pin, & Zhi-Hong, 2013; Goršič, Darzi, & Novak, 2017; Vicencio-Moreira, Mandryk, & Gutwin, 2014) rather than physiological measurements. However, given the demonstrated advantages of physiology-based adaptation over performance-based adaptation in single-user scenarios, such physiology-based adaptation should be investigated in multi-user scenarios as well. In our previous study, we took the first step by preliminarily evaluating a physiology-based adaptation method for a competitive scenario, but that method only adapted difficulty to suit one person based on that person’s physiological responses and ignored the other person (Darzi, Goršič, & Novak, 2017). Nonetheless, it demonstrated the feasibility of physiology-based difficulty adaptation for multi-user scenarios. The goal of the current study was thus to adapt the difficulty of a competitive scenario to suit both participants based on their physiological responses.
In a multi-user scenario, we could study each participant’s physiological responses individually and use this as a basis for difficulty adaptation, as done in our previous study. However, we can also study the similarity (degree of synchronization or mutual variation) of the participants’ physiological responses, thus obtaining information related specifically to the interaction between participants. This is referred to as physiological linkage (or synchronization), and is not simply due to participants perceiving the same stimuli (Haataja, Malmberg, & Järvelä, 2018; Pérez, Carreiras, & Duñabeitia, 2017). The degree of linkage increases with the amount of cooperation (Ahonen et al., 2016; Hu et al., 2018; Szymanski et al., 2017), the intensity of competition (T. Liu, Saito, Lin, & Saito, 2017; Spapé et al., 2013) or simply the degree of shared attention (Chênes, Chanel, Soleymani, & Pun, 2013; Muszynski, Kostoulas, Lombardo, Pun, & Chanel, 2018); thus, it may be able to provide additional information that would complement information obtained from individual physiological responses, as suggested by previous studies (Chanel et al., 2016). However, despite the unique potential of physiological linkage, it has never been used for task difficulty adaptation. Most applications have been limited to group-level correlational studies (Delaherche et al., 2012; Reidsma, Nijholt, Tschacher, & Ramseyer, 2010; Tschacher, Rees, & Ramseyer, 2014); one research group did perform automated classification of movie highlights based on spectators’ physiological linkage, but this is a fundamentally different application (Chênes et al., 2013; Muszynski et al., 2018).
This paper, to the best of our knowledge, thus presents the first use of two participants’ physiological responses for automated affect classification and consequent dynamic difficulty adaptation in a competitive scenario. Each participant’s individual physiological responses, different metrics of physiological linkage, and task performance were measured during different competitive game conditions and used to train affect classification algorithms. Different performance- and physiology-based classifiers were then compared with regard to both offline accuracy and their effect on user experience in a closed-loop real-time difficulty adaptation study. Our plan was originally to obtain a large sample of participants for the closed-loop study and thoroughly evaluate the effectiveness of such closed-loop adaptation; however, due to the COVID-19 pandemic, data collection had to be terminated early, and the study is thus presented as an exploratory proof-of-concept. Our research questions were:
RQ1: How accurately can physiological measurements classify human cognitive and affective states in a competitive scenario? This has been previously extensively explored in single-user scenarios, where physiological responses are most commonly classified into either two classes (e.g., low/high enjoyment) or three classes (e.g., low/medium/high stress) (Novak et al., 2012), but not in competitive scenarios. As a baseline for qualitative comparison, the same classification was also done with a few simple performance measurements, which are much easier to obtain and analyze.
RQ2: Does the addition of physiological linkage information allow more accurate affect classification than using only individual physiological responses? Physiological linkage calculation requires more signal processing than using only individual responses, as both participants’ physiological responses must be time-synchronized and analyzed together. Thus, adding it is useful only if it results in higher classification accuracy.
RQ3: (Preliminarily) Does physiology-based difficulty adaptation result in a positive user experience? High offline classification accuracy is not guaranteed to transfer to accurate real-time difficulty adaptation (Fairclough, Karran, & Gilleade, 2015; McCrea, Geršak, & Novak, 2017). For example, there is more potential for erroneous adaptation decisions since artefacts that can be easily manually removed in offline processing cannot be quickly removed during gameplay itself; at the same time, users may be able to compensate for erroneous decisions made by the system by adapting their own behavior (Fairclough & Lotte, 2020). Our original goal was to compare the effectiveness of physiology-based adaptation to that of performance-based adaptation similarly to what has previously been done in single-user studies (Bian et al., 2019; C. Liu et al., 2009; Xu et al., 2018); however, as data collection was interrupted by COVID-19, we have limited ourselves to demonstrating the technical feasibility of providing a positive user experience using physiology-based adaptation. Performance-based adaptation was included as a baseline for qualitative comparison.
A preliminary version of this study was published as a 2019 conference paper (Darzi & Novak, 2019). It used the same study setup (hardware and competitive game) as the current paper. The differences between this paper and the preliminary conference version are as follows:
The conference version included no closed-loop adaptation, though such adaptation was mentioned briefly as a future step. The current version includes closed-loop adaptation based on individual physiological responses and physiological linkage, with simple performance-based adaptation also included as a baseline for comparison.
The conference version included only preliminary open-loop classification (two classes, only a subset of the classifiers and features used in this paper). The current version includes both two- and three-class classification with a larger set of classifiers and features. Furthermore, the current version describes the most informative features in classification.
The conference version included more extensive analysis of group-level differences using analyses of variance to verify that different game difficulty levels induce different physiological responses. This has been omitted from the current paper, as we wished to focus more on classification and adaptation.
The conference version included a somewhat smaller open-loop participant sample (12 pairs vs. 16 in the current paper).
2. Methodology
This paper describes two related studies: an open-loop study used to train affect classification algorithms for a competitive scenario based on two participants’ physiological responses, and a closed-loop study where difficulty adaptation based on this affect classification was compared to performance-based adaptation. Both studies were approved by the University of Wyoming Institutional Review Board. Section 2.1 presents the study hardware and setup, which was identical for both studies. Section 2.2 presents the open-loop study protocol. Section 2.3 presents the individual physiological features extracted from each participant’s raw signals (i.e., without considering the other participant) while section 2.4 presents the physiological linkage features extracted from both participants’ time-synchronized signals. Section 2.5 then presents the methods for affect classification into two or three classes using the extracted features. Finally, section 2.6 presents the closed-loop study protocol.
2.1. Study Setup
The competitive scenario was a 2-player game previously used by our group (Goršič, Cikajlo, et al., 2017) (Figure 1). It is a computer-based Pong game displayed on a standard computer screen and controlled with two joysticks (3D Pro, Logitech). The screen shows a two-dimensional board with a puck and two paddles (one at the top, one at the bottom). Each player controls one paddle and can move it left and right (but not up or down) with their joystick. The puck moves around the screen with a constant velocity, and each player must intercept it with their paddle so that it does not reach the top or bottom of the board. If the puck passes a player’s paddle and reaches the top or bottom of the board, the other player scores a point and the puck instantly moves in front of a random player’s paddle, where it remains stationary for a second before moving toward the other player’s side of the board. The game difficulty can be changed by increasing or decreasing the speed of the puck (referred to as ball speed in the rest of the paper) and the paddle sizes.
Figure 1. The study setup for both the open-loop and closed-loop studies. Figure reprinted from our prior publication (Darzi & Novak, 2019) with permission.
Two g.USBamp biosignal amplifiers and associated sensors (g.tec Medical Engineering GmbH, Austria) were used to record five physiological signals for each participant (Figure 1, left). The electrocardiogram (ECG) was recorded using four electrodes placed as recommended by the g.USBamp manufacturer (two on the chest, one on the back, one on the abdomen). Respiration was recorded with a thermistor-based flow sensor (g.tec) in front of the nose and mouth. Skin conductance was recorded with two electrodes (g.GSRsensor2, g.tec) attached to the fingertips of the second and third fingers of the non-dominant hand (Figure 1, bottom right). Two facial electromyograms (FEMG) were recorded using 5 electrodes: two on the zygomaticus major muscle to measure smiling, two on the corrugator supercilii to measure frowning, and a ground electrode on the forehead (Figure 1, top right).
2.2. Open-loop Study Protocol
The open-loop study involved 16 pairs of healthy university students (mean age 26.5 years; 3 female-female pairs, 7 mixed-gender pairs, 6 male-male pairs; 2 individuals self-reported as left-handed), with each pair participating in a single session. Participants were asked to volunteer for the study in self-selected pairs, and thus already knew each other. The study purpose and procedure were explained at the start of the session, and participants then signed an informed consent form. The sensors were attached, and physiological responses were measured for a 3-minute baseline period, during which participants were instructed to relax without doing anything. The main part of the experiment then consisted of six conditions, each three minutes long (Figure 2). The six conditions included two single-player conditions (one for each player), one fair and slow competitive condition, one fair and fast competitive condition, and two unfair conditions. In single-player conditions, ball speed and paddle sizes were set to a medium setting (used in our previous single-player study (Goršič, Cikajlo, et al., 2017) to provide a moderate challenge for the average participant), and one participant played against a computer-controlled opponent while the other participant watched. In both fair competitive conditions, both paddle sizes were also set to medium. In the ‘fair and slow’ condition, ball speed was 70% of the medium value; in the ‘fair and fast’ condition, it was 130% of the medium value. In the unfair conditions, ball speed was set to medium, but one participant played with a small paddle while the other played with a large paddle (50% or 150% of the medium paddle width). The two unfair conditions differed according to which participant played with the small paddle. The conditions were played in random order.
Figure 2. The protocol of the open-loop study. Cond = condition. Figure reprinted from our prior publication (Darzi & Novak, 2019) with permission.
Five physiological signals (ECG, respiration, skin conductance, 2 FEMGs) were recorded during all 6 conditions, and participants were told not to speak in order to reduce measurement artefacts. However, they were allowed to make noises, laugh and frown. In addition to physiological measurements, each participant’s performance was tracked individually using their in-game scores. We acknowledge that this is a limited way of quantifying performance and that a more complex performance analysis could have included, e.g., analysis of gestures and gesture synchronization (Delaherche et al., 2012; Varni, Avril, Usta, & Chetouani, 2015); however, at this stage of research, our focus was on physiological analysis and performance was considered a secondary data source.
After each condition, a short questionnaire was filled out to assess cognitive and affective states using four self-reported variables: perceived difficulty (1–7), enjoyment (1–7), valence (1–9, with 1 being very negative and 9 being very positive), and arousal (1–9). These four variables were reused from our previous physiological research involving a similar single-player game (Darzi, Wondra, McCrea, & Novak, 2019). As part of that single-player research, a pilot study was conducted where different game difficulty conditions were presented to participants and a larger set of questionnaires was filled out with no physiological measurements; the selected four variables showed the largest differences between conditions. Arousal, valence and enjoyment are commonly used as self-reported variables in physiological studies of single-player computer games (Liu, Conn, Sarkar, & Stone, 2008; Mandryk & Atkins, 2007), while perceived difficulty was included as a variable that may have a nonlinear relationship with the other three.
2.3. Individual Physiological Features
Five physiological signals were recorded from each participant during the baseline and six game conditions with a sampling frequency of 256 Hz. For each condition, 19 individual physiological features were calculated from each participant’s five signals. None of the features were normalized (see section 2.5.1).
ECG:
Two time-domain features were calculated: mean heart rate and standard deviation of interbeat intervals. Furthermore, three frequency-domain features of heart rate variability were calculated: power of low frequencies (LF), power of high frequencies (HF) and the ratio of LF/HF power. The LF range is 0.04–0.15 Hz while the HF range is 0.15–0.4 Hz (Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology, 1996). These frequency-domain features were calculated by first calculating the instantaneous time series of heart rate (i.e., heart rate as a function of time), using interpolation to obtain the time series with a constant sampling frequency, and performing frequency analysis of this interpolated time series, as recommended in the literature (Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology, 1996). Heart rate is commonly associated with tonic arousal while the different heart rate variability metrics have diverse and sometimes unclear response patterns to different affective states (Kreibig, 2010).
Skin conductance:
The signal can be divided into two components: tonic (slow changes) and phasic (transient fast events called skin conductance responses). Five features were calculated from the signal: the mean skin conductance, skin conductance difference between the first and last second of the condition, skin conductance response frequency (number of responses per minute), mean skin conductance response amplitude, and standard deviation of response amplitude (Boucsein, 2012). Skin conductance reflects sweat gland activity, which is generally agreed to be innervated primarily by the sympathetic nervous system, and both mean skin conductance and skin conductance response frequency thus primarily increase as a result of general psychological arousal (Boucsein, 2012; Kreibig, 2010).
Respiration:
The mean respiration rate (RR), standard deviation of RR, and root-mean-square of successive differences of respiration periods were calculated. Similarly to heart rate, respiration rate is commonly associated with tonic arousal (Boiten, Frijda, & Wientjes, 1994; Kreibig, 2010) while respiratory variability is somewhat underexplored but has been shown to indicate emotional states such as anxiety (Kreibig, 2010; Van Diest, Thayer, Vandeputte, Van de Woestijne, & Van den Bergh, 2006).
FEMG:
Three features were calculated from each FEMG measurement. First, the root-mean-square (RMS) value was calculated over the entire condition. Then, the RMS value of the signal was calculated over consecutive 5-second windows within the condition, and the maximum and minimum RMS value over all windows were calculated as the second and third features. As smiling and frowning can be very brief, they might be missed by the RMS value over the entire condition. EMG of the zygomaticus major indicates smiling while EMG of the corrugator supercilii indicates frowning, and features extracted from them are thus correlated with positive (for the former) and negative (for the latter) affective states (Mandryk & Atkins, 2007; Spapé et al., 2013). However, facial EMG is also influenced by other facial movements, such as talking and concentration.
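The three FEMG features described above can be sketched in a few lines. This is a minimal illustration rather than the study's implementation: the function names are hypothetical, and the use of non-overlapping windows with any trailing partial window discarded is an assumption, since the text does not specify either detail.

```python
import math

def rms(samples):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def femg_features(signal, fs=256, window_s=5):
    """Return (overall RMS, maximum windowed RMS, minimum windowed RMS).

    Assumptions (not stated in the text): windows do not overlap and a
    trailing partial window is dropped. The signal must contain at least
    one full window.
    """
    win = int(fs * window_s)
    windowed = [rms(signal[i:i + win])
                for i in range(0, len(signal) - win + 1, win)]
    return rms(signal), max(windowed), min(windowed)
```

For a 3-minute condition sampled at 256 Hz, this yields 36 windows of 1280 samples each, so a brief smile or frown confined to one window raises the maximum feature even if it barely changes the overall RMS.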
2.4. Physiological Linkage
Three physiological linkage methods were selected from the literature (Chênes et al., 2013; Delaherche et al., 2012; Ekman et al., 2012; Järvelä, Kivikangas, Kätsyri, & Ravaja, 2014; Varni et al., 2015): the coherence of raw signals, the correlation of physiological features every 30 seconds, and the correlation of instantaneous heart rate (HR) and respiration rate (RR) signals. None of the features were normalized (see section 2.5.1).
Coherence of raw physiological measurements:
Welch’s averaged periodogram method with a Hann window was used to calculate the coherence between two equivalent raw signals (e.g., skin conductance of participant 1 and participant 2). Coherence is a standard signal processing method that finds the co-oscillation of two measurements up to half of the sampling frequency (Alessio, 2016). Meaningful frequency ranges were defined differently for each signal. For ECG and skin conductance, the Hann window length was 20 seconds (5120 samples) to achieve a resolution of 0.05 Hz; for respiration and FEMG, the Hann window length was 2 seconds to achieve a resolution of 0.5 Hz. Coherence features were then obtained as follows: for ECG, coherence was calculated for two bands (0.1–0.15 Hz and 0.15–0.4 Hz); for skin conductance, it was calculated for one band (0–0.4 Hz); for respiration, it was calculated for one band (0–2 Hz); for FEMG, it was calculated for 5 bands (0–8 Hz, 8–16 Hz, 16–24 Hz, 24–32 Hz, 32–40 Hz). The bands for FEMG were chosen based on coherence analysis recommendations from other EMG applications (Dideriksen et al., 2018). We note that, due to the use of relatively long windows, coherence does not represent truly ‘instantaneous’ synchronization but rather frequency synchronization over longer intervals.
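For reference, the window-length arithmetic above follows from the standard relation between window duration and frequency resolution. The block below also gives the textbook definition of magnitude-squared coherence, on the assumption that this is the coherence measure used. The symbols $P_{xy}$ (cross-spectral density), $P_{xx}$ and $P_{yy}$ (auto-spectral densities), $f_s$, $N$ and $T$ are introduced here for illustration and do not appear in the text.

```latex
% Magnitude-squared coherence between signals x and y (textbook definition)
C_{xy}(f) = \frac{\lvert P_{xy}(f) \rvert^{2}}{P_{xx}(f)\, P_{yy}(f)}, \qquad 0 \le C_{xy}(f) \le 1

% Frequency resolution for a window of N samples (duration T) at sampling rate f_s
\Delta f = \frac{f_s}{N} = \frac{1}{T}

% Worked values at f_s = 256 Hz
T = 20\,\mathrm{s}: \quad N = 20 \times 256 = 5120, \quad \Delta f = 0.05\,\mathrm{Hz}
T = 2\,\mathrm{s}: \quad N = 2 \times 256 = 512, \quad \Delta f = 0.5\,\mathrm{Hz}
```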
Correlation of physiological features every 30 seconds:
To assess the covariation of both participants’ individual features, each individual feature was calculated over successive 30-second periods within a condition, and Pearson correlations between the two participants’ features were calculated. As 19 features were calculated from five physiological measurements of each participant, this correlation-based method resulted in 19 physiological linkage features.
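The windowed correlation described above can be sketched as follows. This is an illustrative outline under stated assumptions: the function names are hypothetical, `feature_fn` stands in for any one of the 19 individual features, and the windows are assumed to be non-overlapping.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def windowed_feature(signal, feature_fn, fs=256, window_s=30):
    """Evaluate feature_fn on successive non-overlapping windows."""
    win = int(fs * window_s)
    return [feature_fn(signal[i:i + win])
            for i in range(0, len(signal) - win + 1, win)]

def linkage_feature(signal_a, signal_b, feature_fn, fs=256, window_s=30):
    """Correlate the two participants' windowed values of one feature."""
    return pearson(windowed_feature(signal_a, feature_fn, fs, window_s),
                   windowed_feature(signal_b, feature_fn, fs, window_s))
```

With 3-minute conditions and 30-second periods, each correlation is computed from six value pairs, so individual linkage features of this kind are likely to be noisy.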
Correlation of instantaneous HR and RR:
While raw ECG and respiration signals are unlikely to be correlated between participants due to signal noise and interpersonal differences, the time series of HR and RR might be better-correlated. Instantaneous HR and RR values were thus calculated from peak-to-peak intervals in each condition, and linear interpolation was used to create a 256-Hz signal from the obtained values (similarly to how frequency analysis of ECG was done in section 2.3). Physiological linkage was then calculated as the Pearson correlation coefficient between the two participants’ interpolated instantaneous HR and RR. This resulted in two Pearson coefficients for each condition: one for HR and one for RR.
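A minimal sketch of this procedure is given below, assuming that peak times (in seconds) have already been detected for each participant. The function names are hypothetical, and two details are assumptions not stated in the text: each rate value is time-stamped at the peak that ends its interval, and both series are resampled onto the time range they have in common.

```python
import math

def instantaneous_rate(peak_times):
    """Rate in events per minute for each peak-to-peak interval.

    Assumption: each value is time-stamped at the peak ending the interval.
    """
    stamps = peak_times[1:]
    rates = [60.0 / (t1 - t0) for t0, t1 in zip(peak_times, peak_times[1:])]
    return stamps, rates

def resample_linear(stamps, values, t_start, t_stop, fs=256.0):
    """Linearly interpolate (stamps, values) onto a uniform grid."""
    if t_start < stamps[0] or t_stop > stamps[-1]:
        raise ValueError("grid must lie inside the measured interval")
    n = int(round((t_stop - t_start) * fs)) + 1
    out, j = [], 0
    for k in range(n):
        t = t_start + k / fs
        # Advance to the segment [stamps[j], stamps[j + 1]] containing t.
        while j < len(stamps) - 2 and t > stamps[j + 1]:
            j += 1
        frac = (t - stamps[j]) / (stamps[j + 1] - stamps[j])
        out.append(values[j] + frac * (values[j + 1] - values[j]))
    return out

def pearson(x, y):
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def rate_linkage(peaks_a, peaks_b, fs=256.0):
    """Correlation of two participants' interpolated instantaneous rates."""
    sa, ra = instantaneous_rate(peaks_a)
    sb, rb = instantaneous_rate(peaks_b)
    # Assumption: compare only the time range covered by both series.
    t_start, t_stop = max(sa[0], sb[0]), min(sa[-1], sb[-1])
    return pearson(resample_linear(sa, ra, t_start, t_stop, fs),
                   resample_linear(sb, rb, t_start, t_stop, fs))
```

The same functions apply to both HR (R-peak times from the ECG) and RR (breath peak times from the respiration signal), giving the two coefficients per condition.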
2.5. Classification and Validation
Our ultimate goal is to automatically classify the cognitive and affective states of both participants in a competitive scenario and use that information as a basis for difficulty adaptation. In single-user affective computing scenarios, physiological measurements are commonly classified into either two classes (e.g., low/high enjoyment) or three classes (e.g., low/medium/high stress), and difficulty adaptation is done based on the inferred class using relatively simple if-then rules (e.g., if stress high, decrease difficulty) (Novak et al., 2012). We thus also chose to use the same two- and three-class classification approach in the current study. The focus of the current research was on physiological measurements; however, as a basis for qualitative comparison, a simple performance measure (scores of both players) was also collected and used in classification.
To create the classification algorithms, the two types of data (performance and physiological measurements) were used to obtain three types of features: performance features, individual physiological features, and physiological linkage features. Five combinations of these three feature sets (performance only, individual physiological only, individual physiological + linkage, performance + individual physiological, all features) were input into machine learning methods to classify four self-reported variables into either two or three classes. These self-reported variables were obtained from the short questionnaire (section 2.2): enjoyment, emotional valence, arousal, and perceived difficulty. The current ball speed and paddle sizes were added to all input data combinations since they indicate the current game state and would be available to any practical model.
Classification into two classes:
The input data were classified into “low” or “high” for perceived difficulty, enjoyment, valence and arousal using the ranges defined in Table 1. These ranges were defined manually after data collection based on histograms of participants’ responses to the short questionnaire so that the two classes contain roughly equal numbers of samples. For enjoyment and arousal, this resulted in responses of “5” (neutral) being omitted from the dataset. Table 1 also presents the number of samples in each class.
Table 1.
The defined ranges for two-class classification. # = the number of samples in each class.
| Class definition | Difficulty (#) | Enjoyment (#) | Valence (#) | Arousal (#) |
|---|---|---|---|---|
| Low | 1–3 (70) | 1–4 (66) | 1–3 (91) | 1–4 (64) |
| High | 4–7 (90) | 6–7 (56) | 4–9 (69) | 6–9 (59) |
Classification into three classes:
The input data were classified similarly to the above approach, but the possible classes were now low/medium/high for perceived difficulty, enjoyment, valence, and arousal using the ranges defined in Table 2. Again, these ranges were manually defined based on histograms of participants’ answers so that the three classes contain roughly equal numbers of samples. Table 2 also presents the number of samples in each class.
Table 2.
The defined ranges for three-class classification. # = the number of samples in each class.
| Class definition | Difficulty (#) | Enjoyment (#) | Valence (#) | Arousal (#) |
|---|---|---|---|---|
| Low | 1–2 (40) | 1–4 (66) | 1–2 (64) | 1–4 (64) |
| Medium | 3–4 (62) | 5 (38) | 3–4 (47) | 5–6 (51) |
| High | 5–7 (58) | 6–7 (56) | 5–9 (49) | 7–9 (45) |
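The class ranges in Tables 1 and 2 amount to a simple lookup from questionnaire rating to class label. A minimal sketch, with hypothetical names, could look like this; ratings that fall outside every range (the omitted “5” responses in the two-class case) map to `None` and would be dropped from the dataset.

```python
# Inclusive rating ranges copied from Table 1 (two classes).
TWO_CLASS = {
    "difficulty": {"low": (1, 3), "high": (4, 7)},
    "enjoyment":  {"low": (1, 4), "high": (6, 7)},
    "valence":    {"low": (1, 3), "high": (4, 9)},
    "arousal":    {"low": (1, 4), "high": (6, 9)},
}

# Inclusive rating ranges copied from Table 2 (three classes).
THREE_CLASS = {
    "difficulty": {"low": (1, 2), "medium": (3, 4), "high": (5, 7)},
    "enjoyment":  {"low": (1, 4), "medium": (5, 5), "high": (6, 7)},
    "valence":    {"low": (1, 2), "medium": (3, 4), "high": (5, 9)},
    "arousal":    {"low": (1, 4), "medium": (5, 6), "high": (7, 9)},
}

def to_class(variable, rating, scheme):
    """Return the class label for a rating, or None if it is omitted."""
    for label, (lo, hi) in scheme[variable].items():
        if lo <= rating <= hi:
            return label
    return None
```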
Classification of each self-reported variable began with stepwise regression-based forward feature selection (Keough & Quinn, 1995) (inclusion threshold: p = 0.05) applied to the full feature set under investigation in order to select the most relevant set of input features. To classify the selected features into either two or three classes, two different classifiers were used: support vector machine with a linear kernel and ensemble decision tree (a nonlinear method). The classifiers were validated using 10-fold crossvalidation: 28 or 29 participants’ data were used to train the classifier and the remaining three or four participants’ data were used to validate it, and the procedure was repeated 10 times so that each participant appeared in the validation dataset once. The mean classification accuracy over all 10 folds is reported as the final result.
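The participant-wise fold construction can be illustrated with a short sketch. The function names and the fixed random seed are hypothetical, and the text does not state whether the two members of a pair were kept in the same fold; this sketch splits by individual participant, as the description suggests.

```python
import random

def participant_folds(participant_ids, n_folds=10, seed=0):
    """Shuffle participants and deal them into n_folds validation folds."""
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::n_folds] for i in range(n_folds)]

def train_validation_splits(participant_ids, n_folds=10, seed=0):
    """Yield (train_ids, validation_ids); each participant is held out once."""
    ids = list(participant_ids)
    for held_out in participant_folds(ids, n_folds, seed):
        held = set(held_out)
        yield [p for p in ids if p not in held], held_out
```

With 32 participants, this produces eight validation folds of three participants and two of four, so each training set contains 29 or 28 participants. All samples from a held-out participant are excluded from training, which avoids testing on a person the classifier has already seen.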
Finally, to determine whether one of the two classifiers is more accurate than the other, McNemar’s test was used to compare the accuracies obtained with the support vector machine to those obtained with the ensemble decision tree for each two-class classification problem (i.e., each self-reported variable and each input dataset). Similarly, to determine whether the addition of linkage features increases classification accuracy, McNemar’s test was used to compare accuracies obtained with only individual features to accuracies obtained with both individual and linkage features for each self-reported variable and classifier. The significance threshold for McNemar’s tests was alpha = 0.05.
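As an illustration of how McNemar’s test compares two classifiers evaluated on the same samples, the sketch below uses the exact (binomial) form of the test. The text does not say whether the exact or the chi-squared form was used, so this choice is an assumption, and the function name is hypothetical.

```python
from math import comb

def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar test on paired correct/incorrect outcomes.

    correct_a and correct_b are equal-length sequences of booleans that
    say whether each classifier labeled each sample correctly.
    Returns (b, c, p_value), where b counts samples only classifier A got
    right and c counts samples only classifier B got right.
    """
    b = sum(1 for a, o in zip(correct_a, correct_b) if a and not o)
    c = sum(1 for a, o in zip(correct_a, correct_b) if not a and o)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant samples, so no evidence of a difference
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```

Only the discordant samples (those that exactly one classifier got right) influence the result. Two classifiers with different overall accuracies can therefore still yield a non-significant test if few samples are discordant.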
2.5.1. A Note on Other Classification Approaches
For full disclosure: During data analysis, we also tested three other approaches to feature selection (no feature selection, stepwise backward, stepwise forward-backward selection) and two other classifiers (linear discriminant analysis, multiple linear regression). We also analyzed all features both without normalization (as done above) and with normalization: subtracting the baseline value of a feature from all in-game values or dividing the in-game feature values by the baseline value. Finally, we also calculated coherence (section 2.4) on rectified and/or bandpass-filtered rather than raw signals, as suggested by studies of coherence in other applications (Dideriksen et al., 2018). These approaches either did not improve classification (in the case of, e.g., stepwise forward-backward selection, normalization and coherence) or led to much worse results (e.g., no feature selection), and are thus not reported for conciseness.
2.6. Closed-loop Study Protocol
The closed-loop study involved a comparison of two adaptation strategies: performance-based and physiology-based. Six pairs of healthy university students (mean age 29.8 years; 4 mixed-gender pairs, 2 male-male pairs; 1 individual self-reported as left-handed) participated, with three pairs assigned to each adaptation strategy; pairs were not told which strategy they had been assigned to until they had completed the session. Participants were asked to volunteer for the study in self-selected pairs, and thus already knew each other. Similarly to the open-loop study, the protocol started with an explanation of the study purpose and procedure, and participants signed an informed consent form. Then, the physiological sensors were attached and physiological responses were measured for a 3-minute baseline period, during which participants did not do anything and were instructed to relax. Participants then played the Pong game together for 18 minutes with no interruptions. The game started at a medium difficulty level (paddle size = 2 and ball speed = 3), and difficulty was then adapted every three minutes by increasing/decreasing the paddle size and ball speed. Possible values for paddle size were limited to 1–4 while possible values for ball speed were limited to 1–5 to avoid extreme difficulties.
For pairs with performance-based adaptation, the game parameters (ball speed and paddle sizes) were adapted based on the output of the most accurate performance-based classifier trained on data from the open-loop study. The performance-based classifiers were based only on in-game scores, current ball speed and current paddle sizes; however, pairs in this group still wore the physiological sensors.
For pairs with physiology-based adaptation, the game parameters were adapted based on the output of the most accurate physiology-based affect classifier trained on data from the open-loop study. The inputs to the classifier were the individual physiological features, physiological linkage features, and current ball speed and paddle sizes.
For both adaptation types, the classifier used was the support vector machine, and its output was the perceived difficulty for each participant with 3 possible classes (easy, medium or hard). Perceived difficulty and the support vector machine were used since they exhibited the highest accuracy in the open-loop crossvalidation (see Results). Three-class rather than two-class classification was used to provide a higher resolution and potentially more effective adaptation. The following if-then rules were then used to adapt difficulty based on both participants’ classified perceived difficulty:
1. If the game was easy or hard for both participants, increase or decrease the ball speed by one level, respectively.
2. If the game was hard for one participant and easy or medium for the other one, increase the former participant’s paddle size by one level.
3. If the game was easy for one participant and hard or medium for the other one, make the paddle size smaller for the former participant by one level. This can overlap with the previous rule, resulting in the two paddles changing size simultaneously (one increasing and one decreasing).
4. No change in case of medium difficulty for both.
5. If the ball speed has reached an extreme value (1 or 5) and would be increased/decreased past it by rule 1, the ball speed instead stays constant and both paddle sizes are changed instead to make the game easier or harder for both participants.
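The if-then rules above can be sketched in code as follows. This is a minimal illustration, not the implementation used in the study: the function name `adapt_difficulty` is hypothetical, and two details are assumptions that the rules do not state explicitly, namely that paddle sizes are clamped to the 1–4 range and that the last rule changes both paddles by one level.

```python
PADDLE_MIN, PADDLE_MAX = 1, 4
SPEED_MIN, SPEED_MAX = 1, 5


def clamp_paddle(size):
    """Assumption: paddle sizes saturate at the allowed limits."""
    return max(PADDLE_MIN, min(PADDLE_MAX, size))


def adapt_difficulty(difficulty_a, difficulty_b, speed, paddle_a, paddle_b):
    """Return (speed, paddle_a, paddle_b) after one adaptation step.

    difficulty_a and difficulty_b are the classified perceived difficulty
    of the two participants: "easy", "medium" or "hard".
    """
    if difficulty_a == difficulty_b:
        if difficulty_a == "medium":
            return speed, paddle_a, paddle_b  # both medium: no change
        # Both easy -> make the game harder (+1); both hard -> easier (-1).
        step = 1 if difficulty_a == "easy" else -1
        if SPEED_MIN <= speed + step <= SPEED_MAX:
            return speed + step, paddle_a, paddle_b
        # Ball speed is already at its limit: change both paddles instead.
        # Smaller paddles make the game harder, larger paddles make it easier.
        return speed, clamp_paddle(paddle_a - step), clamp_paddle(paddle_b - step)

    # Classifications differ: enlarge the paddle of a participant who finds
    # the game hard, shrink the paddle of one who finds it easy.
    paddle_change = {"hard": 1, "medium": 0, "easy": -1}
    return (
        speed,
        clamp_paddle(paddle_a + paddle_change[difficulty_a]),
        clamp_paddle(paddle_b + paddle_change[difficulty_b]),
    )
```

For example, starting from the initial settings (ball speed 3, both paddle sizes 2), a "hard"/"easy" classification leaves the ball speed unchanged and changes both paddles at once, which is the overlap between the second and third rules.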
After playing the game for 18 minutes, both participants filled out two questionnaires: the Intrinsic Motivation Inventory (IMI) (Markland & Hardy, 1997) and the Flow Experience Measure (FEM) (Sung, Hwang, & Yen, 2015). The IMI is an 8-item questionnaire that assesses effort/importance, perceived competence, interest/enjoyment, and pressure/tension with two items per assessed variable. The FEM assesses a single variable, flow (a mental state of being immersed in a feeling of energized focus (Nakamura & Csikszentmihalyi, 2014)) using 8 items.
3. Results
3.1. Open-loop Study
Table 3 presents the mean 2-class classification accuracies for five combinations of three feature sets. The highest accuracy (84.3%) was obtained for classification of perceived difficulty using the combination of all feature sets. The combination of individual physiological features and physiological linkage features as well as the combination of all feature sets yielded the highest accuracy for enjoyment and valence as well; for arousal, the highest accuracy was obtained with a combination of individual physiological features and performance data. The lowest classification accuracy (65.6%) was obtained for valence. Table 3 presents the results of only the most accurate of the four classifiers in each classification scenario.
Table 3.
Mean two-class classification accuracies for five combinations of three input feature sets. The accuracy of the best classifier is listed for each classification case and the highest accuracy in each column is bolded. Individual = Individual physiological features, Linkage = physiological linkage, S = support vector machine, E = ensemble decision tree.
| Feature Set | Difficulty | Enjoyment | Valence | Arousal |
|---|---|---|---|---|
| Performance | 79.5% (S) | 62.8% (S) | 57.2% (E) | 74.4% (E) |
| Individual | 79.6% (S) | 67.0% (S) | 62.3% (E) | 74.9% (S) |
| Individual + Linkage | 81.1% (E) | 77.3% (S) | 65.6% (S) | 73.8% (S) |
| Individual + Performance | 81.9% (E) | 66.2% (S) | 62.3% (E) | 75.0% (E) |
| All | 84.3% (S) | 77.3% (S) | 65.6% (S) | 73.8% (S) |
Table 4 presents the mean 3-class classification accuracies for five combinations of the three feature sets. The highest classification accuracy was obtained for perceived difficulty (60.5%) while the lowest was obtained for arousal (47.9%). Unlike two-class classification, no specific combination of feature sets is dominant for all four self-reported variables.
Table 4.
Mean three-class classification accuracies for five combinations of three input feature sets. The accuracy of the best classifier is listed for each classification case and the highest accuracy in each column is bolded. Individual = Individual physiological features, Linkage = physiological linkage, S = support vector machine, E = ensemble decision tree.
| Feature Set | Difficulty | Enjoyment | Valence | Arousal |
|---|---|---|---|---|
| Performance | 60.5% (E) | 47.3% (S) | 44.3% (S) | 45.5% (S) |
| Individual | 58.4% (E) | 51.5% (E) | 47.9% (S) | 46.4% (S) |
| Individual + Linkage | 59.5% (E) | 52.4% (S) | 46.9% (S) | 45.5% (E) |
| Individual + Performance | 57.0% (S) | 51.7% (E) | 47.9% (S) | 46.4% (S) |
| All | 59.5% (S) | 52.4% (S) | 47.6% (S) | 44.8% (S) |
Table 5 shows the features chosen by forward stepwise feature selection for two-class classification of four self-reported variables: perceived difficulty, enjoyment, valence, and arousal. These were selected among all features from all three data sets (performance, individual physiological, linkage). The significance level of each feature’s differences between the two classes is indicated with p-values; all features that met the inclusion threshold are reported.
Table 5.
Features selected for two-class classification by forward stepwise selection among all possible features. RMS = root mean square value, SCR = Skin Conductance Response, EMG = electromyogram, μ = mean, SD = standard deviation.
| Dimension | Rank | Feature | P-value | Group Low (μ ± SD) | Group High (μ ± SD) |
|---|---|---|---|---|---|
| Difficulty | 1 | Current paddle size | <0.001 | 2.38 ± 0.52 | 1.7 ± 0.55 |
| | 2 | Current ball speed | <0.001 | 3.24 ± 0.53 | 3.7 ± 0.64 |
| | 3 | Minimum RMS of 5-sec frown EMG | 0.010 | 4.03 ± 1.73 | 1.13 ± 0.94 |
| | 4 | In-game score | 0.024 | 7.21 ± 11.2 | −2.4 ± 12.4 |
| | 5 | Current opponent paddle size | 0.036 | 1.36 ± 0.78 | 1.97 ± 1.00 |
| | 6 | Power of low frequencies in heart rate | 0.037 | 0.12 ± 0.60 | 0.36 ± 1.27 |
| Enjoyment | 1 | Correlation of smile EMG RMS | <0.001 | 0.77 ± 1.44 | 2.59 ± 3.71 |
| | 2 | Correlation of SD of respiration rate | 0.005 | 0.92 ± 0.51 | 1.17 ± 0.69 |
| | 3 | Minimum RMS of 5-sec frown EMG | 0.017 | 1.94 ± 3.09 | 1.33 ± 1.45 |
| | 4 | Current ball speed | 0.045 | 3.36 ± 0.55 | 3.64 ± 0.64 |
| Valence | 1 | SD of SCR amplitude | 0.005 | 3.69 ± 6.22 | 4.22 ± 6.89 |
| | 2 | Current ball speed | 0.011 | 3.35 ± 0.54 | 3.61 ± 0.67 |
| | 3 | Correlation of skin conductance | 0.020 | 1.08 ± 0.32 | 1.02 ± 0.26 |
| | 4 | Correlation of smile EMG RMS | 0.043 | 1.16 ± 1.71 | 2.01 ± 3.05 |
| Arousal | 1 | Paddle size of opponent | <0.001 | 1.18 ± 0.93 | 2.00 ± 0.99 |
| | 2 | Current ball speed | <0.001 | 3.31 ± 0.60 | 3.65 ± 0.62 |
| | 3 | Correlation of instantaneous heart rate | 0.005 | 0.15 ± 1.54 | −1.87 ± 2.12 |
| | 4 | Skin conductance difference | 0.041 | 1.03 ± 2.22 | −0.06 ± 4.55 |
McNemar’s tests found only two significant differences between two-class classifiers. First, when classifying perceived difficulty with both individual physiological and linkage features, the ensemble decision tree was more accurate than the support vector machine (p = 0.025). Second, when classifying arousal with performance features, the ensemble decision tree was again more accurate (p < 0.001). McNemar’s tests also found only one significant difference between using only individual physiological features and using both individual and linkage features: when classifying enjoyment using the support vector machine, the addition of linkage features resulted in higher accuracy (p = 0.012).
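For readers unfamiliar with McNemar’s test, the comparison of two classifiers evaluated on the same samples can be sketched as follows. This is a minimal illustration using the exact (binomial) two-sided variant, which is an assumption since the variant used in the analysis is not stated here; the function name and the example labels are hypothetical.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test for two classifiers on the same samples.

    Returns (b, c, p_value), where b counts samples that only classifier A
    classified correctly and c counts samples that only classifier B
    classified correctly. Samples on which both classifiers agree in
    correctness do not affect the test.
    """
    b = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a != t and o == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant samples: no evidence of a difference
    # Under the null hypothesis, b follows a Binomial(n, 0.5) distribution.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical example: classifier A is right on all 10 samples,
# classifier B is wrong on the first 8.
labels = [1] * 10
predictions_a = [1] * 10
predictions_b = [0] * 8 + [1] * 2
b, c, p_value = mcnemar_exact(labels, predictions_a, predictions_b)
```

Because only the discordant samples enter the test, two classifiers with visibly different mean accuracies can still fail to differ significantly when few samples are classified differently.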
3.2. Closed-loop Study
Table 6 shows the five outputs of the IMI and FEM for the two adaptation strategies. Results are shown separately for the winner (player with higher final score) and loser in each pair since we qualitatively observed that they had different reactions to the game. Due to the small sample size (3 pairs per adaptation strategy), the obtained differences were not evaluated for significance using methods such as analysis of variance.
Table 6.
Results of the Intrinsic Motivation Inventory (IMI) and Flow Experience Measure (FEM) for the closed-loop study. All IMI scales have a range of 2–14 while the FEM has a range of 8–40. Columns represent participants, with 1–3 being different pairs within the adaptation strategy and W and L indicating the winner and loser in each pair.
| | Performance-based adaptation | | | | | | Physiology-based adaptation | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 1-W | 1-L | 2-W | 2-L | 3-W | 3-L | 1-W | 1-L | 2-W | 2-L | 3-W | 3-L |
| IMI Interest/Enjoyment | 10 | 11 | 10 | 6 | 13 | 12 | 14 | 9 | 12 | 8 | 8 | 11 |
| IMI Effort/Importance | 14 | 11 | 11 | 9 | 9 | 14 | 13 | 10 | 11 | 8 | 12 | 12 |
| IMI Perceived Competence | 13 | 11 | 13 | 7 | 11 | 7 | 5 | 5 | 10 | 9 | 11 | 4 |
| IMI Pressure/Tension | 2 | 9 | 9 | 9 | 9 | 7 | 13 | 4 | 8 | 6 | 10 | 11 |
| FEM | 36 | 30 | 35 | 23 | 35 | 32 | 29 | 26 | 29 | 28 | 31 | 36 |
4. Discussion
4.1. Open-Loop Classification Accuracies
Tables 3 and 4 show classification accuracies for both two-class and three-class classification of different cognitive and affective states (perceived difficulty, enjoyment, valence, arousal) using different feature sets. Both the two-class and three-class classification accuracies are comparable to those seen in single-user studies. Our highest reported accuracies based on only physiological responses were 81.1% for two-class and 59.5% for three-class classification; for comparison, our 2012 review of classification of affective and cognitive states based on autonomic nervous system responses found that two-class classification accuracies mostly range from 60% to 90% while three-class classification accuracies range from 40% to 75% (Novak et al., 2012). Furthermore, the result is in line with a related study on classification of physiological linkage, which classified highlights vs. non-highlights during joint movie watching and reported a two-class accuracy of 78.2% (Chênes et al., 2013). Thus, to answer RQ1: physiological measurements can be used to classify cognitive and affective states in competitive scenarios with similar accuracies to those observed in single-user studies.
When comparing physiology-based classification to performance-based classification, the addition of physiology increased classification accuracy for enjoyment and valence, with the difference being largest for two-class enjoyment classification – 77.3% with physiological features compared to 62.8% with only performance features. However, results for perceived difficulty and arousal are less promising: while there is some accuracy gain when using physiological data for two-class difficulty classification (79.5% with only performance, 84.3% with both performance and physiology), there is no accuracy advantage to physiological features for three-class perceived difficulty classification or two/three-class arousal classification. Furthermore, as acknowledged, the performance measures were limited (game score only) since the focus of our work was on physiology, and it is entirely possible that more complex performance analysis would have resulted in higher accuracy, as discussed further in later sections.
The inclusion of physiological linkage tended to slightly improve classification accuracy compared to using only individual physiological features. The biggest increase was again observed for two-class enjoyment classification (67.4% with only individual features vs. 77.3% with individual + linkage features), and was also found to be significant using McNemar’s test. However, most other increases were smaller and not considered significant by McNemar’s test. Thus, to answer RQ2: the inclusion of physiological linkage can increase affect classification accuracy, but there is only limited evidence (one significant McNemar’s test) that these increases are worthwhile.
4.2. Open-Loop: Best Features, Classifiers, and Normalization
While there were generally small differences in accuracy between the support vector machine and ensemble decision tree, only two differences were significant (section 3.1). In a practical application, we would likely simply choose the most appropriate classifier for our collected data and classified self-report variable.
Table 5 lists the most relevant selected features for two-class classification of four self-reported variables (perceived difficulty, enjoyment, valence, and arousal). These features were selected by the forward stepwise method; however, the causal relationship between the features and the output is unclear and beyond the scope of this paper. As mentioned, both current ball speed and current paddle size are among the best features for all four self-reported variables; this is unsurprising and simply indicates that current difficulty heavily influences the participant’s cognitive and affective state. On the other hand, several physiological responses do indicate human cognitive and affective states in both expected and unexpected ways. For example, as both positive and negative emotions cause rapid changes in skin conductance (Kreibig, 2010) and since happy participants are likely to smile more, it is not surprising that both the skin conductance responses and the smile electromyogram are correlated with valence. However, we were surprised to see that, for example, the frown electromyogram was correlated with enjoyment, with participants with high enjoyment exhibiting more frowning. Finally, several physiological linkage features do appear among the selected features, indicating their potential for affect classification in competitive scenarios.
We do acknowledge that not all possible features (especially with regard to linkage) were investigated, and that the utilized stepwise forward selection method may not optimally reduce multicollinearity between features. We did compare forward feature selection to backward and forward-backward selection (section 2.5.1), but found no systematic differences in classification accuracy. However, in the future, it may be worthwhile to investigate other measures of physiological linkage, including dynamic time warping, nonlinear interdependence and Granger causality (Muszynski et al., 2018; Varni et al., 2015); furthermore, it may be worthwhile to investigate other feature reduction methods such as unsupervised approaches.
4.3. Closed-Loop Difficulty Adaptation
As perceived difficulty exhibited the highest classification accuracies in Tables 3 and 4, it was chosen as the basis for the closed-loop study. Furthermore, it is the only one that can be directly used as a basis for adaptation – if-then rules can be designed to adapt difficulty based on perceived difficulty more easily than rules based on, e.g., enjoyment. Table 6 shows results of IMI and FEM questionnaires for performance-based and physiology-based closed-loop adaptation strategies, presented separately for the winner and loser in each pair. IMI scores have a range of 2 to 14, and interest/enjoyment was mostly in the 9–13 range for winners and 7–11 range for losers; effort/importance was mostly in the 9–13 range for both winners and losers. Our previous nonphysiological research with the same version of the IMI, competitive games and unimpaired participants similarly observed interest/enjoyment and effort/importance ratings mostly in the 9–13 range (Goršič, Darzi, et al., 2017). Thus, the physiology-based difficulty adaptation appears to have been reasonably successful and can be considered feasible.
While some differences in IMI scores can be seen between performance- and physiology-based adaptation (e.g., lower perceived competence for physiology-based adaptation), the sample size is too small and the performance-based features are too simple to make a direct comparison in that regard. However, interesting qualitative reactions were observed in participants. Since they were told of the existence of the two adaptation methods but not which one they were assigned to, participants were curious about potential physiology-based adaptation. In discussion after the gameplay period, most participants had an opinion on which adaptation strategy they had been assigned to, and tended to base this opinion on perceived events during the game (e.g., “I noticed that the game got harder after I’d been smiling”). Four participants stated that they had intentionally tried to modify their physiological responses to evoke a reaction from the system by changing their respiration patterns or facial expressions, and one participant stated that they had intentionally tried to match their partner’s facial expressions to see how it would affect the game. This confirms observations from single-user studies that found initial biases regarding affective computing as well as attempts to influence the decision-making behavior of the system (Fairclough et al., 2015; McCrea et al., 2017).
4.4. Possible Improvements to Study Methodology
Before discussing next steps, two suboptimal choices with regard to our study methodology should be acknowledged. First, the classification design was not optimal. While we successfully induced different cognitive and affective states in the 6 conditions, classification of the 4 self-reported variables into two or three classes may be problematic since participants might not accurately self-report their psychological states. Moreover, the ranges for the classes were defined manually based on the histogram of participants’ responses, which may influence classification accuracies. Future studies should thus consider alternative classification designs – for example, class definitions such as “both participants are equally challenged” vs. “one participant is challenged significantly more”.
Second, in the closed-loop study, each pair experienced only one adaptation strategy. While a direct comparison of performance- and physiology-based adaptation was infeasible due to the sample size anyway, a future study more extensively comparing the two should consider a repeated-measures design: having each pair experience both strategies one after the other.
4.5. Next Steps
The current study demonstrated the feasibility of classifying cognitive and affective states in a competitive scenario based on physiological measurements, with two- and three-class classification accuracies similar to those observed in single-user scenarios. Furthermore, it demonstrated the technical feasibility of dynamic closed-loop scenario adaptation based on a combination of individual physiological responses and physiological linkage, with participants reporting a reasonably positive experience. However, multiple additional steps need to be taken before physiology-based adaptation of multi-user scenarios can deliver potential practical benefits.
First, while physiological measurements and physiological linkage may have potential in competitive and cooperative scenarios, they should not be studied in isolation. Extensive work has been done on analysis of other types of interpersonal synchronization (e.g., motion synchronization (Delaherche et al., 2012; Varni et al., 2015)), but there have been few attempts to combine different synchronization types for more accurate inference of user states. Such multimodal information fusion has also been emphasized as a potential future research direction by other studies of physiological linkage (Muszynski et al., 2018). We consider such fusion to be especially important since task performance, behavior and physiology influence each other, and evaluating them separately (as done in our closed-loop study, where physiology- and performance-based adaptation were examined separately) is perhaps unreasonably dualistic. Furthermore, such multimodal information should be combined with information about the participants: for example, their genders and personalities as well as the relationship between them. These individual characteristics have the potential to enable more personalized adaptation of multi-user scenarios and consequently result in a more positive user experience, as seen in studies of single-user scenarios (Nagle, Wolf, & Riener, 2016; F. Zhou, Qu, Helander, & Jiao, 2011). In our subjective opinion, such multimodal information fusion is more worthwhile than attempting to obtain incremental improvements through, e.g., slightly different classifiers.
Second, while we have demonstrated the technical feasibility of dynamic closed-loop competitive scenario adaptation based on both participants’ physiological responses, we did not demonstrate that the inclusion of physiological measurements is worthwhile – that it results in a better user experience than simpler performance-based adaptation. We did demonstrate differences in classification accuracy between physiology and performance data in some classification scenarios (e.g., two-class enjoyment classification: 77.3% with physiology vs. 62.8% with performance), but results in other scenarios were less promising; furthermore, since very few performance features were included, it is possible that a more fine-grained performance analysis would have yielded better classification results. Furthermore, studies of single-user affective computing have previously emphasized that a higher open-loop classification accuracy does not necessarily result in a better closed-loop user experience (Fairclough et al., 2015; McCrea et al., 2017). The effect on user experience had originally been intended as a greater focus of the current study before data collection was interrupted by COVID-19; in future larger studies, it could be examined by, for example, conducting a correlation analysis between closed-loop classification accuracy and self-reported user experience (Fairclough et al., 2015) or by systematically inducing different classification accuracies via a ‘Wizard of Oz’ approach (McCrea et al., 2017). Such follow-up studies may also illuminate the effects of user biases, which were qualitatively observed in the current study (section 4.3) and have been noted but not extensively studied in single-user affective computing (Fairclough et al., 2015).
Related to the previous point, future studies should examine not only how well a machine can adapt itself to the human users, but also how the users react to this adaptation. In the current study, we qualitatively observed that participants in the closed-loop scenario would attempt to manipulate the closed-loop system (section 4.3), and this has also been emphasized as an underexplored aspect of single-user affective computing (Fairclough & Lotte, 2020). Thus, future research on the effects of physiology-based adaptation of multi-user scenarios should also attempt to identify users’ reactions to the adaptation. This could be done qualitatively by, e.g., having the researcher make subjective observations and divide users into groups (e.g., those attempting to influence the system), or quantitatively by, e.g., measuring behavioral and physiological reactions to specific adaptation events.
While the above steps may seem daunting, we do subjectively believe that, in the long term, multimodal integration of physiological linkage with other synchronization measurements could enable more effective adaptation of both competitive and cooperative scenarios, thus transforming the field of user-tailored human-machine interaction by augmenting the ability of people to work together in many multi-user settings – for example, education, manufacturing, and physical exercise.
5. Conclusion
To our knowledge, this paper presents the first time that two participants’ physiological responses (including physiological linkage) were used to classify the participants’ cognitive and affective states in a competitive scenario and dynamically adapt the difficulty of that scenario based on the classified states. In the open-loop part of the paper, different combinations of physiological and performance data were used to classify four self-reported variables related to cognitive and affective states (enjoyment, valence, arousal, and perceived difficulty) into either two or three classes. The highest obtained classification accuracies were 84.3% for two-class classification and 60.5% for three-class classification, which are comparable to those seen in single-user affective computing. Furthermore, adding physiological linkage features generally resulted in a slight increase in classification accuracy compared to using only individual physiological features, with the largest difference observed for two-class classification of enjoyment (77.3% with individual + linkage features vs. 67.4% with only individual features).
In the closed-loop part of the paper, we then demonstrated the technical feasibility of dynamically adapting the difficulty of a competitive scenario based on real-time affect classification of both participants’ cognitive and affective states from physiological measurements. However, due to a small sample size and relatively simple performance metrics, we were unable to show whether adaptation based on physiological measurements results in practical benefits over simpler performance-based adaptation. Thus, future studies should focus on both combining physiological measurements with other types of data (e.g., gesture synchronization) and evaluating the effects of closed-loop adaptation on user experience in both competitive and cooperative scenarios. If these follow-up steps are successfully completed, unobtrusive physiology-based recognition of users’ cognitive and affective states has the long-term potential to improve competition and cooperation in many multi-user settings.
Highlights.
In multi-user scenarios, difficulty generally adapted based on performance
Could alternatively be adapted based on physiological measurements of both users
Automated classification of human psychological states in competitive scenario
Based on individual physiological responses and physiological linkage
Task difficulty then dynamically adapted based on classified human states
Acknowledgment
Research supported by the National Science Foundation under grants no. 1717705 and 2007908 as well as by the National Institute of General Medical Sciences of the National Institutes of Health under grant no. 2P20GM103432.
Footnotes
Declaration of interests
☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper
References
- Ahonen L, Cowley B, Torniainen J, Ukkonen A, Vihavainen A, & Puolamäki K (2016). Cognitive collaboration found in cardiac physiology: Study in classroom environment. PLoS ONE, 11(7), e0159178. doi:10.1371/journal.pone.0159178
- Alessio SM (2016). Digital Signal Processing and Spectral Analysis for Scientists. doi:10.1007/978-3-319-25468-5
- Baldwin A, Johnson D, Wyeth P, & Sweetser P (2013). A framework of dynamic difficulty adjustment in competitive multiplayer video games. 2013 IEEE International Games Innovation Conference (IGIC), 16–19. doi:10.1109/IGIC.2013.6659150
- Baur K, Schättin A, de Bruin ED, Riener R, Duarte JE, & Wolf P (2018). Trends in robot-assisted and virtual reality-assisted neuromuscular therapy: a systematic review of health-related multiplayer games. Journal of Neuroengineering and Rehabilitation, 15(107). doi:10.1186/s12984-018-0449-9
- Bian D, Wade J, Swanson A, Weitlauf A, Warren Z, & Sarkar N (2019). Design of a physiology-based adaptive virtual reality driving platform for individuals with ASD. ACM Transactions on Accessible Computing, 12(1), 2.
- Boiten FA, Frijda NH, & Wientjes CJE (1994). Emotions and respiratory patterns: review and critical analysis. International Journal of Psychophysiology, 17, 103–128. doi:10.1016/0167-8760(94)90027-2
- Boucsein W (2012). Electrodermal Activity (2nd ed.). Springer.
- Chanel G, Kivikangas JM, & Ravaja N (2012). Physiological compliance for social gaming analysis: Cooperative versus competitive play. Interacting with Computers, 24(4), 306–316. doi:10.1016/j.intcom.2012.04.012
- Chanel G, Lalanne D, Lavoué E, Lund K, Molinari G, Ringeval F, & Weinberger A (2016). Grand Challenge Problem 2: Adaptive Awareness for Social Regulation of Emotions in Online Collaborative Learning Environments. In Grand Challenge Problems in Technology-Enhanced Learning II: MOOCs and Beyond. doi:10.1007/978-3-319-12562-6_3
- Chênes C, Chanel G, Soleymani M, & Pun T (2013). Highlight detection in movie scenes through inter-users, physiological linkage. In Social Media Retrieval (pp. 217–237). doi:10.1007/978-1-4471-4555-4_10
- Chih-Yueh C, Shao-Pin LU, & Zhi-Hong C (2013). Evenly matched competitive strategies: dynamic difficulty adaptation in a game-based learning system. Research & Practice in Technology Enhanced Learning, 8(2), 225–243.
- Darzi A, Goršič M, & Novak D (2017). Difficulty adaptation in a competitive arm rehabilitation game using real-time control of arm electromyogram and respiration. Proceedings of the 2017 IEEE International Conference on Rehabilitation Robotics, 857–862. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Darzi A, & Novak D (2019). Using physiological linkage for patient state assessment in a competitive rehabilitation game. Proceedings of the 16th IEEE/RAS-EMBS International Conference on Rehabilitation Robotics (ICORR 2019). [DOI] [PubMed] [Google Scholar]
- Darzi A, Wondra T, McCrea SM, & Novak D (2019). Classification of multiple psychological dimensions of computer game players using physiology, performance and personality characteristics. Frontiers in Neuroscience, 13, 1278. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Delaherche E, Chetouani M, Mahdhaoui A, Saint-Georges C, Viaux S, & Cohen D (2012). Interpersonal synchrony: A survey of evaluation methods across disciplines. IEEE Transactions on Affective Computing, Vol. 3, pp. 349–365. 10.1109/T-AFFC.2012.12 [DOI] [Google Scholar]
- Dideriksen JL, Negro F, Falla D, Kristensen SR, Mrachacz-Kersting N, & Farina D (2018). Coherence of the surface EMG and common synaptic input to motor neurons. Frontiers in Human Neuroscience, 12, 207. 10.3389/fnhum.2018.00207 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ekman I, Chanel G, Järvelä S, Kivikangas JM, Salminen M, & Ravaja N (2012). Social interaction in games. Simulation & Gaming, 43(3), 321–338. 10.1177/1046878111422121 [DOI] [Google Scholar]
- Esakia A, McCrickard S, Harden S, Horning M, & Ramalingam NP (2020). Using smartwatches to facilitate a group dynamics-based statewide physical activity intervention. International Journal of Human-Computer Studies, 142, 102501. 10.1016/j.ijhcs.2020.102501 [DOI] [Google Scholar]
- Fairclough SH, Karran AJ, & Gilleade K (2015). Classification accuracy from the perspective of the user: real-time interaction with physiological computing. Proceedings of the 33rd Annual Conference on Human Factors in Computing Systems (CHI ’15), 3029–3038.
- Fairclough SH, & Lotte F (2020). Grand challenges in neurotechnology and system neuroergonomics. Frontiers in Neuroergonomics, 1, 602504. 10.3389/fnrgo.2020.602504
- Fan J, Wade JW, Key AP, Warren Z, & Sarkar N (2018). EEG-based affect and workload recognition in a virtual driving environment for ASD intervention. IEEE Transactions on Biomedical Engineering, 65(1), 43–51. 10.1109/TBME.2017.2693157
- Goršič M, Cikajlo I, & Novak D (2017). Competitive and cooperative arm rehabilitation games played by a patient and unimpaired person: Effects on motivation and exercise intensity. Journal of NeuroEngineering and Rehabilitation, 14, 23.
- Goršič M, Darzi A, & Novak D (2017). Comparison of two difficulty adaptation strategies for competitive arm rehabilitation exercises. Proceedings of the 2017 IEEE International Conference on Rehabilitation Robotics, 640–645. London, UK.
- Grechuta K, Rubio B, Duff A, Oller ED, Pulvermüller F, & Verschure P (2016). Intensive language-action therapy in virtual reality for a rehabilitation gaming system. Journal of Pain Management, 9(3), 243–254.
- Haataja E, Malmberg J, & Järvelä S (2018). Monitoring in collaborative learning: Co-occurrence of observed behavior and physiological synchrony explored. Computers in Human Behavior, 87, 337–347.
- Hu Y, Pan Y, Shi X, Cai Q, Li X, & Cheng X (2018). Inter-brain synchrony and cooperation context in interactive decision making. Biological Psychology, 133, 54–62. 10.1016/j.biopsycho.2017.12.005
- Järvelä S, Kivikangas JM, Kätsyri J, & Ravaja N (2014). Physiological linkage of dyadic gaming experience. Simulation & Gaming, 45(1), 24–40. 10.1177/1046878113513080
- Keough & Quinn. (1995). Multiple regression and correlation. In Design and Analysis for Biologists (Vol. 22, pp. 1315–1316). 10.1037/0022-3514.90.4.644
- Kreibig SD (2010). Autonomic nervous system activity in emotion: a review. Biological Psychology, 84(3), 394–421. 10.1016/j.biopsycho.2010.03.010
- Liu C, Agrawal P, Sarkar N, & Chen S (2009). Dynamic difficulty adjustment in computer games through real-time anxiety-based affective feedback. International Journal of Human-Computer Interaction, 25(6), 506–529. 10.1080/10447310902963944
- Liu C, Conn K, Sarkar N, & Stone W (2008). Online affect detection and robot behavior adaptation for intervention of children with autism. IEEE Transactions on Robotics, 24(4), 883–896. Retrieved from http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4598899
- Liu T, Saito G, Lin C, & Saito H (2017). Inter-brain network underlying turn-based cooperation and competition: A hyperscanning study using near-infrared spectroscopy. Scientific Reports, 7, 8684. 10.1038/s41598-017-09226-w
- Mandryk RL, & Atkins MS (2007). A fuzzy physiological approach for continuously modeling emotion during interaction with play technologies. International Journal of Human-Computer Studies, 65(4), 329–347. 10.1016/j.ijhcs.2006.11.011
- Markland D, & Hardy L (1997). On the factorial and construct validity of the Intrinsic Motivation Inventory: Conceptual and operational concerns. Research Quarterly for Exercise and Sport, 68(1), 20–32. 10.1080/02701367.1997.10608863
- McCrea SM, Geršak G, & Novak D (2017). Absolute and relative user perception of classification accuracy in an affective videogame. Interacting with Computers, 29(2), 271–286.
- Muszynski M, Kostoulas T, Lombardo P, Pun T, & Chanel G (2018). Aesthetic highlight detection in movies based on synchronization of spectators’ reactions. ACM Transactions on Multimedia Computing, Communications and Applications, 14(3), 1–23. 10.1145/3175497
- Nagle A, Wolf P, & Riener R (2016). Toward a system of customized video game mechanics based on player personality: relating the Big Five personality traits with difficulty adaptation in a first-person shooter game. Entertainment Computing, 13, 10–24.
- Nakamura J, & Csikszentmihalyi M (2014). The concept of flow. In Flow and the Foundations of Positive Psychology: The Collected Works of Mihaly Csikszentmihalyi (pp. 239–263). 10.1007/978-94-017-9088-8_16
- Novak D, Mihelj M, & Munih M (2012). A survey of methods for data fusion and system adaptation using autonomic nervous system responses in physiological computing. Interacting with Computers, 24, 154–172. 10.1016/j.intcom.2012.04.003
- Pérez A, Carreiras M, & Duñabeitia JA (2017). Brain-to-brain entrainment: EEG interbrain synchronization while speaking and listening. Scientific Reports, 7, 4190. 10.1038/s41598-017-04464-4
- Picard RW, Vyzas E, & Healey J (2001). Toward machine emotional intelligence: Analysis of affective physiological state. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10), 1175–1191.
- Reidsma D, Nijholt A, Tschacher W, & Ramseyer F (2010). Measuring multimodal synchrony for human-computer interaction. Proceedings - 2010 International Conference on Cyberworlds, CW 2010, 67–71. 10.1109/CW.2010.21
- Spapé MM, Kivikangas JM, Järvelä S, Kosunen I, Jacucci G, & Ravaja N (2013). Keep your opponents close: Social context affects EEG and fEMG linkage in a turn-based computer game. PLoS ONE, 8(11), e78795. 10.1371/journal.pone.0078795
- Sung H-Y, Hwang G-J, & Yen Y-F (2015). Development of a contextual decision-making game for improving students’ learning performance in a health education course. Computers & Education, 82, 179–190. 10.1016/j.compedu.2014.11.012
- Szymanski C, Pesquita A, Brennan AA, Perdikis D, Enns JT, Brick TR, … Lindenberger U (2017). Teams on the same wavelength perform better: Inter-brain phase synchronization constitutes a neural substrate for social facilitation. NeuroImage, 152, 425–436. 10.1016/j.neuroimage.2017.03.013
- Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology. (1996). Heart rate variability: Standards of measurement, physiological interpretation, and clinical use. European Heart Journal, 17(3), 354–381.
- Tschacher W, Rees GM, & Ramseyer F (2014). Nonverbal synchrony and affect in dyadic interactions. Frontiers in Psychology, 5, 1323. 10.3389/fpsyg.2014.01323
- Van Diest I, Thayer JF, Vandeputte B, Van de Woestijne KP, & Van den Bergh O (2006). Anxiety and respiratory variability. Physiology and Behavior, 89, 189–195. 10.1016/j.physbeh.2006.05.041
- Varni G, Avril M, Usta A, & Chetouani M (2015). SyncPy - A unified open-source analytic library for synchrony. INTERPERSONAL 2015 - Proceedings of the 1st ACM Workshop on Modeling INTERPERsonal SynchrONy And InfLuence, Co-Located with ICMI 2015. 10.1145/2823513.2823520
- Vicencio-Moreira R, Mandryk RL, & Gutwin C (2014). Balancing multiplayer first-person shooter games using aiming assistance. 2014 IEEE Games Media Entertainment, 1–8. 10.1109/GEM.2014.7048086
- Watson AR, Ntuen C, & Park E (1996). Effects of task difficulty on pilot workload. Computers & Industrial Engineering, 31(1–2), 487–490. 10.1016/0360-8352(96)00181-7
- Wilson GF, & Russell CA (2007). Performance enhancement in an uninhabited air vehicle task using psychophysiologically determined adaptive aiding. Human Factors, 49(6), 1005–1018. 10.1518/001872007X249875
- Xu G, Gao X, Pan L, Chen S, Wang Q, Zhu B, & Li J (2018). Anxiety detection and training task adaptation in robot-assisted active stroke rehabilitation. International Journal of Advanced Robotic Systems, 15(6). 10.1177/1729881418806433
- Zhou F, Qu X, Helander MG, & Jiao J (2011). Affect prediction from physiological measures via visual stimuli. International Journal of Human-Computer Studies, 69(12), 801–819. 10.1016/j.ijhcs.2011.07.005
- Zhou N, Kisselburgh L, Chandrasegaran S, Badam SK, Elmqvist N, & Ramani K (2020). Using social interaction trace data and context to predict collaboration quality and creative fluency in collaborative design learning environments. International Journal of Human-Computer Studies. 10.1016/j.ijhcs.2019.102378
