Author manuscript; available in PMC: 2025 Sep 9.
Published in final edited form as: IEEE Trans Affect Comput. 2025 Apr 21;16(3):2428–2439. doi: 10.1109/taffc.2025.3562787

Wearable Sensor-based Multimodal Physiological Responses of Socially Anxious Individuals in Social Contexts on Zoom

Emma R Toner 1,*, Mark Rucker 2,*, Zhiyuan Wang 3, Maria A Larrazabal 4, Lihua Cai 5, Debajyoti Datta 6, Haroon Lone 7, Mehdi Boukhechba 8, Bethany A Teachman 9, Laura E Barnes 10
PMCID: PMC12416631  NIHMSID: NIHMS2083788  PMID: 40927232

Abstract

Correctly identifying an individual’s social context from passively worn sensors holds promise for delivering just-in-time adaptive interventions (JITAIs) to treat social anxiety. In this study, we present results using passively collected data from a within-subjects experiment that assessed physiological responses across different social contexts (i.e., alone vs. with others), social phases (i.e., pre- and post-interaction vs. during an interaction), social interaction sizes (i.e., dyadic vs. group interactions), and levels of social threat (i.e., implicit vs. explicit social evaluation). Participants in the study (N = 46) reported moderate to severe social anxiety symptoms as assessed by the Social Interaction Anxiety Scale (≥34 out of 80). Univariate paired difference tests, multivariate random forest models, and cluster analyses were used to explore physiological response patterns across different social and non-social contexts. Our results suggest that social context is more reliably distinguishable than social phase, group size, or level of social threat, and that there is considerable variability in physiological response patterns even among distinguishable contexts. Implications for real-world context detection and future deployment of JITAIs are discussed.

Keywords: Anxiety disorders, context awareness, wearable sensors, physiological responses

I. Introduction

Social anxiety disorder (SAD) is among the most common mental health disorders in the United States: an estimated 13% of adults will be diagnosed with SAD during their lifetime [1]. Characterized by intense anxiety about social situations, fear of negative evaluation, and avoidance of social situations [2], SAD is associated with substantial functional impairment across work and social domains [3]. Despite the burden conferred by SAD, most individuals with this diagnosis do not seek or cannot access treatment [4] and many wait decades before receiving care [5]. If effective psychological treatments exist for SAD [6], why do so many people fail to receive treatment? One particularly compelling explanation is that the dominant mode of psychological treatment delivery (i.e., one-on-one therapy delivered in-person by a trained mental health professional) is difficult to access and cannot meet the overwhelming need [7]. For socially anxious individuals, there are also frequently logistical, financial, and emotional barriers to seeking treatment (e.g., shame; anxiety about talking to a provider; [8]).

Digital mental health interventions (DMHIs) offer a potential method by which to either supplement the standard, one-on-one treatment delivery model or provide services to those who would otherwise have difficulty accessing care. For instance, DMHIs can be used by providers to monitor patient symptoms or assign practice exercises between sessions to facilitate progress, but they also have the potential to serve as standalone interventions for those requiring less intensive intervention or who are unable to attend face-to-face therapy sessions due to cost, time constraints, or stigma. Indeed, several recent studies have found that digital mental health interventions for common mental health problems like anxiety and depression perform significantly better than control conditions and are often comparably effective to face-to-face interventions [9]–[12]. In particular, just-in-time adaptive interventions (JITAIs; [13]), which aim to personalize and optimize DMHIs by delivering context-relevant interventions at times when they are most needed, may hold particular promise for closing the social anxiety treatment gap. To most effectively deploy JITAIs, it is necessary to determine when people are most in need of an intervention and which intervention(s) would be most useful and appropriate given their current context (e.g., their location). One way to gather this information is to ask participants to report on their emotional experiences and situational contexts in daily life (e.g., via ecological momentary assessment), but this can be a burden [14] and requires participants to respond to prompts frequently and accurately. Additionally, it may be challenging to collect self-report data when participants are actively engaged in contexts of interest (e.g., social interaction) given that responding to a prompt would mean interrupting whatever they are doing. 
A potential alternative to directly asking participants about their experience is to passively gather data (e.g., physiological responding; GPS location) via unobtrusive devices and sensors (e.g., wristbands; cell phones) that can help identify one’s current emotional state and/or situational context.

Identifying social contexts can critically inform social anxiety interventions given that socially anxious individuals are more likely to experience anxiety in certain situations compared to others (e.g., social vs. non-social or evaluative vs. non-evaluative situations; [15]). Additionally, some intervention strategies are more or less feasible to implement depending on the context (e.g., if a person is actively engaged in a social interaction, they will likely be unable to write down and evaluate all of their thoughts in detail). Understanding where someone is or what they are doing can help determine when and how to intervene most appropriately and effectively.

Advances in mobile technology and passive sensing have made it possible to detect social anxiety and social context in experimental studies and in daily life from biobehavioral data. For instance, physiological data from wearable devices (e.g., skin temperature; skin conductance) has been used to accurately detect social anxiety severity in the context of a speech task [16], [17], and other passively sensed data from mobile phones (e.g., GPS; call/text data) has been leveraged to detect symptoms of social anxiety in daily life [18], [19]. In addition, physiological biomarkers such as heart rate, skin conductance, skin temperature, vocal pitch, and hand movements can be leveraged to accurately differentiate between baseline, anticipatory, and concurrent phases of social anxiety [16], [20], [21] and between levels of evaluative threat during a speech task [22]. In studies of daily life, audio data have been used to identify when participants are near human speech (e.g., conversations; [23], [24]) and location data can help detect when people are interacting in a group and differentiate between group sizes [25].

Taken together, these findings suggest that passively-sensed data are an important tool that can be used to determine when an individual may benefit from a targeted intervention based on their context. However, many open questions remain. Thus far, most efforts to detect social anxiety or social contexts have emphasized using many biobehavioral features together, which can maximize prediction but reduce the scalability of these efforts (e.g., if participants are required to wear numerous devices to collect all of the data necessary for accurate prediction), and make it hard to determine which features are most needed (and thus should be prioritized when planning data collection) and which may be redundant.

The present investigation leverages data from an experimental study conducted virtually via Zoom in which individuals high in trait social anxiety symptoms completed a non-social task and windowed social interactions involving different numbers of interaction partners and levels of experimentally-manipulated social-evaluative threat. The primary aim of this work is to describe physiological patterns associated with contexts relevant to understanding social anxiety (e.g., social vs. non-social contexts) with the goal of identifying the physiological features that may be important for social context detection. Physiological indicators of anxiety and social context were passively assessed via an Empatica E4 wristband and 13 features relevant to anxiety and social context detection were selected based on prior literature (see Table IV). This exploratory study aims to advance our understanding of context-specific physiological responding of socially anxious individuals in three primary ways:

  1. Describe the physiological response patterns associated with different social and non-social contexts and phases of social interactions.

  2. Determine if and how these physiological response patterns significantly differ across social contexts and phases of social interactions.

  3. Explore whether socially anxious individuals have common clusters of physiological responses across social contexts or if responses are relatively heterogeneous.

Table IV:

A full description of the thirteen features used in our study.

Feature Sensor Units Citation Description

NN Mean PPG Seconds [17], [35], [41] Mean of NN intervals
NN SD PPG Seconds [17], [41] Standard deviation of NN intervals
NN RMSSD PPG Seconds [17], [22], [37], [41] Root mean square of successive differences
NN Prc20 PPG Seconds [47] 20th percentile of NN intervals
NN Prc80 PPG Seconds [48] 80th percentile of NN intervals
NN LFn PPG % Power [17], [37], [38], [41] Power in 0.04–0.15 Hz / total power
NN HFn PPG % Power [17], [37], [38], [41] Power in 0.15–0.40 Hz / total power
ST Mean TMP °Celsius [41] Mean of temperature samples
ACC Mean ACC G-force [36] Mean of [x,y,z]
ACC Sd ACC G-force [36] Standard deviation of [x,y,z]
SCL Mean EDA μSiemens [22], [40] Mean of tonic signal
SCR Mean EDA μSiemens [22], [40] Mean of phasic signal
SCR Peaksn EDA Peaks/Time [35], [39] Count of SCR peaks / total time.
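As an illustration, the NN-interval summary features in Table IV can be computed directly from an array of NN intervals with NumPy. This is a minimal sketch; the function name and example values are our own, not from the study code.

```python
import numpy as np

def nn_features(nn):
    """Summary features over a 1-D array of NN intervals (seconds)."""
    nn = np.asarray(nn, dtype=float)
    diffs = np.diff(nn)  # successive differences between adjacent intervals
    return {
        "NN Mean": nn.mean(),
        "NN SD": nn.std(ddof=1),
        "NN RMSSD": np.sqrt(np.mean(diffs ** 2)),  # root mean square of successive differences
        "NN Prc20": np.percentile(nn, 20),
        "NN Prc80": np.percentile(nn, 80),
    }

feats = nn_features([0.8, 0.9, 0.8, 1.0])
```

The frequency-domain features (NN LFn, NN HFn) require a spectral estimate and are described in Section II-E.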

II. Method

A. Participants

The participants were undergraduate students at a large public university in the southeast of the United States (see Table I). N = 55 participants consented to participate for psychology course credit. All participants completed the Social Interaction Anxiety Scale (SIAS; [26]) and scored either 10 or lower (indicating low symptoms of social anxiety) or 34 or higher (indicating moderate to severe symptoms of social anxiety). Due to logistical complexities of recruiting during COVID-19, only nine recruited participants had low SIAS scores. Given the small size of this sample, these nine participants were excluded from analyses, yielding a final sample of N = 46 participants with moderate to severe social anxiety symptoms. For this reason, the main contribution of this paper is to describe physiological responses in different social contexts among socially anxious individuals. Note also that although this is not a clinical sample, it can be considered a clinical analog sample, as the SIAS is correlated with diagnostic status and SIAS scores convey helpful information about the presence and severity of social anxiety symptoms. The SIAS has been used extensively in the literature on social anxiety to select highly socially anxious participants or assess the severity of social anxiety symptoms (e.g., [27]–[32]). Finally, the sample was not partitioned into SIAS symptom severity subgroups during analysis due to concerns about small sample sizes.

Table I:

Demographics data for participants.

Variable High SIAS (N = 46)

Age
μ 19.23
σ 1.91
Sex Assigned at Birth
 Female 35
 Male 11
Race*
 African American 4
 American Indian/Alaska Native 1
 Asian 8
 Middle Eastern 2
 Native Hawaiian/Pacific Islander 0
 White 38
English First Language
 Yes 44
 No 2
*

When self-reporting their race, participants were allowed to select all that applied. In total, 7 participants indicated more than one race.

A SIAS cutoff score of 34 was selected based on the mean score of a sample diagnosed with social anxiety disorder (M = 34.6; SD = 16.4; [26], [33]). The average pre-enrollment SIAS score was 45.57 (SD = 8.87), with scores ranging from 34 to 69 (out of a possible total of 80). Additionally, participants were excluded if they endorsed routine use of certain medications (benzodiazepines, stimulants, antipsychotics, mood stabilizers, beta-blockers, monoamine oxidase inhibitors, medications that cross the blood-brain barrier, or pain medications) or a diagnosis of cardiovascular disease, all of which can influence psychophysiological reactivity. Accordingly, participants were also asked to refrain from using the aforementioned medications, along with caffeine, nicotine, and marijuana, and to avoid vigorous exercise for at least two hours prior to the study session, and to avoid alcohol and illegal drugs for at least 24 hours prior to the study session. To assess routine use of exclusionary medications, participants completed a screening survey to determine study eligibility. Participants who were determined to be eligible and scheduled for a study session completed an additional brief screening survey on the day of their study participation, immediately prior to initiating study procedures. Two participants who endorsed occasional use of stimulants and medications that cross the blood-brain barrier were permitted to participate after reporting that they had refrained from taking their medication for at least two hours prior to the start of study procedures.

B. Procedure

All study procedures were approved by a university Institutional Review Board (UVA-SBS IRB #3004). Participants attended a three-hour study session, which was conducted virtually via Zoom due to the COVID-19 pandemic, in groups of four to six. Sessions were led by trained undergraduate and/or graduate students in psychology. To standardize study procedures, each session followed a detailed protocol (including specific text to be read to participants) that was written by graduate students in clinical psychology and edited by a clinical psychology faculty member. Undergraduate students observed and practiced the protocol multiple times, and were only allowed to lead study sessions after first successfully leading a practice session that was supervised by graduate student researchers. Further, when undergraduate students led study sessions, a graduate student supervisor was present on the Zoom call to assist in case of any unanticipated events.

During the study session, participants completed up to four social tasks and one non-social task. The first task completed was always a non-social baseline in which participants watched one of three neutral videos of colorful moving shapes for two minutes by themselves (valence for the neutral videos was established using a prior mTurk sample; [34]). Subsequently, participants completed two dyadic conversations (lasting four minutes each) and two group conversations (lasting six minutes each) in random order. In each conversation, participants were provided with one of four pre-selected conversation topics (e.g., “If you won a million dollars, what would you do with the money and why?”). We selected these general conversation questions with the goal of standardizing the experience of engaging in a conversation when presented with a novel topic. The questions used were selected based on our pilot work and experience that most people can readily come up with initial responses to these questions but have more difficulty sustaining the conversation over time, which can be particularly anxiety-provoking for people high in trait social anxiety. For each of the two dyadic and two group conversations, one was designated as being explicitly socially evaluative, such that participants were informed prior to the conversation that their conversation partner(s) would rate them on the basis of their likeability and conversation skills after talking. To reinforce the researcher’s instruction, participants rated the extent to which they believed their conversation partner(s) was likeable and a good communicator on a scale from 0 (not at all) to 4 (very much) immediately following each of the explicitly evaluative conversations.
The other two conversations did not involve this instruction and were thus implicitly socially evaluative, deemed as such because socially anxious individuals are likely to perceive any unfamiliar social interaction as involving potential social-evaluative threat, even if they are not explicitly made aware of this threat [15].

All five tasks were divided into three separate temporal phases: (1) the anticipatory or pre-event phase, in which participants were informed about the upcoming task and instructed to vividly imagine doing the task for two minutes (e.g., imagine themselves talking to a stranger who would then be evaluating them); (2) the concurrent phase, during which participants completed the task; and (3) the post-event phase, in which participants were instructed to spend two minutes vividly reflecting on the task they just completed and how they felt the task went. The order of tasks was randomized for each experimental group to reduce bias due to potential order effects. Participants’ physiological responding was continuously monitored throughout the three-hour study session via an Empatica E4 wristband (https://www.empatica.com/research/e4/) worn on the left wrist. For each task, participants also completed a brief self-report survey that asked about affective and cognitive responses tied to social anxiety. Variations of the survey were completed four times per task: (1) prior to learning about the upcoming task, (2) after the anticipatory phase, (3) after the concurrent phase, and (4) after the post-event phase. Our analyses focus on the passively-collected physiological data given our interest in passive sensing as a less obtrusive and burdensome context detection method in daily life compared to self-report. See Fig. 1 and 2 for a visual representation of the study procedures.

Fig. 1:

The order of events for an experimental group. The order (y-axis) was randomized for each group – except for Alone, which was always first. The time before tasks began was due to setup needs (e.g., demonstrating how to wear wristbands).

Fig. 2:

Flowchart of study procedures. The alone experience was always first. Then, dyadic and group conversations on pre-assigned topics were presented in random order. Physiological data was collected throughout all Pre, During, and Post phases.

C. Sensors

For the present study, we collected data from four sensors on the Empatica E4: photoplethysmography (PPG), accelerometer (ACC), electrodermal activity (EDA), and skin temperature (TMP). These four sensors were a subset of the entire study dataset, which included data from additional sensors and devices. Detailed descriptions of the four sensors used in this study can be found in Table II.

Table II:

The sensors used in the study.

Name Rate Description Location Resolution

PPG 64Hz Photoplethysmography Outer Wrist 0.9 nW
ACC 32Hz 3-Axis Accelerometer Outer Wrist 0.016 g
EDA 4Hz Electrodermal Activity Inner Wrist ~900 pS
TMP 4Hz Skin Temperature Outer Wrist 0.02 °C

Sensors were selected due to their alignment with existing work on passive anxiety and/or stress detection [22], [35]–[41]. While predicting stress and anxiety was not the goal of this work, it was assumed that individuals high in trait social anxiety severity (as indicated by the SIAS) would experience elevated anxiety around social interactions. Additionally, given our goal of using scalable methods for detecting real-world social interactions, these particular sensors were selected due to their being available on consumer-grade wearable devices or having the potential to be integrated into such devices.

In addition, three of the sensors (PPG, EDA, and TMP) have biological connections to anxiety responses. PPG sensors measure heart rate. Heart rate, controlled by the sympathetic (fight-or-flight) and parasympathetic (relaxation) nervous systems [42], provides insight into a person’s balance between relaxation and fight-or-flight. EDA sensors measure skin conductance. Skin conductance is primarily controlled by the sympathetic nervous system [43] and gives insight into fight-or-flight engagement. TMP sensors measure skin temperature which has been shown to vary with stress [44]. We also chose to examine features from the ACC sensor given that physical movement may be relevant for detecting when someone is in a social context (e.g., gesturing while speaking; movement between different locations in which one may encounter other people). In line with this, previous work has found that activity level, as captured by different accelerometer features, is predictive of social anxiety symptoms, depression symptoms, and positive and negative affect [36]. Additionally, ACC data is collected by many commercial devices, thus enhancing scalability.

D. Sensor Cleaning

Here we describe the steps taken to prepare the raw sensor streams for feature generation.

a). Selection:

We excluded all sensor readings collected outside of the experimental phases and experiences. This left us with nine to fifteen distinct periods of sensor readings per participant (depending on how many tasks the participant completed), with gaps of several minutes between periods (see Fig. 1).

b). PPG:

We applied a 3rd order Butterworth filter with a low cut of 0.5Hz and a high cut of 8Hz to the raw PPG readings. This removed artifacts occurring at frequencies outside the range of normal heart rates. We then used the peak detection algorithm in [46] to extract systolic peaks and calculate normal-to-normal (NN) intervals. Both the filter and the peak detection algorithm were applied using NeuroKit2 [47]. We then evaluated several methods to remove noise from the raw NN intervals. Using nested leave-one-participant-out cross-validation (NLOPOCV), we trained random forest (RF) models to predict each of the tasks described in Section III from NN features (cf. Table IV). We considered an NN interval cleaning method superior if its downstream RF models had higher macro-averaged accuracy (macro-accuracy) averaged over all prediction tasks. We did not find any one method to be universally “best” (see Fig. 3).
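The filtering and peak-detection steps above can be sketched with SciPy primitives (NeuroKit2 wraps similar operations); the synthetic pulse signal, peak-spacing constraint, and function name here are illustrative assumptions, not the study's exact implementation.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

FS = 64  # E4 PPG sampling rate, Hz

def clean_ppg_to_nn(ppg, fs=FS):
    """Band-pass the raw PPG (0.5-8 Hz, 3rd-order Butterworth) and
    return NN intervals (seconds) from detected systolic peaks."""
    b, a = butter(3, [0.5, 8.0], btype="band", fs=fs)
    filtered = filtfilt(b, a, ppg)  # zero-phase filtering
    # Illustrative constraint: peaks at least 0.4 s apart, above the midline.
    peaks, _ = find_peaks(filtered, distance=int(0.4 * fs), height=0)
    return np.diff(peaks) / fs

# Synthetic 30 s "pulse" at 72 bpm (1.2 Hz) plus high-frequency noise.
t = np.arange(0, 30, 1 / FS)
ppg = np.sin(2 * np.pi * 1.2 * t) + 0.2 * np.sin(2 * np.pi * 20 * t)
nn = clean_ppg_to_nn(ppg)
```

With the 20 Hz noise component attenuated by the band-pass, the recovered NN intervals cluster near the true beat period of 1/1.2 ≈ 0.83 s.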

Fig. 3:

There is no “best” filter for NN intervals with respect to RF model performance on the five tasks in Section III. This analysis included NN features only and followed an NLOPOCV procedure to evaluate model performance. The asterisk indicates using all data (2–6 minutes). The automatic filter is described in [45], and the rules filter dropped NN intervals outside of 0.5 to 1.5 seconds.

All filtering methods outperformed the unfiltered baseline in shorter time windows. This is expected: features derived from shorter windows are calculated from smaller samples (sampling rates held equal), and small-sample statistics are more sensitive to noise, increasing the value of filtering. As window duration increases, unfiltered features begin to perform similarly to filtered features, as one would expect if the noise is independent with zero mean.

Perhaps most surprising is the median filter performance plateau for windows greater than 120 seconds. A plausible explanation for this is autocorrelation within the noise, given that the median filter attempts to replace readings using temporally adjacent data that appear less extreme. The median filter is the only filter considered here that replaces noisy readings rather than removing them.

For the analysis in Section III and the Appendix, we used the automatic filter with a 40% cutoff rule [45] to remove outlying NN-intervals before calculating features.
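Of the filters compared in Fig. 3, the rule-based variant is simple enough to express in a line of NumPy; the automatic filter of [45] that we actually used is more involved, so only the rules filter is sketched here.

```python
import numpy as np

def rule_filter_nn(nn, lo=0.5, hi=1.5):
    """Drop NN intervals outside [lo, hi] seconds (the 'rules' filter in Fig. 3)."""
    nn = np.asarray(nn, dtype=float)
    return nn[(nn >= lo) & (nn <= hi)]

kept = rule_filter_nn([0.3, 0.8, 0.9, 2.0, 1.1])  # 0.3 and 2.0 are dropped
```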

c). EDA:

We applied a lowpass 4th order Butterworth filter with a high cut of 3Hz to the raw EDA signal. As with PPG, EDA cleaning was performed using NeuroKit2 [47].

d). ACC and TMP:

No filtering or cleaning was applied to accelerometer and skin temperature readings. This is appropriate for the features derived from these sensors (i.e., mean and standard deviation) under the assumption that noise on these sensors was independent, additive, and zero-mean.

E. Features

To answer our three questions, we selected 13 features (Table IV) from the literature that have been found to vary with anxiety and/or stress (references are in Table IV). These features were calculated from each participant’s cleaned sensor data for each complete event-phase (i.e., full pre-event, full concurrent-event, and full post-event). While most of the features are simple summary statistics, a few merit a more in-depth explanation of how they were calculated.

To calculate NN LFn and NN HFn, we applied the Lomb-Scargle method to the NN intervals, using evenly spaced basis functions between 0.01 and 0.5Hz [49]. To decompose the EDA signal into its tonic and phasic components, we used the method described in [50].
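The normalized LF/HF computation can be sketched with SciPy's Lomb-Scargle periodogram, which handles the unevenly spaced beat times without resampling. The band edges follow Table IV; the frequency-grid resolution and synthetic NN series are illustrative assumptions.

```python
import numpy as np
from scipy.signal import lombscargle

def nn_lfn_hfn(nn):
    """Normalized LF (0.04-0.15 Hz) and HF (0.15-0.40 Hz) power of an
    NN-interval series via the Lomb-Scargle periodogram."""
    nn = np.asarray(nn, dtype=float)
    t = np.cumsum(nn)                    # beat times (s), unevenly spaced
    y = nn - nn.mean()                   # center before spectral estimation
    freqs = np.linspace(0.01, 0.5, 200)  # Hz, as in the text
    power = lombscargle(t, y, 2 * np.pi * freqs)  # expects angular frequency
    total = power.sum()
    lf = power[(freqs >= 0.04) & (freqs < 0.15)].sum()
    hf = power[(freqs >= 0.15) & (freqs <= 0.40)].sum()
    return lf / total, hf / total

# NN series with a slow 0.1 Hz (LF-band) oscillation around 0.85 s.
beat_times = np.arange(300) * 0.85
nn_series = 0.85 + 0.05 * np.sin(2 * np.pi * 0.1 * beat_times)
lfn, hfn = nn_lfn_hfn(nn_series)
```

Because the simulated oscillation sits in the LF band, the normalized LF power dominates the HF power.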

After calculating the features we were left with a 584 sample dataset where each sample represented a specific participant in a specific event/phase (e.g., participant 3 in the pre-event phase of an implicit threat dyadic conversation). Each sample had our 13 calculated features and three labels: phase (pre, concurrent, and post), size (alone, dyadic, and group), and threat (implicit or explicit).
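The resulting dataset can be pictured as a table with one row per participant-event-phase combination; this pandas sketch shows the schema only, with made-up feature values and just two of the 13 feature columns.

```python
import pandas as pd

# One row per participant x event x phase; label columns follow Section II-E.
rows = [
    {"participant": 3, "phase": "pre", "size": "dyadic", "threat": "implicit",
     "NN Mean": 0.84, "SCR Peaksn": 0.12},   # illustrative values
    {"participant": 3, "phase": "concurrent", "size": "dyadic", "threat": "implicit",
     "NN Mean": 0.79, "SCR Peaksn": 0.31},
]
samples = pd.DataFrame(rows)
```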

F. Feature Correlations

The correlation matrix for the 13 features can be seen in Fig. 4. The majority of features have small inter-correlations, which suggests that each feature is providing distinct information. The features exhibiting the highest correlations are all derived from the PPG sensor.

Fig. 4:

The correlation matrix for all 13 calculated features. Note that the majority of features have low correlation suggesting features provide distinct information.

G. Feature Conditioning

Before conducting the analysis, we considered whether to center or scale data at the participant level. To center, we subtracted a participant’s feature-wise median; to scale, we divided a participant’s features by their feature-wise interquartile range. See Table III for the mean RF model macro-accuracy across all tasks described in Section III. Based on the results, we centered but did not scale at the participant level before performing the RF analysis in Section III. It is worth noting that, although centering and scaling ordinarily have no effect on tree-based models, per-participant conditioning can still matter here because it changes how feature values compare across participants in the pooled dataset. Centering at the participant level also has the secondary effect of correcting for potential sensor biases due to environmental factors, such as ambient air temperature.
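The per-participant conditioning can be sketched with a pandas groupby; the function name and toy data are our own, but the transforms match the text (median centering, IQR scaling).

```python
import pandas as pd

def condition(df, feature_cols, center=True, scale=False):
    """Per-participant conditioning: subtract the participant-wise median
    and (optionally) divide by the participant-wise interquartile range."""
    g = df.groupby("participant")[feature_cols]
    out = df.copy()
    if center:
        out[feature_cols] = out[feature_cols] - g.transform("median")
    if scale:
        # IQR is shift-invariant, so computing it from the raw data is fine.
        iqr = g.transform(lambda s: s.quantile(0.75) - s.quantile(0.25))
        out[feature_cols] = out[feature_cols] / iqr
    return out

df = pd.DataFrame({"participant": [1, 1, 1, 2, 2, 2],
                   "f": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]})
centered = condition(df, ["f"])                # center only (as used here)
standardized = condition(df, ["f"], scale=True)
```

After centering, each participant's feature median is zero, which removes stable between-person offsets such as the sensor biases mentioned above.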

Table III:

We achieve the best mean macro-accuracy across all tasks examined in Section III when we pre-condition by only centering features at the participant level. The reported performance is realized by RF models trained via NLOPOCV.

Center Participants Scale Participants Mean Macro-Accuracy

No No 61.3% ± 2.3% SE
Yes No 65.7% ± 2.4% SE
No Yes 61.7% ± 2.1% SE
Yes Yes 61.3% ± 2.1% SE

III. Results

We answer four questions regarding participants in our study:

  1. How do passively sensed biomarkers differ between being alone vs. in a social situation?

  2. How do passively sensed biomarkers differ between the anticipatory/pre-event, event, and post-event phases of social situations?

  3. How do passively sensed biomarkers differ between dyadic vs. group interactions?

  4. How do passively sensed biomarkers differ between implicitly evaluative social interactions vs. explicitly evaluative social interactions?

To answer these questions, we calculated features using the full period of sensor readings for each phase. The length of these periods varied by a few minutes depending on the event type. For many of our features, the literature shows that window length differences of a few minutes should not change the expected feature value [51]–[53]. As a robustness check, we also provide an identical analysis in the supplement using features calculated from equal-length periods. The feature effect sizes and statistical significance are nearly identical between the two analyses. The RF model performance is slightly better for the full-period features presented here.

For the analyses that follow, statistical significance was determined via a Wilcoxon signed-rank test and the Benjamini-Hochberg procedure at the task level, with α = .05. Plotted point estimates for features represent medians with a 95% bootstrap CI (both of which align naturally with the non-parametric Wilcoxon signed-rank test). Plots that compare model performance present the average performance across folds in NLOPOCV. The performance metric, macro-accuracy (macro-averaged accuracy), controls for class imbalance in the test data. In all of our analyses, baseline macro-accuracy was 50%. Error bars in model performance figures represent one standard error. Hyperparameter tuning was applied in the inner cross-validation loop to determine RF complexity pruning as well as the feature selection method (i.e., either mutual information or ANOVA F-values).
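The univariate testing procedure can be sketched as follows: a paired Wilcoxon signed-rank test per feature, then Benjamini-Hochberg correction across features. The paired data here are synthetic, and the BH implementation is a standard step-up procedure rather than the study's exact code.

```python
import numpy as np
from scipy.stats import wilcoxon

def benjamini_hochberg(pvals):
    """BH-adjusted p-values (step-up procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    adj = p[order] * m / np.arange(1, m + 1)
    adj = np.minimum.accumulate(adj[::-1])[::-1]  # enforce monotonicity
    out = np.empty(m)
    out[order] = np.minimum(adj, 1.0)
    return out

# Synthetic paired feature values: one participant per row, alone vs. social.
alone = np.array([0.80, 0.82, 0.78, 0.85, 0.81, 0.79, 0.83, 0.80])
social = alone - np.array([0.05, 0.04, 0.06, 0.03, 0.05, 0.04, 0.06, 0.05])
p_raw = wilcoxon(alone, social).pvalue  # paired, non-parametric
```

In the full analysis, one such raw p-value is computed per feature, and the vector of p-values is passed through `benjamini_hochberg` before comparison against α = .05.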

A. Social Interaction

To compare our features when participants are and are not in a social interaction (i.e., alone vs. social tasks), we separated the event observations into an alone set, containing only data from the concurrent phase of the alone task, and a social set, containing data from the concurrent phase of all dyadic and group conversations.

Nine features were significantly different in the alone vs. social set (see Fig. 5 (a)). The three largest differences were NN RMSSD, ACC SD, and SCR Peaksn. We then performed a multivariate analysis to evaluate how unique each feature’s information was. Using NLOPOCV we determined that an RF model predicting whether a test example belongs to the alone or social set can achieve peak performance using only two features (see Fig. 5 (b)).

Fig. 5:

(a) Nine features have significant paired differences between the alone and social events. Points are medians with a 95% bootstrap CI. (b) Using NLOPOCV we estimate an RF model achieves near peak performance with two features. (c) We show feature importance for the two feature model via the decrease in macro-accuracy when a feature is permuted.

We used feature permutation to examine the importance of the features in the two-feature random forest model (Fig. 5 (c)). These features were SCR Peaksn and ACC SD. We can infer from Fig. 5 (b) that additional features do not provide additional information to the model, despite the fact that several features had significantly different medians between the two sets.
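A leave-one-participant-out evaluation with permutation-based importance can be sketched with scikit-learn. The data here are synthetic (ten participants, one signal-carrying feature), and the real pipeline additionally nests hyperparameter tuning inside each fold; this sketch shows only the outer loop and the permutation step.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(7)
n_participants, per = 10, 8
groups = np.repeat(np.arange(n_participants), per)   # participant IDs
y = np.tile([0, 1], n_participants * per // 2)       # e.g., alone vs. social
X = rng.normal(size=(len(y), 3))
X[:, 0] += 2.0 * y   # only feature 0 carries the class signal

def permuted_drop(X, y, groups, feature):
    """Mean drop in held-out macro-accuracy when `feature` is shuffled."""
    drops = []
    for tr, te in LeaveOneGroupOut().split(X, y, groups):
        clf = RandomForestClassifier(n_estimators=50, random_state=0)
        clf.fit(X[tr], y[tr])
        base = balanced_accuracy_score(y[te], clf.predict(X[te]))
        Xp = X[te].copy()
        Xp[:, feature] = rng.permutation(Xp[:, feature])  # break the feature
        drops.append(base - balanced_accuracy_score(y[te], clf.predict(Xp)))
    return float(np.mean(drops))
```

Permuting the informative feature produces a clear drop in held-out macro-accuracy, while permuting a noise feature does not, which is the logic behind the importance bars in Fig. 5 (c).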

B. Social Timing

To understand how features change with respect to timing, we performed two analyses: (1) compared features from social events to their combined pre and post phases, and (2) compared features from the pre phase to the post phase across all social tasks. Our comparison of the pre- and post-event social phases was motivated by literature suggesting that these phases may be characterized by related but distinct processes. For example, anticipatory (or pre-event) anxiety may resemble in-the-moment anxiety more closely and involve physiological arousal whereas post-event processing may be a primarily cognitive process [15].

1). Social vs. Pre/Post:

There were twelve features with significant paired differences between a social event and its pre/post phases. This analysis excluded all data from the alone experience. The three largest effect sizes were observed in ACC SD, SCR Mean, and NN SD. The full feature analysis can be seen in Fig. 6 (a).

Fig. 6:

(a) Twelve features have significant differences between social events and their pre/post phases. Points are medians with a 95% bootstrap CI. (b) Using NLOPOCV we estimate an RF model has peak performance with all features. (c) We show feature importance for the five feature model via the decrease in macro-accuracy when we permute a feature.

As before, we performed a multivariate analysis using RF models. Using NLOPOCV, we determined that only three features were needed for peak performance (Fig. 6 (b)). We then selected the best five features using the full dataset and used feature permutation with an RF model to determine feature importance. The order of importance was ACC SD, SCR Peaksn, SCR Mean, SCL Mean, and NN LFn (see Fig. 6 (c)).

2). Pre vs. Post:

We performed a second timing analysis comparing biomarker features between the pre- and post-event phases of social tasks. This yielded eight significant paired differences: five NN features, ST Mean, SCR Mean, and SCL Mean. The full results can be seen in Fig. 7 (a). The NLOPOCV multivariate RF analysis was not able to meaningfully remove any features or achieve strong overall performance (Fig. 7 (b)). This suggests that, given our features, it is not feasible to discriminate between the pre-event and post-event phases of social situations. Feature importance analysis for the four-feature model is provided in Fig. 7 (c).

Fig. 7:

(a) Eight features have significant paired differences between the pre and post phases. Points are medians with a 95% bootstrap CI. (b) Using NLOPOCV we estimate that an RF model achieves peak performance with five features. (c) We show feature importance for the five-feature model via the decrease in macro-accuracy when a feature is permuted.

C. Social Interaction Size

Using dyadic event features and group event features, we repeated our analysis to determine whether our passive sensors could distinguish the number of individuals in a social interaction (Fig. 8 (a)). We observed only four significant feature differences: ACC SD, ACC Mean, NN Prc20, and SCR Peaksn. Fitting a random forest model, we achieved a predictive accuracy of 60% with a seven-feature model (Fig. 8 (b)). Feature importance analysis revealed that ACC SD was the most important feature for this prediction task (Fig. 8 (c)).

Fig. 8:

(a) Four features have significant paired differences between dyadic and group events. Points are medians with a 95% bootstrap CI. (b) Using NLOPOCV we estimate an RF model realizes peak performance with seven features. (c) We show feature importance for the five feature model via the decrease in macro-accuracy when a feature is permuted.

D. Social Threat

Examining differences in biomarkers between implicit and explicit social threat situations, we observed only one significant feature difference: SCL Mean (Fig. 9 (a)). Similarly, the NLOPOCV-trained RF models struggled to perform above the 50% baseline (Fig. 9 (b)). The fact that the models consistently performed worse than random guessing suggests that participants differ from one another in systematic ways (since NLOPOCV excludes all data from the test participant when training). Feature importance for a four-feature model is provided in Fig. 9 (c).

Fig. 9:

(a) One feature has a significant paired difference between the implicit and explicit evaluation conditions. Points are medians with a 95% bootstrap CI. (b) Using NLOPOCV we estimate that an RF model achieves peak performance with ten features. (c) We show feature importance for the four-feature model via the decrease in macro-accuracy when a feature is permuted.

E. Cluster Analysis

Existing work [54], [55] has shown that physiological data can exhibit considerable heterogeneity across individuals. Motivated by this, we performed a cluster analysis to evaluate the heterogeneity of our own data. Before clustering, we reduced the data to the features most important for prediction in each task (see Fig. 5 (c), Fig. 6 (c), Fig. 7 (c), Fig. 8 (c), and Fig. 9 (c)). We then applied Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) [56], [57] to our data. This algorithm both determines the appropriate number of clusters and filters out outlying data points.

For alone vs. social, HDBSCAN found six clusters with an average purity over 90% (purity is the percentage of points in the cluster belonging to the dominant label). Additionally, 54% of points were marked as outliers and therefore not assigned to any cluster. For social vs. pre/post, HDBSCAN found 22 clusters. The clusters that primarily contain samples from the social events have an average purity of 74% while the clusters primarily containing samples from the pre/post events have an average purity of 85%. For the remaining three tasks the same summary statistics can be found in Table V. A visualization of the alone vs. social clustering can be seen in Fig. 10.

Table V:

Across all tasks we identified either a high number of distinct clusters or many outlying data points. This suggests a high amount of heterogeneity within the data. The one exception was the set of features from the alone event: almost all alone examples either belonged to a single cluster or were considered outliers. In the table below, purity is reported as the percentage of the majority label in the cluster.

Task                   Label     Cluster Count   Cluster E[Size]   Cluster E[Purity]   Outlier Count
Alone vs. Social       Alone           1              33               84.8%                16
Alone vs. Social       Social          5              11.8             98.8%                95
Social vs. Pre/Post    Social          7              10               74%                  73
Social vs. Pre/Post    Pre/Post       15              11.8             85%                 129
Pre vs. Post           Pre             8               6.8             69.2%                95
Pre vs. Post           Post           11               6.6             76.8%                75
Dyadic vs. Group       Dyadic          2               5.5             73.3%                61
Dyadic vs. Group       Group           3              11.3             58.4%                52
Implicit vs. Explicit  Implicit        5               8.2             65.8%                61
Implicit vs. Explicit  Explicit        8               7.4             66.8%                52

Fig. 10:

Examples taken from social experiences create six distinct clusters while non-social examples largely belong to a single cluster. This two-dimensional plot is a t-SNE projection of the true underlying data, which is why some of the clusters may appear more spread out than one might expect.

We did not apply PCA dimensionality reduction before clustering. PCA is a common preprocessing step for clustering algorithms that use Euclidean distances (as HDBSCAN does), motivated by the fact that Euclidean distance quality is known to degrade in high-dimensional data [58]. In our case, we clustered on two to five features (depending on the task), leaving little to gain from further dimensionality reduction. We did, however, apply a quantile transformation to the features before clustering to ensure all features had a similar scale.
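The quantile transformation mentioned above can be sketched as follows. The two synthetic columns are hypothetical stand-ins for features on very different scales (e.g., NN intervals in milliseconds vs. SCR peak counts).

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(2)
# Two hypothetical features on very different scales.
X = np.column_stack([rng.normal(800, 50, 300), rng.poisson(3, 300)])

# Map each feature to a uniform [0, 1] distribution so Euclidean distances
# in the clustering step are not dominated by the larger-scale feature.
qt = QuantileTransformer(output_distribution="uniform", n_quantiles=100)
Xq = qt.fit_transform(X)

print(Xq.min().round(2), Xq.max().round(2))
```

Because the transform is rank-based, it also dampens the influence of extreme values, which otherwise distort density-based clustering.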

It is interesting to observe that while we found many clusters, there were still fewer clusters than participants. This suggests that while participant physiological data was heterogeneous it fell short of each participant being entirely unique. This fits with the population thinking promoted in other areas of research [54], [55].

It is also notable that the pre/post phases exhibited the greatest heterogeneity. In these phases we identified 15 distinct clusters of physiological features. Furthermore, 42.16% of pre/post data points were not assigned to any cluster (i.e., were marked as outliers by the HDBSCAN algorithm). Despite this heterogeneity, we were still able to discriminate between social and pre/post events as shown above in Fig. 6 (b).

We also see that the cluster purity, in all tasks, is greater than the reported accuracy of the predictive models. This difference is explained by the large number of outliers in the cluster analysis. That is, in many cases, the number of outliers is greater than the number of data points within the clusters (i.e., Cluster Count × Cluster E[Size] < Outlier Count in Table V). Even so, it is noteworthy that there are relatively large clusters with high purity across all tasks. For example, the prediction models for implicit and explicit threats were unable to outperform random guessing (Fig. 9 (b)). However, the cluster analysis on this same data identified five clusters for implicit threat with an average 65.8% class purity and eight clusters for explicit threat with an average 66.8% class purity (see Table V). This cluster purity was only achieved after marking over half the data from this task as outliers, explaining why the prediction model performed more poorly.

Finally, we also performed a cluster analysis controlling for baseline differences between participants. To do this, we clustered on the feature differences from each participant’s alone video measurements. The results of this analysis are presented in the Appendix. In general, this analysis results in fewer clusters overall with slightly lower cluster purity.

IV. DISCUSSION

The goal of the present study was to describe physiological patterns associated with different social and non-social tasks, determine if and how these patterns differ across contexts, phases, and levels of evaluation, and explore whether socially anxious individuals are characterized by common physiological response patterns across tasks. To explore these questions, we: (1) examined whether passively sensed biomarkers relevant to anxiety and stress detection differed across social vs. non-social tasks, phases of social interactions, social interaction group sizes, and levels of evaluative threat; (2) explored feature importance and prediction accuracy for random forest models predicting social context; and (3) used clustering analyses to explore whether participants were characterized by common patterns in physiological responding across five tasks.

Our results suggest that the passively sensed biomarkers used in this study can reliably distinguish between periods when individuals are engaged in social interaction vs. not (i.e., social interaction vs. non-social task; during social interaction vs. pre/post social interaction). However, these physiological features could not distinguish between anticipatory/pre-event anxiety and post-event processing immediately before and after a social event, nor could they reliably distinguish more subtle contextual features of social interactions (e.g., dyadic vs. group interactions; implicit vs. explicit social evaluation). These findings align with previous research demonstrating that it is feasible to use passively sensed biomarkers to distinguish between baseline, anticipatory, and concurrent anxiety phases [16], [20]. However, they are inconsistent with studies that have successfully used passive sensing to differentiate between contextual features of social interactions. For instance, [22] identified differences in heart rate, skin conductance, and voice features across levels of social evaluation during a virtual reality speech task, and [25] used location data to differentiate group sizes during social interactions (although the latter also relied on WiFi and real-time location services to identify people who were near each other). In the present study, it may be that the experimental manipulations (e.g., telling people that they would be evaluated vs. not) did not produce enough of a change in anxiety to detect physiologically. Indeed, it was common for anxious responses to exhibit “discordance” [59], or a lack of strong associations among components of anxiety (e.g., heightened threat perception without accompanying physiological arousal). Consistent with this, previous work with this dataset did not find evidence for concordance among affective, cognitive, behavioral, and physiological components of social anxiety during social interactions [34].

Interestingly, there were some cases in which we observed significant paired differences in median values of a given feature across contexts but the predictive models were unable to reliably distinguish between these contexts. This pattern of results suggests that these features may differ across contexts in terms of their central tendency, but that there is substantial overlap and variability in the data across the full time window. In other words, significant differences between features could sometimes be observed by leveraging data from multiple participants to increase the power of our tests. However, this did not necessarily translate to an effective model since the model only has access to a single example at prediction time.

Regarding specific, feature-level findings, the accelerometer standard deviation consistently emerged as the feature most able to distinguish between social and non-social contexts. This is likely reflective of the fact that participants were only speaking while actively participating in one of the social interaction tasks and were otherwise asked to sit quietly at their computer. Given that people tend to gesture while speaking [60], it is understandable why this feature may be particularly useful in detecting situational contexts involving conversation. These findings are also consistent with previous work demonstrating that accelerometer data can be used to accurately detect when people are speaking in a conversation [61]. Additionally, the standard deviation of the accelerometer was significantly lower during group vs. dyadic interactions and emerged as the most informative feature in the social group size model, which may be explained by conversational turn-taking patterns [62]. That is, there were likely more opportunities (and a greater expectation) to speak, and thus gesture, during dyadic conversations as compared to group conversations based on the number of people present in the conversation. Taken together, our findings in conjunction with previous research suggest that studies aiming to detect social interaction in real time would benefit from including an accelerometer sensor among their passive sensing streams.

Our results also indicate that EDA features, in both the frequency and time domains, are important for detecting when a person is actively engaged in social interaction. These findings are consistent with research linking increased electrodermal activity to stress and anxiety [41], [63].

Within the frequency domain, both SCR Mean and SCL Mean moved as expected based on the literature [50], [64]. This suggests that EDA frequency domain features may be robust to movement artifacts, a commonly observed problem with Empatica E4 EDA sensors [65]. This observation cannot be verified by previous work validating the E4 EDA sensor [65] because that work only considered the EDA signal itself, not features derived from it. However, a similar observation was made in [50] regarding EDA frequency domain features in general. Additionally, [50] found that most of the power in human EDA signals lies below 0.4 Hz, which implies that the E4 EDA sampling rate is sufficient to capture this feature according to the Nyquist criterion.

Within the time domain, the number of phasic peaks was also an important predictor for many of our models. Where our results deviate from previous findings, however, is in the direction of the effect. In our data, the number of phasic peaks decreases during social events, which would suggest participant anxiety was decreasing; we do not believe this to be the case. One potential explanation for this surprising finding is that the EDA sensor fails to take readings due to participant hand movement during social interactions (see [65]). It is also worth noting that the E4 EDA sampling rate of 4 Hz is below the 8 Hz recommended in other literature [66]. The low sampling rate may have caused certain phasic peaks to go uncounted, but it seems unlikely that it would invert the effect direction. Regardless, it is interesting that despite this noise, the number of phasic peaks is still useful for predicting social contexts. This suggests that by training on data from a specific device in a specific setting, it is possible to find useful patterns within sensor noise. Further work is needed to understand how lower-noise EDA response patterns could impact social context detection.

To explore the predictive models from the five tasks in Section III in greater depth, we conducted a cluster analysis on each predictive task. The goal of these analyses was to examine whether participants exhibited shared or heterogeneous physiological responses with respect to the top-performing features in these tasks. Across all tasks, our results suggest that there is considerable variability in physiological response patterns between individuals in the same context, as well as important commonalities. For instance, five and seven clusters were found among responses to social contexts. This suggests that while there may be individual differences in responses to social interactions, we would not label these patterns as entirely person-specific. With this in mind, applying these models to a socially anxious individual may require some data to learn the individual's response clusters, but not as much data as a totally individualized model would require.

An important next step will be to determine what these clusters signify. For instance, it is plausible that the clusters represent state anxiety levels and that socially anxious individuals exhibit different patterns of movement and skin conductance depending on how anxious they feel in the moment. Alternatively, or in addition, the clusters might reflect other momentary characteristics of social anxiety, such as fears, assumptions of negative evaluation, concerns about some aspect of their social performance, or a desire to escape the social situation, etc. Understanding common physiological response patterns and the feelings or behaviors they map onto could aid in the eventual deployment of JITAIs. Of course, the results presented here are not sufficient on their own to develop a JITAI. Rather, this work is a valuable step towards understanding how passive sensing can be leveraged for the kind of context detection that will be critical to develop context-sensitive JITAIs.

A. Limitations and Future Directions

This study must be considered in light of its limitations. We acknowledge that characteristics of the study design and sample may limit its generalizability to non-virtual contexts.

First, the study was conducted virtually via Zoom, and the within-person experimenter-manipulated conversation conditions (e.g., dyadic vs. group) all involved talking to strangers about benign topics, so the variety of interactions was limited. It is unclear to what extent these findings would hold under different conditions; when talking to friends or interacting in person versus online, for example, participants may exhibit different patterns of gesturing (e.g., movements concentrated around the face versus a broader range of movements, [67]), altering ACC SD response patterns from virtual social contexts.

Second, our sample was relatively homogeneous with respect to age (M = 19.28; SD = 1.91), sex (76.1% female), race (67.4% White), and ethnicity (89.1% Non-Latinx/Hispanic). It is critical that we conduct future research with more diverse samples, particularly in light of findings that ostensibly “gold standard" psychophysiological measures such as EDA were developed and tested with predominantly White participants and typical data cleaning procedures can systematically exclude non-White participants who exhibit lower reactivity [68]. Accordingly, researchers might consider using more lenient data cleaning and processing approaches that maximize data inclusion and minimize the risk of systematic data exclusion. This is particularly true with ambulatory psychophysiological devices, which can already be less sensitive to physiological responses compared to laboratory-based assessment tools.

Third, we did not try to match or control for the various ways in which demographic characteristics (e.g., gender or race/ethnicity) among participants, or between participants and researchers, could have influenced study experiences or levels of social anxiety. Student researchers had a range of racial and ethnic identities, but a majority identified as cisgender women.

Fourth, our analysis examined how statistical descriptors of physiology changed with respect to different virtual social contexts. This approach is interpretable, and it is how we believe statistical models would be built in real-world social anxiety interventions. It also aligns with previous work in the literature [55] that analyzes physiological response features in response to emotional experiences. Even so, summary statistics can mask much of the interesting information about variability that exists in dynamic processes. Future work could focus on how these processes evolve over time and how this evolution varies from person to person.

Finally, there are inherent limitations with using wrist-worn devices to collect psychophysiological data. For example, the wristband used in our analysis, the Empatica E4, uses dry electrodes (vs. wet electrodes, which have been shown to improve EDA measurements [69]) and is sensitive to movement artifacts in both its PPG and EDA readings [65], [70]. Even so, by studying the capabilities of wearable devices, the research community can develop digital interventions and treatments with fewer barriers to adoption. Reducing barriers to adoption increases the potential impact of this work, outweighing the loss of perfect measurements. Moving forward, replication with different devices, larger and more diverse samples, including both diagnosed and non-anxious samples, and in non-virtual settings will be key.

Even with each of these potential sources of noise, our results demonstrate that social context can be passively inferred.

B. Conclusion

Despite these limitations, the present study advances our understanding of social context detection via passive sensing in several key ways. First and foremost, with the sensors used here we could reliably differentiate between social and non-social contexts, but not between social interactions characterized by more subtle contextual differences. The accelerometer standard deviation and the number of skin conductance response peaks were the two most important individual features for differentiating between social and non-social contexts. Within these features, there was considerable variability in physiological response patterns, but potentially meaningful clusters also emerged, providing evidence for both individual differences and commonalities between participants. Taken together, these findings suggest that accelerometer and skin conductance sensors can help identify when someone is in a social interaction, but likely do not uniquely characterize social interactions. That is, similar patterns may also be observed in other contexts, such as completing a stressful non-social task. Even so, we believe that passively sensed physiological data can serve as a useful complement to other methods for approximating social context (e.g., Bluetooth proximity sensors). Importantly, even knowing whether someone is with others or alone will help tailor and deploy JITAIs. Future work is needed to better detect characteristics of social interactions and to disentangle precisely what clusters of physiological response patterns in social interactions represent, to further improve JITAIs.

Supplementary Material

Supplemental Fig. 1 (a)–(c)
Supplemental Fig. 2 (a)–(c)
Supplemental Fig. 3 (a)–(c)
Supplemental Fig. 4 (b)–(c)
Supplemental Fig. 5 (a)–(c)
Supplemental Fig. 7
Supplemental PDF

Acknowledgments

Research reported in this publication was partially supported by a 3Cavaliers Seed Grant and by the National Institute of Mental Health of the National Institutes of Health under award number R01MH132138. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Contributor Information

Emma R. Toner, Department of Psychology, University of Virginia, Charlottesville, VA, USA.

Mark Rucker, Department of Systems and Information Engineering, University of Virginia, Charlottesville, VA, USA.

Zhiyuan Wang, Department of Systems and Information Engineering, University of Virginia, Charlottesville, VA, USA.

Maria A. Larrazabal, Department of Psychology, University of Virginia, Charlottesville, VA, USA.

Lihua Cai, Aberdeen Institute of Data Science and Artificial Intelligence, South China Normal University, Guangzhou, China.

Debajyoti Datta, Department of Systems and Information Engineering, University of Virginia, Charlottesville, VA, USA.

Haroon Lone, Department of Electrical Engineering and Computer Science, Indian Institute of Science Education and Research Bhopal, Bhopal, India.

Mehdi Boukhechba, Janssen Pharmaceutical Companies of Johnson & Johnson, Titusville, NJ, USA.

Bethany A. Teachman, Department of Psychology, University of Virginia, Charlottesville, VA, USA.

Laura E. Barnes, Department of Systems and Information Engineering, University of Virginia, Charlottesville, VA, USA.

References

  • [1].Kessler RC, Petukhova M, Sampson NA, Zaslavsky AM, and Wittchen H-U, “Twelve-month and lifetime prevalence and lifetime morbid risk of anxiety and mood disorders in the United States,” International Journal of Methods in Psychiatric Research, vol. 21, no. 3, pp. 169–184, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [2].American Psychiatric Association, Diagnostic and statistical manual of mental disorders, 5th ed. Washington, DC: American Psychiatric Association, 2022. [Google Scholar]
  • [3].Aderka IM, Hofmann SG, Nickerson A, Hermesh H, Gilboa-Schechtman E, and Marom S, “Functional impairment in social anxiety disorder,” Journal of Anxiety Disorders, vol. 26, no. 3, pp. 393–400, Apr. 2012. [DOI] [PubMed] [Google Scholar]
  • [4].Grant BF, Hasin DS, Blanco C, Stinson FS, Chou SP, Goldstein RB, Dawson DA, Smith S, Saha TD, and Huang B, “The Epidemiology of Social Anxiety Disorder in the United States: Results From the National Epidemiologic Survey on Alcohol and Related Conditions,” The Journal of Clinical Psychiatry, vol. 66, no. 11, p. 6546, Nov. 2005. [Google Scholar]
  • [5].Wang PS, Berglund P, Olfson M, Pincus HA, Wells KB, and Kessler RC, “Failure and Delay in Initial Treatment Contact After First Onset of Mental Disorders in the National Comorbidity Survey Replication,” Archives of General Psychiatry, vol. 62, no. 6, pp. 603–613, Jun. 2005. [DOI] [PubMed] [Google Scholar]
  • [6].Acarturk C, Cuijpers P, van Straten A, and de Graaf R, “Psychological treatment of social anxiety disorder: a meta-analysis,” Psychological Medicine, vol. 39, no. 2, pp. 241–254, Feb. 2009. [DOI] [PubMed] [Google Scholar]
  • [7].Kazdin AE, “Addressing the treatment gap: A key challenge for extending evidence-based psychosocial interventions,” Behaviour Research and Therapy, vol. 88, pp. 7–18, Jan. 2017. [DOI] [PubMed] [Google Scholar]
  • [8].Goetter EM, Frumkin MR, Palitz SA, Swee MB, Baker AW, Bui E, and Simon NM, “Barriers to mental health treatment among individuals with social anxiety disorder and generalized anxiety disorder,” Psychological Services, vol. 17, no. 1, pp. 5–12, Feb. 2020. [Google Scholar]
  • [9].Pauley D, Cuijpers P, Papola D, Miguel C, and Karyotaki E, “Two decades of digital interventions for anxiety disorders: a systematic review and meta-analysis of treatment effectiveness,” Psychological Medicine, vol. 53, no. 2, pp. 1–13, 2023. [Google Scholar]
  • [10].Chow D. Y.-w., Jiang X, and You JHS, “Information technology-based versus face-to-face cognitive-behavioural therapy for anxiety and depression: A systematic review and meta-analysis,” Journal of affective disorders, vol. 310, pp. 429–440, 2022. [DOI] [PubMed] [Google Scholar]
  • [11].Kambeitz-Ilankovic L, Rzayeva U, Völkel L, Wenzel J, Weiske J, Jessen F, Reininghaus U, Uhlhaas PJ, Alvarez-Jimenez M, and Kambeitz J, “A systematic review of digital and face-to-face cognitive behavioral therapy for depression,” NPJ Digital Medicine, vol. 5, no. 1, p. 144, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].Moshe I, Terhorst Y, Philippi P, Domhardt M, Cuijpers P, Cristea I, Pulkki-Råback L, Baumeister H, and Sander LB, “Digital interventions for the treatment of depression: A meta-analytic review.” Psychological bulletin, vol. 147, no. 8, pp. 749–786, 2021. [DOI] [PubMed] [Google Scholar]
  • [13].Nahum-Shani I, Smith SN, Spring BJ, Collins LM, Witkiewitz K, Tewari A, and Murphy SA, “Just-in-Time Adaptive Interventions (JITAIs) in Mobile Health: Key Components and Design Principles for Ongoing Health Behavior Support,” Annals of Behavioral Medicine, vol. 52, no. 6, pp. 446–462, May 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [14].van Genugten CR, Schuurmans J, Lamers F, Riese H, Penninx BW, Schoevers RA, Riper HM, and Smit JH, “Experienced Burden of and Adherence to Smartphone-Based Ecological Momentary Assessment in Persons with Affective Disorders,” Journal of Clinical Medicine, vol. 9, no. 2, p. 322, Jan. 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [15].Heimberg RG, Brozovich FA, and Rapee RM, “A cognitive-behavioral model of social anxiety disorder,” in Social Anxiety (Third Edition), 3rd ed. Academic Press, 2014, pp. 705–728. [Google Scholar]
  • [16].Shaukat-Jali R, van Zalk N, and Boyle DE, “Detecting Subclinical Social Anxiety Using Physiological Data From a Wrist-Worn Wearable: Small-Scale Feasibility Study,” JMIR Formative Research, vol. 5, no. 10, p. e32656, Oct. 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [17].Wen W, Liu G, Mao Z-H, Huang W, Zhang X, Hu H, Yang J, and Jia W, “Toward constructing a real-time social anxiety evaluation system: Exploring effective heart rate features,” IEEE Transactions on Affective Computing, vol. 11, no. 1, pp. 100–110, 2020. [Google Scholar]
  • [18].Rashid H, Mendu S, Daniel KE, Beltzer ML, Teachman BA, Boukhechba M, and Barnes LE, “Predicting Subjective Measures of Social Anxiety from Sparsely Collected Mobile Sensor Data,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 4, no. 3, pp. 1–24, Sep. 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [19].Boukhechba M, Chow P, Fua K, Teachman BA, and Barnes LE, “Predicting Social Anxiety From Global Positioning System Traces of College Students: Feasibility Study,” JMIR Mental Health, vol. 5, no. 3, p. e10101, Jul. 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [20].Pisanski K, Kobylarek A, Jakubowska L, Nowak J, Walter A, Błaszczyński K, Kasprzyk M, Łysenko K, Sukiennik I, Piątek K, Frackowiak T, and Sorokowski P, “Multimodal stress detection: Testing for covariation in vocal, hormonal and physiological responses to Trier Social Stress Test,” Hormones and Behavior, vol. 106, pp. 52–61, Nov. 2018. [DOI] [PubMed] [Google Scholar]
  • [21].Yadav M, Sakib MN, Nirjhar EH, Feng K, Behzadan AH, and Chaspari T, “Exploring individual differences of public speaking anxiety in real-life and virtual presentations,” IEEE Transactions on Affective Computing, vol. 13, no. 3, pp. 1168–1182, 2022. [Google Scholar]
  • [22].Barreda-Ángeles M, Aleix-Guillaume S, and Pereda-Baños A, “Users’ psychophysiological, vocal, and self-reported responses to the apparent attitude of a virtual audience in stereoscopic 360°-video,” Virtual Reality, vol. 24, no. 2, pp. 289–302, Jun. 2020. [Google Scholar]
  • [23].Wang R, Chen F, Chen Z, Li T, Harari G, Tignor S, Zhou X, Ben-Zeev D, and Campbell AT, “Studentlife: assessing mental health, academic performance and behavioral trends of college students using smartphones,” in Proceedings of the 2014 ACM international joint conference on pervasive and ubiquitous computing, 2014, pp. 3–14. [Google Scholar]
  • [24].Ben-Zeev D, Scherer EA, Wang R, Xie H, and Campbell AT, “Next-generation psychiatric assessment: Using smartphone sensors to monitor behavior and mental health,” Psychiatric Rehabilitation Journal, vol. 38, no. 3, pp. 218–226, Sep. 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [25].Zakaria NCB, Balan R, and Lee Y, “StressMon: Scalable detection of perceived stress and depression using passive sensing of changes in work routines and group interactions,” Proceedings of the ACM on Human-Computer Interaction, vol. 3, pp. 37:1–29, Nov. 2019. [Google Scholar]
  • [26].Mattick RP and Clarke JC, “Development and validation of measures of social phobia scrutiny fear and social interaction anxiety,” Behaviour Research and Therapy, vol. 36, no. 4, pp. 455–470, Apr. 1998. [DOI] [PubMed] [Google Scholar]
  • [27].Beltzer ML, Daniel KE, Daros AR, and Teachman BA, “Examining social reinforcement learning in social anxiety,” Journal of Behavior Therapy and Experimental Psychiatry, vol. 80, p. 101810, 2023. [Google Scholar]
  • [28].Chen J, Milne K, Dayman J, and Kemps E, “Interpretation bias and social anxiety: does interpretation bias mediate the relationship between trait social anxiety and state anxiety responses?” Cognition and Emotion, vol. 33, no. 4, pp. 630–645, 2019. [DOI] [PubMed] [Google Scholar]
  • [29].Daniel KE, Daros AR, Beltzer ML, Boukhechba M, Barnes LE, and Teachman BA, “How Anxious are You Right Now? Using Ecological Momentary Assessment to Evaluate the Effects of Cognitive Bias Modification for Social Threat Interpretations,” Cognitive Therapy and Research, vol. 44, no. 3, pp. 538–556, Jun. 2020. [Google Scholar]
  • [30].Jacobson NC, Summers B, and Wilhelm S, “Digital biomarkers of social anxiety severity: Digital phenotyping using passive smartphone sensors,” Journal of Medical Internet Research, vol. 22, no. 5, p. e16875, 2020. [Google Scholar]
  • [31].Rösler L, Göhring S, Strunz M, and Gamer M, “Social anxiety is associated with heart rate but not gaze behavior in a real social interaction,” Journal of Behavior Therapy and Experimental Psychiatry, vol. 70, p. 101600, 2021. [Google Scholar]
  • [32].Howell AN, Zibulsky DA, Srivastav A, and Weeks JW, “Relations among social anxiety, eye contact avoidance, state anxiety, and perception of interaction performance during a live conversation,” Cognitive behaviour therapy, vol. 45, no. 2, pp. 111–122, 2016. [DOI] [PubMed] [Google Scholar]
  • [33].Brown EJ, Turovsky J, Heimberg RG, Juster HR, Brown TA, and Barlow DH, “Validation of the social interaction anxiety scale and the social phobia scale across the anxiety disorders.” Psychological assessment, vol. 9, no. 1, pp. 21–27, 1997. [Google Scholar]
  • [34].Toner ER, Larrazabal MA, Cai L, Henry TR, MacCormack J, Boukhechba M, Barnes L, and Teachman B, “Social anxiety and concordance in emotional responses across levels of evaluative threat,” OSF, Oct 2023. [Google Scholar]
  • [35].Sun F-T, Kuo C, Cheng H-T, Buthpitiya S, Collins P, and Griss M, “Activity-aware mental stress detection using physiological sensors,” in Mobile Computing, Applications, and Services: Second International ICST Conference, MobiCASE 2010, Santa Clara, CA, USA, October 25–28, 2010, Revised Selected Papers 2. Springer, 2012, pp. 282–301. [Google Scholar]
  • [36].Boukhechba M, Daros AR, Fua K, Chow PI, Teachman BA, and Barnes LE, “Demonicsalmon: Monitoring mental health and social interactions of college students using smartphones,” Smart Health, vol. 9, pp. 192–203, 2018. [Google Scholar]
  • [37].Hernando D, Roca S, Sancho J, Alesanco Á, and Bailón R, “Validation of the apple watch for heart rate variability measurements during relax and mental stress in healthy subjects,” Sensors, vol. 18, no. 8, p. 2619, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [38].Ahn JW, Ku Y, and Kim HC, “A novel wearable EEG and ECG recording system for stress assessment,” Sensors, vol. 19, no. 9, p. 1991, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [39].Ihmig FR, Neurohr-Parakenings F, Schäfer SK, Lass-Hennemann J, and Michael T, “On-line anxiety level detection from biosignals: Machine learning based on a randomized controlled trial with spider-fearful individuals,” Plos one, vol. 15, no. 6, p. e0231517, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [40].Petrescu L, Petrescu C, Mitruț O, Moise G, Moldoveanu A, Moldoveanu F, and Leordeanu M, “Integrating biosignals measurement in virtual reality environments for anxiety detection,” Sensors, vol. 20, no. 24, p. 7088, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [41].Hickey BA, Chalmers T, Newton P, Lin C-T, Sibbritt D, McLachlan CS, Clifton-Bligh R, Morley J, and Lal S, “Smart devices and wearable technologies to detect and monitor mental health conditions and stress: A systematic review,” Sensors, vol. 21, no. 10, p. 3461, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [42].Kim H-G, Cheon E-J, Bai D-S, Lee YH, and Koo B-H, “Stress and heart rate variability: a meta-analysis and review of the literature,” Psychiatry investigation, vol. 15, no. 3, p. 235, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [43].Posada-Quintero HF and Chon KH, “Innovations in electrodermal activity data collection and signal processing: A systematic review,” Sensors, vol. 20, no. 2, p. 479, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [44].Vinkers CH, Penning R, Hellhammer J, Verster JC, Klaessens JH, Olivier B, and Kalkman CJ, “The effect of stress on core and peripheral body temperature in humans,” Stress, vol. 16, no. 5, pp. 520–530, 2013. [DOI] [PubMed] [Google Scholar]
  • [45].Karlsson M, Hörnsten R, Rydberg A, and Wiklund U, “Automatic filtering of outliers in RR intervals before analysis of heart rate variability in Holter recordings: a comparison with carefully edited data,” Biomedical Engineering Online, vol. 11, pp. 1–12, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [46].Elgendi M, Norton I, Brearley M, Abbott D, and Schuurmans D, “Systolic peak detection in acceleration photoplethysmograms measured from emergency responders in tropical conditions,” PloS one, vol. 8, no. 10, p. e76585, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [47].Makowski D, Pham T, Lau ZJ, Brammer JC, Lespinasse F, Pham H, Schölzel C, and Chen SHA, “NeuroKit2: A python toolbox for neurophysiological signal processing,” Behavior Research Methods, vol. 53, no. 4, pp. 1689–1696, Feb. 2021. [DOI] [PubMed] [Google Scholar]
  • [48].Plarre K, Raij A, Hossain SM, Ali AA, Nakajima M, Al’Absi M, Ertin E, Kamarck T, Kumar S, Scott M et al. , “Continuous inference of psychological stress from sensory measurements collected in the natural environment,” in Proceedings of the 10th ACM/IEEE international conference on information processing in sensor networks. IEEE, 2011, pp. 97–108. [Google Scholar]
  • [49].Fonseca DS, Netto A, Ferreira RB, and De Sa AM, “Lomb-scargle periodogram applied to heart rate variability study,” in 2013 ISSNIP Biosignals and Biorobotics Conference: Biosignals and Robotics for Better and Safer Living (BRC). IEEE, 2013, pp. 1–4. [Google Scholar]
  • [50].Posada-Quintero HF, Florian JP, Orjuela-Cañón AD, Aljama-Corrales T, Charleston-Villalobos S, and Chon KH, “Power spectral density analysis of electrodermal activity for sympathetic function assessment,” Annals of biomedical engineering, vol. 44, pp. 3124–3135, 2016. [DOI] [PubMed] [Google Scholar]
  • [51].Munoz ML, Van Roon A, Riese H, Thio C, Oostenbroek E, Westrik I, de Geus EJ, Gansevoort R, Lefrandt J, Nolte IM et al. , “Validity of (ultra-) short recordings for heart rate variability measurements,” PloS one, vol. 10, no. 9, p. e0138921, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [52].Shaffer F, Meehan ZM, and Zerr CL, “A critical review of ultrashort-term heart rate variability norms research,” Frontiers in neuroscience, vol. 14, p. 594880, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [53].Baek HJ, Cho C-H, Cho J, and Woo J-M, “Reliability of ultrashort-term analysis as a surrogate of standard 5-min analysis of heart rate variability,” Telemedicine and e-Health, vol. 21, no. 5, pp. 404–414, 2015. [DOI] [PubMed] [Google Scholar]
  • [54].Siegel EH, Sands MK, Van den Noortgate W, Condon P, Chang Y, Dy J, Quigley KS, and Barrett LF, “Emotion fingerprints or emotion populations? a meta-analytic investigation of autonomic features of emotion categories.” Psychological bulletin, vol. 144, no. 4, p. 343, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [55].Hoemann K, Khan Z, Feldman MJ, Nielson C, Devlin M, Dy J, Barrett LF, Wormwood JB, and Quigley KS, “Context-aware experience sampling reveals the scale of variation in affective experience,” Scientific reports, vol. 10, no. 1, pp. 1–16, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [56].McInnes L and Healy J, “Accelerated hierarchical density based clustering,” in Data Mining Workshops (ICDMW), 2017 IEEE International Conference on. IEEE, 2017, pp. 33–42. [Google Scholar]
  • [57].McInnes L, Healy J, and Astels S, “hdbscan: Hierarchical density based clustering,” The Journal of Open Source Software, vol. 2, no. 11, p. 205, 2017. [Google Scholar]
  • [58].Xia S, Xiong Z, Luo Y, Zhang G et al. , “Effectiveness of the euclidean distance in high dimensional spaces,” Optik, vol. 126, no. 24, pp. 5614–5619, 2015. [Google Scholar]
  • [59].Hollenstein T and Lanteigne D, “Models and methods of emotional concordance,” Biological Psychology, vol. 98, pp. 1–5, Apr. 2014. [DOI] [PubMed] [Google Scholar]
  • [60].Goldin-Meadow S, “The role of gesture in communication and thinking,” Trends in Cognitive Sciences, vol. 3, no. 11, pp. 419–429, Nov. 1999. [DOI] [PubMed] [Google Scholar]
  • [61].Hung H, Englebienne G, and Cabrera Quiros L, “Detecting conversing groups with a single worn accelerometer,” in Proceedings of the 16th International Conference on Multimodal Interaction. ACM, Nov. 2014, pp. 84–91. [Google Scholar]
  • [62].Stivers T, Enfield NJ, Brown P, Englert C, Hayashi M, Heinemann T, Hoymann G, Rossano F, de Ruiter JP, Yoon K-E, and Levinson SC, “Universals and cultural variation in turn-taking in conversation,” Proceedings of the National Academy of Sciences, vol. 106, no. 26, pp. 10587–10592, Jun. 2009. [Google Scholar]
  • [63].Globisch J, Hamm AO, Esteves F, and Öhman A, “Fear appears fast: Temporal course of startle reflex potentiation in animal fearful subjects,” Psychophysiology, vol. 36, no. 1, pp. 66–75, 1999. [DOI] [PubMed] [Google Scholar]
  • [64].Shimomura Y, Yoda T, Sugiura K, Horiguchi A, Iwanaga K, and Katsuura T, “Use of frequency domain analysis of skin conductance for evaluation of mental workload,” Journal of physiological anthropology, vol. 27, no. 4, pp. 173–177, 2008. [DOI] [PubMed] [Google Scholar]
  • [65].Milstein N and Gordon I, “Validating measures of electrodermal activity and heart rate variability derived from the Empatica E4 utilized in research settings that involve interactive dyadic states,” Frontiers in Behavioral Neuroscience, vol. 14, p. 148, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [66].Boucsein W, Electrodermal activity. Springer Science & Business Media, 2012. [Google Scholar]
  • [67].Garcia AC, “Embodied action in remote online interaction: A preliminary investigation of hand raising gestures in a zoom meeting,” vol. 14, no. 1, pp. 3–32. [Google Scholar]
  • [68].Bradford DE, DeFalco A, Perkins ER, Carbajal I, Kwasa J, Goodman FR, Jackson F, Richardson LNS, Woodley N, Neuberger L, Sandoval JA, Huang HJ, and Joyner KJ, “Whose Signals Are Being Amplified? Toward a More Equitable Clinical Psychophysiology,” Clinical Psychological Science, Dec. 2022. [Google Scholar]
  • [69].Kleckner IR, Feldman MJ, Goodwin MS, and Quigley KS, “Framework for selecting and benchmarking mobile devices in psychophysiological research,” Behavior Research Methods, vol. 53, no. 2, pp. 518–535, Apr. 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [70].Schuurmans AA, De Looff P, Nijhof KS, Rosada C, Scholte RH, Popma A, and Otten R, “Validity of the Empatica E4 wristband to measure heart rate variability (HRV) parameters: A comparison to electrocardiography (ECG),” Journal of Medical Systems, vol. 44, pp. 1–11, 2020. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplemental Fig. 1 (a)
Supplemental Fig. 1 (b)
Supplemental Fig. 1 (c)
Supplemental Fig. 2 (a)
Supplemental Fig. 2 (b)
Supplemental Fig. 2 (c)
Supplemental Fig. 3 (a)
Supplemental Fig. 3 (b)
Supplemental Fig. 3 (c)
Supplemental Fig. 4 (b)
Supplemental Fig. 4 (c)
Supplemental Fig. 5 (a)
Supplemental Fig. 5 (b)
Supplemental Fig. 5 (c)
Supplemental Fig. 7
Supplemental PDF