PLOS ONE. 2020 Apr 24;15(4):e0231982. doi: 10.1371/journal.pone.0231982

Rapid processing of neutral and angry expressions within ongoing facial stimulus streams: Is it all about isolated facial features?

Antonio Schettino 1,2,*, Emanuele Porcu 3, Christopher Gundlach 4, Christian Keitel 5,6,#, Matthias M Müller 4,#
Editor: José A. Hinojosa
PMCID: PMC7182236  PMID: 32330160

Abstract

Our visual system extracts the emotional meaning of human facial expressions rapidly and automatically. Novel paradigms using fast periodic stimulations have provided insights into the electrophysiological processes underlying emotional content extraction: the regular occurrence of specific identities and/or emotional expressions alone can drive diagnostic brain responses. Consistent with a processing advantage for social cues of threat, we expected angry facial expressions to drive larger responses than neutral expressions. In a series of four EEG experiments, we studied the potential boundary conditions of such an effect: (i) we piloted emotional cue extraction using 9 facial identities and a fast presentation rate of 15 Hz (N = 16); (ii) we reduced the facial identities from 9 to 2, to assess whether (low or high) variability across emotional expressions would modulate brain responses (N = 16); (iii) we slowed the presentation rate from 15 Hz to 6 Hz (N = 31), the optimal presentation rate for facial feature extraction; (iv) we tested whether passive viewing instead of a concurrent task at fixation would play a role (N = 30). We consistently observed neural responses reflecting the rate of regularly presented emotional expressions (5 Hz and 2 Hz at presentation rates of 15 Hz and 6 Hz, respectively). Intriguingly, neutral expressions consistently produced stronger responses than angry expressions, contrary to the predicted processing advantage for threat-related stimuli. Our findings highlight the influence of physical differences across facial identities and emotional expressions.

Introduction

The human brain is capable of rapidly processing differences in facial expressions and identifying those that signal threat, presumably due to the survival advantage of such an ability [1,2]. From an evolutionary point of view, speed of processing, identification, and discrimination are essential for the survival of an individual within a group, because not all members will exhibit identical (facial) expressions: in a threatening situation, some might express anger—i.e., aggression and the will to fight—, whereas others might express anxiety and a tendency for flight. This communication of contradictory information must be rapidly processed to allow the individual to adapt behavior accordingly.

An elegant way to investigate the time required for emotion detection and discrimination is to show stimuli in a rapid serial visual presentation (RSVP) stream at a certain frequency. In RSVP paradigms, each cycle is defined by the onset of a new image; thus, each image serves as a forward mask for the subsequent image and as a backward mask for the preceding one [3]. Such frequency-tagged stimulation streams evoke the steady-state visual evoked potential (SSVEP). SSVEPs have the same fundamental frequency as the RSVP stream and may include higher harmonics [4,5]. Attending to RSVP streams results in a modulation of the SSVEP, quantified either as an increase in amplitude [6–8] or as enhanced inter-trial phase consistency [9,10].

A number of studies demonstrated that the presentation of emotionally arousing stimuli resulted in enhanced SSVEPs compared with neutral stimuli [e.g., 11–13]. Manipulating the presentation frequency of the RSVP makes it possible to test the minimal presentation times for emotional cue extraction by analyzing SSVEPs in the frequency domain. In a recent study by Bekhtereva and Müller [14], we showed that presenting complex images from the International Affective Picture System [IAPS; 15] at a rate of 15 Hz (i.e., about 67 msec per image) was too rapid to enable emotional cue extraction, resulting in no discernible changes in SSVEPs between emotional and neutral images. Conversely, a 6 Hz presentation (i.e., about 167 msec per image) resulted in a significant modulation of SSVEPs as a function of emotional content. This presentation time was close to what was previously reported by Alonso-Prieto et al. [16] using RSVP paradigms that discriminated between faces with different identities. In that study, SSVEPs increased when different faces were presented in an RSVP compared with presenting the same face for the entire stream, a finding consistent with repetition suppression effects [17,18]. Alonso-Prieto et al. [16] reasoned that a longer interstimulus interval allowed for the full development of the N170, an event-related potential (ERP) component classically linked to the identification and discrimination of faces [19–21]. This, in turn, would allow sufficient time for individual face identification and emotional cue extraction. Subsequently, Liu-Shuang et al. [22] published an extension of the aforementioned stimulation approach by introducing regularities after a fixed number of cycles (i.e., images). In other words, a certain exemplar (“oddball”) was presented after four filler faces, which resulted in an SSVEP at the base frequency (6 Hz) and a second peak in the frequency spectrum at the slower rate of the oddball presentation (1.2 Hz). Interestingly, the frequency spectrum lacked such a 1.2 Hz peak when faces were presented upside down, an experimental condition typically used to control for low-level visual features because inverted faces share the same first-order features (e.g., eyes, nose) but disrupt the second-order configuration, i.e., the relations between features [e.g., 23].

Given that previous work mainly focused on face identification using RSVP protocols with either same/different faces [22] or faces among natural images [24], in 2014 we started a series of studies extending RSVP stimulation with complex IAPS images and faces by including regular oddball stimuli in the RSVP. The present report is a comprehensive summary of the results we obtained in a research programme consisting of two pilot studies (N = 16/16) and two experiments with larger samples (N = 31/30), for a total of 93 recorded participants. Similar to our previous work using complex pictures [14,25–27], the purpose of this series of studies was to test whether—and under which conditions—the visual system is able to identify a regularly presented neutral or emotional face (angry in the present experiments) within an RSVP stream of different facial identities and expressions as “fillers”. Given the alleged motivational relevance of angry faces, we initially expected processing advantages—and, consequently, increased SSVEPs—for regularly presented angry as compared to neutral faces. By using different facial identities and expressions as “fillers”, our experimental protocol extended the fast periodic oddball stimulation introduced by Liu-Shuang et al. [22] by using a different face and/or facial expression for each cycle (except for the regularly presented items). In doing so, we significantly increased stimulus variability and put the visual system under a “perceptual stress test” [27] to probe the boundary conditions of rapid facial emotional cue extraction. By presenting RSVP streams of exactly the same sequences with inverted faces, we additionally tested whether any observed SSVEP modulations were due to low-level visual features and/or second-order relationships between facial elements. We initially explored much shorter presentation times—well above 6 Hz—to probe the temporal boundary conditions established for complex images in previous work [14]. Specifically, we tested whether the modulation at the oddball frequency via fixed exemplars requires the full processing of an individual face or whether, instead, regular presentations within a longer RSVP stream allow for significantly shorter presentation times because the gist of the emotional expression can be integrated across regular presentations. To that end, similar to our first IAPS study [14], we piloted with a stimulation frequency of 15 Hz (i.e., cycle length ~67 msec). To ensure that participants attended the RSVP, we included an orthogonal attention task instructing participants to detect and respond to a colored dot that was unpredictably overlaid on the face stream. We started our experimental series with the initial hypotheses that, if the emotional facial expression could be extracted reliably from the stream of faces, SSVEPs at the regularity frequency should be measurable for the regular but not the irregular conditions. Moreover, SSVEPs for angry faces should be higher than for neutral faces and should also be higher for upright than for inverted faces.

In the meantime, Dzhelyova and colleagues [28] published a study based on a similar rationale and methodology. These authors presented an RSVP of one individual's neutral face at about 6 Hz and inserted, once every 5th stimulus, a face of that individual with an emotional expression. In a second experiment, they increased the base presentation rate of neutral faces to 12 Hz, with an emotional face presented every 9th stimulus. They analysed SSVEPs at the oddball frequency (about 1.2 Hz in Experiment 1 and about 1.3 Hz in Experiment 2) and additionally explored all higher harmonics, i.e., integer multiples of the oddball frequency. Identical to our protocol, they presented streams with inverted faces at the same stimulation frequencies to control for the influence of low-level features. SSVEPs were found to be above noise level at the sum of all oddball frequencies (i.e., including all higher harmonics) elicited by emotional faces. Interestingly, this was also true for inverted faces, although the effect was smaller in magnitude compared to upright faces at occipito-temporal electrodes.

In their experimental paradigm, Dzhelyova et al. [28] always switched from a neutral to an emotional expression. This methodological choice might have overlooked systematic low-level, physical differences between stimuli, which could also explain the reported SSVEPs above noise level in the inverted condition. In our studies, we used different emotional facial expressions, and the regular expression was either an angry or a neutral face. In contrast to Dzhelyova et al. [28] and other studies with similar paradigms, our use of a range of expressions and identities also changed the status of the “oddball”: it became yet another facial expression in the RSVP, with the only difference that it was presented regularly among irregular presentations of filler faces. Throughout this manuscript, we will therefore refer to it as the regular rather than the oddball expression, term its presentation rate the regularity rate, and call the corresponding neural response, if present, the regularity-driven SSVEP.

Given the increased diversity of our filler stimuli (emotional facial expressions rather than only neutral faces), we were able to better randomize physical stimulus differences that may arguably influence the neural response to the regular presentations. In addition, we swapped emotional expressions and regularly embedded either angry or neutral faces within the RSVP. This allowed us to test whether angry faces elicited a larger response than neutral expressions, as theoretically expected under the assumption that threatening information is of greater behavioural significance and thus leads to prioritised neural processing. Furthermore, we included another important control condition: we presented the respective RSVP stream with different facial expressions but without any regular repetitions. This manipulation controls for an inherent problem of the stimulation protocol: the SSVEP driven by the regular presentation (i.e., the “oddball” in previous studies) is always a subharmonic of the RSVP frequency. Thus, merely testing whether a regularity-driven SSVEP is above noise level cannot exclude that it is, in fact, a subharmonic of the SSVEP at the RSVP rate, and does not specifically indicate functional processing of the regularity. With this control condition, in which faces were presented in irregular order only, we were able to test for that possible confound.

Pilot 1

Materials and methods

Participants

Sixteen participants (10F/6M, median age 22.5 years, range 19–31, normal or corrected-to-normal vision, no self-reported history of neurological or psychiatric disorders) were recruited from the student population and the general public. Participants gave informed written consent prior to the experiment and were financially reimbursed €12 afterwards. All studies reported here were approved by the ethics committee of the University of Leipzig.

Stimuli

Stimuli were selected from NimStim, a validated database of facial expressions free for academic use [29]. This pilot experiment included identities #21, #22, #23, #25, #26, #33, #34, #36, and #37 (all males); the selected expressions were neutral, angry, and happy (all with closed mouth). All stimuli were resized to 152 x 190 pixels using Irfanview (https://www.irfanview.com/) and converted to grayscale in MATLAB R2015a (The MathWorks, Inc., Natick, MA) via the standard NTSC conversion formula used for calculating the effective luminance of a pixel: intensity = (0.2989 * red) + (0.5870 * green) + (0.1140 * blue) (see https://tinyurl.com/rgb2gray). Their luminance was matched using the SHINE toolbox [30]. To remove external facial features (e.g., hair and ears) and to standardize the spatial layout occupied by each face, stimuli were enclosed in an oval frame at presentation.
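For illustration, the conversion amounts to a weighted sum of the three color channels. Below is a minimal R sketch of that computation (illustrative only; the study used MATLAB, and the function name here is hypothetical):

    # NTSC-weighted grayscale conversion, matching the formula above.
    # img: height x width x 3 array of RGB values in [0, 1].
    rgb_to_gray <- function(img) {
      0.2989 * img[, , 1] + 0.5870 * img[, , 2] + 0.1140 * img[, , 3]
    }

    # Example with a random 190 x 152 image (the stimulus size in pixels):
    img  <- array(runif(190 * 152 * 3), dim = c(190, 152, 3))
    gray <- rgb_to_gray(img)  # single-channel luminance matrix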

Procedure

After signing the informed consent, participants were seated comfortably in an acoustically dampened and electromagnetically shielded chamber and directed their gaze towards a central fixation cross (0.8° x 0.8° of visual angle) displayed on a 19-inch CRT screen (refresh rate: 60 frames per second; 640 x 480 pixel resolution) placed at a distance of 80 cm. The experimental stimulation consisted of a rapid serial visual presentation (RSVP) of face images, showing each stimulus (size = 3.5°) in the center of the screen. The RSVP was presented at a rate of 15 faces per second (15 Hz), resulting in a presentation cycle of four frames (cycle length ~67 msec). Each face was shown during the first half of each cycle (two frames), producing a 50/50 on/off luminance flicker. Within the RSVP of each trial, faces were randomly drawn and organized in triplets. Depending on the experimental condition, the first image of each triplet was either an angry or a neutral face. For positions two and three, images were pseudo-randomly drawn from the remaining expression categories so that emotional expressions were evenly distributed. Faces within one triplet were not allowed to re-occur in the following triplet, to avoid short-term repetitions of identical faces. Happy faces were never presented regularly and only served as filler items.

In addition to the physical RSVP frequency (stimulation frequency), this presentation protocol introduced a second rhythm defined by the regular occurrence of faces of one emotional category at every third position. The angry or neutral category thus repeated at 5 Hz, i.e., at one third of the RSVP rate (regularity frequency). We further added a third, irregular condition, for which image sequences were created by randomly drawing from all emotional categories (i.e., no regularity at 5 Hz). As control conditions, we mirrored the set-up of the upright-face RSVPs (regular angry, regular neutral, and irregular) but presented all stimuli upside down, i.e., inverted (see Fig 1).
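To make the triplet logic concrete, the following R sketch (hypothetical names, not the original presentation code) generates the category sequence of one trial for the regular and irregular conditions:

    # Build a sequence of emotion categories for one trial.
    # 'regular' conditions fix the category at every third position;
    # 'irregular' triplets contain all three categories in random order.
    make_sequence <- function(condition = c("angry", "neutral", "irregular"),
                              n_triplets = 19) {  # 57 faces = 3.8 sec at 15 Hz
      condition <- match.arg(condition)
      cats <- c("neutral", "angry", "happy")
      triplets <- lapply(seq_len(n_triplets), function(i) {
        if (condition == "irregular") {
          sample(cats)                                    # no 5 Hz regularity
        } else {
          c(condition, sample(setdiff(cats, condition)))  # regular category first
        }
      })
      unlist(triplets)
    }

    make_sequence("angry")[1:9]  # "angry" recurs at positions 1, 4, 7 (5 Hz)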

Fig 1. Exemplar image sequences used in Pilot 1.

Fig 1

(A) Sequence of six exemplar images for each experimental condition. For the regular conditions, every third image contained a repetition of the same emotional category (angry or neutral). For the irregular condition, image triplets contained all three emotional categories (neutral, angry, happy) in random order. Images were presented either upright or upside down. Note that, for illustration purposes, we here show images from the Face Research Lab London Set [31], additionally modified in GIMP 2.10.14 (https://www.gimp.org/) to display anger. A set of different (only male) face identities was presented during the experiment, but cannot be shown here due to copyright restrictions. Please refer to https://www.macbrain.org/resources.htm for examples of faces used in the experiment. (B) Exemplar visual presentation of stimuli within each trial. Single images were presented for 33 msec followed by a fixation cross-only image for 33 msec, leading to a presentation rate of 15 Hz for the image stream with a regularity frequency of 5 Hz during the regular conditions. Note: image not to scale.

At the beginning of each trial, participants were presented with a fixation cross for 1.2 sec. Subsequently, the RSVP was presented for 3.8 sec. Participants were instructed to press the spacebar on a standard QWERTZ USB keyboard any time they detected a turquoise dot (RGB: 128, 128, 196; diameter = 0.3° of visual angle) briefly displayed within the face area (2 consecutive frames = 33 msec). Targets occurred in 40% of trials and up to three times in one trial with a minimum interval of 600 msec between onsets. At the end of each trial, the fixation cross remained on screen for an extra 1 sec, allowing participants to blink before the next trial started.

We presented a total of 576 trials (96 trials per condition), divided into 8 blocks (~6 min 20 sec each). Before the start of the experiment, participants performed a few blocks of training. After each training and experimental block, they received feedback on their performance (average hit rate, reaction times, and number of false alarms within the block).

To ensure that our pre-selected facial expressions were processed in accordance with normative categorization, at the end of the main task we asked participants to judge the level of conveyed anger and happiness on a 9-point Likert scale (1: very low anger/happiness; 9: very high anger/happiness) (see results in the Supplementary Materials).

EEG recording and preprocessing

Electroencephalographic (EEG) activity was recorded with an ActiveTwo amplifier (BioSemi, Inc., The Netherlands) at a sampling rate of 256 Hz. Sixty-four Ag/AgCl electrodes were fitted into an elastic cap following the international 10/20 system [32]. Electrodes T7 and T8 of the standard BioSemi layout were moved to positions I1 and I2 to increase spatial resolution at occipital sites. The common mode sense (CMS) active electrode and the driven right leg (DRL) passive electrode served as reference and ground electrodes, respectively. Horizontal and vertical electrooculograms (EOG) were monitored using four facial bipolar electrodes placed on the outer canthi of each eye and in the inferior and superior areas of the left orbit.

EEG preprocessing was performed offline with custom MATLAB scripts and functions included in the EEGLAB v14.1.1b [33] and FASTER v1.2.3b [34] toolboxes. The continuous EEG signal was referenced to the average activity of all electrodes. After subtracting the mean value of the waveform (DC offset), we assigned electrode coordinates and segmented the signal into epochs time-locked to the beginning of the flickering stimulation (0–3.8 sec). We discarded all trials with behavioral responses (N = 216), leaving 360 epochs (60 per condition). After re-referencing to electrode Cz, FASTER was used for artifact identification and rejection (see commented script at https://osf.io/au73y/) with the following settings: (i) over the whole normalized EEG signal, channels with variance, mean correlation, and Hurst exponent exceeding z = ±3 were interpolated via a spherical spline procedure [35]; (ii) the mean across channels was computed for each epoch and, if amplitude range, variance, and channel deviation exceeded z = ±3, the whole epoch was removed; (iii) within each epoch, channels with variance, median gradient, amplitude range, and channel deviation exceeding z = ±3 were interpolated; (iv) condition averages with amplitude range, variance, channel deviation, and maximum EOG value exceeding z = ±3 were removed; (v) epochs containing more than 12 interpolated channels were discarded. We also discarded epochs whose signal exceeded 5 standard deviations from the single-channel or all-channel mean kurtosis value, i.e., epochs displaying abrupt spikes or flat activity. In addition, we verified that the spectral estimates did not deviate from baseline by +/-50 dB in the 0–2 Hz frequency window (indicating blinks) or by +25/-100 dB in the 20–40 Hz frequency window (indicating muscular activity) [36]. The number of interpolated channels was low (M = 4.06, SD = 1.25). For an overview of the mean percentage of rejected epochs per condition, see Table 1. Finally, the resulting epoched data were re-referenced to the average activity of all scalp electrodes.

Table 1. Percentage of rejected epochs after preprocessing.
experiment variability orientation regularity mean st.dev. min. max.
Pilot 1 upright angry 13.85 3.57 8.33 21.67
neutral 14.27 4.45 10.00 23.33
irregular 16.35 4.69 10.00 25.00
inverted angry 15.31 5.78 6.67 28.33
neutral 15.73 4.64 6.67 26.67
irregular 15.42 5.64 6.67 23.33
Pilot 2 high upright angry 11.25 4.06 3.33 16.67
neutral 15.62 5.36 6.67 23.33
irregular 15.42 5.76 6.67 30.00
inverted angry 13.54 6.18 3.33 26.67
neutral 12.29 4.37 3.33 20.00
irregular 15.62 6.64 3.33 30.00
low upright angry 14.58 5.51 10.00 30.00
neutral 13.75 5.64 3.33 26.67
irregular 13.79 4.57 3.33 23.33
inverted angry 18.12 8.08 3.33 33.33
neutral 15.83 7.41 3.33 26.67
irregular 13.54 6.29 3.33 26.67
Experiment 1 high upright angry 15.73 6.87 3.45 33.33
neutral 16.13 7.90 0.00 36.67
irregular 18.45 8.05 3.33 40.00
inverted angry 17.26 8.39 0.00 40.00
neutral 16.44 8.02 3.33 36.67
irregular 15.12 6.74 3.33 33.33
low upright angry 19.78 8.80 3.33 43.33
neutral 15.63 7.49 3.33 36.67
irregular 17.15 9.16 3.33 36.67
inverted angry 16.00 8.05 0.00 40.00
neutral 15.96 6.04 3.33 30.00
irregular 15.77 7.49 0.00 30.00
Experiment 2 high upright angry 14.11 8.20 3.33 33.33
neutral 15.37 7.24 3.33 36.67
irregular 12.20 5.78 0.00 24.14
inverted angry 14.13 6.19 0.00 30.00
neutral 13.56 7.84 0.00 30.00
irregular 14.33 6.62 3.33 36.67
low upright angry 15.00 6.25 3.33 26.67
neutral 12.81 7.07 0.00 23.33
irregular 15.57 9.56 3.33 43.33
inverted angry 13.89 6.33 0.00 26.67
neutral 14.56 9.08 3.33 43.33
irregular 15.02 9.01 3.33 40.00

Descriptive statistics of the percentage of removed trials after preprocessing, separately for all experiments and experimental conditions.

Spectral decomposition of stimulus-driven EEG signals

Artifact-free epochs were truncated to 3 sec, starting 0.5 sec after RSVP onset, to exclude the initial event-related potentials. In truncated and detrended (i.e., linear trend removed) epochs, we quantified SSVEPs by means of Fourier transforms (technical details below) at each EEG sensor and for each condition separately. We first inspected power spectra for peaks at the RSVP rate (15 Hz) and the face-regularity rate (5 Hz). Provided these peaks were present (which was the case in all participants), we used a recently developed spatial filtering approach to represent SSVEPs as an optimally weighted sum of all EEG sensors [37]. Spatial filters were derived separately for each experimental condition. We defined the signal bandwidth as +/-0.5 Hz, centered on the frequency of the regularity-driven SSVEP (5 Hz). Noise was defined as the spectral components centered 1 Hz below and above the frequency of interest, each with a bandwidth of +/-0.5 Hz (FWHM). To reduce numerical instabilities in spatial-filter estimation that may arise from low trial numbers, the noise covariance matrix was regularized by adding 1% of the mean of its eigenvalues to its diagonal [38]. We opted to derive spatial filters per condition because topographical maps of regularity-driven SSVEPs differed substantially between conditions, which precluded the alternative common-filter approach (one filter for all conditions) as well as the traditional definition of regions of interest (i.e., electrode clusters with largest amplitude identified on topographical maps).

Note that we applied the same approach to conditions with irregular stimulus presentation (in the absence of a regularly repeated emotional expression) for reasons of consistency, although we did not expect to find a regularity-driven SSVEP. This can lead to overfitting noise and produce a spectral peak in the absence of an SSVEP [37]. However, in comparing regular and irregular conditions, the former should always produce a greater response when driving an SSVEP (see Statistical Analysis section below).
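At its core, the spatial filtering approach [37] rests on a generalized eigendecomposition that contrasts a signal covariance matrix (data filtered around the regularity frequency) with a noise covariance matrix (the neighbouring bands), using the regularization described above. The following simplified base-R sketch, which assumes precomputed covariance matrices and is not the authors' implementation, illustrates the principle:

    # S: channels x channels covariance of signal-band data (5 Hz +/- 0.5 Hz)
    # N: covariance of noise-band data (4 Hz and 6 Hz, +/- 0.5 Hz each)
    ress_filter <- function(S, N) {
      # Regularize N: add 1% of its mean eigenvalue to the diagonal [38]
      N_reg <- N + diag(0.01 * mean(eigen(N, only.values = TRUE)$values), nrow(N))
      # Generalized eigendecomposition via the eigenvectors of N_reg^{-1} S
      ed <- eigen(solve(N_reg, S))
      w  <- Re(ed$vectors[, which.max(Re(ed$values))])  # filter maximizing SNR
      w / sqrt(sum(w^2))                                # unit-norm channel weights
    }
    # Projection of channels x time data X onto the filter: component <- w %*% X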

Filter-projected single-trial EEG time series were then multiplied by a Tukey window and subjected to Fourier transforms using the FieldTrip function ft_freqanalysis (method ‘mtmfft’) [39]. To this end, data were zero-padded to a length of 10 sec, allowing for a unified frequency resolution of 0.1 Hz across experiments. From the complex spectral representations of single trials we then computed the Cosine Similarity Index [40]—a measure of inter-trial phase clustering that can be interpreted similarly to the classic inter-trial coherence, ITC [41,42], or its derivative ITCz [43,44], but is less sensitive to the number of trials n—according to:

CS = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \cos(\theta_i - \theta_j) \qquad (1)

where cos denotes the cosine function and θi and θj are the phase angles of the complex Fourier coefficients at the frequency of interest in separate trials i and j. Essentially, CS quantifies the cosine of the phase-angle difference for every pair of trials and then averages across all pairs.
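For concreteness, the base-R sketch below (with simulated data; the actual pipeline used FieldTrip) walks through the steps just described: taper a filter-projected trial with a Tukey window, zero-pad to 10 sec for a 0.1 Hz resolution, extract the phase angle at the regularity frequency, and evaluate Eq (1) across trials. The taper ratio and all variable names are assumptions:

    fs <- 256                                    # sampling rate (Hz)

    tukey <- function(n, r = 0.1) {              # cosine-tapered (Tukey) window
      x <- seq(0, 1, length.out = n); w <- rep(1, n)
      lo <- x < r / 2; hi <- x > 1 - r / 2
      w[lo] <- 0.5 * (1 + cos(pi * (2 * x[lo] / r - 1)))
      w[hi] <- 0.5 * (1 + cos(pi * (2 * x[hi] / r - 2 / r + 1)))
      w
    }

    phase_at <- function(trial, f, fs, pad_sec = 10) {
      n_pad <- pad_sec * fs                      # zero-pad -> 0.1 Hz resolution
      X     <- fft(c(trial * tukey(length(trial)), rep(0, n_pad - length(trial))))
      freqs <- (seq_len(n_pad) - 1) * fs / n_pad
      Arg(X[which.min(abs(freqs - f))])          # phase angle at frequency f
    }

    cosine_similarity <- function(theta) {       # Eq (1)
      n <- length(theta); cs <- 0
      for (i in 1:(n - 1)) for (j in (i + 1):n)
        cs <- cs + cos(theta[i] - theta[j])
      2 / (n * (n - 1)) * cs
    }

    # 60 simulated 3-sec trials with a phase-locked 5 Hz component plus noise
    t      <- seq(0, 3 - 1 / fs, by = 1 / fs)
    trials <- replicate(60, sin(2 * pi * 5 * t) + rnorm(length(t)),
                        simplify = FALSE)
    theta  <- vapply(trials, phase_at, numeric(1), f = 5, fs = fs)
    cosine_similarity(theta)  # close to 1 for phase-locked trials, ~0 for noise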

An overview of the CS values for each condition can be found in Table 2.

Table 2. Cosine Similarity Index (CS), regularity frequency.
experiment variability orientation regularity CS
mean [95% CI]
Pilot 1 upright angry 0.03 [0.01, 0.04]
neutral 0.05 [0.04, 0.06]
irregular 0.00 [0.00, 0.01]
inverted angry 0.03 [0.01, 0.04]
neutral 0.05 [0.04, 0.06]
irregular 0.00 [-0.01, 0.01]
Pilot 2 high upright angry 0.06 [0.02, 0.09]
neutral 0.09 [0.06, 0.12]
irregular -0.02 [-0.04, -0.01]
inverted angry 0.03 [0.02, 0.04]
neutral 0.09 [0.06, 0.12]
irregular -0.01 [-0.02, 0.00]
low upright angry 0.02 [0.00, 0.04]
neutral 0.04 [0.03, 0.05]
irregular -0.02 [-0.03, -0.01]
inverted angry 0.03 [0.01, 0.05]
neutral 0.03 [0.01, 0.05]
irregular -0.03 [-0.04, -0.01]
Experiment 1 high upright angry 0.04 [0.03, 0.05]
neutral 0.06 [0.04, 0.07]
irregular 0.01 [0.00, 0.02]
inverted angry 0.03 [0.02, 0.04]
neutral 0.04 [0.03, 0.05]
irregular 0.00 [0.00, 0.01]
low upright angry 0.03 [0.02, 0.04]
neutral 0.04 [0.02, 0.05]
irregular 0.01 [0.00, 0.02]
inverted angry 0.02 [0.01, 0.03]
neutral 0.03 [0.02, 0.04]
irregular 0.00 [0.00, 0.01]
Experiment 2 high upright angry 0.04 [0.03, 0.05]
neutral 0.06 [0.05, 0.08]
irregular 0.00 [0.00, 0.01]
inverted angry 0.04 [0.02, 0.05]
neutral 0.05 [0.04, 0.06]
irregular 0.01 [0.00, 0.01]
low upright angry 0.02 [0.01, 0.03]
neutral 0.03 [0.02, 0.04]
irregular 0.01 [0.00, 0.02]
inverted angry 0.02 [0.01, 0.02]
neutral 0.02 [0.01, 0.03]
irregular 0.00 [0.00, 0.01]

Statistics of Cosine Similarity Index (CS) of the signals at the regularity frequency, separately for the different experiments and experimental conditions. Regularity frequencies: 5 Hz in Pilot 1 and Pilot 2, 2 Hz in Experiment 1 and Experiment 2.

Statistical analysis

CS values were analyzed with Bayesian multilevel regressions using brms [45], a user-friendly R package that interfaces with the probabilistic programming language Stan [46] to estimate posterior distributions of the parameters of interest via Markov Chain Monte Carlo (MCMC) sampling [47]. All models were fitted with weakly informative priors, i.e., Normal(0, 3) on beta coefficients and Student-t(3, 0, 2) on the standard deviation of varying effects (i.e., participants), and a Gaussian response distribution. Parameters were estimated using 8 MCMC chains of 8,000 iterations each, of which 4,000 were warmup samples (used to move the chains toward the high-density region of the posterior distributions and then discarded), with a thinning interval of 2 to minimize sample autocorrelation. Thus, the total number of retained posterior samples per parameter was 16,000.
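A schematic brms call implementing these settings could look as follows (variable and factor names are hypothetical; the actual analysis scripts are available on OSF):

    library(brms)

    priors <- c(
      prior(normal(0, 3), class = "b"),         # weakly informative slopes
      prior(student_t(3, 0, 2), class = "sd")   # SD of participant-level effects
    )

    fit <- brm(
      cs ~ orientation * regularity + (1 | participant),
      data   = cs_data,       # long-format data frame of per-condition CS values
      family = gaussian(),
      prior  = priors,
      chains = 8, iter = 8000, warmup = 4000, thin = 2,
      seed   = 1234
    )
    # 8 chains x (8000 - 4000) / 2 = 16,000 retained posterior samples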

We verified model convergence by visually inspecting trace plots and graphical posterior predictive checks [48]. We also examined: (i) the ratio of the effective number of samples to the total number of samples, which we aimed to keep above 0.1 to avoid excessive dependency between samples; (ii) the Gelman–Rubin R̂ statistic [49], which compares the between-chains variability (how much do chains differ from each other?) to the within-chain variability (how widely did a chain explore the parameter space?) [50] and which, as a rule of thumb, should not exceed 1.05, or the chains may not have converged; (iii) the Monte Carlo standard error (MCSE), i.e., the standard deviation of the chains divided by the square root of their effective sample size, a measure of sampling noise [51].
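These checks map onto standard brms/bayesplot calls, sketched here for the hypothetical model above:

    plot(fit)         # trace and density plots per parameter
    pp_check(fit)     # graphical posterior predictive check [48]
    neff_ratio(fit)   # (i) effective / total samples; aim > 0.1
    rhat(fit)         # (ii) Gelman-Rubin statistic; aim < 1.05
    # (iii) MCSE per parameter: sd of the draws / sqrt(effective sample size)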

Differences between conditions were assessed by computing the mean and the 95% highest density interval (HDI) of the difference between the posterior distributions of the relevant parameters [51] and by calculating evidence ratios (ERs), i.e., the ratios between the proportions of posterior samples on each side of zero. ERs can be interpreted as the probability of a directional hypothesis (e.g., “condition A is larger than condition B”) against its alternative (e.g., “condition B is larger than condition A”). As a rule of thumb, we interpreted our results as providing “inconclusive” evidence for a specified directional hypothesis when 1 < ER < 3, “anecdotal” evidence when 3 < ER < 10, and “strong” evidence when ER > 10. When ER = Inf, the posterior distribution was entirely on one side of zero, providing “very strong” evidence. Please note that contrasts between conditions within the irregular stimulus presentations are excluded from the results because they are uninterpretable, due to the overfitting issue with the spatial filtering approach mentioned above (see section Spectral decomposition of stimulus-driven EEG signals).
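Evidence ratios of this kind can be obtained with brms::hypothesis(), which reports the posterior odds of a directional hypothesis; a sketch with a hypothetical contrast from the model above:

    # With 'angry' as the reference level, the coefficient 'regularityneutral'
    # encodes the neutral - angry difference; test the hypothesis neutral > angry:
    h <- hypothesis(fit, "regularityneutral > 0")
    h$hypothesis$Evid.Ratio  # samples in favor / samples against

    # Equivalent computation from raw posterior samples:
    d  <- posterior_samples(fit, pars = "b_regularityneutral")[, 1]
    er <- mean(d > 0) / mean(d < 0)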

Throughout the main text we report the results of the analysis carried out on the cosine-similarity values at the Fourier coefficients that correspond to the regularity frequency in respective stimulation conditions (also for irregular presentation conditions). The results of behavioral performance, post-experiment emotion ratings, and SSVEP activity at the stimulation frequency are described in their respective sections in the Supplementary Materials.

Software

Data visualization and statistical analyses were carried out in R v3.6.1 [52] via RStudio v1.2.1335 [53]. We used the following packages (and their respective dependencies):

  • data manipulation: tidyverse v1.2.1 [54], Rmisc v1.5 [55];

  • statistical analyses: brms v2.10 [56], rstan v2.19.2 [57];

  • visualization: ggplot2 v3.2.1 [58], ggpirate v0.1.1 [59], bayesplot v1.7.0 [48], tidybayes v1.1.0 [60], bayestestR v0.3.0 [61], BEST v0.5.1 [62], viridis v0.5.1 [63], cowplot v1.0.0 [64];

  • report generation: knitr v1.25 [65].

Results

Irrespective of face orientation, regular conditions elicited larger SSVEPs than irregular presentations (see Table 3), indicating that our stimulation protocol produced the intended regularity-driven SSVEPs. This is further demonstrated by the prominent parieto-occipital topographies of SSVEP maxima in the regular conditions, which are absent in the irregular conditions (see Fig 2). For both angry and neutral regular conditions, upright and inverted presentations elicited comparable SSVEPs, i.e., no face-inversion effect. Interestingly, regular neutral conditions showed larger SSVEPs than angry conditions, both when upright [ER = 162.27] and when inverted [ER = 25.02].

Table 3. Results of the statistical analyses of the Cosine Similarity Index (CS) at the regularity frequency.

experiment variability orientation regularity comparison mean [95% HDI] evidence ratio
Pilot 1 angry upright vs. inverted 0.00 [-0.02, 0.02] 1.38
neutral 0.01 [-0.01, 0.02] 3.39
upright neutral vs. angry -0.03 [-0.05, -0.01] 162.27
irregular vs. angry 0.02 [0.01, 0.04] 234.29
irregular vs. neutral 0.05 [0.04, 0.06] Inf
inverted neutral vs. angry -0.02 [-0.04, 0.00] 25.02
irregular vs. angry 0.03 [0.01, 0.04] 1,141.86
irregular vs. neutral 0.05 [0.03, 0.06] Inf
Pilot 2 high angry upright vs. inverted 0.03 [-0.02, 0.07] 8.91
neutral 0.00 [-0.06, 0.06] 1.03
low angry upright vs. inverted -0.01 [-0.04, 0.02] 2.54
neutral 0.01 [-0.01, 0.04] 4.52
high upright neutral vs. angry -0.03 [-0.08, 0.02] 9.09
irregular vs. angry 0.08 [0.04, 0.12] 1,776.78
irregular vs. neutral 0.11 [0.07, 0.15] Inf
inverted neutral vs. angry -0.06 [-0.09, -0.02] 614.38
irregular vs. angry 0.04 [0.02, 0.06] 1,332.33
irregular vs. neutral 0.10 [0.06, 0.13] Inf
low upright neutral vs. angry -0.02 [-0.04, 0.01] 8.19
irregular vs. angry 0.04 [0.02, 0.07] 1,999.00
irregular vs. neutral 0.06 [0.04, 0.08] Inf
inverted neutral vs. angry 0.00 [-0.03, 0.04] 1.29
irregular vs. angry 0.06 [0.03, 0.08] 15,999.00
irregular vs. neutral 0.05 [0.03, 0.08] 1,776.78
upright angry high vs. low 0.04 [-0.01, 0.08] 12.69
neutral 0.05 [0.01, 0.09] 90.95
inverted angry high vs. low 0.00 [-0.03, 0.03] 1.09
neutral 0.06 [0.02, 0.10] 409.26
Experiment 1 high angry upright vs. inverted 0.01 [0.00, 0.02] 17.16
neutral 0.01 [-0.01, 0.03] 7.62
low angry upright vs. inverted 0.01 [0.00, 0.02] 10.15
neutral 0.01 [-0.01, 0.02] 2.63
high upright neutral vs. angry -0.02 [-0.04, 0.01] 10.83
irregular vs. angry 0.03 [0.02, 0.04] Inf
irregular vs. neutral 0.05 [0.03, 0.07] 15,999.00
inverted neutral vs. angry -0.01 [-0.03, 0.00] 16.72
irregular vs. angry 0.03 [0.02, 0.04] Inf
irregular vs. neutral 0.04 [0.03, 0.05] Inf
low upright neutral vs. angry -0.01 [-0.02, 0.01] 6.57
irregular vs. angry 0.02 [0.01, 0.03] 245.15
irregular vs. neutral 0.03 [0.01, 0.05] 1,141.86
inverted neutral vs. angry -0.01 [-0.02, 0.00] 102.90
irregular vs. angry 0.01 [0.00, 0.02] 409.26
irregular vs. neutral 0.03 [0.01, 0.04] Inf
upright angry high vs. low 0.01 [0.00, 0.03] 30.37
neutral 0.02 [0.00, 0.04] 39.51
inverted angry high vs. low 0.01 [0.00, 0.02] 49.63
neutral 0.01 [0.00, 0.03] 23.46
Experiment 2 high angry upright vs. inverted 0.00 [-0.01, 0.02] 1.66
neutral 0.01 [-0.01, 0.03] 11.44
low angry upright vs. inverted 0.00 [-0.01, 0.02] 1.89
neutral 0.01 [0.00, 0.02] 75.56
high upright neutral vs. angry -0.03 [-0.05, -0.01] 107.11
irregular vs. angry 0.04 [0.02, 0.05] Inf
irregular vs. neutral 0.06 [0.05, 0.08] Inf
inverted neutral vs. angry -0.01 [-0.03, 0.00] 17.14
irregular vs. angry 0.03 [0.01, 0.04] 15,999.00
irregular vs. neutral 0.04 [0.03, 0.06] Inf
low upright neutral vs. angry -0.01 [-0.02, 0.00] 9.51
irregular vs. angry 0.0 [-0.01, 0.02] 3.35
irregular vs. neutral 0.02 [0.00, 0.03] 189.48
inverted neutral vs. angry 0.00 [-0.01, 0.01] 1.03
irregular vs. angry 0.01 [0.01, 0.02] 940.18
irregular vs. neutral 0.01 [0.01, 0.02] 2,284.71
upright angry high vs. low 0.02 [0.00, 0.04] 42.24
neutral 0.04 [0.02, 0.05] 2,284.71
inverted angry high vs. low 0.02 [0.01, 0.03] 362.64
neutral 0.03 [0.02, 0.05] 3,199.00

Statistical comparisons of Cosine Similarity values between all pairs of factor levels, separately for all experiments and experimental conditions. Mean and 95% HDI refer to the difference in CS values for the respective comparison. Comparisons showing strong evidence against the hypothesis of no difference are presented in bold.

Fig 2. Spectral characteristics of EEG responses to stimulation in Pilot 1.

Fig 2

(A) Topographical distributions of phase-locking, quantified as the cosine similarity (CS) index, at the regularity frequency of 5 Hz. Note the lack of phase-locking, i.e., consistent responses to the irregular stimulation conditions (maps use the same scale, in arbitrary units); (B) CS index of phase locking across the EEG spectrum (arbitrary scale) with the group average superimposed on single subject spectra, based on RESS spatial filter projections and collapsed across conditions featuring a regular presentation (i.e., excluding irregular stimulation conditions for their lack of signal; see panel A). For visualization only, CS has been converted to log(CS). (C) CS at the regularity frequency for each participant (single dots) and condition. Mean values are marked by horizontal black lines and 95% confidence intervals represented as transparent boxes. See Table 3 for specific results of statistical comparisons.

Discussion

In this first pilot experiment (Pilot 1), we tested for enhanced SSVEPs driven by the regular presentation of angry over neutral faces, occurring at a rate of five same-category emotional expressions per second (5 Hz) embedded in a stream of 15 faces per second (15 Hz). Based on previous studies [e.g., 25], we expected enhanced SSVEPs for regularly presented angry compared to neutral faces. Instead, our results showed the opposite pattern: neutral faces drove a more robust response. Further, upright and inverted regular faces showed comparable SSVEPs, in contrast with the expected face inversion effect [i.e., smaller SSVEPs for inverted faces; see 22]. We speculated that the high variability of the stimulus material—nine different facial identities displaying three emotional expressions at various degrees of intensity—might have been a confounding factor. Specifically, angry expressions may differ more between identities than neutral expressions, and this greater dissimilarity may have led to a less consistent brain response.

To evaluate this hypothesis, in a follow-up pilot study we used two instead of nine face identities. We chose a female identity with relatively low variability between angry and neutral expressions and a male identity with relatively high variability (see Fig 3). Trials were split evenly between presenting only the female or only the male identity. We expected that using fewer identities might facilitate the extraction of emotional information from the fast RSVP, thus leading to a face inversion effect (larger responses for upright relative to inverted faces). Furthermore, using one identity per trial should attenuate the effects of variability in low-level visual features across facial expressions, thereby facilitating the actual extraction of emotional information and leading to the expected gain effect for angry faces (i.e., enhanced SSVEPs). Supporting this assumption, Vakli et al. [66] showed that the SSVEP adaptation effect to prolonged stimulation was invariant to changes in facial expression only when the same identity was presented.

Fig 3. Exemplar image sequences used in Pilot 2 and Experiment 1 and 2.

Fig 3

(A) Sequence of six exemplar images for each experimental condition. For the regular conditions, every third image contained a repetition of the same emotional category (angry or neutral). For the irregular condition, image triplets contained all four emotional categories (neutral, angry, happy, disgusted) in random order. Images were presented either upright or upside down. For each trial only one of two face identities was presented, either with low (see top row) or high (see row two) dissimilarity between emotional expressions. Note that, for illustration purposes, we here show images from the Face Research Lab London Set [31], additionally modified in GIMP 2.10.14 (https://www.gimp.org/) to display anger. A set of different (only male) face identities was presented during the experiment, but cannot be shown here due to copyright restrictions. Please refer to https://www.macbrain.org/resources.htm for examples of faces used in the experiment. (B) Exemplar visual presentation of stimuli within each trial. Single images were presented for 33 msec in Pilot 2 or 83 msec in Experiment 1 and 2, followed by a fixation cross-only image of similar duration, leading to a presentation rate of 15 Hz for the image stream in Pilot 2 and 6 Hz in Experiment 1 and 2. The resulting regularity frequency was 5 Hz for Pilot 2 and 2 Hz for Experiment 1 and 2 during the regular conditions. Note: image not to scale.

Pilot 2

Materials and methods

Participants

Sixteen participants (12F/4M, median age 23.5 years, range 19–35) were recruited from the student population and the general public. Inclusion criteria, informed consent, reimbursement, and ethical statement are identical to Pilot 1.

Stimuli

From the NimStim database [29], we used identities #06 (female, low dissimilarity between emotional expressions) and #37 (male, high dissimilarity between emotional expressions). The selected expressions were neutral, angry, happy, and disgusted (all with closed mouth). Emotion dissimilarity was determined by calculating the structural similarity index [following the procedure in ref. 67] between emotional expressions within each identity and selecting the ones with highest and lowest values. Picture resizing, luminance matching, and oval placement were identical to Pilot 1. See Fig 3 for an illustration.
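As an illustration of this metric, a simplified single-window variant of the structural similarity index can be written in a few lines of R (the canonical SSIM of ref. [67] averages this statistic over local windows; this global version is a sketch only):

    # Global SSIM between two grayscale images x and y with values in [0, 1];
    # constants follow the common choices K1 = 0.01, K2 = 0.03, L = 1.
    ssim_global <- function(x, y, C1 = 0.01^2, C2 = 0.03^2) {
      x <- as.vector(x); y <- as.vector(y)
      mx <- mean(x); my <- mean(y)
      ((2 * mx * my + C1) * (2 * cov(x, y) + C2)) /
        ((mx^2 + my^2 + C1) * (var(x) + var(y) + C2))
    }
    # Lower SSIM between two expressions of one identity = higher dissimilarity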

Procedure

The procedure was similar to Pilot 1, with the following exceptions. In each trial, only one of two identities was shown with varying emotional expressions. The two identities differed in the degree of dissimilarity between their emotional expressions: the female identity had low dissimilarity, the male identity high dissimilarity. An additional emotional expression (disgusted, closed mouth) was included to compensate for the reduced variation in picture content due to the presentation of only one identity per trial. Disgusted faces were only used as fillers and never presented regularly during the trials.

EEG recording and preprocessing

EEG recording and preprocessing were similar to the previous experiment, except that epochs displaying female and male identities (i.e., with low and high emotion dissimilarity) were kept separate. The number of interpolated channels was M = 3.88, SD = 1.11. See Table 1 for the mean percentage of rejected epochs.

Spectral decomposition of stimulus-driven EEG signals

The spectral decomposition of the preprocessed EEG signal was identical to Pilot 1. Again, the stimulation frequency was 15 Hz, consequently setting the face regularity frequency to 5 Hz.

Statistical analysis

The statistical analyses were identical to Pilot 1. However, given the inclusion of another predictor (i.e., emotion dissimilarity), the design comprised 12 combinations of condition levels: 2 (face orientation: upright, inverted) x 3 (regularity: neutral, angry, irregular) x 2 (emotion dissimilarity: low, high).

Results

Similarly to Pilot 1, our data indicated that the regular presentation of emotional expressions drove SSVEPs at the regular rate of 5 Hz, whereas the irregular presentation did not (see Fig 4 and Table 3).

Fig 4. Spectral characteristics of EEG responses to stimulation in Pilot 2.

Fig 4

(A) Topographical distributions of phase-locking, quantified as cosine similarity (CS), at the regularity frequency of 5 Hz for high variance stimuli. Note the lack of phase-locking, i.e. consistent responses to the irregular stimulation conditions (maps same scale, in arbitrary units); (B) CS index of phase locking across the EEG spectrum (arbitrary scale) with the group average superimposed on single subject spectra, based on RESS spatial filter projections and collapsed across conditions featuring a regular presentation of high variance stimuli (i.e., excluding irregular stimulation conditions for their lack of signal; see A). For visualization only, CS has been converted to log(CS); (C & D) Same as in panels A & B (identical scale) but for 5 Hz SSVEPs driven by low variance stimulation; (E) CS at the regularity frequency for each participant (single dots) and condition. Mean values are marked by horizontal black lines and 95% confidence intervals represented as transparent boxes. See Table 3 for specific results of statistical comparisons.

Further mirroring the results of the previous pilot study, face inversion did not influence brain responses, with similar SSVEPs for upright and inverted regular faces (see Table 3).

Regarding contrasts between emotional expressions, regular neutral faces showed greater SSVEPs than angry faces only when inverted and when within-identity emotion variability was high [ER = 614.38].

This experimental design also allowed us to assess the impact of within-identity emotion variability on steady-state responses. Regular neutral faces of the highly variable (as compared with the less variable) identity elicited more pronounced SSVEPs, both when upright [ER = 90.95] and when inverted [ER = 409.26]. Angry faces of the highly variable identity elicited slightly enhanced SSVEPs only when upright [ER = 12.69].

Discussion

We designed Pilot 2 to test whether we would find the expected gain effect of emotional (angry) over neutral expressions while using only one facial identity per trial (instead of nine), at a presentation rate of 15 Hz and with the regular occurrence of one emotional expression at 5 Hz. While the results of Pilot 1 showed increased neural responses for regularly presented neutral faces, this effect was not observed in Pilot 2. Neutral and angry faces elicited SSVEPs of comparable magnitude, with the notable exception of a persisting reversed effect (neutral > angry) for the high-variability identity when inverted. Possibly, angry facial expressions differed more strongly between identities in Pilot 1, thus driving less consistent brain responses. Using only one facial identity per trial presumably mitigated this confound [66]. Nevertheless, the hypothesized processing advantage for angry faces was still absent.

Although our regularity frequency of 5 Hz seems close to the optimal frequency of 6 Hz for facial processing in an RSVP [e.g., 16], the actual RSVP was presented at 15 Hz, which might have been too fast to extract the relevant emotional features while accentuating physical differences. Consistent with this potential caveat in our design, upright and inverted faces drove 5 Hz responses of comparable magnitude [see also 28]. This absence of an inversion effect might indicate that the visual system was unable to properly extract the “facialness” of the stimuli, which typically produces enhanced neural responses for upright relative to inverted faces [23,68].

In the following Experiment 1 we thus opted for an RSVP rate of 6 Hz, which decreased the regularity frequency to 2 Hz. We assumed that slowing down presentation times would provide the visual system with enough time and perceptual information to routinely process faces within the RSVP and consequently extract emotional expressions. By using the same two identities as in Pilot 2 (thus controlling for low-level influences of facial variability), we expected to see the intuitive benefit of angry over neutral expressions as an increase in regularity-driven 2 Hz SSVEPs.

Experiment 1

Materials and methods

Participants

Thirty-one participants (22F/9M, median age 25 years, range 18–48) were recruited from the student population and the general public. This sample size was chosen based on available time and economic resources (no statistical a priori power analysis was conducted). Inclusion criteria, informed consent, reimbursement, and ethical statement are identical to the previous experiments.

Stimuli

Stimuli were identical to Pilot 2 (see Fig 3).

Procedure

The procedure was similar to Pilot 2 (see Fig 3), with the exception that the presentation rate was reduced to 6 faces per second (stimulation frequency = 6 Hz; cycle length ~167 msec), with regular faces occurring at one third of that rate (regularity frequency = 2 Hz). Due to the slower presentation rate, we extended the length of each trial to 7 sec (to ensure a number of regularity cycles comparable to Pilot 1 and Pilot 2) and increased the minimum interval between subsequent target events from 600 to 800 msec. The longer trials also required subdividing the experiment into 16 experimental blocks, each consisting of 36 trials and lasting ~5 min.

EEG recording and preprocessing

EEG recording and preprocessing were identical to Pilot 2. The interpolated channels were M = 3.26, SD = 1.37. See Table 1 for the mean percentage of rejected epochs.

Spectral decomposition of stimulus-driven EEG signals

The spectral decomposition of the preprocessed EEG signal was identical to the previous experiments, except that the extended trials, now containing 7 sec of visual stimulation, were truncated to epochs of 6.5 sec, starting 0.5 sec after RSVP onset. Spatial filters were centered on the new regularity frequency (2 Hz).

Statistical analysis

The statistical analyses were identical to Pilot 2.

Results

Spectra in Fig 5 demonstrate that slowing down the pace of the stimulation still elicited an RSVP-driven 6 Hz SSVEP and a regularity-driven 2 Hz SSVEP. Moreover, the 2 Hz SSVEP showed the same occipito-temporal topography as its 5 Hz counterpart in Pilot 1 and Pilot 2. No 2 Hz SSVEP was driven in the irregular presentation condition.

Fig 5. Spectral characteristics of EEG responses to stimulation in Experiment 1.

Fig 5

(A) Topographical distributions of phase-locking, quantified as cosine similarity (CS), at the regularity frequency of 2 Hz for high variance stimuli. Note the lack of phase-locking, i.e. consistent responses to the irregular stimulation conditions (maps same scale, in arbitrary units); (B) CS index of phase locking across the EEG spectrum (arbitrary scale) with the group average superimposed on single subject spectra, based on RESS spatial filter projections and collapsed across conditions featuring a regular presentation of high variance stimuli (i.e., excluding irregular stimulation conditions for their lack of signal; see A). For visualization only, CS has been converted to log(CS); (C & D) Same as in panels A & B (identical scale) but for 2 Hz SSVEPs driven by low variance stimulation; (E) CS at the regularity frequency for each participant (single dots) and condition. Mean values are marked by horizontal black lines and 95% confidence intervals represented as transparent boxes. See Table 3 for specific results of statistical comparisons.

Face orientation had a small influence on brain activity, with stronger SSVEPs during upright relative to inverted presentation of angry but not neutral faces (see Table 3). This effect was consistent for high- [ER = 17.16] and low-variability facial identities [ER = 10.15].

Regular neutral faces elicited stronger SSVEPs compared to angry faces in both orientations and emotion variabilities (see Table 3). When presented upright and variability was low, this effect was weaker [ER = 6.57].

Finally, when comparing high vs. low variability, regular angry and neutral faces of the highly variable identity elicited stronger SSVEPs in both orientations.

Discussion

In Experiment 1 we tested whether we could observe preferential attention allocation towards threatening faces using a slower presentation rate (6 Hz), with one facial identity per trial that displayed a particular emotion (angry or neutral) every third face (2 Hz). Results showed a difference in brain responses driven by the two orientations: upright faces drove a stronger response compared to their inverted counterparts, in line with the notion that naturally oriented faces experience a processing advantage in the visual system, likely due to their special relevance as a social cue display [23,68]. This effect also corroborated our choice of a slower optimal presentation frequency for faces, in line with earlier findings [14,16].

However, despite this progress in adjusting experimental parameters to reveal the hypothesized pattern of results, the expected processing advantage for emotional expressions was still not observed. In fact, the reversed effect—neutral expressions driving stronger responses, present in Pilot 1 yet not observed in Pilot 2—re-emerged in Experiment 1 (with the exception of upright faces of the less variable identity). Additionally, the effect of variability (i.e., the physical difference between neutral and angry expressions of one facial identity) was highly consistent, irrespective of the regular emotional expression or the orientation of the stimuli. The latter effect indicated that physical differences could still be a confounding factor, even when comparing responses to emotional expressions [69].

We considered that Pilot 1, Pilot 2, and Experiment 1 all required participants to perform a visual detection task that was superimposed on the RSVP of faces but completely unrelated to it. This manipulation may have diverted attentional resources necessary to process the stream of faces more comprehensively. Our recent experiment, in which participants were not performing a task but passively viewing an RSVP of IAPS scenes [27], showed stronger SSVEPs (indicative of enhanced processing) for emotional content at the expected regularity frequencies. In Experiment 2 we therefore removed the dot detection task, changing the design to passive viewing, and tested whether the findings observed with IAPS scenes would generalise to face stimuli.

Experiment 2

Materials and methods

Participants

Thirty participants (27F/3M, median age 22 years, range 19–37) were recruited from the student population and the general public. Sample size rationale, inclusion criteria, informed consent, reimbursement, and ethical statement are identical to the previous experiments.

Stimuli

Stimuli were identical to the previous experiment (see Fig 3).

Procedure

The procedure was similar to Experiment 1. However, in contrast to the detection task used in the previous experiments, here we employed a simple passive viewing task: participants were asked to fixate the cross and attentively view the picture stream. We presented only 360 trials (60 trials per condition) because, in contrast to Pilot 1, Pilot 2 and Experiment 1, here trials did not contain any targets and/or behavioral responses; therefore, all trials could be included in the analysis.

EEG recording and preprocessing

EEG recording and preprocessing were similar to the previous experiment. The interpolated channels were M = 3.50, SD = 1.09. See Table 1 for the mean percentage of rejected epochs.

Spectral decomposition of stimulus-driven EEG signals

The spectral decomposition of the preprocessed EEG signal was identical to Experiment 1.

Statistical analysis

The statistical analyses were identical to the previous experiment.

Results

As in Pilot 1, Pilot 2, and Experiment 1, presenting one emotional expression regularly every three faces elicited an SSVEP at one third of the RSVP rate, i.e., 2 Hz. Note, however, that in this instance the 2 Hz SSVEP for low-variability angry upright faces was almost indiscernible from noise [ER = 3.35] (see Table 3). Regularity-driven SSVEPs showed the same occipito-temporal topography as before, and no 2 Hz SSVEP was driven in the irregular presentation condition (see Fig 6).

Fig 6. Spectral characteristics of EEG responses to stimulation in Experiment 2.

Fig 6

(A) Topographical distributions of phase-locking, quantified as cosine similarity (CS), at the regularity frequency of 2 Hz for high variance stimuli. Note the lack of phase-locking, i.e. consistent responses to the irregular stimulation conditions (maps same scale, in arbitrary units); (B) CS index of phase locking across the EEG spectrum (arbitrary scale) with the group average superimposed on single subject spectra, based on RESS spatial filter projections and collapsed across conditions featuring a regular presentation of high variance stimuli (i.e., excluding irregular stimulation conditions for their lack of signal; see A). For visualization only, CS has been converted to log(CS); (C & D) Same as in panels A & B (identical scale) but for 2 Hz SSVEPs driven by low variance stimulation; (E) CS at the regularity frequency for each participant (single dots) and condition. Mean values are marked by horizontal black lines and 95% confidence intervals represented as transparent boxes. See Table 3 for specific results of statistical comparisons.

Face inversion modulated brain activity only for neutral expressions: irrespective of emotion variability, regular neutral conditions showed stronger SSVEPs when faces were upright (see Table 3). SSVEPs driven by regular angry faces were comparable for upright and inverted presentations. Regular neutral faces elicited larger SSVEPs than angry faces in both orientations, but only when variability was high [upright: ER = 107.11; inverted: ER = 17.14]. Finally, all conditions elicited greater SSVEPs in high compared to low within-identity emotion variability.

Discussion

In Experiment 2 we tested whether passive viewing changed SSVEPs at a 6 Hz presentation rate (i.e., regular facial expressions at 2 Hz). The previous Pilots and Experiment 1 had used a dot detection task, effectively withdrawing attention from the picture stream (i.e., faces were distractors). Nonetheless, redirecting the attentional focus did not produce the expected increase in brain responses for angry over neutral faces. Instead, and in accordance with the studies presented above, neutral faces drove stronger responses than angry faces, at least for the high-variability identity, which displayed greater dissimilarity between angry and neutral expressions.

It is worth pointing out that passive viewing produced a different face inversion effect compared to Experiment 1. Here, neutral but not angry faces consistently drove stronger responses when presented upright, whereas in Experiment 1 we observed the opposite pattern of results. This might indicate that shifting the attentional focus towards the face stream (instead of treating it as a background distractor in an unrelated task) can alter the processing of the "facialness" of the stimuli [25]. Possibly, upright angry faces were stronger distractors in Experiment 1: their threatening expressions may have increased the probability of them being recognized as human faces as opposed to other objects, presumably resulting in a stronger orientation effect. The passive viewing situation in Experiment 2 perhaps counteracted this effect because attention was already fully allocated to the faces. Here, neutral faces might have been processed more intensely because the visual system expected emotional content given the RSVP context (i.e., mostly emotional expressions); the increased effort to extract emotional content from neutral faces might then have given rise to the observed orientation effect during passive viewing. However, this speculative, post-hoc explanation must be evaluated against the fact that neutral faces generally drove stronger brain responses than angry faces, a finding that aligns with other recent reports for non-facial emotional stimuli [26,70]. Furthermore, SSVEP amplitude modulations due to task demands were not directly compared across experiments, thereby limiting our conclusions.

Finally, in Experiment 2 as well, the variability between emotional expressions of a single facial identity was a strong contributor to the variance in the elicited brain responses.

General discussion

The experiments reported here were designed to investigate under which circumstances the human brain is able to extract emotional facial expressions embedded regularly in an RSVP stream. The RSVP elicited periodic brain responses (SSVEPs) at its presentation rate of 15 Hz in two Pilot studies and at 6 Hz in two Experiments. Based on previous studies that introduced regular oddballs in continuous RSVP streams [e.g., 16,22,26], we expected to observe an SSVEP response at the regularity frequency (one third of the presentation rate; 5 Hz in the Pilots, 2 Hz in the main Experiments). Further, if emotional expressions afford increased processing because they are motivationally salient, stronger regularity-driven brain responses would be expected for angry relative to neutral faces. We additionally implemented two control conditions. First, we presented RSVP streams with inverted faces, which have physical properties identical to upright faces but result in delayed or disrupted identification processes [23,71,72]; here we expected diminished or absent electrophysiological responses at the regularity frequency [68]. Second, we presented RSVP streams consisting of the same face stimuli but without the regular repetition of emotional expressions, thereby assessing whether any observed responses at the regularity frequencies would indeed be related to processing the facial content or simply represent a subharmonic of the stimulation frequency.
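To make the stream construction concrete, here is a schematic sketch of how regular and irregular sequences of the kind described above could be assembled (the expression labels and function names are placeholders for illustration, not the actual stimulus lists or code used in the experiments):

```python
import random

def build_stream(n_cycles, regular=True, oddball="angry", filler="neutral"):
    """Sketch of the two stream types (labels hypothetical).

    regular=True : oddball at every third position -> regularity at rate/3
    regular=False: same number of oddballs, but at shuffled positions, so
                   no fixed regularity frequency is embedded in the stream.
    """
    stream = [filler] * n_cycles
    oddball_positions = list(range(2, n_cycles, 3))  # every third image
    if not regular:
        candidates = list(range(n_cycles))
        random.shuffle(candidates)
        oddball_positions = candidates[:len(oddball_positions)]
    for pos in oddball_positions:
        stream[pos] = oddball
    return stream

print(build_stream(12))                 # ... filler filler oddball ...
print(build_stream(12, regular=False))  # same counts, no periodicity
```

The inverted-face control uses the same regular temporal structure with upside-down images, so any reduction of the 2 Hz response under inversion cannot be attributed to the sequence itself.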

Our analyses across all experiments revealed robust SSVEP responses at 15 Hz or 6 Hz, i.e., at the rate of the respective RSVPs. In our Pilots (15 Hz RSVPs) we also observed a 5 Hz response for regularly presented stimuli, regardless of stimulus orientation (upright vs. inverted). For the two experiments with a 6 Hz RSVP, the 2 Hz SSVEPs were more prominent in streams with high within-expression variability compared to streams with low variability. This was also observed with inverted faces, corroborating the results of our Pilots. In irregular streams, regularity-driven SSVEPs were absent, confirming that these brain responses cannot solely be ascribed to integer subharmonics of the driving frequencies of stimulus presentation but are indeed related to the processing of the regularly presented facial expression.

Regarding our main contrast of interest, we expected higher SSVEP responses for regular angry than for regular neutral expressions. This hypothesis stemmed from a number of studies showing greater N170 amplitude for emotional compared to neutral faces [for a review, see ref. 73] as well as recent reports of enhanced SSVEPs for emotional oddball faces [e.g., 28]. Here, however, SSVEPs were enhanced for regularly presented neutral compared to angry faces in almost all our conditions, irrespective of face orientation. This effect cannot be explained by possible misinterpretations of the emotional cues within the faces: as outlined in the Supplementary Material, in all experiments angry faces were perceived as conveying more anger than neutral, happy, or disgusted faces. Therefore, it is unlikely that neutral faces were preferentially processed because they were less ambiguous than the other emotional expressions. Below we outline a few possible post-hoc interpretations of these results.

Superposition of emotion-sensitive ERP components

What we initially considered counterintuitive findings can be re-interpreted in a different light when integrating recently published results. A study using IAPS pictures, conducted in our laboratory in parallel with the experiments reported here, already showed enhanced SSVEPs for neutral compared to emotional (i.e., unpleasant) IAPS images [70]. We proposed that these unexpected results were due to a superposition effect of an occipito-temporal ERP component called the early posterior negativity (EPN), whose amplitude is typically increased (i.e., more negative) in response to emotional as opposed to neutral material [74–76]; in other words, the EPN is usually more positive for neutral compared to unpleasant images. We surmised that, at a 6 Hz stimulation rate, the EPN components serially elicited by each individual image might superimpose and consequently create larger SSVEPs for neutral RSVP streams. Follow-up studies showed that the increased SSVEP in response to neutral stimuli was limited to 6 Hz and not observed at 3 Hz, 4 Hz, and 8.57 Hz [26], providing additional indirect support that only presentation durations long enough for the EPN to fully develop, but not so long as to include subsequent ERPs, would elicit such an effect. In another study [77], we presented participants with a task-relevant random dot kinematogram superimposed onto a stream of neutral or emotional images flickering at 4 or 6 Hz. Results showed prioritized processing of background (i.e., task-irrelevant) emotional compared to neutral stimuli at 4 Hz, but a reverse effect at 6 Hz. Yet another study [27] employed a paradigm very similar to Experiment 2 but using complex pictures instead of facial expressions. Results showed increased SSVEPs for pleasant compared to neutral and unpleasant pictures at the regularity frequency (i.e., 2 Hz). The difference between neutral and unpleasant conditions was not statistically significant, despite a trend for larger responses to neutral scenes.

Following a similar logic, the enhanced SSVEPs in regular neutral conditions reported in the present paper might be a consequence of a smaller negativity (i.e., larger positivity) for neutral compared to angry faces within 200 msec after stimulus onset. In the case of facial stimuli, the corresponding ERP component would be the N170, the first component that robustly differentiates between facial and non-facial objects [19,21] as well as between neutral and emotional facial expressions [73,78–80]. According to the dominant view in the literature, however, a serial superposition of the N170 should lead to greater regularity responses for angry rather than neutral faces (i.e., our working hypothesis throughout the whole research program). More importantly, this interpretation is inconsistent with the results of Pilot 1 and Pilot 2, where the stimulation and regularity frequencies were much faster (15 Hz and 5 Hz, respectively).
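The superposition account can be made explicit with a toy simulation (a sketch under assumed parameters; the waveform below is a generic transient, not a measured EPN or N170): each image evokes a brief component, the component evoked by the regularly repeated expression differs in amplitude from that evoked by the fillers, and the amplitude of the resulting 2 Hz regularity response grows with that difference.

```python
import numpy as np

fs, rsvp_rate, dur = 600, 6.0, 10.0   # assumed sampling rate, stream rate, duration
t = np.arange(0, dur, 1 / fs)

def transient(amplitude, latency=0.17, width=0.05):
    """Toy single-stimulus component: a Gaussian bump peaking ~170 ms post-onset."""
    tau = np.arange(0, 0.4, 1 / fs)
    return amplitude * np.exp(-((tau - latency) ** 2) / (2 * width ** 2))

def stream_response(filler_amp, oddball_amp):
    """Superimpose one transient per image; every third image is the oddball."""
    resp = np.zeros(t.size)
    onsets = np.arange(0, dur, 1 / rsvp_rate)
    for k, onset in enumerate(onsets):
        amp = oddball_amp if k % 3 == 2 else filler_amp
        start = int(onset * fs)
        wave = transient(amp)
        resp[start:start + wave.size] += wave[: resp.size - start]
    return resp

freqs = np.fft.rfftfreq(t.size, 1 / fs)
bin_2hz = np.argmin(np.abs(freqs - 2.0))
for diff, label in [(0.2, "small oddball/filler difference"),
                    (0.8, "large oddball/filler difference")]:
    amp = np.abs(np.fft.rfft(stream_response(1.0, 1.0 + diff)))[bin_2hz] / t.size
    print(f"{label}: 2 Hz amplitude = {amp:.4f}")
```

In this toy model the 2 Hz amplitude depends only on the magnitude of the oddball/filler response difference, not on its sign, which is one way a more positive component to neutral faces could translate into a larger regularity response.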

The role of isolated facial features

In their seminal paper, Diamond and Carey [81] proposed that the visual system uses three types of information to recognize faces: (a) isolated features, such as eyes and mouth; (b) first-order relational/spatial properties (i.e., the eyes are above the nose); and (c) second-order relational information (i.e., the distance between the eyes, or from mouth to nose). More recently, Calvo et al. [82] suggested a functional model of face processing comprising a serial sequence of steps, with individual features being processed first and very rapidly (< 170 ms), followed by the processing of configurational information, and finally by the analysis of affective cues (around 800 ms after stimulus onset). This is reminiscent of a classical neuroanatomical model of face perception [83] proposing a functional subdivision of a network of occipitotemporal brain areas into two systems: the "core system", responsible for the initial processing of essential facial constituents (e.g., eyes, mouth) in the inferior occipital gyri, the analysis of dynamic features (e.g., eye gaze) in the superior temporal sulcus, and invariant aspects (e.g., unique identity) in the lateral fusiform gyrus; and the "extended system", comprising various brain regions involved in higher-order cognitive functions such as attention (e.g., intraparietal sulcus), emotion extraction (e.g., amygdala), and speech prosody (e.g., auditory cortex). How may these serial models help us interpret the results reported here?

In our experiments, the rapid presentation of individual images within an RSVP stream would presumably allow only the initial perceptual processes to complete (e.g., interpretation of eyes and mouth and relations among these elements), leaving little time to analyze more complex information, including emotional cues, especially if we assume strong forward/backward masking effects [3]. This interpretation is further consistent with the absence of a robust inversion effect, i.e., a processing advantage for upright faces, across all stimulation frequencies. Specifically, inversion might have been ineffective because diagnostic information from each face was extracted from individual features (mainly eyes and mouth), whose processing may be less affected by face inversion [84,85, but see 86 for a different perspective].

Having said that, our participants’ brain activity was still differentially modulated by neutral and angry facial expressions. If not emotion, what kind of perceptual information was used by the visual system to diagnostically distinguish between these two conditions? We explored the possibility that within-identity emotion variability could play a role and found some support for this hypothesis: RSVP streams with low variability elicited a smaller regularity brain response at either 5 Hz or 2 Hz, and differences between neutral and angry faces were either not present (Pilot 2) or very small (Experiment 1 and Experiment 2) [see also 66].

A recent study by Poncet et al. [87] used a protocol very similar to the one employed here. These authors presented faces with varying emotional expressions at 6 Hz; every 5th face (i.e., at 1.2 Hz) was either a regular neutral oddball among emotional faces or a regular emotional oddball. Similar to our findings, their results showed overall larger electrophysiological responses for neutral compared to a range of emotional expressions (along with a selective neural response to fear). Different topographical distributions were also observed between the various emotional faces and the neutral ones, suggesting the selective involvement of distinct neuronal networks [in accordance with ref. 83]. Poncet et al. [87] argue that contrasting emotional and non-emotional facial expressions may modulate core regions of the face processing network that differentiate between these two categories, in particular the superior temporal sulcus and the lateral fusiform gyrus, which are particularly active when processing changeable features of faces as well as emotional cues. Such an explanation might also accommodate our findings of similar SSVEP responses for upright and inverted faces, and of more robust results for RSVP streams with highly variable emotional expressions.

Finally, another possible explanation for our findings pertains to the fact that, in this experimental paradigm, neutral faces are the only "non-emotional" stimuli within the RSVP stream, thus serving as "oddballs" throughout the whole stimulation. Such an interpretation would assign more weight to theories advocating general arousal and/or valence mechanisms [88–90] (neutral would be less arousing, and its emotional connotation less prominent, than any other displayed emotion) rather than to qualitatively discrete emotional categories [1,91]. This would be in contrast with recent studies interpreting their results as evidence for specific neural processes dedicated to each facial expression [85,90]. The RSVP techniques employed here and elsewhere [28,87,92] are promising tools to disentangle these alternative theoretical interpretations and are a fruitful avenue for future research.

Conclusions

In a series of experiments, we measured continuous brain responses to rapid serial presentations of faces with emotional expressions and tested whether the visual system prioritizes the processing of angry over neutral faces. Although we manipulated a range of features and applied refinements and additional controls to existing paradigms, our collective results do not support such a processing advantage. Instead, we consistently observed enhanced electrical brain responses during the regular presentation of neutral as opposed to angry faces. We suggest that these findings might be due to the rapid extraction of featural information, particularly within-identity emotion variability: RSVP streams with low-variability expressions elicited smaller regularity-driven brain responses than high-variability streams, and differences between neutral and angry faces were either absent or very small. Further, hampering face processing by inverting the faces had only weak effects on brain responses. Taken together, our results call for a cautious interpretation of results from experiments using face-stimulus RSVPs to study the processing of emotional expressions: confounds in low-level visual stimulus features might play an important role and might be more difficult to control than previously acknowledged.

Supporting information

S1 File. Supplementary materials.

(DOCX)

S1 Fig. Face ratings.

Face ratings for each emotional expression, separately for each participant (single dots), question (how happy/angry/disgusted is this face?), and experiment. Mean values are marked by horizontal black lines and 95% confidence intervals represented as transparent boxes. Ratings range from 1 (very low emotional intensity) to 9 (very high emotional intensity).

(TIFF)

S2 Fig. Reaction times.

Reaction times (in msec) in response to colored dots during the presentation of each face stream, separately for each participant (single dots), condition, and experiment. Mean values are marked by horizontal black lines and 95% confidence intervals represented as transparent boxes.

(TIFF)

S3 Fig. Cosine similarity at stimulation frequency.

Cosine similarity (CS) calculated at the stimulation frequency for each participant (single dots), condition, and experiment. Mean values are marked by horizontal black lines and 95% confidence intervals represented as transparent boxes.

(TIFF)

S1 Table. Valence ratings: Descriptives.

Ratings of emotional valence (angry, happy, disgusted) of the different image sets, separately for each experiment, experimental condition, and question.

(XLSX)

S2 Table. Valence ratings: Analysis.

Statistical comparisons of valence ratings between all pairs of emotional image sets, separately for all experiments and experimental conditions. Mean and 95% HDI refer to the difference in rated valence within each comparison. Comparisons showing strong evidence against the hypothesis of no difference are presented in bold.

(XLSX)

S3 Table. Reaction times: Descriptives.

Reaction times to targets presented during trials of the different experimental conditions, separately for each experiment with a behavioral task (i.e., excluding Experiment 2).

(XLSX)

S4 Table. Reaction times: Analysis.

Statistical comparisons of reaction times between all pairs of factor levels, separately for all experiments and experimental conditions.

(XLSX)

S5 Table. Cosine similarity at stimulation frequency: Descriptives.

Statistics of Cosine Similarity Index (CS) of the signals at the stimulation frequency, separately for the different experiments and experimental conditions.

(XLSX)

S6 Table. Cosine similarity at stimulation frequency: Analysis.

Statistical comparisons of cosine similarity values between all pairs of factor levels, separately for all experiments and experimental conditions. Mean and 95% HDI refer to the difference in cosine similarity within each comparison. Comparisons showing strong evidence against the hypothesis of no difference are presented in bold.

(XLSX)

Acknowledgments

The experimental stimulation was realized using Cogent Graphics developed by John Romaya at the Laboratory of Neurobiology, Wellcome Department of Imaging Neuroscience, University College London (UCL). We would like to thank Renate Zahn for her help with data collection.

Data Availability

Raw and pre-processed data, materials, and analysis scripts are available at https://osf.io/uhczc/.

Funding Statement

This work was supported by grants from the Deutsche Forschungsgemeinschaft to MMM (MU972/22-1, MU972/22-2). AS was also supported by Ghent University (BOF14/PDO/123). The funding sources had no involvement in the study design; collection, analysis, and interpretation of data; writing of the report; and decision to submit the article for publication.

References

1. Darwin C. The expression of the emotions in man and animals. London, England: John Murray; 1872. 10.1037/10001-000
2. Ohman A. Of snakes and faces: an evolutionary perspective on the psychology of fear. Scand J Psychol. 2009;50: 543–552. 10.1111/j.1467-9450.2009.00784.x
3. Keysers C, Perrett DI. Visual masking and RSVP reveal neural competition. Trends Cogn Sci. 2002;6: 120–125. 10.1016/s1364-6613(00)01852-0
4. Norcia AM, Appelbaum LG, Ales JM, Cottereau BR, Rossion B. The steady-state visual evoked potential in vision research: A review. J Vis. 2015;15: 4.
5. Regan D. Steady-state evoked potentials. J Opt Soc Am. 1977;67: 1475–1489. 10.1364/josa.67.001475
6. Fuchs S, Andersen SK, Gruber T, Müller MM. Attentional bias of competitive interactions in neuronal networks of early visual processing in the human brain. NeuroImage. 2008;41: 1086–1101. 10.1016/j.neuroimage.2008.02.040
7. Hindi Attar C, Andersen SK, Müller MM. Time course of affective bias in visual attention: convergent evidence from steady-state visual evoked potentials and behavioral data. NeuroImage. 2010;53: 1326–1333. 10.1016/j.neuroimage.2010.06.074
8. Wieser MJ, Miskovic V, Keil A. Steady-state visual evoked potentials as a research tool in social affective neuroscience. Psychophysiology. 2016;53: 1763–1775. 10.1111/psyp.12768
9. Keitel C, Keitel A, Benwell CSY, Daube C, Thut G, Gross J. Stimulus-Driven Brain Rhythms within the Alpha Band: The Attentional-Modulation Conundrum. J Neurosci. 2019;39: 3119–3129. 10.1523/JNEUROSCI.1633-18.2019
10. Kim YJ, Grabowecky M, Paller KA, Muthu K, Suzuki S. Attention induces synchronization-based response gain in steady-state visual evoked potentials. Nat Neurosci. 2007;10: 117–125. 10.1038/nn1821
11. Keil A, Gruber T, Müller MM, Moratti S, Stolarova M, Bradley MM, et al. Early modulation of visual perception by emotional arousal: Evidence from steady-state visual evoked brain potentials. Cogn Affect Behav Neurosci. 2003;3: 195–206. 10.3758/cabn.3.3.195
12. Keil A, Moratti S, Sabatinelli D, Bradley MM, Lang PJ. Additive effects of emotional content and spatial selective attention on electrocortical facilitation. Cereb Cortex. 2005;15: 1187–1197. 10.1093/cercor/bhi001
13. Moratti S, Keil A, Stolarova M. Motivated attention in emotional picture processing is reflected by activity modulation in cortical attention networks. NeuroImage. 2004;21: 954–964. 10.1016/j.neuroimage.2003.10.030
14. Bekhtereva V, Müller MM. Affective facilitation of early visual cortex during rapid picture presentation at 6 and 15 Hz. Soc Cogn Affect Neurosci. 2015;10: 1623–1633. 10.1093/scan/nsv058
15. Lang PJ, Bradley MM, Cuthbert BN. International affective picture system (IAPS): Affective ratings of pictures and instruction manual. Technical Report A-8. University of Florida, Gainesville, FL; 2008.
16. Alonso-Prieto E, Belle GV, Liu-Shuang J, Norcia AM, Rossion B. The 6 Hz fundamental stimulation frequency rate for individual face discrimination in the right occipito-temporal cortex. Neuropsychologia. 2013;51: 2863–2875. 10.1016/j.neuropsychologia.2013.08.018
17. Grill-Spector K, Henson R, Martin A. Repetition and the brain: neural models of stimulus-specific effects. Trends Cogn Sci. 2006;10: 14–23. 10.1016/j.tics.2005.11.006
18. Gruber T, Giabbiconi C-M, Trujillo-Barreto NJ, Müller MM. Repetition suppression of induced gamma band responses is eliminated by task switching. Eur J Neurosci. 2006;24: 2654–2660. 10.1111/j.1460-9568.2006.05130.x
19. Bentin S, Allison T, Puce A, Perez E, McCarthy G. Electrophysiological studies of face perception in humans. J Cogn Neurosci. 1996;8: 551–565. 10.1162/jocn.1996.8.6.551
20. Rossion B. Understanding face perception by means of human electrophysiology. Trends Cogn Sci. 2014; 1–9. 10.1016/j.tics.2014.02.013
21. Yovel G. Neural and cognitive face-selective markers: An integrative review. Neuropsychologia. 2016;83: 5–13. 10.1016/j.neuropsychologia.2015.09.026
22. Liu-Shuang J, Norcia AM, Rossion B. An objective index of individual face discrimination in the right occipito-temporal cortex by means of fast periodic oddball stimulation. Neuropsychologia. 2014;52: 57–72. 10.1016/j.neuropsychologia.2013.10.022
23. Rossion B, Gauthier I. How does the brain process upright and inverted faces? Behav Cogn Neurosci Rev. 2002;1: 63–75. 10.1177/1534582302001001004
24. Rossion B, Torfs K, Jacques C, Liu-Shuang J. Fast periodic presentation of natural images reveals a robust face-selective electrophysiological response in the human brain. J Vis. 2015;15: 15.1.18. 10.1167/15.1.18
25. Bekhtereva V, Craddock M, Müller MM. Attentional bias to affective faces and complex IAPS images in early visual cortex follows emotional cue extraction. NeuroImage. 2015;112: 254–266. 10.1016/j.neuroimage.2015.03.052
26. Bekhtereva V, Pritschmann R, Keil A, Müller MM. The neural signature of extracting emotional content from rapid visual streams at multiple presentation rates: A cross-laboratory study. Psychophysiology. 2018;55: e13222. 10.1111/psyp.13222
27. Schettino A, Gundlach C, Müller MM. Rapid Extraction of Emotion Regularities from Complex Scenes in the Human Brain. Collabra Psychol. 2019;5: 20. 10.1525/collabra.226
28. Dzhelyova M, Jacques C, Rossion B. At a Single Glance: Fast Periodic Visual Stimulation Uncovers the Spatio-Temporal Dynamics of Brief Facial Expression Changes in the Human Brain. Cereb Cortex. 2017;27: 4106–4123. 10.1093/cercor/bhw223
29. Tottenham N, Tanaka JW, Leon AC, McCarry T, Nurse M, Hare TA, et al. The NimStim set of facial expressions: Judgments from untrained research participants. Psychiatry Res. 2009;168: 242–249. 10.1016/j.psychres.2008.05.006
30. Willenbockel V, Sadr J, Fiset D, Horne GO, Gosselin F, Tanaka JW. Controlling low-level image properties: the SHINE toolbox. Behav Res Methods. 2010;42: 671–684. 10.3758/BRM.42.3.671
31. DeBruine L, Jones B. Face Research Lab London Set. 2017 [cited 28 Feb 2020]. 10.6084/M9.FIGSHARE.5047666.V3
32. Oostenveld R, Praamstra P. The five percent electrode system for high-resolution EEG and ERP measurements. Clin Neurophysiol. 2001;112: 713–719. 10.1016/s1388-2457(00)00527-7
33. Delorme A, Makeig S. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Methods. 2004;134: 9–21. 10.1016/j.jneumeth.2003.10.009
34. Nolan H, Whelan R, Reilly RB. FASTER: Fully Automated Statistical Thresholding for EEG artifact Rejection. J Neurosci Methods. 2010;192: 152–162. 10.1016/j.jneumeth.2010.07.015
35. Perrin F, Pernier J, Bertrand O, Echallier JF. Spherical splines for scalp potential and current-density mapping. Electroencephalogr Clin Neurophysiol. 1989;72: 184–187. 10.1016/0013-4694(89)90180-6
36. Delorme A, Sejnowski T, Makeig S. Enhanced detection of artifacts in EEG data using higher-order statistics and independent component analysis. NeuroImage. 2007;34: 1443–1449. 10.1016/j.neuroimage.2006.11.004
37. Cohen MX, Gulbinaite R. Rhythmic entrainment source separation: Optimizing analyses of neural responses to rhythmic sensory stimulation. NeuroImage. 2017;147: 43–56. 10.1016/j.neuroimage.2016.11.036
38. Gulbinaite R, Roozendaal DHM, VanRullen R. Attention differentially modulates the amplitude of resonance frequencies in the visual cortex. bioRxiv. 2019; 518779. 10.1101/518779
39. Oostenveld R, Fries P, Maris E, Schoffelen J-M. FieldTrip: Open Source Software for Advanced Analysis of MEG, EEG, and Invasive Electrophysiological Data. Comput Intell Neurosci. 2011 [cited 7 Mar 2019]. 10.1155/2011/156869
40. Chou EP, Hsu S-M. Cosine similarity as a sample size-free measure to quantify phase clustering within a single neurophysiological signal. J Neurosci Methods. 2018;295: 111–120. 10.1016/j.jneumeth.2017.12.007
41. Gross J. Analytical methods and experimental approaches for electrophysiological studies of brain oscillations. J Neurosci Methods. 2014;228: 57–66. 10.1016/j.jneumeth.2014.03.007
42. van Diepen RM, Mazaheri A. The Caveats of observing Inter-Trial Phase-Coherence in Cognitive Neuroscience. Sci Rep. 2018;8: 2990. 10.1038/s41598-018-20423-z
43. Bonnefond M, Jensen O. Alpha oscillations serve to protect working memory maintenance against anticipated distracters. Curr Biol. 2012;22: 1969–1974. 10.1016/j.cub.2012.08.029
44. Samaha J, Bauer P, Cimaroli S, Postle BR. Top-down control of the phase of alpha-band oscillations as a mechanism for temporal prediction. Proc Natl Acad Sci U S A. 2015;112: 8439–8444. 10.1073/pnas.1503686112
45. Bürkner P-C. brms: An R Package for Bayesian Multilevel Models Using Stan. J Stat Softw. 2017;80: 1–28. 10.18637/jss.v080.i01
46. Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, et al. Stan: A Probabilistic Programming Language. J Stat Softw. 2017;76: 1–32. 10.18637/jss.v076.i01
47. Hoffman MD, Gelman A. The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo. J Mach Learn Res. 2014;15: 1593–1623.
48. Gabry J, Simpson D, Vehtari A, Betancourt M, Gelman A. Visualization in Bayesian workflow. J R Stat Soc Ser A Stat Soc. 2019;182: 389–402. 10.1111/rssa.12378
49. Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian Data Analysis. 3rd ed. CRC Press; 2013.
50. Nalborczyk L, Batailler C, Lœvenbruck H, Vilain A, Bürkner P-C. An Introduction to Bayesian Multilevel Models Using brms: A Case Study of Gender Effects on Vowel Variability in Standard Indonesian. J Speech Lang Hear Res. 2019;62: 1225–1242. 10.1044/2018_JSLHR-S-18-0006
51. Kruschke JK. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. 2nd ed. Boston: Academic Press; 2014.
52. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2019.
53. RStudio Team. RStudio: Integrated Development Environment for R. Boston, MA: RStudio, PBC; 2018. Available: http://www.rstudio.com/
54. Wickham H. tidyverse: Easily Install and Load the "Tidyverse." 2017. Available: https://CRAN.R-project.org/package=tidyverse
55. Hope RM. Rmisc: Ryan Miscellaneous. 2013. Available: https://CRAN.R-project.org/package=Rmisc
56. Bürkner P-C. Advanced Bayesian Multilevel Modeling with the R Package brms. R J. 2018;10: 395–411. 10.32614/RJ-2018-017
57. Stan Development Team. RStan: the R interface to Stan. 2019. Available: http://mc-stan.org/
58. Wickham H. ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag; 2016. Available: https://ggplot2.tidyverse.org
59. Braginsky M. ggpirate: Pirate plotting for ggplot2. 2019. Available: https://github.com/mikabr/ggpirate
60. Kay M. tidybayes: Tidy Data and Geoms for Bayesian Models. 2019. 10.5281/zenodo.1308151
61. Makowski D, Ben-Shachar M, Lüdecke D. bayestestR: Describing Effects and their Uncertainty, Existence and Significance within the Bayesian Framework. J Open Source Softw. 2019 [cited 14 Aug 2019]. 10.21105/joss.01541
62. Kruschke JK, Meredith M. BEST: Bayesian Estimation Supersedes the t-Test. 2018. Available: https://CRAN.R-project.org/package=BEST
63. Garnier S. viridis: Default Color Maps from "matplotlib." 2018. Available: https://CRAN.R-project.org/package=viridis
64. Wilke CO. cowplot: Streamlined Plot Theme and Plot Annotations for "ggplot2." 2019. Available: https://CRAN.R-project.org/package=cowplot
65. Xie Y. knitr: A Comprehensive Tool for Reproducible Research in R. In: Stodden V, Leisch F, Peng RD, editors. Implementing Reproducible Computational Research. Chapman and Hall/CRC; 2014. Available: http://www.crcpress.com/product/isbn/9781466561595
66. Vakli P, Németh K, Zimmer M, Kovács G. The face evoked steady-state visual potentials are sensitive to the orientation, viewpoint, expression and configuration of the stimuli. Int J Psychophysiol. 2014;94: 336–350. 10.1016/j.ijpsycho.2014.10.008
67. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process. 2004;13: 600–612. 10.1109/tip.2003.819861
68. Gruss LF, Wieser MJ, Schweinberger SR, Keil A. Face-evoked steady-state visual potentials: effects of presentation rate and face inversion. Front Hum Neurosci. 2012;6: 316. 10.3389/fnhum.2012.00316
69. Menzel C, Redies C, Hayn-Leichsenring GU. Low-level image properties in facial expressions. Acta Psychol (Amst). 2018;188: 74–83. 10.1016/j.actpsy.2018.05.012
70. Bekhtereva V, Müller MM. Corrigendum to: Affective facilitation of early visual cortex during rapid picture presentation at 6 and 15 Hz. Soc Cogn Affect Neurosci. 2017;12: 1022–1023. 10.1093/scan/nsx024
71. de Gelder B, Teunisse J-P, Benson PJ. Categorical Perception of Facial Expressions: Categories and their Internal Structure. Cogn Emot. 1997;11: 1–23. 10.1080/026999397380005
72. Searcy JH, Bartlett JC. Inversion and processing of component and spatial-relational information in faces. J Exp Psychol Hum Percept Perform. 1996;22: 904–915. 10.1037//0096-1523.22.4.904
73. Hinojosa JA, Mercado F, Carretié L. N170 sensitivity to facial expression: A meta-analysis. Neurosci Biobehav Rev. 2015;55: 498–509. 10.1016/j.neubiorev.2015.06.002
74. Junghöfer M, Bradley MM, Elbert TR, Lang PJ. Fleeting images: A new look at early emotion discrimination. Psychophysiology. 2001;38: 175–178.
75. Schupp HT, Flaisch T, Stockburger J, Junghöfer M. Emotion and attention: event-related brain potential studies. Prog Brain Res. 2006;156: 31–51. 10.1016/S0079-6123(06)56002-9
76. Schupp HT, Junghöfer M, Weike AI, Hamm AO. Emotional facilitation of sensory processing in the visual cortex. Psychol Sci. 2003;14: 7–13. 10.1111/1467-9280.01411
77. Bekhtereva V, Craddock M, Gundlach C, Müller MM. Rapid sensory gain with emotional distracters precedes attentional deployment from a foreground task. NeuroImage. 2019;202: 116115. 10.1016/j.neuroimage.2019.116115
78. Eimer M, Holmes A. Event-related brain potential correlates of emotional face processing. Neuropsychologia. 2007;45: 15–31. 10.1016/j.neuropsychologia.2006.04.022
79. Kulke L. Neural Mechanisms of Overt Attention Shifts to Emotional Faces. Neuroscience. 2019;418: 59–68. 10.1016/j.neuroscience.2019.08.023
80. Schupp HT, Ohman A, Junghöfer M, Weike AI, Stockburger J, Hamm AO. The facilitated processing of threatening faces: an ERP analysis. Emotion. 2004;4: 189–200. 10.1037/1528-3542.4.2.189
81. Diamond R, Carey S. Why faces are and are not special: an effect of expertise. J Exp Psychol Gen. 1986;115: 107–117. 10.1037//0096-3445.115.2.107
82. Calvo MG, Fernández-Martín A, Nummenmaa L. Perceptual, categorical, and affective processing of ambiguous smiling facial expressions. Cognition. 2012;125: 373–393. 10.1016/j.cognition.2012.07.021
83. Haxby JV, Hoffman EA, Gobbini MI. The distributed human neural system for face perception. Trends Cogn Sci. 2000;4: 223–233. 10.1016/s1364-6613(00)01482-0
84. Mondloch CJ, Le Grand R, Maurer D. Configural face processing develops more slowly than featural face processing. Perception. 2002;31: 553–566. 10.1068/p3339
85. Tanaka JW, Farah MJ. Second-order relational properties and the inversion effect: testing a theory of face perception. Percept Psychophys. 1991;50: 367–372. 10.3758/bf03212229
86. McKone E, Yovel G. Why does picture-plane inversion sometimes dissociate perception of features and spacing in faces, and sometimes not? Toward a new theory of holistic processing. Psychon Bull Rev. 2009;16: 778–797. 10.3758/PBR.16.5.778
87. Poncet F, Baudouin J-Y, Dzhelyova MP, Rossion B, Leleu A. Rapid and automatic discrimination between facial expressions in the human brain. Neuropsychologia. 2019;129: 47–55. 10.1016/j.neuropsychologia.2019.03.006
88. Bradley MM, Keil A, Lang PJ. Orienting and emotional perception: facilitation, attenuation, and interference. Front Psychol. 2012;3: 493. 10.3389/fpsyg.2012.00493
89. Bradley MM. Natural selective attention: Orienting and emotion. Psychophysiology. 2009;46: 1–11. 10.1111/j.1469-8986.2008.00702.x
90. Lang PJ, Bradley MM. Emotion and the motivational brain. Biol Psychol. 2010;84: 437–450. 10.1016/j.biopsycho.2009.10.007
91. Ekman P. An argument for basic emotions. Cogn Emot. 1992;6: 169–200. 10.1080/02699939208411068
92. Leleu A, Dzhelyova M, Rossion B, Brochard R, Durand K, Schaal B, et al. Tuning functions for automatic detection of brief changes of facial expression in the human brain. NeuroImage. 2018;179: 235–251. 10.1016/j.neuroimage.2018.06.048

Decision Letter 0

José A Hinojosa

10 Jan 2020

PONE-D-19-32180

Rapid processing of neutral and angry expressions within ongoing facial stimulus streams: Is it all about isolated facial features?

PLOS ONE

Dear Dr. Schettino,

Thank you for submitting your manuscript to PLOS ONE. As you can see from their comments, both reviewers were generally positive about your manuscript. My own reading of the manuscript is also positive. Therefore, I would like to invite you to resubmit a revised version of your work that carefully considers all these comments. Please make sure to pay special attention to the reformulation of hypotheses and to the issue regarding the interpretation of your data in terms of task demands.

We would appreciate receiving your revised manuscript by Feb 24 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

José A Hinojosa, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2.  Thank you for including your ethics statement:

"The experiments were conducted in accordance with the guidelines of the ethics committee of the University of Leipzig."

a) Please amend your current ethics statement to confirm that your named institutional review board or ethics committee specifically approved this study.

b) Once you have amended this/these statement(s) in the Methods section of the manuscript, please add the same text to the “Ethics Statement” field of the submission form (via “Edit Submission”).

For additional information about PLOS ONE ethical requirements for human subjects research, please refer to http://journals.plos.org/plosone/s/submission-guidelines#loc-human-subjects-research.

3. We note that Figure(s) 1 and 3 in your submission contain copyrighted images. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

1.    You may seek permission from the original copyright holder of Figure(s) 1 and 3 to publish the content specifically under the CC BY 4.0 license.

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

2.    If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

4. We note that Figures 1 and 3 include an image of a [patient / participant / in the study]. 

As per the PLOS ONE policy (http://journals.plos.org/plosone/s/submission-guidelines#loc-human-subjects-research) on papers that include identifying, or potentially identifying, information, the individual(s) or parent(s)/guardian(s) must be informed of the terms of the PLOS open-access (CC-BY) license and provide specific permission for publication of these details under the terms of this license. Please download the Consent Form for Publication in a PLOS Journal (http://journals.plos.org/plosone/s/file?id=8ce6/plos-consent-form-english.pdf). The signed consent form should not be submitted with the manuscript, but should be securely filed in the individual's case notes. Please amend the methods section and ethics statement of the manuscript to explicitly state that the patient/participant has provided consent for publication: “The individual in this manuscript has given written informed consent (as outlined in PLOS consent form) to publish these case details”.

If you are unable to obtain consent from the subject of the photograph, you will need to remove the figure and any other textual identifying information or case descriptions for this individual.

5. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

Additional Editor Comments (if provided):

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The studies by Schettino and colleagues investigated the electrophysiological correlates of facial expressions (i.e., neutral vs. angry) processing by means of a fast periodic stimulation (RSVP) approach. The Authors presented four different EEG studies in which several parameters/factors have been manipulated to assess the brain responses underlying the hypothesized processing advantage for emotional (vs. neutral) face expressions. These factors included the stimulus presentation rate (fast-15Hz vs. slow-6Hz), variability across facial expressions (high vs. low), and task (passive observation vs. secondary task). The main result reported was against the central hypothesis, but, more importantly, it was consistent throughout both the pilots and main experiments.

I think that the paper is well-written and technically sound, and it adheres to the standards of the Open Science framework.

Despite that, several aspects should be addressed and improved by the authors that would strengthen their manuscript.

Main comments:

1. Introduction. The introduction is well understandable, but it raises expectations (not respected) in the reader about results in line with the central hypothesis of the Authors. In the discussion section, the Authors reported several findings that are consistent with their findings (enhanced SSVEPs in response to neutral vs. emotional stimuli). In my humble opinion, the Authors could reorganize the Introduction by introducing and anticipating both hypotheses: the main one (enhanced SSVEPs to angry stimuli) and the possibility for the opposite one (supported by the literature in the General Discussion). The Bayesian approach, already used, could be suitable for testing the direction of such effect. Given that the Authors proposed four different studies to assess the role of different factors in modulating face processing, such a rigorous approach should be highlighted from the beginning. This would emphasize the consistency in the results and be beneficial to the entire manuscript.

2. Methods: Spectral decomposition of stimulus-driven EEG signals.

The authors employed spatial filtering (RESS) introduced by Cohen and Gulbinaite (2017). I wonder if a more “traditional” approach based on a single or group (ROI) electrode selection would lead to different results. Either a convergence or comparison between the two techniques of analysis would strengthen the manuscript. In this regard, what about possible hemispheric asymmetries previously reported during face (also emotional expression) processing? Did the Authors expect, or check, for any gender-related difference in brain responses, since their groups of participants included both male and female individuals?

3. Discussions and General Discussion. I think that each Discussion section would benefit from the inclusion of a few bibliographical references in support of the main results. This would help to link each section to the following experiment proposed.

In the General Discussion, I truly appreciated the statement about possible post-hoc interpretations of the results made by the Authors. However, under my comment about reorganizing the Introduction, I would suggest anticipating previous evidence reported here (consistent with that of the Authors) by reporting it in the earlier sections of the manuscript (i.e., Introduction and Interim Discussions). In this regard, the last paragraphs are dense with data and notions and seem to be written with a different style. Why not use previous studies to support, instead of justifying, the (stable) results from the present work?

Minor comments:

1. In both the introduction and General Discussion, the Authors referred to several studies without reporting the name of the principal author. This style of writing simplifies the text but could make it difficult to understand if the Authors are referring to their previous works or to those of other research groups. I suggest increasing clarity on this point.

2. In the plots regarding CS values (i.e., Figures 2, 4, 5, 6), it would be nice to mark (i.e., “*”) the significant differences between conditions that have been found. This would help to guide the reader through the results in the different studies. This suggestion also applies to supplementary figures reporting subjective ratings.

Reviewer #2: The series of experiments performed by Schettino and colleagues sought to investigate the degree to which the visual system can extract emotional information from faces during an RSVP paradigm. Specifically, the question was to determine what the ideal 'extraction speed' would be, hypothesizing that angry faces would be preferentially processed and therefore lead to an enhanced ssVEP over neutral. Surprisingly, neutral faces consistently, over 4 experiments, showed enhanced processing via the ssVEP.

The authors nicely explain this neutral face effect with various hypotheses in their discussion. Both hypotheses (superposition effect and isolated facial features) are viable. However, a third option could be that, across identities, the expression of 'angry' is variable, whereas neutral is highly consistent. Although testing for within-identity variability in facial expression (how was dissimilarity of facial expression determined?), the consideration of between-subjects variability for each emotion is worth debating. This would naturally apply more for the first pilot study using multiple identities, but may generally be decoded by the face network system as a reliable, consistent stimulus that, in a stream of highly variable expressions (constantly switching between 3 emotions), evokes a strong signal due to the more easily predictive nature of a consistent neutral face.

Overall, the study with its four experiments was well designed and described. It is reassuring to see topographies remain stable across experiments. The irregular control condition was particularly nice, although the authors should be careful in saying that "regularity frequencies were compared", as for the irregular condition there was no regularity frequency by design. Minor semantic comment. I would be interested to see the overall ssVEP signal to the RSVP for the irregular condition, however. Despite showing that the regularity frequency was not a subharmonic of the main driving frequency, it would be useful to see if this regularity frequency modified the overall RSVP signal in any way, in a multiplicative or even destructive fashion. With growing conversation in the literature on superposition, the amplification of fundamental frequencies by non-physical harmonics, so to speak, could potentially add to this literature.

The interpretation of the emotional differences between Experiments 1 and 2 should be described with more caution, as the task-demand interpretation could go either way (enhanced attention, withdrawn attention) and was not directly tested in this study.

I would recommend this paper for acceptance with minor revisions. Most of the revisions should be directed at a more streamlined and concise conceptual setup in the introduction. For example, the reader begins by thinking the point of the paper is the timing of emotional extraction, but then, aside from two fundamental frequencies (15 and 6 Hz), timing/frequency is not parametrically modulated. Instead, the story evolves into wanting to show that threat should evoke a greater response than neutral. The introduction reads more like an extended methods section than a conceptual walk-through, with not enough 'why we should care'. Overall, less conversational language should be used, which is particularly pronounced in the discussion. This work is timely and contributes to the face processing literature as well as expanding our mechanistic understanding of ssVEP signals and their caveats.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Apr 24;15(4):e0231982. doi: 10.1371/journal.pone.0231982.r002

Author response to Decision Letter 0


28 Feb 2020

Reviewers' comments

Reviewer #1

The studies by Schettino and colleagues investigated the electrophysiological correlates of processing facial expressions (i.e., neutral vs. angry) by means of a fast periodic stimulation (RSVP) approach. The Authors presented four different EEG studies in which several parameters/factors were manipulated to assess the brain responses underlying the hypothesized processing advantage for emotional (vs. neutral) facial expressions. These factors included the stimulus presentation rate (fast, 15 Hz vs. slow, 6 Hz), variability across facial expressions (high vs. low), and task (passive observation vs. secondary task). The main result ran counter to the central hypothesis but, more importantly, was consistent across both the pilots and the main experiments.

I think that the paper is well-written and technically sound, and it adheres to the standards of the Open Science framework.

We thank the Reviewer for the positive evaluation and constructive feedback.

Despite that, several aspects should be addressed and improved; doing so would strengthen the manuscript.

Main comments:

1. Introduction. The introduction is easy to follow, but it raises expectations in the reader (not met by the results) of findings in line with the Authors' central hypothesis. In the Discussion section, the Authors report several published findings that are consistent with their own (enhanced SSVEPs in response to neutral vs. emotional stimuli). In my humble opinion, the Authors could reorganize the Introduction by introducing and anticipating both hypotheses: the main one (enhanced SSVEPs to angry stimuli) and the possibility of the opposite one (supported by the literature cited in the General Discussion). The Bayesian approach, already used, could be suitable for testing the direction of such an effect. Given that the Authors proposed four different studies to assess the role of different factors in modulating face processing, such a rigorous approach should be highlighted from the beginning. This would emphasize the consistency of the results and benefit the entire manuscript.

Following the Reviewer’s suggestions, we revised our Abstract and Introduction sections to more clearly emphasize our initial hypothesis: processing advantages for angry facial expressions should manifest in an increased SSVEP amplitude compared to neutral faces. The opposite pattern (attenuated SSVEPs for threatening stimuli) was initially unanticipated. In the literature, support for this opposite pattern only accrued in the course of this research program and was thus not previously available (also, to our knowledge, the majority of findings in the current literature on this topic still report larger SSVEP amplitude for emotion-laden as opposed to neutral information). We believe that, in its current form, our manuscript faithfully reflects this timeline. We prefer to keep this structure to avoid potential misinterpretations, i.e., the appearance of finding support for a hypothesis we did not formulate a priori.

2. Methods: Spectral decomposition of stimulus-driven EEG signals.

The authors employed the spatial filtering approach (RESS) introduced by Cohen and Gulbinaite (2017). I wonder whether a more “traditional” approach, based on the selection of a single electrode or a group of electrodes (ROI), would lead to different results. Either a convergence analysis or a comparison between the two techniques would strengthen the manuscript. In this regard, what about possible hemispheric asymmetries previously reported during face (including emotional expression) processing? Did the Authors expect, or check for, any gender-related differences in brain responses, since their groups of participants included both male and female individuals?

While we appreciate the reasoning behind the Reviewer’s request to provide more “traditional” analyses, we would argue against their inclusion in this manuscript, for several reasons. First, re-running all analyses and reporting their results would inflate an already voluminous paper, hence diluting the central findings. Second, Cohen and Gulbinaite (2017) already provide an in-principle comparison between the traditional and the RESS approach. Please note that we are not advocating the use of RESS as a superior analysis method for SSVEPs in general. However, in our series of experiments, we expected (and found) topographic differences in regularity-driven SSVEP between experimental conditions; therefore, defining amplitude-based ROIs would be suboptimal (we clarified this point in the revised manuscript, page 17). A traditional, ROI-based approach might produce slightly different findings -- although, we surmise, with no qualitative differences in the pattern of results reported in our manuscript. However, we would be inclined to give the RESS-derived findings more weight because they do not rest on assumptions imposed by ROI/electrode selections. Instead, RESS better adapts to topographic differences between experimental conditions: see, for instance, the scalp maps for upright vs. inverted faces in Fig. 2 or low vs. high expression variability in Figs. 4-5-6. The interested reader is encouraged to reuse and reanalyze our data (available at https://osf.io/uhczc/ under a Creative Commons Attribution 4.0 International Public License) with any approach they are interested in.
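For readers less familiar with RESS, the core computation contrasts the covariance of the data narrowband-filtered at the frequency of interest with the covariance at flanking frequencies, via a generalized eigendecomposition. The Python sketch below is purely illustrative: the function names, Gaussian filter parameterization, flanker spacing, and shrinkage regularization are assumptions made here for the example and do not reproduce the exact pipeline of the manuscript (see Cohen & Gulbinaite, 2017, for the reference implementation).

    import numpy as np
    from scipy.linalg import eigh

    def narrowband(data, srate, peak_hz, fwhm_hz):
        """Gaussian filter in the frequency domain; data is channels x time."""
        n = data.shape[-1]
        hz = np.abs(np.fft.fftfreq(n, d=1.0 / srate))
        sigma = fwhm_hz / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM -> SD
        gauss = np.exp(-0.5 * ((hz - peak_hz) / sigma) ** 2)
        return np.real(np.fft.ifft(np.fft.fft(data, axis=-1) * gauss, axis=-1))

    def ress_filter(data, srate, peak_hz, flank_hz=1.0, fwhm_hz=0.5):
        """Spatial filter maximizing power at peak_hz relative to flankers."""
        cov_s = np.cov(narrowband(data, srate, peak_hz, fwhm_hz))
        cov_r = 0.5 * (np.cov(narrowband(data, srate, peak_hz - flank_hz, fwhm_hz))
                       + np.cov(narrowband(data, srate, peak_hz + flank_hz, fwhm_hz)))
        # mild shrinkage keeps the reference covariance well conditioned
        cov_r += 0.01 * np.mean(np.diag(cov_r)) * np.eye(cov_r.shape[0])
        evals, evecs = eigh(cov_s, cov_r)  # generalized eigendecomposition
        return evecs[:, -1]                # eigenvector with largest eigenvalue

    # usage (hypothetical data): weights = ress_filter(eeg, srate=500.0, peak_hz=2.0)
    #                            component_timecourse = weights @ eeg

Because the filter is estimated from the data themselves, no a priori electrode selection is required, which is the property we relied on given the topographic differences between conditions.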

In a similar vein, we would also refer to the availability of the data for any follow-up exploratory analysis of hemispheric or gender differences. We believe it would not be appropriate to include them in the present manuscript not only for reasons of succinctness, but also because such exploratory questions were outside the scope of our investigation.

3. Discussions and General Discussion. I think that each Discussion section would benefit from the inclusion of a few bibliographical references in support of the main results. This would help to link each section to the experiment that follows.

In the General Discussion, I truly appreciated the Authors’ statement about possible post-hoc interpretations of the results. However, in line with my comment about reorganizing the Introduction, I would suggest anticipating the previous evidence reported here (consistent with the Authors’ findings) by presenting it in the earlier sections of the manuscript (i.e., Introduction and interim Discussions). In this regard, the last paragraphs are dense with data and notions and seem to be written in a different style. Why not use previous studies to support, rather than justify, the (stable) results of the present work?

In line with the Reviewer’s comment, we have added supporting references to the Discussion sections of each experiment. Regarding the General Discussion, as explained above, we prefer to keep the current structure because it highlights the timeline of our research program. Some of the cited studies, pivotal for subsequent design decisions and current interpretation of the results, were not published at the time we conceived and began the research program.

Minor comments:

1. In both the Introduction and General Discussion, the Authors referred to several studies without reporting the name of the principal author. This style of writing simplifies the text but can make it difficult to understand whether the Authors are referring to their own previous work or to that of other research groups. I suggest increasing clarity on this point.

We thank the Reviewer for pointing this out. We now clarify the corresponding parts accordingly throughout the text.

2. In the plots showing CS values (i.e., Figures 2, 4, 5, 6), it would be nice to mark (e.g., with an asterisk) the significant differences between conditions that were found. This would help guide the reader through the results of the different studies. This suggestion also applies to the supplementary figures reporting subjective ratings.

We would like to emphasize that, from a statistical perspective, “significance” is not defined in the present context. We provide rule-of-thumb thresholds for Evidence Ratios (ERs) for ease of readability; however, we encourage readers to appreciate the rich evidential value of continuous measures of Bayesian inference (for an introduction, see Kruschke & Liddell, 2017). For this reason, it would be inconsistent to mark comparisons in the Figures as suggested by the Reviewer. Furthermore, the large number of comparisons would result in many symbols in each figure, consequently impairing their readability.

The Figures are meant to illustrate the distribution of the data, for example by clearly showing no discernible regularity activity in the irregular conditions as well as differences between low and high within-identity emotion variability in upright faces (see Fig 2). Nonetheless, we appreciate the Reviewer’s suggestion to better guide readers through the results of the different studies. Therefore, we now ensure that, in each figure caption, the corresponding table with relevant statistical results is appropriately referenced.
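As background for the preceding point, Evidence Ratios of this kind can be read directly off posterior samples. The sketch below is illustrative only: it assumes the brms-style definition of an ER as the posterior odds in favor of a directional hypothesis, and the variable names are hypothetical.

    import numpy as np

    def evidence_ratio(diff_samples):
        """Posterior odds that a difference is positive, from MCMC samples."""
        x = np.asarray(diff_samples)
        # clip to avoid division by zero when all samples fall on one side
        p = np.clip(np.mean(x > 0), 1.0 / x.size, 1.0 - 1.0 / x.size)
        return p / (1.0 - p)

    # e.g., an ER of 19 means 19:1 posterior odds in favor of "difference > 0"

Read this way, an ER is a continuous summary of evidence rather than a binary verdict, which is why we refrain from marking figures with significance symbols.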

Reviewer #2

The series of experiments performed by Schettino and colleagues sought to investigate the degree to which the visual system can extract emotional information from faces during an RSVP paradigm. Specifically, the question was to determine the ideal 'extraction speed', hypothesizing that angry faces would be preferentially processed and would therefore elicit an enhanced ssVEP relative to neutral faces. Surprisingly, across all four experiments, neutral faces consistently showed enhanced processing as indexed by the ssVEP.

We thank the Reviewer for the careful consideration of our manuscript as well as the positive remarks throughout the review.

The authors nicely explain this neutral face effect with various hypotheses in their discussion. Both hypotheses (superposition effect and isolated facial features) are viable. However, a third option could be that, across identities, the expression of 'angry' is variable, whereas 'neutral' is highly consistent. Although the authors tested for within-identity variability in facial expression (how was dissimilarity of facial expression determined?), the between-identity variability of each emotion is worth discussing. This would naturally apply more to the first pilot study, which used multiple identities, but a neutral face may generally be decoded by the face network system as a reliable, consistent stimulus that, in a stream of highly variable expressions (constantly switching between three emotions), evokes a strong signal because of its greater predictability.

We agree with the Reviewer. As noted on page 23: “We speculated that the high variability of the stimulus material -- nine different facial identities displaying three emotional expressions to various degrees of intensity -- might have been a confounding factor. Specifically, angry expressions may differ more between identities than neutral expressions, and this greater dissimilarity may have led to a less consistent brain response.”. This prompted us to separately display facial identities with low and high within-identity emotion variability and assess the contribution of this factor in modulating the SSVEP signal. Another study reporting SSVEP adaptation invariant to changes in facial expression only when the same identity was presented (Vakli et al., 2014) is now cited (e.g., on page 24).

We took this opportunity to respond to another request of the Reviewer, i.e., to clarify how we determined within-identity emotion variability in our experiments (see page 25).

One more point raised by the reviewer merits attention. It is plausible to hypothesize that regular neutral faces received prioritized attention allocation -- and, therefore, elicited larger SSVEP amplitude -- because they were the only non-emotional stimuli in the RSVP stream. We already briefly mentioned this alternative explanation on page 40: “[...] neutral faces might have been processed more intensely because the visual system expected emotional content given the RSVP context (i.e., mostly emotional expressions)”. We now emphasize this interesting alternative explanation in the General Discussion section, page 46-47.

Overall, the study, with its four experiments, was well designed and described. It is reassuring to see that topographies remain stable across experiments. The irregular control condition was particularly nice, although the authors should be careful in saying that "regularity frequencies were compared", as for the irregular condition there was no regularity frequency by design. This is a minor semantic comment.

We have clarified the wording in line with the Reviewer’s comment. Specifically, on page 20, we now write: “Throughout the main text we report the results of the analysis carried out on the cosine-similarity values at the Fourier coefficients that correspond to the regularity frequency in respective stimulation conditions (also for irregular presentation conditions).”.
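For concreteness, the following Python sketch illustrates one common way to quantify such a measure at a single frequency bin. It is illustrative only: the function name and the exact definition used here (cosine similarity between trial-wise complex Fourier coefficients and their across-trial mean direction) are assumptions for this example and may differ in detail from the index computed in the manuscript.

    import numpy as np

    def cosine_similarity_at(trials, srate, target_hz):
        """Mean cosine similarity between each trial's complex Fourier
        coefficient at target_hz and the across-trial mean coefficient.
        trials: trials x time array from one electrode or component."""
        n = trials.shape[-1]
        hz = np.fft.rfftfreq(n, d=1.0 / srate)
        idx = np.argmin(np.abs(hz - target_hz))   # frequency bin nearest target
        c = np.fft.rfft(trials, axis=-1)[:, idx]  # one complex value per trial
        ref = np.mean(c)                          # mean direction across trials
        # cosine of the angle between each trial vector and the reference,
        # treating complex Fourier coefficients as 2-D vectors
        return np.mean(np.real(c * np.conj(ref)) / (np.abs(c) * np.abs(ref)))

Evaluating such a quantity at the regularity-frequency bin of an irregular condition is well defined even though, by design, no regularity was presented there, which is the point of the clarified wording above.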

I would, however, be interested to see the overall ssVEP signal to the RSVP for the irregular condition. Despite showing that the regularity frequency was not a subharmonic of the main driving frequency, it would be useful to see whether this regularity frequency modified the overall RSVP signal in any way, in a multiplicative or even destructive fashion. Given the growing conversation in the literature on superposition, such an analysis (concerning the amplification of fundamental frequencies by non-physical harmonics, so to speak) could add to this literature.

The Reviewer is invited to browse the Supplementary Materials included with our submission -- specifically, the Stimulation Frequency section -- where we report a full analysis of 15 Hz (pilots) and 6 Hz (experiments) SSVEPs driven by the RSVP. In brief, we observed increased attention allocation during highly variable (upright) face streams, particularly for regular angry and irregular conditions. Thus, there does not seem to be robust evidence that regularity specifically affected (in a multiplicative or destructive way) the SSVEP signal at the stimulation frequency. However, as highlighted in the Supplementary Materials, we advise caution in the interpretation of these findings given their heterogeneous pattern, and hope that future studies will shed light on this issue more systematically.

The interpretation of the emotional differences between Experiments 1 and 2 should be described with more caution, as the task-demand interpretation could go either way (enhanced attention, withdrawn attention) and was not directly tested in this study.

We thank the Reviewer for pointing out this caveat. This information has now been added on page 41.

I would recommend this paper for acceptance with minor revisions. Most of the revisions should be directed at a more streamlined and concise conceptual setup in the Introduction. For example, the reader begins by thinking that the point of the paper is the timing of emotional cue extraction, but then, aside from the two fundamental frequencies (15 and 6 Hz), timing/frequency is not parametrically modulated. Instead, the story evolves into wanting to show that threat should evoke a greater response than neutral. The Introduction reads more like an extended methods section than a conceptual walkthrough, with not enough 'why we should care'. Overall, less conversational language should be used, a tendency particularly pronounced in the Discussion. This work is timely and contributes to the face processing literature, as well as expanding our mechanistic understanding of ssVEP signals and their caveats.

We thank the Reviewer for this observation, which mirrors Reviewer #1’s comments. We have now clarified and emphasized the main hypotheses in the Abstract and Introduction sections.

Attachment

Submitted filename: PONE-D-19-32180_Response_Journal_Reviewers.pdf

Decision Letter 1

José A Hinojosa

6 Apr 2020

Rapid processing of neutral and angry expressions within ongoing facial stimulus streams: Is it all about isolated facial features?

PONE-D-19-32180R1

Dear Dr. Schettino,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

José A Hinojosa, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Consistent with reviewers' suggestions, the authors have made changes in their manuscript that increased the overall readability and understanding of the work. The hypotheses are better framed in the introduction, which led to enhanced consistency through the following sections.

Reviewer #2: The authors have nicely addressed all concerns that I raised and I am happy to see this work published. I particularly applaud the authors on the rigor of their four-experiment setup.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Acceptance letter

José A Hinojosa

9 Apr 2020

PONE-D-19-32180R1

Rapid processing of neutral and angry expressions within ongoing facial stimulus streams: Is it all about isolated facial features?

Dear Dr. Schettino:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. José A Hinojosa

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Supplementary materials.

    (DOCX)

    S1 Fig. Face ratings.

    Face ratings for each emotional expression, separately for each participant (single dots), question (how happy/angry/disgusted is this face?), and experiment. Mean values are marked by horizontal black lines and 95% confidence intervals represented as transparent boxes. Ratings range from 1 (very low emotional intensity) to 9 (very high emotional intensity).

    (TIFF)

    S2 Fig. Reaction times.

    Reaction times (in msec) in response to colored dots during the presentation of each face stream, separately for each participant (single dots), condition, and experiment. Mean values are marked by horizontal black lines and 95% confidence intervals represented as transparent boxes.

    (TIFF)

    S3 Fig. Cosine similarity at stimulation frequency.

    Cosine similarity (CS) calculated at the stimulation frequency for each participant (single dots), condition, and experiment. Mean values are marked by horizontal black lines and 95% confidence intervals represented as transparent boxes.

    (TIFF)

    S1 Table. Valence ratings: Descriptives.

    Ratings of emotional valence (angry, happy, disgusted) of the different image sets, separately for each experiment, experimental condition, and question.

    (XLSX)

    S2 Table. Valence ratings: Analysis.

Statistical comparisons of valence ratings between all pairs of emotional image sets, separately for all experiments and experimental conditions. Mean and 95% HDI refer to the difference in rated valence for the respective comparison. Comparisons showing strong evidence against the hypothesis of no difference are presented in bold.

    (XLSX)

    S3 Table. Reaction times: Descriptives.

    Reaction times to targets presented during trials of the different experimental conditions, separately for each experiment with a behavioral task (i.e., excluding Experiment 2).

    (XLSX)

    S4 Table. Reaction times: Analysis.

    Statistical comparisons of reaction times between all pairs of factor levels, separately for all experiments and experimental conditions.

    (XLSX)

    S5 Table. Cosine similarity at stimulation frequency: Descriptives.

    Statistics of Cosine Similarity Index (CS) of the signals at the stimulation frequency, separately for the different experiments and experimental conditions.

    (XLSX)

    S6 Table. Cosine similarity at stimulation frequency: Analysis.

Statistical comparisons of Cosine Similarity values between all pairs of factor levels, separately for all experiments and experimental conditions. Mean and 95% HDI refer to the difference in Cosine Similarity values for the respective comparison. Comparisons showing strong evidence against the hypothesis of no difference are presented in bold.

    (XLSX)

    Attachment

    Submitted filename: PONE-D-19-32180_Response_Journal_Reviewers.pdf

    Data Availability Statement

    Raw and pre-processed data, materials, and analysis scripts are available at https://osf.io/uhczc/.

