2025 Oct 28;55(3):266–279. doi: 10.1177/03010066251387893

Sequential effects in facial attractiveness judgements: No evidence of stable individual differences

Robin S S Kramer 1, Charlotte Cartledge 1
PMCID: PMC12979639  PMID: 41148247

Abstract

When items are judged in a sequence, evaluation of the current item is biased by the one preceding it. These sequential effects have been found for judgements of facial attractiveness, where studies have often shown an assimilation effect – ratings of the current face are pulled towards the attractiveness of the preceding face. However, the focus has been on the average bias across participants in general, with little consideration of individual differences. Here, we investigated an important first question – are individual differences in how sequential effects bias our judgements stable? Establishing this stability is crucial before considering potential associations between these individual differences in bias and other observer-level traits. To this end, we asked participants to provide attractiveness ratings for two different sequences of faces. In Experiment 1, one sequence comprised neutral, passport-style photos, while the other showed more unconstrained, naturalistic images. In Experiment 2, both sequences were composed of images taken from the same (constrained) photoset. The pattern of results was the same in both experiments, with participants in general showing assimilation in their attractiveness judgements. However, for a given individual, we found no evidence that the strength of this bias was stable across the two sequences that they rated. These findings may be the result of within-person inconsistencies in perceiving facial attractiveness more broadly, and should serve to motivate further investigation of individual differences as applied to the domain of sequential effects.

Keywords: facial attractiveness, sequential effects, serial dependence, individual differences

Introduction

People are rarely perceived in isolation. As a result, the context in which they are viewed may influence our perceptions of them. For instance, judgements of attractiveness are higher when someone appears in a group (rather than alone; e.g., Kramer, Javorková, et al., 2024; Walker & Vul, 2014) or alongside an unattractive person (Kernis & Wheeler, 1981). Further, the temporal context also plays a role in the formation of first impressions. Sequential effects refer to biases caused by the previous item in a sequence when responding to the current item. In other words, the attractiveness of any given face depends, to some degree, on the attractiveness of the face that preceded it. While research has now established the presence of these biases in facial attractiveness judgements (e.g., Kramer et al., 2013; Xia et al., 2016), evidence is mixed as to the direction in which these biases act (e.g., Kondo et al., 2012; Pegors et al., 2015). In addition, since individuals vary in how much their judgements of the current face are influenced by the previous image (e.g., Huang et al., 2018), it is important to investigate whether these between-person differences are consistent/stable across tasks. If they are, we might begin to search for factors that account for these differences. However, few studies have considered this topic to date, and so here, we focus on the question of an individual's consistency in how sequential effects influence their facial attractiveness ratings.

Studies investigating sequential effects in attractiveness judgements have identified biases acting in opposing directions. Assimilation (or serial dependence; Manassi et al., 2023) refers to the current judgement being pulled towards the previous response (e.g., Kok et al., 2017; Kondo et al., 2012, 2013; Kramer et al., 2013; Taubert & Alais, 2016), whereas a contrast effect describes the current judgement being pushed away from the previous response (e.g., Huang et al., 2018; Pegors et al., 2015). Indeed, both types of bias may be present simultaneously while operating through different mechanisms (e.g., Pegors et al., 2015). A bias when responding to the current image may be caused by (1) the perception of the previous image's attractiveness (i.e., a perceptual or stimulus bias) and/or (2) the response given to the previous image (i.e., a response bias). Problematically, distinguishing between these two mechanisms can be difficult using traditional ratings tasks because of the strong correlation between the perceived attractiveness of the previous face and the rating it received. As a result, statistical artefacts due to multicollinearity can lead to uninterpretable analytical models (see Kramer & Jones, 2020).

To better distinguish between perceptual and response biases, researchers have utilised more complex experimental designs. For instance, Pegors et al. (2015) asked participants to alternate the type of judgement given on each trial (attractiveness and hair darkness). Their results demonstrated that the attractiveness rating given to the current face assimilated towards the hair darkness rating given to the previous face (a response bias) while contrasting away from the attractiveness value of the previous face (a perceptual bias). This pattern of results was replicated by Huang et al. (2018), who alternated the presentation of faces and ringtones. Here, no cross-modal contrast effect of the previous stimulus was found – the current face's attractiveness rating was not influenced by the agreeableness of the sound preceding it. However, assimilation due to the previous response remained – responding to the preceding ringtone biased the response given to the current face. Interestingly, the authors also demonstrated an assimilation effect, although weaker, when the previous and current responses were given orally, suggesting that response biases are unlikely to be caused by action repetition alone. This aligns with the findings of Kramer and Jones (Experiment 5; 2020), where relocating the mouse cursor to the centre of a circular scale before each response (and therefore minimising action repetition) failed to prevent assimilation from occurring.

Taking a different approach, Kramer and Pustelnik (2021) attempted to isolate each type of bias. In their first task, faces were presented in pairs, with the first viewed for 3 s (without being rated) before being replaced by the second face, which was rated for attractiveness. As such, a lack of response to the preceding face meant that any perceptual bias (here, there was evidence of a contrast effect) was isolated. In their second task, a rating was collected with no face presented (participants were simply instructed to respond with a specified value on the scale), and this was followed by a face that was rated for attractiveness. Therefore, responding to the current face without one preceding it meant that a response bias (which was absent in their findings) was isolated.

Another method for ruling out a response bias was developed by Xia and colleagues (2016). By requiring participants to rate the attractiveness of a sequence of faces twice, each time presented in a different random order, the researchers were able to use responses taken from the second (independent) run when modelling ratings given during the first run. These independent face ratings were not preceded by the same responses given during the first run, thereby removing the possibility of any response bias influencing the current image's rating during that first run. The findings of the study demonstrated an assimilation effect, where the attractiveness rating given to the current face was pulled towards the perceived attractiveness of the previous face (specifically, the difference in attractiveness between the previous and current faces).

Across all the studies mentioned so far, the focus has been on detecting and quantifying sequential effects across participants on average. In other words, do people in general show an assimilation or contrast effect when rating facial attractiveness? In a recent meta-analysis and review, Manassi and colleagues (2023) highlighted the study of individual differences as an important gap in the literature within the domain of sequential effects. For a given sample, the majority of participants may demonstrate assimilation towards the previous image in their judgements (for example) but there remain some who show no influence or even a contrast effect. However, these between-person differences have received little attention to date. In one study, the size of participant-level sequential effects during a facial identity task showed a small correlation with scores on a test of face recognition (Turbett et al., 2019). Importantly, to our knowledge, researchers have yet to investigate individual differences in relation to sequential biases during judgements of facial attractiveness.

Perceptions of attractiveness may be especially prone to individual differences in biases in comparison with other types of visual judgement. Low-level perceptual tasks tend to involve decisions based solely on the properties of the stimulus presented (e.g., brightness or size), and this may also be the case for the perception of faces when the task focusses on judging identity, for instance (e.g., Turbett et al., 2019). In contrast, facial attractiveness judgements are also strongly dependent on the individual perceiver, who is influenced by their own personal taste (e.g., Hönekopp, 2006; Kramer et al., 2018). Perhaps due to this subjectivity, judgements are often influenced by a variety of additional factors (e.g., presenting a face among others – Walker & Vul, 2014). Indeed, attractiveness judgements also vary for the same observer perceiving the same faces (e.g., Kramer et al., 2018, 2024). As such, given the somewhat unstable nature of these perceptions, we might predict that they would be particularly susceptible to sequential effects. Of course, this does not mean that an individual's bias resulting from the previous stimulus is stable across different sequences, and this provides the motivation for the investigation presented here.

In the current work, we therefore focussed on individual differences in sequential effects when rating facial attractiveness. Before researchers can explore whether such differences may be associated with other observer-level traits, we must first establish the stability of these individual differences. If the magnitude and direction of sequential effects are dictated by the individual then we should be able to detect a degree of consistency in these across similar versions of a task (e.g., the rating of attractiveness for two sequences of faces). However, if these effects are more temporary in nature (e.g., sensitive to the specific instance of the task or overshadowed by noise and other factors) then their magnitude/direction will not be stable across similar versions. Consequently, searching for additional factors associated with such biases would make little sense. While previous work has shown that individual differences in sequential effects in relation to orientation perception were highly stable when measured on two separate occasions (Kondo et al., 2022), this has yet to be considered within the domain of attractiveness judgements (or face perception more broadly). Here, across two experiments, we sought to address this question.

Experiment 1

In this first experiment, we investigated whether people showed consistency in how much their attractiveness ratings were influenced by sequential effects. By asking participants to provide ratings for two sets of face images taken from different databases, we were able to quantify this between-sequence consistency.

Method

Participants

A sample of 200 participants (136 women, 61 men, 3 nonbinary; age M = 29.2 years, SD = 14.9 years; 91% self-reported ethnicity as White) provided written informed consent online before taking part, and received an onscreen debriefing upon completion of the experiment. Participants were recruited by word of mouth (e.g., through asking friends and family, and sharing the experiment's weblink on social media) and were not paid to take part in the experiment. The data from 62 additional participants were excluded after failing to meet the predefined criteria (see below). Both Experiments 1 and 2 received ethical approval from the university's research ethics committee (ID 20105) and were carried out in accordance with the provisions of the World Medical Association Declaration of Helsinki.

The sample sizes for this experiment and Experiment 2 were determined a priori by conducting a power analysis using G*Power 3.1 (Faul et al., 2007). Since our main analysis would be a correlation, a total sample size of at least 138 participants was required to achieve 95% power to detect medium-sized effects at an alpha of .05 (two-tailed). In addition, the use of one-sample t-tests (following the same parameters) would require 54 participants. However, recruitment was allowed to continue until the end of a prespecified period.
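The correlation-based sample size above can be approximated in a few lines. The sketch below uses the standard Fisher r-to-z approximation and assumes the conventional r = .3 for a medium effect; `n_for_correlation` is a hypothetical helper name, and the approximation lands within one participant of the exact G*Power figure reported here (138).

```python
import math
from statistics import NormalDist

def n_for_correlation(r=0.3, alpha=0.05, power=0.95):
    """Approximate N needed to detect a correlation of size r
    (two-tailed) via the Fisher r-to-z approximation. G*Power's
    exact routine can differ from this by about one participant.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical z, two-tailed
    z_b = NormalDist().inv_cdf(power)          # z for desired power
    c = math.atanh(r)                          # Fisher r-to-z of the effect
    return math.ceil(((z_a + z_b) / c) ** 2 + 3)
```

With the parameters used in the paper, the approximation returns 139, one above G*Power's exact value of 138; dropping power to .80 gives roughly 85.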

Stimuli

For the first sequence, we randomly selected 30 images from a larger set of White women featured in the Chicago face database (CFD; Ma et al., 2015). In all cases, the women were displaying neutral facial expressions, with images containing the head and neck, as well as the top of the shoulders (see Figure 1). We chose to limit our set to a single gender and ethnicity to avoid additional influences on sequential effects that result from rating mixed sequences (Kramer et al., 2013).

Figure 1.

The Two Styles of Image Presented in Experiment 1. Left: An image illustrative of those featured in the Chicago face database. Right: An image illustrative of the bridesmaids photoset. (The actual images used in the experiments cannot be reproduced here due to copyright restrictions.)

For the second sequence, we selected 30 images from a larger set featured in previous research which depicted White female bridesmaids collected through an online search engine (Carragher et al., 2021). Each image contained only the head and neck, with the face showing a positive expression (see Figure 1). Since these individuals were originally cropped from group photographs, the 30 images used here were chosen at random from the initial set with the proviso that no more than one image from each group photo was selected to avoid some images appearing more similar to others (e.g., by featuring a common background and dress style).

Procedure

The experiment was completed using the Gorilla online testing platform (Anwyl-Irvine et al., 2020). After consent was obtained, participants provided demographic information. Each participant was then presented with both sequences, twice each, in one of the following orders: ABAB or BABA. Order assignment was counterbalanced across participants.

For each sequence, participants were presented with the 30 face photos in a random order. For each image, participants were asked ‘how attractive is this face?’ and responded using a 7-point scale with labelled anchors (1 = low; 7 = high; see Figure 2). Images remained onscreen until ratings were provided by clicking the corresponding buttons. Responses were self-paced with no time limit. As soon as a rating was given, the next image appeared onscreen, with no intertrial interval.

Figure 2.

An Example Trial During the Task, Featuring an Illustrative Image. (The actual images used in the experiments cannot be reproduced here due to copyright restrictions.)

Between each of the four blocks (two sequences × two repetitions), participants were presented with onscreen instructions where they were informed of their progress (i.e., how many blocks they had completed) and told to click the button provided when they were ready to continue.

Results

Exclusions

We identified and excluded participants who may not have attended to the task – a common concern when collecting online data (Hauser & Schwarz, 2016). First, if participants provided the same response for every image during one or more of the four blocks, their data were excluded (n = 8).

Second, since participants rated each sequence twice, we would expect these two sets of ratings for the same faces to be strongly correlated. As such, for each of the two sequences separately, we determined whether the participant's ratings were statistically different from chance/random responding. Following the approach of Xia and colleagues (2016), we carried out a permutation test. To produce the distribution of correlations expected from random responding, we permuted/shuffled the first set of ratings for a given sequence 1,000 times. For each permutation, we calculated the correlation between the two sets of ratings after this random shuffling of one set. The distribution of these 1,000 correlations was then used to calculate the significance level of the original correlation resulting from the unshuffled ratings (two-tailed; 0.05 threshold). This process was repeated for both sequences for each participant, where a nonsignificant permutation test for one or both sequences resulted in the exclusion of that participant (n = 54). Although relatively high, the proportion of participants excluded in this way (21%) was lower than the proportion of 29% reported by Xia and colleagues (2016).
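The exclusion check above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the authors' code: `permutation_test` is a hypothetical helper that shuffles one set of ratings to build the null distribution of correlations expected from random responding.

```python
import numpy as np

def permutation_test(first_ratings, second_ratings, n_perm=1000, seed=0):
    """Two-tailed permutation test of the correlation between a
    participant's two sets of ratings for the same face sequence.

    Shuffling one set breaks the face-level correspondence, so the
    resulting correlations form the null distribution expected from
    random responding.
    """
    rng = np.random.default_rng(seed)
    first = np.asarray(first_ratings, dtype=float)
    second = np.asarray(second_ratings, dtype=float)
    observed = np.corrcoef(first, second)[0, 1]
    null = np.empty(n_perm)
    for i in range(n_perm):
        null[i] = np.corrcoef(rng.permutation(first), second)[0, 1]
    # Two-tailed p: proportion of null correlations at least as extreme
    p = np.mean(np.abs(null) >= abs(observed))
    return observed, p
```

A participant would be excluded whenever this p-value reached .05 or above for either sequence (i.e., their two sets of ratings did not correlate beyond chance).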

Analytical Strategy

To quantify the influence of the previous image on the current rating, we followed the analytical strategy of Xia and colleagues (2016). For each participant, for each of the two sequences separately, we fitted a multiple linear regression:

resp_i = β0 + β1·idp_i + β2(idp_{i−1} − idp_i)

Here, resp_i is the response given to the ith trial when the sequence was rated for the first time. As such, the preceding trial during that presentation is referred to as i−1. During the second viewing of that same sequence (now presented in a different, random order), the ratings given to these two images (trials i and i−1) were independent of each other (since i−1 no longer appeared before i) and are referred to as idp_i and idp_{i−1} respectively. Finally, β0, β1 and β2 are the coefficients of the model.

To simplify, for a given image, the model predicts the first rating it received (i.e., when the sequence was first presented) using its second rating (i.e., received during its second presentation), along with the difference between the ratings given to it and the preceding image (but with both of these ratings taken from their second presentations). Since we used ratings from the second presentation as predictors in the model, any resulting sequential effects would be independent of a response bias. This is because a response bias is caused by responding to the previous image prior to the current one, while here, we utilised responses from the second presentation, meaning that idp_{i−1} was not given directly before idp_i and, as a result, could not influence it.

In the model, β1 represents how well the second rating of an image predicts the first rating it received – presumably, this coefficient will be positive and large. Importantly, β2 represents the magnitude of any sequential effects. An attractiveness difference between the preceding face and the current one should produce assimilation (if the value is positive) or a contrast effect (if negative).
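Fitting this model for one participant and one sequence can be sketched with ordinary least squares in NumPy. This is an illustrative reconstruction of the regression described above, not the authors' code; `sequential_effect_betas` is a hypothetical helper name.

```python
import numpy as np

def sequential_effect_betas(first_run, second_run):
    """Fit resp_i = b0 + b1*idp_i + b2*(idp_{i-1} - idp_i) for one
    participant and one sequence (after Xia et al., 2016).

    first_run  : ratings in first-presentation trial order (length n)
    second_run : independent second-presentation ratings of the SAME
                 faces, aligned to first-run trial order (second_run[i]
                 is the second rating of the face shown on trial i)
    Returns (b0, b1, b2); b2 > 0 indicates assimilation, b2 < 0 contrast.
    """
    resp = np.asarray(first_run, dtype=float)[1:]   # trials 2..n have a predecessor
    idp = np.asarray(second_run, dtype=float)
    X = np.column_stack([
        np.ones(len(resp)),       # intercept (b0)
        idp[1:],                  # idp_i
        idp[:-1] - idp[1:],       # idp_{i-1} - idp_i
    ])
    coefs, *_ = np.linalg.lstsq(X, resp, rcond=None)
    return tuple(coefs)
```

Each participant contributes one (β1, β2) pair per sequence; the stability question is then whether β2 for sequence A correlates with β2 for sequence B across participants.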

Regression Results

Across our sample of participants, β1 values were large (see Table 1). For both sequences, a one-unit increase in the second rating of attractiveness for a given face predicted an increase of close to 0.8 when that same face was first rated. As noted above, we expected a strong influence of this predictor since we were considering two responses to the same image. However, previous research has identified some within-person inconsistency when participants rated the same face twice (e.g., Kramer et al., 2018; Kramer, Ritchie, et al., 2024), explaining why the coefficient was not closer to a value of one here.

Table 1.

A Summary of the Regression Model Coefficients for Both Experiments.

Experiment  Sequence     β1                 β2
1           CFD Set 1    0.77 [0.73, 0.81]  0.05 [0.03, 0.07]
1           Bridesmaids  0.79 [0.75, 0.83]  0.10 [0.07, 0.12]
2           CFD Set 1    0.72 [0.68, 0.75]  0.05 [0.03, 0.07]
2           CFD Set 2    0.77 [0.73, 0.80]  0.07 [0.05, 0.09]

Note. CFD = Chicago face database. Values are reported as means and 95% confidence intervals.

The β2 values in Table 1 quantified the influence of sequential effects for the two sequences. In both cases, positive values suggested that the response to the current face was pulled towards the attractiveness level of the previous face. One-sample t-tests demonstrated that these values were significantly larger than zero: CFD Set 1 – t(199) = 4.38, p < .001, Cohen's d = 0.31; Bridesmaids – t(199) = 7.86, p < .001, Cohen's d = 0.56. However, they remained small, with a one-unit increase in the difference between the (independent responses to the) current and previous faces predicting an increase of only 0.05 to 0.10 when that current face was first rated for attractiveness. Importantly, these values were comparable with the size of assimilation (β2 = 0.042) reported by Xia and colleagues (2016).

Finally, we correlated participants’ β2 values to determine whether there was evidence of a stable/consistent influence of sequential effects for the same person across the two sequences. We found a small, nonsignificant association: r(198) = .12, p = .106 (see Figure 3).

Figure 3.

The Correlation Between β2 Values for the Two Sequences in Experiment 1.

Discussion

For both sequences, we found evidence of a small assimilation effect in attractiveness ratings resulting from a perceptual bias, in line with previous work (Xia et al., 2016). While this assimilation was present across the sample as a whole, individual differences in the magnitude and direction of any sequential effects were not stable across the two sequences. That is, participants who showed assimilation while rating the CFD images were not more likely to show this effect in their ratings of the bridesmaids’ images.

Both sequences involved rating the facial attractiveness of White women and so presumably incorporated similar cognitive and perceptual demands. However, the images in the two sequences differed somewhat in their style (i.e., standardised neutral photos versus unconstrained photos with positive expressions; see Figure 1). As such, the magnitude and direction of sequential effects evident in participants’ ratings may have been influenced by these characteristics of the image sets, which could have resulted in differing biases for the same participant. We therefore carried out a second experiment to address this possibility.

Experiment 2

The results of Experiment 1 showed no evidence of stable individual differences in the influence of sequential effects when rating two face sequences. However, differences in the image characteristics of the two sequences may have resulted in the absence of this stability. Therefore, in this second experiment, we again presented participants with two different sequences to rate for facial attractiveness. Crucially, both sequences comprised images taken from the same photoset (the CFD).

Method

Participants

A sample of 217 participants (152 women, 63 men, 1 nonbinary, 1 unreported; age M = 30.5 years, SD = 16.2 years; 94% self-reported ethnicity as White) provided written informed consent online before taking part, and received an onscreen debriefing upon completion of the experiment. Participants were recruited using the same approach as in Experiment 1. The data from 45 additional participants were excluded after failing to meet the predefined criteria (see below). There was no overlap between this sample and those who participated in Experiment 1.

Stimuli

For the first sequence, we used the first sequence from Experiment 1 – 30 images of White women taken from the CFD and displaying neutral facial expressions (Ma et al., 2015). For the second sequence, we randomly selected another 30 White women from the CFD, also displaying neutral facial expressions. As such, there was no overlap between the two sequences in terms of the identities featured.

Using the norming data provided alongside the CFD (Ma et al., 2015), we considered the attractiveness values of the faces in these two sequences: Sequence A – M = 3.35, SD = 0.89; Sequence B – M = 3.39, SD = 0.98. An independent samples t-test found no difference in their means, t(58) = 0.17, p = .864, Cohen's d = 0.04, while Levene's test found no difference in their variances, F(1, 58) = 0.46, p = .500.
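The mean comparison above can be reproduced with pooled-variance formulas in a few lines of NumPy. This is an illustrative sketch only (`independent_t_and_d` is a hypothetical helper; Levene's test for variances is omitted here), and the placeholder inputs in the usage below stand in for the CFD norming values, which ship with the database.

```python
import numpy as np

def independent_t_and_d(a, b):
    """Student's t (pooled variance) and Cohen's d for two
    independent groups, with df = len(a) + len(b) - 2."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    # Pooled variance across the two groups
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    sp = np.sqrt(sp2)
    t = (a.mean() - b.mean()) / (sp * np.sqrt(1 / na + 1 / nb))
    d = (a.mean() - b.mean()) / sp   # standardised mean difference
    return t, d
```

For two 30-face sequences, the resulting t would be evaluated against 58 degrees of freedom, matching the test reported above.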

Procedure

The procedure was identical to Experiment 1. Again, each participant was presented with both sequences, twice each, in one of the following orders: ABAB or BABA. As in Experiment 1, order assignment was counterbalanced across participants.

Results

Exclusions

We applied the same exclusion criteria as in Experiment 1. Here, we excluded four participants for providing the same response for every face during one or more of the four blocks. In addition, we excluded 41 participants because their ratings were not significantly different from random responding for one or both sequences.

Were the Two Sequences Perceived Similarly?

Since we aimed to investigate individual differences in sequential effects for two different sequences that were purposely chosen to be highly similar (i.e., two subsets of images taken from the same database), we began by confirming their similarity with respect to participants’ perceptions. First, ratings across all participants and images for the two sequences showed similar distributions (Set 1: M = 3.21, SD = 1.42; Set 2: M = 3.25, SD = 1.45; see Figure 4), mirroring the norming data provided by Ma and colleagues (2015) discussed earlier. Second, by correlating each participant's first and second ratings of the images for a given sequence, we found similarly high levels of within-person agreement for the two sequences (Set 1: M = 0.68, SD = 0.24; Set 2: M = 0.69, SD = 0.25; Fisher's r-to-z transformation and its inverse were applied as necessary). Third, between-person agreement (considering only participants’ first ratings) for the two sequences was also similarly high (Set 1: Cronbach's α = 0.96; Set 2: Cronbach's α = 0.96). Taken together, these descriptives suggested that the two sequences did not meaningfully differ in terms of the perception of attractiveness, although we acknowledge that low-level image properties were not explored (see the General Discussion).
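The two agreement measures above can be sketched compactly in NumPy. These are illustrative helpers (`mean_correlation` and `cronbach_alpha` are hypothetical names): correlations are averaged through Fisher's r-to-z transform and back, and Cronbach's alpha treats each participant as a "rater" of the 30 faces.

```python
import numpy as np

def mean_correlation(rs):
    """Average correlations via Fisher's r-to-z transform and its
    inverse, as applied to the within-person agreement figures."""
    z = np.arctanh(np.asarray(rs, dtype=float))
    return float(np.tanh(z.mean()))

def cronbach_alpha(ratings):
    """Cronbach's alpha for between-person agreement.
    ratings: raters (participants) in rows, faces in columns."""
    x = np.asarray(ratings, dtype=float)
    k = x.shape[0]                           # number of raters
    rater_vars = x.var(axis=1, ddof=1)       # each rater's variance
    total_var = x.sum(axis=0).var(ddof=1)    # variance of summed ratings
    return k / (k - 1) * (1 - rater_vars.sum() / total_var)
```

Perfectly consistent raters (identical rank orderings differing only by an offset) yield an alpha of 1; values near the reported .96 indicate that participants largely agreed on which faces were attractive.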

Figure 4.

The Distribution of Attractiveness Ratings for (a) CFD Set 1 and (b) CFD Set 2, Collected Across All Participants and Images. Frequencies shown here comprise first-viewing ratings only.

Analytical Strategy

We followed the same analytical strategy as in Experiment 1.

Regression Results

Across our sample of participants, β1 values were large (see Table 1) and similar to those found in Experiment 1. Here, the β2 values were again positive and small (see Table 1), but significantly larger than zero: CFD Set 1 – t(216) = 4.88, p < .001, Cohen's d = 0.33; CFD Set 2 – t(216) = 6.45, p < .001, Cohen's d = 0.44. Finally, the correlation between participants’ β2 values was very small and nonsignificant: r(215) = .03, p = .694 (see Figure 5).

Figure 5.

The Correlation Between β2 Values for the Two Sequences in Experiment 2.

Discussion

The results of this experiment mirrored those of Experiment 1. For both sequences, a small assimilation effect was present (Xia et al., 2016). However, individual differences in the magnitude and direction of any sequential effects were not stable across the two sequences despite both of these comprising images from the same photoset.

General Discussion

While numerous studies have investigated the influence of sequential effects on facial attractiveness judgements (e.g., Kondo et al., 2012; Kramer et al., 2013), researchers have yet to consider individual differences and their stability. Here, we carried out two experiments, both of which involved quantifying the magnitude of sequential effects when rating two sequences of faces. Participants, on average, showed an assimilation effect, where the rating given to the current image was pulled towards the perceived attractiveness of the previous face. While this assimilation was present in all sequences for the samples as a whole, individual differences across our participants were not stable when comparing the two sequences that they rated. In other words, demonstrating a strong assimilation effect while rating the first sequence (for example) was not an indicator that such an effect would influence ratings of the second sequence for any given individual.

In Experiment 1, we used sequences of images taken from two different photosets. In the CFD, images were tightly controlled/constrained, while in comparison, the photographs of bridesmaids were naturalistic in their style. Despite these differences, both sequences featured White women only and, as such, it seemed plausible that sequential effects might show some consistency across these two similar sequences requiring the rating of facial attractiveness. However, the lack of stable individual differences evident in our data might be explained by these photoset differences. In the CFD, only the identities varied but other aspects of the images (e.g., clothing, lighting, etc.) were held constant so sequential effects would have been the result of identity changes alone. For the bridesmaids, additional variability in clothing, facial expression, backgrounds and so on likely played a role in attractiveness judgements and, therefore, may have influenced the nature of how each image affected the next. Of course, the purpose of Experiment 2 was to rule out this possibility, and through the use of two sequences comprising images from the same photoset (the CFD), we again found no stability in how sequential effects influenced individuals’ ratings.

We acknowledge, however, that while the two sequences in Experiment 2 did not differ in terms of attractiveness (e.g., using previously collected norming data; Ma et al., 2015), low-level properties were not considered/controlled. Of course, two randomly selected subsets of images from the same photoset are also unlikely to substantially differ regarding low-level characteristics or other measures, and ultimately, any attempt to equate these two sequences for every conceivable dimension may be unachievable while still presenting two different sets of faces. Importantly, if the presence of any stability in individual differences were easily extinguished as a result of only minimal differences between the two sequences (e.g., in brightness or contrast levels) then this again highlights the lack of robust stability in any generalisable sense.

In the current work, we followed closely the design of Xia and colleagues (2016), allowing us to sidestep the complexities of potentially detecting, and therefore trying to differentiate between, perceptual and response biases. By collecting ratings of each sequence twice, we were able to predict ratings given to the current item during the first presentation through the use of ratings taken from the second presentation, thus avoiding the response biases that may have been present in the first run. While this facilitated a clearer understanding of (a lack of) stability in individual differences, we must acknowledge that our conclusions speak only to perceptual biases. We therefore recommend that future work should investigate whether response biases show stable individual differences, although a suitable experimental design is required before this question can be answered.

Here, we chose to limit our stimuli to White women only. Previous research has shown that unconstrained sequences with respect to gender and/or ethnicity led to local changes in sequential effects within the sequence itself (Kondo et al., 2013; Kramer et al., 2013). Since these additional influences were not the focus of the current work, we chose to avoid incorporating more variable sequences of this type. As such, our results may not generalise beyond the more limited sequences used here. However, we have no reason to predict that stability would be more likely for sequences of male or non-White faces. Indeed, for sequences incorporating a mix of genders and ethnicities, we might expect that stability across ratings of different sequences would be lower since this variation in sequential effects locally within each sequence should represent additional noise for any sequence-level measure. Of course, future study might provide further insight into this prediction.

While individual differences represent a much-neglected area of research within the domain of sequential effects (Manassi et al., 2023), one recent study demonstrated a high level of stability in biases with regard to orientation perception (Kondo et al., 2022). Further, Guan and Goettker (2024) found stable individual differences in the strength of sequential effects across two different low-level perceptual tasks (involving colour and orientation), but these were unrelated to a third, oculomotor task (tracking). However, assimilation of the orientation of a Gabor patch to the one preceding it, for instance, is likely to be a somewhat different instantiation of sequential effects processes from the high-level facial attractiveness judgements investigated here. Indeed, even for these low-level perceptions, stability in individual differences was limited to measures generated from presenting stimuli at the same visual field location (Kondo et al., 2022) or to tasks that were both perceptual in nature (Guan & Goettker, 2024). Given that even low-level biases demonstrate stability only under such limited conditions, it may be unsurprising that biases in high-level decision-making showed no evidence of stability in the current work.

Moving beyond sequential effects, individual differences in other biases when interpreting visual stimuli (e.g., apparent motion, structure from motion) appear to show stability when measured over multiple blocks of the same task, but also across different tasks (Wexler et al., 2022). However, the pattern of associations across tasks supports the idea that these individual differences may be stable but ‘local’ in the sense that they were associated for similar perceptual tasks only, for example, for two tasks involving two-dimensional motion but not across two- and three-dimensional motion tasks. Further complicating matters, Wexler and colleagues (2015) provided evidence that these individual differences in visual biases were stable but dynamic. That is, such ‘within observer’ biases were persistent over time while also showing gradual change (well-modelled by a ‘random walk’ approach). Of relevance to the current work, however, we would still expect associations between different measures of the same bias (here, the influence of sequential effects) within a given individual, even if such biases demonstrated this drift or change over time.

How might we explain the current findings when other face perception tasks have demonstrated stable individual differences more generally (e.g., Kramer et al., 2021), as have low-level perceptual tasks regarding sequential effects in particular (e.g., Kondo et al., 2022)? We propose that, for these high-level perceptions of facial attractiveness, the answer may be found in limited within-person consistency. Several studies have shown that associations between two sets of attractiveness ratings of the same image sequence, given by the same individual, were large but far from perfect (e.g., Kramer et al., 2018; Kramer, Ritchie, et al., 2024), and this result was replicated in the current work. Although these findings necessarily incorporated sequential effects, which would have served to lower the reported associations since each presentation of a sequence was in a different random order, the likelihood is that even with identical sequence presentations (i.e., images rated twice and presented in the same order each time), the correlation between ratings for a given individual would be notably lower than one. Therefore, this inconsistency in within-person judgements of attractiveness may serve to overshadow any stability in the influence of sequential effects (which are likely far smaller in magnitude). In contrast, low-level perceptions regarding the orientation of Gabor patches, for example, may be subject to less within-person instability, allowing for stable sequential effects to be detected (Kondo et al., 2022). Of course, this explanation may usefully form the basis for additional studies that could further our understanding of potential differences between high- and low-level perceptions and the relative strength of sequential effects.

To conclude, across two experiments, we have demonstrated a lack of stability in individual differences when quantifying the influence of sequential effects on facial attractiveness judgements. While our samples in general showed an assimilation effect, where ratings were pulled towards the attractiveness of the preceding face, we found no evidence of consistent within-person biases for ratings of two separate image sequences. To our knowledge, this study represents the first investigation of individual differences in reference to sequential effects while judging social traits from faces, and we strongly encourage further investigation within this domain.

Acknowledgments

The authors thank our Research Skills IV students for collecting the data.

1.

Note that we fitted a regression model separately for each participant’s data. In contrast, Xia and colleagues (2016) pooled the ratings from all participants before fitting a single model.
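As an illustration only, the per-participant analysis described in this note might be sketched as follows. This is a minimal sketch, not the authors' analysis code: the function and variable names are our own, and we assume a simple linear model in which a participant's first-run rating of each face is predicted from that face's second-run rating (the 'bias-free' estimate) and the second-run rating of the immediately preceding face. A positive coefficient on the preceding face indicates assimilation.

```python
import numpy as np

def sequential_bias(first_run, second_run, order):
    """Estimate one participant's sequential-effect slope (hypothetical sketch).

    first_run  : ratings from the first presentation, in presentation order.
    second_run : ratings of the same faces from the second presentation,
                 indexed by face identity (used as bias-free predictors).
    order      : face identity shown at each position of the first run.
    """
    # Predictors: the current face's own rating (from the second run) and
    # the rating of the immediately preceding face (also from the second run).
    current_attr = second_run[order[1:]]
    previous_attr = second_run[order[:-1]]
    y = first_run[1:]
    X = np.column_stack([np.ones_like(y), current_attr, previous_attr])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Positive slope on the preceding face implies assimilation;
    # negative would imply contrast.
    return coefs[2]
```

Fitting this model once per participant and per sequence yields one bias estimate per participant per sequence; stability would then be assessed by correlating these estimates across the two sequences.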

Footnotes

ORCID iD: Robin S. S. Kramer https://orcid.org/0000-0001-8339-8832

Author Contribution(s): Robin S. S. Kramer: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Project administration; Visualization; Writing – original draft; Writing – review & editing.

Charlotte Cartledge: Conceptualization; Investigation; Methodology; Writing – review & editing.

Funding: The authors received no financial support for the research, authorship, and/or publication of this article.

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability: The data supporting both experiments are publicly available at: https://osf.io/843hk/

References

  1. Anwyl-Irvine A. L., Massonnié J., Flitton A., Kirkham N., Evershed J. K. (2020). Gorilla in our midst: An online behavioral experiment builder. Behavior Research Methods, 52, 388–407. 10.3758/s13428-019-01237-x
  2. Carragher D. J., Thomas N. A., Nicholls M. E. (2021). The dissociable influence of social context on judgements of facial attractiveness and trustworthiness. British Journal of Psychology, 112(4), 902–933. 10.1111/bjop.12501
  3. Faul F., Erdfelder E., Lang A. G., Buchner A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191. 10.3758/BF03193146
  4. Guan S., Goettker A. (2024). Individual differences reveal similarities in serial dependence effects across perceptual tasks, but not to oculomotor tasks. Journal of Vision, 24(12), 1–14. 10.1167/jov.24.12.2
  5. Hauser D. J., Schwarz N. (2016). Attentive Turkers: MTurk participants perform better on online attention checks than do subject pool participants. Behavior Research Methods, 48(1), 400–407. 10.3758/s13428-015-0578-z
  6. Hönekopp J. (2006). Once more: Is beauty in the eye of the beholder? Relative contributions of private and shared taste to judgments of facial attractiveness. Journal of Experimental Psychology: Human Perception and Performance, 32(2), 199–209. 10.1037/0096-1523.32.2.199
  7. Huang J., He X., Ma X., Ren Y., Zhao T., Zeng X., Li H., Chen Y. (2018). Sequential biases on subjective judgments: Evidence from face attractiveness and ringtone agreeableness judgment. PLOS ONE, 13(6), e0198723. 10.1371/journal.pone.0198723
  8. Kernis M. H., Wheeler L. (1981). Beautiful friends and ugly strangers: Radiation and contrast effects in perceptions of same-sex pairs. Personality and Social Psychology Bulletin, 7(4), 617–620. 10.1177/014616728174017
  9. Kok R., Taubert J., Van der Burg E., Rhodes G., Alais D. (2017). Face familiarity promotes stable identity recognition: Exploring face perception using serial dependence. Royal Society Open Science, 4(3), 160685. 10.1098/rsos.160685
  10. Kondo A., Murai Y., Whitney D. (2022). The test-retest reliability and spatial tuning of serial dependence in orientation perception. Journal of Vision, 22(4), 1–11. 10.1167/jov.22.4.5
  11. Kondo A., Takahashi K., Watanabe K. (2012). Sequential effects in face-attractiveness judgments. Perception, 41(1), 43–49. 10.1068/p7116
  12. Kondo A., Takahashi K., Watanabe K. (2013). Influence of gender membership on sequential decisions of face attractiveness. Attention, Perception, & Psychophysics, 75(7), 1347–1352. 10.3758/s13414-013-0533-y
  13. Kramer R. S. S., Mileva M., Ritchie K. L. (2018). Inter-rater agreement in trait judgements from faces. PLOS ONE, 13(8), e0202655. 10.1371/journal.pone.0202655
  14. Kramer R. S. S., Javorková N., Jones A. L. (2024). No influence of face familiarity on the cheerleader effect. Visual Cognition, 32(3), 181–191. 10.1080/13506285.2024.2405700
  15. Kramer R. S. S., Jones A. L. (2020). Sequential effects in facial attractiveness judgments using cross-classified models: Investigating perceptual and response biases. Journal of Experimental Psychology: Human Perception and Performance, 46(12), 1476–1489. 10.1037/xhp0000869
  16. Kramer R. S. S., Jones A. L., Gous G. (2021). Individual differences in face and voice matching abilities: The relationship between accuracy and consistency. Applied Cognitive Psychology, 35(1), 192–202. 10.1002/acp.3754
  17. Kramer R. S. S., Jones A. L., Sharma D. (2013). Sequential effects in judgements of attractiveness: The influences of face race and sex. PLOS ONE, 8(12), e82226. 10.1371/journal.pone.0082226
  18. Kramer R. S. S., Pustelnik L. R. (2021). Sequential effects in facial attractiveness judgments: Separating perceptual and response biases. Visual Cognition, 29(10), 679–688. 10.1080/13506285.2021.1995558
  19. Kramer R. S. S., Ritchie K. L., Flack T. R., Mireku M. O., Jones A. L. (2024). The psychometrics of rating facial attractiveness using different response scales. Perception, 53(9), 645–660. 10.1177/03010066241256221
  20. Ma D. S., Correll J., Wittenbrink B. (2015). The Chicago face database: A free stimulus set of faces and norming data. Behavior Research Methods, 47, 1122–1135. 10.3758/s13428-014-0532-5
  21. Manassi M., Murai Y., Whitney D. (2023). Serial dependence in visual perception: A meta-analysis and review. Journal of Vision, 23(8), 1–29. 10.1167/jov.23.8.18
  22. Pegors T. K., Mattar M. G., Bryan P. B., Epstein R. A. (2015). Simultaneous perceptual and response biases on sequential face attractiveness judgments. Journal of Experimental Psychology: General, 144(3), 664–673. 10.1037/xge0000069
  23. Taubert J., Alais D. (2016). Serial dependence in face attractiveness judgements tolerates rotations around the yaw axis but not the roll axis. Visual Cognition, 24(2), 103–114. 10.1080/13506285.2016.1196803
  24. Turbett K., Palermo R., Bell J., Burton J., Jeffery L. (2019). Individual differences in serial dependence of facial identity are associated with face recognition abilities. Scientific Reports, 9, 18020. 10.1038/s41598-019-53282-3
  25. Walker D., Vul E. (2014). Hierarchical encoding makes individuals in a group seem more attractive. Psychological Science, 25(1), 230–235. 10.1177/0956797613497969
  26. Wexler M., Duyck M., Mamassian P. (2015). Persistent states in vision break universality and time invariance. Proceedings of the National Academy of Sciences, 112(48), 14990–14995. 10.1073/pnas.1508847112
  27. Wexler M., Mamassian P., Schütz A. C. (2022). Structure of visual biases revealed by individual differences. Vision Research, 195, 108014. 10.1016/j.visres.2022.108014
  28. Xia Y., Leib A. Y., Whitney D. (2016). Serial dependence in the perception of attractiveness. Journal of Vision, 16(15), 28–28. 10.1167/16.15.28

Articles from Perception are provided here courtesy of SAGE Publications