Philosophical Transactions of the Royal Society B: Biological Sciences. 2023 Dec 18;379(1895):20220414. doi: 10.1098/rstb.2022.0414

Modelling individual aesthetic judgements over time

Aenne A Brielmann 1,3, Max Berentelg 1, Peter Dayan 1,2
PMCID: PMC10725758  PMID: 38104603

Abstract

Listening to music, watching a sunset—many sensory experiences are valuable to us, to a degree that differs significantly between individuals, and within an individual over time. We have theorized (Brielmann & Dayan 2022 Psychol. Rev. 129, 1319–1337 (doi:10.1037/rev0000337)) that these idiosyncratic values derive from the task of using experiences to tune the sensory-cognitive system to current and likely future input. We tested the theory using participants’ (n = 59) ratings of a set of dog images (n = 55) created using the NeuralCrossbreed morphing algorithm. A full realization of our model that uses feature representations extracted from image-recognizing deep neural nets (e.g. VGG-16) is able to capture liking judgements on a trial-by-trial basis (median r = 0.65), outperforming predictions based on population averages (median r = 0.01). Furthermore, the model’s learning component allows it to explain image-sequence-dependent rating changes, capturing on average 17% more variance in the ratings for the true trial order than for simulated random trial orders. This validation of our theory is the first step towards a comprehensive treatment of individual differences in evaluation.

This article is part of the theme issue ‘Art, aesthetics and predictive processing: theoretical and empirical perspectives’.

Keywords: aesthetics, individual differences, temporal dynamics, value

1. Introduction

Where do you want to live? With whom? What do you want to eat for dinner? Numerous decisions, big and small, co-depend on the options’ sensory appeal—whether you like their looks, sound, taste or feel. Still, we know little about how we come to like certain objects and how their appeal influences decisions. In particular, we struggle to understand why preferences differ so radically between people and how they change over time for a single person. Here, we tackle the challenge of understanding both these aspects of individual aesthetic preferences. We do this in the framework of a recent theory and its realization in a simple model that allows us to make quantitative predictions about the aesthetic value an individual associates with a given object at a particular point in time. We fit and validate the model using newly collected data.

(a) Individual sensory value judgements across time

Historically, the majority of studies on sensory valuation have been concerned with identifying object features that predict average aesthetic value judgements across participants. This endeavour has proven moderately successful, e.g. in identifying object characteristics, such as symmetry, prototypicality and roundness, that are associated with higher liking or beauty ratings (see also [1]).

However, scholars increasingly acknowledge the importance of individual differences in people’s evaluation of sensory objects (e.g. [2–7]). Indeed, systematic assessments of the relative contribution of individual differences (sometimes also referred to as individual or idiosyncratic taste) have revealed that at least 50%, and up to 92%, of the variance in sensory value judgements stems from differences between individual observers [7,8].

Individual differences have also been investigated by looking at distinct sub-populations based on cultural background and other category-specific kinds of expertise (e.g. [9–11]). These studies have shown that experts often give overall higher value judgements to objects that fall into their area of expertise [9] and have sometimes demonstrated that experts’ judgements relate more strongly to different object features than novices’ do [10].

Only a few studies have explicitly looked at differences between single individuals. One of them is by Clemente et al. [3], who showed that the extent to which, and even the direction in which, object properties like symmetry and complexity influence liking ratings varies vastly between individuals. They also showed that these idiosyncratic relationships between object properties and sensory evaluations are modality specific.

One factor that might have complicated attempts to predict individual evaluations of sensory objects in the past is the influence of sequence effects in trial-by-trial data [12]. The common practice of randomizing stimulus order aims to average out these effects on the population level. When accounting for a single individual’s responses, however, sequence effects cannot be ignored.

Various aspects of the dynamics of aesthetic judgements have been documented. For instance, subjectively experienced pleasure increases for a given object during the first 1 or 2 s of presentation [1,13,14]. Furthermore, both the preceding response and the pleasure initially elicited by the preceding image influence the current pleasure response: the assimilation effect describes the phenomenon that, under certain circumstances, people’s ratings of a given stimulus are biased towards the rating they gave to the previous stimulus. By contrast, the contrast effect describes the opposite phenomenon, i.e. that, under certain circumstances, people’s ratings of a given stimulus are biased away from the rating they gave to the preceding stimulus [12,15]. Of note, the occurrence of contrast effects [16], and sometimes even of assimilation effects [15], seems to depend on the degree of similarity between objects, i.e. they are only present in homogeneous stimulus sets.

(b) Missing mechanistic foundations

Mechanistic accounts of these various subject-specific and temporal idiosyncrasies are broadly absent. However, there has been some work trying to unearth relevant factors, at least in the static case (e.g. [9–11]). There is no work we know of that encompasses the temporal effects.

One partial approach to static evaluation has been to look at differences between the averages of various groups. However, differences based on group means can only explain one portion of individual differences. Closer to individual differences are studies showing that general music-reward sensitivity is linked to stronger connections between auditory processing areas and the reward circuitry [17]. However, these findings only speak to differences in people’s general ability to find music rewarding, not to differences in what kind of music people like to listen to.

Our work owes most to a recent study by Iigaya et al. [18] who used linear regression to fit individual (n = 7) participants’ liking ratings of art images on the basis of feature values derived from deep neural nets. This work shows the power of an appropriate feature space, but does not suggest a rational basis for the individual differences it mechanistically describes, and also lacks a way to account for temporal and sequence effects. Here, we wish to go one step further and use similar feature spaces to fit and predict individual ratings based on a model that aims to capture the mechanisms underlying differences between individual people’s sensory value judgements. In addition, we also aim to explain and predict a second source of variability in sensory value judgements: the temporal sequence in which people experience the objects that they evaluate.

(c) A computational theory of aesthetic value

We propose that our recent computational theory and model of aesthetic value [19] can help us understand the nature and origin of both static and time-varying evaluations of individuals. To be clear from the outset: we do not consider aesthetic value to be limited to works of art or ascribe special meaning to the term ‘aesthetic value’ but use it as a synonym for the value ascribed to sensory experiences. Thus, we consider aesthetic value to be the basis of a variety of aesthetic evaluations, such as (dis-)liking, (dis-)pleasure or beauty, as well as of aesthetic preferences and choices. In this sense, every sensory object has a certain aesthetic value that varies according to the observer evaluating it. We first outline the theory, and then indicate its account of these individual differences.

According to the theory, an observer strives to adapt her sensory-cognitive system based on current input, with the goal of processing sensory objects effectively now and in the future. The aesthetic values of sensory objects derive directly from their malign or benign contribution to this goal. The theory proposes that the sensory system of the observer consists of two generative models. One (the conventional one in Helmholtzian, generative-model accounts of perception [20–24]) reflects the observer’s current probability distribution over the features expressed in sensory inputs. We call this the observer’s system state; it determines the effectiveness of the processing of those features when they arise. The other generative model reflects the observer’s belief about the long-term future distribution of sensory feature values. We call this the expected true distribution; it determines which features are likely to need to be well processed in the future. Critically, we presume that the system state adapts relatively quickly to feature values that are currently present, i.e. that it exhibits learning. In the current study, we follow Iigaya et al. [18] in deriving n-dimensional feature values for objects from the activations in a deep neural network trained to recognize images. The geometry of this feature space dictates how learning occasioned by one object impacts judgements of objects that are close to it in feature space.

Within our theoretical framework (for details, see the Methods section), an object’s aesthetic value depends on two interlinked components: (i) the immediate sensory reward—the likelihood of a stimulus given an observer’s system state, and (ii) the value of learning—the change in the average likelihood of expected future stimuli. Intuitively, these two components of aesthetic value correspond to (i) the signal that the current input is easy to process and (ii) the signal that the current experience will help make the processing of future inputs, on average, easier. This arrangement mimics that of standard reinforcement learning (RL) models, which focus on the reward that is expected to arise in the long-term future. Like standard reward-learning models, it combines an immediate reward (i) with the change in expected future reward (ii). The extent to which aesthetic value can emerge from the latter depends on the extent to which that stimulus is unexpected. In that sense, our framework is similar to previous predictive coding frameworks of aesthetic value (e.g. [25–27]). Like Rutledge et al. [28], we consider the weighting between immediate reward and the expected change in future reward to be flexible—in our case as an equivalent of the effect of temporal discounting of the future. Along with other treatments that consider intrinsic rather than extrinsic rewards (e.g. [29,30]), the rewards in our theory are not quantified by measurable object properties (such as calories or monetary value) and are entirely generated within the observer. Furthermore, we focus on the aesthetic value of the sensory stimuli themselves, and do not attempt to operate at the ‘meta’-level at which participants could learn that arbitrary, e.g. aesthetically neutral, cues could gain positive or negative value by predicting the likely occurrence of subsequent stimuli that are themselves either aesthetically pleasing or displeasing.

This theory accounts for differences between individuals in two major ways. First, the states of the generative models presumably differ between people, since they reflect the observers’ past experience, i.e. the statistics of the environments they have occupied. Second, the weighting of the two components of aesthetic value (immediate reward and the value of learning) probably differs between individuals. The former potential source of individual differences links well to evidence for expertise- and culture-related differences in sensory valuation; both change the statistics of one’s past and expected future sensory environment. The latter source of individual differences in our model may link better to previous work on the influence of personality traits on sensory valuation. For example, openness to experience has been linked to higher appreciation of the arts (e.g. [31,32]) and to stronger experiences of awe for nature [33].

Changes in aesthetic value over time are an integral part of our theory, since we presume that learning occurs constantly. Thus, according to the theory, sequence effects are not merely a nuisance explicable by response biases, but are inherent to the nature of valuation itself. This can take on a richly structured form, even for a single, constant object. For instance, learning always has a positive effect on immediate sensory reward, because the mean of the system state shifts towards the feature values of the current object and hence makes them more likely. However, such a shift can have negative effects on the value of learning if the mean of the system state simultaneously shifts away from the mean of the expected true distribution, rendering the learning process detrimental in the (expected) long run. We initially described this negative turn in the value of learning in the context of the inversion of the mere-exposure effect [34] in [19]; we have since examined its broader contributions to our understanding of the experience of boredom [35].

(d) Current study

We designed an initial, but stringent, test of our theory of individual aesthetic judgements over time by asking whether a simple model instantiation can predict individual judgements on a trial-by-trial basis. For careful control, we use a constrained set of morphed images from an ecologically valid category, i.e. dog images. The experiment and data themselves are simple: people see one image at a time and rate how much they like it. What makes our current study special is that we predict individual judgements using a model with a theoretically grounded, interpretable architecture that can further our understanding of how and why such judgements vary across observers and time.

2. Methods

(a) Participants

We recruited 100 participants from the United States (US) via Amazon Mechanical Turk to perform a task involving a rating block, a free-viewing block and then a second rating block. Sample size was determined heuristically, since we had no means of estimating the magnitude of the individual differences that we were interested in here. We considered 100 a sufficient number given an expected rejection rate of 30–40% and considering a standard sample size of 25–50 participants in comparable studies. All participants were at least 18 years old and gave informed consent according to the ethical guidelines of the University of Tübingen.

Out of these participants, 39 were excluded from analyses based on pre-registered exclusion criteria: (i) failing two or more of the five attention checks, (ii) having a viewing time greater than 60 s for any single trial in the viewing block (see below), and (iii) having a median viewing time smaller than 1 s in the viewing block. Two additional participants were excluded owing to a lack of variability (s.d. = 0) in at least one of the two rating blocks. Thus, data from 59 participants were analysed.

The average age of the participants was 44.5 years (s.d. = 10.1); 27 identified as female, 32 as male. Most participants had a graduate degree (n = 23), followed by those with a high school diploma (n = 18) and a college degree (n = 17); one participant had a PhD. About half (n = 30) of the participants currently owned a dog, and most (n = 54) had owned one at some point in their life. None of the participants reported disliking dogs; only one said that they neither liked nor disliked them, while all others either liked (n = 17) or strongly liked (n = 41) dogs. The majority of participants reported that they browsed images online or on social media for 5–30 min per day (n = 42), while seven reported less time and 10 more time than that. Accordingly, most participants also said that they liked to browse images (n = 49), while eight were undecided and only two disliked it. When asked about their experience of the entire experiment, most participants disagreed with the statement that they were bored (n = 40), while a minority agreed (n = 10) and the remainder were undecided.

Participants received $9 for completing the study and up to $1 as additional compensation based on their performance in the attention check.

(b) Stimuli

We used Neural Crossbreed [36] as the morphing algorithm for creating our stimulus set. This neural network is particularly well suited to, and pre-trained on, morphing between dog images. Besides this technical reason, dog images are also (i) common on social media, which highlights their ecological validity, and (ii) enjoyed by social media users, as indicated by the great popularity of dog-specific hashtags (e.g. ‘#dogsofinstagram’ with 274 million posts as of March 2022). We, therefore, assumed that the experiment should be pleasant for most participants [37].

We generated three morphs per image pair, with 0.33/0.66, 0.5/0.5 and 0.66/0.33 ratios between the source images. We chose seven source images from the Stanford Dogs Dataset [38], which is a sub-selection of dog images from ImageNet [39]. The source images and their specific pairings were handpicked to ensure that all morphed dog images were realistic-looking and free of obvious visual artefacts. We ended up with 16 unique morph pairs based on seven different source images. This resulted in a total of 55 unique images for our experiment (7 source images + 16 × 3 morphs). We aimed for approximately 50 images because this allowed us both to collect sufficient data for fitting the model at an individual level and to keep the estimated experiment duration below 30 min. Further details of the stimulus selection are described in the electronic supplementary material, S8. All images are available within the repository for this article at https://github.com/aenneb/test-aesthetic-value-model/tree/main/experiment_code/images/experiment.
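As a quick arithmetic check of the stimulus count (a sketch; the variable names are ours):

```python
# 7 handpicked source images; 16 unique morph pairs, each contributing
# morphs at ratios 0.33/0.66, 0.5/0.5 and 0.66/0.33.
n_sources, n_pairs = 7, 16
ratios = [(0.33, 0.66), (0.5, 0.5), (0.66, 0.33)]
assert n_sources + n_pairs * len(ratios) == 55  # total unique images
```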

An overview of the rating distributions for images and participants as well as intra-class correlation coefficients are provided in the electronic supplementary material, S1.

(c) Procedure

The experiment was conducted as an online study and was hosted on a server of the Max Planck Institute for Biological Cybernetics. It was automatically presented in full-screen mode. The experiment was programmed in JavaScript with jsPsych [40]. The experiment code and images are available at https://github.com/aenneb/test-aesthetic-value-model. Image resolution was fixed to 350 px × 350 px and images were presented in the centre of the screen.

In total, the experiment consisted of three blocks. Here, we report results for the first block, in which participants were asked to rate the images according to how much they liked them on a continuous scale ranging from ‘not at all’ to ‘very much’. We do not report results from the remaining two blocks: the second block involved a different, free-viewing task that recorded participants’ viewing behaviour, and modelling its results goes beyond the scope of this paper. We also excluded data from the second rating block because we reserve these data for later, joint analyses with the data from the free-viewing block, and because of the complexities of learning predicted by our theory.

Responses were recorded on a scale from 0 to 500, which we re-scaled to range from 0 to 1. Before starting the experiment, participants were informed about its structure. They then continued to the instructions for the first block, which explained that they should rate the images according to their own personal liking. It was emphasized that there was no right or wrong answer, and that the potential opinions of others and the content of the image were not relevant. We placed these emphases to parallel the instructions in other, similar rating experiments [8,41], encouraging participants to express their personal evaluations independent of concerns about standard ideals of liked images and independent of the valence of the depicted object (e.g. a Doberman is aggressive and therefore negative). Participants saw the question ‘How much do you like this image?’ below each presented image. They used their mouse to indicate their liking with a slider on a continuous scale from ‘not at all’ to ‘very much’; the start position of the slider was randomized. Images were presented one at a time, and only after moving the slider could participants click the ‘Continue’ button at the bottom of the page to proceed to the next trial. Participants rated each of the 55 unique images once during the rating block, in random order.

In the second experimental block, participants were given 15 min to view the same 55 images, one at a time, for as long as they wanted to. They were informed that they would spend 15 min looking at the images, independent of how long they looked at a single image. They were free to move on to the next image (displayed as a thumbnail preview on the right side of the main image) at any time after 200 ms. During this second block, we also implemented an attention check that was explained to participants beforehand. On five random trials, a red dot appeared on the main image and participants had to click on it within 1 s. They lost 20 cents of their allotted $1 bonus payment if they missed an attention check. The final and third experimental block was identical to the first.

After completing all three experimental blocks, participants answered demographic questions, as well as questions regarding their experience with dogs and their general impression of the experiment, how much fun they had and how bored they felt.

(d) Analyses and software

All analyses were performed with Python 3.8.3 in Spyder 4.1.4. The complete code, raw and processed data are available at https://github.com/aenneb/test-aesthetic-value-model.

Unless otherwise noted, we used the respective functions of the scipy.stats package for standard statistical tests. Linear regressions were run with the statsmodels package. We implemented repeated measures analyses of variance (rmANOVAs) and associated follow-up tests with the pingouin package, using Greenhouse–Geisser corrections in cases where the assumption of sphericity was violated. Post hoc comparisons were run as paired t-tests with Bonferroni correction. All reported correlations are Pearson’s correlations and were calculated with numpy. Comparisons of averages were tested for statistical significance with paired t-tests if Shapiro–Wilk tests indicated normal distributions, and with Wilcoxon signed-rank tests if the normality assumption was violated. We used the scipy.optimize package for fitting our custom models, using the SLSQP method. Principal components analysis (PCA) was run with the sklearn package. Input values were scaled with sklearn’s StandardScaler function before applying PCA.
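For illustration, a minimal sketch of such an rmANOVA with Bonferroni-corrected follow-up tests in pingouin (the input file and column names are hypothetical, not taken from the paper’s repository):

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format table: one median test RMSE per participant
# and model variant.
df = pd.read_csv('median_rmse_long.csv')  # columns: participant, model, rmse

# Repeated-measures ANOVA; with correction=True pingouin applies a
# sphericity correction to the degrees of freedom.
aov = pg.rm_anova(data=df, dv='rmse', within='model',
                  subject='participant', correction=True)

# Bonferroni-corrected paired post hoc comparisons.
posthoc = pg.pairwise_tests(data=df, dv='rmse', within='model',
                            subject='participant', padjust='bonf')
print(aov)
print(posthoc)
```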

(i) Models and fitting procedure

As a baseline model for predicting liking ratings, we used the average rating per image across all participants except the one whose ratings we predicted, i.e. a leave-one-out average (LOO-average). We compared this model to our own, full model, as well as to a specialization of it that does not include any learning component. In one limit, the latter, no-learning variation corresponds to a generalized linear model, given proper normalization of features and weights (see the electronic supplementary material for details). We described the architecture and detailed implementation of our model in a previous paper [19] and will, therefore, only briefly explain its components and relevant parameters. In addition, figure 1 illustrates the full model and how stimuli were represented.

Figure 1. Illustration of the full model and the process of acquiring feature representations for the stimuli. Outer left column: the flow from stimulus image to the low-dimensional representation as a vector of numbers. Remaining figure parts: this vector is then used as the representation of the stimulus in the model (red dot). The system state is illustrated as the blue curves (left probability densities), the expected distribution as the green curve (right densities). The top row represents an arbitrary point t, the bottom row a later one, t + 1, in the case that the same stimulus is shown. The dashed box outlines the components of the model (system state only) that are relevant for the no-learning model. (Online version in colour.)

The model consists of two generative distributions, each represented by an n-dimensional Gaussian. The first one, which we call the system state, X, describes the observer’s current generative model. The log-likelihood of the considered object’s features given this generative model constitutes the immediate sensory reward r(t) of the object. The second generative model, which we call the expected true distribution, pT, represents the observer’s long-term expectations. Learning adapts the mean of the system state X towards the location of the current object in feature space by an amount per unit time that is determined by the learning rate α. The value of learning ΔV(t) is operationalized as the difference over adaptation in the distance between system state X and expected true distribution pT. Aesthetic value, then, is a weighted sum of immediate reward (r(t)) and value of learning (ΔV(t)) plus a constant bias. Immediate reward can be directly related to processing fluency. However, the value of learning is not related to current processing fluency, but is instead an expression of the expected increase in long-term processing fluency as a result of learning. The long-term processing fluency depends on the stimuli that are expected to be seen in the long term—this is quantified by the expected true distribution. The value of learning can, thus, be especially high when the current stimulus is novel and/or surprising as far as the system state is concerned, and therefore triggers a considerable amount of learning, but is accurately reflective of the expected long-term distribution. We denote the weight of immediate reward as wr and the weight of the value of learning as wV.
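In symbols, on our reading of the verbal definitions above (a reconstruction; the exact operationalization in [19] may differ in detail), with $f(s_t)$ the feature vector of the stimulus on trial $t$:

$$ r(t) = \log p_X\big(f(s_t)\big), \qquad \Delta V(t) = \mathbb{E}_{x \sim p_T}\big[\log p_{X_{t+1}}(x)\big] - \mathbb{E}_{x \sim p_T}\big[\log p_{X_t}(x)\big], $$

$$ A(t) = w_r\, r(t) + w_V\, \Delta V(t) + b, \qquad \mu_{X_{t+1}} = \mu_{X_t} + \alpha\,\big(f(s_t) - \mu_{X_t}\big), $$

where $A(t)$ is the predicted aesthetic value, $b$ the constant bias, and the last expression is the learning step on the system-state mean.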

We here confine ourselves to a simplified version of our model in which the covariance matrices of both generative models are spherical. Therefore, our full model has 2n + 6 free parameters: n + 1 parameters that describe the mean and spherical covariance of each of the two generative models, the weight of immediate reward, the weight of learning, the learning rate and the bias parameter.
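To make this concrete, here is a minimal numerical sketch under the assumptions stated above (spherical Gaussians, linear mean adaptation) and on our reading of the value of learning as a change in expected log-likelihood; the variable and function names are ours, not those of the paper’s code:

```python
import numpy as np

def log_lik(x, mu, s2):
    """log N(x; mu, s2*I) for an n-dimensional feature vector x."""
    n = x.size
    return -0.5 * (n * np.log(2 * np.pi * s2) + np.sum((x - mu) ** 2) / s2)

def expected_log_lik(mu_T, s2_T, mu_X, s2_X):
    """E_{x ~ p_T}[log p_X(x)], i.e. the negative Gaussian cross-entropy;
    higher values mean expected future stimuli are processed more fluently."""
    n = mu_T.size
    return -0.5 * (n * np.log(2 * np.pi * s2_X)
                   + (n * s2_T + np.sum((mu_T - mu_X) ** 2)) / s2_X)

def one_trial(x, mu_X, s2_X, mu_T, s2_T, alpha, w_r, w_V, bias):
    """Return the predicted rating for stimulus features x and the
    adapted system-state mean after one learning step."""
    r = log_lik(x, mu_X, s2_X)                    # (i) immediate reward
    mu_X_new = mu_X + alpha * (x - mu_X)          # learning step
    dV = (expected_log_lik(mu_T, s2_T, mu_X_new, s2_X)
          - expected_log_lik(mu_T, s2_T, mu_X, s2_X))  # (ii) value of learning
    return w_r * r + w_V * dV + bias, mu_X_new
```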

We report findings from models that used pre-determined stimulus features. We chose deep neural network (DNN)-derived rather than subjectively rated features so as not to presuppose which properties influence people’s ratings, nor to assume that people (consciously) know which features are relevant for their ratings. Moreover, using DNN features with the particular stimulus set used here let us test whether they are indeed adequate, since we also examined whether fitted feature values would improve model performance (see the electronic supplementary material, S4 for a comparison between fitted and pre-determined stimulus features, as well as S8 for illustrations of the two-dimensional and three-dimensional feature spaces). To assign feature values to each image, we processed each stimulus image with VGG-16 using the keras package. We used the standard version of VGG-16 that was trained to categorize images from the ImageNet database. We then extracted a vector of features from the last max pooling layer of VGG-16. We used VGG-16 because it has previously been employed as the basis for predicting aesthetic value judgements [18] and because it is one of the most widely used and easily accessible pre-trained image-recognition DNNs. In principle, outputs of other DNNs can be used, and we provide another example (ResNet50) in the electronic supplementary material. We subjected the resulting 25 088 VGG-16 features per image to PCA to reduce the dimensionality of the feature space; this procedure is common, reliable and preserves data variability. The complete process of transforming stimulus images into low-dimensional, numerical input for our model is illustrated on the left-hand side of figure 1, and a code sketch follows below. We report results for models in two-dimensional and three-dimensional feature spaces because the number of trials restricts our ability to fit models with more degrees of freedom reliably. Exploratorily, we also fitted four-dimensional models for a smaller number of participants, again replicating our main findings (see the electronic supplementary material, S5).
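A sketch of this feature-extraction pipeline (assuming tensorflow.keras, the standard 224 × 224 VGG-16 input size, and the image directory from the repository linked above):

```python
import glob
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# With include_top=False, the network's output is exactly the last
# max-pooling layer ('block5_pool'): 7 x 7 x 512 = 25 088 values per image.
vgg = VGG16(weights='imagenet', include_top=False)

def extract(paths):
    feats = []
    for p in paths:
        img = image.load_img(p, target_size=(224, 224))  # VGG-16 input size
        x = preprocess_input(np.expand_dims(image.img_to_array(img), 0))
        feats.append(vgg.predict(x).ravel())
    return np.array(feats)

# The 55 experiment images, relative to the repository root.
stimulus_paths = sorted(glob.glob('experiment_code/images/experiment/*'))
F = extract(stimulus_paths)                    # shape (55, 25088)
Z = StandardScaler().fit_transform(F)          # scale features before PCA
coords = PCA(n_components=3).fit_transform(Z)  # 3-D model input
```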

We used cross-validation (CV) for fitting our models to participants’ first liking ratings (n = 55). We ran 16 CV folds in total, leaving out the three morphed images of one image pair in each fold. Thus, we fitted the model to 52 trials, using the root-mean-squared error (training RMSE) as the measure of goodness of fit. Fits were evaluated based on the RMSE for the three held-out trials (test RMSE). In addition, we ran fits for each CV fold 10 times using different random starting points to ensure convergence and to avoid local minima. The best fit across random starting points according to test RMSE was retained and stored for each CV fold.
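The fitting procedure could be sketched as follows (a simplified illustration; `predict` stands in for the sequential model predictions, and `pair_ids` marks which morph pair each trial’s image belongs to, with −1 for source images, which are never held out):

```python
import numpy as np
from scipy.optimize import minimize

def rmse(pred, actual):
    return np.sqrt(np.mean((pred - actual) ** 2))

def fit_participant(coords, ratings, pair_ids, predict, n_params, seed=0):
    """16 CV folds: each fold holds out the three morphs of one pair,
    fits the remaining 52 trials with SLSQP from 10 random starts, and
    keeps the start with the best held-out (test) RMSE."""
    rng = np.random.default_rng(seed)
    folds = []
    for pair in range(16):
        test = pair_ids == pair          # 3 held-out morph trials
        train = ~test                    # 52 training trials
        best_err, best_theta = np.inf, None
        for _ in range(10):              # random restarts
            theta0 = rng.normal(size=n_params)
            res = minimize(
                lambda th: rmse(predict(th, coords[train]), ratings[train]),
                theta0, method='SLSQP')
            err = rmse(predict(res.x, coords[test]), ratings[test])
            if err < best_err:
                best_err, best_theta = err, res.x
        folds.append((best_err, best_theta))
    return folds
```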

Table 1 lists all models that we ran to account for participants’ liking ratings. Note that CV helps prevent models with more free parameters from over-fitting and that all models were fitted to one individual participant at a time.

Table 1.

Overview of the different models used to fit and predict liking ratings. (Models are listed in increasing order of the number of free parameters. RMSE values represent median RMSEs across participants based on median RMSEs across 16 cross-validation folds per participant (see main text for details). Note that RMSE values reported for all but the LOO-average model are based on held-out test trials only. The leftmost column lists the model names; d.f. refers to the number of free parameters; fixed components lists all model parameters that were not fitted for the given model variation, where ‘irrelevant’ indicates that any value assigned to the following parameters would not change model predictions in that variation.)

model                                  d.f.   fixed components         RMSE (mdn)   RMSE (s.d.)
LOO-average                            0      N/A                      0.20         0.08
no-learning model, two-dimensional     5      α; irrelevant: wV, pT    0.19         0.08
no-learning model, three-dimensional   6      α; irrelevant: wV, pT    0.17         0.06
full model, two-dimensional            10     none                     0.17         0.07
full model, three-dimensional          12     none                     0.15         0.06

3. Results

(a) Model fits

As a baseline model (called ‘LOO’) for the ratings of one individual, we take the mean rating across all other participants. The RMSE between LOO’s predicted and actual ratings is then the basis for comparing the goodness of fit of the other models, all of which duly outperform it (figure 2; tables 1 and 2). The RMSE was higher for the LOO-average than for the other models, both on average and for the majority of individual participants (see the electronic supplementary material, S2 for details on individually best-fitting models). Nonetheless, figure 2 also shows that the lower the RMSE for the LOO-average, the lower the RMSE of all remaining models tended to be, too, indicating that even the individually fitted models work best for participants whose rating behaviour is typical of the entire population.

Figure 2. Scatterplots of root mean square errors (RMSEs) comparing all considered models (vertical axes) to errors of predictions based on leave-one-out-averages (horizontal axes). No-learning models are shown on top, full models on the bottom. The three-dimensional models are shown on the left, two-dimensional models on the right. Each cross corresponds to one participant; the red dot represents the average RMSE across participants. The dotted line is the equality line. (Online version in colour.)

Table 2.

Results of post hoc pairwise comparisons between models following the rmANOVA. (BF, Bayes factor.)

model A                        model B                         t        p-corr   BF10           Hedges’ g
LOOavg                         two-dimensional full            5.597    0.000    2.47 × 10⁴     0.634
LOOavg                         two-dimensional no learning     2.062    0.437    1.017          0.221
LOOavg                         three-dimensional full          8.430    0.000    7.788 × 10⁸    0.992
LOOavg                         three-dimensional no learning   6.749    0.000    1.572 × 10⁶    0.776
two-dimensional no learning    two-dimensional full            8.963    0.000    5.53 × 10⁹     0.381
two-dimensional full           three-dimensional full          8.276    0.000    4.421 × 10⁸    0.347
two-dimensional full           three-dimensional no learning   3.252    0.019    15.093         0.131
two-dimensional no learning    three-dimensional full          12.048   0.000    2.913 × 10¹⁴   0.712
two-dimensional no learning    three-dimensional no learning   9.529    0.000    4.343 × 10¹⁰   0.511
three-dimensional no learning  three-dimensional full          9.108    0.000    9.38 × 10⁹     0.222

To clarify further which model performed best overall, we used an rmANOVA with the median RMSE across CV folds per model per participant as input. Each model comprised one repeated measure for each participant. Median RMSEs differed between models, F(4,232) = 47.00, p = 6.73 × 10⁻²⁹, η² = 0.45. Post hoc pairwise t-tests with Bonferroni correction showed that the LOO-average model performed worse than all model variations apart from the two-dimensional no-learning model (see table 2). The three-dimensional models outperformed the two-dimensional models. Crucially, models that include a learning component predicted participants’ ratings better than models without that learning component given the same number of feature dimensions. We additionally checked with model recovery whether the full model truly outperformed the other models. Data simulated using each participant’s full-model parameters were better fitted by the full model than by the no-learning model, t(58) = −4.42, p = 4.459 × 10⁻⁵ for the two-dimensional model, and t(58) = −4.87, p = 9.03 × 10⁻⁶ for the three-dimensional model.

In addition to the population-level approach, we were interested in evaluating model performance at the individual level. We, therefore, repeated the above analyses for individual participants. To do so, we ran one rmANOVA for every participant using the individual RMSEs of each CV fold. Note that we cannot include the LOO-average baseline model in these analyses because we can only compute one RMSE per participant for this model.

We found significant differences in model performance for 9 out of 59 participants. Within this small subset, Bonferroni-corrected post hoc pairwise t-tests revealed meaningful differences between models for all of these participants. The detailed results of these pairwise comparisons are listed in the electronic supplementary material, S3. Even though sparse, the pattern of these individual significant differences mirrors the results we obtain across participants: models that include a learning component perform better than those without, and three-dimensional models perform better than two-dimensional models.

The results of the individual-participant rmANOVAs, along with the distribution of RMSEs illustrated in figure 2, suggest that the quality of model fits could be more variable between participants than between models. Indeed, the minimum correlation between the median RMSEs of any two models (excluding the baseline LOO-average model) across participants, r = 0.95, indicates that when our basic model architecture was able to account for a participant’s rating pattern, it did so with and without learning and in both two-dimensional and three-dimensional feature spaces. The correlations between our models’ and the LOO-average model’s RMSEs were also high, 0.61 ≤ r ≤ 0.66, indicating that our model accounts better for the ratings of those participants whose ratings are more similar to the population average (see also figure 2). Note that these correlations do not take away from the fact that the higher dimensional models that include a learning component were better at predicting liking ratings than the other models; they merely indicate that all models tend to capture the same participants’ rating patterns best.

(b) Can learning capture unique rating changes?

Above, we showed some evidence that a higher dimensional model that includes a learning component accounts best for individual aesthetic judgements. However, the differences between models were small and the correlations between the models’ performances high. We, therefore, sought to check how much the learning component of our model matters by assessing the importance of trial order for the predictive quality of our full model. The full model makes precise predictions about how ratings change over time depending on which images have already been seen and for how long. To test whether the unique influences of stimulus presentation order are present in the data, we shuffled the trial orders randomly 100 times and re-fitted the model parameters for each shuffled order. We used 10 different random starting values for each optimization for each of the 100 shuffled trial orders per participant and selected the predictions of the best-fitting one. Note that we did not employ CV for these re-fits because the procedure was already computationally expensive with 10 (random starting points) × 100 (shuffled trial orders) fits per participant. We calculated the RMSE and the percentage of explained variance for each order and compared these to the RMSE and explained variance for the true order. We report findings for the better-performing three-dimensional version of our model; a sketch of the procedure follows below.
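In outline, the trial-order test looks like this (a sketch; `predict`, `coords` and `ratings` are placeholders for the sequential model and one participant’s data, and the fitting follows the no-CV variant described above):

```python
import numpy as np
from scipy.optimize import minimize

def rmse(pred, actual):
    return np.sqrt(np.mean((pred - actual) ** 2))

def r2(pred, actual):
    return 1 - np.sum((actual - pred) ** 2) / np.sum((actual - actual.mean()) ** 2)

def fit_and_predict(coords, ratings, predict, n_params, n_starts=10, seed=0):
    """Fit all trials in the given order (no CV) from several random
    starts; return the predictions of the best fit."""
    rng = np.random.default_rng(seed)
    best_loss, best_theta = np.inf, None
    for _ in range(n_starts):
        res = minimize(lambda th: rmse(predict(th, coords), ratings),
                       rng.normal(size=n_params), method='SLSQP')
        if res.fun < best_loss:
            best_loss, best_theta = res.fun, res.x
    return predict(best_theta, coords)

def order_test(coords, ratings, predict, n_params, n_shuffles=100, seed=0):
    """Compare explained variance for the true versus shuffled trial orders."""
    rng = np.random.default_rng(seed)
    r2_true = r2(fit_and_predict(coords, ratings, predict, n_params), ratings)
    r2_shuf = []
    for _ in range(n_shuffles):
        order = rng.permutation(len(ratings))
        pred = fit_and_predict(coords[order], ratings[order], predict, n_params)
        r2_shuf.append(r2(pred, ratings[order]))
    return r2_true, float(np.mean(r2_shuf))
```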

Predictions were better for the actual than for a scrambled trial order, p = 2.071 × 10⁻⁵, with an average 17% loss in explained variance when discarding the true trial order. Figure 3 illustrates the consistency of this effect across participants. The difference between true and shuffled trial order was correlated with the quality of the model’s fit (r = 0.50, p = 4.83 × 10⁻⁵ for the two-dimensional case; r = 0.34, p = 0.007 for the three-dimensional case), indicating that the better our model fitted the data, the more crucial the order effects.

Figure 3. Scatterplot of R2 values for model predictions made for the original trial order (horizontal axis) versus R2 values for predictions made based on a shuffled trial order where model parameters were re-fitted to that shuffled trial order (vertical axis). Each dot represents one participant. Lighter, greener dots represent participants with a lower estimated weight of the value of learning wV, smaller dots participants with a smaller estimated learning rate α, and hollow dots participants for which α = 0. Red crosses additionally mark participants for which the difference between the no-learning and full model was significant in the two-dimensional case, and black crosses those for which this was the case for the three-dimensional models. Note that marker appearance depends on parameter values from the best model fit according to RMSE on held-out trials across all CV folds for the true order. (Online version in colour.)

In summary, the specific predictions about how ratings change depending on the observer’s previous experiences accounted for additional variance in participants’ ratings that a model without learning cannot explain.

4. Discussion

We constantly evaluate how much we like the objects around us. These aesthetic judgements can have a profound impact on our behaviour. Past research has made great progress in predicting the average likeability or aesthetic value of an object. However, predicting a given individual’s response to a particular object at a particular point in time has remained a challenge. Here, we show that a computational model of aesthetic value can make such idiosyncratic trial-by-trial predictions.

Our model far outperformed predictions based on a simplistic model that solely relies on population averages, i.e. the LOO-average model. Our finding, together with previous ones (e.g. [42]), underscores the fact that individual people’s evaluations are not well-predicted by population averages. Consequently, the purely population-average-oriented approach that has been favoured in most studies on evaluative judgements of sensory experiences comes at the cost of being effectively unable to make claims about individuals.

Another feature of our model and its predictions is that they are dynamic in nature, i.e. which images have already been seen, and for how long, matters. Our analyses of synthetic data with randomly shuffled trial orders show that these dynamics matter. In that sense, our model is similar to IDyOM [43], which computes the expected probability of tones in a sequence while taking both short-term (within one musical expression) and long-term (based on a large corpus of Western music scores) expectations into account. Indeed, IDyOM and our model share the idea that both current processing fluency and long-term improvements can shape sensory experiences, with IDyOM specializing in the auditory domain (music) and our model in the visual domain.

Our theory also fits comfortably within the original tradition of generative modelling of the sensory environment, which is closely associated with Helmholtz’s notion of unconscious inference, and has recently proliferated in terms of the ‘predictive mind’. The notion that learning in itself constitutes an essential aspect of aesthetic value has been discussed before within the framework of predictive coding (e.g. [26,44]) and has found some empirical substantiation (e.g. [45,46]). Here, we combined these lines of work, which clearly focus on aesthetic valuation, with less specific notions of value in RL. The way in which, in our account, learning is responsible for a critical component of aesthetic value couples it closely to the rich body of theory in conventional notions of RL [47]—and, through the second generative model, namely the expected true distribution, links it to an important form of longer-term prediction.1

The two components that constitute aesthetic value in our theory mirror not only previous reward-learning approaches to modelling happiness [28] but might also be seen as a variation on the theme of distinguishing pleasure and interest in models of aesthetic experiences [48,49]. One might speculate that there is an intuitive correspondence between pleasure and immediate sensory reward, and between interest and the value of learning. However, interest is arguably a more complex construct, with elements of curiosity that operate before a stimulus is actually seen, and engrossment that happens once it is being processed. The initial version of our theory [19] only encompasses the latter; the former would arise from taking the potential benefits from exploration into account, too.

(a) Individual differences and temporal changes in sensory value judgements

Well-documented differences in people’s evaluations of sensory objects (e.g. [2–7]) have so far led to difficulties in predicting more than population averages of such evaluative judgements. Our model succeeded in predicting individual participants’ liking ratings better than estimates based on average ratings. We go one step beyond the approach of Iigaya et al. [18], who predicted individuals’ ratings via linear regression from VGG features, in that we make our predictions based on a theoretically limpid model whose components provide a basis for explaining the source of individual differences. Our model accounts for differences between participants in three interpretable ways: (i) the initial location of the generative models can differ between participants (parameters: means and variances), reflecting the fact that people have different expectations about the statistics of their sensory environments based on their past experiences; (ii) participants can have different learning rates (parameter: α) and thus adjust their expectations more or less quickly; and (iii) participants may weigh the two components of aesthetic value differently—some placing a higher weight on immediate reward, others valuing the reward of learning more highly.

It will be an interesting avenue for future research to relate these various differences in our model’s parameters to other participant characteristics, such as personality, mood or expertise in a given area. One might, for instance, speculate that people who score higher on openness to experience or novelty seeking are better modelled with a higher wV parameter, placing greater weight on the value of learning. In a similar manner, idiosyncratic model parameter values might be related to the idiosyncratic strength and direction of object property effects on people’s appreciation of sensory objects [3]. When working in a pre-defined, set feature space, e.g. only changing the degree of symmetry between objects, the fitted locations of the means of the generative models can give insight into which kinds of expectations and previous experiences give rise to which kinds of preferences in these feature spaces. Someone who grew up in a modern city centre with a rigid, geometrical structure might, for instance, expect the world to be more symmetrical than someone who grew up in the countryside, where nature provides a more asymmetrically patterned environment. This would link our current approach to past research that has investigated how cultural background and other category-specific kinds of expertise influence sensory value judgements (e.g. [9–11]).

The success of our model not only rests on its ability to predict differences between individuals, but also on its ability to predict changes in ratings over time. When we compared model performance in the true (compared to randomly shuffled) trial orders, we were able to account for a higher percentage of rating variability, showing that the precise predictions that our model makes regarding sequential effects matter. Crucially, our model provides a mechanistic explanation for why these sequence effects occur that rests on the idea that the sensory-cognitive system learns and thus adapts to the current sensory input in a continuous manner. It thus goes beyond previously suggested response biases (see [12,15]) because it suggests an explanation for why these biases occur instead of merely documenting when these biases occur or how their strength depends on certain stimulus properties.

Our experimental design was not suited to directly comparing our model’s predictions to the ones that previous accounts of assimilation and contrast effects (see §1a) would make (as in [12,15]). This is because the linear regression approach of previous studies necessitates repeated measures for the stimuli, which we did not collect within the same block in this study. In future work, it would be important to assess the additional contribution of these effects to temporal variation in ratings.

(b) Using deep neural network features to predict sensory value judgements

The final notable feature of the current study is our use of stimulus representations that were based on the outputs of DNNs trained for image recognition. We thereby add to a small, recent line of research which shows that aesthetic ratings can be estimated based on feature representations extracted from DNNs [18,50]. Unlike that work, though, we use a tiny, compressed feature space instead of the potentially hundreds of feature values that can be obtained from the output of a DNN. Accordingly, we were able to fit the data of individual participants because the low number of free parameters allowed us to work with a relatively small dataset. This allowed us to fit and predict ratings for a substantial number of participants because we only require a moderate number of trials per participant to fit our model. In addition, our approach differs from previous ones in that our model goes far beyond linear regression in terms of interpretability. Most notably, employing features that are otherwise used for object categorization is motivated and justified by the computational goal of our model framework, which links aesthetic valuation to the goal of long-term optimization of sensory perception. What is more, our model also offers the opportunity to interpret participants’ expectations about the probability distribution over the features expressed in sensory inputs, both now and in the future, instead of only making statements about current preferences for certain feature values.

Despite this relatively small number of features, we achieve high accuracy (median r = 0.65 between fitted and actual ratings), even compared to a previous study that relied on larger data and feature sets (average r ≈ 0.3 for individual participants in [18]). While part of this discrepancy might be attributed to the more homogeneous stimulus set in our experiment, we believe that our model architecture, as well as the inclusion of sequence information and learning, contributed to the relatively greater success of our model in capturing individual participants’ rating behaviour. Along with using PCA, which finds likely useful directions of maximal variation, the highly restricted class of images was potentially crucial to the success of using such a small feature space. Given that we most often know the set of objects that is relevant for making predictions in a given scenario, we believe that this approach is valid. Future work will have to show whether similarly sparse feature spaces will suffice to model sensory value judgements for more diverse, heterogeneous image sets.

(c) Limitations

The current study represents an important, albeit small, step forward in the bigger endeavour of gaining a comprehensive understanding of how people value sensory experiences. Our experimental design was intentionally simple, a first test for our model to check whether it can account for human behaviour. Owing to this simplicity, we cannot directly relate our findings to certain relevant previous results, such as the dependence of sequence effects on stimulus homogeneity or the effect of previous responses (as opposed to stimuli) on further ones. Having established a baseline for how well our model can capture liking ratings, however, future studies will be able to tackle these further questions.

Even though our model far outperformed the baseline model that was based on average liking ratings, the quality of the fits was not excellent for every single participant (with a minimum R2 = 0.12 for the full three-dimensional model). In addition, the individually fitted models worked best for participants whose rating behaviour was better fitted by the population average too. Reasonably, some participants may have used idiosyncratic rating strategies that are not captured by any model considered here, such as using one part of the rating scale more than another, refusing to give high ratings on principle or binarizing their ratings into two extremes. Future work could explore how our model could be extended with heuristics that capture such biases.

Finally, we are aware that we used a highly constrained stimulus set and that future studies will need to show that our model can also be applied to a wider range of (visual) objects. Given that feature representations can be derived from DNNs pre-trained on object recognition for a vast number of categories, we are optimistic that our approach will generalize. However, the import of the learning component and the strong temporal dynamics it enables might not be as evident with a more heterogeneous stimulus set (see also [15]), so it remains to be determined whether its contributions are discernible when modelling ratings of a more diverse set of images. Possibly, such experiments would need to be longer or deliberately manipulate exposure to provoke rating changes that are large and reliable enough to be detected.

5. Conclusion

We have shown that even relatively simple models can provide excellent fits for individual liking ratings over time. This result is an important step forward in predicting sensory value judgements on a trial-by-trial level and encourages the pursuit of two approaches to understanding people’s sensory value judgements that have so far received little attention. First, we show that it is feasible to model more than average sensory value judgements. By using a computational model with interpretable components, we also provide a means to start understanding the sources of the individual differences that underlie individual deviations from averages. Second, we show that one can use DNN-feature-based representations of objects for predicting sensory value judgements. These representations need not be specific to this evaluative task. They can be derived from feature spaces that have already been established and are based on the task of object recognition.

Acknowledgements

We thank Noémi Éltetö and Andrew Webb for their help with reviewing the analysis code. We also thank Colin Conwell, Surabhi Nath, Edward Vessel and Florian Wickelmaier for helpful comments.

Endnote

1. A third form of immediate prediction, associated with dynamic sensory stimuli, such as music or movies, should also be within the scope of the account.

Ethics

All participants were at least 18 years old and gave informed consent according to the ethical guidelines of the University of Tübingen. The study protocols were approved by the ethics committee of the University of Tübingen.

Data accessibility

The complete code, raw and processed data are available from the GitHub repository: https://github.com/aenneb/test-aesthetic-value-model.

Supplementary material is available online [51].

Declaration of AI use

We have not used AI-assisted technologies in creating this article.

Authors' contributions

A.A.B.: conceptualization, formal analysis, methodology, software, supervision, visualization, writing—original draft, writing—review and editing; M.B.: conceptualization, data curation, investigation, methodology; P.D.: conceptualization, funding acquisition, project administration, resources, supervision, validation, writing—review and editing.

All authors gave final approval for publication and agreed to be held accountable for the work performed therein.

Conflict of interest declaration

We declare we have no competing interests.

Funding

Open access funding provided by the Max Planck Society.

This work was supported by the Alexander von Humboldt foundation and the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—grant no. 461354985.

References

1. Brielmann AA, Pelli DG. 2018. Aesthetics. Curr. Biol. 28, R859-R863. (doi:10.1016/j.cub.2018.06.004)
2. Mas-Herrero E, Marco-Pallares J, Lorenzo-Seva U, Zatorre RJ, Rodriguez-Fornells A. 2013. Individual differences in music reward experiences. Music Percept. 31, 118-138. (doi:10.1525/mp.2013.31.2.118)
3. Clemente A, Pearce MT, Skov M, Nadal M. 2021. Evaluative judgment across domains: liking balance, contour, symmetry and complexity in melodies and visual designs. Brain Cogn. 151, 105729.
4. Corradi G, Chuquichambi EG, Barrada JR, Clemente A, Nadal M. 2020. A new conception of visual aesthetic sensitivity. Br. J. Psychol. 111, 630-658. (doi:10.1111/bjop.12427)
5. Zhan J, Liu M, Garrod OGB, Daube C, Ince RAA, Jack RE, Schyns PG. 2021. Modeling individual preferences reveals that face beauty is not universally perceived across cultures. Curr. Biol. 31, 2243-2252. (doi:10.1016/j.cub.2021.03.013)
6. Martinez JE, Funk F, Todorov A. 2020. Quantifying idiosyncratic and shared contributions to judgment. Behav. Res. Methods 52, 1428-1444. (doi:10.3758/s13428-019-01323-0)
7. Vessel EA, Maurer N, Denker AH, Starr GG. 2018. Stronger shared taste for natural aesthetic domains than for artifacts of human culture. Cognition 179, 121-131. (doi:10.1016/j.cognition.2018.06.009)
8. Brielmann AA, Pelli DG. 2019. Intense beauty requires intense pleasure. Front. Psychol. 10, 2420. (doi:10.3389/fpsyg.2019.02420)
9. van Paasschen J, Bacci F, Melcher DP. 2015. The influence of art expertise and training on emotion and preference ratings for representational and abstract artworks. PLoS ONE 10, e0134241. (doi:10.1371/journal.pone.0134241)
10. Hayn-Leichsenring GU, Vartanian O, Chatterjee A. 2021. The role of expertise in the aesthetic evaluation of mathematical equations. Psychol. Res. 86, 1655-1664. (doi:10.1007/s00426-021-01592-5)
11. Lahdelma I, Eerola T. 2020. Cultural familiarity and musical expertise impact the pleasantness of consonance/dissonance but not its perceived tension. Sci. Rep. 10, 8693. (doi:10.1038/s41598-020-65615-8)
12. Chang S, Kim CY, Cho YS. 2017. Sequential effects in preference decision: prior preference assimilates current preference. PLoS ONE 12, e0182442. (doi:10.1371/journal.pone.0182442)
13. Isik AI, Vessel EA. 2021. From visual perception to aesthetic appeal: brain responses to aesthetically appealing natural landscape movies. Front. Hum. Neurosci. 15, 676032. (doi:10.3389/fnhum.2021.676032)
14. Belfi AM, Vessel EA, Brielmann A, Isik AI, Chatterjee A, Leder H, Pelli DG, Starr GG. 2019. Dynamics of aesthetic experience are reflected in the default-mode network. Neuroimage 188, 584-597. (doi:10.1016/j.neuroimage.2018.12.017)
15. Pombo M, Brielmann AA, Pelli DG. 2023. The intrinsic variance of beauty judgment. Atten. Percept. Psychophys. 85, 1355-1373.
16. Huang J, He X, Ma X, Ren Y, Zhao T, Zeng X, Li H, Chen Y. 2018. Sequential biases on subjective judgments: evidence from face attractiveness and ringtone agreeableness judgment. PLoS ONE 13, e0198723. (doi:10.1371/journal.pone.0198723)
17. Martínez-Molina N, Mas-Herrero E, Rodríguez-Fornells A, Zatorre RJ, Marco-Pallarés J. 2019. White matter microstructure reflects individual differences in music reward sensitivity. J. Neurosci. 39, 5018-5027. (doi:10.1523/JNEUROSCI.2020-18.2019)
18. Iigaya K, Yi S, Wahle IA, Tanwisuth K, O'Doherty JP. 2021. Aesthetic preference for art can be predicted from a mixture of low- and high-level visual features. Nat. Hum. Behav. 5, 743-755. (doi:10.1038/s41562-021-01124-6)
19. Brielmann AA, Dayan P. 2022. A computational model of aesthetic value. Psychol. Rev. 129, 1319-1337. (doi:10.1037/rev0000337)
20. MacKay DM. 1956. Towards an information-flow model of human behaviour. Br. J. Psychol. 47, 30-43. (doi:10.1111/j.2044-8295.1956.tb00559.x)
21. Kawato M, Hayakawa H, Inui T. 1993. A forward-inverse optics model of reciprocal connections between visual cortical areas. Netw.: Comput. Neural Syst. 4, 415. (doi:10.1088/0954-898X_4_4_001)
22. Dayan P, Hinton GE, Neal RM, Zemel RS. 1995. The Helmholtz machine. Neural Comput. 7, 889-904. (doi:10.1162/neco.1995.7.5.889)
23. Rao RP, Ballard DH. 1999. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79-87. (doi:10.1038/4580)
24. Parr T, Pezzulo G, Friston KJ. 2022. Active inference: the free energy principle in mind, brain, and behavior. Cambridge, MA: MIT Press.
25. Gebauer L, Kringelbach ML, Vuust P. 2012. Ever-changing cycles of musical pleasure: the role of dopamine and anticipation. Psychomusicol.: Music Mind Brain 22, 152-167. (doi:10.1037/a0031126)
26. Koelsch S, Vuust P, Friston K. 2019. Predictive processes and the peculiar case of music. Trends Cogn. Sci. 23, 63-77. (doi:10.1016/j.tics.2018.10.006)
27. Van de Cruys S, Wagemans J. 2011. Putting reward in art: a tentative prediction error account of visual art. i-Perception 2, 1035-1062. (doi:10.1068/i0466aap)
28. Rutledge RB, Skandali N, Dayan P, Dolan RJ. 2014. A computational and neural model of momentary subjective well-being. Proc. Natl Acad. Sci. USA 111, 12252-12257. (doi:10.1073/pnas.1407535111)
29. Schmidhuber J. 2010. Formal theory of creativity, fun, and intrinsic motivation (1990–2010). IEEE Trans. Auton. Ment. Dev. 2, 230-247. (doi:10.1109/TAMD.2010.2056368)
30. Gottlieb J, Oudeyer PY. 2018. Towards a neuroscience of active sampling and curiosity. Nat. Rev. Neurosci. 19, 758-770. (doi:10.1038/s41583-018-0078-0)
31. Chamorro-Premuzic T, Reimers S, Hsu A, Ahmetoglu G. 2009. Who art thou? Personality predictors of artistic preferences in a large UK sample: the importance of openness. Br. J. Psychol. 100, 501-516. (doi:10.1348/000712608X366867)
32. McManus IC, Furnham A. 2006. Aesthetic activities and aesthetic attitudes: influences of education, background and personality on interest and involvement in the arts. Br. J. Psychol. 97, 555-587. (doi:10.1348/000712606X101088)
33. Silvia PJ, Fayn K, Nusbaum EC, Beaty RE. 2015. Openness to experience and awe in response to nature and music: personality and profound aesthetic experiences. Psychol. Aesthet. Creat. Arts 9, 376-384. (doi:10.1037/aca0000028)
34. Montoya RM, Horton RS, Vevea JL, Citkowicz M, Lauber EA. 2017. A re-examination of the mere exposure effect: the influence of repeated exposure on recognition, familiarity, and liking. Psychol. Bull. 143, 459-498. (doi:10.1037/bul0000085)
35. Brielmann AA, Berentelg M, Dayan P. 2023. Boredom in aesthetic experiences. Oxford, UK: Taylor & Francis Routledge.
36. Park S, Seo K, Noh J. 2020. Neural crossbreed. ACM Trans. Graph. 39, 1-15. (doi:10.1145/3414685.3417797)
37. Leaver T, Highfield T, Abidin C. 2020. Instagram: visual social media cultures. Cambridge, UK: Polity.
38. Khosla A, Jayadevaprakash N, Yao B, Fei-Fei L. 2011. Novel dataset for fine-grained image categorization. In First Workshop on Fine-Grained Visual Categorization, IEEE Conf. on Computer Vision and Pattern Recognition, 20-25 June, Colorado Springs, CO. Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE).
39. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. 2009. ImageNet: a large-scale hierarchical image database. In IEEE Conf. on Computer Vision and Pattern Recognition, 20-25 June, Miami, FL. Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE).
40. De Leeuw JR. 2015. jsPsych: a JavaScript library for creating behavioral experiments in a Web browser. Behav. Res. Methods 47, 1-12. (doi:10.3758/s13428-014-0458-y)
41. Kurdi B, Lozano S, Banaji MR. 2017. Introducing the open affective standardized image set (OASIS). Behav. Res. Methods 49, 457-470. (doi:10.3758/s13428-016-0715-3)
42. Wallisch P, Whritner JA. 2017. Strikingly low agreement in the appraisal of motion pictures. Projections 11, 102-120. (doi:10.3167/proj.2017.110107)
43. Pearce MT. 2005. The construction and evaluation of statistical models of melodic structure in music perception and composition. PhD thesis, City University London, London, UK.
44. Van de Cruys S, Bervoets J, Moors A. 2021. Preferences need inferences: learning, valuation, and curiosity in aesthetic experience. In The Routledge international handbook of neuroaesthetics (eds M Skov, M Nadal), pp. 475-506. Milton Park, UK: Taylor & Francis.
45. Shany O, Singer N, Gold BP, Jacoby N, Tarrasch R, Hendler T, Granot R. 2019. Surprise-related activation in the nucleus accumbens interacts with music-induced pleasantness. Soc. Cogn. Affect. Neurosci. 14, 459-470. (doi:10.1093/scan/nsz019)
46. Sarasso P, Neppi-Modona M, Rosaia N, Perna P, Barbieri P, Del Fante E, Ricci R, Sacco K, Ronga I. 2022. Nice and easy: mismatch negativity responses reveal a significant correlation between aesthetic appreciation and perceptual learning. J. Exp. Psychol.: Gen. 151, 1433-1445. (doi:10.1037/xge0001149)
47. Sutton RS, Barto AG. 2018. Reinforcement learning: an introduction. Cambridge, MA: MIT Press.
48. Graf LKM, Landwehr JR. 2015. A dual-process perspective on fluency-based aesthetics: the pleasure-interest model of aesthetic liking. Pers. Soc. Psychol. Rev. 19, 395-410. (doi:10.1177/1088868315574978)
49. Cupchik GC, Gebotys RJ. 1990. Interest and pleasure as dimensions of aesthetic response. Empir. Stud. Arts 8, 1-14. (doi:10.2190/L789-TPPY-BD2Q-T7TW)
50. Conwell C, Graham D, Vessel EA. 2021. The perceptual primacy of beauty: deep net features learned for computer vision linearly predict image aesthetics, arousal & valence – but aesthetics above all. PsyArXiv. (doi:10.31234/osf.io/5wg4s)
51. Brielmann AA, Berentelg M, Dayan P. 2023. Modelling individual aesthetic judgements over time. Figshare. (doi:10.6084/m9.figshare.c.6960665)
