Get the Picture? Goodness of Image Organization Contributes to Image Memorability

Lore Goetschalckx; Pieter Moors; Steven Vanmarcke; Johan Wagemans

doi:10.5334/joc.80

2019 Aug 12;2(1):22. doi: 10.5334/joc.80

PMCID: PMC6696787 PMID: 31517240

Abstract

According to Gestalt psychologists, goodness is a crucial variable for image organization. We hypothesized that differences in goodness contribute to variability in image memorability. Building on this, we predicted that two characteristics of good organizations, (i) fast, efficient processing and (ii) robustness against transformations (e.g., shrinking), would be characteristic of memorable images. Two planned studies (Study 1, Study 2) and one follow-up study (Study 3) were conducted to test this. Study 1 operationalized fast processing as accuracy in a rapid-scene categorization task (“categorizability”). Study 2 operationalized robustness against shrinking as reaction time in a thumbnail search task (“shrinkability”). We used 44 real-life scene images from each of 14 semantic categories, taken from a previous memorability study. Each image was assigned a categorizability and shrinkability score. The predicted positive relation between categorizability and memorability was not observed in Study 1. A post-hoc explanation attributed this null result to a masking role of image distinctiveness. Furthermore, memorable images were located faster in the thumbnail search task, as predicted, but Study 2 could not rule out that this was merely a result of their distinctiveness. To elucidate these results, Study 3 quantified the images on distinctiveness and statistically controlled for this variable in a reanalysis of Study 1 and Study 2. When distinctiveness was controlled for, categorizability and memorability did show a significant positive correlation. Moreover, the results also argued against the alternative explanation of the results of Study 2. Taken together, the results support the hypothesis that goodness of organization contributes to image memorability.

Keywords: Memory, Visual perception, Vision

In our modern, digital world, you never have to look very far to encounter hundreds of photographic images. They are everywhere around us. With their predominance increasing, so seems the interest in understanding their properties. For example, researchers have asked what will make an image popular on social media (e.g., McParlane, Moshfeghi, & Jose, 2014), what will make it aesthetically pleasing (e.g., Kong, Shen, Lin, Mech, & Fowlkes, 2016), or where in the image people tend to fixate (e.g., Judd, Ehinger, Durand, & Torralba, 2009).

One interesting image property that has been gaining attention is image memorability, or the likelihood that an image will later be recognized by observers as having been seen before (Isola, Xiao, Parikh, Torralba, & Oliva, 2014). Traditionally, questions about visual memory had been mostly about its capacity or fidelity. Isola et al. (2014) were the first to comprehensively study interstimulus variability in visual memory performance at the image level. Using a repeat-detection memory task, in which participants watch a sequence of images and press a button whenever they see a repeat of a previously shown image, the researchers quantified 2222 images on memorability by computing the proportion of participants who correctly recognized the image upon its repeat. They found high levels of consistency of the memorability scores across participants. The finding has since been replicated with different image sets, such as other scene images (Bylinskii, Isola, Bainbridge, Torralba, & Oliva, 2015), face images (Bainbridge, Isola, & Oliva, 2013), data visualizations (e.g., charts; Borkin et al., 2013), and a huge image set consisting of images of different kinds (LaMem; Khosla, Raju, Torralba, & Oliva, 2015). Moreover, memorability rankings seem to be stable across image contexts (Bylinskii et al., 2015), across time (Goetschalckx, Moors, & Wagemans, 2017; Isola et al., 2014), and across memory paradigms (Goetschalckx et al., 2017). Together, these findings support the concept of memorability as an intrinsic image property.

Interestingly, image memorability does not seem to simply boil down to image popularity or other image properties. Popular images are not necessarily also memorable (Khosla et al., 2015). In a similar vein, memorability is also separate from interestingness (Isola et al., 2014), aesthetics (Isola et al., 2014; Khosla et al., 2015), or an image’s ability to capture attention or cause priming effects (Bainbridge, 2017). Instead, memorability seems to constitute a separate image property, the origins of which remain unclear. Nevertheless, studies have pointed towards the existence of distinct memorability neural signatures (Bainbridge, Dilks, & Oliva, 2017; Khaligh-Razavi, Bainbridge, Pantazis, & Oliva, 2016).

The establishment of the concept of memorability as an intrinsic, separate image property raised the question of how it can be predicted and explained. When it comes to automatically predicting an image’s memorability without the need for human annotations, good results have been achieved using global image descriptors such as GIST (Isola et al., 2014; Oliva & Torralba, 2001). Even better results were obtained with a convolutional neural network (CNN; Khosla et al., 2015). While these techniques are very valuable for automated prediction, they do not readily allow us to understand what exactly makes an image memorable. Admittedly, there are some ways of gaining more insight from them. Khosla et al. (2015), for example, revealed that their CNN particularly predicted high memorability scores for close-ups of humans, faces and objects, and also often for images containing animals or text. Among the more human-understandable image features that have been investigated so far are color features, such as mean hue and mean saturation. They correlated only weakly with memorability (Isola et al., 2014). Similarly, most object statistics, such as object counts and pixel coverage, were not very predictive of memorability either (Isola et al., 2014). However, when the semantic label of the scene and the objects within it was taken into account, a support vector regression trained on labeled object statistics did explain a considerable amount of variability in memorability: ρ = .54. People, interiors, foregrounds, and human-scale objects seemed to contribute positively to memorability, while the opposite was true for exteriors, wide angle vistas, backgrounds and natural scenes (Isola et al., 2014). In a further attempt to better understand memorability, Isola, Xiao, Parikh, Torralba, & Oliva (2011) had Amazon’s Mechanical Turk workers annotate their 2222 images with an extensive list of attributes, such as lot going on, funny, famous place, etc. They used a feature selection scheme to identify attributes playing a role in memorability. Enclosed space, face visible, tells a story, and recognize place were found to be positively related to memorability, whereas peaceful had a negative relation.

Despite the invaluable work reviewed above, the factors driving memorability are not yet fully understood. There is still a fair amount of variability left unexplained. Previous work has mostly focused on either low-level visual features (e.g., mean overall hue) or more abstract, high-level characteristics (e.g., the type of content), except maybe for some of the object statistics studied by Isola et al. (2014), which could, in a way, be considered to be more mid-level. Here, we aimed to contribute to the understanding of image memorability by focusing more on factors at that intermediate level of the visual hierarchy. More specifically, we hypothesized that part of the variability in image memorability resides in the “goodness” of an image’s organization.

It has long been argued by Gestalt psychologists that visual stimuli differ in how well they can be organized. Good organizations are believed to be characterized by, among other things, regularity, symmetry, and simplicity (Koffka, 1935). In a good organization, the constituting parts are combined into a strong, coherent whole, following Gestalt principles (Wertheimer, 1923/2012; for extensive reviews, see Wagemans, Elder, et al., 2012; Wagemans, Feldman, et al., 2012). For example, a dot lattice is considered to constitute a better organization than a random dot pattern because of its larger degree of regularity, and the set of all possible dot lattices can also be ranked in terms of goodness based on their degree of symmetry (Kubovy, 1994; Kubovy & Wagemans, 1995). In his information theoretic approach, Garner (1962) postulated that good patterns are those that have few equivalents. In the popular game “Tetris”, the O-tetromino would be the one with the highest goodness, as it has no equivalents other than itself (i.e., it does not change under the rotation transformation). The I-tetromino scores only a little less, as it only has two equivalents (landscape and portrait), whereas the T-tetromino has four. Garner and Clement (1963) tested his theory with dot patterns and found that participants assigned higher goodness ratings to those patterns for which an independent group of participants had assigned fewer other patterns to its group of equivalents. Other ways to quantify goodness have been proposed by Hochberg and McAlister (1953) and by van der Helm and Leeuwenberg (1996).
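The Tetris illustration of Garner's idea can be checked with a small sketch. It assumes that "equivalents" means the distinct patterns obtained through 90-degree rotations and mirror reflections; the function names and the grid encoding of the pieces are illustrative choices, not taken from Garner (1962).

```python
def normalize(cells):
    """Shift a set of (row, col) cells so its bounding box starts at (0, 0)."""
    min_r = min(r for r, _ in cells)
    min_c = min(c for _, c in cells)
    return frozenset((r - min_r, c - min_c) for r, c in cells)


def rotate(cells):
    """Rotate the pattern by 90 degrees: (r, c) -> (c, -r)."""
    return normalize({(c, -r) for r, c in cells})


def reflect(cells):
    """Mirror the pattern left to right: (r, c) -> (r, -c)."""
    return normalize({(r, -c) for r, c in cells})


def equivalence_set_size(cells):
    """Count distinct patterns reachable by rotation and reflection (assumed transformation set)."""
    variants = set()
    current = normalize(cells)
    for _ in range(4):
        variants.add(current)
        variants.add(reflect(current))
        current = rotate(current)
    return len(variants)


# Hypothetical grid encodings of three tetrominoes.
TETROMINOES = {
    "O": {(0, 0), (0, 1), (1, 0), (1, 1)},
    "I": {(0, 0), (0, 1), (0, 2), (0, 3)},
    "T": {(0, 0), (0, 1), (0, 2), (1, 1)},
}

sizes = {name: equivalence_set_size(cells) for name, cells in TETROMINOES.items()}
```

Under these assumptions the O-, I-, and T-tetromino yield equivalence sets of size one, two, and four, matching the ordering described above: the smaller the set, the better the pattern.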

While the aforementioned work dealt with relatively basic, easy to parametrize stimuli (e.g., dot patterns), goodness is much harder to quantify with richer and more complex visual material (e.g., paintings and photographic images). Yet, Gestalt notions similar to goodness have made their way into that literature as well. For example, according to Arnheim (1954/2004) a visual artwork with a good organization is one that is in visual balance. It is believed that visual balance facilitates the combination of pictorial parts into a coherent, comprehensive whole and thus helps to convey the meaning of the visual display. Arnheim identified a structural skeleton consisting of multiple axes (horizontal, vertical, and diagonal) and nine primary locations (including the center of the frame), which are said to attract pictorial parts towards them. A composition is in visual balance when all the attractive forces, as Arnheim calls them, cancel each other out and everything seems at rest (e.g., parts placed at the center or in symmetric positions to the center). This is opposed to a case in which the pull is stronger in one direction and no such equilibrium is established (e.g., parts predominantly in one part of the picture). This idea is somewhat supported by a study by Abeln et al. (2016), who found that when participants have to crop photographs to make them look as nice as possible, they not only tend to go for high overall saliency, but also for a balance in the distribution of the salient regions. The center of mass for saliency was generally close to the geometrical center of the frame. In addition, Jahanian, Vishwanathan, and Allebach (2015) started from a large set of aesthetically highly rated photographs and tried to model the probability of a high saliency value at each pixel location using a mixture of Gaussians. Their results show hotspots similar to Arnheim’s nine primary locations, and share other characteristics with the structural skeleton as well. Another, very much related, if not synonymous concept, is that of visual rightness (Carpenter & Graham, 1971; Locher, Stappers, & Overbeeke, 1999), which also encompasses the notion that there is a right way to arrange the parts in order to maximize the impact on the viewer and that artists tend to know how. Locher et al. (1999) stress that it is not so much specific, individual locations that can be “right”, but the entire spatial system of interrelations. In an empirical investigation, Locher (2003) experimentally manipulated artworks to make them less well-organized. He then presented participants with both the original works and the manipulations and had them decide which one was more likely the real one (i.e., the original). When the manipulation saliently disrupted the spatial organization, participants identified the real one more often than expected by chance, leading Locher to conclude that visually right compositions are salient to viewers, even when they lack formal training in the visual arts. Finally, the concept of a good Gestalt has also been picked up outside the field of psychology and has become popular in photography and design textbooks (e.g., Freeman, 2015; Macnab, 2011).

Of specific interest to the hypothesis about image memorability and goodness of organization raised above is that goodness of organization has been associated with memory benefits. To quote Attneave, “It has been generally held by Gestalt psychologists that ‘good’ figures are remembered more accurately than ‘poor’ ones” (Attneave, 1955, p. 209). Checkosky and Whitlock (1973), for example, found that better dot patterns, as defined in terms of Garner’s (1962) number of equivalent patterns, are recognized more easily in a recognition memory task (see also Garner, 1974). Using different patterns, Attneave (1955) and Schnore and Partington (1967) found similar results for reproduction memory (see also Garner, 1974). More recently, there has also been a lot of research on the benefits of Gestalt organizational cues for visual working memory (Gao, Gao, Tang, Shui, & Shen, 2016; Peterson & Berryhill, 2013; Woodman, Vecera, & Luck, 2003). Furthermore, Brady, Konkle, and Alvarez (2009) demonstrated that input arrays containing statistical regularities were easier to remember in visual short-term memory, which could be attributed to a compression advantage at the encoding stage. In a more recent modeling study, Brady and Tenenbaum (2013) formalized the role of perceptual organization in a probabilistic model of visual working memory in which higher-order structure was explicitly incorporated.

In addition to memory benefits, goodness of organization has also been said to be characterized by faster, more efficient processing (e.g., Garner, 1974). For example, when participants need to sort cards displaying one of two alternative dot patterns into two piles, they do so faster for better patterns (Clement & Varnadoe, 1967). Better patterns also yield shorter reaction times in a discrete reaction time task using classification (Garner & Sutliff, 1974) and seem to suffer less from backward masking (Bell & Handel, 1976). In all these cases, the effects were attributed to faster processing of better patterns. Bell and Handel (1976) also predicted, but did not test, that better patterns would therefore suffer less from short stimulus presentations compared to poor patterns. In a speeded classification task, however, Pomerantz (1977) found that good patterns were encoded no faster than poor patterns. He explained the discrepancy with the earlier results by attributing the earlier results to response bias in favor of the good pattern or to intercept effects resulting from decision or response selection rather than encoding as such.

Better forms and patterns are also more robust against transformation. For example, Stadler, Stegagno, and Trombini (1987, as cited in Luccio, 1999) showed that good forms do not transform into poor forms as easily as vice versa in a stroboscopic transformation experiment. Wagemans (1992, 1993) showed that it is easier to match dot patterns and polygons with their affine and perspective transformed counterparts (corresponding to presenting the stimuli on differently oriented planes relative to the viewer) if they contain mirror symmetry than if they do not. A similar advantage was also obtained for dot patterns with other types of regularities (e.g., collinearity, parallelism) compared to patterns that appear more random (Kukkonen, Foster, Wood, Wagemans, & Van Gool, 1996; Wagemans, Van Gool, Lamote, & Foster, 2000). Another transformation to which good Gestalts are considered to be more robust is reduction in size. In the world of art directors, it is generally believed that if an image does not speak to observers under minified viewing (i.e., reduced to thumbnail size), it will definitely not work in a magazine either (Koenderink, 2015). Koenderink (2015) proposes that those images that do survive “have ‘something’ that other pictures lack. The ‘something’ evidently has to do with the perceptual organization evoked by them. The images have a Gestalt quality that easily survives reduction to postage stamp size” (Koenderink, 2015, p. 908). Interesting in this regard is a study by Suh, Ling, Bederson, and Jacobs (2003), who proposed a more effective way to generate thumbnails of images than mere shrinking. Their algorithm first crops an image and searches for a cropping window that maximizes the saliency of the result and minimizes its size. The saliency is computed based on both low-level features and semantic information (i.e., face detection). They found that their method, when compared to mere shrinking, increases participants’ performance in identifying the thumbnail content after a presentation of 2 s and results in faster search times when participants need to find a target thumbnail among distractor thumbnails. Although not directly tested, one could reason that those images that are good from the start, such that the algorithm cannot contribute anything beyond mere shrinking, will yield better identification and search times after shrinking than those that were not as good from the start and could have benefitted from the algorithm.

Based on the literature reviewed above, we operationalized our main hypothesis that image memorability (at least partly) relates to goodness of image organization into two more specific hypotheses that are more directly testable. Specifically, the work presented here is based on the idea that goodness of organization is characterized by faster, more efficient processing and larger robustness against transformation. First, we hypothesized that memorable images would be processed faster (or processed more accurately at ultra-rapid stimulus presentation). In Study 1, we tested this hypothesis using a rapid-scene categorization task (for more details, we refer to the introduction of Study 1; Thorpe, Fize, & Marlot, 1996). Our second hypothesis was that memorable images would be more robust against transformation. This hypothesis was addressed in Study 2. More specifically, we studied a shrinking transformation and developed a thumbnail search task to quantify to what extent an image survives shrinking to thumbnail size and can still convey its meaning (for more details, we refer to the introduction of Study 2). We discuss the results of Study 1 and Study 2 together in an interim discussion, before introducing a third study aimed at clarifying some of those results.

Study 1: Categorizability

In the General Introduction, we proposed that part of the variability in image memorability resides in the goodness of an image’s perceptual organization. Fast, efficient processing is a characteristic often ascribed to visual displays of good organization and therefore, we hypothesized that memorable images would be processed faster (or processed more accurately at ultra-rapid stimulus presentation). In Study 1, we tested this hypothesis adopting a rapid-scene categorization task (Thorpe et al., 1996). On each trial, participants were very briefly (33 ms) presented with a scene image, followed by a mask and then a label. They had to judge whether the label matched the scene or not. We asked two questions. First, we asked whether there was consistent interstimulus variability in this task, as was observed for memorability, in the sense that images would differ consistently in their probability of being categorized correctly at this very short stimulus duration. We will refer to this probability as “categorizability”. Second, we asked whether categorizability correlates positively with memorability. To this end, we selected scene images from a set for which memorability scores were already available (Bylinskii et al., 2015).

Methods

Participants

A total of 147 undergraduate psychology students participated in this study in exchange for course credits (125 women, 22 men).1 Ages ranged from 17 to 29 years old (M = 18.45, SD = 1.37). The study was approved by the Ethical Committee of the Division of Humanities and Social Sciences, KU Leuven, Belgium. All participants gave written informed consent prior to the start of the study.

Stimuli

Images

We selected 14 of the 21 scene categories in the FIGRIM-dataset (Bylinskii et al., 2015), in such a way that the final selection contained equal numbers of indoor and outdoor categories (see Figure 1). We specifically avoided including image categories that are too similar (e.g., pasture and golf course). The original English category labels were translated to Dutch. All but one category received a Dutch label that paralleled the English label. The only exception was the scene category house, which was more loosely translated as gevel van een huis (façade of a house). This was to avoid confusion with certain indoor categories (e.g., bedroom, which is part of a house). For each category, we then randomly selected 44 images after excluding those we deemed unsuitable for the current purposes. Reasons for exclusion were: not having been scored on memorability, having an ambiguous category membership (e.g., an image of a pasture on a mountain could belong to either pasture or mountain), and containing text that is not part of the scene (e.g., the date the image was captured). An additional two images were selected for each category: one of these was used for the familiarization procedure (i.e., example images), the other for the practice trials (see Procedure). All the above selections were carried out once, such that all participants received the same stimulus set with the same assignment to familiarization phase, practice phase, and main task. The IDs of the selected images, together with the category label (both original and translated) are available for download (see the Data Accessibility Statement).

Masks

With the exception of the example images, we generated a colored mask for each of the selected images. This was achieved by adding random deviations to the phase spectrum of each image in the Fourier domain, while preserving the original amplitude spectrum (Hansen & Loschky, 2013; Vanmarcke, Noens, Steyaert, & Wagemans, 2017). For each given point of an image, the random deviation was constant across the three color dimensions (RGB).
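The mask construction can be illustrated with a minimal NumPy sketch. It is an assumption-laden illustration, not the authors' code: the function name is hypothetical, the random deviations are taken to be the phase spectrum of a white-noise image (which keeps the result real-valued), and any rescaling for display is left out.

```python
import numpy as np


def phase_scrambled_mask(image, rng):
    """Perturb the phase spectrum of an RGB image while keeping its amplitude spectrum.

    image: array of shape (height, width, 3); rng: a numpy random Generator.
    Returns an unclipped float array of the same shape.
    """
    image = np.asarray(image, dtype=float)
    height, width, _ = image.shape
    # Assumption: draw the deviations as the phase of the FFT of real-valued noise.
    # This phase field is conjugate-symmetric, so the inverse FFT stays real.
    noise_phase = np.angle(np.fft.fft2(rng.random((height, width))))
    mask = np.empty_like(image)
    for channel in range(3):
        spectrum = np.fft.fft2(image[:, :, channel])
        amplitude = np.abs(spectrum)
        # The same deviation is added for R, G, and B at every frequency.
        phase = np.angle(spectrum) + noise_phase
        scrambled = amplitude * np.exp(1j * phase)
        mask[:, :, channel] = np.real(np.fft.ifft2(scrambled))
    return mask


# Usage with a random stand-in for a scene image (illustrative data only).
demo_image = np.random.default_rng(0).random((16, 16, 3))
demo_mask = phase_scrambled_mask(demo_image, np.random.default_rng(1))
```

Sharing one deviation field across the color channels shifts all three by the same phase offsets, so the relative alignment of the channels is retained while the spatial structure of the scene is destroyed.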

Task and Procedure

Participants were invited to a computer lab of the university, where they were seated individually in front of a computer. All computers had a 21.5-inch TFT-monitor with a resolution of 1920 × 1080 px and a refresh rate of 60 Hz. The study started out with a short familiarization procedure. During this procedure, an example image of each category was shown on the screen, along with the corresponding category label. The presentation was self-paced and in a random order. The familiarization procedure was intended to give participants an idea of which scene categories would be involved in the actual rapid-scene categorization task.

Each trial of this rapid-scene categorization task (see Figure 2) consisted of the consecutive presentations of a fixation dot (500 ms), the target image (33 ms) and its corresponding mask (83 ms). Both the target image and the mask were presented at a size of 512 × 512 px. The mask was followed by a brief interval of 33 ms in which only the grey background was presented. Finally, a scene label appeared on the screen. The instructions were to indicate whether the label matched the scene, ‘yes’ (J-key) or ‘no’ (F-key). There was a response limit of 3 s. Upon the registration of a response, the label disappeared and the screen remained blank for the remainder of the 3-s response interval (before the onset of the next experimental trial). When participants failed to respond within this 3-s response interval, the response was regarded as incorrect and the next trial was initiated. Importantly, a random half of the target images of a given category were presented with the congruent category label (e.g., a random half of the bathroom images were followed by the label ‘Bathroom?’). The remaining half of the target images were presented with an incongruent category label (e.g., images of a bathroom followed by the label ‘Kitchen?’). The incongruent labels were randomly selected from the same superordinate category (i.e., indoor or outdoor). This resulted in a total of 616 trials per participant, of which 308 (22 × 14) were congruent (requiring a ‘yes’ response of the participant) and 308 (22 × 14) were incongruent (requiring a ‘no’ response of the participant). The 14 target categories were presented randomly across trials. The order of the trials was also fully randomized over categories. All the aforementioned randomizations were performed separately for each participant.

Schematic of the rapid-scene categorization task.
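The label assignment described above can be sketched as follows. The category names follow Table 1, but the indoor/outdoor grouping, the image identifiers, and the function name are assumptions made for illustration; the sketch only shows how 44 images in each of 14 categories lead to 308 congruent and 308 incongruent trials.

```python
import random

# Category names follow Table 1; the indoor/outdoor grouping is an assumption.
SUPERORDINATE = {
    "living room": "indoor", "kitchen": "indoor", "bathroom": "indoor",
    "conference room": "indoor", "bedroom": "indoor",
    "airport terminal": "indoor", "cockpit": "indoor",
    "bridge": "outdoor", "amusement park": "outdoor", "playground": "outdoor",
    "mountain": "outdoor", "house": "outdoor", "pasture": "outdoor",
    "skyscraper": "outdoor",
}


def build_trials(images_by_category, superordinate, rng):
    """Return shuffled (image, true_category, label, congruent) tuples."""
    trials = []
    for category, images in images_by_category.items():
        shuffled = rng.sample(images, len(images))  # random order, no repeats
        half = len(shuffled) // 2
        # Incongruent labels come from other categories of the same superordinate type.
        foils = [c for c in images_by_category
                 if c != category and superordinate[c] == superordinate[category]]
        for image in shuffled[:half]:
            trials.append((image, category, category, True))
        for image in shuffled[half:]:
            trials.append((image, category, rng.choice(foils), False))
    rng.shuffle(trials)  # fully randomized trial order
    return trials


# Hypothetical image identifiers: 44 per category.
images = {c: [f"{c}_{i:02d}" for i in range(44)] for c in SUPERORDINATE}
trials = build_trials(images, SUPERORDINATE, random.Random(0))
```

Calling the function with a different random generator per participant mirrors the per-participant randomization. At the 60 Hz refresh rate one frame lasts about 16.7 ms, so the 33 ms target and 83 ms mask correspond to two and five frames.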

Fourteen practice trials, presenting the 14 practice images (see Stimuli), were added to the beginning of the task to acquaint participants with the task (especially the short presentation durations). During these practice trials, but not in the main experiment, feedback was provided. In total, the task lasted about an hour. Participants were offered 11 self-timed breaks, one immediately after the practice trials, the others every 56 trials.

Categorizability Scores

To quantify how easy it is to categorize a certain image rapidly, we assigned each image a “categorizability” score. These scores were computed using a method similar to the one used by Bylinskii et al. (2015) to compute the memorability scores for the images we used here (also see Isola et al., 2014). Specifically, for each image we calculated the proportion of participants who correctly categorized the image on the congruent trials. Because each image was paired with its congruent label for roughly half of the 147 participants, the scores were based on an average of 74 responses per image.
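As a minimal sketch of this scoring rule, assuming the responses are available as simple tuples (the record layout and the function name are hypothetical):

```python
from collections import defaultdict


def categorizability_scores(responses):
    """Proportion of correct responses per image, using congruent trials only.

    responses: iterable of (participant_id, image_id, congruent, correct) tuples.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for _, image_id, congruent, correct in responses:
        if not congruent:
            continue  # incongruent trials do not enter the score
        counts[image_id] += 1
        hits[image_id] += int(correct)
    return {image_id: hits[image_id] / counts[image_id] for image_id in counts}


# Toy data for illustration only.
toy_responses = [
    ("p1", "img1", True, True),
    ("p2", "img1", True, False),
    ("p3", "img1", False, True),   # ignored: incongruent trial
    ("p1", "img2", True, True),
]
toy_scores = categorizability_scores(toy_responses)
```

Timed-out trials would enter as incorrect responses, in line with the procedure described earlier.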

Results

Whenever we describe hypothesis tests in this section or any of the other Results sections, the adopted alpha level was .05 unless otherwise indicated.

General Performance

Despite the brief and masked target scene presentation, participants were able to categorize the images relatively well, with a mean percentage correct of 78% (SD = 8%) and a mean d’ of 1.69 (SD = 0.55; see Macmillan & Creelman, 2005, for an explanation of the d’-measure from signal detection theory). This is in line with previous findings on the time-course of masked rapid-scene categorization, showing that remarkably short presentation durations often suffice for observers to be able to extract the gist of a scene (e.g., Fei-Fei, Iyer, Koch, & Perona, 2007; Thorpe et al., 1996; Vanmarcke & Wagemans, 2015). Average percent correct scores were higher on incongruent trials (M = 84%, SD = 10%) than on congruent trials (M = 73%, SD = 13%); t(146) = 8.75, p < .001. This is likely due to an overall bias to respond ‘no’ when uncertain about the correct response (mean β = 1.52; SD = 0.59; see Macmillan & Creelman, 2005). When unsure, participants perhaps did not feel inclined to respond ‘yes’ knowing there were 14 categories involved in the task and only one would imply a match. They might not have picked up on the fact that there were as many congruent as incongruent trials.
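For readers less familiar with these signal detection measures, the sketch below shows the standard equal-variance formulas, treating ‘yes’ responses on congruent trials as hits and ‘yes’ responses on incongruent trials as false alarms. The function name is illustrative, and how rates of exactly 0 or 1 were handled is not stated in the text, so no correction is applied here.

```python
from math import exp
from statistics import NormalDist


def dprime_and_beta(hit_rate, false_alarm_rate):
    """Equal-variance Gaussian sensitivity (d') and likelihood-ratio bias (beta)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    d_prime = z_hit - z_fa
    beta = exp((z_fa ** 2 - z_hit ** 2) / 2)  # beta > 1 indicates a bias towards 'no'
    return d_prime, beta


# Group-level rates taken from the text: 73% correct on congruent trials (hits)
# and 84% correct on incongruent trials (so 16% false alarms).
d_prime, beta = dprime_and_beta(0.73, 1 - 0.84)
```

Values computed from group-level rates will not exactly reproduce the reported means, which presumably average the measures computed per participant.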

Categorizability Scores

Table 1 shows descriptive statistics for the categorizability scores per category. There is considerable variation across the categories in the mean categorizability scores, suggesting that some categories were easier than others in this rapid-scene categorization task. Notice that the easiest categories (highest mean categorizability score) also showed the least variation in the categorizability scores of their members. This was confirmed when correlating the mean and the standard deviation of the categorizability scores of a category: r(12) = –.81, p < .001. This is probably due to a performance ceiling effect (leading all participants to perform more similarly on the easier task categories).

Table 1. Descriptive Statistics for Categorizability Scores per Category.

Statistic	Living room	Bridge	Kitchen	Bathroom	Conference room	Bedroom	Airport terminal	Amusement park	Playground	Mountain	Cockpit	House	Pasture	Skyscraper
Mean	.55	.60	.63	.64	.65	.66	.67	.70	.73	.82	.85	.87	.88	.89
Median	.56	.64	.65	.69	.64	.69	.69	.76	.79	.85	.88	.89	.90	.90
SD	.17	.21	.15	.19	.18	.16	.14	.23	.19	.09	.10	.08	.10	.06
Min	.22	.17	.20	.27	.30	.28	.27	.15	.23	.61	.43	.61	.40	.77
Max	.87	.97	.95	.92	.92	.94	.90	.96	.94	.97	.97	.98	.99	.97

Consistency across participants

To assess the consistency of the categorizability scores across participants, we applied the same method that has previously been applied to memorability scores (e.g., Bylinskii et al., 2015; Goetschalckx et al., 2017; Isola et al., 2014). That is, we randomly split our participant pool into two halves and calculated Spearman’s rank correlation (ρ) between the categorizability scores based on the responses of the first half and those based on the responses of the second half. Repeating this 1000 times and averaging the resulting correlations provided an estimate of the consistency. Figure 3 presents the mean split-half Spearman’s rank correlations as a function of category. Most categories reach high levels of consistency (ρ’s up to .90), suggesting that categorizability can also be considered a meaningful image property. However, some categories seem to be lagging behind and show lower levels of consistency with the available number of responses. Suspecting this might be due to range restriction for the easier categories, we tested the Pearson (r) correlation of the consistency estimate with the mean categorizability for a category, as well as with the standard deviation of the categorizability scores. The results were r(12) = –.87, p < .001, and r(12) = .91, p < .001, respectively, supporting our suspicion that the lower consistency estimates were due to high mean categorizability scores and the accompanying restriction of range.

Consistency of categorizability scores across participants. Mean split-half Spearman’s rank correlations were calculated based on 1000 random splits.
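The split-half procedure can be sketched with the standard library alone. The data layout (a participant-by-image matrix with `None` for images a participant did not see on a congruent trial), the function names, and the simulated data are all assumptions made for illustration.

```python
import random
from statistics import mean


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def image_scores(correct, participants, n_images):
    """Proportion correct per image within a subset of participants."""
    scores = []
    for i in range(n_images):
        seen = [correct[p][i] for p in participants if correct[p][i] is not None]
        scores.append(mean(seen) if seen else None)
    return scores


def split_half_consistency(correct, n_splits=1000, seed=0):
    """Mean Spearman correlation between scores from random participant halves."""
    rng = random.Random(seed)
    participants = list(range(len(correct)))
    n_images = len(correct[0])
    rhos = []
    for _ in range(n_splits):
        rng.shuffle(participants)
        half = len(participants) // 2
        a = image_scores(correct, participants[:half], n_images)
        b = image_scores(correct, participants[half:], n_images)
        # Keep only images that received responses in both halves.
        pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        rhos.append(spearman([x for x, _ in pairs], [y for _, y in pairs]))
    return mean(rhos)


# Simulated data (illustrative only): 40 participants, 20 images whose true
# probability of a correct response ranges from .10 to .95.
sim_rng = random.Random(1)
true_p = [0.10 + 0.85 * i / 19 for i in range(20)]
simulated = [[int(sim_rng.random() < p) for p in true_p] for _ in range(40)]
consistency = split_half_consistency(simulated, n_splits=200)
```

To obtain one estimate per category, as in Figure 3, the function would be applied to the columns belonging to that category's images.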

Categorizability versus memorability

Figure 4 shows a scatterplot of the z score of the memorability score (taken from Bylinskii et al., 2015; AMT1: within-category experiment) for each image against the z score of its categorizability score (collected here). Z scores were computed per category in order to ensure that any general correlation between categorizability and memorability we might observe would not be distorted by differences in mean categorizability and memorability across categories (i.e., to ensure that this correlation would not be driven by between-category differences rather than within-category differences). A one-tailed hypothesis test for the Pearson correlation (r) between the two z-scored variables failed to reject the null hypothesis that the correlation was smaller than or equal to zero: r(614) = –.07, p = .96. If anything, the nominal correlation value was negative. We offer a possible explanation for this result in the Interim Discussion following Study 2.

Memorability as a function of categorizability. Each point represents an image (N = 616). The blue line indicates the best fitting regression line and the bands show 95% confidence intervals. The corresponding Pearson correlation is indicated in the bottom left corner.
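The within-category standardization and the one-tailed test can be sketched as follows. The function names are illustrative, and the p value uses a normal approximation to the t distribution, which is an assumption that is reasonable here only because of the large number of images.

```python
from collections import defaultdict
from statistics import NormalDist, mean, stdev


def zscore_within(values, categories):
    """Standardize each value against the mean and SD of its own category."""
    groups = defaultdict(list)
    for value, category in zip(values, categories):
        groups[category].append(value)
    stats = {c: (mean(g), stdev(g)) for c, g in groups.items()}
    return [(v - stats[c][0]) / stats[c][1] for v, c in zip(values, categories)]


def one_tailed_p(r, n):
    """Approximate p value for H0: correlation <= 0 against H1: correlation > 0.

    Converts r to a t statistic with n - 2 degrees of freedom and evaluates it
    against a standard normal distribution (large-sample approximation).
    """
    t = r * ((n - 2) / (1 - r * r)) ** 0.5
    return 1 - NormalDist().cdf(t)


# Toy example: two categories with very different means and spreads.
z = zscore_within([1, 2, 3, 10, 20, 30], ["a", "a", "a", "b", "b", "b"])

# Plugging in the reported correlation and sample size.
p_reported = one_tailed_p(-0.07, 616)
```

With r = –.07 and N = 616, the approximation gives a p value of about .96, in line with the reported result: a slightly negative correlation is far from significant when the alternative hypothesis is a positive correlation.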

Study 2: Shrinkability

While in Study 1, we approached our main hypothesis about image memorability and goodness of perceptual organization from the angle of fast and efficient processing, we turned to a different characteristic of good organizations in Study 2. Here, we focused on robustness against transformation and more specifically on a shrinking transformation. We used the same images as in Study 1 and quantified to what extent they survive shrinking to thumbnail size using what we called a thumbnail search task. In this task, participants had to look for a thumbnail version of a regular-sized image among eight distractor thumbnails. The rationale here was that it would be easier to find the matching thumbnail for those images that succeed better at conveying their meaning under minified viewing. Therefore, we operationalized the “shrinkability” of an image as being negatively related to the mean reaction time across participants. Analogously to Study 1, we asked two questions: (1) are shrinkability scores consistent across participants, and (2) does shrinkability correlate positively with memorability?