PLOS One. 2022 Feb 11;17(2):e0263781. doi: 10.1371/journal.pone.0263781

Difficulty and pleasure in the comprehension of verb-based metaphor sentences: A behavioral study

Patrick J Errington 1,*,#, Melissa Thye 2,#, Daniel Mirman 2,#
Editor: Thomas Holtgraves 3
PMCID: PMC8836342  PMID: 35148355

Abstract

What is difficult is not usually pleasurable. Yet, for certain unfamiliar figurative language, like that which is common in poetry, while comprehension is often more difficult than for more conventional language, it is in many cases more pleasurable. Concentrating our investigation on verb-based metaphors, we examined whether and to what degree the novel variations (in the form of verb changes and extensions) of conventional verb metaphors were both more difficult to comprehend and yet induced more pleasure. To test this relationship, we developed a set of 62 familiar metaphor stimuli, each with corresponding optimal and excessive verb variation and metaphor extension conditions, and normed these stimuli using both objective measures and participant subjective ratings. We then tested the pleasure-difficulty relationship with an online behavioral study. Based on Rachel Giora and her colleagues’ ‘optimal innovation hypothesis’, we anticipated an inverse U-shaped relationship between ease and pleasure, with an optimal degree of difficulty, introduced by metaphor variations, producing the highest degree of pleasure when compared to familiar or excessive conditions. Results, however, revealed a more complex picture, with only metaphor extension conditions (not verb variation conditions) producing the anticipated pleasure effects. Individual differences in semantic cognition and verbal reasoning assessed using the Semantic Similarities Test, while clearly influential, further complicated the pleasure-difficulty relationship, suggesting an important avenue for further investigation.

1. Introduction

‘The new dawn blooms as we free it’ [1]. That line, like many others in Amanda Gorman’s famous poem ‘The Hill We Climb’, is striking. First performed to great acclaim for US President Joe Biden’s inauguration in January 2021, her poem quickly became one of the most celebrated and influential pieces of writing in recent years. But more than the circumstances, the tumultuous politics surrounding its recital, more even than Gorman’s riveting performance, the language itself produces a kind of thrill.

All the same, the poem’s language, like that of many poems, can hardly be said to be easy to understand. (Melanie McDonagh [2], writing in the Spectator, opined that the poem is in fact too difficult.) A line like ‘The new dawn blooms as we free it’ is undoubtedly more challenging to comprehend than any (egregiously blunt) literal paraphrase like ‘The future will be good’. Why use this ‘difficult’ language, then? Why can’t the poet, as some might complain, ‘just say what she means’? After all, things that are easier tend to be more pleasurable. What then is the link, if any, between the particular kind of difficulty encountered in expressive language and the ‘affectiveness’, the pleasure even, that many experience when reading a piece of writing like Gorman’s?

In the present study, we hypothesised that there is such a connection between sentence processing difficulty and pleasure, particularly when it comes to figurative language like metaphor. Specifically, we suggested that pleasure induced by reading metaphorical sentences of increasing degrees of difficulty would also increase, peaking at a certain point before then falling as stimuli sentences become too difficult to resolve. In this, we were informed by the ‘optimal innovation hypothesis’ put forward by Rachel Giora and her team [3, 4], which suggests that pleasurability is sensitive to what they claim to be an optimal degree of innovation of a given stimulus. A stimulus can, they suggest [4], be considered optimally innovative if it provokes a nondefault response, ‘which differs from the default response(s) associated with it, both quantitatively and qualitatively’, all the while ‘allowing for the automatic recoverability of the default response(s) related to that stimulus, so that both the default and nondefault responses may be weighed against each other, their similarity and differences assessable’ (p. 10).

There is a common assumption that it is the figurativeness of poetic language (its use of metaphor, among other non-literal language) that creates difficulty in comprehension. ‘The new dawn blooms as we free it’ is indeed hardly literal: dawn cannot literally ‘bloom’ any more than it can be ‘freed’. Moreover, ‘dawn’ here suggests a vision of the future rather than a literal daybreak. Yet such figuration is by no means uncommon. An enormous amount of everyday thought and language is metaphorical [see e.g., 4–12]; we ‘run for office’, ‘grasp meanings’, ‘raise problems’, and none of those strictly literally. Moreover, it is not always the case that metaphor comprehension involves a more lengthy and complicated process than that of literal language [cf. 13, 14]: an attempt at a literal interpretation of the sentence does not always precede a figurative interpretation [see, e.g., 15–17].

Another suggestion is that difficulty is the result of unfamiliar or ‘novel’ language. Again, this is true of Gorman’s poem. Enduring trends in literary and artistic theory (beginning, perhaps, with the Russian formalists but with a lineage tracing back as far as Aristotle) also contend that the defining feature of art is novelty and that this novelty slows perception and comprehension by inducing difficulty [see, e.g., 18]. According to formalist Viktor Shklovsky [18], by ‘defamiliarizing’ or ‘making strange’ (ostranenie, in Russian) habitual experiences and perceptions, art makes it so that ‘one may recover the sensation of life; it exists to make one feel things, to make the stone stony’ (p. 4).

However, any increased ‘affectiveness’ or ‘pleasure’ cannot be the result of novelty alone. For one, one encounters unfamiliar sentences every day and yet has little difficulty comprehending them—indeed, many of the sentences in this very article are likely novel to many readers. It must be recalled that, in the ‘defamiliarization’ that Shklovsky [18] and other formalists advocate, the familiar must nevertheless remain perceptible despite whatever manipulations the artist has subjected it to, as some theorists have emphasised more recently [see, e.g., 19]. In other words, in making ‘the stone stony’, one cannot change it so much that it is no longer recognisable as a stone.

Much of this aligns with Rachel Giora and her team’s ‘optimal innovation hypothesis’ [3, 4]. The most recent iteration of this hypothesis [4] proposes that a pleasurable experience results from the altering of a stimulus enough that it elicits a novel, nondefault response, yet all the while continuing to elicit the default response: ‘The result is that both interpretations [default and nondefault] are entertained and interact’ (p. 10). This, they suggest, is at the heart of the pleasure felt in experiencing art, hearing jokes, and reading poetry—they are pleasurable not despite the fact that understanding such experiences is more difficult, but because of it.

The optimal innovation hypothesis adds to a long tradition of research regarding the relationship between stimulus complexity and aesthetic appreciation [see 20 for a recent review]. While research in this tradition has tended toward visual or auditory stimuli rather than linguistic, Giora et al.’s hypothesis can be considered an extension of Berlyne’s 1971 proposed inverted U-shaped relationship between stimulus complexity (producing processing difficulty) and aesthetic experience (pleasure), with the highest preference for stimuli of an intermediate level of complexity [21]. Attempts to test Berlyne’s hypothesis have yielded complicated findings, with evidence both supporting [see e.g., an overview of U-shaped preferences in music, 22] and contradicting Berlyne’s prediction [for evidence of linear relationships, see 23, and non-inverted U-shaped relationships, see 24]. These conflicting empirical results have been suggested to indicate not only crucial differences in how complexity is defined, measured, and manipulated, but also the ways that individual differences might shift this relationship [20].

For their part, Giora et al. [3] specify that optimal innovation is more than any simple ‘variant’ or ‘complication’ of a given stimulus. Both a familiar (e.g., ‘A piece of paper’) and a variant (e.g., ‘A single piece of paper’) ‘refer to the same concept […] to which the variant stimulus contributes no qualitatively different response’ (p. 117). As such, however, it becomes unclear if a novelised metaphor, like Gorman’s [1] ‘The new dawn blooms’, could be considered optimally innovative or simply a variant: both the line and a more familiar alternative (e.g., ‘The new dawn breaks’) would provoke the same default interpretation (e.g., ‘The future is here’). Gorman’s is a significant change from the familiar phrase, in that ‘bloom’ is generally more associatively positive than ‘break’, but is it sufficiently different to be more than just a variant? ‘A single piece of paper’ is also quite different from ‘A piece of paper’, underscoring its singularity, whilst still provoking a default interpretation. However, the majority of examples of optimal innovations Giora et al. [3] provide are puns—e.g., ‘a peace of paper’ as an optimal innovation of ‘a piece of paper’ (p. 117), and as such invoke an entirely new conceptual domain from the default (‘a piece of paper’). Does a metaphor variation, like exchanging ‘bloom’ for ‘break’, have the similar effect of invoking a whole new conceptual domain?

Novel variations of familiar metaphors may be ideal candidates for optimal innovation, despite the fact that Giora et al. [4] explicitly separate defaultness from figurativeness. As suggested in Lakoff and Johnson’s Metaphors We Live By (1980) [9] and quantified by Pollio et al. [11], a considerable amount of everyday language can be considered metaphorical, with many of these metaphors taking highly conventional forms and thereby eliciting an especially strong default response. While Lakoff and Johnson claim that all conceptual metaphors, regardless of conventionality, involve an active mapping from one conceptual domain onto another (i.e., that comprehending ‘I run for office’ involves a mapping of the domain of ‘running’ onto the domain of ‘seeking elected office’) [9, 10], evidence from recent studies does not bear this out [25, 26]. ‘Highly conventional metaphors do not appear to require online access to conceptual mappings’, Holyoak and Stamenković [27] summarise; ‘(i.e., such mappings are even easier than “automatic”—they are not performed at all)’ (p. 655). Therefore, highly conventional metaphors like ‘I run for office’ would only recruit the default interpretation (e.g., ‘I am seeking elected office’) and require no mapping of two conceptual domains.

Still, some aspect of that ‘mapping’ must remain because novel variations of conventional metaphors like ‘I run for office’ are more permissible than variations of fully lexicalised metaphors like ‘I kicked the bucket’; ‘I dash for office’ is still comprehensible as ‘I am seeking elected office’ in a way that ‘I punt the bucket’ is likely to be interpreted only literally. Notably, variation like ‘I eat for office’ does not provoke the default interpretation at all, since ‘eat’ is a completely different conceptual domain than ‘run’. A variation like ‘I dash for office’, however, is novel whilst remaining within the same conceptual domain as ‘run’. Such a variation should theoretically invite both a default interpretation (‘I am seeking to be elected’) while re-activating the latent conceptual mapping. Thus, it should provoke a nondefault meaning in the form of the source/vehicle of the metaphor (‘I am [literally] dashing / to get elected’). The same should be true of extensions to those metaphors based on the critical verb: e.g., ‘I run for office but get tripped up along the way’ is resolvable via the familiar metaphor in a way that ‘I run for office but get so full I can no longer move’ (whose extension is outside the domain of ‘running’) is not. As a result, such optimally innovative metaphors should produce a pleasurable response akin to the puns Giora et al. [3, 4] have tested.

Nevertheless, the balance that allows both default and nondefault interpretations is likely to be quite delicate, and even small changes could upset that balance. We hypothesised that making the source/vehicle (in the form of the verb or extension) too domain-specific might overprivilege the nondefault (literal) interpretation and make simultaneously resolving the figurative interpretation too difficult to be pleasurable. In cases like ‘I ski for office’, it fits the domain of ‘competitive physical movement’, like ‘run’ or ‘sprint’, and can possibly be resolved coherently–unlike ‘I eat for office’–but we hypothesised that this would be too much for most readers.

What is more, the mapping and resolution processes involved in comprehending metaphors are, like other cognitive processes, subject to differences between individuals. A metaphor will be differentially difficult for different readers and, by hypothesis, the pleasure derived from it will differ too, even if the underlying relationship between difficulty and pleasure is the same. Experimental design provides some control of difficulty as a property of the stimuli, but it cannot be ignored that what some people find difficult others will find very easy. Indeed, Reber and his colleagues have suggested [e.g., 28] that, for visual stimuli at least, an individual perceiver’s processing dynamics are the key determinant of aesthetic experience: ‘The more fluently the perceiver can process an object, the more positive is his or her aesthetic response’ (p. 366). While this would seem to predict a linear relationship between difficulty and pleasure, in the case of our study, the point at which a stimulus metaphor is complex enough to provoke both default and nondefault responses simultaneously (the apex of the inverted U-shape) is likely to depend on an individual’s processing aptitude. To that end, in addition to participant assessments of processing difficulty, our study also used a Semantic Similarities Test (SST). This test assesses an individual’s ability to identify conceptual mappings between words (a form of crystalized verbal intelligence), an ability that has been shown to influence one’s capacity to comprehend both novel and familiar metaphors [29]. High scores on the SST should correlate with higher ease ratings for all classes of stimuli (familiar, moderately innovative, and very innovative) and should therefore shift the apex of any inverted U-shaped pleasure curve toward the more innovative.

One limitation of current metaphor research is its tendency to focus on ‘nominal metaphors’ (metaphors expressed in the traditional ‘X is Y’ formulation), as noted in Holyoak and Stamenković’s review [27]. For this reason, and because of what we perceive to be a relative lack of straightforward nominal metaphors in poetry—Gorman’s poem being a prime example—and other expressive writing, we have elected to focus on other forms of metaphor. Because conceptual metaphors, like those described by Lakoff and Johnson [9, 10], tend to be expressed in verb-form rather than the nominal, we have chosen to focus on verb metaphors such as ‘I run for office’ (where the verb ‘run’ is being used metaphorically, with the nominal metaphor ‘elected offices are the finish-line of a race’ implied by that verb).

Several studies have suggested that aptness rather than conventionality is a better predictor of metaphor performance [30–32]. In our case, we have elected to use highly conventional metaphors as the base and vary these by either changing the metaphorically employed verb or by extending the metaphor (with the extension being related to the operative verb). Since all our stimuli are based on highly conventional metaphors, aptness should be to some degree controlled, even if, in progressively more novel variations, that aptness may be less readily apparent. (It should also be noted that aptness is highly context dependent, something that we did not manipulate in this study.)

Other research, such as a series of studies by Al-Azary and Buchanan [33], suggests that metaphor comprehensibility can be related to what has been called ‘semantic neighborhood density’ (SND): the number and proximity of semantically similar words. The semantic field around a term like ‘ski’, for instance, is much less dense than around a term like ‘run’. Finding, in an offline comprehension task, that metaphors with low-SND tenors and vehicles were generally more comprehensible than those with high-SND tenors and vehicles, they suggested that the many semantic neighbours of high-SND terms interfere with the computation of a new meaning for that term. However, the metaphors they tested were all relatively novel, nominal (x is y) metaphors, quite unlike the verb-based, familiar (and varied) metaphors we examined here. Moreover, while they also examined the interaction of SND with tenor concreteness, the tenors in our study are all abstract. While SND was not calculated or manipulated in our study, its potential to play a role in comprehensibility of novel metaphors is worth bearing in mind for future studies.

As it stands, the present study examined the optimal innovation hypothesis [3, 4] using novel variations of familiar metaphors. This complements prior studies that focused on comprehension of completely novel metaphors [e.g., 33, 34], though we believe variations of familiar metaphors are actually more common in both everyday communication and highly specialised communication like poetry. To make this possible, we developed a broad set of stimulus phrases, matched for psycholinguistic characteristics, and assessed on familiarity, ease of interpretation, figurativeness, and imageability. These stimuli and their characteristics are available at https://osf.io/hjcyd/ as a resource for future studies of non-nominal metaphor comprehension. The present study examines the relationship between comprehension difficulty and pleasure using a combination of experimentally manipulated stimuli and observational measures of individual differences between participants. Individual differences were measured using the Semantic Similarities Test (SST), to begin to unpick how the aptitudes and characteristics of individuals might influence the difficulty-pleasure relationship.

2. Stimulus set development and norming

In order to examine the relationship between pleasure and difficulty in the comprehension of verb-based metaphors (Experiment 1), we first developed and normed a set of sentence stimuli using objective measures and subjective ratings. All human data collection was conducted online using Qualtrics.

2.1. Stimulus development

A set of 372 English-language stimulus sentences were developed, in part from conceptual metaphor types compiled in Lakoff, Espenson and Schwartz [35]. These were organised into 62 sets of 6 variation categories. The starting point was a familiar conceptual metaphor and a minimally different literal sentence. Novel metaphoric sentences were derived from the familiar metaphor in two ways: by changing the critical verb or by extending it with an additional phrase. These derivations were also done to two degrees: an ‘optimal’ innovation that was moderately close to the familiar metaphor and an ‘excessive’ innovation that was substantially farther from the familiar metaphor. Table 1 shows two example sets of 6 sentences.

Table 1. Two example sets of sentences.

| Literal Sentence | Familiar Metaphor | Optimal Verb | Optimal Extension | Excessive Verb | Excessive Extension |
| --- | --- | --- | --- | --- | --- |
| I grasp the railing | I grasp the meaning | I brush the meaning | I grasp the meaning and shake it vigorously | I tickle the meaning | I grasp the meaning and swing on it |
| I gather my sticks | I gather my strength | I amass my strength | I gather my strength until I can’t hold it any more | I pile my strength | I gather my strength into a bundle and then tie it |

The first-person subject (‘I’) was applied across all sentences, to avoid gender and animate/inanimate distinctions of the English third-person subject (‘he’/‘she’/‘they’/‘it’), and the verb tense was uniformly present tense. Six sentence sets were in passive voice (e.g. ‘I am transported by a poem’); 2 sentence sets employed prepositional variations rather than verb (e.g. ‘I am in trouble’ / ‘I am nearing trouble’); 2 sentence sets employed adjectival present participle variations (e.g., ‘I have a burning desire’ / ‘I have a smouldering desire’); and 1 sentence set had a subject other than ‘I’ but maintained the first-person voice (‘My hand/plan hits a brick wall’). These variations were normed for potential use, but not included in the stimulus set used in Experiment 1.

2.2. Objective measures

The number of words in each sentence was matched within sets, with literal, familiar metaphoric, optimal verb, and excessive verb sentences all containing the same number of words (ranging from 4 to 6 words); both optimal and excessive metaphor extensions had an equivalent number of words within each set (ranging from 9 to 14). To derive an objective estimate of semantic distance between the critical verbs across the different conditions (familiar, optimal, excessive), and between those critical verbs and the abstract nouns they metaphorically modify within each condition, cosine similarity was calculated using pre-trained vector-based representations of word meaning (word2vec) for (1) the critical verbs across the condition variations (e.g., grasp–brush) and (2) the critical verb-noun pairs within condition (e.g., grasp–meaning). Phrase frequencies were generated for literal, familiar, optimal verb and excessive verb variations using the Google NGram search engine implemented with the ngramr package in R [36]. Phrases were modified to include wildcard tags for variations in determiner use and inflection, so that subtle phrase modifications were included in the frequency calculation. As anticipated, familiar metaphor and literal sentences were more frequent than the optimal or excessive verb variations (i.e., the novelised metaphors were indeed novel; see bottom of Fig 1).
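The cosine similarity measure described above can be illustrated with a minimal sketch. The three-dimensional vectors below are made-up placeholders, not real word2vec embeddings, and the function name `cosine_similarity` is our own; the study's actual vectors and tooling are not reproduced here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors standing in for pre-trained embeddings.
toy_vectors = {
    "grasp": [1.0, 0.0, 0.0],
    "brush": [1.0, 1.0, 0.0],
    "meaning": [0.0, 0.0, 1.0],
}

# (1) Verb-verb similarity across conditions (familiar vs. optimal verb).
verb_verb = cosine_similarity(toy_vectors["grasp"], toy_vectors["brush"])
# (2) Verb-noun similarity within a condition.
verb_noun = cosine_similarity(toy_vectors["grasp"], toy_vectors["meaning"])
```

With real embeddings, the same function would be applied to the vector for each critical verb and noun; values closer to 1 indicate more closely related words.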

Fig 1. Objective and subjective measures for the 6 sentence categories.

The subjective ratings of Ease (n = 20), Familiarity (n = 21), Figurativeness (n = 20), and Imageability (n = 20) are shown for each variation category in the top row. The NGram phrase frequency (log-scaled) values are shown in the bottom left panel: the peaks on the left edge indicate 0 frequency (i.e., not found in the corpus) for many of the metaphor variations (particularly the verb variations), the peaks in middle-right indicate moderately high frequencies for the literal sentences and familiar metaphors. Cosine similarity within variation condition (Verb-Noun Similarity) and between variation condition (Verb Similarity) are shown in the bottom right panel.

2.3. Subjective ratings

A total of 94 adult participants were recruited: 74 via the University of Edinburgh’s SONA student participant recruitment program and 20 via Prolific. Participants either received course credit or £3.50 upon completion of the 30-minute study. To prevent sentence variations from the same set appearing in consecutive trials, the sentences within the stimulus sets were assigned to one of six lists. Each list contained 62 sentences and an equal number of sentences from each variation category. An attention check question was added to each list to assess participant engagement and data quality. These questions instructed participants to select a specific number on the scale (i.e., ‘Select number six for this question’). The presentation of the lists and the sentences within each list were randomized. At the start of the experiment, participants read the description of the property they would rate sentences on and were given at least 3 example sentences with variable ratings illustrating how to use the 7-point scale (full instructions with examples are available on the project OSF page). Participants rated the 372 stimulus sentences on one of the four following properties (approximately 20 participants per property):

  • Ease: How easy the sentence was to interpret on a scale from 1 (very difficult) to 7 (very easy)

  • Imageability: How quickly and easily each sentence aroused a sensory experience (i.e., a mental picture, sound, texture, or action) on a scale from 1 (no image) to 7 (clear, immediate image)

  • Familiarity: How familiar the meaning of each sentence seemed or how commonly one might encounter such a meaning on a scale from 1 (very unfamiliar) to 7 (very familiar)

  • Figurativeness: Whether the event described in each sentence is literal (could actually happen) or whether the sentence likely conveys a more figurative meaning on a scale from 1 (very literal) to 7 (very figurative).
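The assignment of sentences to six lists can be sketched as a rotation (Latin-square style) over sets and categories. This is only an assumed scheme that satisfies the constraints described above (one sentence per set in each list, categories spread across lists); the exact procedure used in the study is not specified here, and the category labels are our own shorthand.

```python
from collections import Counter

CATEGORIES = [
    "literal", "familiar", "optimal_verb",
    "optimal_extension", "excessive_verb", "excessive_extension",
]
N_SETS = 62

def assign_lists(n_sets, categories):
    """Rotate categories across lists so each list gets one sentence per set."""
    n_lists = len(categories)
    lists = [[] for _ in range(n_lists)]
    for set_id in range(n_sets):
        for cat_idx, category in enumerate(categories):
            # Shifting by set_id sends each set's six variations to six different lists.
            lists[(set_id + cat_idx) % n_lists].append((set_id, category))
    return lists

lists = assign_lists(N_SETS, CATEGORIES)
list_sizes = [len(lst) for lst in lists]
category_counts = [Counter(cat for _, cat in lst) for lst in lists]
```

Because 62 is not a multiple of 6, this rotation yields 10 or 11 sentences per category in each list rather than exactly equal counts.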

2.4. Norming results and discussion

A total of 13 participants were excluded from analysis due to failing more than 3 of the 6 attention checks, resulting in a final sample size of 81. The full stimulus set and the corresponding sentence-level objective and subjective measures are available on the project OSF page [https://osf.io/hjcyd/].
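The exclusion rule and the resulting sample size can be checked with a short sketch. The participant IDs and failure counts are invented for illustration; only the threshold (more than 3 of 6 checks failed) and the reported totals come from the text.

```python
MAX_FAILED_CHECKS = 3  # participants failing more than this many of 6 checks are excluded

def keep_participant(failed_checks, max_failed=MAX_FAILED_CHECKS):
    """Return True if the participant passes the attention-check criterion."""
    return failed_checks <= max_failed

# Hypothetical failure counts, for illustration only.
failed_by_participant = {"p01": 0, "p02": 4, "p03": 3, "p04": 6}
retained = sorted(pid for pid, n in failed_by_participant.items() if keep_participant(n))

# Reported totals: 94 recruited, 13 excluded.
final_n = 94 - 13
# Per-property rater counts reported in the Fig 1 caption.
raters_per_property = {"Ease": 20, "Familiarity": 21, "Figurativeness": 20, "Imageability": 20}
```

The per-property rater counts sum to the same final sample size, which is consistent with each participant rating a single property.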

The distributions of the objective measures and the subjective ratings are shown in Fig 1. There is substantial variation within each sentence condition, but the patterns align with the intent of the conditions. Semantic relatedness (cosine similarities) between verbs and nouns was highest in the literal sentences (Mdn = 0.17), followed by the familiar metaphors (Mdn = 0.11), and lowest for the optimal verb (Mdn = 0.08) and excessive verb (Mdn = 0.07) conditions. Cosine similarities between the various verbs in each stimulus set were generally higher than the critical verb-noun similarities (Mdn = 0.19–0.28), suggesting that any difficulty in processing should not be the result of unusual verb choices in any particular condition. Excessive verbs tended to be less similar to familiar metaphor and to optimal verbs than these verbs were to each other. The observed pattern of less within (verb-noun) and between (verb-verb) condition similarity for excessive sentences is consistent with the claim that they are ‘excessively’ far from the familiar metaphors. The individual phrase frequencies generated by using the Google Ngram data were very low, with phrase frequencies of 0 for 9 familiar metaphors, 10 literal sentences, 47 optimal verb metaphors, and 52 excessive verb metaphors out of the total of 62 for each. This aligned with initial expectations that the familiar metaphors would be the most familiar of the individual sentences, followed by literal sentences, with both optimal and excessive verb metaphors being entirely or very nearly novel.

Participant ratings aligned with these objective metrics. Ease of interpretation, familiarity, and imageability were highest for the literal (MdnEase = 6.35; MdnFamiliarity = 5.74; MdnImage = 5.78) and familiar metaphor (MdnEase = 6.15; MdnFamiliarity = 5.74; MdnImage = 4.20) sentences, followed by the optimal verb (MdnEase = 4.85; MdnFamiliarity = 3.36; MdnImage = 3.53) and extension (MdnEase = 4.80; MdnFamiliarity = 3.10; MdnImage = 3.35) and the excessive verb (MdnEase = 3.75; MdnFamiliarity = 2.29; MdnImage = 2.95) and extension (MdnEase = 3.45; MdnFamiliarity = 2.12; MdnImage = 2.85) variations. Importantly, this was shown to be similar in the case of both optimal and excessive extensions, which we could not determine with objective measures. An inverse pattern was observed for figurativeness ratings which were lowest for the literal sentences (MdnFigurative = 2.15) and increased for familiar metaphors (MdnFigurative = 4.00) and the optimal verb (MdnFigurative = 4.50) and excessive verb (MdnFigurative = 4.63) variations. Optimal extensions (MdnFigurative = 4.88) and excessive extensions (MdnFigurative = 5.10) were rated as the most figurative.

These ratings were used to select a subset of 45 sentence sets for Experiment 1. The 17 stimulus sets that were not included were metaphors that were rated as not very metaphoric (low average figurativeness ratings and/or high imageability ratings; n = 3), optimal verb or extensions that were rated not very innovative (high familiarity ratings; n = 9), excessive verb and extensions that were too easy to understand (high ease of interpretation ratings; n = 3), and the use of adjectival present participle sentences (e.g. ‘I have a burning/smouldering desire’) as these stood out because they did not involve a subject acting or being acted upon (n = 2). We also opted not to test the literal sentences to simplify the experiment, since these were the stimuli that were being directly manipulated. After all, familiar metaphors were rated as familiar and as easy to comprehend, roughly, as the literal phrases, suggesting that for these phrases the metaphoric interpretation (‘I grasp the situation’ = ‘I comprehend the situation’) is equally the default as the literal interpretation of a literal phrase (‘I grasp the railing’ = ‘I hold on with my hand to the railing’). Where, for Giora et al. [3], the default interpretation of ‘A piece of paper’ was literal before being altered to produce the simultaneous metaphoric/literal pun, ‘A peace of paper’, for our study the default interpretation is metaphoric with the optimal variation intended to elicit both metaphoric and, to some degree, literal meaning.

3. Experiment 1

This experiment was designed to examine the relationship between pleasure and the ease of comprehension of verb-based metaphors using a combination of experimental manipulation (sentence category), subjective ratings (ease, pleasure), and individual differences (Semantic Similarities Test performance). Based on Giora et al.’s ‘optimal innovation hypothesis’ [3, 4], we anticipated that, as difficulty increased from familiar to excessive variations (both verb variations and extensions), pleasure would form an inverse U-shape, with optimal variations receiving higher pleasure ratings than either the familiar or excessive variations.
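One conventional way to write down an inverted U-shape, offered here only as an illustrative formalisation and not as the analysis used in this study, is a quadratic function of difficulty. The symbols P (pleasure), D (difficulty), and the coefficients are our own notation.

```latex
P(D) = \beta_0 + \beta_1 D + \beta_2 D^2,
\qquad \beta_1 > 0,\ \beta_2 < 0,
\qquad D^{*} = -\frac{\beta_1}{2\beta_2}
```

Under this sketch, pleasure peaks at the intermediate difficulty D*, so the prediction amounts to P(optimal) > P(familiar) and P(optimal) > P(excessive).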

3.1. Method

Participants were 63 adults, recruited online via Prolific, all with English as their first language, no history of language-related disorders and no history of mild cognitive impairment or dementia. Both the stimulus norming and Experiment 1 were carried out in accordance with an ethics protocol approved by the University of Edinburgh PPLS Research Ethics panel (Ref No. 277-2021/3). As we tested a new set of stimuli and there was not a strong basis for specifying an a priori effect size, a power calculation was not possible. The sample size was determined based on prior studies of metaphor comprehension, which typically tested 30–60 participants [see e.g., 4, 31, 32], and practical limitations on data collection. We hope that providing via OSF the complete set of analysis procedures and code, along with the full stimulus set will allow researchers to determine appropriate sample sizes for future studies.

Participants received £7.25 upon completion of the approximately 1-hour study. Participants were randomly assigned to 1 of 3 experimental groups. The groups differed in the set of sentences shown to participants, but each group of stimuli contained the same overall number of items (n = 75) as well as an equal number of items from each variation category. A couple of sentence variations within a stimulus set may be included in each group, but none of the groups included all variations. Participants rated the sentences within their group on each of the subjective ratings (below), ensuring that sentence-level ratings were within-subject. Similar to the preliminary norming study, sentence variations were assigned to one of five lists. An attention check question with the same structure as those used in the norming study was added to each list. For a given rating (i.e., Ease), the set order and the presentation of the sentences within each list were randomized, but each rating block began with a description of the property and at least 3 example sentences. The presentation order of the subjective rating blocks was counterbalanced such that participants were randomly assigned to one of four presentation orders. (A complete survey file is available on the project OSF page).
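The group structure described above can be sketched with a simple rotation. This is an assumed scheme, not the study's documented procedure: it merely shows one way to split 45 sets of 5 variations into 3 groups of 75 items with equal category counts and no group containing all variations of a set. Category labels are our own shorthand.

```python
from collections import Counter

CATEGORIES = [
    "familiar", "optimal_verb", "optimal_extension",
    "excessive_verb", "excessive_extension",
]
N_SETS = 45
N_GROUPS = 3

def assign_groups(n_sets, categories, n_groups):
    """Rotate each set's variations across groups (hypothetical scheme)."""
    groups = [[] for _ in range(n_groups)]
    for set_id in range(n_sets):
        for cat_idx, category in enumerate(categories):
            groups[(set_id + cat_idx) % n_groups].append((set_id, category))
    return groups

groups = assign_groups(N_SETS, CATEGORIES, N_GROUPS)
items_per_group = [len(g) for g in groups]
category_counts = [Counter(cat for _, cat in g) for g in groups]
# Largest number of variations from any single set that land in one group.
max_per_set = max(max(Counter(s for s, _ in g).values()) for g in groups)
```

In this sketch each group receives 15 sentences per category, and at most two variations from any one set.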

Participants in all groups rated the sentences on the following properties (full instructions and examples are available on the project OSF page):

  • Ease: How easy the sentence was to interpret on a scale from 1 (very difficult) to 7 (very easy)

  • Imageability: How quickly and easily each sentence aroused a sensory experience (i.e., a mental picture, sound, texture, or action) on a scale from 1 (no image) to 7 (clear, immediate image)

  • Emotion: How strongly each sentence evoked an emotional response on a scale from 1 (no emotion) to 7 (strong emotion)

  • Pleasurability: How much the participant liked the way the message was expressed focusing on how effective, satisfying, or powerful the sentence was on a scale from 1 (not pleasurable) to 7 (very pleasurable)

Between the subjective rating blocks, participants completed blocks of individual differences measures, which were presented in a fixed order. These measures included the Semantic Similarities Test (SST), which was manually scored according to the criteria outlined by Stamenković, Ichien, and Holyoak [29].

In developing the test questions, we took particular note of the potential complications raised by Giora et al. [3] resulting from the use of the word ‘pleasure’ in the assessment of aesthetic experience of reading literary language; because the word is positively associated, we were concerned that it might contribute to a negative bias toward sentences with negative valences or meanings (e.g., ‘I am bruised by grief’). While we also note Schindler and her colleagues’ recent review of methodologies for measuring aesthetic experience [37], in the case of this study, we, following Giora and her team, were primarily interested in the potential increase in a very broad idea of ‘pleasure’, aligning more with ‘affectiveness’ or ‘felt experience’ [38] provoked primarily by the formal features of the phrase rather than any particular emotion: i.e. if the sentence content was broadly ‘sad’, did the formal variation make it ‘sadder’; if ‘pleasant’, did the variation make it more pleasant; and so on. To that end, we presented the instructions as follows:

You will read a series of sentences that have figurative (non-literal) meanings as well as literal meanings. For each sentence, rate on a 7-point scale how much you liked the way the message was expressed. This does *not* mean that you liked the message; rather, you should rate how effective, satisfying, or powerful the sentence was. It might help to think about whether you would enjoy reading such a sentence in a book or poem.

3.2. Results and discussion

Data from 3 participants were excluded from analysis due to failing any of the 5 attention checks (n = 1), providing the same response to all items within a block (n = 1), or failing to complete the entire survey (n = 1), resulting in a final sample size of 60 (20 per group; female = 38, male = 22, mean age = 31.32). All participants completed the Semantic Similarities Test and received credit for at least one response (M = 22.12, SD = 6.88).
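The three exclusion criteria can be expressed as a simple filter. The sketch below is illustrative only: the record layout, field names, and example data are hypothetical and are not taken from the project's analysis code.

```python
def exclusion_reason(record):
    """Return why a participant would be excluded, or None to keep them."""
    if not record["completed"]:
        return "incomplete survey"
    if record["attention_checks_failed"] > 0:
        return "failed attention check"
    # Same response given to every item within any one rating block.
    for ratings in record["blocks"].values():
        if len(ratings) > 1 and len(set(ratings)) == 1:
            return "uniform responses in a block"
    return None


# Hypothetical participant records (ratings are on the 1-7 scales).
participants = {
    "p01": {"completed": True, "attention_checks_failed": 0,
            "blocks": {"ease": [6, 3, 5], "pleasure": [4, 2, 5]}},
    "p02": {"completed": True, "attention_checks_failed": 1,
            "blocks": {"ease": [6, 3, 5], "pleasure": [4, 2, 5]}},
    "p03": {"completed": True, "attention_checks_failed": 0,
            "blocks": {"ease": [4, 4, 4], "pleasure": [4, 2, 5]}},
    "p04": {"completed": False, "attention_checks_failed": 0,
            "blocks": {"ease": [6, 3], "pleasure": []}},
}

reasons = {pid: exclusion_reason(rec) for pid, rec in participants.items()}
kept = [pid for pid, reason in reasons.items() if reason is None]
print(kept)
```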

Data were analyzed using linear mixed effects models implemented with the lme4 package (version 1.1.23) [39] in R (version 4.0.2) [40]. Model parameter p-values were obtained using the Satterthwaite method for estimating degrees of freedom via the lmerTest package (version 3.1.2) [41]. Continuous predictors were centered prior to analysis.
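Centering matters here because the models include interactions: once a continuous predictor is centered, each category coefficient describes the difference from the familiar condition at the average value of that predictor rather than at zero. A minimal sketch follows, assuming grand-mean centering (the text says only that predictors were centered) and using made-up scores.

```python
from statistics import mean


def center(values):
    """Subtract the grand mean so that 0 corresponds to the sample average."""
    grand_mean = mean(values)
    return [v - grand_mean for v in values]


# Hypothetical SST scores for five participants.
sst_scores = [15, 22, 29, 18, 26]
sst_centered = center(sst_scores)
print(sst_centered)
```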

A first set of analyses directly compared the metaphor conditions (as fixed effects) with random by-participant intercepts and slopes of variation category and random intercepts of item. The results of these analyses (Table 2 and Fig 2) revealed that both verb and extension manipulations produced monotonic differences in ease of comprehension: familiar metaphors were easier than optimal variations, which were easier than excessive variations. This validates the manipulation: the “excessive” metaphor variations were indeed more excessive (more difficult to comprehend) than the “optimal” metaphor variations. For pleasure ratings, the verb variations elicited an analogous monotonic effect: familiar metaphors were rated as more pleasurable than optimal verb variations, which were more pleasurable than excessive verb variations. Pleasure ratings for metaphor extensions exhibited the predicted inverted U-shape: familiar metaphors were rated as less pleasurable than optimal extensions and marginally more pleasurable than excessive extensions.
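The contrast between the two pleasure patterns can be made concrete with the pleasure estimates reported in Table 2, where the familiar condition is the reference level (0). The sketch below classifies each manipulation from point estimates only; it ignores standard errors and significance, and the labels and function name are illustrative.

```python
# Pleasure estimates relative to the familiar condition (Table 2).
PLEASURE_VS_FAMILIAR = {
    "verb": {"optimal": -0.27, "excessive": -0.75},
    "extension": {"optimal": 0.40, "excessive": -0.37},
}


def pattern(optimal, excessive, familiar=0.0):
    """Describe the familiar -> optimal -> excessive trajectory."""
    if optimal > familiar and optimal > excessive:
        return "inverted-U"
    if familiar > optimal > excessive:
        return "monotonic decrease"
    return "other"


patterns = {
    manipulation: pattern(est["optimal"], est["excessive"])
    for manipulation, est in PLEASURE_VS_FAMILIAR.items()
}
print(patterns)
```

Because the excessive extension estimate is only marginally different from the familiar condition, the inverted-U label for extensions rests mainly on the optimal extension effect.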

Table 2. Parameter estimates (standard error in parentheses) for variation conditions relative to the familiar metaphor condition.

| Condition | Ease | Pleasure |
|---|---|---|
| Optimal Verb | -1.32 (0.12) *** | -0.27 (0.13) * |
| Excessive Verb | -2.14 (0.16) *** | -0.75 (0.15) *** |
| Optimal Extension | -0.89 (0.12) *** | 0.40 (0.18) * |
| Excessive Extension | -2.03 (0.21) *** | -0.37 (0.20) |

Note: p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Fig 2. Ease and pleasure ratings by condition.


Although the metaphor variations conditions were intended to manipulate comprehension difficulty, there was substantial variation in ease of comprehending metaphors within each condition. Further, differences between participants in semantic knowledge were also expected to influence comprehension difficulty. Therefore, a second set of analyses assessed how pleasure was predicted by sentence variation category (fixed effect with familiar metaphor as the reference level) and ease of interpretation (Ease fixed effect in Model 1) or individual differences in semantic knowledge (SST fixed effect in Model 2). Model 3 assessed the impact of sentence variation type and semantic knowledge on ease of interpretation. All models included random by-participant intercepts and slopes of variation category and random intercepts of item.
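As a reading aid, the structure of Model 1 can be written out explicitly. The notation below is ours and is only a sketch consistent with the description above: p indexes participants, i indexes sentences, D_c(i) is an indicator for whether sentence i belongs to variation category c (familiar metaphors are the reference level), and Ease is the centered ease rating. The category-by-ease interaction terms are included because they are reported in Table 3.

```latex
\[
\begin{aligned}
\text{Pleasure}_{pi} ={}& \beta_0 + \sum_{c=1}^{4} \beta_c\, D_c(i)
  + \beta_E\, \text{Ease}_{pi}
  + \sum_{c=1}^{4} \beta_{cE}\, D_c(i)\, \text{Ease}_{pi} \\
&+ u_{0p} + \sum_{c=1}^{4} u_{cp}\, D_c(i) + w_{0i} + \varepsilon_{pi}
\end{aligned}
\]
```

Here u_{0p} and u_{cp} are the by-participant random intercept and category slopes, w_{0i} is the by-item random intercept, and ε is the residual. Model 2 has the same form with the participant-level score SST_p in place of Ease, and Model 3 additionally takes Ease as the outcome. In lme4 syntax this would correspond to something like `Pleasure ~ Category * Ease + (1 + Category | Participant) + (1 | Item)`, where the variable names are hypothetical.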

Model 1 results are shown in the top section of Table 3 and the left panel of Fig 3. Pleasure ratings were highest for the optimal extension sentences (they were approximately equal for the other 4 categories of sentences) and tended to increase with ease of interpretation. There was also a significant interaction between ease and sentence category: the positive association between ease and pleasure was weaker for the familiar metaphors than for the other 4 sentence categories (though not statistically significantly different from the optimal extension category).

Table 3. Experiment 1 continuous analyses of effect of ease on pleasure ratings (Model 1), effect of individual differences (SST) on pleasure ratings (Model 2), and effect of individual differences (SST) on ease ratings (Model 3).

| Model | Term | Estimate (SE) | p-value |
|---|---|---|---|
| Model 1 | Optimal Verb | -0.02 (0.13) | 0.873 |
| Model 1 | Optimal Extension | 0.54 (0.18) | 0.005** |
| Model 1 | Excessive Verb | -0.23 (0.15) | 0.125 |
| Model 1 | Excessive Extension | 0.09 (0.21) | 0.683 |
| Model 1 | Ease | 0.19 (0.03) | < 0.001*** |
| Model 1 | Optimal Verb x Ease | 0.10 (0.04) | 0.029* |
| Model 1 | Optimal Extension x Ease | 0.08 (0.05) | 0.103 |
| Model 1 | Excessive Verb x Ease | 0.13 (0.04) | 0.003** |
| Model 1 | Excessive Extension x Ease | 0.10 (0.05) | 0.032* |
| Model 2 | Optimal Verb | -0.27 (0.13) | 0.040* |
| Model 2 | Optimal Extension | 0.40 (0.17) | 0.025* |
| Model 2 | Excessive Verb | -0.747 (0.15) | < 0.001*** |
| Model 2 | Excessive Extension | -0.37 (0.19) | 0.063 |
| Model 2 | SST Score | -0.05 (0.02) | 0.034* |
| Model 2 | Optimal Verb x SST Score | 0.00 (0.12) | 0.822 |
| Model 2 | Optimal Extension x SST Score | 0.06 (0.03) | 0.020* |
| Model 2 | Excessive Verb x SST Score | 0.00 (0.02) | 0.855 |
| Model 2 | Excessive Extension x SST Score | 0.06 (0.03) | 0.030* |
| Model 3 | Optimal Verb | -1.32 (0.12) | < 0.001*** |
| Model 3 | Optimal Extension | -0.89 (0.12) | < 0.001*** |
| Model 3 | Excessive Verb | -2.14 (0.15) | < 0.001*** |
| Model 3 | Excessive Extension | -2.03 (0.19) | < 0.001*** |
| Model 3 | SST Score | 0.05 (0.02) | 0.012* |
| Model 3 | Optimal Verb x SST Score | -0.03 (0.02) | 0.145 |
| Model 3 | Optimal Extension x SST Score | -0.02 (0.02) | 0.354 |
| Model 3 | Excessive Verb x SST Score | -0.06 (0.02) | 0.016* |
| Model 3 | Excessive Extension x SST Score | -0.04 (0.03) | 0.144 |

Note. SE, standard error. Sentence variation conditions are referenced to the familiar metaphor condition. *p < .05. **p < .01. ***p < .001.

Fig 3. Experiment 1 model predictions for each variation category (indicated by line style and coloring) with bands showing 95% confidence intervals.


Model 2 results are shown in the middle section of Table 3 and the middle panel of Fig 3. Participants with higher SST scores (better semantic knowledge) tended to give lower pleasure ratings. However, this was qualified by an interaction: pleasure ratings for the two extension sentence types (optimal extension and excessive extension) were essentially constant across the range of SST performance.

Model 3 results are shown in the bottom section of Table 3 and the right panel of Fig 3. Not surprisingly, ease of interpretation was positively associated with SST scores: participants with better semantic knowledge found the metaphoric sentences easier to understand. Also not surprising (and replicating the preliminary norming results) was that familiar metaphors were rated the easiest to understand, followed by optimal verb and extension sentences, and excessive verb and extension sentences were rated the most difficult to understand. There was also an interaction: SST was most strongly associated with ease of interpreting the familiar metaphor sentences and least associated for the excessive verb sentences (the other sentence categories were intermediate). That is, semantic knowledge appeared to be particularly important for understanding familiar metaphors, but not for making sense of novel sentences.
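The interaction terms in Table 3 are expressed relative to the familiar condition, so the slope for any other category is the reference slope plus that category's interaction estimate. The following sketch recovers these per-category slopes from the rounded values in Table 3; the results are therefore approximate, and no standard errors are available for the combined slopes. The dictionary layout and function name are illustrative.

```python
# (reference slope for familiar metaphors, interaction estimates) from Table 3.
TABLE_3 = {
    "Model 1: Ease -> Pleasure": (0.19, {
        "Optimal Verb": 0.10, "Optimal Extension": 0.08,
        "Excessive Verb": 0.13, "Excessive Extension": 0.10}),
    "Model 2: SST -> Pleasure": (-0.05, {
        "Optimal Verb": 0.00, "Optimal Extension": 0.06,
        "Excessive Verb": 0.00, "Excessive Extension": 0.06}),
    "Model 3: SST -> Ease": (0.05, {
        "Optimal Verb": -0.03, "Optimal Extension": -0.02,
        "Excessive Verb": -0.06, "Excessive Extension": -0.04}),
}


def simple_slopes(reference_slope, interactions):
    """Slope of the continuous predictor within each sentence category."""
    slopes = {"Familiar": reference_slope}
    for category, delta in interactions.items():
        slopes[category] = round(reference_slope + delta, 2)
    return slopes


slopes = {name: simple_slopes(ref, inter)
          for name, (ref, inter) in TABLE_3.items()}
for name, by_category in slopes.items():
    print(name, by_category)
```

Read this way, the SST slope on pleasure is about -0.05 for familiar metaphors and both verb variants but close to zero (about 0.01) for both extension types, which is the "essentially constant" pattern described for Model 2. Likewise, the SST slope on ease falls from about 0.05 for familiar metaphors to about -0.01 for excessive verb sentences.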

4. General discussion

Our working hypothesis was that as metaphor variants moved farther from their familiar metaphor base, they would become more difficult to comprehend and that pleasure would peak at an intermediate point, where innovation was ‘optimal’ [3, 4]. The results of both the norming study and the experiment confirmed that our stimulus manipulation elicited the intended ease of comprehension effect: familiar metaphors were rated the easiest to understand, the ‘optimal’ verb and extension variants were somewhat more difficult, and the ‘excessive’ verb and extension variants were the most difficult. However, the consequent effects on pleasure ratings were only partly consistent with the ‘optimal innovation hypothesis’.

The strongest support came from the variations made by extending the familiar metaphors. The optimal metaphor extensions were rated as intermediate in terms of ease of comprehension (more difficult than the familiar metaphors and easier than the excessive metaphor extensions), but highest in terms of pleasure (higher than familiar metaphors and excessive metaphor extensions, which were rated approximately equally pleasurable). This is consistent with the optimal innovation hypothesis and counter to the typical pattern that pleasure is monotonically associated with ease.

Verb variations followed the more typical pattern of a monotonic relationship between ease and pleasure: familiar metaphors were rated both easier and more pleasurable than ‘optimal’ verb variants, which were rated both easier and more pleasurable than ‘excessive’ verb variants. For the ‘excessive’ sentence types, extensions were rated as being more pleasurable without being easier than verb variants, which is partially consistent with the optimal innovation hypothesis. The broader pattern that metaphor extensions were rated as being more pleasurable than verb variants (among both ‘optimal’ and ‘excessive’ sentence types) may be more informative in that it suggests that, unlike single-word changes, innovations that increase context or richness can increase pleasure without decreasing difficulty.

These results may point to the important and widely recognized limitation of using single-sentence stimuli as indicative of the experience of reading complete poems [27]. Although many poems contain individual, strikingly affective phrases, like Gorman’s cited above [1], the reading of these phrases is usually shaped by a wealth of both textual and situational context, which will inform readers of how to comprehend a given sentence. Even the reading of extremely short poems (like those in ‘haiku’ form, which are hardly longer than the sentences we tested) will be influenced by situational and/or ecological factors: the presence of a title, preconceptions about the haiku form, even the foreknowledge that the given text constitutes a purposely written poem. Such contextual factors tend to make comprehension easier and increase pleasure. For instance, studies have shown [e.g., 42] that readers find anomalous metaphorical sentences to be more meaningful if they believe that the sentences are composed by a poet rather than by a computer program, and they will try longer to find them meaningful if a meaning is not readily apparent. This belief would not likely appreciably change how easy a sentence is to understand, but increased ‘meaningfulness’ might coincide with increased pleasure. In our study, it is possible that, although the additional words in the optimal extension variations did not make comprehension easier than the optimal verb-based variations, they did encourage readers to read in such a way as to find their comprehension pleasurable.

The timing of the variation is also an important difference between the verb and extension variants. In the case of verb variations, only a single word was changed (the verb); this word occurred early in the sentence while also being the word that indicated the metaphorical nature of the sentence. As such, in an example like ‘I dash for office’, everything depends on the verb ‘dash’ to indicate simultaneously both the familiar metaphor base (e.g., ‘I run for office’ = ‘I apply to hold elected office’) and the variation of it (‘run with great haste’). Compare this to the extension, where readers comprehend first the familiar metaphor (e.g., ‘I run for office…’) and only after doing so are given the variation (‘…but get tripped up along the way’) that encourages the nondefault interpretation. The figurative meaning of the familiar metaphor is likely to be already active when the reader reaches the extension, which can more easily create the pleasurable tension between default and nondefault interpretations. This is particularly intriguing given the general preference toward economy in poetry composition, which would favour the more economical ‘I dash for office’ over the extension. Further testing and refinement of the stimulus set regarding the precise length, wording, and placement of extensions compared to equivalent verb variations might help to narrow down the causes of these results (e.g., inclusion of internal, adverbial extensions like ‘I run flat out for office’), in addition to further manipulation of textual and situational context.

Another area for further refinement of the stimulus set would be to assess not just semantic distances between verbs across conditions and between verbs and nouns within conditions, but the semantic neighborhood densities (SND) of each word (verb or noun) individually and compare them, akin to what was done by Al-Azary and Buchanan [33]. While they found that SND did influence ease of metaphor comprehension, they only compared the SND of nouns in nominal metaphors. In translating their research to our stimuli, consideration would need to be given to whether the individual SND of verbs can be compared directly with those of nouns, or whether the relative SND of verb metaphors like ‘I run for office’ would be more effectively compared in nominal form (e.g., ‘Elections are races’). It should also be noted that Al-Azary and Buchanan indicate the influence of ‘concreteness’ (vs. ‘abstractness’) of their nouns [33]. While all the sentences in our stimulus set involve relatively concrete actions (e.g., ‘running’) interacting with relatively abstract nouns (e.g., ‘elected office’), some of those nouns can be considered either ‘concrete’ or ‘abstract’ depending on the verb priming and context (e.g., ‘office’ can be a physical space or an abstract elected position), whereas others are only abstract (e.g., ‘meaning’). Further norming of our stimulus set would help to shed additional light on the potential effect of SND and abstractness of target nouns on our findings.

Individual differences, in the form of SST scores, add further complications. Not surprisingly, better recognition of semantic similarities was associated with finding metaphoric sentences easier to understand. But it was also associated with finding them less pleasurable. This (somewhat counterintuitively) suggests that individuals who have more difficulty with metaphor comprehension also find it more pleasurable. Both ease and pleasure ratings of metaphor extensions (both ‘optimal’ and ‘excessive’) were less strongly associated with SST performance, suggesting that the kind of verbal reasoning measured by SST is particularly important for shorter metaphors that are particularly dependent on figurative interpretations of single words (verbs, in this case). It is premature, at this point, to make strong inferences based on these data. What is clear, however, is that individual differences in semantic cognition and verbal reasoning (such as those measured by the SST) need to be considered because they strongly affect both ease and pleasure of metaphor comprehension.

In this study, we used two different ways of creating variants of familiar metaphors: changing the critical verb and extending the familiar metaphor with an additional phrase. The verb and extension variants elicited strikingly different responses—the extensions were rated as being more pleasurable (without necessarily being easier to comprehend) and were less sensitive to semantic ability (SST performance). It is possible that (at least some of) the verb variations did not provoke the simultaneous default and nondefault interpretations that should produce pleasure, even in the supposedly optimal condition, and thus that our results only depict the downward trend on the far side of the U-shape. Further refinement of the stimulus set and additional testing might help to clarify this.

A final limitation is that measuring pleasure is an inherently difficult task and likely to be strongly influenced by how the instructions are phrased and how participants interpret them (as discussed above). We tried to be broad in our description of ‘pleasure’ so as to avoid privileging sentences that described pleasant things over those that described negative things, in an effort to shift focus toward the more formal qualities of the sentences themselves. Our instructions undoubtedly privileged a ‘poetic’ kind of pleasure by suggesting that ‘it might help to think about whether you would enjoy reading such a sentence in a book or poem’. Nevertheless, the relatively high pleasure ratings for familiar metaphors (many of which are fairly mundane, possibly clichéd phrases) suggest that readers were not overly attentive to some expected ‘poeticity’ of the sentence, which might have predisposed them toward ranking the more obviously ‘poetic’ verb variations higher.

In future studies, a more fine-grained definition of what we, in this study and following Giora and her team, termed ‘pleasure’ will help to refine these results. Schindler and her team [37], for instance, provide a broad survey of methods for measuring aesthetic emotions, as well as an Aesthetic Emotions Scale (Aesthemos), that may provide further ways of clarifying our definition of pleasure. Kuiken and Douglas, on the other hand, have developed an ‘Absorption-like States Questionnaire’ (ASQ) [43] intended to help describe readerly activities and aesthetic experiences provoked by literary texts, in particular what they call ‘expressive enactment’ and ‘integrative comprehension’. The former in particular is noted to be relevant for the comprehension of literary metaphors and the production of ‘inexpressible’ felt states, like what might be characterized as ‘resonance’, ‘meaningfulness’ or ‘sublime feeling’ [44]. Such a questionnaire might allow the maintenance of the ‘breadth’ of emotions we were seeking to assess while still measuring a degree of ‘affectiveness’. Additionally, aligning reported aesthetic experiences with neural activity (e.g., increased sensorimotor simulation [45] or bihemispheric activity [46]) might represent another step in further understanding the curious relationship between optimally difficult metaphors and the feelings they provoke.

5. Conclusion

Our results offer only partial support to the hypothesis that, as comprehension difficulty is increased by varying familiar metaphor stimuli (either by changing the verb or extending the metaphor), pleasure will peak at an ‘optimal’ mid-point level of difficulty. While metaphor extensions appeared to fit this hypothesis, with optimal variation conditions producing more pleasure than the easier familiar or more difficult excessive variation conditions, variations of only the verb did not produce the same effect. Individual differences in the form of SST scores further complicated the picture, indicating that, while increased aptitudes for recognising semantic similarities correlated with reduced difficulty of comprehension across conditions (although more acutely in verb-variation conditions), surprisingly they tended to correlate with reduced pleasure as well. Additional testing, however, will be necessary to strengthen any conclusions regarding the effect of individual differences. Meanwhile, these results also suggest the potential importance of context and variation timing for the pleasure resulting from reading unfamiliar metaphors and indicate several avenues for further research; the stimulus set developed here may provide an important resource for doing so.

Supporting information

S1 Fig. In all panels, the points correspond to the behavioural data and the lines correspond to the model fits described in the main text.

The key observation is that none of the panels suggests a U-shape in the behavioural data and the linear models appear to fit the data reasonably well. Left column shows results for familiar metaphors, middle column shows results for optimal verb and optimal extension conditions, right column shows results for excessive verb and excessive extension conditions. Top row: relationship between ease of comprehension and pleasure (Model 1). Middle row: relationship between SST Score (semantic knowledge) and pleasure (Model 2). Bottom row: relationship between SST Score (semantic knowledge) and ease of comprehension (Model 3).

(TIF)

Data Availability

All stimulus sets, study design information, and raw data are available from the project Open Science Framework database (https://osf.io/hjcyd/; DOI: 10.17605/OSF.IO/HJCYD).

Funding Statement

This study was funded by a grant from the Research Adaptation Fund from the University of Edinburgh’s College of Arts, Humanities and Social Sciences awarded to Patrick J Errington and Daniel Mirman. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Gorman A. The Hill We Climb: the Amanda Gorman poem that stole the inauguration show. The Guardian. 2021 Jan 20. [Cited 2021 Aug 23] https://www.theguardian.com/us-news/2021/jan/20/amanda-gorman-poem-biden-inauguration-transcript.
  • 2. McDonagh M. Amanda Gorman was let down by a terrible poem. The Spectator. 2021 Jan 21. [Cited 2021 Aug 23] https://www.spectator.co.uk/article/amanda-gorman-was-let-down-by-a-bad-poem.
  • 3. Giora R, Fein O, Kronrod A, Elnatan I, Shuval N, Zur A. Weapons of Mass Distraction: Optimal Innovation and Pleasure Ratings. Metaphor Symb. 2004;19(2): 115–141. doi: 10.1207/s15327868ms1902_2
  • 4. Giora R, Givoni S, Heruti V, Fein O. The Role of Defaultness in Affecting Pleasure: The Optimal Innovation Hypothesis Revisited. Metaphor Symb. 2017;32(1): 1–18. doi: 10.1080/10926488.2017.1272934
  • 5. Barfield O. Poetic diction: A study in meaning. New York, NY: McGraw-Hill; 1928/1964.
  • 6. Embler W. Metaphor and meaning. DeLand, FL: Everett/Edwards; 1966.
  • 7. Evans V, Green M. Cognitive linguistics: An introduction. Edinburgh, UK: Edinburgh University Press; 2006.
  • 8. Gibbs RW Jr. The poetics of mind: Figurative thought, language, and understanding. Cambridge, UK: Cambridge University Press; 1994.
  • 9. Lakoff G, Johnson M. Metaphors we live by. Chicago, IL: University of Chicago Press; 1980. doi: 10.1086/292206
  • 10. Lakoff G, Johnson M. Philosophy in the flesh: The embodied mind and its challenge to western thought. New York, NY: Basic Books; 1999.
  • 11. Pollio HR, Smith MK, Pollio MR. Figurative language and cognitive psychology. Lang Cogn Process. 1990;5: 141–167. doi: 10.1080/01690969008402102
  • 12. Keysar B, Shen Y, Glucksberg S, Horton WS. Conventional language: How metaphorical is it? J Mem Lang. 2000;43(4): 576–593. doi: 10.1006/jmla.2000.2711
  • 13. Searle JR. Metaphor. In: Ortony A, editor. Metaphor and thought. Cambridge, UK: Cambridge University Press; 1979. pp. 92–123.
  • 14. Clark HH. Using Language. Cambridge, UK: Cambridge University Press; 1996.
  • 15. Glucksberg S, Gildea P, Bookin HB. On understanding nonliteral speech: Can people ignore metaphors? J Verbal Learning Verbal Behav. 1982;21(1): 85–98. doi: 10.1016/S0022-5371(82)90467-4
  • 16. Keysar B. On the functional equivalence of literal and metaphorical interpretations in discourse. J Mem Lang. 1989;28(4): 375–385. doi: 10.1016/0749-596X(89)90017-X
  • 17. McElree B, Nordlie J. Literal and figurative interpretations are computed in equal time. Psychon Bull Rev. 1999;6(3): 486–494. doi: 10.3758/bf03210839
  • 18. Shklovsky V. Art as Technique. In: Newton KM, editor. Twentieth Century Literary Theory. London: Palgrave; 1917/1997. pp. 3–5.
  • 19. Miall DS, Kuiken D. Foregrounding, defamiliarization, and affect response to literary stories. Poetics. 1994;22(5): 389–407. doi: 10.1016/0304-422X(94)00011-5
  • 20. Van Geert E, Wagemans J. Order, complexity, and aesthetic appreciation. Psychol Aesthetics, Creat Arts. 2020;14(2): 135–54.
  • 21. Berlyne DE. Aesthetics and psychobiology. New York, NY: Appleton-Century-Crofts; 1971.
  • 22. Chmiel A, Schubert E. Back to the inverted-U for music preference: A review of the literature. Psychol Music. 2017;45(6): 886–909. doi: 10.1177/0305735617697507
  • 23. Friedenberg J, Liby B. Perceived beauty of random texture patterns: A preference for complexity. Acta Psychol. 2016;168: 41–49. doi: 10.1016/j.actpsy.2016.04.007
  • 24. Adkins OC, Norman JF. The visual aesthetics of snowflakes. Perception. 2016;45: 1304–1319. doi: 10.1177/2041669516661122
  • 25. Gentner D, Bowdle B, Wolff P, Boronat C. Metaphor is like analogy. In: Gentner D, Holyoak KJ, Kokinov BN, editors. The analogical mind: Perspectives from cognitive science. Cambridge, MA: MIT Press; 2001. pp. 199–253.
  • 26. Glucksberg S, Brown M, McGlone MS. Conceptual metaphors are not automatically accessed during idiom comprehension. Mem Cognit. 1993;21: 711–719. doi: 10.3758/bf03197201
  • 27. Holyoak KJ, Stamenković D. Metaphor comprehension: A critical review of theories and evidence. Psychol Bull. 2018;144(6): 641–71. doi: 10.1037/bul0000145
  • 28. Reber R, Schwarz N, Winkielman P. Processing Fluency and Aesthetic Pleasure: Is Beauty in the Perceiver’s Processing Experience? Personal Soc Psychol Rev. 2004;8(4): 364–82. doi: 10.1207/s15327957pspr0804_3
  • 29. Stamenković D, Ichien N, Holyoak KJ. Metaphor comprehension: An individual-differences approach. J Mem Lang. 2019;105: 108–18. doi: 10.1016/j.jml.2018.12.003
  • 30. Bowdle BF, Gentner D. The career of metaphor. Psychol Rev. 2005;112: 193–216. doi: 10.1037/0033-295X.112.1.193
  • 31. Chiappe DL, Kennedy JM. Aptness predicts preference for metaphors or similes, as well as recall bias. Psychon Bull Rev. 1999;6(4): 668–676. doi: 10.3758/bf03212977
  • 32. Jones LL, Estes Z. Roosters, robins, and alarm clocks: Aptness and conventionality in metaphor comprehension. J Mem Lang. 2006;55: 18–32. doi: 10.1016/j.jml.2006.02.004
  • 33. Al-Azary H, Buchanan L. Novel metaphor comprehension: Semantic neighbourhood density interacts with concreteness. Mem Cogn. 2017;45: 296–307. doi: 10.3758/s13421-016-0650-7
  • 34. Cardillo ER, Schmidt GL, Kranjec A, Chatterjee A. Stimulus design is an obstacle course: 560 matched literal and metaphorical sentences for testing neural hypotheses about metaphor. Behav Res Methods. 2010;42(3): 651–64. doi: 10.3758/BRM.42.3.651
  • 35. Lakoff G, Espenson J, Schwartz A. Master metaphor list: second edition. Cognitive Linguistics Group, University of California Berkeley. 1999. http://araw.mede.uic.edu/~alansz/metaphor/METAPHORLIST.pdf.
  • 36. Carmody S. ngramr: Retrieve and Plot Google n-Gram Data. R package version 1.7.2. 2020. https://CRAN.R-project.org/package=ngramr.
  • 37. Schindler I, Hosoya G, Menninghaus W, Beermann U, Wagner V, Eid M, et al. Measuring aesthetic emotions: A review of the literature and a new assessment tool. PLoS One. 2017;12(6): 1–45. doi: 10.1371/journal.pone.0178899
  • 38. Miall DS. A Feeling for Literature: Empirical Stylistics. Lang Semiot Stud. 2015;1(2). doi: 10.1515/JLT.2007.023
  • 39. Bates D, Maechler M, Bolker B, Walker S. Fitting Linear Mixed-Effects Models Using lme4. J Stat Softw. 2015;67(1): 1–48. doi: 10.18637/jss.v067.i01
  • 40. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2020. https://www.R-project.org/.
  • 41. Kuznetsova A, Brockhoff PB, Christensen RHB. lmerTest Package: Tests in Linear Mixed Effects Models. J Stat Softw. 2017;82(13): 1–26. doi: 10.18637/jss.v082.i13
  • 42. Gibbs RW, Kushner JM, Mills WR. Authorial intentions and metaphor comprehension. J Psycholinguist Res. 1991;20(1): 11–30. doi: 10.1007/BF01076917
  • 43. Kuiken D, Douglas S. Forms of absorption that facilitate the aesthetic and explanatory effects of literary reading. In: Hakemulder F, Kuijpers M, Tan ES, Doicaru MM, editors. Narrative Absorption. Amsterdam: John Benjamins; 2017. pp. 217–249.
  • 44. Kuiken D, Douglas S. Living metaphor as the site of bidirectional literary engagement. Sci Study Lit. 2018;8(1): 47–76. doi: 10.1075/ssol.18004.kui
  • 45. Al-Azary H, Katz AN. Do metaphorical sharks bite? Simulation and abstraction in metaphor processing. Mem Cogn. 2021;49(3): 557–570. doi: 10.3758/s13421-020-01109-2
  • 46. Lai VT, Van Dam W, Conant LL, Binder JR, Desai RH. Familiarity differentially affects right hemisphere contributions to processing metaphors and literals. Front Hum Neurosci. 2015;9: 1–10. doi: 10.3389/fnhum.2015.00001

Decision Letter 0

Thomas Holtgraves

19 Nov 2021

PONE-D-21-32417

Difficulty and pleasure in the comprehension of verb-based metaphor sentences: a behavioral study

PLOS ONE

Dear Dr. Errington,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. As you'll see, the reviewers are quite positive about this study. And I agree. Your paper is very well-written, the methodology is sound, and the results are potentially interesting. At the same time, the reviewers raised several issues that need to be addressed in a revision. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Below I summarize some of the main concerns which would need to be addressed. In addition, each of the reviewers' specific comments needs to be addressed (or an explanation provided for not doing so).

First, the logic underlying the use of your individual difference variable (SST) needs to be clarified (see Reviewer 1), as well as the appropriateness of your analyses for examining its role (see Reviewer 2).

Second, as noted by Reviewer 1, there is a substantial literature on the relationship between aesthetic preferences and processing fluency/difficulty that is not referenced in your paper. You should include some of this literature in your revision; doing so may help you clarify the meaning of your results.

Third, some of the analyses you report may be less than optimal for the issues you are investigating (see especially Reviewer 2). As noted by Reviewer 2, examining the ease-pleasure relationship separately for the different categories may result in the hypothesized curvilinear relationship being obscured.  Why not examine the ease-pleasure relationship across the entire data set? As well, there appears to be some redundancy in some of the models you test because you simultaneously include comprehension ease as well as metaphor category (which is based, in part, on comprehension ease).

Fourth, you should describe how your sample size was determined and address the issue of power to detect your predicted effects.

Fifth, participants in your experiment provided four ratings.  However, the analyses you report include only two (pleasure and ease).  I assume you’ve undertaken analyses with the other two rating scales (imageability and emotion) and some mention should be made of them (at the very least in a footnote).

Please submit your revised manuscript by Jan 03 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Thomas Holtgraves, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf.

2. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections does not match.

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

3. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

4. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please move it to the Methods section and delete it from any other section. Please ensure that your ethics statement is included in your manuscript, as the ethics statement entered into the online submission form will not be published alongside your manuscript.

5. Please remove your figures from within your manuscript file, leaving only the individual TIFF/EPS image files, uploaded separately. These will be automatically included in the reviewers’ PDF.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: 1. It is difficult to determine what kind of “pleasure” the authors are trying to address. The terminology varies: striking, thrilling, affective, pleasurable, etc. There is a substantial psychometric issue lurking here that should be addressed more directly. The authors may find it useful to locate “effective, satisfying, or powerful” (the wording eventually chosen for their ratings; pp. 18-19) within the most comprehensive available survey of aesthetic “emotions” (Schindler et al. 2017). Although the authors’ formalist intent is explicit (“how much you liked the way the message was expressed”; l. 434), the kind of “affectiveness” that is at stake also remains obscure.

Schindler, I., Hosoya, G., Menninghaus, W., Beermann, U., Wagner, V., Eid, M., & Scherer, K. R. (2017). Measuring aesthetic emotions: A review of the literature and a new assessment tool. PLOS ONE, 12(6), e0178899. https://doi.org/10.1371/journal.pone.017889

2. The authors indicate that their research is “informed” by Giora’s “optimal innovation hypothesis” (see especially, Giora et al. 2017).

a. One issue in Giora’s work is how to differentiate default literal, default metaphoric, and non-default metaphoric sentences so that metaphoric default salience can be assessed independently of metaphoric non-default salience. It is somewhat disconcerting, then, to learn that the authors did not include the default literal sentences in Study 1 because “we were only interested in metaphor comprehension” (l. 368). Giora is primarily interested in metaphor comprehension, too; so, the reason for ignoring differences between default literal and default metaphoric sentences should be more clearly articulated—and in terms she would understand.

b. A related but separate issue is how to differentiate metaphoric sentences so that, independently of defaultness, metaphoric vehicles can be contrasted according to the extent to which they are “domain-specific” (l. 167). The authors hypothesize that excessive domain specificity may make a metaphor too difficult to be pleasurable. However, emphasizing domain-specificity may oversimplify the problem. For example, Katz and Al-Azary (2017) differentiate (a) the distance between the domain of the topic and the domain of the vehicle; (b) the semantic density of the topic and of the vehicle; and (c) the specificity of the topic or vehicle within their respective domains (“domain-specificity”?). The authors should be encouraged to compare the Katz and Al-Azary framework with their own. For example, perhaps they can clarify whether their use of word2vec (l. 267) converges with the Katz and Al-Azary version of computational semantics.

Katz, A. N., & Al-Azary, H. (2017). Principles that promote bidirectionality in verbal metaphor. Poetics Today, 38(1), 35–59. https://doi.org/10.1215/03335372-3716215

c. Contextualizing their hypothesis within the Katz and Al-Azary (2017) framework may also enable the authors to clarify their rationale for including the Semantic Similarities Test (SST) as an individual differences measure of “the ability to identify conceptual mappings between words” (l. 181). The SST assesses “crystallized verbal intelligence” by evaluating a person’s ability to find similarities between two concepts. Was this measure expected to reflect (a) the capacity to identify similarities between vehicle and topic concepts even when they are “domain-specific”; (b) the capacity to identify similarities between vehicle and topic concepts even when they are from distant domains; or (c) the capacity to identify similarities between vehicle and topic concepts because they are (especially for some individuals) semantically dense? As it stands, the SST is not conceptually well coordinated with the authors’ research paradigm.

d. It is a bit unsettling that there is no reference to other literature indicating that aesthetic preferences are determined by processing fluency/difficulty (cf. Reber et al., 2004), including the tradition of proposed curvilinear relations between object complexity and interest/pleasure (e.g., Berlyne, 1971). Research in the latter tradition substantiates how difficult it is to assess curvilinear relations because the direction of the relationship differs at different levels of the variables. More to the point, it is not clear where to locate “optimal” and “excessive” metaphors on the hypothesized curvilinear relation with pleasure. That difficulty should at least be mentioned.

3. Perhaps the most innovative aspect of the authors’ design is their attempt to examine novel extensions of familiar metaphors. In this reviewer’s judgment, their procedures for doing so are promising. The procedures used to develop these extended metaphors led to at least two interesting results. First, optimal extensions and excessive extensions were rated as the “most figurative” (l. 358). Second, metaphor extensions were rated as more pleasurable than verb variants, indicating that “innovations that increase … richness can increase pleasure without decreasing difficulty.” These results are worth building on in future research efforts.

a. To explain these results, the authors emphasize “textual and situational context.” However, it may be more promising to examine specifically extended metaphors. The authors offer the following hypothesis: “The figurative meaning of the familiar metaphor is likely to be already active when the reader reaches the extension, which can more easily create the pleasurable tension between default and non-default interpretations.” The possibility of examining the interplay (and tension) between a familiar metaphor and a subsequent extension is within these authors’ methodological reach.

b. As their project unfolds, the authors may want to take advantage of Sullivan’s (2019) recent examination of mixed metaphors, some of which are subject to the domain specificity problem that makes them anathema in scholarly circles—but perhaps of particular interest to the authors.

Sullivan, K. (2018). Mixed metaphors: Their use and abuse. Bloomsbury Publishing.

4. A not-so-important note (l. 290): a 7-point rating scale is not a “Likert scale.” This common misuse of that phrase should not be repeated here.

Reviewer #2: In this paper, the authors examine the “optimal innovation hypothesis”, which posits that language is most enjoyable when it evokes a non-default response, but also brings to mind the default response so that the two responses can be weighed against one another. The authors examined this hypothesis by carefully constructing a stimulus set of metaphors and obtaining subjective ratings of pleasurability for these metaphors.

First, the paper was well-written. It discussed an interesting theory within figurative language research. Also, I really enjoyed the writing style. I thought the examples used were interesting and made the paper exciting, and the authors also explained the theories and hypotheses clearly.

Second, I think the stimuli for this study are very nice, and I appreciate that the authors made them available on OSF. The authors carefully considered a variety of factors, including semantic distance measures and frequency. I also totally agree with the authors that there is an overemphasis on examining nominal metaphors, and I appreciate that they developed a set of verb metaphors and metaphor extensions and made these publicly available. I was very impressed by this aspect of the study.

That being said, I have a major concern with the analyses the authors conducted to examine their hypotheses (Section 3.2: Results and Discussion). I question the logic of these analyses for a couple of reasons. First of all, the authors mention that there was no U-shaped curve observed in the data. However, to examine this, they split the familiar, optimal, and excessive metaphors into separate categories and examined these curves separately (Supplementary Figure). I think this defeats the purpose as this restricts the range of the data in each case. The whole point of the optimal and excessive categories was that they were supposed to increase difficulty, so it doesn’t make sense to me why difficulty would be examined within each of these categories separately. Wouldn’t it make more sense to either 1) look at the relationship between ease and pleasure collapsed across all the metaphors, or 2) compare the means between the categories directly (metaphors in the optimal category should have significantly higher pleasurability ratings than the familiar and excessive categories)? This criticism carries through to the first lme model (Model 1). Both the categories and the continuous ratings of ease were included in the model, but aren’t these essentially the same variable? Therefore, (if I understand lme correctly) if there is a significant effect of a category, this effect is independent of ease since ease is a variable in the model. However, the categories were specifically constructed to vary in ease, so the critical variable in the category is being controlled out of the category. Also, I didn’t quite get the logic of Model 2. It considered individual differences, but the relative differences of ease between the metaphors should still be preserved regardless of individual differences in figurative comprehension, so I’m also not sure how that tests the hypothesis.

In the discussion, the authors state “the optimal metaphor extensions were rated as intermediate in terms of ease of comprehension, but highest in terms of pleasure”. And also, “familiar metaphors were rated both easier and more pleasurable than ‘optimal’ verb variants, which were rated both easier and more pleasurable than ‘excessive’ verb variants”. But these comparisons were not tested statistically, and weren’t really discussed much in the Results. I think the lme models obscure these findings, which are probably the most interesting part of the data.

I think this is a fixable problem though. I would recommend just conducting some standard statistics on the data, namely, an ANOVA between the categories (not considering ease ratings) and a correlation between ease and pleasurability (independent of the categories). It might be beneficial to run some separate ANOVAs to compare both optimal vs. excessive, but also verb vs. extension, since there seemed to be a trend in extensions being more pleasurable. I think this would more directly test your hypotheses and would make the Results section better fit with the Discussion section.

As one additional small point, I didn’t understand the NGram frequency graph on page 14. I think the axes need to be labelled or else the graph needs to be better explained in the caption.

Overall, I think this paper has a lot of potential. The stimuli were very carefully constructed, and the data seems quite interesting. However, in my opinion, I think the analyses were somewhat inappropriate for the reasons stated above.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Feb 11;17(2):e0263781. doi: 10.1371/journal.pone.0263781.r002

Author response to Decision Letter 0


3 Jan 2022

Academic Editor Comments

1) the logic underlying the use of your individual difference variable (SST) needs to be clarified (see Reviewer 1), as well as the appropriateness of your analyses for examining its role (see Reviewer 2).

We thank the editor and both reviewers for the suggestion to further clarify the Semantic Similarities Test and its analysis. We have explained the importance of the SST as a means of ascertaining the effects of an individual’s capacity for metaphorical thinking, which we hypothesised would impact the ease with which a given individual would be able to resolve all the variations of metaphors with which they were presented. This was explained with regard to Reviewer 1’s comments (ll. 224–233). This hypothesis was borne out in our study, which found those with higher SST scores to have greater ease in comprehending all categories of metaphor. Surprisingly, however, those with higher SST scores did not show greater pleasure in comprehending the ‘optimal extension’ condition, nor did they show an increase in pleasure in comprehending the ‘excessive extension’ condition, which we would have expected if their increased metaphor comprehension capacities had only moved the apex of the inverted U-shaped relationship toward the more challenging phrases. This effect is addressed in our discussion (ll. 693–707).

2) as noted by Reviewer 1, there is a substantial literature on the relationship between aesthetic preferences and processing fluency/difficulty that is not referenced in your paper. You should include some of this literature in your revision; doing so may help you clarify the meaning of your results.

We greatly appreciate Reviewer 1’s suggested literature regarding aesthetic preference and processing fluency, and we found it quite helpful in framing our study (see additions ll. 113–132).

3) some of the analyses you report may be less than optimal for the issues you are investigating (see especially Reviewer 2). As noted by Reviewer 2, examining the ease-pleasure relationship separately for the different categories may result in the hypothesized curvilinear relationship being obscured. Why not examine the ease-pleasure relationship across the entire data set? As well, there appears to be some redundancy in some of the models you test because you simultaneously include comprehension ease as well as metaphor category (which is based, in part, on comprehension ease).

We appreciate the recommendation for a simpler analysis approach, which we have added to the manuscript. A more detailed description is included below in our response to the reviewer’s comment. We have also retained our original analyses because we believe they provide useful additional insight into the results.

4) you should describe how your sample size was determined and address the issue of power to detect your predicted effects.

We have clarified (ll. 458–463) that due to the novelty of this stimulus set and our experiment design, there was not a strong basis for predicting an effect size, which is necessary for a proper power analysis. Instead, we determined the sample size based on sample sizes from prior metaphor comprehension studies and practical limitations. We hope that, by making available our full stimulus set and analysis code, this study will provide the basis for future power calculations.

5) participants in your experiment provided four ratings. However, the analyses you report include only two (pleasure and ease). I assume you’ve undertaken analyses with the other two rating scales (imageability and emotion) and some mention should be made of them (at the very least in a footnote).

Those other two ratings (imageability and emotion) were included for later exploratory analyses, which have not been conducted yet. Because we did not have clear hypotheses about them and they are not directly related to the hypotheses evaluated here, we feel that they are outside the scope of this manuscript and have not added them. We only mention that those ratings were collected in the Methods for the purpose of transparency.

Reviewer 1

1. It is difficult to determine what kind of “pleasure” the authors are trying to address. The terminology varies: striking, thrilling, affective, pleasurable, etc. There is a substantial psychometric issue lurking here that should be addressed more directly. The authors may find it useful to locate “effective, satisfying, or powerful” (the wording eventually chosen for their ratings; pp. 18-19) within the most comprehensive available survey of aesthetic “emotions” (Schindler et al. 2017). Although the authors’ formalist intent is explicit (“how much you liked the way the message was expressed”; l. 434), the kind of “affectiveness” that is at stake also remains obscure.

Schindler, I., Hosoya, G., Menninghaus, W., Beermann, U., Wagner, V., Eid, M., & Scherer, K. R. (2017). Measuring aesthetic emotions: A review of the literature and a new assessment tool. PLOS ONE, 12(6), e0178899. https://doi.org/10.1371/journal.pone.017889

This point is well noted, and we thank the reviewer for these recommendations. We have sought to clarify (ll. 509–517) that the increase in aesthetic response (which we have, perhaps overly simply, termed ‘pleasure’) is more precisely an increase of affect in a broad sense rather than one or another particular ‘aesthetic emotion’ like those surveyed by Schindler et al. In our case, we sought to test whether whatever ‘content’ feeling was elicited by a familiar sentence could be increased or intensified by the variations.

2. The authors indicate that their research is “informed” by Giora’s “optimal innovation hypothesis” (see especially, Giora et al. 2017).

a) One issue in Giora’s work is how to differentiate default literal, default metaphoric, and non-default metaphoric sentences so that metaphoric default salience can be assessed independently of metaphoric non-default salience. It is somewhat disconcerting, then, to learn that the authors did not include the default literal sentences in Study 1 because “we were only interested in metaphor comprehension” (l. 368). Giora is primarily interested in metaphor comprehension, too; so, the reason for ignoring differences between default literal and default metaphoric sentences should be more clearly articulated—and in terms she would understand.

We again thank the reviewer for their probing questions, and we have sought to explain (ll. 429–439) that, whereas for Giora et al. the ‘default’ meaning is primarily non-metaphoric with a variation that introduces a metaphoric/figurative component (one of her examples is ‘A piece of paper’, which becomes ‘A peace of paper’), in our case the ‘default’ meaning of the phrase was purely metaphoric. In ‘I run for office’, there is (according to studies surveyed in Holyoak and Stamenkovic 2017) little recourse to any literal running, whereas we hypothesise that in ‘I dash for office’ there may be a dual activation of both the metaphorical meaning (‘I seek an elected office’) and some aspect of a literal meaning (‘I run quickly to achieve something physical’). In this case the familiar metaphors provide an optimal control condition because they can be matched to the variations in terms of general meaning, figurativeness, and basic psycholinguistic properties (length, word frequency, etc.).

b) A related but separate issue is how to differentiate metaphoric sentences so that, independently of defaultness, metaphoric vehicles can be contrasted according to the extent to which they are “domain-specific” (l. 167). The authors hypothesize that excessive domain specificity may make a metaphor too difficult to be pleasurable. However, emphasizing domain-specificity may oversimplify the problem. For example, Katz and Al-Azary (2017) differentiate (a) the distance between the domain of the topic and the domain of the vehicle; (b) the semantic density of the topic and of the vehicle; and (c) the specificity of the topic or vehicle within their respective domains (“domain-specificity”?). The authors should be encouraged to compare the Katz and Al-Azary framework with their own. For example, perhaps they can clarify whether their use of word2vec (l. 267) converges with the Katz and Al-Azary version of computational semantics.

Katz, A. N., & Al-Azary, H. (2017). Principles that promote bidirectionality in verbal metaphor. Poetics Today, 38(1), 35–59. https://doi.org/10.1215/03335372-3716215

The reviewer’s suggestion to relate our work to that of Katz and Al-Azary is much appreciated and we have expanded our discussion regarding ‘domain-specificity’ to incorporate some of the ‘semantic density’ language offered by Katz and Al-Azary (ll. 199–208). We have specified that the word2vec norming does indeed converge with their computational methods for determining that density. This is also noted in the ‘Objective Measures’ subsection of the ‘Stimulus Development’ section (ll. 321–322).

c) Contextualizing their hypothesis within the Katz and Al-Azary (2017) framework may also enable the authors to clarify their rationale for including the Semantic Similarities Test (SST) as an individual differences measure of “the ability to identify conceptual mappings between words” (l. 181). The SST assesses “crystallized verbal intelligence” by evaluating a person’s ability to find similarities between two concepts. Was this measure expected to reflect (a) the capacity to identify similarities between vehicle and topic concepts even when they are “domain-specific”; (b) the capacity to identify similarities between vehicle and topic concepts even when they are from distant domains; or (c) the capacity to identify similarities between vehicle and topic concepts because they are (especially for some individuals) semantically dense? As it stands, the SST is not conceptually well coordinated with the authors’ research paradigm.

Further discussion of the Semantic Similarities Test and its hypothesised and tested role in affecting metaphor phrase processing ease/difficulty has been incorporated (ll. 224–233), building upon discussion of Katz and Al-Azary’s frameworks (see point above).

d) It is a bit unsettling that there is no reference to other literature indicating that aesthetic preferences are determined by processing fluency/difficulty (cf. Reber et al., 2004), including the tradition of proposed curvilinear relations between object complexity and interest/pleasure (e.g., Berlyne, 1971). Research in the latter tradition substantiates how difficult it is to assess curvilinear relations because the direction of the relationship differs at different levels of the variables. More to the point, it is not clear where to locate “optimal” and “excessive” metaphors on the hypothesized curvilinear relation with pleasure. That difficulty should at least be mentioned.

The reviewer’s suggestion to relate our work more explicitly to the history of proposed curvilinear relations between complexity and interest/pleasure is greatly appreciated. We have included an expanded discussion of this history (ll. 113–132), which leads into further discussion of the SST (ll. 216–233). There we note how our hypotheses contradict the predominantly linear hypotheses regarding processing fluency (e.g., Reber et al., 2004), and we indicate the difficulties in locating our stimulus categories along the hypothesized inverted U-shaped relationship, as also noted in response to a previous suggestion.

3. Perhaps the most innovative aspect of the authors’ design is their attempt to examine novel extensions of familiar metaphors. In this reviewer’s judgment, their procedures for doing so are promising. The procedures used to develop these extended metaphors led to at least two interesting results. First, optimal extensions and excessive extensions were rated as the “most figurative” (l. 358). Second, metaphor extensions were rated as more pleasurable than verb variants, indicating that “innovations that increase … richness can increase pleasure without decreasing difficulty.” These results are worth building on in future research efforts.

We thank the reviewer for this positive appraisal of our procedures and results, and their encouragement for future research efforts.

a) To explain these results, the authors emphasize “textual and situational context.” However, it may be more promising to examine specifically extended metaphors. The authors offer the following hypothesis: “The figurative meaning of the familiar metaphor is likely to be already active when the reader reaches the extension, which can more easily create the pleasurable tension between default and non-default interpretations.” The possibility of examining the interplay (and tension) between a familiar metaphor and a subsequent extension is within these authors’ methodological reach.

b) As their project unfolds, the authors may want to take advantage of Sullivan’s (2018) recent examination of mixed metaphors, some of which are subject to the domain specificity problem that makes them anathema in scholarly circles, but perhaps of particular interest to the authors.

Sullivan, K. (2018). Mixed metaphors: Their use and abuse. Bloomsbury Publishing.

We wish again to express our thanks to the reviewer for these two thoughtful suggestions for future areas of investigation.

4. A not-so-important note (l. 290): a 7-point rating scale is not a “Likert scale.” This common misuse of that phrase should not be repeated here.

We have removed the term ‘Likert’ as requested.

Reviewer 2

In this paper, the authors examine the “optimal innovation hypothesis”, which posits that language is most enjoyable when it evokes a non-default response, but also brings to mind the default response so that the two responses can be weighted against one another. The authors examined this hypothesis by carefully constructing a stimulus set of metaphors and obtaining subjective ratings of pleasurability for these metaphors.

First, the paper was well-written. It discussed an interesting theory within figurative language research. Also, I really enjoyed the writing style. I thought the examples used were interesting and made the paper exciting, and the authors also explained the theories and hypotheses clearly.

Second, I think the stimuli for this study are very nice, and I appreciate that the authors made them available on OSF. The authors carefully considered a variety of factors, including semantic distance measures and frequency. I also totally agree with the authors that there is an overemphasis on examining nominal metaphors, and I appreciate that they developed a set of verb metaphors and metaphor extensions and made these publicly available. I was very impressed by this aspect of the study.

That being said, I have a major concern with the analyses the authors conducted to examine their hypotheses (Section 3.2: Results and Discussion). I question the logic of these analyses for a couple of reasons.

1) First of all, the authors mention that there was no U-shaped curve observed in the data. However, to examine this, they split the familiar, optimal, and excessive metaphors into separate categories and examined these curves separately (Supplementary Figure). I think this defeats the purpose as this restricts the range of the data in each case. The whole point of the optimal and excessive categories was that they were supposed to increase difficulty, so it doesn’t make sense to me why difficulty would be examined within each of these categories separately. Wouldn’t it make more sense to either 1) look at the relationship between ease and pleasure collapsed across all the metaphors, or 2) compare the means between the categories directly (metaphors in the optimal category should have significantly higher pleasurability ratings than the familiar and excessive categories)? This criticism carries through to the first lme model (Model 1). Both the categories and the continuous ratings of ease were included in the model, but aren’t these essentially the same variable? Therefore, (if I understand lme correctly) if there is a significant effect of a category, this effect is independent of ease since ease is a variable in the model. However, the categories were specifically constructed to vary in ease, so the critical variable in the category is being controlled out of the category. Also, I didn’t quite get the logic of Model 2. It considered individual differences, but the relative differences of ease between the metaphors should still be preserved regardless of individual differences in figurative comprehension, so I’m also not sure how that tests the hypothesis.

In the discussion, the authors state “the optimal metaphor extensions were rated as intermediate in terms of ease of comprehension, but highest in terms of pleasure”. And also, “familiar metaphors were rated both easier and more pleasurable than ‘optimal’ verb variants, which were rated both easier and more pleasurable than ‘excessive’ verb variants”. But these comparisons were not tested statistically, and weren’t really discussed much in the Results. I think the lme models obscure these findings, which are probably the most interesting part of the data.

I think this is a fixable problem though. I would recommend just conducting some standard statistics on the data, namely, an ANOVA between the categories (not considering ease ratings) and a correlation between ease and pleasurability (independent of the categories). It might be beneficial to run some separate ANOVAs to compare both optimal vs. excessive, but also verb vs. extension, since there seemed to be a trend in extensions being more pleasurable. I think this would more directly test your hypotheses and would make the Results section better fit with the Discussion section.
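The category comparison recommended above can be sketched roughly as follows. This is a minimal illustration only: the ratings are hypothetical values invented for the example (not data from the study), the names `one_way_f` and `pleasure` are assumptions, and a plain between-groups one-way ANOVA ignores the repeated-measures structure (participants and items) of the actual design.

```python
from statistics import mean

def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a dict of name -> ratings."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    k = len(groups)          # number of categories
    n = len(all_values)      # total number of ratings
    # Variation of the category means around the grand mean
    ss_between = sum(len(values) * (mean(values) - grand_mean) ** 2
                     for values in groups.values())
    # Variation of individual ratings around their own category mean
    ss_within = sum((v - mean(values)) ** 2
                    for values in groups.values() for v in values)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical pleasure ratings on a 7-point scale (illustration only)
pleasure = {
    "familiar": [3.0, 4.0, 3.5, 4.5],
    "optimal": [5.0, 5.5, 4.5, 6.0],
    "excessive": [3.5, 3.0, 4.0, 2.5],
}

means = {name: mean(values) for name, values in pleasure.items()}
f_stat = one_way_f(pleasure)
print(means)
print(round(f_stat, 2))
```

In these invented numbers the optimal category has the highest mean, which is the pattern the optimal innovation hypothesis would predict; whether real ratings show it is an empirical question.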

We thank the reviewer for suggesting a simpler and more straightforward analysis strategy. Looking at the ease-pleasure relationship collapsed across qualitatively different types of metaphors would create the possibility of Simpson’s Paradox, so we have followed the second suggestion and now include simpler analyses of (i) Ease differences between conditions (to validate the experimental manipulation) and (ii) Pleasure differences between conditions (to test the critical hypotheses). We kept the analyses within the LME framework, though these simpler analyses are analogous to a repeated-measures ANOVA (LME allows more flexible specification of crossed random effects of participants and items). These analyses show monotonic differences in ease (Familiar > Optimal > Excessive) for both verb and extension manipulations. For pleasure ratings, there is a U-shape for extensions, but not for verbs.

Although the metaphor types differ in average ease of comprehension, there is substantial overlap and the range of ease values is very similar across the different types. So in addition to the simpler model suggested by the reviewer, we tested the more complex model that includes both variables (ease and metaphor type) – it allows testing whether the ease-pleasure relationship holds within categories and differs between categories.
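The Simpson's Paradox concern mentioned in this response can be illustrated with a small sketch. The (ease, pleasure) pairs below are hypothetical values chosen to make the point and are not taken from the study; `pearson` and `ratings` are names introduced only for this example.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical (ease, pleasure) pairs for two metaphor types (illustration only)
ratings = {
    "familiar": [(5.0, 2.0), (6.0, 2.5), (7.0, 3.0)],
    "extension": [(2.0, 5.0), (3.0, 5.5), (4.0, 6.0)],
}

# Correlation computed separately within each metaphor type
within = {
    name: pearson([e for e, _ in pairs], [p for _, p in pairs])
    for name, pairs in ratings.items()
}

# Correlation computed after collapsing across metaphor types
all_pairs = [pair for pairs in ratings.values() for pair in pairs]
pooled = pearson([e for e, _ in all_pairs], [p for _, p in all_pairs])

print(within)
print(pooled)
```

Here ease and pleasure rise together within each type, yet the pooled correlation is negative because the two types differ in their average ease and pleasure. A model that includes both ease and metaphor type keeps these two sources of variation apart.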

As one additional small point, I didn’t understand the NGram frequency graph on page 14. I think the axes need to be labelled or else the graph needs to be better explained in the caption.

Thank you for bringing this to our attention. We have elaborated the explanation in the Figure 1 caption.

Overall, I think this paper has a lot of potential. The stimuli were very carefully constructed, and the data seems quite interesting. However, in my opinion, I think the analyses were somewhat inappropriate for the reasons stated above.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Thomas Holtgraves

21 Jan 2022

PONE-D-21-32417R1

Difficulty and pleasure in the comprehension of verb-based metaphor sentences: a behavioral study

PLOS ONE

Dear Dr. Errington,

Thank you for submitting your revised manuscript to PLOS ONE.  I sent your revision to the two original reviewers and both believed your revisions to be quite responsive to their comments.  And I agree. However, they also raise a couple of final issues that I'd like you to address before your manuscript can be accepted for publication.  Both reviewers ask for additional clarification of Katz and Al-Azary's (2017) Semantic Neighborhood Density (SND) measure, and Reviewer 1 suggests providing some guidance for future research regarding the psychometric issues involved in assessing your construct.

Please submit your revised manuscript by Mar 07 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Thomas Holtgraves, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

********** 

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

********** 

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

********** 

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

********** 

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

********** 

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I indicated that all of the present reviewer's comments have been addressed, but I still have qualms about how effectively two of those comments were addressed. First, although it is not possible to undo the difficulties created by a psychometrically thin and conceptually weak articulation of "affectiveness" (l. 94), "meaningfulness" (l. 639) etc., the authors would do well to say something about the steps that might be taken in future efforts to provide psychometric substance to the construct they are trying to assess. References to Shklovsky and Miall are not enough (e.g., is "strikingness" the same as "affectiveness"? how would we know?). Very briefly, please say how this psychometric issue might be addressed in future studies. Second, it is not clear that the authors have precisely coordinated their use of word2vec with the DIFFERENTIATION that Katz and Al-Azary (2017) offer between (a) the distance between the vehicle and topic semantic neighborhoods and (b) the density of the separately considered vehicle and topic (l. 301). Is it possible to be more precise about how these two aspects of metaphoric semantic structures are related to the procedures they rely on?

Reviewer #2: The authors have addressed the points I brought up in the last review. I think the manuscript looks excellent and will be very interesting to the PLoS ONE readership.

There is one small point of clarification about Katz and Al-Azary's (2017) Semantic Neighborhood Density (SND) measure. It is not exactly synonymous to semantic distance where two terms are compared, but rather, it is a measure of the average distance from a single term to its own neighbors. So, for a metaphor like "ski for office", SND doesn't actually measure the overlap between ski and office. Rather, it measures the semantic density of "ski" on its own and "office" on its own. If the sentence was "ski downhill", the SND for "ski" is still the exact same. It may indirectly get at overlap -- if "ski" and "office" both have high SND, then they may have more overlapping near neighbors, but SND on its own is not a measure of this. Just wanted to clarify this because the way it is written in the manuscript, it sounds like SND is a measure of the overlapping semantic neighbors between two terms.

They explain Semantic Neighborhood Density in a little more detail in this paper:

Al-Azary, H., McAuley, T., Buchanan, L., & Katz, A. N. (2019). Semantic processing of metaphor: A case-study of deep dyslexia. Journal of Neurolinguistics, 51, 297-308.
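The distinction drawn in this comment, between the semantic distance separating two terms and the density of a single term's own neighborhood, can be sketched as follows. This is a simplified illustration under stated assumptions: the two-dimensional vectors are invented toy values, density is approximated here as the mean cosine similarity of a word to its `n` nearest neighbors, and the function names are hypothetical. It does not reproduce the published SND computation or the authors' word2vec procedure.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Hypothetical toy embeddings (illustration only, not real word2vec vectors)
vectors = {
    "ski": (1.0, 0.1),
    "slope": (0.9, 0.2),
    "snow": (0.8, 0.3),
    "office": (0.1, 1.0),
    "desk": (0.2, 0.9),
    "memo": (0.3, 0.8),
}

def pair_similarity(a, b):
    """Similarity between two specific terms, e.g. a verb and its noun."""
    return cosine(vectors[a], vectors[b])

def neighborhood_density(word, n=2):
    """Mean similarity of one word to its n nearest neighbors.

    Depends only on the word itself, not on the term it is paired with.
    """
    sims = sorted(
        (cosine(vectors[word], vec)
         for other, vec in vectors.items() if other != word),
        reverse=True,
    )
    return sum(sims[:n]) / n

print(pair_similarity("ski", "office"))
print(neighborhood_density("ski"))
print(neighborhood_density("office"))
```

In this toy space both "ski" and "office" sit in dense neighborhoods, yet the similarity between them is low, and the density of "ski" would be the same whatever it was paired with.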

Thank you very much for the opportunity to review this work!

********** 

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Feb 11;17(2):e0263781. doi: 10.1371/journal.pone.0263781.r004

Author response to Decision Letter 1


25 Jan 2022

Reviewer #1: I indicated that all of the present reviewer's comments have been addressed, but I still have qualms about how effectively two of those comments were addressed. First, although it is not possible to undo the difficulties created by a psychometrically thin and conceptually weak articulation of "affectiveness" (l. 94), "meaningfulness" (l. 639) etc., the authors would do well to say something about the steps that might be taken in future efforts to provide psychometric substance to the construct they are trying to assess. References to Shklovsky and Miall are not enough (e.g., is "strikingness" the same as "affectiveness"? how would we know?). Very briefly, please say how this psychometric issue might be addressed in future studies.

Reviewer 1’s suggestion that we explain how future experiments might refine our definition of ‘aesthetic pleasure’, which was, following Giora et al., perhaps overbroad, is much appreciated. In response, we have added a final note to our Discussion section indicating the possibility of using both Schindler et al.’s survey and Aesthetic Emotions Scale and Kuiken and Douglas’s Absorption-like States Questionnaire to clarify this definition and refine our questions for future experiments. See lines 747–765.

Second, it is not clear that the authors have precisely coordinated their use of word2vec with the DIFFERENTIATION that Katz and Al-Azary (2017) offer between (a) the distance between the vehicle and topic semantic neighborhoods and (b) the density of the separately considered vehicle and topic (l. 301). Is it possible to be more precise about how these two aspects of metaphoric semantic structures are related to the procedures they rely on?

Reviewer #2: The authors have addressed the points I brought up in the last review. I think the manuscript looks excellent and will be very interesting to the PLoS ONE readership.

There is one small point of clarification about Katz and Al-Azary's (2017) Semantic Neighborhood Density (SND) measure. It is not exactly synonymous to semantic distance where two terms are compared, but rather, it is a measure of the average distance from a single term to its own neighbors. So, for a metaphor like "ski for office", SND doesn't actually measure the overlap between ski and office. Rather, it measures the semantic density of "ski" on its own and "office" on its own. If the sentence was "ski downhill", the SND for "ski" is still the exact same. It may indirectly get at overlap -- if "ski" and "office" both have high SND, then they may have more overlapping near neighbors, but SND on its own is not a measure of this. Just wanted to clarify this because the way it is written in the manuscript, it sounds like SND is a measure of the overlapping semantic neighbors between two terms.

They explain Semantic Neighborhood Density in a little more detail in this paper:

Al-Azary, H., McAuley, T., Buchanan, L., & Katz, A. N. (2019). Semantic processing of metaphor: A case-study of deep dyslexia. Journal of Neurolinguistics, 51, 297-308.

Thank you very much for the opportunity to review this work!

We thank both reviewers for their requests and suggestions for additional clarification on the potential relevance of semantic neighborhood density (SND) to our study. We have removed our erroneous suggestion that our use of word2vec aligned with semantic density in our ‘Objective Measures’ subsection (ll. 320–322, formerly l. 301), as our word2vec measure was in fact designed to measure the distance between nouns and their interactive verbs within each condition, and between critical verbs across conditions.

A brief discussion of Al-Azary and Buchanan’s (2017) study of the influence of SND on novel metaphor comprehension is included in the introduction section (ll. 245–260). This replaces the less appropriate discussion of Katz and Al-Azary (2017) (formerly l. 188). Katz and Al-Azary’s examination of SND and the bidirectionality of metaphors seems, upon reflection, to be less directly relevant than Al-Azary and Buchanan’s (2017) examination of SND, concreteness, and metaphor comprehensibility.

A discussion of the possible influence of SND and difficulties of calculating and comparing the SND in our verb-metaphor stimulus set has been added to the Discussion section, lines 688–707.

Attachment

Submitted filename: Response to Reviewers2.docx

Decision Letter 2

Thomas Holtgraves

27 Jan 2022

Difficulty and pleasure in the comprehension of verb-based metaphor sentences: a behavioral study

PONE-D-21-32417R2

Dear Dr. Errington,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Thomas Holtgraves, Ph.D.

Academic Editor

PLOS ONE

Acceptance letter

Thomas Holtgraves

2 Feb 2022

PONE-D-21-32417R2

Difficulty and pleasure in the comprehension of verb-based metaphor sentences: A behavioral study

Dear Dr. Errington:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Thomas Holtgraves

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. In all panels, the points correspond to the behavioural data and the lines correspond to the model fits described in the main text.

    The key observation is that none of the panels suggest a U-shape in the behavioural data and the linear models appear to fit the data reasonably well. Left column shows results for familiar metaphors, middle column shows results for optimal verb and optimal extension conditions, right column shows results for excessive verb and excessive extension condition. Top row: relationship between ease of comprehension and pleasure (Model 1). Middle row: relationship between SST Score (semantic knowledge) and pleasure (Model 2). Bottom row: relationship between SST Score (semantic knowledge) and ease of comprehension (Model 3).

    (TIF)

    Attachment

    Submitted filename: Response to Reviewers.docx

    Attachment

    Submitted filename: Response to Reviewers2.docx

    Data Availability Statement

    All stimulus sets, study design information, and raw data are available from the project Open Science Framework database (https://osf.io/hjcyd/) DOI: (10.17605/OSF.IO/HJCYD).


    Articles from PLoS ONE are provided here courtesy of PLOS
