Statistical and methodological problems with concreteness and other semantic variables: A list memory experiment case study

Lewis Pollock

2017 Jul 13;50(3):1198–1216. doi: 10.3758/s13428-017-0938-y

PMCID: PMC5990559 PMID: 28707214

Abstract

The purpose of this article is to highlight problems with a range of semantic psycholinguistic variables (concreteness, imageability, individual modality norms, and emotional valence) and to provide a way of avoiding these problems. Focusing on concreteness, I show that for a large class of words in the Brysbaert, Warriner, and Kuperman (Behavior Research Methods 46: 904–911, 2013) concreteness norms, the mean concreteness values do not reflect the judgments that actual participants made. This problem applies to nearly every word in the middle of the concreteness scale. Using list memory experiments as a case study, I show that many of the “abstract” stimuli in concreteness experiments are not unequivocally abstract. Instead, they are simply those words about which participants tend to disagree. I report three replications of list memory experiments in which the contrast between concrete and abstract stimuli was maximized, so that the mean concreteness values were accurate reflections of participants’ judgments. The first two experiments did not produce a concreteness effect. After I introduced an additional control, the third experiment did produce a concreteness effect. The article closes with a discussion of the implications of these results, as well as a consideration of variables other than concreteness. The sensorimotor experience variables (imageability and individual modality norms) show the same distribution as concreteness. The distribution of emotional valence scores is healthier, but variability in ratings takes on a special significance for this measure because of how the scale is constructed. I recommend that researchers using these variables keep the standard deviations of the ratings of their stimuli as low as possible.

Keywords: Concreteness, Semantic variables, List memory, Methodology

Word concreteness has become one of the most studied variables in the psycholinguistic literature. Since Paivio, Yuille, and Madigan (1968) published one of the first large-scale databases of word concreteness norms, “concreteness effects” have emerged in a variety of investigations of various cognitive processes, and a range of theories have been proposed in an attempt to explain these effects. Independent teams of researchers operating over a period of decades have repeatedly shown that concrete words show a processing advantage over abstract words in certain experimental paradigms. For example, concrete words are easier to remember than abstract words (Allen & Hulme, 2006; Miller & Roodenrys, 2009; Romani, McAlpine, & Martin, 2008; Walker & Hulme, 1999), are easier to make associations with (de Groot, 1989), and are more easily and more thoroughly defined in dictionary definition tasks (Sadoski, Kealy, Goetz, & Paivio, 1997). Historically, it was claimed that concrete words are responded to more quickly than abstract words in lexical decision tasks (Bleasdale, 1987; James, 1975; Kroll & Merves, 1985), although more recent experiments have shown no difference (Brysbaert, Stevens, Mandera, & Keuleers, 2016), or even that abstract words might have an advantage after various other variables have been accounted for (Kousta, Vigliocco, Vinson, Andrews, & Del Campo, 2011). However, even an abstractness advantage in lexical decision points to the utility of word concreteness as a psycholinguistic variable.

Brain-imaging techniques have also been employed to determine whether the neural systems underpinning concrete words and abstract words are distinct (Binder, Westbury, McKiernan, Possing, & Medler, 2005; Dhond, Witzel, Dale, & Halgren, 2007; Kounios & Holcomb, 1994; Pexman, Hargreaves, Edwards, Henry, & Goodyear, 2007; Sabsevitz, Medler, Seidenberg, & Binder, 2005). The general consensus from these brain-imaging studies is that there is evidence of a neuroanatomical difference in the processing of concrete versus abstract words.

Psychologists are clearly heavily invested in the investigation of word concreteness, and for good reasons. If there are properties that define a cognitively relevant ontology of concepts, concreteness seems like a good candidate: Something about what constitutes the concept of “elephants” (highly concrete) is probably different from what constitutes the concept of “paradoxes” (highly abstract). However, in this article I will highlight a problem with the concreteness measure, based on a simple statistical summary of the Brysbaert, Warriner, and Kuperman (2013) concreteness norms. I report three replication experiments that together suggest that this problem is not fatal to concreteness research, but also that it should be acknowledged when researchers design their stimuli. I also show that the same problem applies to other variables in semantic databases, such as imageability (Cortese & Fugett, 2004; Schock, Cortese, & Khanna, 2012) and individual modality norms (Lynott & Connell, 2012).

Word concreteness

A word’s concreteness rating is derived by asking a group of participants to rate that word for concreteness on a Likert scale. A low rating indicates that a word is highly “abstract,” whereas a high rating indicates that a word is highly “concrete.” The mean value of all participants’ ratings is taken to be an approximation of a word’s position on an abstract–concrete continuum. I will now develop some theoretical concerns about the validity of traditional concreteness norms before turning to a statistical analysis of the Brysbaert et al. (2013) database.

Consider the job a participant is being asked to do when she is told to rate a word between, say, 1 and 5 on a scale of concreteness. She is told that “concrete words are experienced by the senses,” whereas abstract words are not (Paivio et al., 1968). For some words, the interpretation of traditional concreteness norming instructions is relatively straightforward. A participant who is presented with the word “apple” is likely to have seen, touched, smelled, and tasted apples throughout the course of her life, and will unproblematically assign “apple” a high concreteness rating. Similarly, a participant who is presented with the word “serendipity” is likely to reason that since serendipity is a loose association between some coincidental, nonspecified events, and is not something that affords direct sensory experience, the word “serendipity” should be assigned a low concreteness rating.

However, what are the properties that a word/concept should have in order for it to be assigned a mid-scale rating? It is difficult to formulate a coherent approach to this task: Can an entity or idea be “half-seen” or “half-touched”? What does it mean to have intermediate sensory experience of an entity or idea? That is to ask: What is a participant telling us about a word when they rate it a 3 out of 5? They could mean any one of the following:

Adding up all of my sensory experience of this object across all five of the sensory modalities, I realize that I have seen and heard it, but never touched, smelled, or tasted it. So I suppose I’ll rate it a 3.
One interpretation of this word brings to mind something that cannot be directly experienced, whereas a different interpretation of this word brings to mind something that can be directly experienced. So I suppose I’ll rate it a 3.
Sometimes I associate sensory experience with this word, but sometimes I don’t. So I suppose I’ll rate it a 3.

It is certainly possible to imagine more potential approaches, and there is no empirical basis for selecting one of these approaches over another. Furthermore, it is likely that different participants will generate different interpretations for many of the words in any list of words to be normed. When a participant sees the letter string <deed> presented in isolation, there is no way that a researcher can control for the fact that half of the participants may interpret <deed> as referring to a document associated with proof of property ownership (high concreteness value?), and the other half may interpret it as referring to some unspecified action, perhaps involving some element of heroism (low concreteness value?). Consequently, for a number of words it is just not clear what word/concept the mean concreteness rating is supposed to reflect.

This point on its own might be enough to motivate the avoidance of words with a mean value in the middle of a concreteness–abstractness scale. Given that it is not clear what it is that participants are even telling us when they rate a word a 3, we might also wonder how often participants actually use values from the middle of the concreteness scale when making their judgments. Recently, Brysbaert et al. (2013) provided a concreteness norm database of 40,000 English words, which dwarfs the previously popular MRC database used in most studies (Coltheart, 1981). This new, larger database allows a statistical analysis of the distributions of concreteness norms across a much larger section of the English lexicon. I now present this analysis and use it to develop the concerns raised in this section.

Brysbaert et al. (2013) concreteness norms

Brysbaert et al. (2013) collected a new set of concreteness norms for 40,000 English words. Groups of approximately 25 participants rated subsets of the whole list of 40,000 words on a concreteness scale of 1 (very abstract) to 5 (very concrete). The participants (n = 4,237) came from a range of ages, with approximately one third between 17 and 25 years old, and two thirds between 26 and 65. The mean value of a group of participants’ judgments about the concreteness of a stimulus word was assumed to be a useful approximation of that word’s position on a hypothesized concrete–abstract continuum. I shall now argue that this is not necessarily the case.

The standard deviation of a dataset measures how far, on average, the data points lie from the mean of the dataset. If every participant rates a word as a 1 (highly abstract), then that word’s concreteness rating will have a standard deviation of 0. However, if half of the participants rated a word as a 1, but the other half rated the word as a 5 (highly concrete), that word would have a mean concreteness rating of 3 but a standard deviation of 2. In Likert scale norming tasks, the standard deviation of a set of ratings is therefore a blunt index of the extent to which participants agreed with each other about how a word should be rated.
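The two cases described above can be checked numerically. The following is a minimal sketch, assuming the population formula for the standard deviation (dividing by n), which is the formula under which an even 1-versus-5 split gives exactly 2. The function name `summarize` and the group size of 24 (chosen so that an exact half split is possible) are illustrative assumptions, not details of the norming study.

```python
from statistics import mean, pstdev

def summarize(ratings):
    """Return (mean, population SD) for a list of integer Likert ratings."""
    return mean(ratings), pstdev(ratings)

# Hypothetical group of 24 raters who all choose 1 (highly abstract).
unanimous = [1] * 24

# Hypothetical group of 24 raters split evenly between 1 and 5.
even_split = [1] * 12 + [5] * 12

# With 25 raters an exact half split is impossible; 12 vs. 13 is the closest.
near_split = [1] * 12 + [5] * 13

for label, ratings in [("unanimous", unanimous),
                       ("even split", even_split),
                       ("12/13 split", near_split)]:
    m, sd = summarize(ratings)
    print(f"{label}: mean = {m:.2f}, SD = {sd:.4f}")
```

With the sample formula (dividing by n − 1, `statistics.stdev`), the split cases come out slightly above 2, so the exact ceiling depends on which formula a given norm set reports.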

If a dataset contains 25 numbers (in our case, 25 individual concreteness judgments), all of which are integers between 1 and 5, then there are a finite number of possible combinations of means and standard deviations for that dataset. Figure 1 below plots all of these possible combinations:

Fig. 1 — Theoretically possible locations for words rated between 1 and 5 by 25 different participants

Note how, at the extreme ends of the x-axis, only a standard deviation of 0 is possible, because for a mean value to be 1 or 5, all 25 participants must have rated a word as 1 or 5, respectively. However, in the middle of the scale the disagreement that is theoretically possible increases, reaching a peak at mean value ~3, standard deviation ~2. Crucially, it is still theoretically possible for a data point to occur with a mean value located in the middle of the scale, but with a relatively low standard deviation. That is, it is still clearly theoretically possible for participants to more or less consistently agree that a word is of intermediate concreteness.
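The set of possible combinations plotted in Fig. 1 can be reproduced by brute-force enumeration. The sketch below assumes exactly 25 raters, integer ratings from 1 to 5, and the population standard deviation; the function name `possible_mean_sd` is hypothetical. Because the order of the ratings does not affect the mean or the standard deviation, it is enough to enumerate multisets of ratings (23,751 of them) rather than all ordered sequences.

```python
from itertools import combinations_with_replacement
from math import sqrt

def possible_mean_sd(n_raters=25, scale=range(1, 6)):
    """Return the set of (mean, population SD) pairs reachable by n_raters
    integer ratings drawn from `scale`, each rounded to 4 decimals."""
    points = set()
    for ratings in combinations_with_replacement(scale, n_raters):
        m = sum(ratings) / n_raters
        var = sum((r - m) ** 2 for r in ratings) / n_raters
        points.add((round(m, 4), round(sqrt(var), 4)))
    return points

points = possible_mean_sd()

# At the extremes of the scale, only SD = 0 is possible.
sd_at_1 = {sd for m, sd in points if m == 1.0}
sd_at_5 = {sd for m, sd in points if m == 5.0}

# The largest possible SD sits next to the midpoint of the scale.
peak = max(points, key=lambda p: p[1])

# Full agreement on a midscale rating is also a possible outcome.
midscale_agreement_possible = (3.0, 0.0) in points

print(sd_at_1, sd_at_5, peak, midscale_agreement_possible)
```

The enumeration shows both features of the figure discussed above: the space narrows to a single point at each end of the scale, and low-disagreement points exist in the middle of the scale even though few real words occupy them.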

Now, consider Fig. 2, which plots the actual mean concreteness value and the standard deviation of every noun in the Brysbaert et al. (2013) concreteness norm dataset (n = 14,592) over the top of the theoretically possible combinations depicted in Fig. 1.

The pattern is striking. At the extreme concrete end of the scale, many items have high concreteness ratings and relatively low standard deviations, indicating that participants more or less agreed in their judgments about how to rate these words. At the extreme abstract end of the scale, there are likewise words with low concreteness ratings and relatively low standard deviations, although not to the same extent as at the extreme concrete end. However, in the middle of the scale there is an obvious rise in the standard deviation. Only a handful of words have a mean value near 3 and a standard deviation even slightly below 1. Indeed, a large class of words have a standard deviation well over 1, ranging from mean values of 1.5 to 4.5.

This indicates that for a great number of items, participants were not agreeing in their judgments of how concrete a stimulus word was. At mean values of 2 and 4 there are many cases of standard deviations above 1. Remember that ratings on this scale can only take integer values between 1 and 5. This means that for many of the words with a mean value of 2 or 4, some participants must have judged these words as belonging at the opposite end of the concreteness scale from the position where the mean value suggests the word belongs. This phenomenon is problematic for the assumption that concreteness should be treated as a continuous variable. This is because in a vast number of cases, participants’ judgments tended not to be continuous; instead, they tended to be binary: Participants were using values of 1, 2, 4, and 5 in producing these concreteness norms, and avoided using 3. Furthermore, in many cases some participants were judging a word as a 1 (totally abstract), whereas others were judging that same word as a 4 (somewhat concrete).
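The inference from "standard deviation above 1" to "ratings from the opposite half of the scale" can be verified by enumeration. The sketch below assumes 25 raters and the population standard deviation; `max_population_sd` is a hypothetical helper. It finds the largest standard deviation that can arise when raters use only a restricted set of scale values.

```python
from itertools import combinations_with_replacement
from math import sqrt

def max_population_sd(allowed, n_raters=25):
    """Largest population SD reachable when every rating is in `allowed`."""
    best = 0.0
    for ratings in combinations_with_replacement(allowed, n_raters):
        m = sum(ratings) / n_raters
        var = sum((r - m) ** 2 for r in ratings) / n_raters
        best = max(best, sqrt(var))
    return best

# Raters who never go above the midpoint (only 1, 2, or 3).
abstract_half = max_population_sd((1, 2, 3))

# Raters who never go below the midpoint (only 3, 4, or 5).
concrete_half = max_population_sd((3, 4, 5))

# Allowing a single value from the other half (4) lifts the ceiling.
with_crossing = max_population_sd((1, 2, 3, 4))

print(round(abstract_half, 4), round(concrete_half, 4), round(with_crossing, 4))
```

Under these assumptions the standard deviation cannot exceed 1 unless at least one rater uses a value from the other half of the scale. If a norm set reports the sample standard deviation instead, the ceiling for 25 raters is marginally higher (about 1.02), so values only just above 1 would not by themselves be conclusive.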

Given these methodological issues, it might seem surprising that concreteness effects are so widely reported. If measurements for a large section of the hypothesized concreteness spectrum are actually procedural artifacts, it is then unclear what phenomenon it is that concreteness effects are actually indexing. One potential explanation is that generally, when investigating the effect of a variable, researchers try to choose stimuli that maximize a change in this variable, in order to generate the maximum possible effect. It is therefore possible that empirical concreteness research might not suffer too badly from the problem of binary disagreements concerning midscale items, because researchers will have aimed to pick stimuli from the extreme ends of the scale, and these polar items are less subject to disagreement.

However, if it turns out that many experimental stimuli do suffer from the disagreement phenomenon, this poses an explanatory problem concerning the evidence in favor of processing differences between abstract and concrete items. The typical finding is that there are processing advantages for concrete items relative to abstract items, and the typical explanation of this finding is that concrete and abstract items have different neurologically instantiated formats and/or structural relationships. If a significant number of the stimuli included in an abstract or concrete experimental condition actually come from the middle of the concreteness scale, then the typical claim that there are processing differences between concrete and abstract items is no longer supported by the data. This is because, in the norms, words from the middle of the scale almost always have high standard deviations. In the extreme case, this means that roughly half of the participants who produced the concreteness measure for that word judged it to be abstract, and the other half judged it to be concrete. Therefore, there are no empirical grounds for calling these words “concrete” or “abstract” in the first place.

Stimuli in concreteness experiments: A case study of list memory paradigms

In this section I plot the stimuli featured in four list memory experimental studies against the entire Brysbaert et al. (2013) database. These studies are Allen and Hulme (2006), Walker and Hulme (1999), Romani et al. (2008), and Miller and Roodenrys (2009). We should note a few things. First, although the replication experiments that I report below feature noun stimuli, and most studies under discussion here also featured nouns, occasionally their stimulus sets featured other word classes alongside nouns. In the case of Allen and Hulme, many of the stimuli in the abstract condition were not nominal. Therefore, to display the maximum number of stimuli for all experiments, I have plotted the entire Brysbaert et al. (2013) database (n = 40,000) instead of just the nominal subsection of it. Not all of the stimuli featured in all experiments appeared in the Brysbaert et al. norms, and these stimuli have been omitted from the analysis. Second, the pattern of means and standard deviations is absolutely unchanged when we compare the entire Brysbaert et al. database with the noun subsection of it.

Now, consider Fig. 3. The stimuli featured in Romani et al. (2008) best exemplify the problem, although the intention here is not to single out Romani et al. or any of the other authors under discussion for criticism. The analysis I present here would have been almost impossible to carry out at the time that these experiments were conducted, given that the Brysbaert et al. concreteness database was only published in 2013. In brief, the problem is that the concrete words tend to have low standard deviations, whereas the abstract stimuli tend to have high standard deviations and to be drawn from the middle of the scale, rather than the unequivocally abstract part of the scale. This is potentially problematic for the validity of Romani et al.’s conclusions regarding concreteness effects, because many of the words that made up their abstract condition were not unequivocally abstract. For the standard deviations of many of the “abstract” stimuli to be as high as they are (in many cases, well above 1), many participants must have been judging those words to be concrete during the Brysbaert et al. (2013) norming process. Some of the abstract stimuli have standard deviations approaching the theoretical maximum of 2, indicating maximum disagreement among participants about whether that word is concrete or abstract.

To reiterate: Participants could only apply integer values in making their judgments. Therefore, if a word has a mean concreteness rating of approximately 2 but a standard deviation above 1, some participants must have been crossing scale halves in making their judgments. Ultimately, it is not clear what comparison is actually being made here. The concrete stimulus lists were more or less unproblematically concrete. However, the abstract stimulus lists contained words drawn from nearly the entire length of the concreteness scale, and also tended to feature words that participants disagreed about how to rate.

Figure 4 depicts the abstract and concrete stimuli featured in Allen and Hulme (2006). Again, many “abstract” stimuli here have standard deviations well above 1, indicating that people disagreed about whether the words were abstract in the first place. The range of mean concreteness ratings is also clearly much wider in the abstract condition than in the concrete condition. Once again, a relatively homogeneous group of concrete words has been compared to a heterogeneous group of words about which participants tended to disagree.

Figure 5 plots the stimuli featured in Miller and Roodenrys (2009). Again, there is a marked difference in standard deviations between the concrete and the abstract stimuli. Furthermore, the standard deviations of the abstract stimuli are so high (well above 1 in the majority of cases) that the mean value does not reflect the judgments that participants were actually making.

Fig. 5 — Miller and Roodenrys (2009) stimuli

Finally, consider Fig. 6, which depicts the stimuli featured in Walker and Hulme (1999). The midscale criticism applies least to this set of stimuli, although it is still clearly the case that the concrete stimuli tended to have lower standard deviations than the abstract stimuli. The reasons for this have already been expounded. The upshot is that a skeptic could reasonably argue that these experiments do not actually provide evidence for concreteness effects. The reason is that the comparison being made was meant to be between concrete and abstract items, but the comparison that was actually made was between concrete items, on the one hand, and a group of stimuli about which participants disagree, on the other. It could be the case that words that engender disagreement are those that are hard to remember, and that this explains processing differences that have previously been attributed to concreteness/abstractness. The experiments that I report below were designed to test this possibility.
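One way to guard against the confound described above is to screen candidate stimuli on both the mean rating and the standard deviation of the ratings. The following is a minimal sketch of such a screen, not the selection procedure used in the experiments reported here. The `Norm` record, the function `select_stimuli`, the cutoffs (`max_sd`, `abstract_max`, `concrete_min`), and the toy entries are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Norm:
    word: str
    mean: float  # mean concreteness rating (1 = abstract, 5 = concrete)
    sd: float    # standard deviation of the individual ratings

def select_stimuli(norms, max_sd=1.0, abstract_max=2.0, concrete_min=4.0):
    """Sort words into abstract and concrete pools, rejecting any word whose
    ratings disagree too much or whose mean falls in the middle of the scale.
    All thresholds are illustrative assumptions."""
    pools = {"abstract": [], "concrete": [], "rejected": []}
    for n in norms:
        if n.sd > max_sd:
            pools["rejected"].append(n.word)   # raters disagreed
        elif n.mean <= abstract_max:
            pools["abstract"].append(n.word)
        elif n.mean >= concrete_min:
            pools["concrete"].append(n.word)
        else:
            pools["rejected"].append(n.word)   # midscale mean
    return pools

# Made-up entries for illustration only; these are not values from any norm set.
toy_norms = [
    Norm("word_a", 4.8, 0.4),  # agreed concrete
    Norm("word_b", 1.5, 0.7),  # agreed abstract
    Norm("word_c", 2.1, 1.4),  # low mean but high disagreement
    Norm("word_d", 3.0, 0.6),  # agreed midscale
]

pools = select_stimuli(toy_norms)
print(pools)
```

Checking the standard deviation before the mean ensures that a word such as `word_c`, whose mean looks abstract only because raters disagreed, never enters the abstract pool.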

Before moving on to a report of these replication attempts, I wish to point out that list memory paradigms are not a special case when it comes to the properties of “abstract” stimuli. Table 1 presents a number of experimental concreteness studies from a wide variety of paradigms, as well as a summary of the concreteness values and standard deviations of the stimuli featured in their experiments. The abstract–midscale stimulus pattern applies to every single experiment.

Table 1. Concreteness statistics in various experimental paradigms

Article	Type of Data	Experimental Paradigm	Concrete: Mean Concreteness	Concrete: Mean SD	Abstract: Mean Concreteness	Abstract: Mean SD
Kroll & Merves (1985)	Behavioral	Lexical decision	4.55	0.74	2.17	1.22
de Groot (1989)	Behavioral	Word association	4.66	0.6	2.36	1.24
Paivio et al. (1994)	Behavioral	Recall	4.83	0.47	2.29	1.28
Gee et al. (1999)	Behavioral	Recall	4.73	0.57	3	1.33
Binder, Nelson, & Krawczyk (2005)	fMRI	Lexical decision	4.76	0.52	2.34	1.23
Crutch & Warrington (2005)	Patient population	Word matching	4.83	0.46	3.53	1.18
Sabsevitz et al. (2005)	fMRI	Semantic judgment	4.86	0.45	2.58	1.31
ter Doest & Semin (2005)	Behavioral	Recall	4.72	0.57	2.45	1.26
Lee & Federmeier (2008)	EEG	Semantic judgment	4.41	0.88	2.27	1.24
Huang et al. (2010)	EEG	Semantic judgment	3.82	1.17	2.53	1.21
Skipper-Kallal, Mirman, & Olson (2015)	fMRI	Deep thought	4.44	0.81	2.38	1.22
Jager & Cleland (2016)	Behavioral	Lexical decision	4.62	0.64	3.29	1.19
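The pattern in Table 1 can be summarized directly from the tabulated values. The sketch below transcribes the mean standard deviation columns from the table and checks, row by row, whether the abstract condition shows more rater disagreement than the concrete condition. The variable names are illustrative.

```python
# (study, mean SD of concrete stimuli, mean SD of abstract stimuli),
# transcribed from Table 1.
table1 = [
    ("Kroll & Merves (1985)", 0.74, 1.22),
    ("de Groot (1989)", 0.60, 1.24),
    ("Paivio et al. (1994)", 0.47, 1.28),
    ("Gee et al. (1999)", 0.57, 1.33),
    ("Binder, Nelson, & Krawczyk (2005)", 0.52, 1.23),
    ("Crutch & Warrington (2005)", 0.46, 1.18),
    ("Sabsevitz et al. (2005)", 0.45, 1.31),
    ("ter Doest & Semin (2005)", 0.57, 1.26),
    ("Lee & Federmeier (2008)", 0.88, 1.24),
    ("Huang et al. (2010)", 1.17, 1.21),
    ("Skipper-Kallal, Mirman, & Olson (2015)", 0.81, 1.22),
    ("Jager & Cleland (2016)", 0.64, 1.19),
]

# Does every study show more disagreement for abstract than concrete stimuli?
abstract_always_higher = all(abs_sd > con_sd for _, con_sd, abs_sd in table1)

# Is the abstract mean SD above 1 in every study?
abstract_always_above_1 = all(abs_sd > 1.0 for _, _, abs_sd in table1)

mean_concrete_sd = sum(con_sd for _, con_sd, _ in table1) / len(table1)
mean_abstract_sd = sum(abs_sd for _, _, abs_sd in table1) / len(table1)

print(abstract_always_higher, abstract_always_above_1)
print(f"average concrete SD = {mean_concrete_sd:.2f}, "
      f"average abstract SD = {mean_abstract_sd:.2f}")
```

The unweighted averages across the twelve studies are only a rough summary, since the studies differ in how many stimuli they used.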

Condition	Mean Concreteness	SD Concreteness	AoA	Zipf Frequency	L Phon	Length	N Syll
Concrete	4.38 (0.17)	1.02 (0.11)	10.45 (2.05)	3.34 (0.79)	5.59 (0.94)	6.93 (1.06)	2.00
Abstract	1.78 (0.14)	1.04 (0.12)	10.58 (2.09)	3.38 (0.83)	5.56 (0.93)	6.84 (1.17)	2.00
Agree	3.17 (0.7)	1.08 (0.07)	10.09 (1.9)	3.15 (0.85)	5.63 (1.03)	6.93 (1.21)	2.00
Disagree	3.1 (0.36)	1.65 (0.05)	10.23 (2.04)	3.13 (0.81)	5.76 (1.10)	6.9 (1.32)	2.00

Condition	Word 1	Word 2	Word 3	Word 4	Word 5	Word 6	Word 7	Word 8
Concrete	Beaker	Clinic	Tango	Clothing	Amber	Jackal	Roulette	Survey
Abstract	Desire	Mystique	Intent	Vantage	Glory	Nuance	Unease	Motive
Agree	Diesel	Roughhouse	Attempt	Whiner	Viewpoint	Freshness	Stampede	Leader
Disagree	Slipstream	Audit	Poorhouse	Minute	Rival	Tribune	Abyss	Spectrum

Condition	Mean Words Recalled (SD)	Mean Percentage Recalled
Concrete	4.67 (1.35)	58.4%
Abstract	4.48 (1.24)	56%
Disagree	4.38 (1.28)	54.6%
Agree	4.45 (1.35)	55.6%

Fixed Effects	Effect Estimate	Error	df	t	p	Lower 95%CI for Effect	Higher 95%CI for Effect
Abstract	–.19	.12	39.25	–1.56	.13	–.43	.05
Agree	–.22	.12	39.25	–1.79	.08	–.46	.03
Disagree	–.29	.12	39.25	–2.42	.02	–.54	–.05

Condition	Mean Concreteness	SD Concreteness	AoA	Zipf Frequency	L Phon	N Syll	Length	BG Mean
Concrete	4.51 (0.23)	0.91 (0.13)	9.92 (1.9)	3.54 (0.56)	4.75 (0.2)	1.75 (0.43)	6.125 (1.41)	3,573 (1,151)
Abstract	1.61 (0.17)	0.81 (0.11)	10.04 (1.64)	3.48 (0.69)	5.25 (1.44)	1.75 (0.43)	6.44 (1.5)	3,457 (1,176)
Disagreement	3 (0.23)	1.33 (0.02)	9.78 (1.95)	3.72 (0.78)	5.75 (1.48)	1.81 (0.39)	6.38 (1.45)	3,218 (957)

Condition	Mean Words Recalled (SD)	Mean Percentage Recalled
Concrete	3 (2.73)	18.6%
Abstract	3.43 (3.07)	21.5%
Disagree	3.05 (2.84)	19.1%

Condition	Mean Concreteness	SD Concreteness	AoA	Zipf Frequency	N Syll	Length	BG mean	Absolute Valence	Percent Known
Concrete	4.55 (0.17)	0.81 (0.12)	10.11 (1.28)	3.41 (0.48)	2.42 (0.86)	7.63 (1.79)	3,649 (1,134)	1.12 (0.77)	99%
Abstract	1.61 (0.15)	0.85 (0.11)	10.2 (1.95)	3.54 (0.72)	2.53 (0.89)	7.63 (1.95)	3,710 (1,208)	1.15 (0.78)	99%
Midscale	3.02 (0.26)	1.51 (0.77)	10.11 (1.99)	3.53 (0.72)	2.54 (0.86)	7.57 (1.89)	3,737 (1,184)	1.15 (0.77)	98.7%

Condition	Mean Words Recalled (SD)	Mean Percentage Recalled
Concrete	4.06 (1.31)	67.7%
Abstract	3.7 (1.25)	61.7%
Midscale	3.85 (1.28)	64.2%

Fixed Effects	Effect Estimate	Error	df	t	p	Lower 95% CI for Effect	Higher 95% CI for Effect
Abstract	–.37	.12	44.34	–3.11	.003	–.61	–.13
Midscale	–.21	.12	44.34	–1.79	.08	–.45	.03

List	condition	Word 1	Word 2	Word 3	Word 4	Word 5	Word 6	Word 7	Word 8
1	disagree	polling	dipstick	decade	centaur	exhaust	foreword	limbo	spender
2	disagree	physic	sequel	deacon	nettle	output	earshot	deadline	cackle
3	disagree	brethren	zenith	deluge	silence	lawsuit	theorist	polka	margin
4	disagree	nappy	degree	panic	bearings	legend	request	physics	prefect
5	disagree	sponsor	delta	dropper	phantom	egghead	rightness	aerial	eyesight
6	disagree	halter	brainwave	mankind	nightlife	surname	scrounger	tunic	omen
7	disagree	pariah	divorce	cosmos	sundries	purveyor	demon	crosswind	alias
8	disagree	grammar	conveyance	easement	blackball	woodland	giantess	weeknight	instant
9	disagree	tidbit	shallows	photon	plural	hallmark	grafting	sandman	nature
10	disagree	slipstream	audit	poorhouse	minute	rival	tribune	abyss	spectrum
11	agree	menace	bookie	tinting	flicker	rebound	squatter	tempo	pusher
12	agree	uprise	digest	tiling	region	charmer	joyride	outbreak	nutrient
13	agree	hubbub	matron	median	nuthouse	pullout	partner	distaste	refill
14	agree	burial	backwash	mover	career	event	footing	caper	peacetime
15	agree	jailbreak	torment	hazard	instinct	guru	downpour	richness	glucose
16	agree	bunting	rhythm	stalker	dullness	ascent	headache	gunpoint	welfare
17	agree	ringside	archduke	turmoil	shyness	posse	gangway	shipping	outreach
18	agree	sunburst	mishap	bumpkin	deceit	villain	bloodlust	misdeed	hunting
19	agree	diesel	roughhouse	attempt	whiner	viewpoint	freshness	stampede	leader
20	agree	semblance	havoc	broadside	dining	image	dissent	goner	culprit
21	abstract	setback	vagueness	spirit	notion	loyalty	esteem	phrasing	credence
22	abstract	charade	rapture	betrayal	logic	backlash	renown	letdown	affront
23	abstract	desire	mystique	intent	vantage	glory	nuance	unease	motive
24	abstract	amends	prestige	godsend	satire	leeway	wordplay	pretense	calmness
25	abstract	accord	whimsy	disdain	hardship	virtue	manner	regard	effect
26	abstract	freelance	mischief	respite	folly	pureness	repute	courage	meantime
27	abstract	merit	standpoint	future	allure	rapport	wisdom	prudence	insight
28	abstract	mistake	quantum	dogma	function	purpose	willpower	hearsay	meaning
29	abstract	patience	aspect	debut	fairness	pity	taboo	riddance	appeal
30	abstract	piety	finesse	foresight	longshot	loathing	stigma	concern	control
31	concrete	leaflet	roadhouse	artist	lighting	parsley	seabed	ironwork	lacrosse
32	concrete	clipper	pewter	cauldron	quarry	blockade	earwig	clubfoot	logbook
33	concrete	summit	breeches	abscess	foreman	award	entree	funnel	beacon
34	concrete	corset	template	pigment	fuchsia	urchin	ringworm	crewman	mansion
35	concrete	jester	gasket	sternum	backdrop	bouncer	chapel	resort	county
36	concrete	penthouse	fracture	entrails	vinyl	buckskin	tundra	barrier	plumbing
37	concrete	timepiece	methane	record	tiller	grindstone	merchant	shrapnel	duchess
38	concrete	quarter	bulkhead	sarong	tenant	chamber	canon	bailiff	machine
39	concrete	beaker	clinic	tango	clothing	amber	jackal	roulette	survey
40	concrete	spiral	marrow	billiard	bootlace	scabies	saffron	captain	product

Effect	Effect Estimate	Std. Error	z	p
Abstract	.19	.15	1.3	.2
Disagree	.02	.15	.15	.88

Experiment	Concrete	Abstract	Disagree	Agree
1	98.5%	98.3%	97.7%	98.5%
2	99.5%	99.1%	98%	N/A

Pair	Condition	Word 1	Word 2
1	concrete	cauldron	hike
2	concrete	footman	band
3	concrete	blazer	creature
4	concrete	rubble	liqueur
5	concrete	throttle	ulcer
6	concrete	ranch	gauntlet
7	concrete	cadet	concert
8	concrete	ledge	manor
9	abstract	betrayal	urge
10	abstract	revenge	foresight
11	abstract	godsend	risk
12	abstract	wisdom	psyche
13	abstract	hardship	malice
14	abstract	greed	riddance
15	abstract	loyalty	lenience
16	abstract	bliss	mercy
17	midscale	genius	royalty
18	midscale	foreground	district
19	midscale	gleam	patriot
20	midscale	view	approach
21	midscale	upstart	brawn
22	midscale	expanse	profit
23	midscale	asset	vortex
24	midscale	habit	encore

List Number	Condition	Word 1	Word 2	Word 3	Word 4	Word 5	Word 6
1	concrete	pad	harpoon	stretcher	kennel	ulcer	aftershave
2	concrete	trachea	parsley	fuselage	rifleman	plaster	medallion
3	concrete	cedar	rubble	trinket	composer	liver	dormitory
4	concrete	scale	shipment	gladiator	guesthouse	morgue	marrow
5	concrete	vineyard	porcelain	cocktail	warship	advisor	slate
6	concrete	supervisor	infirmary	bouquet	manicure	bay	tomb
7	concrete	graphics	sage	smoothie	wildfire	prosecutor	sapphire
8	concrete	inspector	minefield	tourist	stub	horseradish	frostbite
9	concrete	guitarist	notch	gauntlet	orphanage	vegetation	bomber
10	concrete	greenhouse	sedative	museum	silicon	wreckage	accountant
11	concrete	incubator	lavender	surgeon	violinist	courtroom	embroidery
12	concrete	landlord	measles	dictator	pacemaker	minibus	plumber
13	concrete	newsletter	bodyguard	stockbroker	foliage	petroleum	liqueur
14	concrete	plantation	attorney	blockade	antibiotic	concert	currency
15	concrete	stroke	titanium	bile	sniper	massage	adhesive
16	abstract	urge	renown	patience	motive	malice	quandary
17	abstract	penance	belief	indulgence	reproach	version	fixation
18	abstract	mercy	glory	charade	aptitude	manner	formality
19	abstract	risk	psyche	rhetoric	foresight	fraud	regard
20	abstract	prudence	oblivion	hardship	mood	sarcasm	fate
21	abstract	extent	imposition	purpose	competence	luck	whim
22	abstract	willpower	bias	indecision	loyalty	seriousness	knowledge
23	abstract	involvement	existence	coincidence	ruse	principles	betrayal
24	abstract	detriment	subtlety	tradition	damnation	wisdom	fantasy
25	abstract	forgiveness	semantics	value	sanctity	godsend	discretion
26	abstract	eternity	politeness	concept	reasoning	anomaly	symbolism
27	abstract	suspicion	goodness	arrogance	mortality	chance	theory
28	abstract	precedent	privacy	likelihood	lunacy	oversight	revenge
29	abstract	affirmative	repentance	leniency	similarity	merit	expertise
30	abstract	wickedness	analogy	bliss	coercion	courage	avoidance
31	midscale	plot	molecule	mankind	format	swindle	motherland
32	midscale	hormone	reply	tarot	tribune	routine	pushover
33	midscale	delay	gossip	slumber	bandwagon	response	vigilante
34	midscale	zone	shallows	pinnacle	wavelength	grief	degree
35	midscale	envoy	character	fallout	clue	vacancy	tone
36	midscale	circulation	drunkenness	midsummer	doctorate	goal	hoax
37	midscale	cutthroat	rift	corporation	lawsuit	translation	sweetness
38	midscale	announcement	activist	process	slack	formation	whiplash
39	midscale	chronicle	monologue	overlap	motherhood	virus	penalty
40	midscale	exhaustion	delegate	magic	rebuttal	crackpot	diversion
