Psychonomic Bulletin & Review. 2022 Jan 7;29(3):866–881. doi: 10.3758/s13423-021-02044-2

Match me if you can: Evidence for a domain-general visual comparison ability

Bethany Growns, James D. Dunn, Erwin J. A. T. Mattijssen, Adele Quigley-McBride, Alice Towler

Abstract

Visual comparison—comparing visual stimuli (e.g., fingerprints) side by side and determining whether they originate from the same or different source (i.e., “match”)—is a complex discrimination task involving many cognitive and perceptual processes. Despite the real-world consequences of this task, which is often conducted by forensic scientists, little is understood about the psychological processes underpinning this ability. There are substantial individual differences in visual comparison accuracy amongst both professionals and novices. The source of this variation is unknown, but may reflect a domain-general and naturally varying perceptual ability. Here, we investigate this by comparing individual differences (N = 248 across two studies) in four visual comparison domains: faces, fingerprints, firearms, and artificial prints. Accuracy on all comparison tasks was significantly correlated and accounted for a substantial portion of variance (e.g., 42% in Exp. 1) in performance across all tasks. Importantly, this relationship cannot be attributed to participants’ intrinsic motivation or skill in other visual-perceptual tasks (visual search and visual statistical learning). This paper provides novel evidence of a reliable, domain-general visual comparison ability.

Keywords: Individual differences, Visual comparison, Perceptual expertise, Forensic science


People complete many complex visual tasks in their day-to-day life. One such task is visual comparison—comparing visual stimuli shown side by side and judging whether they originate from the same or different source (i.e., “match”). This complex task involves many cognitive and perceptual processes—including visual perception, memory, similarity judgements, categorization, and decision-making (Busey & Dror, 2011; Growns & Martire, 2020b)—and is used in important real-world judgements. For example, forensic science examiners in feature-comparison disciplines “match” evidence samples (e.g., firearms, faces, fingerprints) to provide judgments about the source of the evidence to investigators or in court (Towler et al., 2018). Critically, it is human decision-makers who complete this task with limited input from technology (Thompson et al., 2013; Towler, Kemp, & White, 2017)—making it vital to understand how individuals perform these tasks. Yet research is only beginning to explore human performance in visual comparison.

Professional examiners typically outperform novices on tasks within their domain of experience: facial examiners outperform novices on facial comparison (Phillips et al., 2018; Towler, White, & Kemp, 2017; White, Phillips, et al., 2015; White et al., 2020); fingerprint examiners outperform novices on fingerprint comparison (Busey & Vanderkolk, 2005; Tangen et al., 2011; Ulery et al., 2011); firearm examiners have a higher rate of correct matches than do standard computer algorithms in firearm comparison (Mattijssen et al., 2021); and document examiners are better at avoiding the errors that novices make in handwriting comparison (Bird, Found, Ballantyne, & Rogers, 2010; Bird, Found, & Rogers, 2010; Kam et al., 1997). This superior visual comparison performance is typically attributed to the acquisition of domain-specific knowledge within an examiner’s domain of expertise—that is, examiners’ skill is attributed to their training and experience. For example, fingerprint and document examiners have better knowledge of statistical frequencies in forensic stimuli within their domain of expertise (Growns et al., 2021; Martire et al., 2018; Mattijssen et al., 2020), but not outside their domain (Growns & Martire, 2020a, 2020b). Further, fingerprint examiners also outperform novices in visual search tasks with fingerprints, but do not outperform novices in the same task with nonfingerprint stimuli (Searston & Tangen, 2017a). The domain-specific nature of examiners’ skill is perhaps unsurprising given that cognitive psychology typically attributes superior performance and expertise to deliberate practice and experience engaging in a task (Charness et al., 2005; Ericsson, 2007, 2014).

Yet something else may be at play in accurate visual comparison performance beyond simply experience or deliberate practice—something that is hinted at by individual differences in this task. While forensic examiners outperform novices as a group, there is substantial variation in visual comparison accuracy even among professionals with equivalent training and experience (Busey & Vanderkolk, 2005; Mattijssen et al., 2020; Phillips et al., 2018; Searston & Tangen, 2017b). Further, facial examiners’ accuracy does not increase with their length of employment (White, Dunn, et al., 2015), and individual differences in fingerprint trainees’ skills are maintained even after 12 months of training (Searston & Tangen, 2017b). This variation in visual comparison ability suggests other factors may contribute to accurate performance beyond experience, deliberate practice, or training.

Recent evidence suggests individual differences in visual comparison could also be driven, at least in part, by a domain-general comparison ability. People with superior face recognition skills—that is, the ability to identify faces (“super-recognizers”; Noyes et al., 2017; Russell et al., 2009)—also score above average on primate-face and fingerprint-comparison tasks (Towler, Dunn, et al., 2021a). Further, fingerprint examiners not only outperform novices in fingerprint comparison (i.e., domain-specific; Busey & Vanderkolk, 2005; Tangen et al., 2011), but also on face-comparison tasks (i.e., domain-general; Phillips et al., 2018). Together, this emerging evidence suggests visual comparison may be driven by both a domain-specific skill and a natural domain-general visual comparison skill.

Overall, this converging evidence provides a first hint that there may be a generalizable visual comparison ability in specialist populations. However, no research has investigated this in the general population to determine whether it is a domain-general and naturally varying ability. Similar variable and domain-general abilities have been identified in other perceptual processes, such as visual recognition—the ability to identify visual objects. This ability is typically seen as a generalizable psychological process with substantial natural individual variation. For example, people who are better at recognizing some visual objects (e.g., faces) are also better at recognizing other visual objects (e.g., cars; Geskin & Behrmann, 2018; Richler et al., 2019). However, can the same be said of visual comparison? Does someone’s ability to “match” visual stimuli in one domain (e.g., faces) predict comparison performance in other domains (e.g., fingerprints)?

The current paper presents two experiments that are the first to explore whether there is a generalizable and domain-general psychological ability underpinning the ability to compare different complex visual stimuli, or whether these require separate skills. We explore individual differences in four visual comparison tasks to investigate the overlap or independence of performance in each task: face comparison, fingerprint comparison, firearms comparison, and a novel artificial print comparison task. Importantly, these tasks vary in familiarity—from familiar (faces) to unfamiliar (fingerprints and firearms) to entirely novel (artificial prints)—to ensure that accurate performance cannot be attributed to prior experience. If there is a generalizable ability underpinning visual comparison performance, we would expect performance in all comparison tasks to account for a substantial portion of shared variance across tasks. Conversely, if these are separate processes, we would expect performance in each comparison task to account for largely independent portions of variance. We also explore two alternative hypotheses: that individual differences in visual comparison accuracy are driven by intrinsic motivation as high performance could be determined by someone’s motivation to succeed (Experiment 1); or that individual differences in accuracy are driven by a broader visual-perceptual skill (Experiment 2). To examine this, participants in Experiment 1 also completed a measure of intrinsic motivation (the Intrinsic Motivation Inventory; McAuley et al., 1989; Tsigilis & Theodosiou, 2003) to determine whether any overlapping visual comparison ability is predicted by individual differences in motivation. In Experiment 2, participants also completed two other noncomparison visual-perceptual tasks (visual search and visual statistical learning) to determine whether the shared variance can be linked to a broader visual-perceptual ability.

Experiment 1

Method

Design

We used a within-subjects design where participants completed four comparison tasks (described below; see Fig. 1) and a measure of intrinsic motivation (the Intrinsic Motivation Inventory; McAuley et al., 1989; Tsigilis & Theodosiou, 2003) as a discriminant validity measure. The study preregistration, data, and analysis scripts can be found at https://osf.io/bvzpd/. Images used in this study are available upon request.

Fig. 1

Example “match” trials for each comparison task (face: upper-left panel; fingerprint: middle-left panel; potato print: lower-left panel; firearms: right panel)

Participants

We recruited 124 participants online via Prolific Academic based on an a priori power analysis for detecting a two-sided correlation (r = .3) with 90% power (including an additional 10% to account for attrition). To be eligible for the study, participants were required to have normal or corrected-to-normal vision, live in the United States, have a Prolific approval rating of at least 95%, and have completed the experiment on a tablet or computer (not a mobile phone). No participants were excluded from the final sample, as all met our preregistered inclusion criterion of responding correctly to at least three of the five attention checks (5.65% passed just four of the attention checks; 94.35% passed all five).
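The sample-size calculation can be sketched in R (the language used for the analyses reported below). The target effect size (r = .3), power (90%), and 10% attrition buffer come from the text; the two-sided α of .05 and the use of the pwr package are assumptions for illustration.

```r
# Sketch of the a priori power analysis: sample size for detecting r = .3
# with 90% power in a two-sided test (alpha = .05 is assumed, not stated).
library(pwr)

n_required <- pwr.r.test(r = 0.3, power = 0.90, sig.level = 0.05,
                         alternative = "two.sided")$n
# Add ~10% to allow for attrition, then round up
ceiling(n_required * 1.10)  # approximately 124 participants
```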

Participants were 32.1 years old on average (SD = 12.1, range: 18–73), and the majority (62.9%) self-identified as female (36.3% male; 0.8% gender diverse) and White (64.5%; 13.7% Asian, 8.9% Black, 6.5% Hispanic, 6.5% Biracial, 0.81% Indian). Each participant was compensated USD$5.96 for completing the 50-minute experiment.

Tasks

Participants completed each of the four comparison tasks below. We selected two existing face and fingerprint comparison tasks (with minor modifications; Burton et al., 2010; Tangen et al., 2011), and created two additional novel comparison tasks: a firearms comparison task using cartridge cases, and a novel artificial-print comparison task. Pilot testing ensured each novel test’s internal reliability and consistency were suitable for the assessment of individual differences (Siegelman et al., 2017; see Supplementary Materials on OSF). Where Cronbach’s α fell below recommended values for standardized tests (α > .8; Streiner, 2003a, 2003b) in our piloted tasks, we removed selected trials until α was ≥ .8.
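A rough illustration of this trial-screening step is given below, using the alpha function from the psych package on a participants-by-trials accuracy matrix. The data layout and the greedy dropping rule (remove whichever trial most improves α) are assumptions for illustration; the text states only that selected trials were removed until α reached .8.

```r
# Illustrative trial screening: 'trial_accuracy' is a hypothetical data frame with
# one row per participant and one column per trial (1 = correct, 0 = incorrect).
library(psych)

screen_trials <- function(trial_accuracy, target_alpha = 0.8) {
  repeat {
    fit <- psych::alpha(trial_accuracy, check.keys = FALSE)
    if (fit$total$raw_alpha >= target_alpha || ncol(trial_accuracy) <= 2) break
    # alpha.drop reports alpha if each item is dropped; remove the trial whose
    # removal yields the largest improvement (an assumed rule)
    worst_trial <- which.max(fit$alpha.drop$raw_alpha)
    trial_accuracy <- trial_accuracy[, -worst_trial, drop = FALSE]
  }
  trial_accuracy
}
```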

Face comparison

Participants completed 40 face comparison trials (20 match and 20 nonmatch) from the Glasgow Face-Matching Task (GFMT-short; Burton et al., 2010; see upper-left panel of Fig. 1). The GFMT is a standardized face comparison task (Burton et al., 2010). Participants viewed two faces side by side and were asked, “Are these images of the same person or two different people?” on each trial. They responded by selecting one of two buttons (“same” or “different”) at the bottom of the screen.

Fingerprint comparison

Participants completed 56 fingerprint comparison trials (32 match and 32 nonmatch; see middle-left panel of Fig. 1) from the Fingerprint Matching Test from Tangen et al. (2011; Thompson & Tangen, 2014). Participants viewed two fingerprints side by side and were asked, “Are these fingerprints from the same person or two different people?” on each trial. They responded by selecting one of two buttons (“same” or “different”) at the bottom of the screen.

Firearms comparison

Participants completed 98 firearms comparison trials (49 match and 49 nonmatch; no trials were removed after pilot testing as α was ≥ .8) that were created for this experiment (see right panel of Fig. 1). Participants viewed two cartridge cases side by side and were asked, “Are these cartridge cases from the same firearm or two different firearms?” on each trial. They responded by selecting one of two buttons (“same” or “different”) at the bottom of the screen.

Artificial-print comparison

Participants completed 94 artificial-print comparison trials (47 match and 47 nonmatch; after excluding 10 trials based on pilot testing so that α ≥ .8) that were created for this experiment (see lower-left panel of Fig. 1). Artificial prints were created by carving the same basic pattern (four vertical lines and two diagonal intersecting lines inside a standardized circle) into potato halves. We then inked and stamped each half onto cardboard, dried the stamps, then scanned and digitized all prints.

Participants viewed two artificial prints side by side and were asked, “Are these prints from the same stamping tool or two different stamping tools?” on each trial. They responded by selecting one of two buttons (“same” or “different”) at the bottom of the screen.

Intrinsic motivation inventory

Participants completed a measure of their intrinsic motivation and subjective experience during the experiment: the Intrinsic Motivation Inventory (McAuley et al., 1989). The Intrinsic Motivation Inventory is a validated measure of intrinsic motivation as it has acceptable reliability and stability (McAuley et al., 1989; Tsigilis & Theodosiou, 2003) and has been used across multiple domains—from education to mental health research (Choi et al., 2010; Leng et al., 2010; Monteiro et al., 2015).

Participants completed three subscales of the inventory: the Effort, Enjoyment, and Perceived Competence subscales. They answered questions on a 7-point Likert scale from not at all true to very true. They answered questions such as, “I put a lot of effort into this” (effort subscale); “I enjoyed doing this activity very much” (enjoyment subscale); and “I am satisfied with my performance in this task” (perceived competence subscale). A full list of the questions can be found at https://selfdeterminationtheory.org/intrinsic-motivation-inventory/.

Dependent measures

Comparison performance in each task was computed using the signal-detection measure of sensitivity (d'; Phillips et al., 2001; Stanislaw & Todorov, 1999). Higher d' values indicate higher sensitivity to the presence of a target stimulus independent of any tendency to respond “same” or “different” (response bias), and higher values are typically interpreted as higher “accuracy” in a task. We also calculated participants’ criterion (C)—a measure of the tendency to respond “same” or “different”—in each task; these analyses can be found in the Supplementary Materials on OSF (https://osf.io/bvzpd/).
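For concreteness, a minimal sketch of these signal-detection computations is given below, following the standard formulas d' = z(hit rate) − z(false-alarm rate) and C = −[z(hit rate) + z(false-alarm rate)]/2 (Stanislaw & Todorov, 1999), where hits are “same” responses on match trials and false alarms are “same” responses on nonmatch trials. The log-linear correction for hit or false-alarm rates of exactly 0 or 1 is an assumption; the text does not state how extreme rates were handled.

```r
# Sensitivity (d') and criterion (C) per participant.
# n_hits: "same" responses on match trials; n_fa: "same" responses on nonmatch trials.
sdt_measures <- function(n_hits, n_match, n_fa, n_nonmatch) {
  hit_rate <- (n_hits + 0.5) / (n_match + 1)    # log-linear correction (assumed)
  fa_rate  <- (n_fa + 0.5) / (n_nonmatch + 1)
  d_prime   <- qnorm(hit_rate) - qnorm(fa_rate)          # sensitivity
  criterion <- -(qnorm(hit_rate) + qnorm(fa_rate)) / 2   # response bias
  c(d_prime = d_prime, criterion = criterion)
}

sdt_measures(n_hits = 17, n_match = 20, n_fa = 4, n_nonmatch = 20)
```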

Intrinsic motivation scores were calculated by averaging participants’ Likert-scale responses on the Effort, Enjoyment, and Perceived Competence inventory subscales (including the reverse-scored items).

Procedure

Participants completed the experiment via an online survey platform, Qualtrics (https://www.qualtrics.com/). Participants completed all four comparison tasks in a randomized order, and all trials within each comparison task in a pseudo-randomized order (a single trial order was randomly generated for each task when the experiment was coded, and all participants completed the trials in that order) to minimize error variance (Mollon et al., 2017). At the beginning of each comparison task, participants received brief task instructions and completed two practice trials (one match and one nonmatch) where they were given corrective feedback. Upon completion of the comparison tasks, participants completed the three subscales of the Intrinsic Motivation Inventory, provided demographic information, and then viewed a debriefing statement.

Results and discussion

Descriptive results and psychometrics

The descriptive statistics and psychometric properties of all five tasks are presented in Table 1. Sensitivity was significantly above chance (i.e., above 0) on all four comparison tasks—face: t(123) = 27.23, p < .001; fingerprint: t(123) = 19.90, p < .001; firearms: t(123) = 33.43, p < .001; artificial-print: t(123) = 21.56, p < .001. Psychometric properties for all five measures were close to or above recommended values for standardized tests on a typical measure of scale reliability (see Table 1; Cronbach's α > .8; Streiner, 2003a, 2003b), except for the fingerprint comparison task, which fell below the typically recommended value (α = .61).

Table 1.

Descriptive statistics for each task (standard deviations in parentheses)

Mean task performance α Skewness Kurtosis
Face comparison 2.21 (.90) .75 −.10 2.59
Fingerprint comparison 1.06 (.59) .61 .11 3.04
Firearms comparison 2.90 (.97) .92 −1.10 3.80
Artificial-print comparison 1.21 (.63) .82 −.01 3.28
Intrinsic motivation 4.88 (1.06) .94 .16 2.40

Task performance for face, fingerprint, firearms, and artificial prints is shown in d', while intrinsic motivation is the mean response rating. Cronbach’s alpha was calculated on raw accuracy scores per participant (not d' scores)

Correlations between comparison performance

To investigate the relationships between task performance, we calculated Pearson’s correlations between sensitivity on all comparison tasks. Sensitivity on all comparison tasks was significantly and positively correlated with one another (see Fig. 2, and Table 5 in the Appendix for detailed statistics).1 We also calculated Bayes factors to examine the likelihood of the observed data under the alternative hypothesis (i.e., the presence of correlations) compared with the null hypothesis (i.e., the absence of correlations) using the BayesFactor package in R (Morey et al., 2018). We observed a Bayes factor >10 supporting the alternative hypothesis for four of the six comparison sensitivity correlations, providing strong evidence for the observed positive correlations (Wetzels et al., 2011). Smaller Bayes factors were observed for the remaining two sensitivity correlations (face and fingerprint: BF = 1.47, face and firearms: BF = 2.26)—providing weaker support for the presence of correlations between these tasks (see Table 5 in the Appendix).
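These pairwise analyses can be reproduced along the following lines, pairing cor.test with correlationBF from the BayesFactor package cited above. The data frame scores (one column of d' values per task) is a hypothetical layout, and the default prior scale of correlationBF is assumed.

```r
# Pairwise Pearson correlations and Bayes factors between comparison tasks.
# 'scores' is a hypothetical data frame with one d' column per task,
# e.g., face, fingerprint, firearms, artificial.
library(BayesFactor)

task_pairs <- combn(names(scores), 2, simplify = FALSE)
results <- lapply(task_pairs, function(pair) {
  x <- scores[[pair[1]]]
  y <- scores[[pair[2]]]
  freq <- cor.test(x, y, method = "pearson")     # r and p value
  bf   <- extractBF(correlationBF(x, y))$bf      # BF10, default prior scale
  data.frame(task_1 = pair[1], task_2 = pair[2],
             r = unname(freq$estimate), p = freq$p.value, BF = bf)
})
do.call(rbind, results)
```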

Fig. 2

Pearson correlations between task performance in Experiment 1

Table 5.

Correlations for performance between tasks in Experiment 1 (Pearson correlations with p values in parentheses, followed by Bayes factors)

| | Face comparison | Fingerprint comparison | Firearms comparison | Artificial-print comparison |
| Fingerprint comparison | .182 (.043), BF = 1.47 | | | |
| Firearms comparison | .201 (.026), BF = 2.26 | .334 (<.001), BF = 202.73 | | |
| Artificial-print comparison | .418 (<.001), BF = 1.50e4 | .530 (<.001), BF = 4.47e7 | .470 (<.001), BF = 4.30e5 | |
| Intrinsic motivation | −.066 (.468), BF = .27 | .083 (.360), BF = .31 | −.023 (.801), BF = .21 | −.007 (.937), BF = .21 |

Correlations between comparison performance and intrinsic motivation

To investigate the relationship between each comparison task and intrinsic motivation, we calculated Pearson’s correlations between intrinsic motivation and sensitivity in each comparison task. Importantly, intrinsic motivation did not significantly correlate with sensitivity on any comparison task (see Table 5 in the Appendix, and Fig. 3). We observed a Bayes factor of less than or close to .3 for all correlations between intrinsic motivation and sensitivity in each comparison task, which provides substantial evidence for the absence of correlations (Wetzels et al., 2011).

Fig. 3

Two discriminant validity tasks used in Experiment 2: Visual search (left panel) and visual statistical learning (right panel)

Principal component analysis (PCA)

We explored the shared and unshared variance in sensitivity values across the four comparison tasks and intrinsic motivation scores with a Principal Component Analysis (PCA) using the prcomp function from the core stats package in R. Rotation was not conducted in the PCA. The loadings of all tasks on the five components and the proportion of variance explained by each component can be seen in Table 2.
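A minimal sketch of this analysis is shown below. Standardizing the measures (center = TRUE, scale. = TRUE) is an assumption made here because d' scores and mean Likert ratings are on different scales; the unrotated solution matches the analysis described above.

```r
# Unrotated PCA over the four comparison tasks and intrinsic motivation.
# 'scores' is a hypothetical data frame with one column per measure.
pca_fit <- prcomp(scores, center = TRUE, scale. = TRUE)

pca_fit$rotation   # loadings of each measure on each component (cf. Table 2)
summary(pca_fit)   # proportion of variance explained by each component
```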

Table 2.

Results of the principal components analysis (loadings matrix and percentage of variance explained)

Component 1 Component 2 Component 3 Component 4 Component 5
Face comparison .40 −.32 .80 .08 −.32
Fingerprint comparison .50 .27 −.26 −.64 −.45
Firearms comparison .48 −.00 −.42 .73 −.26
Artificial-print comparison .59 −.01 .02 −.10 .79
Intrinsic motivation <.01 .91 .36 .21 .02
Variance explained 41.99% 20.95% 16.37% 13.10% 7.60%

Component 1 explained a substantial portion of the variance across all five tasks (41.99%), and sensitivity on all four comparison tasks loaded strongly onto this component, but intrinsic motivation did not. This suggests that this component represents a generalizable comparison ability unrelated to intrinsic motivation. Component 2 explained an important portion of the variance across all tasks (20.95%) and intrinsic motivation scores loaded strongly onto this component, with sensitivity in each comparison task loading weakly or not at all. This suggests that intrinsic motivation represents variance that is separate from, and unshared with, performance on all comparison tasks.

Components 3–5 also explained an important portion of the variance across all tasks (37.07% in total; individual components ranging from 16.37% to 7.60%). Face comparison sensitivity loaded strongly onto Component 3 alone, which explained the next greatest portion of variance (16.37%); fingerprint and firearms comparison sensitivity loaded strongly onto Component 4 (positively for firearms comparison and negatively for fingerprint comparison; explaining 13.10% of variance); and artificial-print comparison sensitivity loaded strongly onto Component 5 alone (explaining 7.60% of variance). Overall, these results suggest that sensitivity on all comparison tasks reflects a mixture of shared (Component 1) and nonshared variance (Components 3, 4, and 5), whilst intrinsic motivation scores reflect separate nonshared variance (Component 2).

Experiment 1 explored whether there is a generalizable, domain-general perceptual skill underlying the comparison of visual stimuli and whether this relationship could be attributed to intrinsic motivation. Participants’ visual comparison sensitivity significantly correlated on all four tasks and accounted for a substantial portion of the variance in performance across all tasks—but intrinsic motivation did not and accounted for a separate portion of the variance. These results provide the first indication of a domain-general visual comparison ability that varies naturally in the general population.

Experiment 2

While the results of Experiment 1 suggest there is shared ability across visual comparison performance, these results may reflect a broader perceptual visual ability—rather than a skill specific to visual comparison. To investigate this possibility, Experiment 2 examined whether there is a relationship between visual comparison performance and performance on two other tasks that rely on visual-perceptual skills: visual search and visual statistical learning.

Visual search tasks are measures of attentional deployment and control that ask participants to search for a target among surrounding distractors (for review, see Chan & Hayward, 2013). Visual statistical learning is the ability to extract and encode statistical information from the surrounding visual environment (e.g., learning that black or white cars are more common than yellow cars; Fiser & Aslin, 2001; Turk-Browne et al., 2005). We selected these two tasks because they both engage processing of visual-perceptual information and show stable individual differences (visual search: Ericson et al., 2017; visual statistical learning: Growns et al., 2020). Importantly, however, these tasks are theoretically unrelated to the ability to compare and evaluate similarity between visual stimuli. Therefore, we predict that if there is a domain-general ability specific to visual comparison, performance across visual comparison tasks will correlate and load onto the same component in the PCA, but visual search and visual statistical learning performance will not.

Method

Design

We used a within-subjects design where participants completed six tasks: face comparison, fingerprint comparison, firearms comparison, artificial-print comparison, visual search, and visual statistical learning. The study preregistration, data and analysis scripts can be found at https://osf.io/bvzpd/.

Participants

We recruited 124 participants online via Prolific Academic, informed by the same power analysis as in Experiment 1. To be eligible for the study, participants were required to have normal or corrected-to-normal vision, live in the United States, have a Prolific approval rating of at least 95%, and have completed the experiment on a tablet or computer (not a mobile phone). All participants met our preregistered inclusion criterion of responding correctly to at least three of the five attention checks, so none were excluded from the final sample (0.81% passed just three of the attention checks, 21.77% passed four, and 77.42% passed all five).

Participants were 29.3 years old on average (SD = 11.2, range: 18–66), and the majority (62.1%) self-identified as male (37.1% female; 0.8% gender diverse) and White (77.4%; 9.7% Asian, 6.5% Black, 3.2% Hispanic, 3.2% Other). Each participant was compensated £5.85 for completing the 70-minute experiment.

Tasks and dependent measures

Participants completed four visual comparison tasks: face comparison, fingerprint comparison, firearms comparison, and artificial-print comparison. The firearms and artificial-print tasks from Experiment 1 were retained, but participants completed two new face and fingerprint comparison tasks (described further below). Performance in the visual comparison tasks was measured by calculating a signal-detection measure of sensitivity, as in Experiment 1. Participants also completed two additional tasks as measures of discriminant validity: visual search and visual statistical learning. Both were pilot-tested to ensure they were reliable and variable enough to appropriately measure individual differences.

Face comparison

Participants completed 80 face comparison trials (40 match and 40 nonmatch) from the Glasgow Face Matching Task 2—Short Form (GFMT2-S; White et al., 2021). The GFMT2-S is an updated version of the GFMT that was created to be more difficult and representative of real-world face identification tasks (i.e., variation in head angle, pose, expression, and image quality) than the original task. Participants were asked to answer the same question (“Are these images of the same person or two different people?”) as in Experiment 1 and responded by selecting one of two buttons (“same” or “different”) at the bottom of the screen.

Fingerprint comparison

Participants completed 40 fingerprint comparison trials (20 match and 20 nonmatch), which were a subset of the fingerprint task in Growns and Kukucka (2021). This subset was chosen to be most representative of fingerprint comparison skill using the same method used to select trials for the GFMT2-S: item-to-test correlations were calculated for each trial from the pilot data of Growns and Kukucka (2021), and the 20 match and 20 nonmatch trials with the highest correlations were selected. Participants were asked to answer the same question (“Are these fingerprints from the same person or two different people?”) as in Experiment 1 and responded by selecting one of two buttons (“same” or “different”) at the bottom of the screen.

Visual search

Participants completed 120 trials in a visual search task developed for use in this experiment (10 blocks of 12 trials + 1 practice block with 12 trials; see left panel of Fig. 3). On each trial, participants were instructed to indicate if a target object (e.g., the deer in Fig. 3) was present or absent in an array of 16 objects by pressing “P” on the keyboard if the target was present or “A” if the target was absent in the array. The target object was present on 50% of trials in each block and each block had a different target object. Performance on this task was measured by calculating the mean reaction time (RT) on correct target-present trials only (Cunningham & Wolfe, 2012; Wolfe, 2012). Shorter reaction times indicate better visual search performance.
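As a small illustration of this dependent measure, the sketch below computes each participant's mean reaction time on correct target-present trials only; the trial-level data frame search_data and its column names are assumptions.

```r
# Mean RT (ms) on correct target-present trials, per participant.
# 'search_data' is a hypothetical data frame with columns:
# participant, target_present (logical), correct (logical), rt (ms).
vs_scores <- aggregate(rt ~ participant,
                       data = subset(search_data, target_present & correct),
                       FUN = mean)
```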

Visual statistical learning

Participants completed a visual statistical learning task adapted from previous research (Growns et al., 2020; Growns & Martire, 2020a) where participants first completed an exposure phase and then a test phase. During the exposure phase, participants viewed 60 complex patterns (see right panel of Fig. 3) in a randomized order (each pattern displayed for 3-sec with a 200-ms interval in-between) and were instructed to pay attention to them as they would be asked some questions about them afterwards. Each pattern contained different features (see right images in the right panel of Fig. 3) on the ends of the pattern ‘arms’ that occurred with different statistical frequencies across all patterns (e.g., feature “A” appeared in 10% of patterns, whilst feature “B” appeared in 20% of patterns).

During the test phase, participants completed 45 trials testing how well they had learned the frequencies: on each trial they were asked which of 2, 3, or 4 features was most familiar to them. Performance on this task was measured by calculating the number of trials on which participants correctly chose the most frequent feature, where higher scores indicated better statistical learning. Chance performance on this task was 16.62 trials (36.9% accuracy).

Procedure

Participants completed the experiment via an online testing platform (Testable.org) that better captures reaction time data than other online platforms (e.g., Qualtrics; de Leeuw & Motz, 2016; Rezlescu et al., 2020). Participants first provided demographic information and then completed all six tasks in a randomized order and completed all trials in each task in a randomized order. Note this differs from Experiment 1 where participants completed tasks in a randomized order but completed trials within each task in a pseudo-randomized order. At the beginning of each task, participants received brief task instructions and completed two practice trials where they were given corrective feedback. Upon completion of all tasks, participants viewed a debriefing statement.

Results and discussion

Descriptive results and psychometrics

The descriptive statistics and psychometric properties of all six tasks are presented in Table 3. Sensitivity was significantly above chance (i.e., above 0) on all four comparison tasks—face: t(123) = 30.51, p < .001; fingerprint: t(123) = 9.86, p < .001; firearms: t(123) = 35.84, p < .001; artificial-print: t(123) = 21.68, p < .001. Visual search performance was within the range reported in previous experiments (Cunningham & Wolfe, 2012; Wolfe, 2012), and statistical learning performance was also significantly above chance (i.e., above 36.9%), t(123) = 12.63, p < .001. Psychometric properties for all six measures were close to or above recommended values for standardized tests on a typical measure of scale reliability (see Table 3; Cronbach's α > .8; Streiner, 2003a, 2003b), except for the fingerprint comparison task (α = .62).

Table 3.

Descriptive statistics for each task (standard deviation in parentheses)

Task performance α Skewness Kurtosis
Face comparison 1.97 (0.72) .79 .24 3.30
Fingerprint comparison .58 (0.66) .62 −.21 3.36
Firearms comparison 2.81 (0.88) .90 −.76 3.49
Artificial-print comparison 1.00 (0.51) .74 −.30 3.10
Visual search 1002 (636.57) .86 .65 4.22
Visual statistical learning 58.15% (18.74) .88 .39 2.24

Task performance for face, fingerprint, firearms, and artificial prints is shown in d’, visual search is the mean reaction time (ms) on correct target-present trials, and visual statistical learning is the percentage correct. Cronbach’s alpha was calculated on raw accuracy scores per participant for all tasks

Correlations between visual comparison performance

To investigate the relationship between visual comparison tasks, we calculated Pearson’s correlations between each. Sensitivity on all comparison tasks was significantly and positively correlated with one another (see Fig. 4, and Table 6 in the Appendix). We also observed Bayes factors >10 for five of the six comparison sensitivity correlations, providing strong support for the presence of these correlations, with weaker support for the presence of a correlation between face and firearms comparison (BF = 4.57).

Fig. 4

Pearson correlations between task performance in Experiment 2

Table 6.

Correlations for performance between tasks in Experiment 2 (Pearson correlations with p values in parentheses, followed by Bayes factors)

| | Face comparison | Fingerprint comparison | Firearms comparison | Artificial-print comparison | Visual search |
| Fingerprint comparison | .346 (<.001), BF = 346.33 | | | | |
| Firearms comparison | .228 (.011), BF = 4.57 | .287 (.001), BF = 30.65 | | | |
| Artificial-print comparison | .294 (<.001), BF = 39.56 | .516 (<.001), BF = 1.37e7 | .313 (<.001), BF = 83.84 | | |
| Visual search | −.020 (.823), BF = .21 | −.053 (.562), BF = .24 | −.212 (.018), BF = 3.04 | −.019 (.835), BF = .21 | |
| Visual statistical learning | −.00 (.997), BF = .21 | .138 (.126), BF = .63 | .161 (.073), BF = .96 | .191 (.033), BF = 1.80 | −.135 (.136), BF = .60 |

Correlations between visual comparison performance, visual search, and visual statistical learning

To investigate the relationship between each comparison task, visual search, and visual statistical learning performance, we calculated Pearson’s correlations (see Fig. 4, and Table 6 in the Appendix). Neither visual search nor statistical learning performance significantly correlated with comparison sensitivity on any task, except for correlations between visual search and firearms sensitivity (r = −.212, p = .018) and between statistical learning and artificial-print sensitivity (r = .191, p = .033). However, the correlation between visual search and firearms sensitivity is likely spurious and driven by an outlier (see Fig. 4), and both of these correlations were weaker than any of the correlations between visual comparison tasks. Visual search and visual statistical learning were also not significantly correlated with each other (r = −.135, p = .136). We also calculated correlations using log-transformed reaction-time data and the pattern of correlations was consistent (see Supplementary Materials on OSF), except that the correlation between visual search performance and firearms sensitivity was no longer significant.

Based on the Bayesian analyses, there was substantial support (BF < .33) for the absence of a correlation for four of the eight pairings between comparison performance and visual search/statistical learning, anecdotal support (BF > .33 and < 1.0) for the absence of a correlation for two further pairings, and anecdotal support for the presence of the correlations between visual search and firearms sensitivity and between statistical learning and artificial-print sensitivity.

Principal component analysis (PCA)

We explored the shared and unshared variance in sensitivity values across the four comparison tasks and two discriminant validity tasks with a Principal Component Analysis (PCA) using the prcomp function from the core stats package in R. Rotation was not conducted in the PCA. The loadings of all tasks on the six components and the proportion of variance explained by each component can be seen in Table 4.

Table 4.

Results of the principal components analysis (loadings matrix and percentage of variance explained)

Component 1 Component 2 Component 3 Component 4 Component 5 Component 6
Face comparison .40 −.35 .35 −.63 −.42 −.10
Fingerprint comparison .53 −.21 −.08 .00 .44 .69
Firearms comparison .44 .25 .26 .62 −.53 .11
Artificial-print comparison .53 −.17 −.23 .19 .33 −.70
Visual search −.16 −.69 −.51 .25 −.40 .10
Visual statistical learning .24 .51 −.70 −.34 −.28 .07
Variance explained 34.92% 19.07% 15.35% 11.55% 11.21% 7.89%

Component 1 explained a substantial portion of the variance across all six tasks (34.92%) and sensitivity on all four comparison tasks loaded strongly onto this component, but visual search and visual statistical learning did not. This suggests that this component represents a generalizable comparison ability unrelated to performance on the other two tasks. Component 2 explained an important portion of the variance across all tasks (19.07%) and both visual search and visual statistical learning loaded strongly onto this component, while sensitivity on each comparison task loaded weakly. Because visual search and visual statistical learning loaded with opposite signs, and lower visual search scores (i.e., faster reaction times) indicate better performance, this pattern suggests that high statistical learning performance is associated with high visual search performance. Together, this pattern suggests that visual comparison performance is explained by a shared factor that is independent of the other visual-perceptual tasks and demonstrates discriminant validity between these two constructs.

Components 3–6 also explained an important portion of the variance across all tasks (46.00% in total; individual components ranging from 15.35% to 7.89%). Visual search and statistical learning loaded strongly (and negatively) onto Component 3 alone, which explained the next greatest portion of variance (15.35%); face comparison and firearms comparison sensitivity loaded strongly onto Component 4 (negatively for face comparison and positively for firearms comparison; explaining 11.55% of variance); firearms comparison, face comparison, fingerprint comparison, and visual search loaded strongly onto Component 5 (positively for fingerprint comparison and negatively for firearms comparison, face comparison, and visual search; explaining 11.21% of variance); and fingerprint and artificial-print comparison sensitivity loaded strongly onto Component 6 (positively for fingerprint comparison and negatively for artificial-print comparison; explaining 7.89% of variance). Overall, these results suggest that sensitivity on all comparison tasks reflects a mixture of shared (Component 1) and nonshared variance (Components 3, 4, 5, and 6), whilst visual search and visual statistical learning reflect separate nonshared variance (Components 2 and 3).

Experiment 2 explored whether individual differences in visual comparison performance could be accounted for by broader perceptual skill in visual tasks. Consistent with Experiment 1, participants’ visual comparison sensitivity significantly correlated across all four comparison tasks and accounted for a substantial portion of the variance in performance across all tasks—but visual search and visual statistical learning performance did not and accounted for a separate portion of variance (the exceptions being the firearms/visual search and artificial-print/statistical learning correlations, which were significant but weaker than all visual comparison correlations). These results provide further evidence that there is an underlying generalizable ability for comparing visual stimuli that is largely unrelated to other visual-perceptual tasks.

General discussion

Across two experiments, we explored whether there is a generalizable and domain-general perceptual skill underpinning the ability to compare—or “match”—different visual stimuli. Participants’ sensitivity in four different comparison tasks was significantly correlated across all pairs of tasks, and a substantial portion of variance (41.99% in Experiment 1 and 34.92% in Experiment 2) across all tasks was accounted for by one shared “matching” component in both experiments. Together, these results support the conclusion that individual differences in visual comparison accuracy are explained by a shared ability that generalizes across a range of visual stimuli. Notably, intrinsic motivation (Experiment 1) and visual search and visual statistical learning (Experiment 2) did not significantly correlate with sensitivity in any comparison task and loaded onto separate components that accounted for large proportions of the variance across all tasks (20.95% in Experiment 1 and 19.07% in Experiment 2). This suggests that individual differences in visual comparison cannot be attributed to individual differences in intrinsic motivation or other visual-perceptual tasks.

Importantly, our study also provides evidence of stimulus-specific individual differences. This is reflected in the moderate correlations between sensitivity in all comparison tasks across both experiments, and in the principal components analyses, where additional components featured loadings from just one or a subset of comparison tasks. This suggests there are also stimulus-specific skills, whereby some people are better at comparing certain types of stimuli than others. Overall, our results are the first to suggest that visual comparison reflects an interplay between an overarching generalizable comparison ability and individual stimulus-specific abilities.

This stimulus-specific skill may be partially attributed to stimulus familiarity and experience. Face-comparison performance—faces being the most familiar stimuli—demonstrated the highest stimulus-specific variance: face-comparison sensitivity had the lowest average correlation with all other tasks (r = .267 in Experiment 1 and .289 in Experiment 2) and accounted for the third- and fourth-largest portions of variance (16.37% in Experiment 1 and 11.55% in Experiment 2, respectively) across all tasks. In contrast, fingerprint, firearms, and artificial-print sensitivity—stimuli ranging from unfamiliar to entirely novel—accounted for less stimulus-specific variance in our data. This is consistent with research suggesting a shift from domain-general to domain-specific mechanisms with increased perceptual experience in a domain (Chang & Gauthier, 2020, 2021; Sunday et al., 2018; Wong et al., 2014; Wong & Gauthier, 2010, 2012), and with research linking experience and visual comparison performance (Thompson & Tangen, 2014).

Our results highlight visual comparison as a natural and generalizable ability that varies in the general population—yet the precise mechanisms underpinning this skill are only beginning to be explored (see Growns & Martire, 2020b, for review). It is possible that holistic processing—or the ability to view images as a ‘whole’ rather than a collection of features (Maurer et al., 2002)—underpins visual comparison performance: both facial and fingerprint examiners show evidence of holistic processing when viewing domain-specific stimuli (Busey & Vanderkolk, 2005; Towler, White, & Kemp, 2017b; Vogelsang et al., 2017). However, featural processing—or the ability to view images as separate features—is also important in visual comparison. Professional performance is improved when examiners have an opportunity to engage featural processing: both facial and fingerprint examiners demonstrate greater performance gains than novices in domain-specific visual comparison tasks (Thompson et al., 2014; Towler, White, & Kemp, 2017; White, Phillips, et al., 2015). Novices’ face-comparison performance also correlates with featural processing tasks such as the Navon and figure-matching tasks (Burton et al., 2010; McCaffery et al., 2018), and novices’ comparison performance is improved by instructing participants to rate or label features (Searston & Tangen, 2017c; Towler, White, & Kemp, 2017b). Low-performing novices also derive greater benefit from featural comparison training than high-performers—suggesting high-performers may already use such strategies (Towler, Keshwa, et al., 2021b). The role of holistic and featural processing in visual comparison performance remains an important avenue for future research.

These results have important applied implications. Whilst empirically based training for existing examiners is important to improve ongoing professional performance (Growns & Martire, 2020a), our results suggest that larger gains in performance could be achieved by selecting trainee examiners based on visual comparison ability. A similar approach has been used in applied domains: recruiting individuals with superior face recognition improves performance in real-world face identification tasks (Robertson et al., 2016; White, Dunn, et al., 2015). Professional performance in other forensic feature-comparison disciplines could likely be similarly improved by recruiting individuals with superior performance on a test battery of visual comparison tasks. Importantly, our results do not suggest that examiners would benefit from practicing outside of their primary domain of experience. Despite identifying a generalizable visual comparison ability, we also identified individual differences in stimulus-specific skills that suggest part of accurate visual comparison performance is domain specific.

As the participants in this study were untrained novices, it is unclear whether these results could generalize to practicing professionals. While investigating individual differences in the general population requires a novice sample, it is entirely plausible that a domain-general visual comparison mechanism may be diminished or negated for experts in this task as expertise is typically conceptualized as narrow and domain-specific (Charness et al., 2005; Ericsson, 2007, 2014). However, emerging evidence suggests domain-specific expertise may lend advantages to domain-general skill. For example, although facial examiners outperform fingerprint examiners in face comparison (i.e., facial examiners’ domain-specific expertise), fingerprint examiners outperform novices in the same task—despite it being outside their primary area of expertise (Phillips et al., 2018). Whether this domain-general advantage is developed alongside domain-specific expertise or is the result of preexisting individual differences in this ability will be an important avenue for future research.

This study provided the first evidence of a generalizable ability underpinning the capacity to compare or “match” different, complex visual stimuli. We demonstrated that the ability to compare stimuli such as faces, fingerprints, firearms, and artificial prints is in part due to a generalizable and domain-general ability—although subject to stimulus-specific constraints. These results have important theoretical and applied implications for both behavioural and forensic science. Importantly, test batteries of visual comparison tasks could be used to identify and recruit top-performing individuals to improve performance in forensic feature-comparison disciplines.

Author note

This work was supported by funding from the University of New South Wales, funding from the University of Exeter, and funding from the National Science Foundation (Grant No. 1823741).

Appendix

Declarations

Ethics approval statement

These studies were approved by the Arizona State University Institutional Review Board (Experiment 1: Approval No. 11860) and the University of Exeter School of Social Sciences and International Studies Ethics Committee (Experiment 2: Approval No. 202021-110).

Conflict of interest

The authors declare no conflict of interest.

Footnotes

1

Note we also conducted these analyses with outliers (scores 2 SD+ above the mean) and negative performers (scores below zero) removed and the pattern of results was consistent (see Supplementary Materials on OSF: https://osf.io/bvzpd/), except for two face comparison correlations when negative performers were removed which may be due to face comparison having the highest stimulus-specific variance.

Open practices statement

Both studies in this paper were preregistered, and the data and analysis scripts can be found on the Open Science Framework (https://osf.io/bvzpd/).

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Bethany Growns and James D. Dunn contributed equally to this work.

References

  1. Bird C, Found B, Ballantyne K, Rogers D. Forensic handwriting examiners’ opinions on the process of production of disguised and simulated signatures. Forensic Science International. 2010;195(1/3):103–107. doi: 10.1016/j.forsciint.2009.12.001. [DOI] [PubMed] [Google Scholar]
  2. Bird C, Found B, Rogers D. Forensic document examiners’ skill in distinguishing between natural and disguised handwriting behaviors. Journal of Forensic Sciences. 2010;55(5):1291–1295. doi: 10.1111/j.1556-4029.2010.01456.x. [DOI] [PubMed] [Google Scholar]
  3. Burton AM, White D, McNeill A. The Glasgow face matching test. Behavior Research Methods. 2010;42(1):286–291. doi: 10.3758/BRM.42.1.286. [DOI] [PubMed] [Google Scholar]
  4. Busey, T. A., & Dror, I. E. (2011). Special abilities and vulnerabilities in forensic expertise. In A. McRoberts (Ed.), The fingerprint sourcebook (pp. 1–23). U.S. Department of Justice, National Institute of Justice.
  5. Busey TA, Vanderkolk JR. Behavioral and electrophysiological evidence for configural processing in fingerprint experts. Vision Research. 2005;45(4):431–448. doi: 10.1016/j.visres.2004.08.021. [DOI] [PubMed] [Google Scholar]
  6. Chan LK, Hayward WG. Visual search. Wiley Interdisciplinary Reviews: Cognitive Science. 2013;4(4):415–429. doi: 10.1002/wcs.1235. [DOI] [PubMed] [Google Scholar]
  7. Chang T-Y, Gauthier I. Distractor familiarity reveals the importance of configural information in musical notation. Attention, Perception, & Psychophysics. 2020;82(3):1304–1317. doi: 10.3758/s13414-019-01826-0. [DOI] [PubMed] [Google Scholar]
  8. Chang, T.-Y., & Gauthier, I. (2021). Domain-specific and domain-general contributions to reading musical notation. Attention, Perception, & Psychophysics, 1–12. Advance online publication. 10.3758/s13414-021-02349-3 [DOI] [PubMed]
  9. Charness N, Tuffiash M, Krampe R, Reingold E, Vasyukova E. The role of deliberate practice in chess expertise. Applied Cognitive Psychology. 2005;19(2):151–165. doi: 10.1002/acp.1106. [DOI] [Google Scholar]
  10. Choi J, Mogami T, Medalia A. Intrinsic motivation inventory: An adapted measure for schizophrenia research. Schizophrenia Bulletin. 2010;36(5):966–976. doi: 10.1093/schbul/sbp030. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Cunningham CA, Wolfe JM. Lions or tigers or bears: Oh my! Hybrid visual and memory search for categorical targets. Visual Cognition. 2012;20(9):1024–1027. doi: 10.1080/13506285.2012.726455. [DOI] [Google Scholar]
  12. de Leeuw JR, Motz BA. Psychophysics in a Web browser? Comparing response times collected with JavaScript and Psychophysics Toolbox in a visual search task. Behavior Research Methods. 2016;48(1):1–12. doi: 10.3758/s13428-015-0567-2. [DOI] [PubMed] [Google Scholar]
  13. Ericson JM, Kravitz DJ, Mitroff SR. Visual search: You are who you are (+ a learning curve) Perception. 2017;46(12):1434–1441. doi: 10.1177/0301006617721091. [DOI] [PubMed] [Google Scholar]
  14. Ericsson, K. A. (2007). Deliberate practice and the modifiability of body and mind: Toward a science of the structure and acquisition of expert and elite performance. International Journal of Sport Psychology, 38(1), 4–34.
  15. Ericsson KA. Why expert performance is special and cannot be extrapolated from studies of performance in the general population: A response to criticisms. Intelligence. 2014;45:81–103. doi: 10.1016/j.intell.2013.12.001. [DOI] [Google Scholar]
  16. Fiser J, Aslin RN. Unsupervised statistical learning of higher-order spatial structures from visual scenes. Psychological Science. 2001;12(6):499–504. doi: 10.1111/1467-9280.00392. [DOI] [PubMed] [Google Scholar]
  17. Geskin J, Behrmann M. Congenital prosopagnosia without object agnosia? A literature review. Cognitive Neuropsychology. 2018;35(1/2):4–54. doi: 10.1080/02643294.2017.1392295. [DOI] [PubMed] [Google Scholar]
  18. Growns B, Kukucka J. The prevalence effect in fingerprint identification: Match and non-match base-rates impact misses and false alarms. Applied Cognitive Psychology. 2021;35(3):751–760. doi: 10.1002/acp.3800. [DOI] [Google Scholar]
  19. Growns, B., & Martire, K. A. (2020a). Forensic feature-comparison expertise: Statistical learning facilitates visual comparison performance. Journal of Experimental Psychology: Applied, 1–18. Advance online publication. 10.31234/osf.io/pzfjb [DOI] [PubMed]
  20. Growns B, Martire KA. Human factors in forensic science: The cognitive mechanisms that underlie forensic feature-comparison expertise. Forensic Science International: Synergy. 2020;2:148–153. doi: 10.1016/j.fsisyn.2020.05.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Growns B, Siegelman N, Martire KA. The multi-faceted nature of visual statistical learning: Individual differences in learning conditional and distributional regularities across time and space. Psychological Bulletin & Review. 2020;27:1291–1299. doi: 10.3758/s13423-020-01781-0. [DOI] [PubMed] [Google Scholar]
  22. Growns, B., Mattijssen, E. J. A. T., Martire, K. A., Salerno, J. M., Schweitzer, N. J., & Cole, S. A. (2021). Finding the perfect match: Fingerprint expertise facilitates statistical learning and “match” decision-making. Manuscript under review. [DOI] [PubMed]
  23. Kam M, Fielding G, Conn R. Writer identification by professional document examiners. Journal of Forensic Sciences. 1997;42(5):778–786. doi: 10.1520/JFS14207J. [DOI] [Google Scholar]
  24. Leng EY, Baki R, Mahmud R. Stability of the Intrinsic Motivation Inventory (IMI) for the use of Malaysian form one students in ICT literacy class. EURASIA Journal of Mathematics, Science and Technology Education. 2010;6(3):215–226. doi: 10.12973/ejmste/75241. [DOI] [Google Scholar]
  25. Martire KA, Growns B, Navarro DJ. What do the experts know? Calibration, precision, and the wisdom of crowds among forensic handwriting experts. Psychonomic Bulletin & Review. 2018;25(6):2346–2355. doi: 10.3758/s13423-018-1448-3. [DOI] [PubMed] [Google Scholar]
  26. Mattijssen EJAT, Witteman CLM, Berger CEH, Stoel RD. Assessing the frequency of general fingerprint patterns by fingerprint examiners and novices. Forensic Science International. 2020;313:110347. doi: 10.1016/j.forsciint.2020.110347. [DOI] [PubMed] [Google Scholar]
  27. Mattijssen, E. J. A. T., Witteman, C. L., Berger, C. E., Zheng, X. A., Soons, J. A., & Stoel, R. D. (2021). Firearm examination: Examiner judgments and computer-based comparisons. Journal of Forensic Sciences. 10.1111/1556-4029.14557 [DOI] [PMC free article] [PubMed]
  28. Maurer D, Le Grand R, Mondloch CJ. The many faces of configural processing. Trends in Cognitive Sciences. 2002;6(6):255–260. doi: 10.1016/S1364-6613(02)01903-4. [DOI] [PubMed] [Google Scholar]
  29. McAuley E, Duncan T, Tammen VV. Psychometric properties of the Intrinsic Motivation Inventory in a competitive sport setting: A confirmatory factor analysis. Research Quarterly for Exercise and Sport. 1989;60(1):48–58. doi: 10.1080/02701367.1989.10607413. [DOI] [PubMed] [Google Scholar]
  30. McCaffery JM, Robertson DJ, Young AW, Burton AM. Individual differences in face identity processing. Cognitive Research: Principles and Implications. 2018;3(1):1–15. doi: 10.1186/s41235-018-0112-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Mollon JD, Bosten JM, Peterzell DH, Webster MA. Individual differences in visual science: What can be learned and what is good experimental practice? Vision Research. 2017;141:4–15. doi: 10.1016/j.visres.2017.11.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Monteiro V, Mata L, Peixoto F. Intrinsic motivation inventory: Psychometric properties in the context of first language and mathematics learning. Psicologia: Reflexão e Crítica. 2015;28(3):434–443. [Google Scholar]
  33. Morey, R. D., Rouder, J. N., & Jamil, T. (2018). BayesFactor: Computation of Bayes factors for common designs (R package version 0.9.12-4.2) [Computer software]. https://cran.r-project.org/web/packages/BayesFactor/index.html
  34. Noyes, E., Phillips, P., & O’Toole, A. (2017). What is a super-recogniser? In M. Bindemann & A. M. Megreya (Eds.), Face processing: Systems, disorders and cultural differences (pp. 173–201). Nova Science Publishers.
  35. Phillips, V. L., Saks, M. J., & Peterson, J. L. (2001). The application of signal detection theory to decision-making in forensic science. Journal of Forensic Sciences, 46(2), 294–308. https://doi.org/10.1520/JFS14962J
  36. Phillips, P. J., Yates, A. N., Hu, Y., Hahn, C. A., Noyes, E., Jackson, K., Cavazos, J. G., Jeckeln, G., Ranjan, R., Sankaranarayanan, S., Chen, J.-C., Castillo, C. D., Chellappa, R., White, D., & O’Toole, A. J. (2018). Face recognition accuracy of forensic examiners, superrecognizers, and face recognition algorithms. Proceedings of the National Academy of Sciences, 115(24), 6171–6176. https://doi.org/10.1073/pnas.1721355115
  37. Ramon, M., Bobak, A. K., & White, D. (2019). Super-recognizers: From the lab to the world and back again. British Journal of Psychology. https://doi.org/10.1111/bjop.12368
  38. Rezlescu, C., Danaila, I., Miron, A., & Amariei, C. (2020). More time for science: Using Testable to create and share behavioral experiments faster, recruit better participants, and engage students in hands-on research. Progress in Brain Research, 253, 243–262. https://doi.org/10.1016/bs.pbr.2020.06.005
  39. Richler, J. J., Tomarken, A. J., Sunday, M. A., Vickery, T. J., Ryan, K. F., Floyd, R. J., Sheinberg, D., Wong, A. C.-N., & Gauthier, I. (2019). Individual differences in object recognition. Psychological Review, 126(2), 226. https://doi.org/10.1037/rev0000129
  40. Robertson, D. J., Noyes, E., Dowsett, A. J., Jenkins, R., & Burton, A. M. (2016). Face recognition by Metropolitan Police super-recognisers. PLOS ONE, 11(2), e0150036. https://doi.org/10.1371/journal.pone.0150036
  41. Russell, R., Duchaine, B., & Nakayama, K. (2009). Super-recognizers: People with extraordinary face recognition ability. Psychonomic Bulletin & Review, 16(2), 252–257. https://doi.org/10.3758/PBR.16.2.252
  42. Searston, R. A., & Tangen, J. M. (2017a). Expertise with unfamiliar objects is flexible to changes in task but not changes in class. PLOS ONE, 12(6), 1–14. https://doi.org/10.1371/journal.pone.0178403
  43. Searston, R. A., & Tangen, J. M. (2017b). The emergence of perceptual expertise with fingerprints over time. Journal of Applied Research in Memory and Cognition, 6(4), 442–451. https://doi.org/10.1037/h0101814
  44. Searston, R. A., & Tangen, J. M. (2017c). Training perceptual experts: Feedback, labels, and contrasts. Canadian Journal of Experimental Psychology, 71(1), 32–39. https://doi.org/10.1037/cep0000124
  45. Siegelman, N., Bogaerts, L., & Frost, R. (2017). Measuring individual differences in statistical learning: Current pitfalls and possible solutions. Behavior Research Methods, 49(2), 418–432. https://doi.org/10.3758/s13428-016-0719-z
  46. Stanislaw, H., & Todorov, N. (1999). Calculation of signal detection theory measures. Behavior Research Methods, Instruments, & Computers, 31(1), 137–149. https://doi.org/10.3758/bf03207704
  47. Streiner, D. L. (2003a). Being inconsistent about consistency: When coefficient alpha does and doesn’t matter. Journal of Personality Assessment, 80(3), 217–222. https://doi.org/10.1207/S15327752JPA8003_01
  48. Streiner, D. L. (2003b). Starting at the beginning: An introduction to coefficient alpha and internal consistency. Journal of Personality Assessment, 80(1), 99–103. https://doi.org/10.1207/S15327752JPA8001_18
  49. Sunday, M. A., Donnelly, E., & Gauthier, I. (2018). Both fluid intelligence and visual object recognition ability relate to nodule detection in chest radiographs. Applied Cognitive Psychology, 32(6), 755–762. https://doi.org/10.1002/acp.3460
  50. Tangen, J. M., Thompson, M. B., & McCarthy, D. J. (2011). Identifying fingerprint expertise. Psychological Science, 22(8), 995–997. https://doi.org/10.1177/0956797611414729
  51. Thompson, M. B., & Tangen, J. M. (2014). The nature of expertise in fingerprint matching: Experts can do a lot with a little. PLOS ONE, 9(12), 1–23. https://doi.org/10.1371/journal.pone.0114759
  52. Thompson, M. B., Tangen, J. M., & McCarthy, D. (2013). Expertise in fingerprint identification. Journal of Forensic Sciences, 58(6), 1519–1530. https://doi.org/10.1111/1556-4029.12203
  53. Thompson, M. B., Tangen, J. M., & Searston, R. A. (2014). Understanding expertise and nonanalytic cognition in fingerprint discriminations made by humans. Frontiers in Psychology, 5, 1–3. https://doi.org/10.3389/fpsyg.2014.00737
  54. Towler, A., Kemp, R. I., & White, D. (2017). Unfamiliar face matching systems in applied settings. In M. Bindemann & A. M. Megreya (Eds.), Face processing: Systems, disorders and cultural differences. Nova Science Publishers.
  55. Towler, A., White, D., & Kemp, R. I. (2017). Evaluating the feature comparison strategy for forensic face identification. Journal of Experimental Psychology: Applied, 23(1), 47–58. https://doi.org/10.1037/xap0000108
  56. Towler, A., White, D., Ballantyne, K., Searston, R. A., Martire, K. A., & Kemp, R. I. (2018). Are forensic scientists experts? Journal of Applied Research in Memory and Cognition, 7(2), 199–208. https://doi.org/10.1016/j.jarmac.2018.03.010
  57. Towler, A., Dunn, J. D., Martinez, S., Moreton, R., Eklöf, F., Ruifrok, A., Kemp, R. I., & White, D. (2021). Diverse routes to expertise in facial recognition. Manuscript in preparation. Preprint: https://doi.org/10.31234/osf.io/fmznh
  58. Towler, A., Keshwa, M., Ton, B., Kemp, R. I., & White, D. (2021). Diagnostic feature training improves face matching accuracy. Journal of Experimental Psychology: Learning, Memory, and Cognition. https://doi.org/10.1037/xlm0000972
  59. Tsigilis, N., & Theodosiou, A. (2003). Temporal stability of the Intrinsic Motivation Inventory. Perceptual and Motor Skills, 97(1), 271–280. https://doi.org/10.2466/pms.2003.97.1.271
  60. Turk-Browne, N. B., Jungé, J. A., & Scholl, B. J. (2005). The automaticity of visual statistical learning. Journal of Experimental Psychology: General, 134(4), 552–564. https://doi.org/10.1037/0096-3445.134.4.552
  61. Ulery, B. T., Hicklin, R. A., Buscaglia, J., & Roberts, M. A. (2011). Accuracy and reliability of forensic latent fingerprint decisions. Proceedings of the National Academy of Sciences, 108(19), 7733. https://doi.org/10.1073/pnas.1018707108
  62. Vogelsang, M. D., Palmeri, T. J., & Busey, T. A. (2017). Holistic processing of fingerprints by expert forensic examiners. Cognitive Research: Principles and Implications, 2(1), 15. https://doi.org/10.1186/s41235-017-0051-x
  63. Wetzels, R., Matzke, D., Lee, M. D., Rouder, J. N., Iverson, G. J., & Wagenmakers, E.-J. (2011). Statistical evidence in experimental psychology: An empirical comparison using 855 t tests. Perspectives on Psychological Science, 6(3), 291–298. https://doi.org/10.1177/1745691611406923
  64. White, D., Dunn, J. D., Schmid, A. C., & Kemp, R. I. (2015). Error rates in users of automatic face recognition software. PLOS ONE, 10(10), e0139827. https://doi.org/10.1371/journal.pone.0139827
  65. White, D., Phillips, J., Hahn, C. A., Hill, M., & O’Toole, A. J. (2015). Perceptual expertise in forensic facial image comparison. Proceedings of the Royal Society B: Biological Sciences, 282, 1–8. https://doi.org/10.1098/rspb.2015.1292
  66. White, D., Towler, A., & Kemp, R. (2020). Understanding professional expertise in unfamiliar face matching. In M. Bindemann (Ed.), Forensic face matching (pp. 62–88). Oxford University Press.
  67. White, D., Guilbert, D., Varela, V. P. L., Jenkins, R., & Burton, A. M. (2021). GFMT2: A psychometric measure of face matching ability. Behavior Research Methods. https://doi.org/10.3758/s13428-021-01638-x
  68. Wolfe, J. M. (2012). Saved by a log: How do humans perform hybrid visual and memory search? Psychological Science, 23(7), 698–703. https://doi.org/10.1177/0956797612443968
  69. Wong, Y. K., & Gauthier, I. (2010). Holistic processing of musical notation: Dissociating failures of selective attention in experts and novices. Cognitive, Affective, & Behavioral Neuroscience, 10(4), 541–551. https://doi.org/10.3758/CABN.10.4.541
  70. Wong, Y. K., & Gauthier, I. (2012). Music-reading expertise alters visual spatial resolution for musical notation. Psychonomic Bulletin & Review, 19(4), 594–600. https://doi.org/10.3758/s13423-012-0242-x
  71. Wong, Y. K., Peng, C., Fratus, K. N., Woodman, G. F., & Gauthier, I. (2014). Perceptual expertise and top-down expectation of musical notation engages the primary visual cortex. Journal of Cognitive Neuroscience, 26(8), 1629–1643. https://doi.org/10.1162/jocn_a_00616
