Archives of Clinical Neuropsychology. 2014 Aug 22;29(7):609–613. doi: 10.1093/arclin/acu039

Effort in College Undergraduates Is Sufficient on the Word Memory Test

Octavio A Santos 1, Dmitriy Kazakov 1, Mary K Reamer 1, Sydney E Park 1, David C Osmon 1,*
PMCID: PMC4263921  PMID: 25149077

Abstract

A prior report found unusually high rates of performance validity test (PVT) failure in undergraduate research participants (31%–56%). The present study examined 110 undergraduate volunteers in three conditions (positive, neutral, or negative demand characteristics) in either an easy to hard or a hard to easy progression of neuropsychological tests using the Word Memory Test PVT. Neither demand characteristics nor test order had a substantial effect on test performance, and only a 6.4% failure rate was found on the PVT. These results suggest that neuropsychological testing experiments are completed faithfully by the vast majority of college undergraduates, although excluding the small number of participants failing PVTs would strengthen the internal validity of most studies.

Keywords: Effort, Undergraduate students, Performance validity tests

Introduction

An, Zakzanis, and Joordens (2012) reported that 30.8%–55.6% of college undergraduates failed at least one of three performance validity tests (PVTs), such as the Test of Memory Malingering, Dot Counting Test, or Victoria Symptom Validity Test, in a study of introductory psychology class volunteers who received extra credit for participation. The study included other neuropsychological and psychological measures, such as the Raven's Advanced Progressive Matrices, Delis–Kaplan Executive Function System Color-Word Interference tasks, the Wechsler Memory Scale-Third Edition, and the Beck Depression Inventory (BDI). To our knowledge, this has been the only study investigating suboptimal effort in this population. Effort test failure correlated with poor performance on neuropsychological testing and, although the repeat-testing sample was small, those who failed effort testing in the first session tended to fail again in the second session. The authors concluded that such high rates of poor effort call into question the validity of research using undergraduate college volunteers.

The results of An and colleagues (2012) run counter to work in our laboratory, where over 170 undergraduate volunteers have been tested as controls using either the Word Memory Test (WMT; Green, 2003) or the Medical Symptom Validity Test (MSVT; Green, 2004) over the past 5 years in studies examining simulation of various disorders (e.g., attention deficit hyperactivity disorder, learning disability, and mild traumatic brain injury). In the control groups of these studies, we found an overall failure rate of 2.1%, with a range of 1.9%–2.6% across four studies. As a result, we designed the current study to test failure on the WMT (Green, 2003), which is generally held to be the most sensitive measure of effort currently available (Gervais, Rohling, Green, & Ford, 2004; Green, 2007; O'Bryant & Lucas, 2006; Tan, Slick, Strauss, & Hultsch, 2002). We included different demand characteristics in the experimental situation to examine whether differential treatment of research participants accounted for differences in PVT scores. Additionally, we used a rationally derived easy to hard test progression with half of the participants, and a hard to easy progression with the other half, to examine whether these variables affected the failure rate of undergraduate volunteers. Considering the low rates of insufficient effort (∼2.1%) in undergraduate volunteers tested on the WMT or MSVT in our laboratory, and the review of demand characteristics by McCambridge, de Bruin, and Witton (2012), we hypothesized that the positive and neutral conditions would show similar failure rates and that the negative condition would show a higher failure rate, regardless of test progression.

Methods

A sample of 110 psychology undergraduate volunteers who received extra credit for voluntary participation was included in the study and divided into three conditions: positive, neutral, and negative. Table 1 displays the participants' demographic information per condition. After signing the informed consent, participants were asked whether they had any pre-existing neurological and/or psychiatric conditions; participants who reported any such condition, except those previously or currently treated for depression, anxiety disorders, and/or Attention Deficit Hyperactivity Disorder (ADHD), were excluded (Table 1). Five participants were excluded in total. There were no significant differences between the three experimental conditions with regard to age, gender, race, education level, and psychiatric disorder. Participants self-registered for time slots through an online experiment management system (SONA Systems, Ltd, Version 2.72; Tallinn, Estonia) that was only available to students enrolled in psychology classes. The time slots available on SONA were predetermined according to the research assistants' schedules. Therefore, group assignment was contingent upon the participants' self-registered time slots and the research assistants' schedules, although the participants were blind to the conditions. In the positive condition, the research assistant greeted participants on arrival with a smile and an accommodating manner and used optimistic encouragement throughout the testing. In contrast, the negative condition consisted of a brusque, but not rude, manner and a 15-min wait before commencing testing. The research assistant had an expedient and somewhat harried style of test administration without anticipatory accommodation to subject needs. The neutral condition used a professional interpersonal style that was neither positive nor negative in approach.
In addition, half of the participants in each condition were given an easy to hard order of tests and the other half were given tests in the reverse order. The battery of tests included: Beck Anxiety Inventory (BAI; Beck & Steer, 1993); BDI-Second Edition (BDI-II; Beck, Steer, & Brown, 1996); Stroop Word Reading and Color-Word pages (Stroop-A and -C; Golden, 1978); Mini-International Personality Item Pool (Mini-IPIP; Donnellan, Oswald, Baird, & Lucas, 2006); Shipley-2 Abstraction and Vocabulary (Shipley-2-A and -V; Shipley, Gruber, Martin, & Klein, 2009); Trail Making Test Parts A and B (TMT-A and -B; Army Individual Test Battery, 1944); Wide Range Achievement Test-Fourth Edition Word Reading and Sentence Comprehension (WRAT-4-WR and -SC; Wilkinson & Robertson, 2006); and Word Memory Test Immediate Recognition (WMT-IR), Delayed Recognition (WMT-DR), Consistency (WMT-CNS), Multiple Choice (WMT-MC), Paired Associates (WMT-PA), Free Recall (WMT-FR), and Long Delayed Free Recall (WMT-LDFR). Apart from the WMT, whose subtests were positioned to accommodate the timing of the test's delay subtests, the administration order from easy to hard was: Stroop-A, TMT-A, Shipley-2-V, WMT-IR, BAI, BDI-II, Mini-IPIP, WRAT-4-WR, WRAT-4-SC, WMT-DR, WMT-CNS, WMT-MC, WMT-PA, WMT-FR, Stroop-C, TMT-B, Shipley-2-A, and WMT-LDFR. The administration order from hard to easy was: Shipley-2-A, TMT-B, Stroop-C, WMT-IR, BAI, BDI-II, Mini-IPIP, WRAT-4-SC, WRAT-4-WR, WMT-DR, WMT-CNS, WMT-MC, WMT-PA, WMT-FR, Shipley-2-V, TMT-A, Stroop-A, and WMT-LDFR.
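
The relationship between the two administration orders can be checked with a short sketch. Comparing the two lists as printed, the WMT subtests and the three self-report questionnaires (BAI, BDI-II, Mini-IPIP) occupy the same slots in both orders, and only the remaining cognitive tests are reversed. The set `FIXED` and the helper `reverse_movable` are names introduced here for illustration; the questionnaire observation comes from comparing the lists, not from a statement by the authors.

```python
# Tests that keep the same slot in both orders as listed in the Methods.
# WMT subtests are fixed by the test's delay timing; the questionnaires are
# fixed by observation of the two lists (an inference, not an author statement).
FIXED = {
    "WMT-IR", "WMT-DR", "WMT-CNS", "WMT-MC", "WMT-PA", "WMT-FR", "WMT-LDFR",
    "BAI", "BDI-II", "Mini-IPIP",
}

EASY_TO_HARD = [
    "Stroop-A", "TMT-A", "Shipley-2-V", "WMT-IR", "BAI", "BDI-II", "Mini-IPIP",
    "WRAT-4-WR", "WRAT-4-SC", "WMT-DR", "WMT-CNS", "WMT-MC", "WMT-PA", "WMT-FR",
    "Stroop-C", "TMT-B", "Shipley-2-A", "WMT-LDFR",
]

HARD_TO_EASY = [
    "Shipley-2-A", "TMT-B", "Stroop-C", "WMT-IR", "BAI", "BDI-II", "Mini-IPIP",
    "WRAT-4-SC", "WRAT-4-WR", "WMT-DR", "WMT-CNS", "WMT-MC", "WMT-PA", "WMT-FR",
    "Shipley-2-V", "TMT-A", "Stroop-A", "WMT-LDFR",
]


def reverse_movable(order, fixed):
    """Reverse the tests not in `fixed`, leaving fixed tests in their slots."""
    movable = [test for test in order if test not in fixed]
    reversed_movable = iter(reversed(movable))
    return [test if test in fixed else next(reversed_movable) for test in order]


derived = reverse_movable(EASY_TO_HARD, FIXED)
print(derived == HARD_TO_EASY)
```

Under these assumptions the derived order matches the hard to easy list exactly, which suggests the order manipulation applied only to the eight performance-based cognitive tests.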

Table 1.

Demographics and psychiatric disorders by condition

Demographics            Positive   Neutral   Negative   Total (%)
N                       35         39        36         110 (100%)
Age
  Mean                  22.4       22.6      24.4       23.1
  Range                 18–48      18–53     18–51      18–53
Gender
  Male                  7          9         7          23 (21%)
  Female                28         30        29         87 (79%)
Race
  Asian                 4          4         3          11 (10%)
  Black                 4          4         5          13 (12%)
  Hispanic              2          3         4          9 (8%)
  White                 23         26        23         72 (65%)
  Other                 2          2         1          5 (5%)
Education level
  Mean                  14.4       14.2      14.4       14.3
  Range                 12–17      12–17     12–17      12–17
Psychiatric disorders
  Anxiety               4          2         2          8 (7%)
  Depression            0          2         3          5 (5%)
  ADHD                  1          1         1          3 (3%)
  Comorbidity           1          2         3          6 (5%)

Notes: Education level is provided in years of education completed. Comorbidity refers to participants who reported having been diagnosed with both depression and anxiety.

Results

Table 2 displays group performance across all measures used in the study. Order of test presentation had no effect on any variable according to multiple one-way ANOVAs (p > .05), so results were collapsed across order. The lone significant difference occurred for the Stroop-C task, where the negative condition performed more poorly than the positive condition (p < .05), while neither differed from the neutral condition. As evident in the table, condition did not affect performance on the WMT, with only 6.4% (n = 7) of participants failing according to the criterion of below cut-off performance on any of the five effort indices (WMT-IR, -DR, -CNS, -MC, and -PA per the WMT manual: Green, 2003). Failures occurred roughly equally across conditions: two failures each in the positive and neutral conditions, and three failures in the negative condition, with scores as shown in Table 3.
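
As a quick arithmetic check, the overall and per-condition failure rates can be recomputed from the counts reported above and the group sizes in Table 1. This is a minimal sketch; the variable and function names are introduced here for illustration, and the per-condition percentages are derived values that the article itself does not report.

```python
# Group sizes from Table 1 and WMT failure counts from the Results.
N_BY_CONDITION = {"positive": 35, "neutral": 39, "negative": 36}
FAILURES_BY_CONDITION = {"positive": 2, "neutral": 2, "negative": 3}


def failure_rate(failures, n):
    """Return the failure rate as a percentage."""
    return 100.0 * failures / n


# Per-condition rates (derived here, not reported in the article).
rates = {
    condition: round(failure_rate(FAILURES_BY_CONDITION[condition], n), 1)
    for condition, n in N_BY_CONDITION.items()
}

# Overall rate: 7 failures out of 110 participants.
overall = round(
    failure_rate(sum(FAILURES_BY_CONDITION.values()), sum(N_BY_CONDITION.values())),
    1,
)

print(rates)
print(overall)  # 6.4
```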

Table 2.

Test performance means and standard errors by condition

Test            Positive mean (SE)   Neutral mean (SE)   Negative mean (SE)
TMT-A           23 (1.4)             23 (1.3)            23 (1.4)
TMT-B           64 (3.5)             61 (3.3)            62 (3.4)
Shipley-V       29 (0.8)             27 (0.8)            29 (0.8)
Shipley-A       14 (0.5)             14 (0.5)            14 (0.5)
Stroop-A        100 (3.2)            94 (3.1)            101 (3.2)
Stroop-C        54 (2.1)a            47 (2.0)ab          48 (2.1)b
WMT-IR          98 (0.9)             99 (0.8)            98 (0.9)
WMT-DR          98 (1.7)             98 (1.7)            96 (1.7)
WMT-CNS         97 (1.1)             97 (1.1)            97 (1.1)
WMT-MC          96 (2.3)             92 (2.3)            93 (2.4)
WMT-PA          95 (2.0)             93 (2.0)            93 (2.0)
WMT-FR          69 (2.8)             64 (2.7)            60 (2.8)
WMT-LDFR        70 (2.8)             66 (2.7)            62 (2.8)
WRAT-4-WR       45 (0.9)             45 (0.8)            44 (0.9)
WRAT-4-SC       45 (0.9)             43 (0.8)            44 (0.9)
BAI             8 (1.3)              10 (1.2)            10 (1.3)
BDI-II          8 (1.4)              11 (1.3)            9 (1.4)
Mini-IPIP-E     12 (0.5)             12 (0.4)            11 (0.5)
Mini-IPIP-A     13 (0.5)             14 (0.5)            13 (0.5)
Mini-IPIP-C     13 (0.5)             14 (0.5)            13 (0.5)
Mini-IPIP-N     11 (0.5)             11 (0.4)            11 (0.5)
Mini-IPIP-O     11 (0.5)             11 (0.5)            11 (0.5)

Notes: Superscript letters signify statistical significance between scores not sharing the same superscript letters, F(2, 106) = 3.03, p = .05, R² = .05. Group means and standard errors (SEs) are shown for each test. TMT-A = Trail Making, Part A raw seconds; TMT-B = Trail Making, Part B raw seconds; Shipley-V = Shipley-2 Vocabulary raw correct; Shipley-A = Shipley-2 Abstraction raw correct; Stroop-A = Stroop Word Reading page raw number count; Stroop-C = Stroop Color-Word page raw number count; WMT-IR = Word Memory Test Immediate Recognition; WMT-DR = Word Memory Test Delayed Recognition; WMT-CNS = Word Memory Test Consistency; WMT-MC = Word Memory Test Multiple Choice; WMT-PA = Word Memory Test Paired Associates; WMT-FR = Word Memory Test Free Recall; WMT-LDFR = Word Memory Test Long Delayed Free Recall; WRAT-4-WR = Wide Range Achievement Test-Fourth Edition Word Reading; WRAT-4-SC = Wide Range Achievement Test-Fourth Edition Sentence Comprehension; BAI = Beck Anxiety Inventory; BDI-II = Beck Depression Inventory-Second Edition; Mini-IPIP-E = Mini-International Personality Item Pool Extraversion; Mini-IPIP-A = Mini-International Personality Item Pool Agreeableness; Mini-IPIP-C = Mini-International Personality Item Pool Conscientiousness; Mini-IPIP-N = Mini-International Personality Item Pool Neuroticism; and Mini-IPIP-O = Mini-International Personality Item Pool Openness.
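
The effect size in the note above can be recovered from the F statistic and its degrees of freedom. For a one-way ANOVA, F = (SS_between / df_between) / (SS_within / df_within), so the proportion of variance explained is R² = (F × df_between) / (F × df_between + df_within). The sketch below applies that identity to the reported values; the function name is introduced here for illustration and this is not the authors' computation.

```python
def r_squared_from_f(f_value, df_between, df_within):
    """Proportion of variance explained in a one-way ANOVA, from F and its dfs.

    Uses SS_between / SS_within = F * df_between / df_within, so that
    R^2 = SS_between / (SS_between + SS_within).
    """
    explained = f_value * df_between
    return explained / (explained + df_within)


# Reported Stroop-C test: F(2, 106) = 3.03
r2 = r_squared_from_f(3.03, 2, 106)
print(round(r2, 2))  # 0.05
```

The recomputed value rounds to .05, which agrees with the R² given in the note.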

Table 3.

Scores by condition for those seven subjects who failed the WMT

IR     DR     CNS    MC     PA     Condition
58     100    58     85     90     Positive
80     65     75     35     35     Positive
90     80     75     60     55     Neutral
93     88     80     85     75     Neutral
100    95     95     75     55     Negative
88     98     85     50     55     Negative
88     80     78     50     50     Negative

Notes: IR = Immediate Recognition; DR = Delayed Recognition; CNS = Consistency; MC = Multiple Choice; PA = Paired Associates; FR = Free Recall; LDFR = Long Delayed Free Recall.

Discussion

As noted in McCambridge and colleagues (2012), there are few studies that examine demand characteristics in non-laboratory settings (the authors found only one experimental and six observational studies). Thus, they concluded that this important and well-known effect is poorly understood in situations such as clinical neuropsychological testing. Contrary to our hypothesis that the negative condition would exhibit a higher rate of effort failure, the results of the present study suggested that PVT failure, as measured by the WMT, is similar across the three experimental conditions (positive, neutral, and negative) and is relatively low, as found in other studies involving undergraduate volunteers conducted in our laboratory. Specifically, two conclusions are suggested. First, undergraduate participants in our laboratory rarely fail the WMT when given standard instructions. Second, neither experimenter demeanor nor the ordering of tests by difficulty affected WMT performance in the current results. We believe the WMT to be robust to standard clinical conditions in neuropsychological evaluations and research studies.

The present study found only a 6.4% WMT failure rate, which is in direct contrast to the enormous failure rates in the An and colleagues study (31%–56%). This large discrepancy is difficult to reconcile by methodological differences across the two studies, and we believe the An and colleagues results to be aberrant. For example, we cannot attribute such differences to differing types and numbers of PVTs since the WMT has generally been found to be the most sensitive at high levels of specificity across many different studies using a wide range of populations (Gervais et al., 2004; Green, 2003; Green, Montijo, & Brockhaus, 2011; Tan et al., 2002). It is also noted that reported base rates of malingering for non-litigant cases fall between 7.10% and 11.56% (Mittenberg, Patton, Canyock, & Condit, 2002), a range much closer to 6% than to one-third to more than one-half of the subjects. Therefore, we recommend that this discrepancy in effort failure be reconciled in further work. Specifically, while demand characteristics and test order do not seem to be substantial moderators of PVT failure according to the present results, further studies exploring these issues in more detail would be useful. Additionally, an aberrant sampling distribution is possible since An and colleagues included only 36 participants. Future studies should include larger sample sizes. Finally, it is possible that different strategies for feigning deficits may be detected by using different PVT instruments, as was done in the An and colleagues study. Future work should include a variety of PVTs that might better detect differing feigning strategies; however, PVTs that are included should have demonstrated high sensitivity when specificity is held at 90% or better. Future work should also address the limitations of the present study, described below.
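
The point about sample size can be illustrated by looking at how the uncertainty around an observed failure rate depends on n. The sketch below computes a 95% Wilson score interval, which is a choice made here for illustration and not an analysis reported by the authors. It uses the present study's 7 failures out of 110, and, purely as a hypothetical comparison, 2 failures out of 36 (a similar proportion at the sample size mentioned above; these are not data from An and colleagues).

```python
import math


def wilson_interval(failures, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = failures / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Present study: 7 WMT failures among 110 participants.
low_110, high_110 = wilson_interval(7, 110)

# Hypothetical comparison only: a similar proportion with n = 36.
low_36, high_36 = wilson_interval(2, 36)

print(f"n=110: {100 * low_110:.1f}% to {100 * high_110:.1f}%")
print(f"n=36:  {100 * low_36:.1f}% to {100 * high_36:.1f}%")
```

Under these assumptions the interval for the present sample runs from roughly 3% to 13%, and the interval for the smaller hypothetical sample is wider, which is consistent with the recommendation for larger samples.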

Limitations of the present study include the lack of a post-experiment questionnaire to determine the participants' perceptions of the positive, negative, and neutral demand characteristics. Likewise, the perceived difficulty of the test order was not empirically determined for the present study.

Acknowledgements

We thank the psychology undergraduate volunteers who participated in the study. Special thanks to our research assistants, Emily Kennedy-Hettwer, Ashten Morth, Elizabeth Peters, Blake Hummer, Abbey Van Boxtel, Erin Giese, and Olivia Harmelink, who helped with data collection and project management.

References

  1. An K. Y., Zakzanis K. K., Joordens S. Conducting research with non-clinical healthy undergraduates: Does effort play a role in neuropsychological test performance? Archives of Clinical Neuropsychology. 2012;27(8):849–857. doi: 10.1093/arclin/acs085.
  2. Army Individual Test Battery. Manual of directions and scoring. Washington, DC: War Department, Adjutant General's Office; 1944.
  3. Beck A. T., Steer R. A. Beck Anxiety Inventory manual. San Antonio, TX: The Psychological Corporation; 1993.
  4. Beck A. T., Steer R. A., Brown G. K. Manual for the Beck Depression Inventory-II. San Antonio, TX: The Psychological Corporation; 1996.
  5. Donnellan M. B., Oswald F. L., Baird B. M., Lucas R. E. The mini-IPIP scales: Tiny-yet-effective measures of the Big Five factors of personality. Psychological Assessment. 2006;18:192–203. doi: 10.1037/1040-3590.18.2.192.
  6. Gervais R. O., Rohling M. L., Green P., Ford W. A comparison of WMT, CARB, and TOMM failure rates in non-head injury disability claimants. Archives of Clinical Neuropsychology. 2004;19:475–487. doi: 10.1016/j.acn.2003.05.001.
  7. Golden C. J. Stroop color and word test: A manual for clinical and experimental uses. Chicago, IL: Stoelting; 1978.
  8. Green P. Green's word memory test for Microsoft Windows: User's manual. Edmonton, Canada: Green's Publishing Inc; 2003.
  9. Green P. Green's medical symptom validity test (MSVT) for Microsoft Windows: User's manual. Edmonton, Canada: Green's Publishing Inc; 2004.
  10. Green P. Spoiled for choice: Making comparisons between forced-choice effort tests. In: Boone K. B., editor. Assessment of feigned cognitive impairment. New York: Guilford Press; 2007. pp. 50–77.
  11. Green P., Montijo J., Brockhaus R. High specificity of the Word Memory Test and Medical Symptom Validity Test in groups with severe verbal memory impairment. Applied Neuropsychology. 2011;18(2):86–94. doi: 10.1080/09084282.2010.523389.
  12. McCambridge J., de Bruin M., Witton J. The effects of demand characteristics on research participant behaviours in non-laboratory settings: A systematic review. PLoS ONE. 2012;7(6):e39116. doi: 10.1371/journal.pone.0039116.
  13. Mittenberg W., Patton C., Canyock E. M., Condit D. C. Base rates of malingering and symptom exaggeration. Journal of Clinical and Experimental Neuropsychology. 2002;24(8):1094–1102. doi: 10.1076/jcen.24.8.1094.8379.
  14. O'Bryant S. E., Lucas J. A. Estimating the predictive value of the Test of Memory Malingering: An illustrative example for clinicians. The Clinical Neuropsychologist. 2006;20:533–540. doi: 10.1080/13854040590967568.
  15. Shipley W. C., Gruber C. P., Martin T. A., Klein A. M. Shipley-2 manual. Los Angeles: Western Psychological Services; 2009.
  16. Tan J., Slick D., Strauss E., Hultsch D. F. Malingering strategies on symptom validity tests. The Clinical Neuropsychologist. 2002;16(4):495–505. doi: 10.1076/clin.16.4.495.13909.
  17. Wilkinson G. S., Robertson G. J. Wide range achievement test 4 professional manual. Lutz, FL: Psychological Assessment Resources; 2006.

Articles from Archives of Clinical Neuropsychology are provided here courtesy of Oxford University Press
