Abstract
It has been well documented that IQ scores calculated using Canadian norms are generally 2–5 points lower than those calculated using American norms on the Wechsler IQ scales. However, recent findings have demonstrated that the difference may be significantly larger for individuals with certain demographic characteristics, and this has prompted discussion about the appropriateness of using the Canadian normative system with a clinical population in Canada. This study compared the interpretive effects of applying the American and Canadian normative systems in a clinical sample. We used a multivariate analysis of variance (MANOVA) to calculate differences between IQ and Index scores in a clinical sample, and mixed model analyses of variance (ANOVAs) to assess the pattern of differences across age and ability level. As expected, Full Scale IQ scores calculated using Canadian norms were systematically lower than those calculated using American norms, but differences were significantly larger for individuals classified as having extremely low or borderline intellectual functioning when compared with those who scored in the average range. Clinically different conclusions for up to 52.8% of patients based on these discrepancies highlight a unique dilemma facing Canadian clinicians, and underscore the need for caution when choosing a normative system with which to interpret WAIS-IV results in the context of a neuropsychological test battery in Canada. Based on these findings, we offer guidelines for best practice for Canadian clinicians when interpreting data from neuropsychological test batteries that include different normative systems, and suggestions to assist with future test development.
Keywords: Assessment, Intelligence, Norms/normative studies, Professional issues
Introduction
The Wechsler Intelligence Scales (Wechsler, 1955, 1981, 1996, 1997, 2001, 2008a, 2008b) are the most widely used measures of intellectual abilities in adults and children (Nelson, Canivez, & Watkins, 2013; Watkins, Campbell, Nieberding, & Hallmark, 1995). Like many psychometric tests, these scales were originally normed on an American population, and Canadian psychologists used American normative data to derive scores on these tests of cognitive ability. However, questions as to whether normative data based on an American sample were appropriate for use with Canadian patients prompted the development of Canadian norms on more recent versions of the Wechsler scales (Bowden, Lange, Weiss, & Saklofske, 2008; Wechsler, 2001, 2008a). Historically, Canadians have scored on average 2–5 points higher than Americans on the Wechsler scales (Wechsler, 1996, 2001, 2004), highlighting the need for separate Canadian norms on these tests. That score differences are reduced when Canadian and American samples are matched on key demographic characteristics, such as educational attainment and ethnicity, supports the notion that the composition of the Canadian population is distinct from that of the United States, and further supports the use of Canadian norms when testing Canadian individuals.
The Wechsler IQ scales are widely considered to be the gold standard for measurement of overall intellectual ability. As such, these tests are often included as part of a neuropsychological assessment, and may be used to characterize a person's current level of cognitive, behavioral, and social–emotional functioning, or to monitor changes in cognitive status across repeat assessments (Costa, 1988; Lezak, Howieson, Bigler, & Tranel, 2012; Stebbins, 2007). Neuropsychological testing is routinely used to aid in important decisions regarding capacity for self-care (Galski, Tompkins, & Johnston, 1998), employability, or allocation of benefits such as employment insurance, education funding, or aid for individuals with disabilities (Beiser & Gotowiec, 2000; Kush & Watkins, 2007). Within the context of a neuropsychological test battery, the Wechsler IQ scales may provide an overall measure of cognitive competence and may be used as an interpretive context for other test results (Blyler, Gold, Iannone, & Buchanan, 2000). For example, a commonly used cut-off for a diagnosis of Intellectual Disability is a Full Scale IQ (FSIQ) score of 70 or below (American Psychiatric Association, 2013); and in Canada, having at least average abilities essential for thinking and/or reasoning is a major component of the current recommendations for a diagnosis of Learning Disability (Harrison, 2005; Harrison & Holmes, 2012; Kozey & Siegel, 2008; Learning Disabilities Association of Canada, 2002). In some cases, the provision of government aid such as financial or academic assistance may be contingent on such diagnoses (Beck & Davidson, 2001; Kush & Watkins, 2007; Lunsky, Garcin, Morin, Cobigo, & Bradley, 2007).
To date, few studies have examined the clinical implications of using Canadian versus American norms for only certain tests within a neuropsychological battery. In a study examining differences in scores on the WAIS-III in a neuropsychiatric population, Iverson, Lange, and Viljoen (2006) found a mean difference of 3.9 FSIQ points (SD = 2.6) when Canadian versus American norms were used, and differences across index scores ranged from 2.5 (SD = 3.2) for the Verbal Comprehension Index to 4.8 (SD = 2.0) for the Perceptual Reasoning Index. Despite these seemingly small discrepancies, they found that clinically different conclusions were reached in 13%–21% of cases when a different normative system was used to calculate FSIQ. More recently, Harrison and colleagues (Harrison, Armstrong, Harrison, Lange, & Iverson, 2014; Harrison, Holmes, Silvestri, & Armstrong, 2015a) have reported even larger differences across all WAIS-IV index and subtest scores that were calculated using Canadian versus American norms in clinical samples of 432 and 861 students who had been referred for psycho-educational or neuropsychological assessment, with a mean FSIQ difference of 7.5 (SD = 2.3). Significantly larger discrepancies were reported as both FSIQ and age decreased.
The findings by Harrison and colleagues (2014, 2015a) parallel observations by Canadian clinicians that larger discrepancies between Canadian and American scores seem to be generated using the WAIS-IV compared with the WAIS-III, and echo previously raised concerns regarding the accuracy of the Canadian norms derived for the Wechsler scales when measuring intellectual functioning for individuals at the extreme ends of the range (Gordon & Duff, 2010; Whitaker, 2008; Whitaker & Wood, 2008). The Wechsler group does acknowledge a larger discrepancy, of as much as 8 points, for individuals under the age of 40 (Wechsler, 2008a); they suggest that this may be due to social, economic, and educational differences between Canada and the United States. However, while they state that these large differences are “difficult to overlook” (Wechsler, 2008a, p. 52), they certainly appear to have been overlooked with respect to practical application of the test. Furthermore, the WAIS-IV manual does not report an inverse relationship between the average size of the difference and FSIQ score. Although this phenomenon was acknowledged following Harrison and colleagues' initial study by Pearson Canada (2014), who attribute the discrepancy to a relatively homogeneous population with a negative skew compared with the American population, the acknowledgment itself does not address the central problem facing Canadian clinicians.
The Principles for Fair Student Assessment Practices for Education in Canada (Joint Advisory Committee of the Canadian Education Association, 1993) and the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014) state that the best method of assessing an individual's cognitive abilities is through the use of tests that have been normed on a sample drawn from the same population as the individual being assessed. Unfortunately, however, few test developers have followed in the footsteps of the Wechsler group, and in order to conduct a thorough neuropsychological assessment, Canadian clinicians are forced to use a battery that contains tests normed only on an American population (Lezak et al., 2012). Even within the Wechsler scales, the commonly used Wechsler Memory Scale-Fourth Edition (WMS-IV; Wechsler, 2009) has not been standardized on a Canadian population. Thus, Canadian psychologists are faced with the unique challenge of deciding how best to interpret data obtained through the use of a comprehensive test battery using norms derived from nonequivalent populations.
In the Compendium of Neuropsychological Tests, Strauss, Sherman, and Spreen (2006) acknowledge that while using American norms may not provide the most appropriate representation of individual performance, they may be preferable when comparing performance across assessment tools without Canadian norms. Likewise, Pearson Canada (2014) recommended the use of a consistent normative data set across test instruments when Canadian norms are not available for all tests. However, in clinical practice, this is not always possible. For example, a neuropsychologist will frequently receive a referral for a patient who has already been administered the WAIS-IV by a clinical psychologist. As repeat administration of this test within a short time frame would invalidate the test, the neuropsychologist has no choice but to incorporate the results into a battery. Based on the above arguments, in such cases, it would be recommended that the clinical psychologist use Canadian norms, but the neuropsychologist use American norms as it is unlikely that a comprehensive neuropsychological assessment comprised only of tests with Canadian norms would be employed. Furthermore, the use of a different scoring system across Canadian clinicians places a significant limitation on interpretability of repeat assessments, particularly when few clinicians report their choice of normative system in assessment reports.
On the other hand, both Miller and colleagues (2015) and Kirkegaard (2015) cite the robust finding that Canadians demonstrate higher average raw score performance than Americans as grounds for dismissing the concerns that clinicians have raised regarding the large differences between normative data sets using the WAIS-IV when compared with the WAIS-III. As outlined in the WAIS-IV Canadian manual (Wechsler, 2008a), one rationale behind development of Canadian norms was the concern that the use of American norms could lead to underdiagnosis and limit access to needed supports for Canadian patients. However, although Miller and colleagues (2015) advocate for the use of Canadian norms for the WAIS-IV whenever possible, they do not provide any guidelines for how best to incorporate test scores into a comprehensive battery.
Thus, the challenge facing Canadian clinicians is to decide whether it is more fair to assess an individual using a consistent set of criteria as suggested by Pearson Canada (2014), or whether to use the available norms derived from the same population as the individual being assessed as suggested by Miller and colleagues (2015).
The goals of this study were to replicate previous results showing large differences in FSIQ and primary index scores that result from the use of the Canadian versus the American normative system within a clinical population, and to examine the extent to which these differences vary as a function of ability level and age group. However, beyond simply providing further evidence to support an already well-established finding, we seek to clarify the central issues that Canadian clinicians face from a practical standpoint and provide recommendations for clinicians who are currently using standardized tests in their practice, as well as for test developers to assist with norm development in future versions of standardized tests. Based on both clinical observation and previous research (Harrison et al., 2014, 2015a), we hypothesize that the magnitude of difference between FSIQ and index scores calculated using Canadian and American norms will be greater for those with FSIQ and index scores at the lower end of the spectrum than for those in the average or above range; we further hypothesize that the difference between FSIQ and index scores will be greater for younger than for older individuals.
Methods
Participants
The clinical sample in this study comprised 338 individuals aged 18–61 (M = 30.93, SD = 10.87; 50% female) in Alberta and in the Yukon Territory who received routine neuropsychological assessment through a private practice of psychologists in the province of Alberta. All clients who were assessed through this practice were invited to sign a consent form releasing their assessment data for research purposes. An anonymized database of test scores and relevant demographic information was compiled in 2010 by a graduate and an undergraduate student from the files of individuals who had received neuropsychological testing, including the WAIS-IV, between 2008 and 2010. Demographic information such as age, gender, clinical diagnoses, ethnicity, and geographical dwelling were used to describe the sample, and scores obtained from the WAIS-IV were included as variables in this study. Only those clients who consented to the use of their test results for research purposes were included in the database.
Within this clinical sample, FSIQ scores did not differ significantly across age group. Years of formal education ranged from 3 to 16 years (M = 10.78, SD = 1.86). Ethnicity of the sample was 78% Caucasian, 13% First Nations or Métis, 4% other ethnicity (e.g., Asian or African), and 5% unknown; 95% of the sample reported speaking English as a first language. Twenty-eight percent of the sample reported growing up in a city, 35% in a small town, 10% in a rural environment, 3% on a reserve, 14% reported being transient or moving regularly, and for 10%, this information was not known.
The majority of participants in this clinical sample (97%) had received between one and six current comorbid psychiatric, developmental, or serious medical diagnoses that may affect cognitive functioning. Of the 338 participants in the sample, 42% had been diagnosed with a depressive disorder, 38% with an anxiety disorder, 36% with attention deficit hyperactivity disorder (ADHD), 40% with a learning disability, 38% with a substance abuse or dependence disorder, 5% with an active psychotic disorder, 14% with a personality disorder, 9% with a developmental disorder (e.g., autism spectrum disorder), 11% with fetal alcohol spectrum disorder, 22% with a medical diagnosis (e.g., epilepsy, cerebral palsy, brain injury), and 15% reported experiencing chronic pain.
Measure
Wechsler Adult Intelligence Scale-IV (WAIS-IV)
FSIQ and index scores were measured using the WAIS-IV (Wechsler, 2008a). This individually administered test of intelligence was standardized both on a sample of 2,200 American adults stratified according to the 2005 U.S. Census and on a sample of 680 Canadian adults stratified according to the 2006 Canadian Census. This measure is composed of 10 subscales and yields FSIQ scores with reported internal consistencies ranging from 0.97 to 0.98 for the American normed data and 0.96 to 0.98 for the Canadian normed data, as well as four Index scores (VCI, PRI, WMI, and PSI) with internal consistencies ranging from 0.90 to 0.96 and test-retest correlations ranging from .84 to .95 (Wechsler, 2008a, 2008b). Each of these scores is normed to have a mean of 100 and a standard deviation of 15. Several studies have demonstrated the validity of the WAIS-IV (Bowden, Saklofske, & Weiss, 2011; Wechsler, 2008b), and scores on this measure are strongly correlated with scores on the WAIS-III (rs ranging from .85 to .94; Wechsler, 2008b).
Procedure
The WAIS-IV uses age-based norms in calculating scaled scores. Based on the age categories used in the WAIS-IV manuals (Wechsler, 2008a, 2008b), participants were divided into groups consisting of those aged 18–19, 20–24, 25–29, 30–34, 35–44, 45–54, and 55–64. Individuals were also classified, based on both their calculated American and Canadian FSIQ scores, into the following levels of ability as outlined in the WAIS-IV manuals (Wechsler, 2008a, 2008b): Extremely low (FSIQ below 70), Borderline (FSIQ between 70 and 79), Low Average (FSIQ between 80 and 89), Average (Low; FSIQ between 90 and 99), Average (High; FSIQ between 100 and 109), High Average (FSIQ between 110 and 119), Superior (FSIQ between 120 and 129), and Very Superior (FSIQ of 130 and above). Although FSIQ scores that fall between 90 and 109 are considered Average by the test developers (Wechsler, 2008a, 2008b), this category was divided into Average (Low) and Average (High) for the purpose of the present study for two reasons: first, to allow for an equal spread of scores within each classification level, and second, to reduce the discrepancy in sample size across groups.
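The grouping rules described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the scoring procedure used in the study: the function names `classify_fsiq` and `age_group` are hypothetical, the cut-offs are taken directly from the ranges listed in this section, and age labels use a plain hyphen.

```python
from bisect import bisect_right

# Lower bound of every ability band above "Extremely Low",
# taken from the ranges listed in the Procedure section.
_BAND_FLOORS = [70, 80, 90, 100, 110, 120, 130]
_BAND_LABELS = [
    "Extremely Low",   # below 70
    "Borderline",      # 70-79
    "Low Average",     # 80-89
    "Average (Low)",   # 90-99
    "Average (High)",  # 100-109
    "High Average",    # 110-119
    "Superior",        # 120-129
    "Very Superior",   # 130 and above
]

# Inclusive (low, high) age bands used to group participants.
_AGE_BANDS = [(18, 19), (20, 24), (25, 29), (30, 34), (35, 44), (45, 54), (55, 64)]


def classify_fsiq(fsiq: int) -> str:
    """Return the ability label for a Full Scale IQ score (hypothetical helper)."""
    return _BAND_LABELS[bisect_right(_BAND_FLOORS, fsiq)]


def age_group(age: int) -> str:
    """Return the age band label, e.g. '35-44', for an age in years."""
    for low, high in _AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"age {age} is outside the 18-64 range covered here")
```

Because each participant has two FSIQ scores, a helper such as `classify_fsiq` would be applied twice per person, once to the American-normed score and once to the Canadian-normed score, so the same individual can land in two different bands.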
Dependent and Independent Variables
The dependent variables in this study were the standardized scores obtained for individual FSIQ, VCI, PRI, WMI, and PSI scales using both the American and Canadian norms, as well as difference scores between FSIQ scores calculated using American and Canadian normative systems. As FSIQ scores have been observed in clinical settings to be higher when calculated using American than Canadian norms, difference scores were calculated by subtracting the value of the scaled score calculated using Canadian norms from the value of the respective scaled score calculated using American norms (FSIQ_U.S. − FSIQ_CDN = Difference Score). The independent variables in this study included the age groups defined by the WAIS-IV manual (Wechsler, 2008a, 2008b) as well as the ability groups defined based on FSIQ scaled scores using Canadian and American norms (Wechsler, 2008a, 2008b).
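As a small worked illustration of the difference score defined above, the following sketch subtracts the Canadian-normed FSIQ from the American-normed FSIQ for each person and summarizes the result. The score pairs are hypothetical values invented for illustration, not data from this study, and `difference_scores` is an assumed helper name.

```python
from statistics import mean, median, stdev


def difference_scores(pairs):
    """Return FSIQ_US - FSIQ_CDN for each (american, canadian) score pair."""
    return [us - cdn for us, cdn in pairs]


# Hypothetical (American-norm, Canadian-norm) FSIQ pairs, for illustration only.
example_pairs = [(86, 78), (89, 78), (95, 88), (72, 63), (104, 100), (110, 106)]

diffs = difference_scores(example_pairs)

# Descriptive summary of the difference scores (sample SD, as is conventional).
summary = {"mean": mean(diffs), "sd": stdev(diffs), "median": median(diffs)}
print(diffs)
print(summary)
```

With this sign convention, a positive difference means the American norms produced the higher score, and a negative value would indicate the reverse.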
Data Analysis
Differences between Canadian and American standard scores on FSIQ, VCI, PRI, WMI, and PSI were examined using a multivariate analysis of variance (MANOVA) with each scale treated as a dependent measure and norm group as a between-groups factor. Evaluation of statistical significance incorporated a Bonferroni correction of the α-level (Tabachnick & Fidell, 2007).
Three mixed model analyses of variance (ANOVAs) were used to further assess the pattern of differences across Canadian and American normed standardized scores, with difference scores as within-groups factors, and age categories and ability level classifications for both Canadian and American norms as between-groups factors. As the participants in this study fell mainly in the Low Average to Borderline range, few participants were classified as High Average and above, and only three and five participants scored in the Superior to Very Superior ranges when FSIQ scores were calculated using Canadian and American norms, respectively. Simple t-tests revealed that participants classified as High Average using American norms did not differ from participants classified as Superior to Very Superior on IQ difference [t(19) = −1.57, p = .133] or age [t(19) = 0.609, p = .550], and that participants classified as High Average using Canadian norms did not differ from participants classified as Superior to Very Superior on IQ difference [t(9) = −0.723, p = .488] or age [t(9) = −0.138, p = .893]. Therefore, for the purpose of this study, participants with High Average to Very Superior classifications were treated as a single ability group, which we called High Average + (FSIQ ≥ 110).
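To make the logic of comparing difference scores across groups concrete, the sketch below computes a one-way between-groups F ratio by hand. This is a deliberate simplification of the mixed model ANOVAs described above, offered only as an illustration; the function name `one_way_anova_f` and the example groups are hypothetical.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = fmean(all_scores)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical FSIQ difference scores for three age groups (illustration only).
younger, middle, older = [8, 9, 10], [7, 8, 9], [4, 5, 6]
f_value, df1, df2 = one_way_anova_f([younger, middle, older])
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

A large F indicates that the group means differ by more than would be expected from the spread of scores within groups. In practice a statistics package would also supply the p-value and post hoc comparisons.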
Results
Standard Score Comparisons
Descriptive statistics, mean comparisons, and effect sizes for each of the standard scores obtained using the Canadian and American normative systems, including FSIQ, VCI, PRI, WMI, and PSI, are presented in Table 1. To evaluate differences in standard scores obtained using the Canadian and American normative systems, we conducted a MANOVA with standard scores as dependent variables and normative grouping as a between-group factor. There was a significant difference between scores obtained using the Canadian and American normative systems [Λ = 0.893, F(5, 670) = 15.97, p < .001, η2 = 0.107]. Subsequent ANOVAs indicated that there were significant differences between means for FSIQ [F(1, 676) = 47.60, p < .001, η2 = 0.066], VCI [F(1, 676) = 27.51, p < .001, η2 = 0.039], PRI [F(1, 676) = 23.67, p < .001, η2 = 0.034], WMI [F(1, 676) = 43.00, p < .001, η2 = 0.060], and PSI [F(1, 676) = 13.92, p < .001, η2 = 0.020]. Scaled scores calculated using American norms were consistently higher than those calculated using Canadian norms; effect size (η2) shows the amount of variance accounted for by each index, while controlling for other variables in the analysis (Cohen, 1988). Effect sizes were small to medium, with the largest effects found for FSIQ and WMI. Across the sample, the difference between FSIQ scores calculated using American versus Canadian norms ranged from −1 to 13 points. The average difference was 7.51 (SD = 2.50), with both a mode and a median of 8.
Table 1.
Descriptive statistics
Scale | American norms M | American norms SD | Canadian norms M | Canadian norms SD | F | p-value | η2 |
---|---|---|---|---|---|---|---|
FSIQ | 88.20 | 13.43 | 80.70 | 14.83 | 47.60 | <.001 | 0.066 |
VCI | 90.13 | 14.73 | 83.83 | 16.47 | 27.51 | <.001 | 0.039 |
PRI | 92.27 | 14.33 | 86.58 | 16.02 | 23.67 | <.001 | 0.034 |
WMI | 86.53 | 13.00 | 79.75 | 13.86 | 43.00 | <.001 | 0.060 |
PSI | 91.31 | 14.11 | 87.07 | 15.36 | 13.92 | <.001 | 0.020 |
Note: FSIQ = Full Scale IQ; VCI = Verbal Comprehension Index; PRI = Perceptual Reasoning Index; WMI = Working Memory Index; PSI = Processing Speed Index. η2 = partial eta squared.
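As a consistency check, the effect sizes in Table 1 can be recovered from the reported F statistics. The sketch below assumes the standard identity for partial eta squared, η2 = (F × df_effect) / (F × df_effect + df_error), and, for a multivariate effect with a single effect degree of freedom, η2 = 1 − Λ. The function name is hypothetical and the inputs are simply the values reported in the text.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, reported eta squared) for each scale, copied from Table 1.
reported = {
    "FSIQ": (47.60, 0.066),
    "VCI": (27.51, 0.039),
    "PRI": (23.67, 0.034),
    "WMI": (43.00, 0.060),
    "PSI": (13.92, 0.020),
}

for scale, (f_value, eta_reported) in reported.items():
    eta = partial_eta_squared(f_value, df_effect=1, df_error=676)
    print(f"{scale}: computed {eta:.3f}, reported {eta_reported:.3f}")

# Multivariate effect size from Wilks' lambda (single effect df assumed).
wilks_lambda = 0.893
multivariate_eta = 1 - wilks_lambda
print(f"MANOVA: computed {multivariate_eta:.3f}, reported 0.107")
```

Under these assumptions, each computed value rounds to the figure reported in the table.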
Comparisons Across Age and Ability Levels
To answer our second research question, planned comparisons were obtained from three mixed model ANOVAs. The mean difference between FSIQ scores obtained using American and Canadian norms is shown for each of the age categories and ability classifications in Table 2.
Table 2.
Mean Full Scale IQ differences across age category and ability classification
Age group | Mean difference | SD | Classification | Canadian norms: Mean difference | Canadian norms: SD | American norms: Mean difference | American norms: SD |
---|---|---|---|---|---|---|---|
18–19 | 8.22 | 2.18 | Extremely Low | 9.06 | 1.82 | 8.89 | 2.33 |
20–24 | 8.41 | 2.07 | Borderline | 8.30 | 1.51 | 8.85 | 1.76 |
25–29 | 8.51 | 2.17 | Low Average | 7.78 | 2.23 | 7.85 | 1.97 |
30–34 | 7.94 | 2.09 | Average (Low) | 6.19 | 1.99 | 7.54 | 2.37 |
35–44 | 6.86 | 2.11 | Average (High) | 3.60 | 1.91 | 5.83 | 2.43 |
45–54 | 4.58 | 2.16 | High Average+ | 3.64 | 1.91 | 3.61 | 1.67 |
55–64 | 4.50 | 2.84 | | | | | |
Note: Mean difference is shown for classification based on Full Scale IQ scores calculated using both Canadian and American normative data, as classification is dependent on IQ score.
The first ANOVA compared differences between American and Canadian standardized scores across age categories used by the WAIS-IV (Wechsler, 2008a, 2008b). The assumption of homogeneity of variance was met, F(6, 330) = 0.273, p = .950. The effect of age group on FSIQ difference scores was significant, F(6, 330) = 21.40, p < .001. Bonferroni-corrected post hoc comparisons indicated that the average FSIQ difference was significantly greater for adults in the younger age categories than for adults in the older age categories.
The second and third ANOVAs compared differences between American and Canadian standardized scores across ability classifications as outlined in the WAIS-IV manuals (Wechsler, 2008a, 2008b). The assumption of homogeneity of variance was met for classification made using Canadian norms F(5, 332) = 2.17, p = .063, but was not met for classification made using American norms F(5, 332) = 2.24, p = .050 due to unequal group sizes, with distribution of variance for the six groups ranging from 2.72 to 6.25. However, the other assumptions of ANOVA (independent samples and normal distribution) were met and given the large overall sample size and relatively small standard error values, the ANOVA and post hoc tests were interpreted.
The effect of ability classification on FSIQ difference scores was significant for classification made using both Canadian, F(5, 332) = 49.80, p < .001, and American, F(5, 332) = 126.24, p < .001, norms. For classification using the Canadian norms, Bonferroni-corrected post hoc analysis of difference scores indicated that the average FSIQ difference was significantly greater for adults with below average FSIQ scores than for adults with average to above average FSIQ scores. For classification using American norms, Bonferroni-corrected post hoc analysis indicated that although the FSIQ difference was significantly greater for adults with average to below average FSIQ scores compared with adults with average to above average FSIQ scores, there was little discrepancy among difference scores for adults classified as Average (Low) and below.
To evaluate the interpretive differences of applying the Canadian versus American norms at an individual level, the percentage of individuals whose FSIQ difference score was greater than 1/3 SD (i.e., >5 points) or who were classified differently was calculated (Iverson et al., 2006). Within our sample, 314 individuals (78.4%) had difference scores larger than 5 points when their Canadian FSIQ score was subtracted from their American FSIQ score, and 273 individuals (70.4%) received a different ability classification depending on the normative system used. Overall, 205 individuals, or 52.8% of our clinical sample, had both difference scores >5 and a different classification depending on the normative system used.
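The individual-level tally described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure: it assumes the eight ability bands listed in the Procedure section, the helper names `band` and `interpretive_differences` are hypothetical, and the score pairs are invented for illustration.

```python
from bisect import bisect_right

# Lower bounds of the ability bands above "Extremely Low" (assumed from the Procedure section).
BAND_FLOORS = [70, 80, 90, 100, 110, 120, 130]


def band(fsiq: int) -> int:
    """Return the index of the ability band that an FSIQ score falls into."""
    return bisect_right(BAND_FLOORS, fsiq)


def interpretive_differences(pairs, threshold=5):
    """Count cases with a large difference, a changed band, or both.

    `pairs` holds (american_fsiq, canadian_fsiq) tuples; `threshold` is the
    1/3 SD cut-off in points (differences must exceed it to count as large).
    """
    large = [us - cdn > threshold for us, cdn in pairs]
    reclassified = [band(us) != band(cdn) for us, cdn in pairs]
    return {
        "n": len(pairs),
        "large_difference": sum(large),
        "reclassified": sum(reclassified),
        "both": sum(l and r for l, r in zip(large, reclassified)),
    }


# Hypothetical (American-norm, Canadian-norm) FSIQ pairs, for illustration only.
example_pairs = [
    (86, 78), (89, 78), (95, 88), (72, 63),  # large difference and new band
    (104, 100),                              # small difference, same band
    (110, 106),                              # small difference, new band
    (97, 91),                                # large difference, same band
]
counts = interpretive_differences(example_pairs)
print(counts)
```

Keeping the three counts separate matters because a large difference does not always change the classification, and a small one sometimes does when a score sits near a band boundary.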
Discussion
Results of this study are consistent with previous findings that significant differences exist between scores calculated using the WAIS-IV Canadian and American normative systems. Overall, the use of Canadian norms yielded significantly lower scores on FSIQ and across all four Indices than American norms. Consistent with previously reported findings, the largest differences were found for FSIQ and WMI and the smallest difference for the PSI (Harrison et al., 2014, 2015a). Our hypothesis that the pattern of difference would vary across ability level was supported. Consistent with past findings (Harrison et al., 2014, 2015a), differences between scores calculated using Canadian and American norms were significantly larger for individuals with lower ability classifications. Additionally, consistent with findings reported by Harrison and colleagues (2014, 2015a) as well as findings reported across the Canadian standardization sample (Wechsler, 2008a), the discrepancy between FSIQ calculated using American and Canadian norms varied across age level, with larger differences seen in individuals under the age of 44.
Our findings highlight the dilemma facing Canadian clinicians. The normative system that we choose could result in a different clinical classification of a patient, which could in turn impact receipt of government aid or benefits (Kush & Watkins, 2007; Lunsky et al., 2007), availability of educational supports for students with learning disabilities, or even attendance at an educational institution (Beck & Davidson, 2001; Beiser & Gotowiec, 2000; Kush & Watkins, 2007). For example, in our clinical sample, seven participants had an FSIQ score of 78 when calculated using Canadian norms and were classified as Borderline. However, when calculated using American norms, the FSIQ scores for these participants ranged from 86 to 89 and fell within the Low Average range. If any of these patients had a corresponding weakness in a specific academic area, a diagnosis of a specific learning disability could be contingent on the clinician's choice of normative system (Harrison, 2005; Harrison & Holmes, 2012; Kozey & Siegel, 2008; Learning Disabilities Association of Canada, 2002).
The discrepancies that we noted for individuals who scored within the average range or above were generally consistent with what we would expect based on the 4.5 point difference for the Canadian standardization sample reported in the WAIS-IV Canadian manual (Wechsler, 2008a). However, differences of up to 13 points between scores in the borderline and extremely low groups fall well above what we would expect if both normative systems were based on the normal curve. As highlighted by both Miller and colleagues (2015) and by Pearson Canada (2014), this may reflect a more homogeneous population in Canada with a negative skew when compared with the United States. While it is reasonable to conclude that a skewed Canadian population accounts for some portion of the increasing magnitude of discrepancy at the lower extreme of the curve, it is an issue that should be addressed rather than dismissed, particularly given the larger differences seen with the WAIS-IV when compared with the WAIS-III. Whereas Iverson and colleagues (2006) found that 20% of patients in an inpatient neuropsychiatric population were classified differently depending on which norms were used to calculate FSIQ on the WAIS-III, in the present study, more than half of our sample (52.8%) had clinically meaningful differences in FSIQ score (differences of more than 5 points and a different ability-based classification) when a different normative system was used on the WAIS-IV. A comparison between our findings and those of Iverson and colleagues (2006) would suggest that if both standardization samples were representative of the Canadian population at the time of standardization, then the composition of Canada's population may have shifted. Indeed, Canadian census data indicate that an increasing number of individuals have obtained a postsecondary degree between 1991 and 2011 (Uppal & LaRochelle-Côté, 2014).
We may question, however, whether an increasing negative skew could represent a corresponding increase in the number of marginalized Canadians who have not received a standard level of education, and if so, what can be done to bridge this gap.
Another explanation for the change may be found in the dramatic changes to the exclusionary criteria for the WAIS-IV standardization study (Lezak et al., 2012; Wechsler, 2001, 2008b). As stated by Lezak and colleagues (2012), “the WAIS-IV standardization departs substantially from previous standardizations in that … subjects from the normative data set who exhibited inadequate task engagement or poor effort were… eliminated” (p. 715). Additionally, subjects who were not able to understand instructions and participate fully in testing, who were primarily nonverbal or uncommunicative, nonproficient in English, or anyone who had been diagnosed with ADHD, any learning disorder, mood disorder, language disorder, or any traumatic brain injury were excluded (Wechsler, 2008a). “Thus, compared to previous standardizations the current WAIS-IV may represent the healthiest groups but with less sampling variation” (Lezak et al., 2012, p. 715). Additionally, whereas the WAIS-III Canadian standardization included a category for individuals with less than or equal to a grade 8 level of education, which accounted for 8% of the sample (Wechsler, 2001), this category was not retained in the WAIS-IV study (Wechsler, 2008a). Although Canadian census data from 1991 showed that 18% of the population had less than or equal to a grade 8 education (Wechsler, 2001), this category was removed from the 2006 census (Statistics Canada, 2009), which likely explains its absence in the WAIS-IV standardization study. Given the challenges inherent in recruiting individuals with low educational attainment for research purposes, it is possible that the WAIS-IV standardization sample had a lower proportion of individuals with a grade 8 or less level of education than the WAIS-III sample, and than the Canadian population as a whole. 
Thus, it is possible that sampling changes may have further restricted the range of representativeness by undersampling those with very limited education or lower cognitive functioning.
It is particularly important that the concerns of Canadian clinicians regarding Canadian norm development be allayed now, before the standardization and release of the WAIS-V and the adoption of the new Q-interactive (NCS Pearson Inc., 2015) digital scoring platform, which will be cloud-based and allow only one score per patient. Few studies to date have examined the clinical implications of using Canadian versus American norms for neuropsychological assessment, and the majority of those studies, including the present study, have focused solely on the WAIS (Harrison et al., 2014, 2015a; Iverson et al., 2006). We are aware of no outcome studies that examine the long-term implications of choice of normative system. The adoption of cloud-based scoring would limit the possibilities for independent research on the normative systems that are used in clinical practice. Canadian clinicians need assurance from test developers that the norms we employ will classify our clients accurately relative to other Canadians, and that lower functioning individuals are not marginalized by pathologizing their level of functioning based on a negatively skewed population.
As with the studies by Harrison and colleagues (2014, 2015a), our participants were drawn from a clinical sample, and as such, cannot be considered equivalent to the Canadian standardization study sample. We would expect a higher proportion of our sample to score in the Extremely Low or Borderline range, given that the majority were referred for neuropsychological assessment due to some difficulty. Although the use of a patient population with no control has been criticized by Miller and colleagues (2015), it is important to note that the conclusions we draw are not based on patient characteristics such as diagnostic status or effort. The concern is simply the magnitude of the discrepancy between FSIQ scores for individuals who score at the lower end of the ability range. As demonstrated by both Miller and colleagues (2015) and Harrison and colleagues (2015b), the results that we have presented can be generated independent of individual participants. In this case, we would argue that a focus on the individual characteristics of the sample is a red herring, which detracts from the central argument: that something has changed from the publication of the WAIS-III to the publication of the WAIS-IV, and this has implications for Canadian clinicians.
General Discussion
The results of this study underscore the need for caution when choosing a normative system for the WAIS-IV for use with a clinical population, and when interpreting the results of the WAIS-IV within the context of a neuropsychological test battery. Concerns raised regarding appropriate use of normative data are compounded by the lack of consistency across clinicians in Canada. Whereas some neuropsychologists use Canadian norms when interpreting test results, others use American norms. The choice of normative system is often not disclosed in clinical reports. Inconsistency across clinical practices in Canada may limit the credibility and generalizability of neuropsychological findings, and create unnecessary difficulties for clinicians who may seek confirmation of diagnosis or who engage in repeat testing to monitor changes in neuropsychological functioning over time (Costa, 1988; Lezak et al., 2012; Stebbins, 2007).
Miller and colleagues (2015) made a number of best practice recommendations for the assessment of adults with a suspected learning or intellectual disability. These include an emphasis on the integration of developmental and learning history with test results; administration of a comprehensive assessment battery that includes effort testing; comparison of current to past assessment results; use of Canadian norms when available and avoidance of interpreting test scores derived using an invalid normative system; careful consideration of scores that fall in the Extremely Low range in light of developmental history before diagnosing an intellectual disability; and drawing conclusions that are consistent with functioning across the life span.
We are in complete agreement with Miller and colleagues (2015) that the job of a psychologist goes beyond that of a technician who would simply report a test score, and we agree with the suggested best practice guidelines. However, these guidelines fail to address some key issues facing clinicians. For example, Pearson Canada (2014) has rightly stated that our understanding of measurement theory and normative change over time has changed the way psychologists view diagnosis. However, although psychologists are trained to recognize that all test scores include some proportion of measurement error and that guidelines, rather than strict cut scores, should be used when making diagnoses and recommendations, this understanding is not always demonstrated by organizations such as government offices responsible for allocating benefits. For example, certain benefit providers continue to require an FSIQ score that falls above or below a specific cut point in order to approve benefits for a certain disorder or disability. Thus, another key role of a psychologist is to provide education regarding the limitations of our instruments to colleagues and referral sources. We conclude our article with additional practical recommendations for Canadian clinicians who integrate the WAIS-IV into a comprehensive assessment battery, and suggestions for test developers.
Best Practice Recommendations: Clinical Applications for Clinicians
Be thorough. Any comprehensive assessment should include a thorough developmental, medical, psychiatric, and neuropsychological history. Review previous assessment reports and other documentation that is available. Take the time to conduct a thorough intake interview and ensure that you have a clear understanding of the client's background and current difficulties. If necessary, seek permission to conduct a collateral interview with a family member or other individual who knows the client well.
Report. Despite the significant concerns raised regarding the representativeness of the current Canadian norms for the WAIS-IV, for the reasons highlighted above, we agree with Miller and colleagues (2015) that Canadian norms should be used when available when testing individuals in Canada. Report your choice of normative system and the possible implications thereof in chart notes and assessment reports. For example, clinicians may choose to include a phrase such as the following: “Canadian norms were used to score the WAIS-IV. Mrs. X would have scored somewhat higher on this test had American norms been used.”
Educate. When writing reports and addressing referral questions, it is important for Canadian psychologists to highlight the realities of measurement error and differences in standardization. Presenting test scores in terms of a range or a confidence interval is preferable to presenting a single score, or even a percentile rank, which can be converted to a standard score with great ease. Explain that tests can be normed on different populations and how use of one normative system over another could affect overall results. Link and explain test results in terms of functional capabilities and limitations whenever possible. Take opportunities to educate referral sources and colleagues by discussing the limitations of standardized tests, while highlighting their utility.
Advocate. Speak out about the arbitrariness of cut points that are used by funding sources by highlighting the patient's specific needs, limitations, and strengths in the recommendations section of an assessment report. Incorporate background information that highlights a person's day-to-day functioning. When appropriate, state how a functional limitation, combined with below-average test scores on a particular index, indicates the type and level of support that a patient is likely to require.
Diagnose. Consistent with recommendations by Pearson Canada (2014), we recommend that clinicians use DSM-5 criteria when diagnosing Intellectual Disability, which is based on the severity of adaptive functioning deficits, as well as for any other conditions for which criteria are identified. As suggested by Miller and colleagues (2015), consider and incorporate all available sources of information when arriving at a diagnosis, and conclude with an interpretation that considers both developmental and recent history as well as current level of functioning.
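The point made under "Educate" above, that a percentile rank converts readily to a standard score and that a score is better reported as an interval, can be illustrated with a minimal Python sketch. It assumes the usual Wechsler composite metric (mean 100, standard deviation 15) and a normal distribution of scores. The function names and the reliability value of 0.96 are hypothetical and chosen only for illustration; the interval is centred on the observed score using the standard error of measurement, which is a simplification and may differ from the intervals tabled in a test manual.

```python
from statistics import NormalDist

MEAN, SD = 100.0, 15.0  # assumed composite score metric (mean 100, SD 15)


def percentile_to_standard(percentile_rank: float) -> float:
    """Convert a percentile rank (strictly between 0 and 100) to a standard score."""
    z = NormalDist().inv_cdf(percentile_rank / 100.0)
    return MEAN + SD * z


def confidence_interval(score: float, reliability: float, level: float = 0.95):
    """Interval around an observed score based on the standard error of measurement.

    SEM = SD * sqrt(1 - reliability). The reliability coefficient must be taken
    from the relevant test manual; any value passed here is an assumption.
    """
    sem = SD * (1.0 - reliability) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return score - z * sem, score + z * sem


if __name__ == "__main__":
    # Hypothetical example: a 2nd-percentile result and an observed score of 72.
    print(round(percentile_to_standard(2), 1))
    low, high = confidence_interval(72, reliability=0.96)
    print(round(low, 1), round(high, 1))
```

Because the conversion is a single line of arithmetic, reporting a percentile rank in place of a standard score does little to discourage the use of rigid cut points; the interval conveys the measurement error more directly.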
Recommendations: Clinical Applications for Test Developers
We would urge Pearson Canada to consider carefully the discrepancies that have been identified in current and past research, and strive to understand what may have driven the shift that has been identified from the normative data collected for the WAIS-III to the WAIS-IV. Furthermore, we would ask that this issue be given careful consideration during standardization of the WAIS-V.
Given the documented differences between the Canadian and American populations, it would be very helpful to Canadian clinicians for test developers, such as Pearson Canada, to invest in Canadian standardization studies for additional tests, such as the WMS-IV (Wechsler, 2009), the California Verbal Learning Test-Second Edition (CVLT-II; Delis, Kramer, Kaplan, & Ober, 2000), and the Delis Kaplan Executive Function System (D-KEFS; Delis, Kaplan, & Kramer, 2001). A co-normed test battery is the ideal way to measure overall cognitive functioning and identify a reliable pattern of individual strengths and weaknesses. Although a limited number of co-normed batteries have been developed for research purposes, these are generally normed only on an American population, and often lack some component that is necessary to conduct a thorough neuropsychological assessment. Having a consistent set of norms by which to interpret test scores will not only alleviate some of the controversy that has been raised with the publication of the WAIS-IV, but will also ensure that patients receive more appropriate services in Canada based on their clinical needs.
We are aware that a number of Canadian clinicians retain test data for research purposes. We posit that a collaboration between test developers and clinicians in developing a consistent set of common norms that can be applied to a comprehensive test battery for use in Canada would be an efficient method of standardization. Maintenance of well-organized databases of test scores by clinicians, with carefully coded demographic information in order to facilitate application of exclusionary criteria, would serve to facilitate such a collaboration. While this may be an unorthodox suggestion, such a collaboration would serve to avoid increased costs to test developers, while providing a large, data-rich sample from a representative population.
It has been suggested that the Canadian population is significantly negatively skewed in terms of overall cognitive ability. While this does not seem to be a variable that can be easily altered, it does seem unfair to low functioning individuals who have the misfortune of belonging to such a high functioning, homogeneous population. Thus, we would ask that this skew be investigated carefully during development of Canadian norms for subsequent tests, and, if appropriate, a correction be applied to normalize the data. If such a correction is not appropriate, then it would be helpful for test developers to address this issue by including a discussion in the test manual of how the negatively skewed Canadian population influences the normative data.
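To make concrete what a correction "to normalize the data" could look like, the following minimal Python sketch applies a rank-based inverse normal transformation to a small set of hypothetical, negatively skewed raw scores. This is a generic statistical technique offered only as an illustration; it is not a description of the procedure used by any test developer, and the function name and data are our own assumptions.

```python
from statistics import NormalDist


def normalized_standard_scores(raw_scores, mean=100.0, sd=15.0):
    """Map raw scores to standard scores via their ranks (ties get the mid-rank).

    Each score's rank r (1-based) is converted to the proportion (r - 0.5) / n,
    which is then passed through the inverse normal CDF and rescaled.
    """
    n = len(raw_scores)
    order = sorted(range(n), key=lambda i: raw_scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over any run of tied raw scores.
        while j + 1 < n and raw_scores[order[j + 1]] == raw_scores[order[i]]:
            j += 1
        mid_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mid_rank
        i = j + 1
    std_normal = NormalDist()
    return [mean + sd * std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


if __name__ == "__main__":
    # Hypothetical raw scores with a long lower tail (negative skew).
    raw = [10, 48, 50, 52, 53, 54, 55]
    for value, score in zip(raw, normalized_standard_scores(raw)):
        print(value, round(score, 1))
```

A transformation of this kind preserves the rank order of examinees while forcing the score distribution toward normality, so the lowest scorer is placed by relative standing rather than by distance from a compressed upper range. Whether such a correction is appropriate for a given normative sample is a question for the test developer.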
Finally, if Pearson Inc. does adopt a cloud-based storage and scoring system for standardized tests, we would request, given the unique challenges facing Canadian clinicians, that Pearson make both the Canadian and American standard scores available to Canadian clinicians for all patients in order to facilitate interpretation of tests within different contexts. Based on Pearson's recommendations that a consistent set of norms be used, it may be necessary to interpret data both ways for an individual patient. For example, in the case of an individual who is applying to universities in both the United States and in Canada, two separate reports may be required.
Funding
This work was made possible, in part, by a Doctoral Research Award from the Canadian Institutes of Health Research.
Conflict of Interest
None declared.
Acknowledgements
The authors would like to thank Ms. Meg Boddington for her assistance with data collection.
References
- American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: Author.
- American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Arlington, VA: Author.
- Beck H. P., Davidson W. D. (2001). Establishing an early warning system: Predicting low grades in college students from survey of academic orientations scores. Research in Higher Education, 42 (6), 709–723.
- Beiser M., Gotowiec A. (2000). Accounting for native/non-native differences in IQ scores. Psychology in the Schools, 37 (3), 237–252.
- Blyler C., Gold J., Iannone V., Buchanan R. (2000). Short form of the WAIS-III for use with patients with schizophrenia. Schizophrenia Research, 46 (2–3), 209–215.
- Bowden S., Lange R., Weiss L., Saklofske D. (2008). Invariance of the measurement model underlying the Wechsler Adult Intelligence Scale—III in the United States and Canada. Educational and Psychological Measurement, 68 (6), 1024–1040.
- Bowden S., Saklofske D., Weiss L. (2011). Invariance of the measurement model underlying the Wechsler Adult Intelligence Scale-IV in the United States and Canada. Educational and Psychological Measurement, 71 (1), 186–199.
- Cohen J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
- Costa L. (1988). Clinical neuropsychology: Prospects and problems. Clinical Neuropsychology, 2, 3–11.
- Delis D., Kaplan E., Kramer J. (2001). D-KEFS: Examiners manual. San Antonio, TX: The Psychological Corporation.
- Delis D., Kramer J., Kaplan E., Ober B. (2000). The California verbal learning test (2nd ed.). San Antonio, TX: The Psychological Corporation.
- Galski T., Tompkins C., Johnston M. (1998). Competence in discourse as a measure of social integration and quality of life in persons with traumatic brain injury. Brain Injury, 12 (9), 769–782.
- Gordon S., Duff S. (2010). Comparison of the WAIS-III and WISC-IV in 16 year old special education students. Journal of Applied Research in Intellectual Disabilities, 23, 197–200.
- Harrison A. (2005). Recommended best practices for the early identification and diagnosis of children with specific learning disabilities in Ontario. Canadian Journal of School Psychology, 20 (1–2), 21–43.
- Harrison A., Armstrong I., Harrison L., Lange R., Iverson G. (2014). Comparing Canadian and American normative scores on the Wechsler Adult Intelligence Scale-Fourth Edition. Archives of Clinical Neuropsychology, 29 (8), 737–746.
- Harrison A., Holmes A. (2012). Easier said than done: Operationalizing the diagnosis of learning disability for use at the postsecondary level in Canada. Canadian Journal of School Psychology, 27 (1), 12–34.
- Harrison A., Holmes A., Silvestri R., Armstrong I. (2015a). Getting back to the main point: A reply to Miller et al. Journal of Psychoeducational Assessment, 33 (8), 780–786.
- Harrison A., Holmes A., Silvestri R., Armstrong I. (2015b). Implications for educational classification and psychological diagnoses using the Wechsler Adult Intelligence Scale-Fourth Edition with Canadian versus American norms. Journal of Psychoeducational Assessment, doi:10.1177/0734282915573723.
- Iverson G., Lange R., Viljoen H. (2006). Comparing the Canadian and American WAIS-III normative systems in inpatient neuropsychiatry and forensic psychiatry. Canadian Journal of Behavioural Science/Revue Canadienne Des Sciences Du Comportement, 38 (4), 348–353.
- Joint Advisory Committee of the Canadian Education Association. (1993). Principles for fair student assessment practices for education in Canada. Edmonton, AB: Author.
- Kirkegaard E. (2015). International differences in intelligence can be confusing: A commentary on Harrison et al (2015). Retrieved January 18, 2016, from https://thewinnower.com/papers/international-differences-in-intelligence-can-be-confusing-a-commentary-on-harrison-et-al-2015 (accessed 17 May 2016).
- Kozey M., Siegel L. (2008). Definitions of learning disabilities in Canadian provinces and territories. Canadian Psychology/Psychologie Canadienne, 49 (2), 162–171.
- Kush J., Watkins M. (2007). Structural validity of the WISC-III for a National Sample of Native American Students. Canadian Journal of School Psychology, 22 (2), 235–248.
- Learning Disabilities Association of Canada. (2002). Official definition of learning disabilities. Retrieved from http://www.ldac-acta.ca/learn-more/ld-defined (accessed 17 May 2016).
- Lezak M., Howieson D., Bigler E., Tranel D. (2012). Neuropsychological assessment (5th ed.). New York, NY: Oxford University Press.
- Lunsky Y., Garcin N., Morin D., Cobigo V., Bradley E. (2007). Mental health services for individuals with intellectual disabilities in Canada: Findings from a national survey. Journal of Applied Research in Intellectual Disabilities, 20 (5), 439–447.
- Miller J., Weiss L., Beal A., Saklofske D., Zhu J., Holdnack J. (2015). Intelligent use of intelligence tests: Empirical and clinical support for Canadian WAIS-IV Norms. Journal of Psychoeducational Assessment, 33 (4), 312–328.
- NCS Pearson Inc. (2015). Introducing Q-interactive. Retrieved January 18, 2016, from http://www.pearsonassess.ca/static/q-interactive/home.htm (accessed 17 May 2016).
- Nelson J., Canivez G., Watkins M. (2013). Structural and incremental validity of the Wechsler Adult Intelligence Scale-Fourth Edition with a clinical sample. Psychological Assessment, 25 (2), 618–630.
- Pearson Canada. (2014). WAIS-IV [Special note]. Retrieved April 23, 2015, from http://www.pearsonassess.ca/content/dam/ani/clinicalassessments/ca/programs/pdfs/WAIS-IV_Special_Note_Dec2014.pdf (accessed 17 May 2016).
- Statistics Canada. (2009). Education Reference Guide, 2006 Census. Statistics Canada Catalogue no. 97-560-GWE2006003 Ottawa, Ontario: November 10. Retrieved January 18, 2016, from http://www12.statcan.gc.ca/census-recensement/2006/ref/rp-guides/education-eng.cfm#changequest (accessed 17 May 2016).
- Stebbins G. T. (2007). Neuropsychological testing. In Textbook of clinical neurology (3rd ed., pp. 539–557). Chicago, IL: WB Saunders Company; doi:10.1016/B978-1-4160-3618-0.10027-X.
- Strauss E., Sherman E., Spreen O. (2006). A compendium of neuropsychological tests: Administration, norms, and commentary (3rd ed.). New York, NY: Oxford University Press; doi:10.1212/WNL.41.11.1856-a
- Tabachnick B. G., Fidell L. S. (2007). Using multivariate statistics (5th ed.). New York, NY: Harper & Row.
- Uppal S., LaRochelle-Côté S. (2014). Overqualification among recent university graduates in Canada. Insights on Canadian Society, (75), Retrieved from http://www.statcan.gc.ca/pub/75-006-x/2014001/article/11916-eng.pdf (accessed 17 May 2016).
- Watkins C. E., Campbell V. L., Nieberding R., Hallmark R. (1995). Contemporary practice of psychological assessment by clinical psychologists. Professional Psychology: Research and Practice, 26 (1), 54–60.
- Wechsler D. (1955). Wechsler adult intelligence scale. San Antonio, TX: The Psychological Corporation.
- Wechsler D. (1981). Wechsler adult intelligence scale—revised. San Antonio, TX: The Psychological Corporation.
- Wechsler D. (1996). Wechsler intelligence scale for children—third edition: Canadian manual supplement. Toronto, ON: The Psychological Corporation.
- Wechsler D. (1997). Wechsler adult intelligence scale (3rd ed.). San Antonio, TX: Psychological Corporation.
- Wechsler D. (2001). Wechsler adult intelligence scale—third edition: Canadian technical manual. Toronto, ON: Harcourt Canada.
- Wechsler D. (2004). Wechsler intelligence scale for children—fourth edition: Canadian. Toronto, ON: Harcourt Assessment.
- Wechsler D. (2008a). Wechsler adult intelligence scale—fourth edition: Canadian manual. Toronto, ON: NCS Pearson, Inc. and Pearson Canada Assessment, Inc.
- Wechsler D. (2008b). Wechsler adult intelligence scale—fourth edition: Technical and interpretive manual. San Antonio, TX: NCS Pearson, Inc.
- Wechsler D. (2009). Wechsler memory scale—Fourth edition (WMS–IV) technical and interpretive manual. San Antonio, TX: Pearson.
- Whitaker S. (2008). WISC-IV and low IQ: Review and comparison with the WAIS-III. Educational Psychology in Practice, 24 (2), 129–137.
- Whitaker S., Wood C. (2008). The distribution of scaled scores and possible floor effects on the WISC-III and WAIS-III. Journal of Applied Research in Intellectual Disabilities, 21 (2), 136–141.