Abstract
Objective:
The correct “dosing” of neuropsychological assessment is of interest for the purposes of cost management and the personalization of medicine/assessment. In this context, embedded IQ screening, rather than routine comprehensive IQ testing, may be useful in identifying youth at risk for Intellectual Disability (ID) for whom further assessment is needed. This retrospective, cross-sectional study examined subtests from the Wechsler Intelligence Scale for Children-Fifth Edition (WISC-5) needed to identify youth with Full Scale IQ (FSIQ) ≤75.
Method:
Data were obtained from a large pediatric clinically referred sample (N = 4,299; Mean Age = 10.7 years; Range = 6–16y; 66% male; 54% White; 29% receiving Public Insurance), divided into training (n = 2149) and test (n = 2150) samples.
Results:
In the training sample, sequential and additive regression-based models for predicting FSIQ composed of one (Block Design [BD]), two (BD + Similarities [SI]), three (BD + SI + Matrix Reasoning [MR]), and four (BD + SI + MR + Digit Span [DS]) subtests of the WISC-5 explained 61.3%, 82.7%, 88.5%, and 93.0% of FSIQ variance, respectively. Using a predicted FSIQ ≤ 80 as a cut score to identify persons with observed FSIQ ≤ 75, the two-subtest (BD + SI) model showed strong sensitivity (83.4%), specificity (90.5%), and negative predictive value (96.2%) in the test sample; however, positive predictive value was low (65.3%). The three- and four-subtest models provided incremental but modest gains in classification metrics.
Conclusions:
Findings suggest the first several subtests of the WISC-5 can be used to identify clinically referred youth at risk for ID who subsequently require full administration of the WISC-5 for consideration of an ID diagnosis.
Keywords: WISC-V, clinical decision-making, decision-support systems, evidence-based practice, screening
Personalized medicine models have begun to influence the field of neuropsychology (Bauer, 2016; Gur, 2018), with growing interest in the appropriate “dosing” of assessment services for the referral question at hand. One component of personalized neuropsychological care and assessment is the use of “embedded” screening measures to differentiate patients with intact skills from those at risk for more involved clinical presentations who thus require comprehensive assessment. Such embedded screening models are emerging in pediatric neuropsychology, including models in which limited subtest or item administration from a measure is used to identify patients for whom full administration of the measure may be clinically and diagnostically justified, e.g. the Wide Range Assessment of Memory and Learning-Second Edition (WRAML-2; Sheslow & Adams, 2003), the Kaufman Test of Educational Achievement-Third Edition (KTEA-3) Dyslexia Index (Breaux, 2018), and the Clinical Evaluation of Language Fundamentals-Fifth Edition (CELF-5) Screening Test (Wiig et al., 2013). This embedded screening model has advantages over the use of stand-alone screening tools, as a clinician using an embedded screening model can simply administer the remaining items or subtests of the measure in those cases in which comprehensive assessment is indicated by the screening subtest results. In contrast, a positive screen on a stand-alone screening measure may require additional assessment by way of full administration of a different measure of the same construct (e.g. full administration of the Wechsler Intelligence Scale for Children-Fifth Edition [Wechsler, 2015] after the patient obtains a low score on the Wechsler Abbreviated Scale of Intelligence-Second Edition [WASI-2; Wechsler, 2011]).
There have been numerous efforts to generate embedded screening indices in tests of intellectual functioning, particularly in past editions of the Wechsler Intelligence Scales. For instance, there was considerable interest in identifying subtest subsets of the Wechsler Intelligence Scale for Children-Third Edition (WISC-3; Wechsler, 1991) that would differentiate individuals with and without dyslexia, e.g. the “ACID index”: Arithmetic, Coding, Information, and Digit Span (Ward et al., 1995). Others have proposed empirically derived short forms of established IQ measures (Donders, 1992; Donders et al., 2013; Ringe et al., 2002) that can be used to estimate IQ without administration of the full IQ test. Moreover, at least one short form comprising seven WISC-4 subtests (Crawford et al., 2010) was found to have high classification accuracy when used to predict Intellectual Disability (ID) group status (McKenzie et al., 2014). Short forms of IQ measures, however, have come under considerable scrutiny (Kaufman & Kaufman, 2001) due to their psychometric properties and the loss of “strengths and weaknesses” profile analysis. Not surprisingly, then, there have been few published studies proposing an IQ short form or embedded screening index for identifying patients at risk for ID. This may be because the Full Scale IQ is considered a “gold standard” component of an ID diagnostic evaluation, and use of a subset of IQ subtests, such as those comprising the General Ability Index (GAI), to diagnose ID might lead to under-identification of cases (Koriakin et al., 2013; Lanfranchi, 2013).
Despite criticism directed at short-form IQ tests, there is ample rationale for the development of embedded ID screening indices in pediatric neuropsychology, and there is emerging evidence that these indices can be used to effectively estimate IQ (van Duijvenbode et al., 2016). One rationale for developing an embedded ID screening index lies in the high incidence of ID in medically involved populations. While ID affects approximately 1% of the population in the United States (Maulik et al., 2011), the incidence of ID is much higher in children with acquired brain injury and congenital conditions (Mahone et al., 2017; Pulsifer, 1996). Moreover, ID is a very common cognitive phenotype of individuals with known or as-yet-unidentified genetic conditions (Vissers et al., 2016). As such, the capacity to efficiently screen for ID in multidisciplinary settings specializing in medical and genetic conditions has become increasingly important as the cognitive phenotypes of these genetic and developmental conditions have become more apparent.
Within a specialized medical setting, the time requirements of IQ test administration can be difficult to anticipate (Ryan et al., 2007), particularly for individuals with cognitive impairment or interfering behaviors, who may take considerably longer to complete these measures relative to typical administration times (Ryan et al., 1998). While there are a number of stand-alone abbreviated testing formats with short administration times, such as the WASI-2 (Wechsler, 2011), Reynolds Intellectual Assessment Scales, Second Edition (RIAS-2; Reynolds & Kamphaus, 2015), and Kaufman Brief Intelligence Test, Second Edition (KBIT-2; Kaufman & Kaufman, 2004), these measures are not typically accepted for diagnostic determination, nor for determining eligibility for disability-related services or educational supports. In such a setting, an embedded ID screening index would offer a time-efficient approach to ruling out possible ID and/or identifying youth for whom completion of the remaining IQ subtests is required to help inform an ID diagnostic determination.
The aim of this study was to evaluate a screening procedure that can identify youth at risk for low IQ (one component of an ID diagnosis), using selected subtests from an established comprehensive measure of intelligence. To this end, the study examined the proportion of variance in WISC-5 Full Scale IQ accounted for by combinations of WISC-5 subtests occurring naturally within the WISC-5 standardized subtest administration sequence. The subtest combination models were sequential and additive (i.e. Model 1 = subtest 1; Model 2 = subtests 1 & 2; Model 3 = subtests 1, 2, & 3, etc.) to ensure that clinical use of these models would not require deviation from standardized test administration procedures. Using this approach, several regression models were developed within the first half of our large clinically referred sample (i.e. the training sample). These models were then evaluated in the second half of the sample (i.e. the test sample) in order to identify models that could quickly and accurately identify individuals at high risk of obtaining a WISC-5 FSIQ at or below a standard score of 75 on a comprehensive administration (i.e. an observed FSIQ). The screening models were also evaluated among a subpopulation of the overall sample referred specifically for medically based conditions. No study, to our knowledge, has established this type of ID screening procedure using up-to-date intellectual functioning measures such as the WISC-5.
Methods
Participants
This study included a convenience sample of children evaluated between 2014 and 2019 in an outpatient testing service at a regional, Mid-Atlantic hospital for psychological or neuropsychological assessment. The youth in this study (N = 4,299) were children and adolescents ranging in age from 6 to 16 years (Mean = 10.7y, SD = 2.7); 65% were male, 29% were on Public Insurance (e.g. Medicaid), and 54% were White (28% African-American, 12% Other Races, and 4% Hispanic). Children and adolescents were included in this study if a 10-subtest Wechsler Intelligence Scale for Children – Fifth Edition (WISC-5) test administration was performed within the course of clinical assessment (i.e. enough subtests to calculate a FSIQ as well as all of the WISC-5 Primary Indices). With regard to reason for referral, the most common primary billing diagnoses for the sample were: Attention-deficit/Hyperactivity Disorder (ADHD; 56%), unspecified encephalopathy (14%), anxiety disorders (10%), adjustment disorders (7%), behavioral disorders (5%), oncologic diseases (3%), epilepsy (3%), and mixed receptive-expressive language disorders (3%).
Procedure
General intelligence was measured using the WISC-5. The WISC-5 is a 21-subtest measure of intellectual ability with four levels of interpretation (Full, Primary Index, Ancillary Index, and Complementary Index). The WISC-5 manual reports a representative standardization sample of 2,200 US children (ages 6:0 to 16:11). The Full Scale IQ (FSIQ) is derived from seven WISC-5 subtests, typically Similarities (SI), Vocabulary (VO), Block Design (BD), Matrix Reasoning (MR), Figure Weights (FW), Digit Span (DS), and Coding (CD), although there are nine additional subtests that can be used as substitutes in the calculation of the FSIQ. The Primary Indices are Verbal Comprehension, Visual Spatial, Fluid Reasoning, Working Memory, and Processing Speed. Each Primary Index is composed of two WISC-5 subtest measures, the majority of which are used in the calculation of the FSIQ as well. For the purposes of this study, the scope of analysis was limited to the seven subtests typically used to calculate a FSIQ, and the three additional subtests (i.e. Visual Puzzles [VP], Picture Span [PS], and Symbol Search [SS]) necessary to calculate all five Primary Indices.
In the WISC-5 standardization sample, the reliabilities of the WISC-5 Primary Index scores range from .88 to .93 (Wechsler, 2015), and test-retest data suggest adequate stability across time. Multiple lines of validity evidence support the use of the WISC-5 as a measure of intellectual ability. Of particular relevance to the current study, data from the WISC-5 standardization sample (Wechsler, 2015) reveal that several individual WISC-5 subtests have strong (uncorrected) correlations with FSIQ, including VO (r = .77), SI (r = .76), BD (r = .73), MR (r = .72), FW (r = .71), and DS (r = .71).
Standardized test scores from the WISC-5 were acquired via clinical assessment. These data, along with demographic information, were maintained in the secure electronic health record. A waiver of consent to study these de-identified data was granted by the local Institutional Review Board.
Analysis
The goal of the analysis was to identify the minimum number of WISC-5 subtests needed to accurately identify children at risk of ID due to an FSIQ ≤ 75. The first step was to examine the proportion of variance (R2) of FSIQ that is explained by models comprised of sequential and additive WISC-5 subtests. This analysis was conducted using a series of multiple linear regressions. Model building occurred in a stepwise fashion, such that the first model included only the BD task, the second included BD + SI, the third included BD + SI + MR, and so forth. From these subtest models, a standard regression equation was used to create a predicted FSIQ (e.g. β0 + BD*β1; β0 + BD*β1 + SI*β2, etc.) for each of the subtest models.
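The sequential, additive model-building procedure can be sketched as follows. This is a minimal illustration using synthetic scaled scores and ordinary least squares (the study itself used Stata and the clinical data); the coefficients, noise level, and variable names here are arbitrary placeholders, not study values:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic scaled scores (mean 10, SD 3) standing in for the WISC-5 subtests,
# and a synthetic FSIQ that is a noisy linear combination of them.
subtests = {name: rng.normal(10, 3, n) for name in ["BD", "SI", "MR", "DS"]}
fsiq = (40 + 1.6 * subtests["BD"] + 1.9 * subtests["SI"]
        + 1.3 * subtests["MR"] + 1.5 * subtests["DS"] + rng.normal(0, 4, n))

def r_squared(predictors, y):
    """Fit OLS (with intercept) via least squares and return the model R^2."""
    X = np.column_stack([np.ones(len(y))] + predictors)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

# Sequential, additive models: BD; BD + SI; BD + SI + MR; BD + SI + MR + DS.
order = ["BD", "SI", "MR", "DS"]
r2_by_model = [r_squared([subtests[s] for s in order[:k]], fsiq)
               for k in range(1, 5)]
```

Because the models are nested, R² can only stay flat or rise as each subtest is added; the decision point is where the increments become negligible.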
Results from the statistical models were used to select the optimal number of subtests to include in the screening approach. Decision making regarding which subtests to include in the model was based upon: 1) the amount of FSIQ variance explained by subtest combinations and 2) the extent of difference between the predicted and observed FSIQ scores. This second decision making component included an examination of the proportion of the sample that had a 5, 10, and 15-point difference between the observed and predicted FSIQ. It also included a review of descriptive statistics of the difference scores (i.e. predicted FSIQ minus observed FSIQ), e.g. the median difference score (to understand the directionality of average differences) and the standard deviation of the difference scores (with smaller SDs suggesting less difference between predicted and observed FSIQ).
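The difference-score criteria above can be expressed as a small helper; the function name and output structure are ours for illustration, not the study's code:

```python
import numpy as np

def difference_diagnostics(predicted, observed):
    """Summarize predicted-minus-observed FSIQ difference scores:
    proportions within 5/10/15 points, plus the median and SD of the
    difference distribution (mirroring the criteria described above)."""
    diff = np.asarray(predicted, float) - np.asarray(observed, float)
    return {
        "within_5": np.mean(np.abs(diff) <= 5),
        "within_10": np.mean(np.abs(diff) <= 10),
        "within_15": np.mean(np.abs(diff) <= 15),
        "median_diff": float(np.median(diff)),
        "sd_diff": float(np.std(diff, ddof=1)),
    }
```

A negative median difference indicates systematic underestimation of FSIQ by the screening model, which matters for sensitivity.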
The reliability of each of the abovementioned models was examined via the intraclass correlation coefficient (ICC), using the conventions put forth by McGraw and Wong (1996). Using this approach, the dependent variable was observed FSIQ, and the subtests were included together as the independent variable in a one-way ANOVA. ICC values of .50 to .75 are considered moderate, .75 to .90 good, and >.90 excellent.
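For reference, ICC(1,1) can be computed directly from the one-way ANOVA mean squares. The sketch below follows the McGraw and Wong single-measure formulation for n targets each rated k times; how the "ratings" matrix is assembled for each subtest model is our assumption, since the text describes the setup only briefly:

```python
import numpy as np

def icc_1_1(ratings):
    """One-way random, single-measure ICC(1,1) per McGraw & Wong (1996).

    ratings: (n_targets, k_raters) array-like, e.g. one row per child,
    with (for instance) predicted and observed FSIQ as two "ratings".
    """
    r = np.asarray(ratings, dtype=float)
    n, k = r.shape
    row_means = r.mean(axis=1)
    grand_mean = r.mean()
    # Between-target and within-target mean squares from the one-way ANOVA.
    ms_between = k * ((row_means - grand_mean) ** 2).sum() / (n - 1)
    ms_within = ((r - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Perfect within-target agreement drives the within-target mean square to zero and the ICC to 1.0.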
Predicted FSIQ scores were used to create dichotomous variables defined by cut scores of (1) predicted FSIQ ≤ 75 and (2) predicted FSIQ ≤ 80. The more conservative cut score of a predicted FSIQ ≤ 80 was included in the analysis to allow for an evaluation of the screening model when sensitivity was expected to be more optimal. The accuracy of these cut scores in correctly identifying observed (or true score) FSIQ ≤ 75 was examined using a Receiver Operating Characteristic (ROC) analysis. This analysis produced four metrics: sensitivity (proportion of individuals with FSIQ ≤ 75 who screen positive), specificity (proportion of individuals with FSIQ > 75 who screen negative), and positive (probability of an FSIQ ≤ 75 among those who screen positive) and negative (probability of an FSIQ > 75 among those who screen negative) predictive values. The Area Under the Curve (AUC) value provides an overall metric of classification accuracy.
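These four metrics follow directly from the 2 × 2 cross-classification of screen result against observed FSIQ. A hypothetical helper (ours, not the study's Stata code) might look like:

```python
import numpy as np

def classification_metrics(pred_fsiq, obs_fsiq, screen_cut=80, target_cut=75):
    """Sensitivity, specificity, PPV, and NPV for a screening rule
    (predicted FSIQ <= screen_cut) against observed FSIQ <= target_cut."""
    pred = np.asarray(pred_fsiq, float)
    obs = np.asarray(obs_fsiq, float)
    screen_pos = pred <= screen_cut   # screen flags the child
    condition = obs <= target_cut     # child truly has low observed FSIQ
    tp = np.sum(screen_pos & condition)
    fp = np.sum(screen_pos & ~condition)
    fn = np.sum(~screen_pos & condition)
    tn = np.sum(~screen_pos & ~condition)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```

Note that sensitivity and specificity are properties of the screen itself, whereas PPV and NPV also depend on how common FSIQ ≤ 75 is in the sample.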
Two strategies were employed to increase the validity of the findings. First, we used cross-validation by randomly splitting the sample in half. The first half (n = 2149), the “training sample,” was used to identify the optimal number of subtests for the screening model. The second half (n = 2150), the “test sample,” was used to examine the accuracy of the screening model via the ROC analyses. This approach was taken because it is well known that a model's accuracy decreases when it is applied to a sample independent of the one in which it was developed (Cawley & Talbot, 2010). Second, we examined the findings among a subsample (drawn from the total sample) of children who were referred for medically based diagnoses. This sensitivity analysis sought to confirm that the findings held among a more homogeneous medical subgroup of the overall mixed clinical sample. All analyses were conducted in Stata 15.0 (StataCorp, College Station, TX).
Results
Regression analyses in the training dataset
Using the training dataset (n = 2149), the one-subtest model (BD) accounted for 61.3% of the variance in FSIQ. When SI was added, R2 increased substantially, by 21.4 percentage points (to 82.7%). When MR was added, a smaller increase was found (5.8 points; R2 = 88.5%). When DS was added, an even smaller increase was found (4.5 points; total R2 = 93.0%). Given that saturation in R2 was approached at the fourth subtest and that a fifth subtest (CD) added only 2.7 points of explained variance, no further subtests were added to the model beyond DS. A regression equation was then developed for the one-, two-, three-, and four-subtest models. The regression equations were as follows:
One-subtest model (55.279 + BD*4.137)
Two-subtest model (41.894 + BD*2.670 + SI*2.854)
Three-subtest model (38.500 + BD*1.801 + SI*2.400 + MR*1.680)
Four-subtest model (35.993 + BD*1.627 + SI*1.862 + MR*1.316 + DS*1.515).
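Expressed in code, the equations above can be applied directly to a child's scaled scores. The sketch below is a hypothetical helper (not part of the study's Stata analysis) that reproduces the published coefficients verbatim:

```python
def predicted_fsiq(bd, si=None, mr=None, ds=None):
    """Predicted FSIQ from the training-sample regression equations,
    using as many subtests as have been administered (coefficients
    taken from the one- to four-subtest models reported above)."""
    if si is None:
        return 55.279 + 4.137 * bd
    if mr is None:
        return 41.894 + 2.670 * bd + 2.854 * si
    if ds is None:
        return 38.500 + 1.801 * bd + 2.400 * si + 1.680 * mr
    return 35.993 + 1.627 * bd + 1.862 * si + 1.316 * mr + 1.515 * ds
```

For a child with scaled scores of 10 on all four subtests, the four-subtest equation yields a predicted FSIQ of approximately 99.2, close to the population mean, as expected.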
Table 1 details the R2 of the predicted FSIQ as well as the median and SD of the difference between the observed and predicted FSIQ. The one-subtest model (BD) clearly had the poorest performance, as the SD of the predicted/observed FSIQ difference was large (10.9) and overlap (Figure 1) was poor (e.g. only 64.4% of the predicted and observed FSIQ were within 10 points of each other). The two-subtest (BD + SI) model was stronger, dropping the SD by 3.6 points compared to the one-subtest model; however, there was still some discrepancy, as only 83.2% of predicted and observed FSIQ fell within 10 points. The three-subtest model (BD + SI + MR) had strong performance features, as the predicted and observed FSIQ scores differed by 10 or fewer points for 91.4% of the sample, and there was only a small difference in medians (.05) between the predicted and observed FSIQ. The four-subtest model (BD + SI + MR + DS) was also highly accurate, with very similar characteristics to the three-subtest model.
Table 1.
Difference between predicted and observed FSIQs based on subtest administration using Training Sample (n=2,149).
| Difference between Predicted and Observed FSIQ | 1-subtest (BD) | 2-subtest (BD + SI) | 3-subtest (BD + SI + MR) | 4-subtest (BD + SI + MR + DS) |
|---|---|---|---|---|
| +/− 5 SS points (%) | 35.6 | 51.5 | 60.8 | 73.9 |
| +/− 10 SS points (%) | 64.4 | 83.2 | 91.4 | 96.4 |
| +/− 15 SS points (%) | 83.2 | 96.1 | 98.3 | 99.8 |
| Median difference score distribution* | .33 | .09 | .05 | .01 |
| SD of difference scores | 10.9 | 7.3 | 5.9 | 4.6 |
| % variance explained | 61.3 | 82.7 | 88.5 | 93.0 |
Note. FSIQ = Full Scale Intelligence Quotient; SS = Standard Score; SD = Standard Deviation; BD = Block Design; SI = Similarities; MR = Matrix Reasoning; DS = Digit Span.
*Difference score = Predicted FSIQ minus Observed FSIQ.
Figure 1.

Distribution of difference scores (Predicted FSIQ minus Observed FSIQ=Difference Score) for the one, two, three, and four subtest screening models using the Training Sample (n = 2,149). Note. BD = Block Design; MR = Matrix Reasoning; SI = Similarities; DS=Digit Span
Reliability in the training dataset
Reliability, assessed via ICC(1,1), was calculated for each of the models using the training sample. The ICCs were .79, .85, .86, and .89 for the one-, two-, three-, and four-subtest models, respectively. By the benchmarks noted above, these values indicate good reliability, with the one-subtest model the weakest.
Classification properties in the test dataset
Regression-Based models
Table 2 shows the classification properties for the regression-based predicted FSIQ scores. The accuracy of cut scores of (1) predicted FSIQ ≤ 75 and (2) predicted FSIQ ≤ 80 was examined with regard to their ability to correctly identify individuals with observed FSIQ ≤ 75. Using predicted FSIQ cut scores of 75 and 80, the two-subtest model was accurate for classifying individuals in general (AUCs of .79 and .87, respectively) and was particularly accurate for identifying individuals who had observed FSIQ of 76 or higher (NPVs of 92.3 and 96.2, respectively). Of note, sensitivity was low (62.3) when using the two-subtest predicted FSIQ cut score of 75, but improved to 83.4 when using a predicted FSIQ cut score of 80. A similar pattern was noted for the three- and four-subtest models (AUCs of .86 or higher), with excellent NPVs (94.6 or higher) regardless of cut score, and improved sensitivity when using a predicted FSIQ ≤ 80 cut score (sensitivity of 90.8 or higher) relative to a predicted FSIQ ≤ 75 cut score (sensitivity ranging from 74.4 to 79.9). Across models, specificity and PPV were stronger when using a predicted FSIQ ≤ 75 cut score relative to a predicted FSIQ ≤ 80 cut score.
Table 2.
Classification properties of the two, three and four subtest models in the test (n = 2,150), and medical (n = 699) samples when screening for Observed FSIQ ≤ 75.
| Predicted FSIQ Cutoff | Subtests Used to Predict FSIQ | Sensitivity (Test) | Sensitivity (Med.) | Specificity (Test) | Specificity (Med.) | PPV (Test) | PPV (Med.) | NPV (Test) | NPV (Med.) | AUC (Test) | AUC (Med.) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 75 | BD + SI | 62.3 | 56.6 | 96.5 | 98.1 | 79.2 | 90.7 | 92.3 | 87.3 | .79 | .77 |
| 75 | BD + SI + MR | 74.4 | 69.9 | 96.8 | 98.5 | 83.2 | 93.8 | 94.6 | 90.9 | .86 | .84 |
| 75 | BD + SI + MR + DS | 79.9 | 76.9 | 97.5 | 98.7 | 87.1 | 95.0 | 95.8 | 92.8 | .89 | .88 |
| 80 | BD + SI | 83.4 | 79.8 | 90.5 | 92.0 | 65.2 | 76.7 | 96.2 | 93.3 | .87 | .86 |
| 80 | BD + SI + MR | 90.8 | 87.3 | 91.4 | 94.1 | 69.8 | 83.0 | 97.9 | 95.7 | .91 | .91 |
| 80 | BD + SI + MR + DS | 94.5 | 92.5 | 91.0 | 93.9 | 69.2 | 83.3 | 98.7 | 97.4 | .93 | .93 |
Note. Test = Test sample; Med. = Medical diagnoses only; FSIQ = Full Scale Intelligence Quotient; PPV = Positive Predictive Value; NPV = Negative Predictive Value; AUC = Area Under the Curve; BD = Block Design; SI = Similarities; MR = Matrix Reasoning; DS = Digit Span.
Medical conditions
To determine whether the screening models worked among children with more circumscribed medical diagnoses, the screening models were tested among a subpopulation (drawn from the total sample) composed of individuals with diseases of the circulatory system (n = 9), neoplasms (n = 96), epilepsy (n = 99), and diseases of the nervous system (n = 495), identified by primary ICD-9 and ICD-10 billing diagnosis (total n = 699). Overall, the classification statistics were very similar to those in the test sample, with AUC values within .02 regardless of predicted FSIQ cut score.
Discussion
This study demonstrated the feasibility of an embedded screening procedure for identifying patients at risk for obtaining a FSIQ of 75 or lower on the WISC-5. Given the cost and time required to administer the full WISC-5, this approach could facilitate rapid screening for ID in specialty medical clinics and preserve critical resources when comprehensive testing is not warranted. Doing so could potentially reduce patient burden associated with extended clinic visits and/or additional assessment days. This type of screening approach could also benefit clinical programs that face high demand and need effective ways to reduce waitlists.
The analysis examined WISC-5 subtests in the standardized order of administration. Model building was limited to four WISC-5 subtests, since there was very little FSIQ variance left to explain after the fourth subtest. Interestingly, the model composed of the first two WISC-5 subtests explained a large proportion of variance in FSIQ, with only modest increases after including the third and fourth subtests. However, when calculated using the two-subtest regression-based formula, the median predicted FSIQ in the test sample was 3 standard score points lower than the median of the observed FSIQ scores. This underestimation is problematic, as it results in suboptimal sensitivity and missed cases. As sensitivity was also in the 70s for the three- and four-subtest models, it became clear that the best screening approach was to use a predicted FSIQ cut score of 80 in order to increase sensitivity. All of the models had strong specificity and NPV regardless of predicted FSIQ cut score; PPV was also generally strong, except for the two-subtest model. The low PPV is a product of false positives (i.e. patients who are identified as needing a full WISC-5 battery to rule out low IQ, but who do not actually have an FSIQ ≤ 75). Taken together, the regression-based screening method of using two or more WISC-5 subtests to calculate a predicted FSIQ worked quite well for identifying youth with observed FSIQ > 75 (i.e. True Negatives). It also performed well at identifying youth with observed FSIQ ≤ 75 (i.e. True Positives) if adjustments (i.e. a somewhat higher predicted FSIQ cut score) were made to account for lower sensitivity.
The analysis was replicated in a medical subsample to ensure that the findings were not unique to our large sample of individuals diagnosed with behavioral health conditions such as ADHD. Overall, the models worked equally well in this medically referred subsample. This provides some measure of confidence that these ID screening models can be used in medical subspecialty clinics. Of note, however, caution is still advised when using this IQ screening approach with medical populations in which processing speed deficits are common (e.g. traumatic brain injury), as it does not include WISC-5 processing speed subtests (e.g. Coding) and thus might overestimate FSIQ in these populations.
The two and three subtest models allow for quick and efficient identification of children for whom comprehensive IQ assessment would be warranted in order to rule out/in ID. Of note, these models were tested using a clinically referred sample in which the prevalence of FSIQ ≤ 75 was fairly high (18.8% of the full sample). As both PPV and NPV are impacted by the prevalence of a condition in a population, we would expect PPV and NPV to change if this screening approach were used in a non-referred sample in which FSIQ ≤75 was rare (e.g. 1–2%). Specifically, while NPV might be expected to remain stable or improve in this scenario, one might also expect a decline in PPV and considerably more “false positives” if used with a non-referred community sample. As such, this screening approach is more suited for use in clinical settings in which the likelihood of FSIQ ≤ 75 is higher.
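The dependence of predictive values on base rate can be made concrete with the standard Bayes identities, holding the two-subtest model's reported test-sample sensitivity (83.4%) and specificity (90.5%) fixed while prevalence varies. This is an illustrative back-of-the-envelope calculation, not a reanalysis of the study data:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV from sensitivity, specificity, and base rate (Bayes' rule)."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    return tp / (tp + fp), tn / (tn + fn)

# Two-subtest model at the clinical base rate (~18.8%) vs. a community rate (~1%).
ppv_clinic, npv_clinic = predictive_values(0.834, 0.905, 0.188)
ppv_community, npv_community = predictive_values(0.834, 0.905, 0.01)
```

At a 1% base rate, PPV collapses to roughly 8% while NPV exceeds 99%, consistent with the expectation stated above that the screen is better suited to settings where FSIQ ≤ 75 is relatively common.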
The iterative process of neuropsychological assessment involves a screening of the major domains of neurocognitive function, with further assessment subsequently conducted when a positive screen is identified. The proposed two, three, and four subtest embedded IQ screening models fit well into this iterative approach and support a more streamlined and tailored neuropsychological evaluation. In fact, the screening method was designed so that clinicians can make a decision whether or not to continue the WISC-5 after administering the first several subtests. As noted, these models are well suited for targeted assessment in specialty medical clinics in which quick and efficient consideration of an ID diagnosis is a primary objective. Moreover, they are well suited for broad based assessment with a wider range of referral questions and concerns. In the context of broad based assessment, the negative predictive power of the models would equip the clinician to quickly rule out low intelligence as a contributing cause of the patient’s adaptive or academic dysfunction, and pivot to examination of other potential explanations such as attention, memory, learning, executive functioning and/or emotional concerns. The regression equations generated for these models allow for the calculation of a predicted FSIQ when testing time and resources are limited, particularly if the first three WISC-5 subtests have been administered.
There are obviously very good reasons why a clinician would opt for a full/comprehensive WISC-5 administration for his or her patient evaluation, even if ID were not a high likelihood diagnosis. For instance, in medical populations in which cognitive proficiency is particularly vulnerable secondary to disease processes and/or treatment (e.g. pediatric brain tumor survivors; Kahalley et al., 2016), the neuropsychologist may choose to administer the WISC-5 working memory and processing speed subtests in order to document these areas of weakness. Similarly, a more comprehensive examination of a verbal/nonverbal intellectual split via full WISC-5 subtest administration may be preferred when assessing patients with conditions in which this type of discrepancy is a common phenotype (Baron, 2004). However, in both of these examples, an equally compelling argument could be made for efficient screening of intelligence using the two, three, or four subtest IQ screening models followed by evaluation of vulnerable skill areas such as working memory, performance speed, visuospatial processing, etc. using specialized neuropsychological tests to assess these constructs in lieu of additional IQ subtest administration.
As always, these findings should be viewed in light of the study’s strengths and weaknesses. Sample size was a strength of this study, allowing for the development of the models in one half of the sample and evaluation of the models in the other. The approach was also novel and holds potential for improving clinical practice. However, this study also had a number of weaknesses. First, it did not include systematic measurement of the adaptive abilities typically considered in an ID diagnosis, nor did it include clinician-assigned diagnoses. For this reason, classification characteristics (e.g. sensitivity, specificity) of the two-, three-, and four-subtest screening models could only be calculated with regard to the risk of FSIQ ≤ 75, rather than an actual ID diagnosis. However, as FSIQ ≤ 75 is a necessary (yet not sufficient) component of an ID diagnosis, it is our opinion that risk of FSIQ ≤ 75 is a salient variable for clinical decision making, particularly if the patient is found to be at very low risk of FSIQ ≤ 75. Another limitation of this study is the lack of performance validity tests (PVTs). As these data were obtained from routine clinical operations, different performance validity evaluation approaches were used by different clinicians, and systematic reporting of PVT administration was not possible.
Finally, the mixed clinical sample used in this study may not be representative of the populations served by other neuropsychological assessment providers. Although this study was conducted using a mixed clinical sample of children referred for assessment, it is composed primarily of individuals with behavioral disorders such as ADHD, rather than complex medical conditions. Moreover, while the racial composition of the mixed clinical sample was consistent with the regional demographics of the hospital where this study was conducted, it deviated from that of the overall United States population. As such, it is possible that these results may not generalize to all populations of children with congenital and acquired brain disorders or to settings that serve a different referral group.
In conclusion, this study provides support for the use of the first two, three, or four WISC-5 subtests to identify youth at risk for ID. Using only the first two WISC-5 subtests, we found that we could efficiently and accurately identify individuals with observed FSIQ above 75 (i.e. True Negatives; children who would likely not meet the IQ-based DSM-5 diagnostic criteria for ID) due to the high NPV of the model. Moreover, the model was effective for identifying True Positives as well (i.e. children with FSIQ ≤ 75) if a more conservative cut score was used (i.e. predicted FSIQ ≤ 80). Classification statistics improved when three and four subtests were administered, and consideration of the classification statistics from the three- and four-subtest models will allow the clinician to achieve a higher level of screening confidence, if desired, by administering additional WISC-5 subtests. Moreover, the three- and four-subtest screening models and regression formulas allow for the calculation of a predicted FSIQ that fell within 10 or fewer standard score points of the observed FSIQ in the large majority of our cases (91.4% and 96.4% of cases, respectively). These findings contribute to the larger body of research focused upon cost management and the increased personalization of neuropsychological assessment. Within this context, embedded IQ screening, rather than routine comprehensive IQ testing, may be useful for identifying youth at risk for ID and for whom further assessment is needed.
Footnotes
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- Baron IS (2004). Neuropsychological Evaluation of the Child. Oxford University Press.
- Bauer RM (2016, August). Clinical neuropsychology in the age of personalized/precision medicine [Paper presentation]. Annual Meeting of the American Psychological Association, Denver, CO.
- Breaux KC (2018). Dyslexia index scores manual. Kaufman Test of Educational Achievement (3rd ed.). NCS Pearson.
- Cawley GC, & Talbot NL (2010). On over-fitting in model selection and subsequent selection bias in performance evaluation. Journal of Machine Learning Research, 11, 2079–2107.
- Crawford JR, Anderson V, Rankin PM, & MacDonald J (2010). An index-based short-form of the WISC-IV with accompanying analysis of the reliability and abnormality of differences. The British Journal of Clinical Psychology, 49(Pt 2), 235–258. 10.1348/014466509X455470
- Donders J (1992). Validity of two short forms of the WISC-R in children with traumatic brain injury. Journal of Clinical Psychology, 48(3), 364–370.
- Donders J, Elzinga B, Kuipers D, Helder E, & Crawford JR (2013). Development of an eight-subtest short form of the WISC-IV and evaluation of its clinical utility in children with traumatic brain injury. Child Neuropsychology, 19(6), 662–670. 10.1080/09297049.2012.723681
- Gur RC (2018, February). “Precision neuropsychology”: Neuropsychological assessment in the “precision medicine” era [Paper presentation]. Annual Meeting of the International Neuropsychological Society, Washington, DC.
- Kahalley LS, Winter-Greenberg A, Stancel H, Ris MD, & Gragert M (2016). Utility of the General Ability Index (GAI) and Cognitive Proficiency Index (CPI) with survivors of pediatric brain tumors: Comparison to Full Scale IQ and premorbid IQ estimates. Journal of Clinical and Experimental Neuropsychology, 38(10), 1065–1076. 10.1080/13803395.2016.1189883
- Kaufman JC, & Kaufman AS (2001). Time for the changing of the guard: A farewell to short forms of intelligence tests. Journal of Psychoeducational Assessment, 19(3), 245–267. 10.1177/073428290101900305
- Kaufman AS, & Kaufman NL (2004). Kaufman Brief Intelligence Test (2nd ed.). Pearson, Inc.
- Koriakin TA, McCurdy MD, Papazoglou A, Pritchard AE, Zabel TA, Mahone EM, & Jacobson LA (2013). Classification of intellectual disability using the Wechsler Intelligence Scale for Children: Full Scale IQ or General Abilities Index? Developmental Medicine and Child Neurology, 55(9), 840–845. 10.1111/dmcn.12201
- Lanfranchi S (2013). Is the WISC-IV General Ability Index a useful tool for identifying intellectual disability? Developmental Medicine and Child Neurology, 55(9), 782–783. 10.1111/dmcn.12210
- Mahone EM, Slomine BS, & Zabel TA (2017). Neurodevelopmental disorders. In Morgan JE & Ricker JH (Eds.), Textbook of clinical neuropsychology (2nd ed.). Routledge.
- Maulik PK, Mascarenhas MN, Mathers CD, Dua T, & Saxena S (2011). Prevalence of intellectual disability: A meta-analysis of population-based studies. Research in Developmental Disabilities, 32(2), 419–436. 10.1016/j.ridd.2010.12.018
- McGraw KO, & Wong SP (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30–46. 10.1037/1082-989X.1.1.30
- McKenzie K, Murray AL, Murray KR, & Murray GC (2014). Assessing the accuracy of the WISC-IV seven-subtest short form and the child and adolescent intellectual disability screening questionnaire in identifying intellectual disability in children. Child Neuropsychology, 20(3), 372–377. 10.1080/09297049.2013.799642
- Pulsifer MB (1996). The neuropsychology of mental retardation. Journal of the International Neuropsychological Society, 2(2), 159–176. 10.1017/s1355617700001016
- Reynolds CR, & Kamphaus RW (2015). Reynolds Intellectual Assessment Scales (2nd ed.). WPS.
- Ringe WK, Saine KC, Lacritz LH, Hynan LS, & Cullum CM (2002). Dyadic short forms of the Wechsler Adult Intelligence Scale-III. Assessment, 9(3), 254–260. 10.1177/1073191102009003004
- Ryan JJ, Glass LA, & Brown CN (2007). Administration time estimates for Wechsler Intelligence Scale for Children-IV subtests, composites, and short forms. Journal of Clinical Psychology, 63(4), 309–318. 10.1002/jclp.20343
- Ryan JJ, Lopez SJ, & Werth TR (1998). Administration time estimates for WAIS-III subtests, scales, and short forms in a clinical sample. Journal of Psychoeducational Assessment, 16(4), 315–323. 10.1177/073428299801600403
- Sheslow D, & Adams W (2003). Wide range assessment of memory and learning – second edition, administration and technical manual. Wide Range.
- van Duijvenbode N, Didden R, van den Hazel T, & Engels RC (2016). Psychometric qualities of a tetrad WAIS-III short form for use in individuals with mild to borderline intellectual disability. Developmental Neurorehabilitation, 19(1), 26–30. 10.3109/17518423.2014.893265
- Vissers LE, Gilissen C, & Veltman JA (2016). Genetic studies in intellectual disability and related disorders. Nature Reviews Genetics, 17(1), 9–18. 10.1038/nrg3999
- Ward SB, Ward TJ, Hatt CV, Young DL, & Mollner NR (1995). The incidence and utility of the ACID, ACIDS, and SCAD profiles in a referred population. Psychology in the Schools, 32(4), 267–276.
- Wechsler D (1991). Wechsler intelligence scale for children: Third edition manual. The Psychological Corporation.
- Wechsler D (2011). Wechsler abbreviated scale of intelligence: WASI-II; Manual. Pearson. https://books.google.com/books?id=MjarjwEACAAJ
- Wechsler D (2015). Wechsler intelligence scale for children (5th ed.). The Psychological Corporation.
- Wigg EH, Secord WA, & Semel E (2013). CELF-5 screening test. Pearson Clinical Assessment.
