Author manuscript; available in PMC: 2011 Apr 4.
Published in final edited form as: School Psych Rev. 2010 Jun;39(2):258–276.

Identifying Children in Middle Childhood Who Are at Risk for Reading Problems

Deborah L Speece 1, Kristen D Ritchey 2, Rebecca Silverman 3, Christopher Schatschneider 4, Caroline Y Walker 5, Katryna N Andrusik 6
PMCID: PMC3070313  NIHMSID: NIHMS262756  PMID: 21472039

Abstract

The purpose of this study was to identify and evaluate a universal screening battery for reading that is appropriate for older elementary students in a response to intervention model. Multiple measures of reading and reading correlates were administered to 230 fourth-grade children. Teachers rated children’s reading skills, academic competence, and attention. Children were classified as not-at-risk or at-risk readers based on a three-factor model reflecting reading comprehension, word recognition/decoding, and word fluency. Predictors of reading status included group-administered tests of reading comprehension and silent word reading fluency, as well as teacher ratings of reading problems. Inclusion of individually administered tests and growth estimates did not add substantial variance. The receiver operating characteristic curve analysis yielded an area under the curve index of 0.90, suggesting this model may accurately and efficiently screen older elementary students for reading problems.


Although many advances have been made in early identification and intervention for students with reading disabilities, there has been less progress in identifying and remediating the reading skills of older children. The National Assessment of Educational Progress report shows that 34% of fourth-grade students perform below basic levels in reading (National Assessment of Educational Progress, National Center for Education Statistics, & Institute of Education Sciences, 2007). Some of these students may have experienced difficulty with reading from the beginning of their school careers, but other students confront reading problems for the first time in middle childhood. In fact, Leach, Scarborough, and Rescorla (2003) estimated that 41% of all students with reading disabilities have late-emerging reading disabilities (i.e., reading disabilities that are not evident until at least third grade). These late-emerging difficulties are not identifiable by early screening assessments (Compton, Fuchs, Fuchs, Elleman, & Gilbert, 2008). Research is needed on methods of identifying children who are at risk for reading problems in middle childhood to further reduce the prevalence of reading failure.

The purpose of this article is to investigate the identification and definition of reading problems in an older group of children, specifically those in fourth grade. These children are underrepresented in the screening literature and are likely a more heterogeneous group of poor readers compared to younger students. Deficits may be broader and could include those associated with younger poor readers (e.g., decoding, word recognition, word reading fluency, spelling) as well as comprehension, vocabulary, and oral language (Cain, Oakhill, & Lemmon, 2004; Catts, Adlof, & Weismer, 2006). The possibility that the reading skills of these children vary on a number of dimensions requires careful attention to the predictor variables selected for screening, the criterion variables used to define reading problems, and the methodological procedures used in both phases of a screening paradigm.

Universal Screening

Recent revisions to the Individuals with Disabilities Education Improvement Act (2004) permit adoption of response to intervention (RTI) as a process to identify students with learning disabilities. This change in identification methods was prompted by scientific evidence and rational arguments against the long-standing adherence to a discrepancy between achievement and intelligence test scores as the hallmark of learning disabilities generally, and reading disabilities more specifically (Stuebing et al., 2002). The emphasis on child “responsiveness” to intervention was proposed as a method to move away from arbitrary test score cut points that often yielded invalid inferences and toward methods that focus on a child’s academic weaknesses in the context of instruction (Fuchs & Fuchs, 1998). Although there is no single model of RTI, most conceptualizations include universal screening to replace teacher referral as the initial mechanism for identifying struggling readers who need further intervention.

The conceptual framework for screening is straightforward. Screening must be reliable, valid, and accurate. In other words, screening batteries should have high sensitivity in accurately identifying the students who will encounter difficulty and high specificity in adequately identifying the students who are not likely to experience problems. High sensitivity and specificity indices provide confidence that the screening battery will accurately identify students in need of intervention. An additional consideration for screening is efficiency. Because all children participate in a universal screening model, screening batteries are ideally brief and easy to administer and score.
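To make these indices concrete, the short sketch below computes sensitivity and specificity from a two-by-two screening table. It is a minimal illustration with hypothetical counts, not data from this study.

```python
# Minimal sketch: sensitivity and specificity of a screening decision.
# All counts below are hypothetical and for illustration only.

def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Example: the screen flags 50 of 60 truly at-risk children (10 missed)
# and clears 150 of 170 not-at-risk children (20 false alarms).
sens, spec = sensitivity_specificity(tp=50, fn=10, tn=150, fp=20)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
# -> sensitivity = 0.83, specificity = 0.88
```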

Research on universal screening within an RTI framework focuses predominantly on children in kindergarten through second grade (Compton, Fuchs, Fuchs, & Bryant, 2006; Speece & Case, 2001). Some screening procedures for younger readers, especially those that use multivariate batteries, have reached acceptable levels of classification accuracy and work continues in this direction (Jenkins, Hudson, & Johnson, 2007). However, few efforts focus on appropriate models for students beyond the initial stages of reading acquisition. In a recent review of the literature, Jenkins et al. identified only three studies conducted with third-grade students and two studies implemented with fourth-grade students. All five studies relied on a single predictor (oral reading fluency) and state-mandated reading assessments as the criterion measure. As summarized by Jenkins et al., the sensitivity and specificity indices were not optimal for fourth-grade students (median sensitivity and specificity = 0.66 and 0.76, respectively), which would result in many false-positive and false-negative errors. These authors recommended a multivariate approach to selecting predictor variables to include not only relevant reading variables such as comprehension, word recognition, and decoding, but also other nonreading variables that have reasonable correlations with reading.

The Measurement Framework

Predictor Variables

Based on models of reading development, measures of word decoding, word recognition, and reading comprehension are prime candidates for screening measures designed to discriminate good and poor readers (Shankweiler et al., 1999). In addition to measuring selected reading variables, a broader framework may be necessary to improve the accuracy of prediction with older readers.

Language skills, inclusive of phonological processing, are most often studied as predictors of reading development. Work in this area is usually grounded theoretically in the Simple View of Reading proposed by Gough and Tunmer (1986), which posits that reading comprehension is the product of decoding and language comprehension. The importance of both decoding and language comprehension received support from Catts et al. (2006), who found that fourth-grade children with poor comprehension performed more poorly on a composite language measure (comprising vocabulary, syntax, and listening comprehension) than either typical readers or children with poor decoding skills. These authors also demonstrated that children with poor decoding skills performed more poorly on phonological awareness measures compared to typical readers and children with poor comprehension skills. Thus, phonological processing skills may continue to be problematic for older readers. In an extensive analysis of the Simple View of Reading, Vellutino, Tunmer, Jaccard, and Chen (2007) reported that phonological skills and spelling contributed to decoding, whereas language skills (primarily semantic knowledge) contributed to language comprehension in children in second and third grade as well as sixth and seventh grade. These authors concluded that the importance of language comprehension skills to reading may not become apparent until adequate word identification skills have been acquired.

The role of memory in reading development is less clear. One argument for including working memory is based on Perfetti’s (1985) verbal efficiency theory, which suggests that slow word recognition places high demand on working memory, inhibiting cognitive resources necessary for comprehension. Research on the role of memory has produced conflicting results. Some authors report significant contributions of memory to reading (e.g., Oakhill, Cain, & Bryant, 2003) whereas others do not (e.g., Fowler & Swainson, 2004; Stothard & Hulme, 1992). Design and measurement differences may explain the contradictory results.

There has been a recent resurgence of theoretical and empirical interest in word reading fluency and its connection to comprehension (e.g., Fuchs, Fuchs, Hosp, & Jenkins, 2001; National Reading Panel, 2000). Various measures of fluency may prove to be powerful screening tools. There is ample evidence that fluency measures of both word- and text-level reading are unique predictors of reading skill in younger children (e.g., Compton et al., 2006; Speece & Ritchey, 2005).

Teacher ratings of child behavior have a solid history in the prediction of academic achievement (DiPerna & Elliott, 1999; DuPaul et al., 2004; Taylor, Anselmo, Foreman, Schatschneider, & Angelopoulos, 2000). For example, ratings of children’s attention to task- and work-related behaviors predict achievement and RTI (e.g., Stage, Abbott, Jenkins, & Berninger, 2003). Furthermore, Speece and Ritchey (2005) reported that ratings of academic competence uniquely predicted end-of-year reading skill in a multivariate model that included reading, reading-related variables, and intelligence.

Criterion Variables

What is being predicted is as important as the selection of predictors to be included in a screening battery because different definitions of reading problems will yield different results. For example, determining which measures predict reading comprehension depends on how reading comprehension is assessed (Cutting & Scarborough, 2006) and when it is assessed (Francis, Fletcher, Catts, & Tomblin, 2005). It follows that constructs such as risk for reading problems would be similarly affected. There is no single operational definition that garners widespread support. Because reading disability is a complicated construct that changes with development, the way it is defined and measured for different populations of children should take this into account.

Methodological Variations

Beyond the selection of predictor and criterion measures, different methodological approaches may increase prediction accuracy and efficiency.

Efficiency

With younger students, screening batteries traditionally use individually administered assessments because younger children often require individual attention to complete assessment tasks. However, older students are capable of completing assessment tasks in a group setting. Therefore, privileging group administration is one option for increasing efficiency. Although individually administered tests are generally more reliable, it may be that group-administered tests provide as much information at a lower cost. Teacher ratings (described earlier) are another means of increasing efficiency. Teachers can quickly complete ratings of student aptitude and achievement that do not require additional assessment time of the children. We are unaware of studies that compare the relative strength of group tests and teacher ratings with a more traditional approach of using individually administered tests alone in predicting reader status.

Accuracy

The classic early identification framework requires a positive or negative assessment of risk at screening paired with a positive or negative outcome at diagnosis or criterion. These dichotomies likely work against accurate prediction because of change that is unaccounted for between the screen and the diagnosis (Speece & Cooper, 2004) and measurement error associated with arbitrary cut points (Francis, Fletcher, Stuebing, et al., 2005). Consideration of students’ growth (i.e., responsiveness to classroom instruction) may improve classification accuracy (Speece, 2005). Although growth is more often considered in evaluating the effects of interventions in an RTI framework, poor growth has differentiated good and poor readers in identification procedures for younger students (Compton et al., 2006; Speece & Case, 2001; Vellutino et al., 1996). The addition of growth as a screening construct adds complexity to the “efficiency” requirement of screening because measurement must occur at least twice. Repeated assessments may be worth the additional time if sensitivity and specificity are improved. Compton et al. (2006) found this to be the case when predicting reading status at the end of second grade from a first-grade battery. In the current study, we evaluated whether short-term growth contributes to identification accuracy for older readers.

The Current Study

Despite the fact that children may experience reading difficulty after the initial stages of literacy instruction, few studies examine methods of identifying these children. The lack of attention to identification of children in third grade and beyond may stem from the supposition that their problems are more evident and easily diagnosed by reading tests or observation. Existing evidence suggests there are a substantial number of children whose problems may not be apparent in the primary grades and whose difficulties may not be recognized without a structured postprimary screening process (Leach et al., 2003).

The purposes of this study were to (a) use a multivariate approach to define reading problems of fourth-grade children, and (b) identify an accurate and efficient screening battery to select children who may require more intensive instruction. This study is placed in the context of traditional screening paradigms but departs from previous efforts by including, in a single study, (a) older students, (b) a multivariate screening battery that allows us to compare the efficacy of group-administered measures and teacher ratings with individually administered measures, (c) measures of growth, and (d) a reading competence criterion defined by multiple reading measures.

Method

Participants

The participants were 230 fourth-grade students from 20 classrooms in 15 parochial schools in a large, mid-Atlantic city and surrounding suburban communities. All children in participating schools were invited to participate via parent letter, and informed consent from parents and assent from individual children were obtained. Of the 398 children invited to participate, 235 (59.0%) of the parents provided permission. Three children refused assent, one student transferred before initial testing began, and one permission was received too late—thus, the maximum n = 230.

The participants were 125 male (54.3%) and 105 female (45.7%) children. In late fall, the mean age of the participants was 9.45 years (SD = 0.33 years). Seventy-four percent of the students were Caucasian, 17.9% were African American, 1% were Asian, and 4% were biracial. Parents of 3.1% of the students did not provide information on race. All students spoke English as their primary language, and 9 students (3.9%) had an individualized education plan or had been referred for special education evaluation. With respect to mothers’ level of education, 14.9% had a high school diploma, 33.7% had some college education, 27.6% had a college degree, and 21.6% had a professional or graduate degree.

These participants are part of a larger investigation of RTI designed to identify poor readers, provide them with intervention, and examine their longitudinal outcomes. The current sample is the longitudinal control group and did not receive intervention as part of this investigation.

Measures

Students were assessed using measures of reading comprehension, oral language, word recognition, word decoding, phonological processing, auditory memory, and spelling. Raw scores were used in all analyses, unless otherwise specified.

Reading comprehension

The Reading Comprehension subtest of the Gates-MacGinitie Reading Test (GMRT; MacGinitie, MacGinitie, Maria, & Dreyer, 2000) is group administered and norm referenced. Students have 35 min to silently read short narrative and expository passages and answer multiple-choice questions. Form S was administered in the fall, and Form T was administered in the spring. The alternate-forms and test–retest reliability coefficients of the GMRT are above .90 for fourth-grade students.

Maze (Fuchs, n.d.; Fuchs & Fuchs, 1992) is a group-administered curriculum-based measure (CBM) that uses a modified cloze technique. In this assessment, the first sentence of a reading passage remains intact, but every seventh word thereafter is deleted and replaced with three choices; students select the choice that is contextually appropriate. Students are given 2 min to complete as many choices as possible. Students completed two probes, and the mean of the number of correct choices was converted to items correct per minute. The reliability and criterion validity of the Maze are adequate (r = .60–.86; Fuchs & Fuchs, 1992).

Oral language

The Clinical Evaluation of Language Fundamentals, Fourth Edition (CELF; Semel, Wiig, & Secord, 2003), is an individually administered, norm-referenced assessment of language. The Formulated Sentences and Word Classes subtests were administered. The Formulated Sentences subtest requires students to orally create semantically and syntactically correct sentences when prompted by a word and an illustration. Responses are awarded 0, 1, or 2 points. The Word Classes subtest requires students to identify and express semantic relationships between two orally presented words. Responses are awarded 1 or 0 points. The authors reported that test–retest reliability for both subtests is high (r = .86 and .81, respectively) and that validity evidence is strong based on factor analysis studies and diagnostic sensitivity (Semel et al., 2003).

Listening comprehension

The Listening Comprehension Test was developed for this study to evaluate oral comprehension of passages comparable to those used to assess reading comprehension. Three passages from Form T of the GMRT (MacGinitie et al., 2000) were read aloud, and 16 multiple-choice questions were presented orally and in print after each passage had been read. Cronbach’s alpha was .73 for the current sample.

Word recognition and decoding

The Woodcock-Johnson Tests of Achievement, Third Edition (WJ-III; Woodcock, McGrew, Mather, & Schrank, 2001) is individually administered and norm referenced. The Letter-Word Identification (hereafter, Word Identification) and Word Attack subtests were administered. The Word Identification subtest is an assessment of automatic word recognition skills; the Word Attack subtest requires students to decode pseudowords as an evaluation of phonetic and word structure skills. The split-half reliability coefficients for 9-year-old children are .94 and .89, respectively.

Passage Reading Fluency (PRF; Fuchs, Hamlett, & Fuchs, 1990) is an individually-administered CBM of oral reading fluency. Students are given 1 min to read a narrative passage; each student read two fourth-grade passages, and the mean number of words read correctly per minute was calculated. Both test–retest and alternate-forms reliability are high (r > .90) across studies, and criterion validity is strong (Fuchs & Fuchs, 1992; Marston, 1989).

The Test of Silent Word Reading Fluency (TOSWRF; Mather, Hammill, Allen, & Roberts, 2004) is a group-administered, norm-referenced assessment of students’ ability to fluently recognize and identify printed words. The test consists of 32 rows of real words printed without spaces between them; students separate the strings into words by drawing lines between adjacent words. After practice, students are given 3 min to complete the measure. The number of correctly separated words was used in analysis. The reliability (r = .73–.91) and validity (r = .78 and .77 for Form A; r = .74 and .73 for Form B) are adequate.

The Test of Word Reading Efficiency (TOWRE; Torgesen, Wagner, & Rashotte, 1999) is an individually administered, norm-referenced measure consisting of two subtests. The Sight Word Efficiency subtest assesses students’ skills in fluent reading of real words. The Phonemic Decoding Efficiency subtest assesses students’ skills in decoding nonwords. Form A was administered in the fall and Form B in the spring. The authors report excellent alternate-form reliability (r = .93) and strong concurrent criterion-related validity (r = .87–.89).

Word Identification Fluency (WIF) is an individually-administered CBM of word reading fluency developed for this study, based on a procedure developed by Compton (personal communication, March 3, 2003). The words on the WIF probes were randomly selected from the Educator’s Word Frequency Guide (Zeno, Ivens, Millard, & Duvvuri, 1995), which is based on a large word frequency study and provides information on the frequency of words and the grade levels at which students are likely to encounter them. Parallel probes, each with 80 words representing a range of frequency levels, were created. The variable of interest was the mean number of words students read correctly in 1 min over two trials. In the current sample, the parallel-forms reliability coefficient was .92. Validity coefficients with the WJ-III Word Identification subtest (r = .68), TOWRE Sight Word Efficiency (r = .86), and PRF (r = .78) are strong.

The Colorado Assessment of Decoding, Revision II (Scarborough et al., 2008) is an experimental group-administered test of decoding skill, adapted from the homophone choice reaction time measure developed for the Colorado Twin Study (Olson, Forsberg, Wise, & Rack, 1994). From each set of three misspelled letter strings (e.g., lun, fep, kat), students underlined the one that, when decoded, sounds like a familiar English word. Students were given 5 min to complete 40 items. The number of correct responses was used in analyses. Cronbach’s alpha was .83 for the current sample.

Phonological processing

The Comprehensive Test of Phonological Processing (CTOPP; Wagner, Torgesen, & Rashotte, 1999) is an individually-administered, norm-referenced measure of phonological processing. Two subtests were administered. The Elision subtest assesses phonological awareness by requiring students to orally delete syllables and phonemes from a word and then pronounce the resulting word. The Nonword Repetition subtest evaluates phonological memory by requiring students to repeat orally presented nonwords. The authors reported strong criterion-related predictive validity for Elision (r = .67–.68) and for Nonword Repetition (r = .52).

The Rapid Automatized Naming—Letters (RAN; Wolf & Denckla, 2005) is an individually-administered, norm-referenced assessment that evaluates phonological processing by assessing how quickly and accurately students name five letters that are randomly repeated on a page. The number of seconds to complete the task was averaged for two trials. A low score is associated with better performance. The test–retest reliability (r = .87) is excellent and criterion-related validity (r = .46) is moderate.

Auditory working memory

The WJ-III (Woodcock, McGrew, & Mather, 2001) Auditory Working Memory subtest is an individually-administered, norm-referenced assessment that evaluates a student’s ability to retain orally presented information. The subtest requires that a student hold a mixed set of numbers and words in immediate awareness while reordering into two sequences. The split-half reliability of the subtest is strong (r = .87) and criterion-related validity is moderate (r = .62).

Spelling

Spelling Fluency (Fuchs, Fuchs, Hamlett, & Allinder, 1991) is a group-administered CBM in which students are presented with a new word every 10 sec for 2 min and asked to spell the word. The words are randomly drawn, with replacement, from the Harris-Jacobson grade-level list. Three practice words are administered to ensure that students understand the directions and the pacing. The mean number of correct letter sequences per minute based on two trials was calculated. Marston (1989) reported excellent reliability (test–retest/parallel forms median r = .85) and criterion validity (median r = .86) for this measure.

Teacher ratings

Academic Competence is one of three subscales of the Social Skills Rating System (Gresham & Elliott, 1990) and consists of 9 items that require the teacher to use a 4-point scale to compare the child to classmates on reading and math achievement and motivation to learn. The sum of scores was used in analysis. The authors report excellent test–retest reliability for the Academic Competence subscale (r = .93); criterion validity is moderate to strong with other measures of teacher ratings of child behavior.

The Attention Deficit Hyperactivity Disorder Rating Scale-IV (DuPaul, Power, Anastopoulos, & Reid, 1998) was completed by teachers. This 18-item questionnaire uses a 4-point rating scale to identify behaviors that may suggest a diagnosis of attention deficit hyperactivity disorder. The variables of interest were the sums of the items for the Hyperactivity and Inattention subscales. A low score indicates low rates of hyperactive or inattentive behavior. The test–retest reliability is moderate to excellent (r = .78–.90), and the internal consistency of each form is excellent (r = .88–.96). Criterion-related validity was established with the Conners’ Teacher and Parent Rating Scales (r = .61–.81).

The Teacher Reading Rating Form was developed for this study to measure teachers’ assessment of children’s reading skills. Teachers rated each child on a 1–5 scale (Overall Score). Scores of 1 or 2 indicated below-grade-level performance, and scores of 3, 4, or 5 represented skill at or above grade level. For children rated 1 or 2, teachers were asked to identify specific problem areas for the child. The selections were decoding, fluency, vocabulary, comprehension, and motivation; teachers could select as many problem areas as were applicable to the child. The number of problem areas was summed to produce a Teacher Rating of Reading Problems score. In the current sample, the validity coefficients of the Overall Score with the GMRT and Maze were .68 and .63, respectively; for the Teacher Rating of Reading Problems, the coefficients with the same reading measures were −.50 and −.49, respectively.

Procedure

Data collection

Data for this study were collected in five waves across the school year. Waves of data collection overlapped in time, and schools were tested in the same order within each wave to preserve spacing between waves. In general, Wave 1 was collected from November through mid-December; Wave 2 from December through January; Wave 3 from late January through February; Wave 4 from April to mid-May; and Wave 5 in May. Waves 1, 3, and 5 were individual administrations, and Waves 2 and 4 were group administrations. Within each wave, assessments were administered in a standard order. All data were collected by graduate research assistants who were trained to a 90% accuracy criterion (administration and scoring) on all measures before testing began. On-site fidelity checks were made throughout the year.

Data-analytic plan

The analysis of the data for this study was a multistep process. First, we needed to identify those students who exhibited poor reading performance at the end of fourth grade. The following measures of comprehension, word recognition, and decoding (collected at Waves 4 and 5) were used to define the reading criterion: GMRT, Maze, Letter Word Identification and Word Attack, PRF, TOSWRF, TOWRE Sight Word Efficiency and Phonemic Decoding Efficiency, WIF, and Colorado Assessment of Decoding. These measures were subjected to an exploratory principal axis factor analysis to increase reliability and reduce the number of criterion variables. Once the factors were estimated, we identified criteria for the factor scores, which were then used to classify students as at-risk or not-at-risk readers.

To determine which measures should be used as the strongest predictors of risk status, we employed an all-subsets regression analysis (Miller, 2002). There are a number of possible regression techniques for determining the relative importance of individual predictors in multiple regression (Azen & Budescu, 2003; Budescu, 1993) and of sets of predictors (Miller, 2002). One technique that has received much-deserved criticism is stepwise regression (Thompson, 1995); a major problem with stepwise regression is that the selection of the “important” variables is often sample specific and does not replicate in new samples. Fortunately, our goal for selecting predictors was less lofty than trying to identify the single best subset. For the purposes of this study, we needed a subset of predictors that was practical and efficient while providing strong information about which students would be identified as at-risk readers at the end of the year, acknowledging that other sets of predictors could potentially do as well but were unlikely to do better.
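As an illustration of the all-subsets strategy, the sketch below fits an ordinary least squares model for every subset of candidate predictors at each set size and keeps the largest R2 per size, which can then be inspected for the point of diminishing returns. It assumes NumPy and scikit-learn; the data and variable names are simulated stand-ins, not the study’s measures.

```python
# All-subsets regression sketch: best R^2 for each predictor set size.
from itertools import combinations

import numpy as np
from sklearn.linear_model import LinearRegression

def best_r2_by_set_size(X, y, names, max_size=5):
    """Return {set size: (best predictor subset, R^2)} over all subsets."""
    results = {}
    for k in range(1, max_size + 1):
        best_cols, best_r2 = None, -np.inf
        for cols in combinations(range(X.shape[1]), k):
            r2 = LinearRegression().fit(X[:, cols], y).score(X[:, cols], y)
            if r2 > best_r2:
                best_cols, best_r2 = [names[c] for c in cols], r2
        results[k] = (best_cols, best_r2)
    return results

# Simulated stand-in: 230 students, 10 candidate predictors, binary
# risk status driven by two of the predictors plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(230, 10))
y = (X[:, 0] + X[:, 3] + rng.normal(size=230) < -1.0).astype(float)

names = [f"v{i}" for i in range(10)]
for k, (cols, r2) in best_r2_by_set_size(X, y, names).items():
    print(k, cols, round(r2, 3))
```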

Measures used as predictor variables in the all-subsets regression analysis included assessments of reading and related skills (collected at Waves 1 and 2), estimates of growth on several predictors (collected at Wave 3), and teacher ratings of reading performance, attention, and academic competence. Group-administered assessments and teacher ratings were examined first, followed by individually-administered assessments, and then growth scores. After a subset of practical and efficient predictors was identified, these predictors were used in a logistic regression model to assess their overall classification accuracy and the risk probability values associated with specific performances on the predictor variables.

Results

Table 1 provides descriptive statistics on all measures organized by criterion and predictor variables. Tables 2 and 3 provide correlations.

Table 1.

Descriptive Statistics for Total Sample and by Reader Group

                                        Total Sample          Not-At-Risk Readers     At-Risk Readers
Measure                                 Mean     SD      n    Mean     SD      n      Mean     SD      n
Criterion Measures
    CBM Maze 8.95 2.60 228 9.84 2.21 165 6.60 2.05 63
    CBM Passage Reading Fluency 138.96 39.28 228 154.57 31.72 165 98.07 25.26 63
    CBM Word Identification Fluency 77.44 17.14 228 83.62 14.19 165 61.25 13.20 63
    Colorado Decoding 28.17 6.57 228 30.12 5.68 165 23.05 5.97 63
    GMRT Reading Comprehension 102.03 14.49 228 107.41 11.74 165 87.92 11.16 63
    TOWRE Phonemic Decoding Efficiency 109.72 12.14 228 114.70 9.15 165 96.67 8.89 63
    TOWRE Sight Word Efficiency 106.70 10.08 228 110.33 8.33 165 97.19 7.87 63
    WJ-III Word Attack 106.01 8.99 228 108.85 7.65 165 98.57 7.97 63
    WJ-III Word Identification 103.80 10.29 228 107.34 8.99 165 94.52 7.28 63
Predictor Measures
    Academic Competence 31.20 8.20 214 33.86 7.00 152 24.68 7.27 62
    ADHD Hyperactivity 3.27 5.03 228 2.72 4.55 166 4.73 5.93 62
    ADHD Inattention 5.36 6.48 222 3.94 5.31 164 9.36 7.75 58
    CBM Maze 7.53 2.42 229 8.37 2.05 166 5.33 1.89 63
    CBM Maze Growth 0.42 1.78 229 0.47 1.86 166 0.29 1.56 63
    CBM Passage Reading Fluency 127.63 27.24 230 137.24 21.61 167 102.17 24.16 63
    CBM PRF Growth 2.44 16.00 229 4.15 16.44 166 −2.06 13.90 63
    CBM Spelling (CLS) 35.30 4.19 229 36.81 2.94 166 31.30 4.39 63
    CBM Spelling (CLS) Growth 20.12 11.18 229 23.29 10.18 166 11.76 9.28 63
    CBM Word Identification Fluency 65.68 15.44 230 70.76 12.66 167 52.22 14.07 63
    CBM WIF Growth 6.96 10.21 229 7.43 11.09 166 5.70 7.34 63
    CELF Formulated Sentences 10.93 2.41 229 11.20 2.42 166 10.21 2.24 63
    CELF Word Classes 10.43 2.67 230 11.08 2.58 167 8.73 2.10 63
    Colorado Decoding 26.76 6.24 229 28.54 5.41 166 22.06 5.86 63
    CTOPP Elision 10.37 2.92 230 11.14 2.58 167 8.32 2.80 63
    CTOPP Nonword Repetition 10.45 2.62 230 10.71 2.65 167 9.75 2.42 63
    GMRT Reading Comprehension 102.02 14.46 229 107.37 11.72 166 87.92 11.16 63
    Listening Comprehension 8.60 3.37 230 9.21 3.37 167 6.98 2.81 63
    RAN Letters (seconds) 26.01 5.81 230 24.91 4.99 167 28.92 6.79 63
    Teacher Reading Rating (Overall Rating) 3.23 0.92 228 3.52 0.78 166 2.45 0.78 62
    Teacher Reading Rating (Total Problems) 0.64 1.40 228 0.15 0.63 166 1.95 1.95 62
    Test of Silent Word Reading Fluency 104.99 9.46 229 107.81 8.00 166 97.56 9.03 63
    TOWRE Phonemic Decoding Efficiency 103.97 11.49 230 107.82 10.01 167 93.78 8.59 63
    TOWRE Sight Word Efficiency 103.37 9.49 230 106.41 8.17 167 95.30 7.96 63
    WJ-III Auditory Memory 106.98 13.20 229 108.65 12.91 166 102.59 13.04 63
    WJ-III Word Attack 104.77 10.38 230 107.70 8.60 167 97.02 10.76 63
    WJ-III Word Identification 104.37 9.87 230 107.80 8.19 167 95.30 8.07 63

Note. ADHD = Attention Deficit Hyperactivity Disorder Rating Scale; CBM = curriculum-based measurement; CLS = correct letter sequences; CTOPP = Comprehensive Test of Phonological Processing; GMRT = Gates-MacGinitie Reading Test; RAN = Rapid Automatized Naming; TOWRE = Test of Word Reading Efficiency; WJ-III = Woodcock-Johnson Tests of Achievement and Cognitive Abilities, Third Edition. All CBM measures (Passage Reading Fluency, Word Identification Fluency, Maze, and Spelling) are averaged across probes and converted to items per minute. Grade-based standard scores are presented for the WJ-III, GMRT Reading Comprehension, CTOPP, and TOWRE subtests; WJ-III subtests, GMRT Reading Comprehension, and TOWRE subtests have a mean standard score of 100 (SD = 15), and CTOPP subtests have a mean of 10 (SD = 3). Raw scores are presented for all other measures.

Table 2.

Correlations Among the Criterion Assessments

Measures 1 2 3 4 5 6 7 8 9
1. GMRT Reading Comprehension 1.0
2. Colorado Decoding 0.55 1.0
3. WJ-III Word Identification 0.56 0.63 1.0
4. WJ-III Word Attack 0.49 0.66 0.76 1.0
5. CBM Maze 0.62 0.44 0.51 0.49 1.0
6. CBM Passage Reading Fluency 0.61 0.41 0.61 0.54 0.74 1.0
7. CBM Word Identification Fluency 0.44 0.35 0.57 0.54 0.58 0.78 1.0
8. TOWRE Sight Word Efficiency 0.45 0.30 0.50 0.49 0.60 0.74 0.87 1.0
9. TOWRE Phonemic Decoding Efficiency 0.51 0.54 0.69 0.71 0.58 0.73 0.76 0.76 1.0

Note. CBM = curriculum-based measurement; GMRT = Gates-MacGinitie Reading Test; WJ-III = Woodcock-Johnson Tests of Achievement, Third Edition; TOWRE = Test of Word Reading Efficiency.

Table 3.

Correlations Among Predictor Measures

Measure 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
1. Academic Competence 1.00
2. ADHD Hyperactivity −0.33 1.00
3. ADHD Inattention −0.65 0.67 1.00
4. CBM Maze 0.50 −0.21 −0.38 1.00
5. CBM Maze Growth 0.19 −0.10 −0.06 −0.05 1.00
6. CBM Passage Reading Fluency 0.54 −0.15 −0.29 0.71 0.23 1.00
7. CBM PRF Growth 0.15 −0.01 −0.05 0.19 0.04 0.00 1.00
8. CBM Spelling (CLS) 0.49 −0.11 −0.26 0.57 0.08 0.61 0.14 1.00
9. CBM Spelling (CLS) Growth 0.39 −0.10 −0.17 0.48 0.33 0.57 0.20 0.51 1.00
10. CBM Word Identification Fluency 0.43 −0.03 −0.15 0.60 0.09 0.78 0.13 0.63 0.48 1.00
11. CBM WIF Growth 0.12 0.01 −0.04 0.03 −0.04 −0.04 0.28 0.03 0.06 −0.18 1.00
12. CELF Formulated Sentences 0.36 −0.04 −0.17 0.30 0.19 0.34 0.02 0.32 0.30 0.26 −0.03 1.00
13. CELF Word Classes 0.51 −0.13 −0.30 0.44 0.24 0.49 0.03 0.44 0.32 0.41 −0.02 0.47 1.00
14. Colorado Decoding 0.39 −0.09 −0.19 0.37 0.08 0.40 0.11 0.62 0.33 0.39 −0.04 0.31 0.38 1.00
15. CTOPP Elision 0.40 −0.11 −0.27 0.38 0.17 0.43 0.16 0.50 0.42 0.43 −0.03 0.32 0.40 0.42 1.00
16. CTOPP Nonword Repetition 0.26 −0.13 −0.12 0.25 0.26 0.33 0.02 0.29 0.25 0.23 −0.10 0.25 0.34 0.31 0.28 1.00
17. GMRT Reading Comprehension 0.64 −0.21 −0.38 0.58 0.19 0.60 0.11 0.54 0.38 0.44 0.00 0.40 0.60 0.49 0.47 0.36 1.00
18. Listening Comprehension 0.38 −0.04 −0.13 0.42 0.13 0.41 0.10 0.29 0.34 0.26 0.00 0.35 0.54 0.21 0.32 0.25 0.59 1.00
19. RAN Letters (seconds) −0.26 0.02 0.09 −0.35 −0.04 −0.49 −0.06 −0.26 −0.27 −0.56 −0.03 −0.13 −0.18 −0.25 −0.17 −0.15 −0.19 −0.02 1.00
20. Reading Rating (Overall Rating) 0.75 −0.25 −0.45 0.63 0.27 0.65 0.19 0.53 0.44 0.55 0.00 0.39 0.57 0.46 0.46 0.31 0.68 0.43 −0.32 1.00
21. Teacher Reading Rating (Total Problems) −0.62 0.19 0.44 −0.49 −0.10 −0.52 −0.15 −0.52 −0.36 −0.50 −0.02 −0.26 −0.39 −0.37 −0.38 −0.16 −0.50 −0.28 0.25 −0.68 1.00
22. Test of Silent Word Reading Fluency 0.42 −0.17 −0.40 0.66 0.09 0.54 0.11 0.52 0.42 0.48 0.05 0.11 0.36 0.40 0.34 0.21 0.45 0.24 −0.23 0.47 −0.42 1.00
23. TOWRE Phonemic Decoding Efficiency 0.43 −0.09 −0.16 0.56 0.11 0.71 0.10 0.70 0.51 0.78 −0.02 0.25 0.41 0.58 0.49 0.32 0.47 0.27 −0.47 0.53 −0.43 0.46 1.00
24. TOWRE Sight Word Efficiency 0.39 −0.03 −0.15 0.60 0.06 0.76 0.11 0.56 0.42 0.86 0.01 0.23 0.37 0.38 0.36 0.26 0.40 0.21 −0.67 0.52 −0.47 0.46 0.77 1.00
25. WJ-III Auditory Memory 0.26 −0.16 −0.19 0.33 0.12 0.30 0.02 0.23 0.26 0.21 −0.00 0.31 0.40 0.23 0.35 0.35 0.34 0.27 −0.15 0.31 −0.21 0.15 0.28 0.25 1.00
26. WJ-III Word Attack 0.41 −0.05 −0.16 0.41 0.11 0.52 0.08 0.71 0.42 0.56 −0.05 0.32 0.37 0.64 0.52 0.31 0.40 0.24 −0.23 0.47 −0.41 0.43 0.77 0.53 0.29 1.00
27. WJ-III Word Identification 0.56 −0.10 −0.22 0.58 0.16 0.67 0.13 0.77 0.49 0.68 −0.02 0.36 0.56 0.63 0.54 0.41 0.61 0.38 −0.33 0.63 −0.51 0.51 0.79 0.61 0.31 0.77 1.00

Note. ADHD = Attention Deficit Hyperactivity Disorder Rating Scale; CELF = Clinical Evaluation of Language Fundamentals, Fourth Edition; CLS = correct letter sequences; CBM = curriculum-based measurement; CTOPP = Comprehensive Test of Phonological Processing; GMRT = Gates-MacGinitie Reading Test; RAN = Rapid Automatized Naming; TOWRE = Test of Word Reading Efficiency; WJ-III = Woodcock-Johnson Tests of Achievement, Third Edition. All correlations are among raw scores. Correlations greater than .11 are significant at p < .05.

Exploratory Factor Analysis of the Criterion and the Identification of At-Risk Readers

An exploratory principal axis factor analysis using maximum likelihood estimation was conducted using raw scores for all nine measures. We first inspected the data for univariate normality and found that skewness and kurtosis for all measures were within ±1.0. We then inspected scatter plots to identify bivariate outliers; none were found. Following the guidelines for conducting exploratory factor analyses using maximum likelihood set forth by Fabrigar, Wegener, MacCallum, and Strahan (1999), we used multiple indicators (root mean square error of approximation, scree test, variance accounted for by the factors) to determine the number of factors to retain. The solution deemed best was the three-factor solution, which explained 84% of the original covariance among the nine variables. The root mean square error of approximation was 0.086, and the scree plot also indicated a three-factor solution. The three-factor solution was rotated using an oblique (Promax) rotation (Muthen & Muthen, 2007) because the factors were assumed to be correlated. The loadings of each of the nine criterion variables on the factors (the pattern matrix) are presented in Table 4. The pattern of loadings indicates that the first factor represents a reading comprehension factor (Factor 1: Comprehension), with high loadings for the criterion measures that assessed some aspect of comprehension (Maze and GMRT). The second factor represents a word reading factor (Factor 2: Word Reading), with high loadings for the Colorado Assessment of Decoding, Word Attack, and Word Identification. The third factor represents a speeded word reading factor (Factor 3: Fluency), with high loadings for Word Identification Fluency and the TOWRE Phonemic Decoding Efficiency and Sight Word Efficiency subtests. CBM Passage Reading Fluency loaded on both Comprehension and Fluency. All three factors were strongly correlated with each other: .64 between Comprehension and Word Reading, .58 between Comprehension and Fluency, and .51 between Word Reading and Fluency.
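A minimal sketch of this step is shown below, assuming the third-party factor_analyzer package (its FactorAnalyzer class supports maximum likelihood extraction and promax rotation). The data are simulated with a planted three-factor structure; the column names are placeholders, not the study’s measures.

```python
# EFA sketch: 3 factors, maximum likelihood, oblique (promax) rotation.
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer  # pip install factor-analyzer

# Simulated stand-in for the 230 x 9 matrix of raw criterion scores,
# built from three latent factors so the EFA has structure to recover.
rng = np.random.default_rng(0)
latent = rng.normal(size=(230, 3))
lam = np.zeros((9, 3))
lam[0:3, 0] = lam[3:6, 1] = lam[6:9, 2] = 0.8
criterion = pd.DataFrame(
    latent @ lam.T + rng.normal(scale=0.6, size=(230, 9)),
    columns=[f"measure_{i + 1}" for i in range(9)],
)

fa = FactorAnalyzer(n_factors=3, method="ml", rotation="promax")
fa.fit(criterion)

pattern = pd.DataFrame(fa.loadings_, index=criterion.columns)  # cf. Table 4
phi = fa.phi_                     # factor correlations under oblique rotation
scores = fa.transform(criterion)  # one score per factor per student
```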

Table 4.

Factor Pattern Matrix

Measure Factor 1 Factor 2 Factor 3
CBM Maze .74 −.03 .18
CBM Passage Reading Fluency .55 −.01 .48
CBM Word Identification Fluency .04 .07 .88
Colorado Decoding .18 .74 −.18
GMRT Reading Comprehension .69 .21 −.08
TOWRE Phonemic Decoding Efficiency .02 .46 .55
TOWRE Sight Word Efficiency .09 −.03 .89
WJ-III Word Attack −.09 .88 .12
WJ-III Word Identification .09 .72 .12

Note. CBM = curriculum-based measurement; GMRT = Gates-MacGinitie Reading Test; TOWRE = Test of Word Reading Efficiency; WJ-III = Woodcock-Johnson Tests of Achievement, Third Edition.

Factor loadings were then used to create three factor scores for each student. To identify students with poor reading skills, sample-based percentile scores were computed for each factor score, and a reading problem was defined as scoring below the 15th percentile on any one of the three factors. This cut point was selected to acknowledge that our sample scored slightly above the normative mean on several measures, so a stricter criterion was appropriate. This procedure identified 63 children as at risk. Means and standard deviations for the not-at-risk and at-risk subgroups also are reported in Table 1.
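The classification rule itself reduces to a few lines. The sketch below (NumPy only, with simulated factor scores standing in for the real ones) flags a student as at risk when any of the three factor scores falls below the sample-based 15th percentile.

```python
# Risk rule sketch: at risk = below the sample's 15th percentile on any factor.
import numpy as np

rng = np.random.default_rng(1)
factor_scores = rng.normal(size=(230, 3))  # stand-in for the 3 factor scores

def flag_at_risk(scores, pct=15):
    cutoffs = np.percentile(scores, pct, axis=0)  # one sample-based cutoff per factor
    return (scores < cutoffs).any(axis=1)         # below cutoff on any factor

at_risk = flag_at_risk(factor_scores)
print(int(at_risk.sum()), "of", len(at_risk), "students flagged")
```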

Predicting Reading Status in Fourth Grade

An all-subsets regression that computed all possible R2 values for the group-administered assessments (10 variables) and teacher ratings, across all possible set sizes, was conducted. The obtained R2 values were then rank ordered from highest to lowest within a given set size. Once the highest R2 values for each set size were obtained, we looked for a point of “diminishing returns,” such that going from a set of predictors of size n to a set of size n + 1 would not provide an important increase in the overall R2 value. The models with one to five variables yielded R2 values of .35, .45, .49, .50, and .50, respectively; the final five models all yielded an R2 of .51. These values show diminishing increases in R2 after set size three. For example, increasing the set size from three to four would increase the overall R2 value by only 1%, so we selected three predictors. Of the 120 possible combinations of three predictor variables (chosen from a total set of 10), Table 5 shows the 10 largest R2 values. We acknowledge that any one of these sets of three would probably not be statistically larger in R2 value than the others in this table, so we selected the set that contained the assessments with national norms to facilitate comparisons with other samples. Thus, from the sets displayed in Table 5, we chose the set containing the GMRT, TOSWRF, and the Teacher Rating of Reading Problems (R2 = .46). Because the GMRT and TOSWRF are group administered and the Teacher Rating of Reading Problems is completed by teachers, this set also minimizes the loss of instructional time for students.

Table 5.

The Top Ten Predictors of Reading Status for Set Size = 3

Predictors R2
CBM Maze, CBM Spelling, Teacher Reading Rating (Total Problems) .49
CBM Maze, Colorado Decoding, Teacher Reading Rating (Total Problems) .48
CBM Spelling, Test of Silent Word Reading Fluency, Teacher Reading Rating (Total Problems) .47
CBM Maze, GMRT Reading Comprehension, Teacher Reading Rating (Total Problems) .47
CBM Maze, Test of Silent Word Reading Fluency, Teacher Reading Rating (Total Problems) .47
GMRT Reading Comprehension, CBM Spelling, Teacher Reading Rating (Total Problems) .46
CBM Maze, Academic Competence, Teacher Reading Rating (Total Problems) .46
CBM Maze, ADHD Inattention, Teacher Reading Rating (Total Problems) .46
GMRT Reading Comprehension, Test of Silent Word Reading Fluency, Teacher Reading Rating (Total Problems) .46
Colorado Decoding, Test of Silent Word Reading Fluency, Teacher Reading Rating (Total Problems) .46

Note. ADHD = Attention Deficit Hyperactivity Disorder Rating Scale-IV; CBM = curriculum-based measurement; GMRT = Gates-MacGinitie Reading Test.

The next two regression models were conducted with these three variables and (a) the 12 individually-administered measures and then (b) the four growth measures to determine whether prediction of reading status could be improved. The only individually-administered variable that accounted for significant additional variance was TOWRE Phonemic Decoding Efficiency (F = 10.58, p < .002), but the amount of variance it added was small (2.4%). The regression model with growth variables identified Spelling Fluency growth (F = 10.46, p < .02), but the amount of variance accounted for was only 2.5%. Thus, none of the individually-administered or growth measures substantially improved prediction of reading status over the GMRT, TOSWRF, and Teacher Rating of Reading Problems.

Determining Probability of Reading Problems

Using the three predictor variables selected earlier (GMRT, TOSWRF, and Teacher Rating of Reading Problems), we performed a logistic regression and a receiver operating characteristic (ROC) curve analysis (Swets, 1986) to predict which students would be identified with a reading problem. ROC curves provide a useful tool for examining the utility of a screening battery in predicting the presence or absence of a problem. They allow for inspection of potential cut points that may be chosen to optimize sensitivity or specificity or to minimize a certain type of error (false positives or false negatives). A standard ROC curve plots sensitivity on the y axis and 1 − specificity (the false-positive rate) on the x axis. The area under the curve (AUC) is a probability index that ranges from 0.50, meaning the screening battery does no better than chance, to 1.0, meaning perfect prediction. The AUC can also be thought of as the probability that the screening battery will correctly classify a randomly selected pair of individuals in which one has the problem and the other does not.

All three predictor variables were entered simultaneously into the logistic regression, and each was statistically significant (see Table 6). The ROC curve yielded an AUC of 0.90 (see Figure 1). From this graph, one could select a point along the curve (representing a potential cut point on a linear combination of the three predictors) that achieves an acceptable level of sensitivity and specificity. For example, the graph indicates that choosing a sensitivity level of 0.80 yields a corresponding specificity of 0.80; a sensitivity level of 0.90 yields a specificity of about 0.63.
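In outline, the logistic regression and ROC analysis can be reproduced as in the sketch below. It assumes scikit-learn, uses simulated stand-ins for the three predictors (GMRT, TOSWRF, Teacher Rating of Reading Problems), and shows how a cut point targeting a desired sensitivity is read off the curve.

```python
# Logistic regression + ROC sketch for a three-variable screening battery.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

# Simulated stand-ins: two reading scores (higher = better) and a count
# of teacher-identified problems (higher = worse) for 230 students.
rng = np.random.default_rng(2)
X = rng.normal(size=(230, 3))
y = (X @ np.array([-1.0, -1.0, 1.0]) + rng.normal(size=230) > 1.0).astype(int)

model = LogisticRegression().fit(X, y)    # all three predictors entered at once
risk_prob = model.predict_proba(X)[:, 1]  # predicted probability of a problem

auc = roc_auc_score(y, risk_prob)         # area under the ROC curve
fpr, tpr, _ = roc_curve(y, risk_prob)

# Pick the cut point whose sensitivity (tpr) is closest to 0.80 and read
# off the corresponding specificity (1 - fpr).
i = np.argmin(np.abs(tpr - 0.80))
print(f"AUC = {auc:.2f}; at sensitivity {tpr[i]:.2f}, "
      f"specificity = {1 - fpr[i]:.2f}")
```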

Table 6.

Results From Logistic Regression Predicting Reading Problems

Predictors    Estimate    SE    Odds Ratio    95% CI Lower Bound    95% CI Upper Bound
Intercept 7.35 1.64
GMRT Reading Comprehension −0.10 0.03 0.91 0.86 0.95
Teacher Reading Rating (Total Problems) 0.61 0.17 1.85 1.33 2.57
Test of Silent Word Reading Fluency −0.07 0.02 0.93 0.90 0.96

Note. GMRT = Gates-MacGinitie Reading Test; CI = confidence interval.

Figure 1. ROC curve predicting a reading problem.

Although our preference was the model that contained nationally norm-referenced measures, the first set of predictors (Maze, Spelling, Teacher Rating of Reading Problems; R2 = .49) would comprise the most efficient battery, taking approximately 12 min to administer two probes of each CBM measure. We repeated the earlier analyses examining individually-administered and growth measures and found essentially the same results: Although TOWRE Phonemic Decoding Efficiency and Spelling Fluency growth added significant variance, the amount was small (less than 1%); the ROC curve analysis yielded an AUC of 0.92.

Discussion

The purpose of this study was to define reading competence for fourth-grade children and to identify a screening battery that effectively discriminated between older students who were and were not at risk in reading. Screening efficiency was examined by comparing group-administered tests and teacher ratings with individually-administered tests and estimates of growth. Overall, we found that three factors explained reading variance at the end of the school year and that a three-variable model comprising group-administered tests and a teacher rating was sufficient to predict reading status.

Defining Reading Problems

Our results suggest that the variance in fourth-grade students’ reading skills at the end of the school year can be defined by three factors (Comprehension, Word Reading, and Fluency) that reflect comprehension of connected text, word accuracy and decoding skills, and word and pseudoword fluency. The emergence of three factors to represent reading competence in older readers is noteworthy because it broadens the perspective on reading components. Studies of young at-risk readers emphasize phonological processes and single word reading and do not often include fluency and comprehension (e.g., Wagner et al., 1997). This emphasis may stem from the difficulty of measuring comprehension and the strong relationship between word reading and comprehension at younger ages (Francis et al., 2005). In our study with older readers, reading comprehension was a distinct factor, and word accuracy and fluency were similarly separate. These findings suggest that comprehension, accuracy, and fluency abilities become more distinct with development, as suggested by Francis et al. This would account for slow but accurate readers who can comprehend yet whose reading rate interferes with their ability to complete assignments in a timely manner. The factor structure we obtained also supports findings from intervention research with older readers indicating that remediating word-level deficits may not remediate fluency deficits (Torgesen et al., 2001).

In this study, children who scored below the 15th percentile for the sample on any of the factors were identified as at risk. The 15th percentile is an admittedly arbitrary criterion but was selected to capture as many at-risk readers as possible while acknowledging that the sample scored somewhat higher than normative samples (see Table 1). This procedure yielded 63 at-risk readers, which was 27.4% of the sample. Fourteen children (22.2%) met the criterion for all three factors, 13 children (20.6%) met criterion on two factors (n = 3 Comprehension and Word Reading; n = 5 Comprehension and Fluency; n = 5 Word Reading and Fluency), and a single factor score identified 36 children (57.1%; n = 13 Comprehension; n = 12 Word Reading; n = 11 Fluency). Thus, 35 children had difficulty in comprehension, 34 had difficulty in word reading, and 35 had difficulty in fluency. These proportions suggest the necessity of considering all three reading components when assessing older readers. Replication with different samples is required to further validate these procedures to define at-risk readers in middle childhood.

Predicting Reading Problems

The results of this study confirm that screening for reading problems in middle childhood requires a multivariate perspective. No single measure adequately identified children we defined as being at risk. From a large battery, three measures were needed to best predict reading status, including reading comprehension, word reading fluency, and notably, teacher ratings of reading problems. Together, these variables accounted for 46% of the variance in the binary prediction of reading problems (i.e., at risk or not at risk for reading problems). They also comprised an efficient screening battery in that two measures were group administered and one was completed by teachers. Accounting for almost half of the variance is important given that reading status, rather than a continuous measure of reading, served as the dependent measure. Nonetheless, it is possible that other approaches to the problem may yield stronger results.

The strength of the three-variable screening battery was bolstered by considering whether individually-administered measures of reading and related skills, and growth on fluency measures, would increase the variance associated with reading status. From a practical perspective, the answer is no. The additional variables added, at most, 2.5% variance to the original model, not enough to offset the expense of an additional test session when efficiency is the goal. The additional significant variables represented pseudoword fluency and spelling growth. Interestingly, none of the language variables entered the equations despite their theoretical importance in understanding reading development (e.g., Catts et al., 2006; Jenkins et al., 2007). Regarding growth, we hypothesized that growth in reading skills would increase the accuracy of a screening battery, in line with the findings of Compton et al. (2006) for first-grade children. However, we found that this was not the case, except for spelling fluency. For children in middle childhood, it may be that growth is not a useful indicator of risk. On the other hand, it may be that assessment of growth requires more frequent measurement over a longer period than was possible in this study. These hypotheses require further investigation.

In addition to measures of students’ reading abilities, teachers’ ratings also predicted reading problems. The critical variable was the number of problems that teachers identified for children rated as reading below grade level. This finding adds to research on the role of teacher ratings in predicting academic achievement (e.g., Hoge & Coladarci, 1989; Taylor et al., 2000) and is important for two additional reasons. First, acknowledging the contribution teachers can make in screening highlights the value of their knowledge, based on many observations, in understanding child development. Second, teacher ratings provide an additional way, besides group testing, to make screening as efficient as possible.

Classification accuracy was strong, with ROC curve analysis yielding an AUC value of 0.90. This value is regarded as a desirable standard in screening (National Center for Response to Intervention, n.d.) and compares well to screening batteries with younger children (Compton et al., 2006). It is of interest that a three-variable model consisting of two CBM measures and the teacher reading rating performed similarly. Practitioners and researchers may find this efficient battery to be useful when nationally normed instruments are not required.

Limitations

There are several limitations that should be considered when interpreting these findings. This study was conducted in parochial schools, and school populations were relatively small in comparison to public districts in the same geographic area. Further research in other school settings is needed to generalize these findings. There also is overlap between the measures used to develop the screening battery and those used to define at-risk status. Although collecting data at two time points and using factor analysis to define the criterion mitigate shared method variance to some extent, it is certainly possible that independent measures at the two time points would yield different results. Relatedly, we used a large number of measures, administered in a standard order, to assess both actual reading skill and skills shown to be related to reading achievement. The benefit of this approach is that many theoretically intriguing constructs were represented. The drawbacks are that (a) replication may be a daunting prospect and (b) assessment fatigue may have affected results.

A further limitation was the timing of the screening battery. Because of practical considerations regarding access to schools, we were unable to begin assessment until late fall. Additional research is needed to determine if the screening battery we identified replicates when conducted earlier in the academic year. This is likely more of an issue when considering the importance of growth in reading skills. Finally, this study began when students were in fourth grade, and did not include collection of information on prior reading competencies. Knowledge of children’s early reading skills would allow comment on the extent to which children identified as at-risk readers in fourth grade exhibited problems earlier in their development.

Implications

The model of screening presented in this article is well suited to an RTI framework because of its efficiency. Administration of the two reading measures takes, at most, 45 min, plus the time for teachers to rate reading, which takes about 1 min per child. Teacher time is not a trivial issue, of course, and needs to be considered in any evaluation of screening efficiency. In this study, the contribution of individually administered assessments was minimal from a practical perspective. Practitioners should note, however, that screening accuracy varies with the cut point selected, as illustrated by the ROC analysis and by the sketch below.
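
The following self-contained sketch, again built on simulated rather than study data, illustrates why the choice of cut point matters: moving along the ROC curve trades sensitivity against specificity, so a lenient cut point flags more true at-risk readers but also more false positives.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(2)
labels = np.concatenate([np.ones(50), np.zeros(180)])      # 1 = at risk
scores = np.concatenate([rng.normal(-1.8, 1.0, size=50),   # at-risk children
                         rng.normal(0.0, 1.0, size=180)])  # not-at-risk children

# Lower composite scores indicate greater risk, so negate before roc_curve
fpr, tpr, cuts = roc_curve(labels, -scores)

# Each candidate cut point buys sensitivity at the cost of specificity
for fp, tp, cut in list(zip(fpr, tpr, cuts))[1::10]:
    print(f"flag children scoring below {-cut:6.2f}: "
          f"sensitivity = {tp:.2f}, specificity = {1 - fp:.2f}")
```

A school aiming to miss as few at-risk readers as possible would choose a cut point high on the sensitivity column and accept the corresponding drop in specificity.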

Implications for research include further investigation of the usefulness of group-administered tests for screening. The use of group-administered assessments and teacher ratings represents a departure from other research examining universal screening, and other group-administered tests and rating scales may perform as well as, or better than, those selected for this study and deserve consideration. The small but significant role of phonological decoding fluency and spelling growth in understanding the reading problems of some at-risk readers should be explored descriptively and experimentally.

To identify children in middle childhood quickly and intervene appropriately within an RTI framework, screening batteries must be both efficient and sensitive to reading problems that, in older children, are heterogeneous and multidimensional. The screening battery and definition of reading problems identified in this study provide a foundation for further work on how to screen children in the later grades to inform RTI models.

Acknowledgments

This work was supported by grants from the National Institute of Child Health and Human Development (Grant P50 HD052121) and the U.S. Department of Education (Grant H325D070082).

Biographies

Deborah L. Speece is Professor of Special Education at the University of Maryland. Her areas of interest include children at risk for school failure, reading disabilities, and response to intervention as a method of early intervention and identification.

Kristen D. Ritchey is Associate Professor in the School of Education at the University of Delaware. Her research interests are the identification of and intervention for reading and writing disabilities.

Rebecca Silverman is Assistant Professor of Special Education at the University of Maryland. Her research focuses on language and literacy assessment, development, and intervention.

Christopher Schatschneider is Professor of Psychology at Florida State University and is Associate Director of the Florida Center for Reading Research. His research focuses on early reading development, methodology, and statistics.

Caroline Y. Walker is a doctoral candidate at the University of Maryland. Her areas of interest include subtypes of reading disability and reading comprehension.

Katryna N. Andrusik is a doctoral candidate and graduate research assistant at the University of Maryland. Her research interests include students with learning disabilities and academic motivation.


Contributor Information

Deborah L. Speece, University of Maryland.

Kristen D. Ritchey, University of Delaware.

Rebecca Silverman, University of Maryland.

Christopher Schatschneider, Florida State University.

Caroline Y. Walker, University of Maryland.

Katryna N. Andrusik, University of Maryland.

References

1. Azen R, Budescu DV. The dominance analysis approach for comparing predictors in multiple regression. Psychological Methods. 2003;8(2):129–148. doi: 10.1037/1082-989x.8.2.129.
2. Budescu DV. Dominance analysis: A new approach to the problem of relative importance of predictors in multiple regression. Psychological Bulletin. 1993;114:542–551.
3. Cain K, Oakhill J, Lemmon K. Individual differences in the inference of word meanings from context: The influence of reading comprehension, vocabulary knowledge, and memory capacity. Journal of Educational Psychology. 2004;96:671–681.
4. Catts HW, Adlof SM, Weismer SE. Language deficits in poor comprehenders: A case for the simple view of reading. Journal of Speech, Language, and Hearing Research. 2006;49:278–293. doi: 10.1044/1092-4388(2006/023).
5. Compton DL, Fuchs D, Fuchs LS, Bryant JD. Selecting at-risk readers in first grade for early intervention: A two-year longitudinal study of decision rules and procedures. Journal of Educational Psychology. 2006;98:394–409.
6. Compton DL, Fuchs D, Fuchs LS, Elleman AM, Gilbert JK. Tracking children who fly below the radar screen: Latent transition modeling of students with late-emerging reading disability. Learning and Individual Differences. 2008;18:329–337.
7. Cutting LE, Scarborough HS. Prediction of reading comprehension: Relative contributions of word recognition, language proficiency, and other cognitive skills can depend on how comprehension is measured. Scientific Studies of Reading. 2006;10:277–299.
8. DiPerna JC, Elliott SN. Development and validation of the Academic Competence Evaluation Scales. Journal of Psychoeducational Assessment. 1999;17:207–225.
9. DuPaul GJ, Power T, Anastopoulos A, Reid R. ADHD Rating Scale—IV. New York: Guilford Press; 1998.
10. DuPaul GJ, Volpe RJ, Jitendra AK, Lutz JG, Lorah KS, Gruber R. Elementary school students with AD/HD: Predictors of academic achievement. Journal of School Psychology. 2004;42(2):285–301.
11. Fabrigar LR, Wegener DT, MacCallum RC, Strahan EJ. Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods. 1999;4:272–299.
12. Fowler AE, Swainson B. Relationships of naming skills to reading, memory, and receptive vocabulary: Evidence for imprecise phonological representations of words by poor readers. Annals of Dyslexia. 2004;54(2):247–280. doi: 10.1007/s11881-004-0013-0.
13. Francis DJ, Fletcher JM, Catts HW, Tomblin JB. Dimensions affecting the assessment of reading comprehension. In: Paris SG, Stahl SA, editors. Children’s reading comprehension and assessment. Mahwah, NJ: Erlbaum; 2005. pp. 369–394.
14. Francis DJ, Fletcher JM, Stuebing KK, Lyon GR, Shaywitz BA, Shaywitz SE. Psychometric approaches to the identification of LD: IQ and achievement scores are not sufficient. Journal of Learning Disabilities. 2005;38:98–108. doi: 10.1177/00222194050380020101.
15. Fuchs LS. Project PROACT Maze Reading Passages. Nashville, TN: Vanderbilt University; n.d.
16. Fuchs LS, Fuchs D. Identifying a measure for monitoring student reading progress. School Psychology Review. 1992;21:45–58.
17. Fuchs LS, Fuchs D. Treatment validity: A unifying concept for reconceptualizing the identification of learning disabilities. Learning Disabilities Research and Practice. 1998;13(4):204–219.
18. Fuchs LS, Fuchs D, Hamlett CL, Allinder RM. Effects of expert system advice within curriculum-based measurement on teacher planning and student achievement in spelling. School Psychology Review. 1991;20:49–66.
19. Fuchs LS, Fuchs D, Hosp MK, Jenkins JR. Oral reading fluency as an indicator of reading competence: A theoretical, empirical, and historical analysis. Scientific Studies of Reading. 2001;5:239–256.
20. Fuchs LS, Hamlett CL, Fuchs D. Monitoring basic skills progress: Basic reading. Austin, TX: Pro-Ed; 1990.
21. Gough PB, Tunmer W. Decoding, reading, and reading disability. Remedial and Special Education. 1986;7:6–10.
22. Gresham F, Elliott S. Social Skills Rating System. Circle Pines, MN: American Guidance Service; 1990.
23. Hoge RD, Coladarci T. Teacher-based judgments of academic achievement: A review of the literature. Review of Educational Research. 1989;59:297–313.
24. Individuals with Disabilities Education Improvement Act of 2004, Pub. L. No. 108–446, § 601, Stat. 2647 (2004).
25. Jenkins JR, Hudson RF, Johnson ES. Screening for at-risk readers in a response to intervention framework. School Psychology Review. 2007;36:582–600.
26. Leach JM, Scarborough HS, Rescorla L. Late-emerging reading disabilities. Journal of Educational Psychology. 2003;95:211–224.
27. MacGinitie WH, MacGinitie RK, Maria K, Dreyer LG. Gates-MacGinitie Reading Tests, Fourth Edition, Forms S and T. Itasca, IL: Riverside Publishing; 2000.
28. Marston DB. A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In: Shinn MR, editor. Curriculum-based measurement. New York: Guilford Press; 1989. pp. 18–78.
29. Mather N, Hammill DD, Allen EA, Roberts R. Test of Silent Word Reading Fluency: Examiner’s manual. Austin, TX: Pro-Ed; 2004.
30. Miller AJ. Subset selection in regression. 2nd ed. New York: Chapman & Hall; 2002.
31. Muthen LK, Muthen BO. Mplus user’s guide. Los Angeles: Muthen & Muthen; 2007.
32. National Assessment of Educational Progress, National Center for Educational Statistics, & Institute of Education Sciences. Fourth grade: The nation’s report card. 2007. Retrieved August 1, 2008, from http://nces.ed.gov/nationsreportcard/pdf/main2007/2007496_2.pdf.
33. National Center for Response to Intervention Technical Review Committee. Classification accuracy. n.d. Retrieved October 13, 2009, from http://www.rti4success.org/chart/screeningTools/scoringRubrics/accuracy.html.
34. National Reading Panel. Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction: Report of the subgroups. Washington, DC: National Institute of Child Health and Human Development; 2000.
35. Oakhill JV, Cain K, Bryant PE. The dissociation of word reading and text comprehension: Evidence from component skills. Language and Cognitive Processes. 2003;18:443–468.
36. Olson R, Forsberg H, Wise B, Rack J. Measurement of word recognition, orthographic, and phonological skills. In: Lyon GR, editor. Frames of reference for the assessment of learning disabilities: New views on measurement issues. Baltimore: Brookes; 1994. pp. 243–277.
37. Perfetti CA. Reading ability. New York: Oxford University Press; 1985.
38. Scarborough HS, Cutting LE, Speece DL, Sabatini JP, Olson RK, Denckla MB. Can taking a cheap and easy route get the job done? Using quick group tasks to screen for reading disability. Paper presented at the IES/ETS conference on Assessing Reading in the 21st Century; Philadelphia, PA; 2008, April.
39. Semel E, Wiig E, Secord W. Clinical Evaluation of Language Fundamentals. 4th ed. San Antonio, TX: PsychCorp; 2003.
40. Shankweiler D, Lundquist E, Katz L, Stuebing KK, Fletcher JM, Brady S, et al. Comprehension and decoding: Patterns of association in children with reading difficulties. Scientific Studies of Reading. 1999;3:69–94.
41. Speece DL. Hitting the moving target known as reading development: Some thoughts on screening children for secondary interventions. Journal of Learning Disabilities. 2005;38:487–493. doi: 10.1177/00222194050380060301.
42. Speece DL, Case LP. Classification in context: An alternative approach to identifying early reading disability. Journal of Educational Psychology. 2001;93(4):735–749.
43. Speece DL, Cooper DH. Methodological issues in research on language and early literacy from the perspective of early identification and instruction. In: Stone CA, Silliman ER, Ehren B, Apel K, editors. Handbook of language and literacy disorders. New York: Guilford Press; 2004. pp. 82–94.
44. Speece DL, Ritchey KD. A longitudinal study of the development of oral reading fluency in young children at risk for reading failure. Journal of Learning Disabilities. 2005;38:387–399. doi: 10.1177/00222194050380050201.
45. Stage SA, Abbott RD, Jenkins JR, Berninger VW. Predicting response to early reading intervention from verbal IQ, reading-related language abilities, attention ratings, and verbal IQ-word reading discrepancy: Failure to validate discrepancy method. Journal of Learning Disabilities. 2003;36:24–33. doi: 10.1177/00222194030360010401.
46. Stothard SE, Hulme C. A comparison of reading comprehension and decoding difficulties in children. In: Cornoldi C, Oakhill J, editors. Reading comprehension difficulties: Processes and intervention. Mahwah, NJ: Erlbaum; 1996. pp. 93–112.
47. Stuebing KK, Fletcher JM, LeDoux JM, Lyon GR, Shaywitz SE, Shaywitz BA. Validity of IQ-discrepancy classifications of reading disabilities: A meta-analysis. American Educational Research Journal. 2002;39(2):469–518.
48. Swets JA. Indices of discrimination or diagnostic accuracy: Their ROCs and implied models. Psychological Bulletin. 1986;99:100–117.
49. Taylor H, Anselmo M, Foreman AL, Schatschneider C, Angelopoulos J. Utility of kindergarten teacher judgments in identifying early learning problems. Journal of Learning Disabilities. 2000;33:200–210. doi: 10.1177/002221940003300208.
50. Thompson B. Stepwise regression and stepwise discriminant function analysis need not apply here: A guideline editorial. Educational and Psychological Measurement. 1995;55:525–534.
51. Torgesen JK, Alexander AW, Wagner RK, Rashotte CA, Voeller K, Conway T, Rose E. Intensive remedial instruction for children with severe reading disabilities: Immediate and long-term outcomes from two instructional approaches. Journal of Learning Disabilities. 2001;34:33–58. doi: 10.1177/002221940103400104.
52. Torgesen JK, Wagner RK, Rashotte CA. Test of Word Reading Efficiency: Examiner’s manual. Austin, TX: Pro-Ed; 1999.
53. Vellutino FR, Scanlon DM, Sipay ER, Small SG, Pratt A, Chen R, et al. Cognitive profiles of difficult-to-remediate and readily remediated poor readers: Early intervention as a vehicle for distinguishing between cognitive and experiential deficits as basic causes of specific reading disability. Journal of Educational Psychology. 1996;88:601–637.
54. Vellutino FR, Tunmer WE, Jaccard JJ, Chen R. Components of reading ability: Multivariate evidence for a convergent skills model of reading development. Scientific Studies of Reading. 2007;11:3–32.
55. Wagner RK, Torgesen JK, Rashotte CA, Hecht SA, Barker TA, Burgess SR, et al. Changing relations between phonological processing abilities and word-level reading as children develop from beginning to skilled readers: A 5-year longitudinal study. Developmental Psychology. 1997;33:468–479. doi: 10.1037//0012-1649.33.3.468.
56. Wagner RK, Torgesen JK, Rashotte CA. Comprehensive Test of Phonological Processing: Examiner’s manual. Austin, TX: Pro-Ed; 1999.
57. Wolf M, Denckla MB. Rapid Automatized Naming and Rapid Alternating Stimulus Tests: Examiner’s manual. Austin, TX: Pro-Ed; 2005.
58. Woodcock R, McGrew K, Mather N. Woodcock-Johnson Tests of Cognitive Abilities, Third Edition. Itasca, IL: Riverside Publishing; 2001.
59. Woodcock R, McGrew K, Mather N, Schrank F. Woodcock-Johnson Tests of Achievement, Third Edition. Itasca, IL: Riverside Publishing; 2001.
60. Zeno SM, Ivens SH, Millard RT, Duvvuri R. The educator’s word frequency guide. Brewster, NY: Touchstone Applied Science Associates; 1995.
