J Educ Psychol. Author manuscript; available in PMC: 2023 Oct 1.
Published in final edited form as: J Educ Psychol. 2022 Sep 8;114(7):1495–1532. doi: 10.1037/edu0000758

Bringing Assessment-to-Instruction (A2i) Technology to Scale: Exploring the Process from Development to Implementation

Carol McDonald Connor 1, Henry May 2, Nicole Sparapani 3, Jin Kyoung Hwang 1, Ashley Adams 1, Taffeta S Wood 1, Sarah Siegal 4, Cassidy Wolfe 1, Stephanie Day 1
PMCID: PMC10249657  NIHMSID: NIHMS1843594  PMID: 37305063

Abstract

Bringing effective, research-based literacy interventions into the classroom is challenging, especially given the cultural and linguistic diversity of today’s classrooms. We examined the promise of Assessment-to-Instruction (A2i) technology redesigned to be used at scale to support teachers’ implementation of the individualized student instruction (ISI) intervention from kindergarten through third grade. In seven randomized controlled trials, A2i and ISI have demonstrated efficacy. However, the research version of A2i was not scalable. In order to bring A2i to scale in schools serving linguistically diverse students, we carried out the current study across two phases. This study represents both an exploration of what it takes to bring an educational intervention to scale (Phase 1) and a quasi-experiment on the literacy outcomes of learners whose teachers used the technology (Phase 2). We integrated assessments of vocabulary, word decoding, and reading comprehension; revised the A2i algorithms to account for the constellation of skills English learners (ELs) bring to the classroom; updated the user interfaces and added new graphic features; and improved bandwidth and stability of the technology. Findings were mixed, including several non-significant results, a marginally significant intent-to-treat effect on word reading in kindergarten and first grade for English monolingual students and ELs, and one significant interaction effect, which suggested ELs and students with less developed reading skills in second and third grade benefitted most from the intervention. With some caution, we conclude that A2i demonstrates potential to be used at scale and promise of effectiveness for improving code-focused skills for diverse learners.


Moving from research to practice is one of the most difficult challenges confronting practitioners, policy makers, and researchers today. It is critical to make evidence-based technology, programs, professional development, and other materials developed with federal funds accessible to practitioners (Fixsen et al., 2013). Unfortunately, many effective programs developed by researchers sit unused on shelves or computers. The Department of Education’s Institute of Education Sciences (IES) has funded the development and testing of over 300 programs. Of these, over 90 were found to be efficacious, yet only a small proportion are now used in schools (Albro, 2020). Education is not alone in its challenge to promote the use of evidence-based interventions in communities; public health, medicine, and other professions share many of the same challenges. These include, but are not limited to, user training and development, cost, and effectiveness at scale. Technology poses additional challenges: users’ access to technology and internet bandwidth, the feasibility and intuitiveness of the design, security, school sites’ stance toward change, and more.

In many, if not all, applied research studies in education, the goal is both to contribute to the body of knowledge within the research community and to bridge the gap from research to practice in actual classrooms. Closing the gap between research and practice “requires a broader systems perspective that leads to scaled up use of effective practices” (Odom et al., 2019). For technology, this gap becomes especially tangible given the number of barriers between controlled research environments and large-scale application (Supplee & Metz, 2015). Hence, this study investigated how we approached and addressed barriers to school-wide implementation of Assessment-to-Instruction (A2i) technology, a web-based literacy tool that supports individualized student instruction (Connor et al., 2007; Connor et al., 2016), and how we redesigned the technology to be scalable beyond a constrained research setting. We examine A2i as a tool to support literacy development for both monolingual students and English Learners (ELs). This initiative addresses the growing need for programs that effectively meet the needs of today’s linguistically diverse student body, as well as the increasing call from leading researchers to focus on translating decades of reading research, or the “science of reading,” into practical implementation by teachers in schools (Solari et al., 2020).

The Present Study – Purpose Statement

The purpose of this effort was to describe the transition from the research version of A2i to a more generalizable platform that contained the components vital for improving student literacy outcomes. We had to ensure that the A2i technology had the flexibility and stability for effective implementation in schools nationwide. In the present study, we report aspects of both an exploration of what it takes to bring an educational intervention to scale and a quasi-experiment on the literacy outcomes of linguistically diverse learners whose teachers used A2i. Through this interactive process, we begin to establish evidence of the consequential validity of the A2i technology. We present the scalability work (Phase 1) and the student-level outcomes from the quasi-experiment (Phase 2) together because technology best improves education when it is considered in tandem with student learning rather than on its own (Hantula, 2019; McKnight, 2016). Moreover, implementing at scale includes considering the populations that will be affected by the intervention as it reaches more students. For example, an intervention is more likely to reach ELs as it spreads to more classrooms. Hence, this paper describes the scalability process while also providing initial evidence of the potential effectiveness of A2i at scale.

We begin by presenting the theoretical frameworks that underlie the A2i research technology and briefly outlining the features of the tool to provide a foundation for the current project. We then present a model drawn from the implementation science field that we used to guide our process for “scaling up.” The project is organized across two phases. Phase 1 is the Exploration Phase (2014–2015). Here, we outline the process and procedures of the exploratory work that provided the foundation for executing Phase 2. We also reflect on lessons learned during the implementation process that allowed us to identify barriers and enact responsive solutions for bringing a revised A2i to scale in kindergarten through third grade classrooms. Phase 2 (2015–2016) is the Quasi-Experimental Phase. Here, we describe our process for developing valid, reliable, and adaptive literacy assessments integrated into the revised A2i technology using a linguistically diverse sample of students. We also present the procedures of and findings from the quasi-experiment. We outline the Methods and Results of Phases 1 and 2 separately; however, we interpret our findings from both phases in light of the potential for national scalability.

Theoretical Frameworks Underlying A2i Technology

The theoretical basis for the development of A2i was heavily influenced by the Simple View of Reading (Hoover & Gough, 1990), which outlines the importance of both decoding (code-focused) and language comprehension (meaning-focused) skills for successful reading comprehension. This model posits that strong code-focused and meaning-focused skills are both necessary for reading and comprehending text; without the development of both, reading comprehension is jeopardized. Extensive empirical evidence supports the Simple View of Reading not only for monolingual English speakers but also for ELs (e.g., Florit & Cain, 2011; Kim, 2017; Mancilla-Martinez & Lesaux, 2017; Proctor et al., 2006). This evidence justified A2i’s recommendations for both code- and meaning-focused instruction for monolingual English speakers and ELs alike.

A2i has more recently been informed by the Lattice Model (Connor, 2016; Connor et al., 2016), which places instruction as a central force for change in students’ literacy learning. Aligned with Cronbach’s (1975) idea of aptitude by treatment interaction effects, the Lattice Model emphasizes that the effect of instruction depends on each student’s linguistic, text-specific, cognitive, and social-emotional skills (i.e., child characteristic by instruction interaction effects; Connor et al., 2007). In other words, the effects of instruction may differ based on students’ baseline skills across various developmental domains. Moreover, according to the Lattice Model, there are reciprocal or bidirectional effects: as instruction improves literacy skills, it also improves linguistic, cognitive, and social-emotional skills, and these developmental areas, in turn, help to improve students’ literacy skills (Connor et al., 2016). This idea of student characteristic (skill) by instruction interaction effects on literacy, as supported by the Lattice Model, is the premise for individualizing student instruction. We next provide a brief overview of A2i. We refer the reader to Connor (2019) for a full description of the A2i features.

Components of the A2i Technology – Overview of the Research Version

DFI Algorithms and the Classroom View

As supported by the Lattice Model, A2i provides the means for teachers to individualize instruction based on the characteristics that their students bring with them into the classroom, in this case, their literacy skills. At the heart of A2i, and the premise for individualizing student instruction, are the dynamic forecasting intervention (DFI) algorithms. These DFI algorithms are patented (Connor, 2013) and were developed from empirical studies (e.g., Connor et al., 2004). DFI algorithms compute recommended amounts (in minutes) of four types of literacy instruction that will optimize literacy gains based on individual students’ language and literacy skills. The four types of literacy instruction are code-focused instruction with the teacher (e.g., phonological awareness, phonics, spelling, word fluency), meaning-focused instruction with the teacher (e.g., language, vocabulary, comprehension, metacognition), code-focused instruction with peers or alone (e.g., phonics worksheets), and meaning-focused instruction with peers or alone (e.g., independent sustained silent reading, buddy reading). With the right information about individual students, teachers can predict students’ potential trajectories as they learn to read, taking into account documented sources of influence (e.g., amount of literacy instruction, support from home) and constraints (e.g., previous achievement, home resources). The recommended amounts of instruction are displayed for each student in the Classroom View of the A2i technology. As students are assessed throughout the year, the recommendations are automatically recalculated so that the most recent information about students’ literacy skills is taken into consideration. The DFI algorithms used in the A2i technology have been tested for efficacy in multiple research studies (Al Otaiba et al., 2011; Connor et al., 2011a; Connor et al., 2013; Connor et al., 2007; Connor et al., 2011b; Connor et al., 2009).
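The DFI algorithms are patented and their equations are not given in this section, so the following is only a minimal sketch, assuming invented weights, of the general shape of such a computation: student scores go in, and recommended minutes for each of the four instruction types come out. The `StudentScores` fields, the grade-equivalent scale, the instruction-type labels, and every coefficient are hypothetical. The only idea borrowed from the text is the child characteristic by instruction interaction; here it is assumed to mean that a skill lagging the student’s grade leads to more teacher-managed time in that area.

```python
from dataclasses import dataclass

# The four instruction types named in the text (labels are hypothetical).
INSTRUCTION_TYPES = (
    "teacher_code",         # code-focused, with the teacher
    "teacher_meaning",      # meaning-focused, with the teacher
    "independent_code",     # code-focused, with peers or alone
    "independent_meaning",  # meaning-focused, with peers or alone
)


@dataclass
class StudentScores:
    """Grade-equivalent scores (assumed scale, e.g. 1.5 = middle of grade 1)."""
    decoding: float
    comprehension: float
    vocabulary: float


def recommend_minutes(s: StudentScores, grade: float) -> dict[str, int]:
    """Hypothetical stand-in for the patented DFI algorithms, NOT their equations.

    Every weight below is invented to illustrate a child-by-instruction
    interaction: the further a skill lags the student's grade, the more
    teacher-managed time is recommended for that area.
    """
    # How far each skill lags the current grade (0 if at or above grade).
    decoding_gap = max(0.0, grade - s.decoding)
    language_gap = max(0.0, grade - min(s.comprehension, s.vocabulary))

    minutes = {
        "teacher_code": 10 + 10 * decoding_gap,
        "teacher_meaning": 10 + 8 * language_gap,
        "independent_code": 5 + 3 * decoding_gap,
        # Shift some independent reading time toward code work for weak decoders.
        "independent_meaning": 20 - 5 * min(decoding_gap, 2.0),
    }
    # Whole minutes; never recommend negative time.
    return {k: max(0, round(v)) for k, v in minutes.items()}


# Usage: recompute whenever a new assessment score arrives.
fall = recommend_minutes(StudentScores(0.4, 0.9, 0.7), grade=1.0)
winter = recommend_minutes(StudentScores(1.1, 1.2, 1.0), grade=1.4)
```

Because `recommend_minutes` is a pure function of the latest scores, calling it again after each new assessment mirrors, in miniature, how the recommendations are described as updating automatically.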

A2i Assessments and Graphs

In the research version of A2i, we used standardized reading and vocabulary assessments, administered to students within their schools and entered into the technology by research assistants. Once entered, A2i uses the scores in the DFI algorithms to compute the recommended amounts and types of literacy instruction needed for optimal growth. Each student’s assessment results and targeted growth over a one-year period as well as their instructional recommendations are then displayed for teachers within graphs.
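The text does not specify how the one-year growth target shown in the graphs is drawn. As an illustration only, the sketch below assumes a straight-line path from a student’s fall score to an end-of-year target and checks a later score against it; the linear model, the nine-month default, and the function names `target_trajectory` and `on_track` are all assumptions.

```python
def target_trajectory(fall_score: float, target_gain: float, months: int = 9) -> list[float]:
    """Monthly target scores on a straight line from fall to end of year (assumed model)."""
    step = target_gain / months
    # Index 0 is the fall score; index `months` is the end-of-year target.
    return [round(fall_score + step * m, 2) for m in range(months + 1)]


def on_track(observed: float, month: int, trajectory: list[float]) -> bool:
    """True if a score observed `month` months after fall meets that month's target."""
    return observed >= trajectory[month]


path = target_trajectory(fall_score=1.0, target_gain=0.9)
winter_check = on_track(1.25, 2, path)  # compare a later score with the month-2 target
```

A real growth target would more likely come from the forecasting model itself; the straight line is simply the easiest shape to plot and explain.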

Lesson Plans

A2i provides evidence-based resources that teachers can use to individualize instruction based on students’ literacy skills. Teachers can access and download (copyright permitting) the activities from their core literacy curriculum and other indexed evidence-based literacy activities (e.g., Florida Center for Reading Research [FCRR] center activities; www.fcrr.org). They can also change the activity and locate other relevant activities using advanced search features. Once teachers have given a lesson, they mark the activity as accomplished, which records that it was completed.
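A minimal sketch of the search-and-record workflow, assuming a simple in-memory list of activities: teachers filter by instruction type or keyword and mark an activity as accomplished, which appends a completion date. `Activity`, `LessonIndex`, the field names, and the sample activities are hypothetical; the text does not describe A2i’s actual lesson database.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Activity:
    """One indexed activity (hypothetical fields)."""
    title: str
    instruction_type: str   # e.g. "teacher_code" (assumed label for the four types)
    skill: str              # e.g. "phonics", "vocabulary"
    source: str             # e.g. core curriculum or an FCRR center activity
    completed_on: list[date] = field(default_factory=list)


class LessonIndex:
    """Hypothetical in-memory index; A2i's real lesson database is not described."""

    def __init__(self, activities: list[Activity]):
        self.activities = activities

    def search(self, instruction_type: Optional[str] = None, keyword: str = "") -> list[Activity]:
        """Filter by instruction type and a case-insensitive keyword in title or skill."""
        kw = keyword.lower()
        return [
            a for a in self.activities
            if (instruction_type is None or a.instruction_type == instruction_type)
            and (kw in a.title.lower() or kw in a.skill.lower())
        ]

    @staticmethod
    def mark_accomplished(activity: Activity, when: date) -> None:
        """Record that the teacher delivered this activity on `when`."""
        activity.completed_on.append(when)


index = LessonIndex([
    Activity("Letter-sound bingo", "teacher_code", "phonics", "FCRR"),
    Activity("Buddy reading", "independent_meaning", "fluency", "core curriculum"),
])
for hit in index.search(instruction_type="teacher_code", keyword="phon"):
    LessonIndex.mark_accomplished(hit, date(2015, 10, 1))
```

Keeping completions as a list of dates, rather than a single flag, lets the same activity be recorded each time it is taught.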

Implementation of A2i within Kindergarten–Third Grade Classrooms

Although the research version of A2i provided a means for teachers to individualize student instruction, the tool was neither feasible nor scalable for classroom use without support from the research team. Previous studies examining the development and effectiveness of A2i have been grounded in design-based implementation research (DBIR), which aims to develop, in collaboration with practitioners, a tool that is by design feasible and implementable (Connor et al., 2015; Fishman et al., 2004). Our aim for this study, however, was to make individualized student instruction, using A2i along with a professional development (PD) protocol, scalable. In the current paper, we draw from the Exploration, Preparation, Implementation, Sustainment Model (EPIS; Aarons, Hurlburt, & Horwitz, 2011; Moullin et al., 2020) to outline a set of practices and procedures for supporting the implementation of A2i within kindergarten through third grade classrooms with high percentages of ELs. We describe each area in the EPIS model below and contextualize our stages of implementation by drawing from experiences with our school partners across two academic years (2014–2016).

Exploration

Within the EPIS model, the stage of exploration (Odom et al., 2019) takes place at the level of an outer contextual factor (e.g., school districts) and an inner contextual factor (e.g., school administrators; Aarons et al., 2011). In educational settings, these are the district leaders and school principals who make decisions about changes to instruction with which teachers will be tasked. In relation to our project, we met with school principals prior to the start of the study in order to develop a common research objective. The leaders were tasked with implementing district-mandated Response to Intervention (RTI) within their schools, which included universal literacy screening and multi-tiered, targeted instruction. Demonstrating how individualizing student instruction with the use of A2i aligned with RTI was the beginning of our mutual partnership, with the shared objective of supporting literacy gains in all learners, including ELs.

Preparation

Schools and teachers possess individual characteristics that vary. During the preparation stage, initial training is provided to site-specific teachers in order to prepare the climate for implementation, ensuring that schools and teachers have what is needed to create change (Odom et al., 2019). Researchers who work with teachers act as bridging factors, or interconnections, between research and implementation (Aarons et al., 2011). They must foster teachers’ trust and “buy-in.” These teachers, in turn, work with their students to support classroom learning; they act as bridging factors between researchers and students. This shifting of roles may seem complex, but it stems in part from the dynamic and reciprocal nature of implementing change illustrated by the EPIS model (Aarons et al., 2011).

To understand the varying needs and experiences of our school partners, we interviewed school leaders and led workshops with teachers. Our goal was to gather information about the school environment (access to computers and headphones, internet availability and bandwidth, class size and student characteristics, etc.) as well as individual experiences using technology and running flexible small groups. We used the information learned during this time to prepare the climate for implementation. We then created a roadmap of changes needed for successful scale up. We designed an online professional development (PD) protocol that aligned with the needs of our school partners while also addressing critical components for using A2i to individualize literacy instruction within kindergarten through third grade classrooms.

Implementation

Implementation of an educational intervention positions teachers as learners (Odom et al., 2019). Teachers both provide information and receive feedback on implementation of an intervention and, in turn, use their new learning to change their practice. Fidelity of implementation is critical at this stage, as teachers communicate feasibility concerns. In addition, the research team adjusts its approaches for different teachers at different stages of “uptake,” which may involve teachers with different types of experience, degrees of openness, and levels of trust that influence intervention implementation. We supported teachers’ implementation of A2i through personalized and continuous PD across the school year. We monitored and adjusted our approaches as needed to respond to individual needs, ensure uptake of new practices with fidelity, and facilitate change.

Sustainment

In the context of bringing an educational intervention to scale, sustainment can be understood as the continued implementation of an intervention that school sites have fully taken up in their classrooms (Odom et al., 2019). Sustainment occurs after researchers have fostered relationships, supported teachers in changing practices, and communicated findings (Aarons et al., 2011). Fostering relationships often begins at the exploration stage and continues throughout the stages. These linkages, as described by the EPIS model, often operate through human and institutional relationships (Aarons et al., 2011). In the case of educational interventions at scale, this would include relationships between teachers and principals, teachers and their students and families, researchers and teachers, districts and researchers, and various combinations of the aforementioned.

At the stage of sustainment, our goal was to give our school partners the tools they needed to continue implementing A2i school-wide without extensive support from the research team, while also maintaining a positive school-researcher partnership. We therefore discussed their progress, shared findings from across the school year, and ensured that everyone (principals and teachers) continued to have access to A2i and the online PD protocol. We also offered continued technical support as needed and an open door for future communication and collaboration.

Phase 1 (2014–2015): Research Objective and Methods

To ensure effective, school-wide implementation of A2i, the primary research objective of Phase 1 was to thoroughly explore the process of scaling up. That is, we examined the transition from implementing the research version of A2i to implementing a more generalizable tool. In Phase 1, we recruited 24 kindergarten through third grade teachers and four principals (one per school site) from two large schools in Phoenix, Arizona (AZ) with substantial EL student populations and two schools in Pittsburgh, Pennsylvania (PA).

Procedures.

At the start of the academic year, we carried out in-person structured interviews with the school principals from each site to gather information on the individual needs of their schools and establish a reciprocal school-researcher partnership. We inquired about district-level and school-level concerns and noted areas for potential collaboration. Although the schools were tasked with different district-level charges, they shared the common goal of improving literacy outcomes in their early elementary students. We developed a year-long plan for partnership centered on implementing A2i in kindergarten through third grade classrooms to support individualized literacy instruction, while studying the process and gathering feedback from teachers. The schools shared their beginning and end of year progress monitoring data (i.e., DIBELS), and the research team uploaded the scores to A2i per classroom.

Initial Trainings.

The school year started with a “kick-off” in-person training for teachers at each school site. The training consisted of two half-day workshops in which we gathered information about the school implementation climate and the needs and experiences of individual teachers and grade-level teams. We also provided information regarding A2i as an evidence-based literacy tool, discussed the features of the research version, and assisted teachers in using A2i in their classrooms to individualize student instruction.

Monthly Communities of Practice Meetings.

In addition, two classroom educators from our research team facilitated monthly grade-level communities of practice meetings (e.g., Bos et al., 1999) at the AZ school sites only, as these schools were local to the research team. We developed a working handbook, which included guiding questions and monthly topics (e.g., setting up your classroom, using A2i recommendations to drive instruction), to structure the meetings and facilitate discussion. The monthly meetings followed a similar sequence across the schools and grade levels, including a “check-in” period to inquire about strengths and concerns with individualizing instruction using A2i, delivery of content, and discussion with reflection.

Classroom Observations.

In addition to these monthly communities of practice, the classroom educators from our research team observed each of the AZ teachers in their classrooms three times during the year (fall, winter, spring). Specifically, we were interested in understanding whether and how teachers effectively used A2i to plan and deliver literacy instruction within individualized small groups and differentiated learning centers for their diverse student body. We assisted teachers as needed in understanding the A2i recommendations, creating individualized small groups and learning centers based on the A2i recommendations, and preparing the A2i-recommended curriculum materials and evidence-based activities.

Focus Groups.

Finally, we carried out focus groups with teachers from each site to gather information on their experiences using A2i in their classrooms. For the AZ schools, the teachers, research team, and program developers participated in focus groups (one focus group per site). In the PA schools, the research team met with teachers, gathering notes to share with the program developers at a later time. The focus group questions centered on teachers’ experiences with specific features of A2i. We inquired, for example, about the A2i features teachers found most helpful and how easily they were able to navigate the tool as well as readability of tables and figures and usefulness of the A2i recommended materials and activities. This information was critical, as it helped to inform the updates we made to the A2i technology prior to Phase 2.

Data Sources.

We collected detailed notes from the initial planning meeting with the school principals, the “kick-off” training, and the monthly communities of practice meetings with our AZ schools. We compared notes from the monthly communities of practice meetings across groups to outline similarities and differences between the different grade levels and schools. In addition, we gathered field notes during the classroom observations and monitored teachers’ usage of A2i to support their students’ learning as a means of gauging fidelity. Finally, we iteratively reviewed the records taken from the focus groups, in which we elicited teachers’ feedback about their experiences using A2i. Taken together, these data sources revealed four themes that we addressed prior to the quasi-experiment carried out during the 2015–2016 school year. We next outline barriers and solutions derived from the four themes. See Table 1 for a summary of this process.

Table 1.

Summary of the procedures and key points outlined in Phase 1

Columns: Identified Barrier and Data Sources that Informed Decisions | Solution | Evidence to Support Scalability | Key Points and Recommendations
Barrier 1, Effort from Research Team: The high level of effort required from the research team to administer, score, and enter the assessments that allow the A2i algorithms to make instructional recommendations for each individual student.
Data sources: We documented the amount of time that the research team spent gathering assessment information and uploading scores into A2i.
Solution 1, Integrated Assessments: We developed and tested three literacy assessments that were integrated into A2i. The assessments are adaptive: students begin each assessment at their grade level, and the difficulty of subsequent items increases or decreases based on their performance. For example, if students miss an item, the next item is easier; if they answer an item correctly, the next item is more difficult. This allows for relatively quick administration of each assessment (approximately 7–10 minutes per assessment).
The assessments also provide reliable and valid measures of students’ literacy skills, which the A2i algorithms use to make instructional recommendations for every student. The test results and instructional recommendations are updated in real time with each completed assessment.
Evidence to support scalability: With the redesign, the research team no longer needed to collect assessment data or upload scores into A2i. Teachers were able to administer the assessments independently, with some assistance from the research team as needed.
Because the assessments are adaptive, students were able to take them throughout the year without seeing the same items multiple times. Teachers were able to monitor and track their students’ literacy progress over time and change their practices based on the assessment information.
The assessments were further improved to function on iPads and other tablets, which increased their flexibility of use for schools and the reliability of the scores for younger students.
Key points and recommendations (Exploration): Centering a school-research partnership on a common goal is critical for successful implementation of school-based interventions.
Universal screening, effective progress monitoring, and targeted tiered instruction that leads to literacy achievement in all learners were the school- and district-level objectives that provided the entry point for our study. By establishing a mutual partnership in which school leaders and teachers were key players, we were able to successfully redesign and implement A2i within classrooms, ensuring that A2i gave teachers the means to monitor their students’ literacy progress and make instructional decisions with ease.
Barrier 2, User Interface: Teachers wanted additional information about the lesson-planning feature, lesson plans linked to the Common Core State Standards (CCSS; 2010), and better data visualization of student progress. School and district leaders wanted reports on how A2i was being used in individual classrooms.
Data sources: We iteratively reviewed records taken from the focus groups and revised the user interface based on teachers’ feedback and suggestions.
Solution 2, Improved Lesson Plans: We added search and navigation menus, a wider curriculum selection, and indexed curriculum materials linked to the CCSS. A set of administrative menus was also added, allowing new curricula and resources to be added directly to the A2i lesson database. New curriculum materials and resources continue to be indexed and stored in the A2i Lesson Plan.
Student progress reports were enhanced, and teacher usage reports and tracking features (tracking user clicks per page visit) were added so that district and school educational leaders could provide focused support to teachers for individualizing student instruction.
Evidence to support scalability: These updates improved the flexibility of the program and expanded the number of activities teachers could use to meet their students’ diverse learning needs. Teachers were able to navigate the A2i features and pages independently and implement the recommended activities that aligned with their curriculum and were linked to the CCSS.
Reporting features allowed teachers and school administrators to access and export student test scores, making these data easily available at both the school and district level.
School leaders were also able to review and download teacher user logs, which showed the amount of time that teachers spent using the various A2i features.
Key points and recommendations (Sustainment): Fostering positive relationships among school leaders, teachers, students, and the research team is foundational for sustainability.
A2i needed to be an accessible, flexible, reliable, and stable tool that provided teachers with the information they needed, in a format they could read and interpret, to individualize student literacy instruction. With easy access to teacher usage and student progress information, A2i provided a platform for communication between school leaders and teachers. This was an important component for sustainability, as school leaders are key players in supporting teachers in changing their practices.
Barrier 3, Recommendations for English Learners (ELs): Teachers wanted to know how to interpret the A2i recommendations for ELs. With the growing number of ELs attending elementary school in the U.S., scaling up meant that A2i needed to work for all students, including students from culturally and linguistically diverse backgrounds.
Data sources: During the in-person structured interviews with school principals, the initial “kick-off” training, and the classroom observations, we learned of the district-wide goal of improved literacy outcomes in all learners, including ELs.
Solution 3, Recommendations for ELs: We revised the algorithms to make instructional recommendations for ELs based on their current literacy skills in English. Because our sample of ELs demonstrated limited vocabulary skills, we included vocabulary in the A2i algorithm. By doing so, A2i provided recommendations for both teacher-managed meaning-focused time and additional teacher-managed code-focused time based on students’ vocabulary skills.
For children in the United States who speak a language other than English at home, research has documented that well-designed education programs with appropriate assessments can successfully support their achievement in both English and their home language (Bialystok, 2001; Collins, 2014; Francis et al., 2006). This appears to be especially the case for ELs who speak Spanish at home (e.g., Baker et al., 2016; Collins, 2014).
Evidence to support scalability: Using the revised algorithms, teachers are able to deliver individualized literacy instruction to all of their students, including ELs with varying levels of English proficiency. This is especially critical in districts serving large numbers of culturally and linguistically diverse students.
Key points and recommendations (Implementation): Positioning teachers as learners and ensuring interventions are carried out with high fidelity are critical steps in the implementation process.
We revised the A2i algorithms to ensure that teachers were able to use A2i to make data-driven instructional decisions for all their students, including ELs. We provided teachers with personalized PD throughout the year to gather feedback on the implementation process and give them information to move forward. These meetings also helped us gauge whether teachers were using A2i as a tool to individualize student instruction, which, in part, provided us with a measure of fidelity.
Barrier 4, Bandwidth: Slower-than-normal response times from the website during times when website traffic was high.
We became aware of this issue during the communities of practice meetings and the classroom observations when assisting teachers in using A2i.
Solution 4: The infrastructure of the servers, codebase, and internal data tables were enhanced. In addition, the purely logistical implications of scale included increased data-security and capacity needs within the system. Security updates were made to the website as well as to the password system to allow for higher levels of data security. Teachers were able to use A2i without response times slowing during high-traffic times. Multiple teachers within schools were able to access the various A2i features simultaneously, and students were able to access the web-based A2i assessments directly without having to navigate A2i. The capacity to handle large numbers of simultaneous users greatly improved the flexibility and power of the technology.
Preparation. Preparing the climate for successful school-wide implementation is foundational.
We needed to ensure that the environment was adequately equipped to support change. In order to benefit from A2i, we learned that 1) we needed teachers’ “buy-in” for the tool, 2) teachers needed to have access to multiple computers (at any given time), and 3) school sites needed to have fast and reliable internet connectivity. Issues at this stage of the implementation process could have jeopardized successful implementation efforts overall.

Barriers and Solutions to Implementation – Redesigning A2i Technology

Barrier and Solution 1, Effort from Research Team and Integrated Assessments

Perhaps the most daunting barrier identified was the high level of effort required from the research team to administer, score, and enter the assessments that allow the A2i algorithms to make instructional recommendations for individual students. As a result, we determined that A2i would need integrated assessments that students could take with relatively little teacher intervention. We realized that the assessments would need to be short enough for students to take multiple times in a school year, and they would need to provide reliable, valid estimates of students’ language and literacy skills. The assessments would also need to be scored automatically, without researcher support. With this in mind, we developed three adaptive assessments validated for students in kindergarten through third grade that could be integrated into A2i: an online vocabulary assessment (Word Match Game [WMG]) and two reading assessments (Letters2Meaning [L2M] and Reading2Comprehension [R2C]). Details on item development and psychometric properties are reported in Table 1 and in the Method section.

Barrier and Solution 2, User Interface and Improved Lesson Plans

The second barrier related to users’ experience of the user interface (i.e., how easy A2i was to navigate and use). Teachers and administrators reported wanting additional information about the lesson plans, specifically how they related to the Common Core State Standards (CCSS; Common Core State Standards Initiative, 2010), as well as better tools to visualize teacher usage of A2i and student progress across the school year. To be responsive to these requests, we improved and expanded the lesson planning feature, which was used to facilitate automatic lesson planning for the implementation of individualized instruction in the classroom. Specifically, we added search and navigation menus, a wider curriculum selection, curriculum activities indexed to the CCSS, and recommended open-source materials linked directly to the lesson plans. We also added enhanced reports of student progress and teacher usage, along with more web-based PD resources. See Table 1 for details and Appendix A for screenshots.

Barrier and Solution 3, Recommendations for ELs and Updating the A2i Algorithm

A third theme that emerged from the data was teachers’ desire to understand how to interpret the A2i recommendations for ELs. The initial studies that demonstrated the efficacy of A2i were conducted in areas that were culturally and racially diverse but not linguistically diverse. Considering the growing number of ELs attending elementary school in the United States, and the fact that the teachers involved in Phase 1 of the study were in AZ and PA, it is not surprising that this issue arose. Having an intervention that scales up means having an intervention that works for all students, including students from culturally and linguistically diverse backgrounds.

Although scholars of effective instruction for ELs call for more research on modifications to classroom instruction for ELs, they have identified several strategies that are advantageous to literacy development, including individualizing (or differentiating) instruction (Gunn et al., 2000; Kamps et al., 2007), providing ongoing teacher support and student monitoring (Haager & Windmueller, 2001), identifying similarities and differences between students’ first and second languages (Giambo & McKinney, 2004; Kramer et al., 1983), and capitalizing on first-language strengths (August et al., 2014; August & Shanahan, 2010). A number of classroom-level intervention studies that have focused on ELs have also shown positive effects in enhancing students’ language and literacy skills (e.g., Cheung & Slavin, 2012; Collins, 2014; Dianda et al., 1995; Calderón et al., 1998; Vaughn et al., 2005). Drawing from this evidence and from the Simple View of Reading framework, we concluded that individualizing instruction using both code- and meaning-focused instructional recommendations from A2i would be appropriate for ELs, but that the A2i algorithms might need revision to accommodate ELs’ unique constellations of skills.

Given that the integrated A2i assessments were developed to measure literacy skills in English, we re-evaluated the appropriateness of the algorithms to make instructional recommendations for ELs (who were receiving English-only instruction) based on their current literacy skills in English. The information that feeds the algorithm for recommendations related to time spent in meaning-focused instruction is pulled from student performance on the vocabulary assessment (for kindergarten and first grade) and from the reading comprehension assessment (for second and third grade). ELs with limited oral language proficiency in English would be expected to score lower than children with higher levels of English oral language proficiency on these assessments, which would lead the algorithms to recommend more time in teacher-managed, meaning-focused instruction. Increased time in small-group instruction that supports oral language development aligns with recommendations within the existing literature related to how best to support ELs in the classroom (e.g., August et al., 2016; August et al., 2018; Baker et al., 2014; Crevecoeur et al., 2013; Gersten & Baker, 2000; Gunn et al., 2000; Shanahan & Beck, 2006). We recognize, however, that more precise recommendations could likely be made by incorporating both English and native language skill—this is a direction of future work.

When considering the A2i algorithm’s recommendations for teacher-managed, code-focused instruction, we explored whether to base this recommendation solely on word reading skills (as had been the case with previous A2i studies among English-only students) or to include vocabulary scores so that students with lower levels of vocabulary would receive recommendations for larger amounts of teacher-managed, code-focused instruction. Our rationale for ultimately altering this algorithm to include both word reading and vocabulary skills was that students with less developed English vocabularies would benefit from spending relatively more instructional time with the teacher where they would be most likely to receive explicit, code-focused instruction tailored to their individual needs. Again, we based this conclusion on theory as well as the literature related to best instructional practices for ELs (e.g., Baker et al., 2014; Cunningham & Stanovich, 1997; Ouelette, 2006; Perfetti & Hart, 2002; Scarborough, 2001; Thomas & Sénéchal, 2004). See Table 1 for additional information and further rationale.
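The two paragraphs above specify which scores feed each recommendation, but not how scores are converted into minutes. The Python sketch below illustrates only that routing: vocabulary (K-1) or reading comprehension (grades 2–3) drives the teacher-managed meaning-focused recommendation, and word reading together with vocabulary drives the teacher-managed code-focused recommendation. Everything else is hypothetical and is not the published A2i algorithm: the `Scores` fields, the equal-weight average of word reading and vocabulary, the expected grade equivalent of grade + 0.5, and the linear minutes-per-gap rule with its constants (`BASE_MINUTES`, `MINUTES_PER_GE_GAP`, `MAX_MINUTES`).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical constants for illustration only; NOT the A2i parameters.
BASE_MINUTES = 10.0        # minutes for a student at grade expectation
MINUTES_PER_GE_GAP = 5.0   # extra minutes per grade equivalent below expectation
MAX_MINUTES = 40.0         # cap on any single recommendation


@dataclass
class Scores:
    grade: int                               # 0 = kindergarten, 1 = first grade, ...
    vocabulary_ge: float                     # WMG grade equivalent
    word_reading_ge: float                   # L2M grade equivalent
    comprehension_ge: Optional[float] = None  # R2C grade equivalent (grades 2-3)


def _minutes_for_gap(gap: float) -> float:
    """Map a grade-equivalent gap (positive = below expectation) to minutes."""
    return min(BASE_MINUTES + max(gap, 0.0) * MINUTES_PER_GE_GAP, MAX_MINUTES)


def recommend(s: Scores) -> dict:
    expected = s.grade + 0.5  # hypothetical mid-year expectation
    # Meaning-focused input: vocabulary in K-1, comprehension in grades 2-3.
    if s.grade <= 1:
        meaning_input = s.vocabulary_ge
    elif s.comprehension_ge is not None:
        meaning_input = s.comprehension_ge
    else:
        raise ValueError("grades 2-3 need a comprehension score")
    # Code-focused input combines word reading with vocabulary, so weaker
    # English vocabulary raises the teacher-managed code-focused time.
    code_input = (s.word_reading_ge + s.vocabulary_ge) / 2
    return {
        "teacher_managed_meaning": _minutes_for_gap(expected - meaning_input),
        "teacher_managed_code": _minutes_for_gap(expected - code_input),
    }


# Two first graders with identical word reading but different vocabulary.
low_vocab = Scores(grade=1, vocabulary_ge=0.5, word_reading_ge=1.5)
high_vocab = Scores(grade=1, vocabulary_ge=1.5, word_reading_ge=1.5)
```

With these assumed constants, the student with lower vocabulary receives 12.5 rather than 10 code-focused minutes, which mirrors the direction of the design choice described above; the actual magnitudes in A2i come from its own equations.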

Barrier and Solution 4, Bandwidth

The final barrier was identified as a result of teacher reports of occasional slower-than-normal response times from the website, which the research team identified as being related to times when website traffic was high. To address the increase in traffic inherent in scale up, the infrastructure of the servers, codebase, and internal data tables were enhanced to account for additional users without reducing performance. To reduce traffic on the main website, a protocol was also developed to enable students to access the online assessments directly, without having to navigate A2i. See Table 1 for further detail.

Phase 2 (2015 – 2016): The Quasi-Experimental Phase – Research Objectives

Phase 2 aimed to test whether our revised, scalable version of A2i demonstrated promise of effectiveness when implemented by elementary school teachers serving both English monolingual students and ELs. There were three research questions in this quasi-experiment.

  1. What is the validity of the newly developed, integrated A2i assessments that are embedded within the A2i technology?

  2. What effect does teachers’ use of the revised A2i technology, with on-going professional development (PD), have on students’ literacy outcomes (intent-to-treat)?
    • Does the effect of A2i depend on students’ initial language and literacy skills?
    • Does the effect of A2i depend on whether students are monolingual or EL?
  3. Controlling for pre-intervention reading scores, are post-intervention reading scores higher for those students whose teachers spent more time using the A2i technology?¹
    • To what extent does teachers’ use of the revised A2i technology, calculated from user logs (treatment teachers only), predict students’ reading outcomes?
    • Does this vary by students’ monolingual or EL status?
    • Is teacher use of A2i related to PD uptake?

Method

Transparency and Openness Statement

This research was conducted following a grant proposal funded by the Institute of Education Sciences (IES; Grant # R305A160404), which pre-specified the research questions, theoretical framework, implementation strategy, data collection, and analysis plan. As an IES Development Grant, there was no requirement for public release of data, and the IRB protocol and consent forms for this study do not allow for sharing data with third parties. Data analyses were conducted using HLM7 and SAS 9.4; data analysis code is available from the authors upon request. Selected materials from the study (e.g., the implementation fidelity rubric) are also available from the authors upon request. A2i is now a commercial product, and the authors include a conflict of interest statement printed elsewhere in this article.

Procedure

During the 2015–2016 academic year, we conducted a quasi-experiment to assess the promise of effectiveness of using A2i to support teachers as they individualized their students’ literacy instruction. Two large schools in AZ were randomly assigned either to use A2i from the beginning of the school year (immediate treatment) or to wait until April of the school year (delayed treatment). The school year for both schools ended in June. Both schools used the same curriculum: Wonders, published by McGraw Hill (12/program/microsites/MKTSP-BGA07M0/wonders.html). The Wonders curriculum was indexed (embedded within A2i) so that teachers could access recommended lessons from the A2i Lesson Plan based on their students’ grade level and reading ability.

Participants

Thirty-three kindergarten through third grade teachers and their students (N = 763) participated in the quasi-experiment. There were four or five classrooms per grade level at each school. Sixty-eight percent (68%) of the participants qualified for the US National School Lunch Program (NSLP), which is frequently used as a proxy for socio-economic status. Eighty percent (80%) of the students were Hispanic/Latinx, with 25% designated as ELs. In this district, students identified as non-proficient English Learners (ELs) were assigned to an English immersion classroom (EL classroom), with one EL classroom per grade level per school. EL classrooms had a dedicated four-hour English language block to support English language development. This four-hour block was at the academic expense of other content areas, with mathematics as the exception.

Professional Development

All participating teachers across both treatment conditions received professional development (PD) delivered by educators (certified teachers or classroom specialists) on the research team. However, the PD protocol varied by treatment condition. The teachers in both conditions participated in two half-day workshops prior to the beginning of the school year, but only the immediate treatment condition was given access to A2i at this time. With access to A2i, they were able to access the online PD materials and utilize all of the A2i features (Lesson Plan, Classroom View, etc.). In addition, the teachers in the immediate treatment condition received personalized coaching in the classroom three times per year and monthly grade-level communities of practice meetings. In the delayed treatment condition, teachers were given access to A2i starting in April.

Measures

Students were administered a battery of well-established, valid, and reliable standardized literacy measures as well as the A2i online literacy assessments. For both conditions, all assessments, excluding the A2i online assessments, were administered in the fall (between August and September, depending on classroom schedules) and again in the spring (April). Students in the immediate treatment condition completed the A2i online assessments in the fall and spring; students in the delayed treatment condition completed the A2i assessments only in the spring, just before their teachers began using A2i, because taking the assessments required access to A2i. The spring assessment scores represent the outcome measures for the quasi-experiment. In addition, as a measure of implementation fidelity, we monitored teachers’ A2i usage through user-logs and gauged teachers’ PD uptake using a researcher-developed rubric.

Standardized Literacy Measures

Woodcock-Johnson III Tests of Achievement (WJ-III).

The WJ-III (Woodcock et al., 2001) is a standardized assessment, normed on a nationally representative sample, that measures a wide range of students’ cognitive and academic abilities. The Letter-Word Identification (LW) subtest was used to assess kindergarten and first graders’ ability to name and decode words out of context. Research personnel administered the LW subtest individually to students in a quiet area outside the classroom. Reliability of the subtest in the students’ age range varied from .93 to .98. The intraclass correlation (ICC) for the spring assessment was .40, suggesting that 40% of the variability in students’ scores lay between classrooms. W scores were used in the analyses.
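The ICC reported above is the share of total score variance that lies between classrooms. The text does not state which estimator produced it, so the Python sketch below is only an illustration of the definition plus one classic estimator, ICC(1) from a one-way ANOVA, which assumes equal-sized classrooms. The function names are hypothetical; in multilevel work the ICC is often taken instead from the variance components of an unconditional HLM.

```python
def icc_from_variances(between_var: float, within_var: float) -> float:
    """Share of total score variance lying between classrooms."""
    total = between_var + within_var
    if total <= 0:
        raise ValueError("variance components must sum to a positive value")
    return between_var / total


def icc_oneway_anova(groups: list) -> float:
    """ICC(1) from a one-way ANOVA; assumes every classroom has the same size."""
    g = len(groups)
    k = len(groups[0])
    if g < 2 or k < 2 or any(len(x) != k for x in groups):
        raise ValueError("need >= 2 equal-sized groups with >= 2 members each")
    means = [sum(x) / k for x in groups]
    grand = sum(means) / g
    ms_between = k * sum((m - grand) ** 2 for m in means) / (g - 1)
    ms_within = sum((v - m) ** 2 for x, m in zip(groups, means) for v in x) / (g * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

For example, `icc_from_variances(40, 60)` returns 0.4, which corresponds to the "40% between classrooms" reading of the spring LW ICC. The ANOVA estimator can fall below zero when classroom means differ less than chance alone would produce.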

Gates-MacGinitie Reading Test (GM).

The GM (MacGinitie & MacGinitie, 2006) is a standardized reading assessment that has two subtests: Vocabulary and Reading Comprehension. Research personnel administered the assessment to second and third graders as a whole group within their classrooms. Reliability coefficients ranged from .64 to .75, and the ICC for the spring assessment was .26. Extended scale scores from both subtests were used in the analyses.

A2i Online Assessments

One key aim of this study was to develop integrated online and computer adaptive tests that were valid (i.e., demonstrating both construct and predictive validity) to use in the A2i technology. The use of computer adaptive assessments within A2i allowed for shorter test administration times, with initial item selection determined by a student’s grade level and subsequent item selection determined by a student’s performance on previously administered items. This maximizes both the efficiency and reliability of the A2i assessments by presenting students with only a subset of items specifically aligned with their current ability level. Sample practice items are presented in Appendix A. Three A2i online assessments, outlined below, were used in the current study: Word Match Game (WMG), Letters2Meaning (L2M), and Reading2Comprehension (R2C). Teachers administered the assessments to students in their classrooms with assistance from the research team as needed.
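The adaptive logic described above (a starting point set by grade level, then item choices driven by earlier responses) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the A2i engine: the grade-to-starting-ability table `GRADE_START`, the maximum-information selection rule (under the Rasch model, choosing the unused item whose difficulty is closest to the current ability estimate), and the shrinking-step ability update, a simple stand-in for re-estimating ability after each response, are all hypothetical choices.

```python
from typing import Callable

# Hypothetical starting abilities (Rasch logits) by grade; not A2i's values.
GRADE_START = {0: -1.5, 1: -0.5, 2: 0.5, 3: 1.5}


def run_cat(difficulties: list, grade: int, answer: Callable[[float], bool],
            n_items: int = 10, step: float = 0.5, shrink: float = 0.8):
    """Administer up to n_items adaptively; return (final ability, [(item, correct)])."""
    theta = GRADE_START[grade]
    remaining = set(range(len(difficulties)))
    log = []
    for _ in range(min(n_items, len(difficulties))):
        # Rasch item information p(1 - p) peaks where difficulty equals ability,
        # so the most informative unused item is the one closest to theta.
        idx = min(remaining, key=lambda i: abs(difficulties[i] - theta))
        remaining.remove(idx)
        correct = answer(difficulties[idx])
        log.append((idx, correct))
        # Move toward harder items after a correct answer and easier items after
        # an error; the shrinking step lets the estimate settle.
        theta += step if correct else -step
        step *= shrink
    return theta, log


# Usage: a simulated student who answers correctly only items easier than 1.0 logit.
pool = [-3.0 + 0.25 * i for i in range(25)]
final_theta, log = run_cat(pool, grade=0, answer=lambda b: b < 1.0)
```

Starting from the kindergarten default of −1.5, the simulated student's estimate climbs toward the difficulty region where answers switch from correct to incorrect, and only 10 of the 25 items are used, which is the efficiency gain the paragraph describes.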

Word Match Game (WMG).

This assessment was designed to measure students’ vocabulary knowledge using a semantic matching task. Students are presented with three words by audio and text (e.g., cat, kitten, tree). The words are highlighted as they are presented, and students are asked to click the two words that go together (e.g., cat and kitten). The assessment is adaptive: students who continue to match correctly are presented with more advanced vocabulary (e.g., copal and resin), whereas students who do not match words correctly are presented with simpler vocabulary.

Letters2Meaning (L2M).

This assessment was designed to assess students’ decoding, word reading, spelling, and sentence writing skills (generative comprehension skills and grammatical knowledge). L2M has five consecutive components ranging from simple alphabetic principle tasks to sentence-level semantics: Letter Identification, Letter-Sound Identification, Word Identification, Letters2Words, and Words2Sentences. The easiest task is Letter Identification, in which students click on the letter they hear from a pool of letters. In the Letter-Sound Identification task, students hear a letter sound and are asked to click on the corresponding letter from a pool of letters. In the Word Identification task, students are asked to click on the word they hear from a pool of words. In the Letters2Words task, students hear a word and are asked to select letters from a pool of letters to spell the word. Finally, the Words2Sentences task asks students to create meaningful sentences from a pool of words, with text structure (e.g., punctuation) included as clues for creating the sentence. The assessment advances through all five components as students answer correctly. The ICC for the April assessment of L2M was .57.

Reading2Comprehension (R2C).

This assessment was designed for students who read at a second-grade level or higher. R2C measures students’ higher-order reading comprehension skills (inferencing and comprehension monitoring) across social studies, science, and narrative text. Students read a passage that is missing a word early in the paragraph and select one of four words to fill in the blank. All four choices make sense when they are first read in the sentence, so students cannot identify the correct word until they read and comprehend the entire paragraph.

Teacher Involvement Measures

A2i-Generated Teacher User-Logs.

A2i automatically generates user-logs outlining the amount of time teachers spend using the various A2i features (e.g., Classroom View, Lesson Plan). The user-logs can be viewed as charts within A2i, and they can be exported as Excel spreadsheets. For this study, we focused on the total amount of time (in minutes) that teachers in the immediate treatment condition used A2i, not including the time students spent on assessments.

Teacher PD Uptake Rubric.

This researcher-developed rubric included eight items (outlined in Appendix B). One of the educators on the research team rated each teacher’s PD uptake on a scale from 1 (poor) to 5 (strong), based on the teacher’s attendance at the monthly communities of practice meetings, participation in PD opportunities, and willingness to learn and use A2i in the classroom.

Psychometric Analyses Plan

Item Response Patterns and Missing Data

Given that items on some A2i assessments were administered via a computer adaptive testing (CAT) platform, which selects items to be administered based on correct/incorrect responses, the specific set of items administered to one student was generally different from the set of items administered to other students. Methods for handling missing data across the full set of items included the EM algorithm to estimate the item covariance matrix and full-information maximum likelihood (FIML) estimation of scaling model parameters and person scores.

Dimensionality

The number of constructs captured by each instrument was examined via exploratory factor analysis and scree plots of factor eigenvalues. The EM algorithm was used to estimate the item covariance matrix given the missing data associated with computer adaptive administration. Data for each instrument were analyzed separately to assess the strength of a single latent construct (i.e., an overall scale) and search for evidence of potential subscales for each instrument. A large eigenvalue for the first factor relative to the second eigenvalue (e.g., ξ1 > 3×ξ2, or ξ1 > 2×ξ2 and ξ2 <1.5) was considered evidence of unidimensionality.
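The eigenvalue rule for unidimensionality stated above is easy to apply mechanically. The Python sketch below assumes an item correlation matrix is already available (in the study it was estimated with the EM algorithm because CAT administration leaves many item pairs unobserved; that step is not shown). The function names are hypothetical.

```python
import numpy as np


def eigenvalues_desc(corr) -> np.ndarray:
    """Eigenvalues of a symmetric item correlation matrix, largest first."""
    return np.linalg.eigvalsh(np.asarray(corr, dtype=float))[::-1]


def is_unidimensional(eigenvalues) -> bool:
    """Rule from the text: xi1 > 3*xi2, or (xi1 > 2*xi2 and xi2 < 1.5)."""
    xi = sorted((float(v) for v in eigenvalues), reverse=True)
    xi1, xi2 = xi[0], xi[1]
    return xi1 > 3 * xi2 or (xi1 > 2 * xi2 and xi2 < 1.5)


# Four items that all correlate .6: eigenvalues are 2.8, .4, .4, .4.
corr = [[1.0 if i == j else 0.6 for j in range(4)] for i in range(4)]
ev = eigenvalues_desc(corr)
```

The second clause matters for short or weakly correlated item pools: eigenvalues of 3.0 and 1.2 fail the 3× test but still pass, because the second factor is small in absolute terms.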

Scaling Model and Estimation

A Rasch scaling model was used to estimate item difficulty parameters and person scores for each instrument. The Rasch model is an item response theory (IRT) model that expresses the probability of a correct item response as a function of an item’s difficulty (bi) and the respondent’s ability (θ). A unidimensional model was used for each assessment. The functional form of the model is:

$$P(Y_i = 1) = \frac{e^{\theta - b_i}}{1 + e^{\theta - b_i}} \qquad (1)$$

The scaling model estimates item and person parameters using all of the available data and accommodates the differences in item sets administered to different students. The Rasch scaling models were estimated using PROC IRT in SAS/STAT 14.1 under SAS 9.4. All items from an instrument were included in the estimation for that instrument, with non-administered items having missing values for responses. FIML was used to estimate the item parameters based on the complete set of observed item responses, with non-administered items excluded from likelihood calculations. An item difficulty parameter and its standard error were estimated for each item with at least 30 responses, including at least one incorrect and one correct response. The percentage of correct responses was also calculated for each item. Goodness of fit for each item was assessed using Pearson’s chi-square statistic based on the subset of students responding to the item. P-values were calculated for each item, with values less than .05 suggesting poor fit under the Rasch model.

An overall test information function (TIF) was calculated for each instrument based on the estimated item parameters and associated item characteristic curves. A plot of the TIF curve for each instrument was used to assess the precision of score estimation throughout the range of possible test scores. Information values greater than 2 (i.e., corresponding to a reliability greater than .70) were considered adequate for precise score estimation at that point on the ability scale. Values less than 2 were considered as suggesting the need for additional items with difficulty near that point on the ability scale.

Respondent Scores

An overall score (θ) was estimated for each respondent as the maximum a posteriori (MAP) score, which is equal to a weighted combination of the maximum likelihood (ML) score and a standard normal (mean = 0, standard deviation = 1) Bayesian prior distribution. MAP scores are highly correlated with ML scores, but they are less prone to problems of estimation and outlier scores for those students who answer most or all items presented to them correctly or incorrectly (e.g., ceiling or floor effects).
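A MAP score can be sketched in a few lines of Python. This is a minimal illustration assuming item difficulties are already calibrated: it maximizes the Rasch log-likelihood plus a standard normal log-prior with Newton-Raphson, and, as in the study, items a student was never administered are simply left out of the likelihood. The function name and the Newton-Raphson solver are illustrative choices, not the SAS implementation.

```python
import math


def map_theta(responses, difficulties, max_iter=100, tol=1e-12):
    """MAP ability estimate under the Rasch model with a N(0, 1) prior.

    responses: 1 (correct), 0 (incorrect), or None (item not administered).
    """
    observed = [(y, b) for y, b in zip(responses, difficulties) if y is not None]
    theta = 0.0  # start at the prior mean
    for _ in range(max_iter):
        grad = -theta  # derivative of the log standard-normal prior
        hess = -1.0    # second derivative of the log prior
        for y, b in observed:
            p = 1.0 / (1.0 + math.exp(-(theta - b)))  # Equation (1)
            grad += y - p
            hess -= p * (1.0 - p)
        step = grad / hess
        theta -= step  # Newton-Raphson update on the log posterior
        if abs(step) < tol:
            break
    return theta


all_correct = map_theta([1] * 5, [0.0] * 5)
```

For a student who answers every administered item correctly, the ML estimate would diverge to +∞, whereas `map_theta` returns a finite value pulled toward 0 by the prior. This is the ceiling and floor robustness the paragraph describes.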

Grade equivalent (GE) scores were calculated by linking scores on each A2i assessment to scores on the LW and GM reading assessments administered at approximately the same time as the A2i assessment (generally within 2–4 weeks). Linking to the standardized assessments allows estimation of GE scores relative to a nationally representative sample of elementary students. Non-linear regression models using a logit transformation were estimated to derive a conversion equation between the standardized test scale scores and grade equivalents.
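The exact form of the conversion equation is not reported here. One simple reading of "non-linear regression using a logit transformation" is sketched below in Python, under loudly stated assumptions: GE is rescaled to (0, 1) by a hypothetical upper bound `GE_MAX`, logit-transformed, and regressed linearly on the scale score by least squares. Back-transforming then gives an S-shaped (logistic) conversion curve. The names `GE_MAX` and `fit_logit_link` are illustrative, not the authors' model.

```python
import math

GE_MAX = 13.0  # hypothetical upper bound of the grade-equivalent scale


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def fit_logit_link(scores, ges, ge_max=GE_MAX):
    """Least-squares fit of logit(GE / ge_max) = a + c * score.

    Returns (a, c, to_ge), where to_ge converts a scale score to a GE.
    """
    if any(not 0.0 < g < ge_max for g in ges):
        raise ValueError("GE values must lie strictly between 0 and ge_max")
    zs = [logit(g / ge_max) for g in ges]
    n = len(scores)
    mx, mz = sum(scores) / n, sum(zs) / n
    sxx = sum((x - mx) ** 2 for x in scores)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(scores, zs))
    c = sxz / sxx
    a = mz - c * mx

    def to_ge(score: float) -> float:
        # Inverse logit keeps every predicted GE inside (0, ge_max).
        return ge_max / (1.0 + math.exp(-(a + c * score)))

    return a, c, to_ge


# Synthetic check data generated from a known curve (a = -4, c = 0.01).
scores = [300, 350, 400, 450, 500]
ges = [GE_MAX / (1.0 + math.exp(-(-4.0 + 0.01 * s))) for s in scores]
a, c, to_ge = fit_logit_link(scores, ges)
```

The logit scale is a natural choice for this kind of linking because it keeps converted scores within the bounds of the GE scale even for students far above or below the calibration sample.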

Statistical Model of Impacts

A hierarchical linear model (HLM) was used to analyze differences in students’ April 2016 scores on the A2i Letters2Meaning test (grades K–3), the LW test (grades K–1), and the GM Reading test (grades 2–3). The mathematical form of the model is:

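Level-1 equation:
$$Y_{ij} = \beta_{0j} + \beta_{1j}(\mathrm{Pretest}_{ij}) + r_{ij} \qquad (2)$$
Level-2 equations:
$$\beta_{0j} = \gamma_{00} + \gamma_{01}(\mathrm{A2iIMM}_{j}) + \gamma_{02}(\mathrm{Grade}_{j}) + u_{0j} \qquad (3)$$
$$\beta_{1j} = \gamma_{10} \qquad (4)$$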

with $Y_{ij}$ representing the April test score for student i in classroom j; $\mathrm{Pretest}_{ij}$ representing the fall pretest score for student i in classroom j (included only in a subset of the LW and GM models, given that A2i fall scores do not exist for the delayed treatment group); $\mathrm{A2iIMM}_{j}$ indicating whether the classroom was in the immediate treatment condition ($\mathrm{A2iIMM}_{j} = 1$) or the delayed treatment (control) condition ($\mathrm{A2iIMM}_{j} = 0$); and $\mathrm{Grade}_{j}$ indicating the grade level of classroom j (K = 0, First = 1, etc.). The coefficients represent the fitted mean score in kindergarten ($\gamma_{00}$), the impact of A2i ($\gamma_{01}$), an overall grade effect ($\gamma_{02}$), and the pretest slope ($\gamma_{10}$); $u_{0j}$ and $r_{ij}$ are the classroom- and student-level residuals. Additional parameters are included in some models to estimate the moderating effect (i.e., interaction) of Grade, EL status, or baseline literacy scores on the A2i impact ($\gamma_{11}$). Given that this study focuses on the feasibility of implementation and the potential impacts of A2i as a new intervention, and that it involves a relatively small sample, we do not apply a strict .05 cutoff for significance, nor do we correct for multiple tests. Although this increases the possibility of Type I errors, the exploratory nature of the study calls for closer control of Type II errors.
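Substituting Equations (3) and (4) into Equation (2) gives the combined model Y_ij = γ00 + γ01·A2iIMM_j + γ02·Grade_j + γ10·Pretest_ij + u0j + r_ij. The Python sketch below simulates data from that data-generating process to make the two levels concrete: one random intercept is drawn per classroom and one residual per student. All parameter values and names (`Gammas`, `simulate`) are hypothetical, and the sketch does not fit the model; the study's estimates came from HLM7.

```python
import random
from dataclasses import dataclass


@dataclass
class Gammas:
    g00: float  # fitted mean for grade 0, control condition, pretest 0
    g01: float  # A2i impact
    g02: float  # grade effect
    g10: float  # pretest slope, fixed across classrooms (Eq. 4)


def simulate(classrooms, gam: Gammas, tau_sd: float, sigma_sd: float, seed: int = 0):
    """classrooms: list of (a2i_imm, grade, [pretest scores]) tuples.

    Returns rows of (classroom, a2i_imm, grade, pretest, outcome).
    """
    rng = random.Random(seed)
    rows = []
    for j, (imm, grade, pretests) in enumerate(classrooms):
        u0j = rng.gauss(0.0, tau_sd)                           # classroom random intercept
        b0j = gam.g00 + gam.g01 * imm + gam.g02 * grade + u0j  # Eq. 3
        b1j = gam.g10                                          # Eq. 4
        for pre in pretests:
            rij = rng.gauss(0.0, sigma_sd)                     # student residual
            rows.append((j, imm, grade, pre, b0j + b1j * pre + rij))  # Eq. 2
    return rows
```

Setting both standard deviations to zero reproduces the fixed part of the model exactly, which is a quick check that the level-2 terms are wired into the level-1 equation as written. Nonzero values show why the classroom term matters: all students in a classroom share the same u0j, so their outcomes are correlated.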

Results

Research Question 1: The Validity of the A2i Assessments

Results for each of the three assessments, based on the series of psychometric analyses described above, are summarized in Table 2. Additional details and figures are provided in Appendix C. Results for two of the three A2i assessments, WMG and R2C, suggest unidimensionality, whereas results for L2M suggest a strong general factor and three to six subscales. Item fit statistics were good for all but a few items, and item difficulty statistics and test information plots suggest adequate reliability of measurement (i.e., I > 2.0, r > .70) across a wide range of abilities for both the L2M and WMG computer adaptive tests. In contrast, the R2C item difficulty statistics and test information plots suggest that the R2C assessment (which does not use CAT) is not appropriate for students with less developed reading skills.

Table 2.

Summary of Psychometric Analysis Results

Assessment: Results of IRT analyses

Word Match Game: 209 items, each with more than 30 responses. Proportion correct across items = .61. Item difficulty ranged from −3.3 to +3.5 (mean difficulty = −.38). Appears to be unidimensional. Overall test information was excellent, with a bell-shaped function and total information greater than 2.0 throughout the range of Rasch theta scores from −5.0 to +5.0, suggesting that computer adaptive administration of WMG will produce reliable individual scores throughout the full range of student abilities.

Letters2Meaning: 2,807 test administrations, with a majority of students responding to more than 10 items; 505 individual items with more than 30 student responses were used in the IRT analyses. L2M has a large first factor but may not be purely unidimensional; thus, there is potential for subscales. Overall test information for the complete pool of 505 Rasch-scaled L2M items was excellent, with a bell-shaped information function and total information greater than 2.0 throughout the range of Rasch theta scores from −5.0 to +5.0, suggesting that computer adaptive administration of L2M will produce reliable individual scores throughout the full range of student abilities.

Reading2Comprehension: All 10 items in the R2C item pool had more than 30 responses and were included in the Rasch analyses. The average proportion correct across the items was .37, and the median proportion correct was .32. Item difficulty parameter estimates for the 10 items ranged from −1.5 to +2.3, with a mean difficulty of +1.28, a median difficulty of +1.53, and a standard deviation of 1.03 points on the Rasch theta scale. Standard errors for the difficulty estimates ranged from 0.07 to 0.16, with a mean standard error of 0.11 and a median standard error of 0.10 points on the Rasch theta scale. Overall test information for the complete pool of 10 Rasch-scaled R2C items was modest, with a bell-shaped information function and total information greater than 2.0 for Rasch theta scores in the range +1.0 to +3.0, suggesting that computer adaptive administration of R2C will produce reliable individual scores only in the upper range of student abilities and that reliability of R2C scores at the lower end would be improved if additional items were added to the R2C item pool.

We examined how well each A2i assessment correlated with the other A2i assessments and with the standardized measures, namely the LW subtest of the WJ-III and the GM. Results are provided in Table 3. L2M correlated highly with both the LW subtest (given only to kindergarten and first grade students) and the GM (given only to second and third grade students), with correlations (r) ranging from .65 to .76. WMG was moderately correlated with L2M (r = .56), had smaller correlations with LW and GM (r ranging from .27 to .37), and was not significantly correlated with R2C. R2C was correlated with L2M (r = .30) and with the GM (r = .25 to .36).

Table 3.

Correlations among A2i assessments and standardized reading assessments

Spring L2M Spring WMG Spring R2C Fall GM Spring GM Fall LW Spring LW
Spring L2M Pearson Correlation 1
N 580
Spring WMG Pearson Correlation .557** 1
N 561 659
Spring R2C Pearson Correlation .296** .062 1
N 283 354 357
Fall GM Pearson Correlation .727** .370** .249** 1
N 249 298 292 304
Spring GM Pearson Correlation .762** .374** .355** .858** 1
N 256 305 299 285 310
Fall LW Pearson Correlation .645** .270** .a .a .a 1
N 274 274 3 0 0 365
Spring LW Pearson Correlation .751** .294 .a .a .a .859** 1
N 280 279 3 0 0 326 342

Note.
** Correlation is significant at the .01 level (2-tailed).
a Cannot be computed because there were no students who took both assessments (LW was administered to kindergarten and first graders; GM was administered to second and third graders).
L2M = Letters2Meaning; WMG = Word Match Game; R2C = Reading2Comprehension; GM = Gates-MacGinitie Reading Test; LW = Woodcock-Johnson III Tests of Achievement Letter-Word Identification subtest. Grade equivalent scores were used for L2M and R2C, Extended Scale Scores for GM, and W scores for LW.

Research Question 2: Effects of A2i on Students’ Literacy Outcomes (Intent-to-treat Results)

Analyses revealed no significant differences between conditions on the standardized measures at baseline, which is required for a strong quasi-experiment (Shadish et al., 2002). Table 4a shows means and standard deviations for treatment and control groups on WJ-III LW subtest and GM assessments at baseline. On average, students were reading at grade expectations in kindergarten and first grade, based on examination of LW standard scores (M = 98). However, in second and third grades, based on GM percentile rank, on average students were reading below grade expectations at the 34th percentile. In general, second and third grade students in EL classrooms tended to have lower GM scores (23rd percentile) compared to their peers in general education classrooms. There was no significant difference in LW scores for ELs in kindergarten or first grade EL classrooms compared to their peers in general education classrooms (Table 4b).

Table 4a.

Baseline comparisons: Descriptive statistics for kindergarten and first grade (top) and second and third grade (bottom).

Kindergarten and first grade descriptive statistics for letter-word identification at baseline
Condition Mean Std. Deviation N
Delayed Treatment 367.46 44.737 165
Immediate Treatment 369.51 51.128 162
Total 368.47 47.946 327
Note. Wilks’ Lambda = .999, p = .844. W-scores were used.
Second and third grade descriptive statistics for Gates MacGinitie Reading Test at baseline
Condition Mean Std. Deviation N
Delayed Treatment 409.15 36.289 221
Immediate Treatment 404.64 38.811 198
Total 407.02 37.525 419
Note. F(1, 417) = 1.51, p = .219. Extended Scale Scores were used.

Table 4b.

Baseline comparisons: HLM for kindergarten and first grade fall letter-word identification (top) and for second and third grade fall Gates-MacGinitie Reading Test (bottom).

Effect Estimate SE 95% CI LL 95% CI UL p
K-1 Fall WJIII Letter-Word Identification
Fixed Effects
 Intercept 446.637 4.452 437.911 455.363 <0.001
 A2i Immediate Treatment 3.095 3.613 −3.986 10.176 0.406
 Grade 75.634 3.619 68.541 82.727 <0.001
 EL Class −12.963 8.630 −29.878 3.952 0.134
 EL × A2i Immediate Treatment 9.960 7.157 −4.068 23.988 0.165
 EL × Grade 1.407 7.165 −12.636 15.450 0.844
Random Effects Std. Dev. Variance Component χ2 p
 Classroom 5.680 32.262 19.112 0.160
 Student 29.684 881.158
Effect Estimate SE 95% CI LL 95% CI UL p
2–3 Fall Gates MacGinitie Reading
Fixed Effects
 Intercept 392.878 6.794 379.562 406.194 <0.001
 A2i Immediate Treatment −6.261 5.205 −16.463 3.941 0.249
 Grade 26.934 5.241 16.662 37.206 <0.001
 EL Class −27.110 12.442 −51.496 −2.724 0.030
 EL × A2i Immediate Treatment 18.996 10.177 −0.951 38.942 0.063
 EL × Grade 4.329 10.195 −15.653 24.311 0.671
Random Effects Std. Dev. Variance Component χ2 p
 Classroom 0.597 0.356 10.410 >0.500
 Student 29.474 868.742

Note. A2i immediate treatment = 1; Delayed Treatment = 0. Grade coded K = 0 and 1st grade = 1 (WJ-III model) or 2nd grade = 0 and 3rd grade = 1 (GM model). WJ-III Model Deviance = 3471.262866. GM Model Deviance = 2895.465514.

Intent-to-Treat Results

Using hierarchical linear modeling (HLM), with students nested in classrooms and a fixed effect denoting school/treatment assignment at the classroom level, we examined intent-to-treat (ITT) effects using the A2i integrated assessments as well as the WJ-III LW subtest for kindergarteners and first graders and the GM for second and third graders. With the A2i assessments as outcome measures, we found a marginally significant ITT effect on L2M in kindergarten (p = .09) with a small effect size (d = 0.15). The estimated effect decreased as grade increased, although the grade by treatment interaction was not significant (see Table 5); L2M scores were higher for students in later grades. When we added EL classroom at the classroom level of the model, we found a treatment by EL classroom interaction effect (p = .048): there was generally little effect of treatment for students in general education classrooms but a greater treatment effect for students in EL classrooms (see Table 6 and Figure 1). We did not find significant ITT effects for WMG (Table 7) or R2C (Table 8). When we examined the effect of A2i on the LW subtest for kindergarten and first grade students, we found a significant effect of treatment (p = .004) with an effect size (d) of 0.37 (see Table 9). We also found a significant effect of grade: first graders had higher scores than kindergarteners. However, there was no grade by treatment interaction effect. When we added EL classroom to the model at the classroom level (see Table 9, bottom), the effect of treatment remained significant, and neither the EL classroom main effect nor the EL classroom by treatment interaction was significant. That is, A2i was effective for improving the letter-word recognition of kindergarteners and first graders regardless of whether students were in EL or general education classrooms.
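The reported effect sizes can be approximately reproduced by dividing the treatment coefficient by the model-implied total standard deviation, that is, the square root of the sum of the classroom- and student-level variance components. The paper does not state its exact formula for d, so this is an assumption; it does, however, match the reported values for L2M (Table 5) and LW (Table 9, middle).

```python
import math


def hlm_effect_size(estimate, classroom_sd, student_sd):
    """Assumed d: fixed-effect estimate divided by the total random-effect SD."""
    # Pool the classroom- and student-level variances, then take the square root
    total_sd = math.sqrt(classroom_sd ** 2 + student_sd ** 2)
    return estimate / total_sd


# Treatment coefficients and random-effect SDs copied from the tables
d_l2m = hlm_effect_size(0.097, 0.03719, 0.64074)  # Table 5, L2M
d_lw = hlm_effect_size(10.090, 0.427, 27.045)     # Table 9 (middle), LW
print(f"L2M d = {d_l2m:.2f}, LW d = {d_lw:.2f}")
```

Both values round to the reported d = 0.15 and d = 0.37. Because the classroom variance is very small in these two models, dividing by the student-level SD alone gives nearly the same result.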

Table 5.

Intent-to-treat effects for Letters2Meaning (L2M) in kindergarten–third grade: HLM model predicting treatment effects on spring L2M scores.

Effect Estimate SE 95% CI LL 95% CI UL p
L2M Spring Scores
Fixed Effects
 Intercept 0.863 0.039 0.787 0.939 <0.001
 A2i Immediate Treatment 0.097 0.056 −0.013 0.207 0.090
 Grade 0.648 0.031 0.587 0.709 <0.001
 Grade × A2i Immediate Treatment −0.033 0.042 −0.115 0.049 0.434
Random Effects Std. Dev. Variance Component χ2 p
 Classroom 0.03719 0.001 31.779 0.428
 Student 0.64074 0.411

Note. A2i immediate treatment = 1; Delayed Treatment = 0. Grade coded K = 0, 1st grade = 1, 2nd grade = 2 and 3rd grade = 3. Model Deviance = 1138.935716.

Table 6.

HLM model predicting ITT effects on spring L2M scores in kindergarten–third grade, including EL as a classroom level variable.

Effect Estimate SE 95% CI LL 95% CI UL p
L2M Spring Scores
Fixed Effects
 Intercept 0.930 0.044 0.844 1.016 <0.001
 A2i Immediate Treatment 0.021 0.052 −0.081 0.123 0.695
 Grade 0.637 0.020 0.598 0.676 <0.001
 EL Class −0.218 0.053 −0.322 −0.114 <0.001
 EL × A2i Immediate Treatment 0.142 0.069 0.007 0.277 0.048
Random Effects Std. Dev. Variance Component χ2 p
 Classroom 0.009 <0.001 25.204 >0.500
 Student 0.637 0.406

Note. A2i immediate treatment = 1; Delayed Treatment = 0. Grade coded K = 0, 1st grade = 1, 2nd grade = 2 and 3rd grade = 3. Model Deviance = 1137.247867.

Figure 1.

Modeled results for kindergarten in general education (general education classroom in light gray) and English immersion classrooms (EL classroom in dark gray). Results look the same for first grade but scores are higher. Error bars are standard errors. L2M GE = Letters2Meaning grade equivalent scores.

Table 7.

Intent-to-treat effects for Word Match Game (WMG) in kindergarten–third grade: HLM model predicting treatment effects on spring WMG scores.

Effect Estimate SE 95% CI LL 95% CI UL p
WMG Spring Scores
Fixed Effects
 Intercept 0.291 0.051 0.191 0.392 <0.001
 A2i Immediate Treatment 0.006 0.065 −0.122 0.134 0.928
 Grade 0.390 0.042 0.308 0.473 <0.001
 Grade × A2i Immediate Treatment 0.019 0.050 −0.080 0.118 0.710
Random Effects Std. Dev. Variance Component χ2 p
 Classroom 0.136 0.018 51.209 0.013
 Student 0.679 0.461

Note. A2i immediate treatment = 1; Delayed Treatment = 0. Grade coded K = 0, 1st grade = 1, 2nd grade = 2 and 3rd grade = 3 and grand mean centered. Model Deviance = 1390.535888.

Table 8.

Intent-to-treat effects for Reading2Comprehension (R2C) in second and third grade: HLM model predicting treatment effects on spring R2C scores.

Effect Estimate SE 95% CI LL 95% CI UL p
R2C Spring Scores
Fixed Effects
 Intercept 1.249 0.073 1.105 1.393 <0.001
 A2i Immediate Treatment −0.033 0.087 −0.203 0.136 0.700
 Grade 0.128 0.101 −0.069 0.325 0.220
 Grade × A2i Immediate Treatment 0.077 0.118 −0.156 0.309 0.518
Random Effects Std. Dev. Variance Component χ2 p
 Classroom 0.087 0.008 17.509 >0.500
 Student 0.671 0.451

Note. A2i immediate treatment = 1; Delayed Treatment = 0. Grade coded 2nd grade = 2 and 3rd grade = 3 and grand mean centered. Model Deviance = 740.505185.

Table 9.

Spring WJ letter-word identification descriptive statistics (top) and HLM intent-to-treat effect for kindergarten and first grade (middle) and adding EL classroom (bottom).

Descriptive Statistics
Condition Grade Mean Std. Deviation N
Delayed Treatment Kindergarten 383.22 24.774 88
1st 428.46 27.519 87
Total 405.71 34.581 175
A2i Immediate Treatment Kindergarten 395.89 25.593 88
1st 436.51 30.047 76
Total 414.71 34.321 164
Total Kindergarten 389.55 25.906 176
1st 432.21 28.918 163
Total 410.06 34.699 339
HLM Results
Effect Estimate SE 95% CI LL 95% CI UL p
Fixed Effects
 Intercept 384.650 2.517 379.717 389.583 <0.001
 A2i Immediate Treatment 10.090 2.935 4.337 15.843 0.004
 Grade 42.362 2.936 36.607 48.117 <0.001
Random Effects Std. Dev. Variance Component χ2 p
 Classroom 0.427 0.182 9.228 >0.500
 Student 27.045 731.433
HLM Results Including EL Interaction
Effect Estimate SE 95% CI LL 95% CI UL p
Fixed Effects
 Intercept 386.089 2.733 380.732 391.446 <0.001
 A2i Immediate Treatment 7.706 3.393 1.056 14.356 0.044
 Grade 42.741 2.951 36.957 48.525 <0.001
 EL Class −6.944 4.843 −16.436 2.548 0.179
 EL × A2i Immediate Treatment 9.797 6.799 −3.529 23.123 0.177
Random Effects Std. Dev. Variance Component χ2 p
 Classroom 0.429 0.184 6.820 >0.500
 Student 27.029 730.552

Note. W-scores were used. A2i immediate treatment = 1; Delayed Treatment = 0. Grade coded K = 0, 1st grade = 1. Model Deviance = 3198.329921.

When we examined treatment effects for second and third graders on the GM total score, there was no significant effect of treatment (see Table 10). There was a grade effect, with third graders achieving generally higher scores than second graders, but there was no grade by treatment interaction effect. When we added the EL classroom variable at the classroom level, students in EL classrooms had generally lower GM scores than students in general education classrooms, and although the overall intent-to-treat effect was not significant, the treatment by EL classroom interaction effect was marginally significant (p = .07). This suggests that students in EL classrooms may have experienced larger impacts of A2i on their GM reading scores in second and third grades.
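In the models that include EL classroom (coded 1 for EL and 0 for general education classrooms), the treatment coefficient is the estimated effect in general education classrooms, and adding the interaction coefficient gives the estimated effect in EL classrooms. The sketch below applies that arithmetic to the coefficients in Tables 6, 9 (bottom), and 10 (bottom). It yields point estimates only; a standard error for the EL-classroom effect would also require the covariance between the two coefficients, which the tables do not report.

```python
def simple_effects(main_effect, interaction):
    """Treatment effect in general education (EL = 0) and EL (EL = 1) classrooms."""
    return {
        "general_education": main_effect,
        "el_classroom": main_effect + interaction,
    }


# (treatment coefficient, EL x treatment coefficient) from the tables
models = {
    "L2M (Table 6)": (0.021, 0.142),
    "LW (Table 9, bottom)": (7.706, 9.797),
    "GM (Table 10, bottom)": (-14.246, 24.010),
}

for name, (b_trt, b_int) in models.items():
    eff = simple_effects(b_trt, b_int)
    print(f"{name}: GE = {eff['general_education']:.3f}, EL = {eff['el_classroom']:.3f}")
```

Because grade enters these models additively, with no grade by treatment term, the grade coding does not change these treatment contrasts.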

Table 10.

Spring Gates-MacGinitie Reading Test descriptive statistics (top) and HLM intent-to-treat effect for second and third grades (middle) and adding EL classroom (bottom).

Descriptive Statistics
Condition Grade Mean Std. Deviation N
Delayed Treatment 2nd 418.92 41.252 61
3rd 451.39 40.937 83
Total 437.63 43.979 144
A2i Immediate Treatment 2nd 412.06 35.006 83
3rd 444.22 27.163 83
Total 428.14 35.153 166
Total 2nd 414.97 37.792 144
3rd 447.80 34.820 166
Total 432.55 39.717 310
HLM Results
Effect Estimate SE 95% CI LL 95% CI UL p
Fixed Effects
 Intercept 405.843 8.924 388.352 423.334 <0.001
 A2i Immediate Treatment −8.651 7.127 −22.620 5.318 0.245
 Grade 31.663 7.147 17.655 45.671 <0.001
Random Effects Std. Dev. Variance Component χ2 p
 Classroom 11.979 143.505 42.279 <0.001
 Student 34.603 1197.395
HLM Results Including EL Interaction
Effect Estimate SE 95% CI LL 95% CI UL p
Fixed Effects
 Intercept 414.000 7.281 399.729 428.271 <0.001
 A2i Immediate Treatment −14.246 6.224 −26.445 −2.047 0.041
 Grade 30.490 5.388 19.930 41.050 <0.001
 EL Class −29.179 8.725 −46.280 −12.078 0.006
 EL × A2i Immediate Treatment 24.010 12.245 0.010 48.010 0.074
Random Effects Std. Dev. Variance Component χ2 p
 Classroom 7.211 52.004 21.630 0.042
 Student 34.638 1199.806

Note. A2i immediate treatment = 1; Delayed Treatment = 0. Grade was grand mean centered with Grade 2 = 2 and Grade 3 = 3. EL classroom = 1; General Education Classroom = 0. Deviance = 3053.991523.

When we added fall pretest scores to the LW and GM impact models and included an interaction between baseline literacy scores and the A2i treatment (Table 11), we found the following. For LW, the main effect of A2i remained positive and significant (p = .003), and there was no significant interaction with fall LW scores (p = .42). Thus, the impact of A2i did not differ significantly between students with higher and lower baseline literacy scores in kindergarten and first grade. For GM, the main effect of A2i was not significant (p = .36), but the coefficient on the interaction with fall GM scores was negative and significant (p = .015). This suggests that the impact of A2i was greater for students with lower initial GM scores.
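The baseline interaction in Table 11 can be read as a treatment effect that changes linearly with a student's fall score. The sketch below evaluates that line at one standard deviation below the mean, at the mean, and one standard deviation above it. It makes two assumptions the paper does not state: that fall scores entered the models centered at their grand mean, and that the total baseline SDs in Table 4a describe the analytic samples. Because the LW interaction was not significant, the LW values are illustrative only.

```python
def conditional_effect(b_treatment, b_interaction, baseline_deviation):
    """Treatment effect for a student whose fall score is baseline_deviation
    points from the (assumed) grand-mean centering point."""
    return b_treatment + b_interaction * baseline_deviation


# Coefficients from Table 11; SDs are the total fall SDs from Table 4a (assumed)
models = [
    ("LW", 9.570, -0.051, 47.946),
    ("GM", -7.067, -0.202, 37.525),
]

for label, b_trt, b_int, sd in models:
    low = conditional_effect(b_trt, b_int, -sd)
    mid = conditional_effect(b_trt, b_int, 0.0)
    high = conditional_effect(b_trt, b_int, sd)
    print(f"{label}: -1 SD {low:.2f}, mean {mid:.2f}, +1 SD {high:.2f}")
```

Under these assumptions, the estimated GM effect is about 0.5 points for a student 1 SD below the mean and about −14.6 points for a student 1 SD above it, so the estimated effect is more favorable for students who started with lower scores.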

Table 11.

ITT effects including baseline literacy interaction for spring WJ Letter-Word Identification in kindergarten and first grades (top) and spring Gates-MacGinitie Reading Test in second and third grades (bottom).

Effect Estimate SE 95% CI LL 95% CI UL p
K-1 Spring WJIII Letter-Word Identification
Fixed Effects
 Intercept 385.068 2.235 380.687 389.449 <0.001
 A2i Immediate Treatment 9.570 2.570 4.533 14.607 0.003
 Grade 41.991 2.569 36.956 47.026 <0.001
 Fall LW 0.781 0.048 0.687 0.875 <0.001
 Fall LW × A2i Immediate Treatment −0.051 0.063 −0.174 0.072 0.422
Random Effects Std. Dev. Variance Component χ2 p
 Classroom 3.644 13.278 26.197 0.016
 Student 16.235 263.586
Effect Estimate SE 95% CI LL 95% CI UL p
Grd. 2–3 Spring Gates MacGinitie Reading
Fixed Effects
 Intercept 359.097 20.035 319.828 398.366 <0.001
 A2i Immediate Treatment −7.067 7.552 −21.869 7.735 0.365
 Grade 30.374 7.560 15.556 45.192 0.001
 Fall GM 1.053 0.064 0.928 1.178 <0.001
 Fall GM × A2i Immediate Treatment −0.202 0.082 −0.363 −0.041 0.015
Random Effects Std. Dev. Variance Component χ2 p
 Classroom 14.631 214.066 127.577 <0.001
 Student 19.792 391.712

Note. A2i immediate treatment = 1; Delayed Treatment = 0. Grade coded K = 0 and 1st grade = 1 (WJ-III model) or 2nd grade = 2 and 3rd grade = 3 (GM model) and grand mean centered. WJ-III Model Deviance = 2745.773536, GM Model Deviance = 2535.290981.

To summarize, there were no significant ITT effects on the integrated A2i assessments aside from a marginal effect on L2M in kindergarten. However, there was a significant A2i treatment by EL classroom interaction effect on L2M, suggesting that students in EL classrooms benefitted more from A2i use than their peers in general education classrooms. There was a significant ITT effect on kindergarten and first grade students' LW scores (d = 0.37) and no interaction with EL status. There was not a significant ITT effect of A2i on GM. However, the A2i treatment by students' baseline literacy skills interaction term was significant, and the A2i treatment by EL classroom interaction was marginally significant, suggesting that EL and monolingual students who started the year with less developed reading skills benefitted more from the intervention than those who started off as stronger readers.

RQ 3: Relationships between Teachers’ A2i Use and Student Outcomes

We accessed A2i teacher-use logs, which were embedded in the technology to record overall A2i usage, including the time spent using the planning-specific aspects of A2i (i.e., the Literacy Minutes Manager, Student Test Scores and the Activity Planner). The user logs serve as a proximal measure of the time individual teachers spent planning for individualized literacy instruction (Connor et al., 2010). It is important to note that this measure cannot provide detailed information about the extent to which teachers adhered to the key recommendations from A2i, only the extent to which they engaged with the technology. However, previous studies using A2i have demonstrated that teacher usage of A2i alongside the fidelity measure of the individualizing student instruction framework is linked with student literacy achievement (Connor et al., 2007; Connor, Piasta et al., 2009).

Considering only the teachers in the immediate treatment condition, we examined whether teachers' time spent using A2i (min) predicted students' spring L2M scores (before the delayed treatment teachers used A2i), controlling for fall L2M scores (see Table 12). We found that the more teachers used A2i over the school year, the greater were their students' word reading skill gains. For every 100 extra minutes teachers spent using A2i, their students' scores generally increased by 0.1 GE, or about one month. Importantly, this effect was greater for students with less developed fall scores. Teachers' time spent using A2i had the same effect regardless of whether they were teaching an EL class or a general education class, and the fall L2M by EL classroom interaction effect was not significantly different from zero. However, we did not see the same association with two other A2i assessments, WMG and R2C. Furthermore, when these models were run using LW and GM scores, the associations between A2i use and student outcomes were not statistically significant. As such, we do not include detailed tables of model estimates for these outcomes, but they are available by request to the corresponding author. It is also important to note that because teachers' use of A2i is likely confounded with other factors, the strength of causal inference is considerably weaker than that for the ITT effects.

Table 12.

Predicting effects of A2i use (minutes) on students’ spring L2M outcomes for Immediate Treatment group only (top) and considering EL classrooms (bottom).

HLM Results
Effect Estimate SE 95% CI LL 95% CI UL p
Fixed Effects
 Intercept 1.891 0.140 1.616 2.166 <0.001
 A2i Use 0.004 0.002 0.001 0.008 0.023
 Fall L2M 0.275 0.041 0.194 0.355 <0.001
 A2i Use × Fall L2M −0.001 0.000 −0.002 −0.001 <0.001
Random Effects Std. Dev. Variance Component χ2 p
 Classroom 0.536 0.287 207.262 <0.001
 Student 0.563 0.317
HLM Results Including EL
Effect Estimate SE 95% CI LL 95% CI UL p
Fixed Effects
 Intercept 1.856 0.207 1.450 2.262 <0.001
 A2i Use 0.005 0.002 0.000 0.009 0.055
 EL Class 0.150 0.422 −0.677 0.976 0.728
 Fall L2M 0.288 0.050 0.189 0.387 <0.001
 A2i Use × Fall L2M −0.001 0.000 −0.002 −0.001 <0.001
 EL Class × Fall L2M −0.105 0.100 −0.300 0.090 0.292
Random Effects Std. Dev. Variance Component χ2 p
 Classroom 0.694 0.482 331.145 <0.001
 Student 0.563 0.317

Note. Grade equivalent scores were used for L2M. A2i immediate treatment = 1; Delayed Treatment = 0. Grade coded K = 0, 1st grade = 1, 2nd grade = 2 and 3rd grade = 3. Model Deviance (top) = 443.122457, Model Deviance (bottom) = 450.328206.

RQ 3: Examining Teacher Uptake of Professional Development and A2i Use

We next examined teachers' uptake of our PD protocol using the researcher-developed rubric. We found that, on average, teachers in the immediate treatment group achieved scores of 30.94 (out of 40), which is significantly greater than the scores of teachers in the delayed treatment (control) condition, who received scores of 22.76 on average. In the immediate treatment condition, teachers in kindergarten and second grade participated in PD more than did teachers in the other grades (kindergarten M = 32.5, SD = 4.1; first grade M = 26.0, SD = 6.6; second grade M = 37.0, SD = 3.6; third grade M = 28.25, SD = 6.9). There was no significant mean difference in teachers' uptake of our PD between EL classrooms and general education classrooms (EL classroom M = 28.00, SD = 6.38; general education M = 31.92, SD = 7.09).

Overall, the teachers in the immediate treatment condition (n = 16) used A2i for an average of 161.30 minutes (SD = 84.62), ranging from a low of 64 minutes to a high of 425 minutes. We did not compare this with the teachers in the delayed treatment condition because they only had access to A2i from April through June. There was no significant difference in the amount of time teachers spent using A2i between EL classrooms and general education classrooms: general education classroom teachers used A2i for a mean of 171.3 minutes (95% CI [118.28, 224.30]), whereas EL classroom teachers used A2i for a mean of 131.33 minutes (95% CI [39.51, 223.14]). Finally, we found a significant correlation between PD uptake and use of A2i in minutes (r = .526, p = .037). That is, the more teachers participated in PD, the more time they tended to spend using A2i, or, equivalently, teachers who used A2i more tended to participate more in PD.
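The reported p value for the correlation between PD uptake and A2i use can be checked with the standard t test for a Pearson correlation, t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The sketch below assumes n = 16 (the immediate treatment teachers) and integrates the Student t density numerically with Simpson's rule so that it needs only the Python standard library; in practice a statistics package would normally be used instead.

```python
import math


def t_pdf(x, df):
    """Student t density, written with the gamma function."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p: integrate the density from 0 to |t| (Simpson's rule, even steps)."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights alternate 4, 2, 4, ...
        total += weight * t_pdf(a + i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 2 * (0.5 - area)


def correlation_test(r, n):
    """t statistic and two-tailed p for H0: rho = 0."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, t_two_tailed_p(t, df)


t_stat, p_value = correlation_test(0.526, 16)
print(f"t(14) = {t_stat:.3f}, p = {p_value:.3f}")
```

With r = .526 and n = 16 this gives t(14) of about 2.31 and a two-tailed p of about .037, consistent with the value reported above.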

Discussion

The purpose of this study was to describe the process of bringing A2i to scale for effective implementation in classrooms serving a linguistically diverse group of learners. We investigated the effects of teachers' use of the revised A2i technology, with PD support, on the literacy outcomes of English monolingual students and ELs. We contextualized our process of reaching effective scalability by grounding the study in the EPIS model, a conceptual framework from the field of implementation science. We present our findings in one paper in order to illustrate a path to effective classroom change, spanning from redesigning to implementing A2i. Our findings are important because they provide a theoretical framework, with specific practices and procedures, for researchers and practitioners to implement evidence-based interventions in classrooms. These data also provide initial evidence of the consequential validity of A2i: documenting teachers' use of A2i is what makes the tool scalable and leads to meaningful change when it is used as intended in classrooms. Overall, our newly designed A2i technology, including the new DFI algorithms, shows promise for use at scale with kindergarten and first grade monolingual students and ELs. We have five principal findings gleaned from the two phases of the study:

  1. Our aim was to develop computer-based adaptive assessments that teachers could administer easily and that were valid and reliable for linguistically diverse students. In general, results show that the integrated A2i online adaptive assessments were psychometrically strong, particularly WMG and L2M (see Appendix C). R2C had a limited range and was appropriate only for students with strong reading skills. This result suggests the need to develop more R2C items to assess students with varying reading abilities. Currently, only students in second or third grade are able to take the R2C assessment. Furthermore, we found that teachers required more support from the research team than anticipated to use the assessments independently, although this varied: some teachers were able to use the assessments independently, while others were not able to do so at all. One reason may have been the content of our PD session. We primarily focused on helping teachers read and interpret assessment results to plan individualized instruction for their students, with little focus on the logistics of administering the assessments. With our goal of sustainability in mind, we plan to include more PD centered on assessment administration (e.g., logging into A2i, navigating the assessments) to ensure that teachers have the foundational knowledge they need to move forward independently.

    In addition, some teachers questioned the validity of the newly developed assessments and their results, feeling that their students' scores were too low overall. Fortunately, the IRT results demonstrate that the integrated online assessments are valid and reliable and correlate significantly with the LW and GM assessments, which are widely used standardized measures of reading. Yet this example highlights the importance of the "preparation" stage in the EPIS model: making sure that the climate is ready for implementation includes fostering trust and "buy-in" from teachers. If teachers question the validity of an intervention, for example, they will likely not believe that implementing the tool will benefit them or their students. The teachers' view that the assessments underestimated their students' abilities therefore points to the need to better prepare teachers to observe, understand, and interpret variability in their students' individual skill development, such as the stronger word decoding skills relative to vocabulary skills that we observed in our sample.

  2. Overall, we observed mixed results for the quasi-experimental intent-to-treat analyses. Intent-to-treat effects were found only for kindergarten and first grade students: a significant effect on the standardized LW measure (d = .37, p = .004) and a marginal effect on L2M in kindergarten (d = .15, p = .09). There was no main treatment effect for second and third graders on the standardized GM; however, there is evidence of interactions between the A2i intervention and both EL status and baseline literacy scores. Students in EL classrooms and those with less developed literacy skills in second and third grade experienced larger gains when their teachers used A2i.

    We present two possible interpretations of the differential findings we observed in kindergarten and first grade compared with second and third grade. The first possibility is that an "active ingredient" of A2i implementation may be appropriately timed, individualized code-focused instruction. Because the development of code-focused skills is critical during kindergarten and first grade, better alignment between students' instructional needs and the instruction they actually receive may lead to better overall word reading outcomes. Intervention that primarily affects code-focused skills may be less effective for students in later elementary grades who are starting to make the transition from learning to read to reading to learn (Chall, 1996; Wanzek et al., 2010), that is, students who may be nearing mastery of code-focused skills. This interpretation is supported by previous studies that have documented the effects of A2i on code-focused skills (e.g., Al Otaiba et al., 2011; Connor et al., 2007; Connor, Morrison, Schatschneider, et al., 2011; Connor, Morrison, Fishman, et al., 2011; Connor et al., 2013) as well as studies that have documented A2i treatment effects on reading comprehension outcomes in third grade (Connor, Morrison, Fishman, et al., 2011). Furthermore, in a longitudinal efficacy study evaluating A2i, Connor and colleagues (2013) found that effects may be cumulative: unless second and third graders had been in A2i classrooms beginning in first grade, their performance was not significantly different from that of students in the control condition (whose teachers did not use A2i). Thus, as previous studies have indicated, a clear recommendation from an implementation standpoint is to introduce A2i starting in kindergarten and first grade and then follow students into second and third grade in a gradual rollout. Such a gradual rollout could also support sustainment, the final stage of the EPIS model (Aarons et al., 2011).

    A second possible explanation for the differential findings by grade lies in the outcome measure for determining intent-to-treat effects for students in the lower vs. upper grades. In kindergarten and first grades, the intent-to-treat effect was based on students’ word reading, whereas in second and third grade, a standardized measure of reading comprehension was used. Word reading is a relatively more constrained skill set that is more malleable with effective instruction, at least in the short-term, than reading comprehension (e.g., Paris, 2005; Snow & Matthews, 2016). Word reading is also easier to measure relative to reading comprehension, as the scope and sequence of development are more clearly defined (Snow & Matthews, 2016) and there is less susceptibility to bias since word reading is significantly less dependent on factors like background knowledge or inference making skills (e.g., Kim, 2017, 2020). In contrast, reading comprehension is a notoriously difficult construct to change and to measure. Even the most rigorous of studies, such as those conducted through the Reading for Understanding initiative, found it difficult to “move the needle” on reading comprehension (Pearson et al., 2020). Nevertheless, prior research on A2i did find significant impacts on reading comprehension (Connor et al., 2007; Connor, Morrison, Fishman, et al., 2011). Considering the small sample of the present study, its quasi-experimental design, and the relatively short duration of teachers’ implementation of A2i, it is not surprising that we did not observe a significant treatment effect with reading comprehension as the outcome variable.

    We have certainly considered what modifications would need to be made to A2i in order to bring about a strong treatment effect when comprehension is the outcome variable. We concede that quantity (i.e., recommendations in minutes of time spent in a given instructional activity) is only one element of instructional quality. It is likely that, in order to make significant changes in children’s comprehension abilities, a coherent, knowledge-rich curriculum that builds knowledge within and across grade levels will be necessary (Hirsch, 2006; Kamhi & Catts, 2017; Willingham, 2006). In addition, we face the same barriers as other researchers in this area in finding ways to properly assess reading comprehension by either a) aligning reading comprehension assessment closely to actual content being taught or b) finding ways to decouple background knowledge from reading comprehension performance (to the extent possible) such as with a more authentic assessment like the GISA (O’Reilly, Sabatini, & Deane, 2013). While there is clearly more work to be done to improve reading comprehension, we found it promising that students with less developed reading abilities benefitted from participating in classrooms where A2i was being used (the A2i treatment by students’ baseline literacy skills interaction effect as measured by GM performance). Nonetheless, these findings highlight the need for a continued focus on improving effective instructional practices for promoting growth of meaning-focused skills, which might be particularly important for ELs since limited L2 oral language proficiency may interfere with successful L2 reading comprehension (e.g., Lesaux, 2006; Mancilla-Martinez & Lesaux, 2010; Nakamoto et al., 2007).

  3. We found promising effects for ELs. Our findings provide convincing evidence that A2i usage leads to improved word reading outcomes for ELs, with greater effects for students in EL classrooms than for students in general education classrooms. This finding is supported in the research literature. Studies have found that although ELs often enter school with less developed literacy skills (Hammer et al., 2011), they are able to perform on par with their monolingual peers on word-reading accuracy after as little as one year of formal instruction (see Lesaux & Geva, 2006, for a review). In addition, we documented a marginally significant EL by A2i treatment interaction effect on students' reading comprehension outcomes. This interaction effect suggests that meaning-focused, individualized instruction may also lead to improved reading comprehension outcomes. This finding supports a central theme in the literature on effective literacy instruction for ELs, namely, that instruction focusing on the development of oral language skills is integral to successful reading comprehension (August & Shanahan, 2006; Castro et al., 2011). Hence, these findings provide preliminary evidence suggesting that both code- and meaning-focused instruction, when aligned with ELs' individualized needs, may lead to improved literacy skills, and that A2i can leverage knowledge of ELs' baseline skills to support better individualized instruction. This discussion must be tempered with the caution that there was a single EL classroom per grade level, so differential effects would have to be quite large to be detected within this study design. Taken together, the revised DFI algorithms used in A2i appear to be working as expected in kindergarten and first grade classrooms. Hence, using A2i technology to individualize student instruction shows promise of efficacy for both English monolingual students and ELs.

  4. Analyses of the relationship between students' post-intervention reading scores and teachers' time spent with A2i technology revealed that the more teachers used A2i (min), the greater were their kindergarten through third grade students' reading gains on one A2i assessment of word reading (L2M). This relationship was stronger for students with less developed reading skills in the fall. Teachers in EL classrooms generally used A2i to the same extent as teachers in general education classrooms. Moreover, this effect of A2i use within the A2i immediate treatment condition was consistent for students in EL classrooms and in general education classrooms. However, these findings were not replicated when measures of vocabulary or reading comprehension were used as the outcome measure. These results suggest that A2i technology can be used to improve word reading in classrooms that serve students from diverse linguistic backgrounds with varying levels of English proficiency, and that the revised DFI algorithms are working as anticipated. The data also suggest that teachers in both general education and EL classrooms are able to better support students' development of code-focused skills by individualizing instruction using A2i.

  5. Overall, teachers’ uptake of our PD varied by grade level, with kindergarten and second grade teachers more likely to participate in the A2i PD protocol than first and third grade teachers. We cannot identify a theoretical reason, grounded in the nature of teaching at these grades, that would explain this grade-level effect other than individual variation. Practitioner-level variables from the EPIS model, such as openness to change, the conviction that change is needed for goals to be met, and differing perceptions of the risk of change, could explain these grade-level differences (Aarons et al., 2011). Uptake did not vary between EL classrooms and general education classrooms. The more teachers, including teachers of EL classrooms, participated in the A2i PD, the more time they were likely to spend using A2i. Furthermore, as noted above, the more time teachers used A2i, the greater were their students’ word reading skill gains. However, we did not find a similar relationship for other measures of reading (e.g., vocabulary, reading comprehension).
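Point 4 above describes a usage effect on word reading that was stronger for students with weaker fall reading skills. A minimal way to express that pattern is an interaction (moderation) model. The specification below is an illustrative sketch, not the authors' reported model; the variable names, the two-level structure (students i in classrooms j), and the random-intercept term u_j are assumptions.

```latex
\text{Post}_{ij} = \beta_0 + \beta_1\,\text{Fall}_{ij} + \beta_2\,\text{Min}_j + \beta_3\,(\text{Min}_j \times \text{Fall}_{ij}) + u_j + e_{ij}

\frac{\partial\,\text{Post}_{ij}}{\partial\,\text{Min}_j} = \beta_2 + \beta_3\,\text{Fall}_{ij}
```

The slope of the post-test on usage minutes depends on the fall score. If Fall is centered at its mean, β2 is the usage slope for a student with average fall skills, and a negative β3 means that additional minutes of A2i use are associated with larger gains for students who start lower. Because minutes of use were not randomly assigned, such slopes describe associations rather than causal effects.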

In scaling up the PD, we attempted to move resources online so they were easily accessible through A2i. However, the PD was still too expensive to be fully scalable. Cost analyses suggested that with the current PD protocol, the entire implementation cost is about $150 per student (including PD, technical support, administrative support, etc.), with PD as the primary driver of implementation costs. A more scalable version would have total implementation costs closer to $50 per student. It may be that moving more of the PD online and replacing most face-to-face interactions with video conferencing would reduce costs while maintaining efficacy. In light of the COVID-19 pandemic, the accessibility of effective web-based interventions for individualizing student learning, such as A2i, is increasingly important. While our initial concerns about scalability revolved around pricing, schools now face the added challenge of distance learning, often mediated by technology. This calls for web-based approaches such as A2i to support classroom learning for all students, including linguistically diverse students.

Limitations

There are limitations to this study that should be considered when interpreting the results. We conducted a quasi-experiment in which two schools were randomly assigned to immediate or delayed treatment conditions. However, in the analyses, treatment variables were entered at the classroom level rather than at the school level, and the numbers of classrooms and students in this study were small with regard to power for subgroup and moderation analyses. A fully powered randomized controlled trial was beyond the scope of the project; however, the groups were equivalent at baseline, which is a strength. To examine the efficacy of A2i and its impact on diverse student populations, a fully powered randomized controlled trial is needed. Next, we intentionally recruited higher poverty schools that served a higher proportion of Hispanic/Latinx students, with approximately 25% of students in English immersion classrooms based on their reported limited English proficiency. Unfortunately, the schools would not allow us to assess students’ language and reading skills in Spanish, so we relied on the schools’ assessment of English proficiency. There were certainly dual language learners (i.e., students from non-English speaking homes) in the general education classrooms, but we were not able to identify them. Thus, we had to rely on school reports of students’ EL status as a classroom-level variable (i.e., EL classroom). Additionally, it is not clear that these findings would generalize to other school settings with different student demographics and varying levels of teachers’ openness to innovation, although studies using the research version of A2i suggest that A2i and individualizing student instruction are effective across a range of school settings (e.g., Connor et al., 2007, 2011, 2013). Furthermore, we acknowledge that the measures we used to assess teachers’ uptake of PD and A2i usage could be improved.

To fully understand how fidelity of implementation affects student outcomes, more information about how teachers use A2i’s recommendations in the classroom is needed. Simply knowing the amount of time teachers spent using A2i provides only a measure of surface fidelity, and more sophisticated analyses (e.g., mediational or instrumental variable analyses) would be needed to fully understand this relationship. As a future direction, we plan to design and cross-validate measures of teachers’ uptake that would more carefully examine teachers’ behaviors in relation to their students’ outcomes. Finally, we acknowledge that the adjustments made to the A2i algorithms may not reflect the full set of language and literacy needs ELs bring to the classroom; rather, they were developed for use with both monolingual students and ELs. As mentioned above, we were limited by outer context factors, namely Arizona state laws on monolingual instruction and assessment. As an area for future research, we plan to develop a partner set of validated assessments in students’ first language (Spanish in this case) that would make recommendations in light of students’ first and second language and literacy abilities. Beyond this, it is our goal that the algorithms will provide recommendations for both English and Spanish instruction (i.e., dual language instruction) with an eye toward supporting biliterate readers.

Other Lessons Learned and Scaling Up

Principal buy-in, the extent to which principals supported and enforced the school-wide implementation of A2i for individualizing instruction, was found to be instrumental in ensuring teachers’ use of A2i. This lesson is consistent with the EPIS framework’s emphasis on stakeholders who act as inner-context factors to promote and lead change (Aarons et al., 2011). Grade level teams engaged more with the technology when at least one teacher on the team advocated for the use of A2i. Thus, we strongly recommend that, for scale-up, implementation focus on the entire system: district, school, and classroom. This might include memoranda of understanding with the district and identifying literacy champions at the school to work closely with teachers and literacy coaches. Implementation science suggests that such a strategy should be effective (Fixsen et al., 2013).

A critical finding of this study was that in kindergarten and first grade, A2i showed promise for improving students’ word reading skills in EL classrooms and was similarly promising for students in general education classrooms. Moreover, there was no significant difference in kindergarten and first grade outcomes between EL and monolingual students, which is highly encouraging. According to the Census, ELs now make up 25% of elementary students (Bauman, 2017). Thus, studies that identify potentially effective, scalable interventions must logically include analyses for linguistically diverse students. Given the proportion of ELs in classrooms and their unique needs, educational programs that do not consider ELs are less likely to be successfully implemented at scale.

There is ongoing debate about the “Science of Reading” and how to support teachers’ use of evidence-based practices (e.g., Castles, Rastle, & Nation, 2018; Solari et al., 2020). The “reading wars” were the original inspiration for A2i and, regrettably, the battle has been reinvigorated. Given A2i’s long track record of efficacy (e.g., Connor et al., 2004, 2007) and, now, initial implementation research, A2i is positioned to answer this re-emerging challenge of supporting teachers in providing effective literacy instruction. One concern about the science of reading movement is that there is no clear guidance for teachers about exactly what the science of reading looks like in their classrooms. Although data-driven, individualized instruction is associated with substantial literacy gains (e.g., Al Otaiba et al., 2011; Fuchs, Fuchs, & Phillips, 1994), teachers find it difficult to implement effectively (Roehrig et al., 2008). A2i technology facilitates individualized instruction and supports the delivery of more efficacious and efficient instruction (e.g., Connor, 2009; Connor et al., 2011). A compelling reason to get effective interventions such as A2i off researchers’ computers and into classrooms is to provide teachers with effective tools that operationalize the science of reading in ways that ensure students achieve proficient reading skills.

This study describes the process of bringing A2i technology to scale using the EPIS implementation model. We outline the lessons we learned, providing a framework for future research and practice. With funding from the U.S. Department of Education’s Education Innovation and Research (EIR) program, we are currently using these data to plan and conduct a large-scale study to bring A2i to scale nationally at a reasonable cost per student. This means that A2i could move from being a pure research tool to a professional support system that can be used in many schools that differ substantially in location (e.g., New Jersey, New York, Pennsylvania, California) and student populations (although the focus of the EIR project is on working with schools that serve children in need). In the EIR project, we added an out-of-school component so that individualized student learning experiences could continue in students’ homes and communities (learningovations.com; readcharlotte.org). This focus has become more critical during the COVID-19 pandemic, as much of students’ instruction and learning now happens in out-of-school contexts and online. The results of the scalability study described throughout this paper directly inform our current work in the national EIR project. We intend these studies, together, to provide an example of the EPIS model in school settings for other researchers and practitioners as they work to bring their effective programs to scale.

Educational Impact and Implications Statement.

In this study, we outline the process of bringing Assessment-to-Instruction (A2i) technology to scale within kindergarten through third grade classrooms serving linguistically diverse learners. We carried out this research in two interactive phases. In Phase 1, we worked closely with our school partners to guide the revision of A2i technology for use at scale. In Phase 2, we conducted a quasi-experiment on the literacy outcomes of learners whose teachers used A2i. Overall, our newly designed A2i technology shows promise for use at scale with kindergarten and first grade monolingual students and English Learners. Limitations, implications, and future directions are discussed.

Acknowledgements

We thank the entire ISI lab team for their contributions to this study with a special thank you to our school partner administrators and teachers, and to the parents and students who participated. Funding to support this project included the US Department of Education, Institute of Education Sciences, Grants # R305A160404, R305N160050, and R305A210077 as well as NICHD # P50 HD052120. Dr. Carol M. Connor has an equity interest in Learning Ovations, a company that may potentially benefit from the research results. The terms of this arrangement have been reviewed and approved by the University of California, Irvine in accordance with its conflict of interest policies. Dr. Sarah Siegal, who was a doctoral student at the time this study was conducted, now works for Learning Ovations, Inc. All results and reporting have been carefully reviewed by authors who do not have a conflict of interest.

Appendix A

Screenshots of the A2i Technology

Figure A.1.

Classroom view. Children’s names have been whited out to preserve confidentiality. Each line represents the individual recommendations for one student.

Figure A.2.

Training item from Word Match Game (WMG). Student hears, “click on the two words that go together.” Each word is highlighted as it is said.

Figure A.3.

Item from Letters-2-Meaning (L2M). Student hears, “click on the word ‘hour.’”

Figure A.4.

Training item from Reading2Comprehension (R2C). Students are asked to read passages and choose the best word to fill in the blanks. The instructions and passages are read aloud to them.

Figure A.5.

Progress graph for an individual student (not a real name). The blue line represents the target for achievement and the black line shows the student’s actual progress.

Figure A.6.

Classroom graphs. Student names are pseudonyms to preserve confidentiality. Each set of bars represents achievement over time for one student.

Figure A.7.

Lesson plan page.

Appendix B. Rubric of Teacher Uptake of Professional Development

Item Score
Teacher response and participation in communities of practice (COP) meetings (1 = poor; 5 = strong)
Teacher attendance in COP (1 = missed > 2 sessions; 5 = attended all sessions)
Teacher response and participation in in-classroom PD (1 = poor; 5 = strong)
Teacher attendance in in-class PD (1 = not willing to schedule; 3 = scheduled but ignored Research Partner; 5 = scheduled and used feedback)
Teacher comfort with technology (1 = not at all comfortable; 5 = very comfortable)
Teacher feedback on user interface (1 = not useful; 5 = very useful)
Teacher willingness to learn how to use A2i (1 = not willing; 5 = very willing)
Teacher willingness to meet with Research Partner on a one-to-one basis (1 = not willing to schedule; 3 = scheduled but ignored feedback; 5 = scheduled and used feedback)
Total
Comment

Appendix C

Scaling Results for the A2i Letters to Meaning (L2M) Assessment

A total of 2,807 test administrations were used in the scaling analysis for L2M. Given that the L2M was a computer adaptive assessment, the number of items administered to each student varied. Nearly all test administrations included more than 10 items, and the majority of students responded to 20 or 25 items from the L2M item pool.
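The report does not describe how items were selected during adaptive administration. As a minimal sketch under stated assumptions, the Python below shows one common approach for a Rasch-scaled pool: give the unused item whose difficulty is closest to the current ability estimate (which maximizes Rasch item information), update an expected a posteriori (EAP) estimate on a grid with a standard-normal prior, and stop once the posterior SD falls below a target or a maximum test length is reached. The function names, the 0.4 SE target, the 4- and 25-item limits, and the simulated pool and student are all hypothetical, not A2i's actual settings.

```python
import math
import random

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def eap_estimate(responses, step=0.1, bound=6.0):
    """EAP ability estimate and posterior SD, assuming a standard-normal prior.

    responses: list of (difficulty, score) pairs with score 0 or 1.
    """
    n_points = int(round(2 * bound / step)) + 1
    grid = [-bound + step * i for i in range(n_points)]
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior
        for b, score in responses:
            p = rasch_prob(t, b)
            w *= p if score == 1 else 1.0 - p  # multiply in the likelihood
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def run_cat(pool, answer, min_items=4, max_items=25, se_target=0.4):
    """Give items adaptively until the posterior SD drops below se_target."""
    remaining = list(pool)
    responses = []
    theta, se = 0.0, 1.0
    while remaining and len(responses) < max_items:
        # Rasch item information p(1 - p) peaks where b equals theta,
        # so the most informative item is the one closest in difficulty.
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        responses.append((b, answer(b)))
        theta, se = eap_estimate(responses)
        if len(responses) >= min_items and se < se_target:
            break
    return theta, se, len(responses)

# Usage with a hypothetical 200-item pool and a simulated student.
rng = random.Random(2024)
pool = [rng.uniform(-4.0, 4.0) for _ in range(200)]
TRUE_THETA = 1.2

def simulated_student(b):
    return 1 if rng.random() < rasch_prob(TRUE_THETA, b) else 0

theta_hat, se, n_items = run_cat(pool, simulated_student)
print(f"theta estimate {theta_hat:.2f} (SE {se:.2f}) after {n_items} items")
```

Because the stopping rule depends on how quickly each student's estimate stabilizes, a design like this naturally produces administrations of varying length, as reported for L2M.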

Dimensionality

A scree plot suggests that the L2M assessment is not purely unidimensional. While the first factor is large, there is potential for several subscales. Subsequent analyses were therefore conducted for the overall L2M score, for two subscales (i.e., Decoding and Comprehension), and separately for all six subtests within the L2M (i.e., Letter Identification [LID], Sound Identification [SID], Word Recognition [WR], Letters to Words [L2W], Words to Sentences [W2S], and Sentences to Paragraphs [S2P]).
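A scree plot of this kind shows the eigenvalues of the inter-item correlation matrix in descending order. The sketch below assumes simple Pearson (phi) correlations on 0/1 item scores and uses simulated unidimensional Rasch data rather than study data. It computes the eigenvalues with the cyclic Jacobi method in plain Python and reports the first-to-second eigenvalue ratio that is used informally to judge unidimensionality. The authors may have used a different correlation type (e.g., tetrachoric) or software; the function names and simulation settings are illustrative.

```python
import math
import random

def correlation_matrix(data):
    """Pearson (phi) correlations between the columns of a 0/1 response matrix."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / n) for j in range(k)]
    corr = [[1.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a + 1, k):
            cov = sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / n
            corr[a][b] = corr[b][a] = cov / (sds[a] * sds[b])
    return corr

def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    k = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
        if off < tol:
            break
        for p in range(k - 1):
            for q in range(p + 1, k):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that the (p, q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for r in range(k):  # A <- A J (columns p and q)
                    arp, arq = a[r][p], a[r][q]
                    a[r][p], a[r][q] = c * arp - s * arq, s * arp + c * arq
                for r in range(k):  # A <- J^T A (rows p and q)
                    apr, aqr = a[p][r], a[q][r]
                    a[p][r], a[q][r] = c * apr - s * aqr, s * apr + c * aqr
    return sorted((a[i][i] for i in range(k)), reverse=True)

# Simulated (not study) data: 1,000 students, 12 Rasch items, one latent ability.
rng = random.Random(7)
difficulties = [-1.5 + 0.25 * i for i in range(12)]
data = []
for _ in range(1000):
    ability = rng.gauss(0.0, 1.5)
    data.append([1 if rng.random() < 1.0 / (1.0 + math.exp(-(ability - b))) else 0
                 for b in difficulties])

eigenvalues = jacobi_eigenvalues(correlation_matrix(data))
print("scree:", [round(e, 2) for e in eigenvalues])
print("first/second ratio:", round(eigenvalues[0] / eigenvalues[1], 2))
```

Phi correlations among items of differing difficulty can understate the general factor, which is one reason tetrachoric correlations are often preferred for dichotomous items.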

Item Statistics

Of the 686 items in the L2M item pool, 505 items had more than 30 responses and were included in the Rasch analyses. The mean proportion correct across items was .53, and the median was .58. Item difficulty parameter estimates for the 505 items ranged from −6.5 to +9.3 on the Rasch theta scale, with a mean of −0.03, a median of −0.21, and a standard deviation of 2.8. Standard errors for the difficulty estimates ranged from 0.12 to 1.79, with a mean of 0.36 and a median of 0.32. Full details of item statistics for all L2M items are included in Table A1.
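For readers less familiar with the Rasch theta scale, the dichotomous Rasch model relates student p's ability θ_p to item i's difficulty b_i as follows. This is the standard model statement; the notation is ours.

```latex
P(X_{pi} = 1 \mid \theta_p, b_i) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)}
```

A student whose θ equals an item's difficulty has a .50 chance of answering it correctly, and a lower b means an easier item. For example, a student at θ = 0 would answer an item at the median L2M difficulty of −0.21 correctly with probability exp(0.21)/(1 + exp(0.21)) ≈ .55.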

Goodness of Fit

Of the 505 items included in the Rasch scaling, 30 items had more than 200 responses, allowing calculation of an item-fit chi-square statistic. Of these 30 items, only 2 had significant goodness-of-fit statistics (p < .05). Both of these items had more than 350 responses; because the chi-square statistic grows with sample size, even a small deviation from the expected values can reach significance with this many responses. Inspection of item characteristic curves relative to observed proportions correct confirmed reasonably good fit to the Rasch model despite the significant goodness-of-fit statistics for these two items.
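The sample-size sensitivity noted above can be illustrated with a Pearson item-fit chi-square that compares observed and Rasch-expected proportions correct across ability groups. In this sketch, the ten ability-group midpoints, the equal group sizes, the item difficulty of 0, and the slightly steeper observed curve (slope 1.3) are hypothetical. Treating the statistic as having 9 degrees of freedom (10 groups minus one estimated difficulty) is also an assumption; the authors' exact fit statistic is not specified.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def item_fit_chi_square(group_thetas, group_sizes, observed, b):
    """Pearson item-fit chi-square comparing observed and Rasch-expected proportions correct."""
    chi2 = 0.0
    for theta, n, obs in zip(group_thetas, group_sizes, observed):
        expected = logistic(theta - b)  # Rasch-expected proportion correct
        chi2 += n * (obs - expected) ** 2 / (expected * (1.0 - expected))
    return chi2

# Hypothetical ability-group midpoints for an item of difficulty b = 0.
thetas = [-2.25 + 0.5 * g for g in range(10)]
# Hypothetical small misfit: observed proportions follow a slightly steeper curve.
observed = [logistic(1.3 * t) for t in thetas]

CRITICAL_05_DF9 = 16.919  # tabled chi-square critical value, alpha = .05, df = 9

for total in (200, 2000):
    sizes = [total // 10] * 10
    chi2 = item_fit_chi_square(thetas, sizes, observed, b=0.0)
    verdict = "significant" if chi2 > CRITICAL_05_DF9 else "not significant"
    print(f"N = {total}: chi-square = {chi2:.1f} ({verdict})")
```

Holding the misfit pattern fixed, multiplying the number of responses by 10 multiplies the chi-square by 10. A statistic that is not significant at N = 200 becomes significant at N = 2,000, even though the largest gap between observed and expected proportions is only about .06.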

Test Information

Overall test information for the complete pool of 505 Rasch-scaled L2M items was excellent, with a bell-shaped information function and Total Information greater than 2.0 throughout the range of Rasch theta scores from −5.0 to +5.0, suggesting that computer adaptive administration of L2M will produce reliable individual scores throughout the full range of student abilities.
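Under the Rasch model, an item's information at ability θ is p(1 − p), the test information I(θ) is the sum across items, and the conditional standard error of measurement is 1/√I(θ). Total Information above 2.0 therefore corresponds to a standard error below about 0.71 logits. The sketch below computes this for two hypothetical pools: a broad pool of 201 evenly spaced items and a small pool of 10 items all at difficulty 2.0. The pools and function names are illustrative; the actual item difficulties are reported in Tables A1 to A3.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def information_at(theta, difficulties):
    """Rasch test information at theta: the sum of item informations p(1 - p)."""
    total = 0.0
    for b in difficulties:
        p = rasch_prob(theta, b)
        total += p * (1.0 - p)
    return total

def standard_error(theta, difficulties):
    """Conditional standard error of measurement, 1 / sqrt(information)."""
    return 1.0 / math.sqrt(information_at(theta, difficulties))

def covered_range(difficulties, threshold=2.0, lo=-5.0, hi=5.0, step=0.1):
    """Lowest and highest grid theta at which information exceeds the threshold."""
    n_points = int(round((hi - lo) / step)) + 1
    grid = [lo + step * i for i in range(n_points)]
    covered = [t for t in grid if information_at(t, difficulties) > threshold]
    if not covered:
        return None
    return (round(min(covered), 2), round(max(covered), 2))

# Hypothetical pools: a broad 201-item pool and a small 10-item pool set at b = 2.0.
broad_pool = [-6.0 + 0.06 * i for i in range(201)]
narrow_pool = [2.0] * 10

print("SE when information = 2.0:", round(1 / math.sqrt(2.0), 3))
print("broad pool covers", covered_range(broad_pool))
print("narrow pool covers", covered_range(narrow_pool))
```

The broad pool keeps information above 2.0 across the whole −5 to +5 range, whereas the small pool clears the threshold only between about θ = 1.1 and θ = 2.9. This mirrors the kind of restricted range reported for R2C below, where adding items is the usual remedy.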

Scaling Results for the A2i Reading to Comprehension (R2C) Assessment

A total of 1,585 test administrations were used in the scaling analysis for R2C. Given that the R2C was a computer adaptive assessment, the number of items administered to each student varied. Just over half of the test administrations (51%) included 4 items, 32% included 5 to 9 items, and 15% included all 10 items.

Dimensionality

A scree plot suggests that the R2C assessment is unidimensional. The eigenvalue for the first factor is more than twice as large as that for the second factor, and the next 8 eigenvalues diminish gradually toward zero. This pattern suggests a strong general factor and unidimensionality.

Item Statistics

All 10 items in the R2C item pool had more than 30 responses and were included in the Rasch analyses. The mean proportion correct across items was .37, and the median was .32. Item difficulty parameter estimates for the 10 items ranged from −1.5 to +2.3 on the Rasch theta scale, with a mean of +1.28, a median of +1.53, and a standard deviation of 1.03. Standard errors for the difficulty estimates ranged from 0.07 to 0.16, with a mean of 0.11 and a median of 0.10. Full details of item statistics for all R2C items are included in Table A2.

Goodness of Fit

All 10 items included in the Rasch scaling had more than 200 responses, allowing calculation of an item fit Chi-Square statistic. None had significant goodness of fit statistics (p<.05).

Test Information

Overall test information for the complete pool of 10 Rasch-scaled R2C items was modest, with a bell-shaped information function and Total Information greater than 2.0 for Rasch theta scores in the range +1.0 to +3.0, suggesting that computer adaptive administration of R2C will produce reliable individual scores only in the upper range of student abilities and that reliability of R2C scores at the lower end would be improved if additional items were added to the R2C item pool.

Overall, second and third graders achieved mean scores of 1.32 and 1.47, respectively, with intraclass correlations (ICCs) of .17 for students and .19 for teachers. These values are broadly consistent with students’ scores on the GM.
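The ICCs above are reported without a stated definition. One common definition, assuming a three-level model in which assessment occasions are nested within students and students within teachers, expresses each ICC as a share of the total variance. The variance components τ_T (teachers), τ_S (students), and σ² (occasions within students) are assumed notation, not quantities reported here.

```latex
\rho_{\text{teacher}} = \frac{\tau_T}{\tau_T + \tau_S + \sigma^2},
\qquad
\rho_{\text{student}} = \frac{\tau_S}{\tau_T + \tau_S + \sigma^2}
```

Under that reading, ρ_teacher = .19 and ρ_student = .17 would leave about 1 − .19 − .17 = .64 of the score variance within students across occasions.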

Scaling Results for the A2i Word Match Game (WMG) Assessment

A total of 2,613 test administrations were used in the scaling analysis for WMG. Given that the WMG was a computer adaptive assessment, the number of items administered to each student varied. More than half of the test administrations (57%) included 7 items, 36% included 8 to 29 items, and 6% included 30 or more items.

Dimensionality

A scree plot suggests that the WMG assessment may be unidimensional. The eigenvalue for the first factor is approximately 1.5 times as large as that for the second factor, and the next 18 eigenvalues diminish gradually toward zero. Given the large item pool of 209 items and the small number of responses for some items, this pattern suggests a strong general factor and possible unidimensionality.

Item Statistics

All 209 items in the WMG item pool had more than 30 responses and were included in the Rasch analyses. The mean proportion correct across items was .61, and the median was .63. Item difficulty parameter estimates for the 209 items ranged from −3.3 to +3.5 on the Rasch theta scale, with a mean of −0.38, a median of −0.39, and a standard deviation of 1.03. Standard errors for the difficulty estimates ranged from 0.07 to 0.56, with a mean of 0.24 and a median of 0.24. Full details of item statistics for all WMG items are included in Table A3.

Goodness of Fit

Of the 209 items included in the Rasch scaling, 53 items had more than 200 responses, allowing calculation of an item fit Chi-Square statistic. Of these 53 items, none had significant goodness of fit statistics (p<.05).

Test Information

Overall test information for the complete pool of 209 Rasch-scaled WMG items was excellent, with a bell-shaped information function and Total Information greater than 2.0 throughout the range of Rasch theta scores from −5.0 to +5.0, suggesting that computer adaptive administration of WMG will produce reliable individual scores throughout the full range of student abilities.

Footnotes

1

It is important to note that analyses for research question 3 are exploratory, and do not provide support for causal inference about program impacts due to the likelihood of unmeasured confounds that correlate with both teachers’ use of A2i and student outcomes. As such, any significant findings here would suggest that teacher use of A2i is correlated with students’ reading gains, but we cannot rule out the possibility that the difference in reading gains is due to an unmeasured confound instead of the impact of A2i.

References

  1. Al Otaiba S, Connor CM, Folsom JS, Greulich L, Meadows J, & Li Z (2011). Assessment data-informed guidance to individualize kindergarten reading instruction: Findings from a cluster-randomized control field trial. Elementary School Journal, 111(4), 535–560. doi: 10.1086/659031 [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Albro ER (2020). How IES advances education research. Paper presented at the Department of Education, Institute of Education Sciences, Washington DC. https://www.aera.net/Newsroom/AERA-Highlights-E-newsletter/-em-AERA-Highlights-em-October-2017/Q-A-IESs-Liz-Albro-Discusses-How-IES-Advances-Education-Research [Google Scholar]
  3. Aarons GA, Hurlburt M, & Horwitz SM (2011). Advancing a conceptual model of evidence-based practice implementation in public service sectors. Administration and Policy in Mental Health and Mental Health Services Research, 38(1), 4–23. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. August D, Artzi L, & Barr C (2016) Helping ELLs meet standards in English language arts and science: An intervention focused on academic vocabulary. Reading and Writing Quarterly, 32(4), 373–396. [Google Scholar]
  5. August D, Artzi L, Barr C, & Francis D (2018). The moderating influence of instructional intensity and word type on the acquisition of academic vocabulary in young English language learners. Reading and Writing, 31(4), 965–989. [Google Scholar]
  6. August D, McCardle P, & Shanahan T (2014). Developing literacy in English language learners: Findings from a review of the experimental research. School Psychology Review, 43(4), 490–498. doi: 10.1080/02796015.2014.12087417 [DOI] [Google Scholar]
  7. August D, & Shanahan T (2010). Response to a review and update on developing literacy in second-language learners: Report of the national literacy panel on language minority children and youth. Journal of Literacy Research, 42(3), 341–348. doi: 10.1080/1086296X.2010.503745 [DOI] [Google Scholar]
  8. Baker DL, Basaraba DL, & Polanco P (2016). Connecting the present to the past: Furthering the research on bilingual education and bilingualism. Review of Research in Education, 40(1), 821–883. [Google Scholar]
  9. Baker S, Lesaux N, Jayanthi M, Dimino J, Proctor CP, & Morris J, et al. (2014). Teaching academic content and literacy to English learners in elementary and middle school, NCEE 2014–4012. Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. [Google Scholar]
  10. Bauman K (2017). School enrollment of the Hispanic population: Two decades of growth. Retrieved from https://www.census.gov/newsroom/blogs/random-samplings/2017/08/school_enrollmentof.html
  11. Beck IL, Perfetti C, & McKeown MG (1982). Effects of long-term vocabulary instruction on lexical access and reading comprehension. Journal of Educational Psychology, 7, 506–521. [Google Scholar]
  12. Beck IL, McKeown MG, & Omanson R (1987). The effects and use of diverse vocabulary instructional techniques. In McKeown M & Curtis M (Eds.), The nature of vocabulary acquisition (pp. 147–163). Hillsdale, NJ: Lawrence Erlbaum. [Google Scholar]
  13. Bialystok E (2001). Bilingualism in development: Language, literacy, and cognition. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511605963 [DOI] [Google Scholar]
  14. Bos C, Mather N, Narr RF, & Babur N (1999). Interactive, collaborative professional development in early literacy instruction: Supporting the balancing act. Learning Disabilities Research and Practice, 14(4), 227–238. [Google Scholar]
  15. Calderon M, Hertz-Lazarowitz R, & Slavin RE (1998). Effects of Bilingual Cooperative Integrated Reading and Composition on students making the transition from Spanish to English reading. Elementary School Journal, 99, 153–165. Retrieved from http://www.csos.jhu.edu/crespar/techReports/report10.pdf [Google Scholar]
  16. Carlisle JF, Beeman M, Davis LH, & Spharim G (1999). Relationship of metalinguistic capabilities and reading achievement for children who are becoming bilingual. Applied Psycholinguistics, 20(4), 459–478. [Google Scholar]
  17. Castles A, Rastle K, & Nation K (2018). Ending the reading wars: Reading acquisition from novice to expert. Psychological Science in the Public Interest, 19(1), 5–51. [DOI] [PubMed] [Google Scholar]
  18. Castro DC, Páez MM, Dickinson DK, & Frede E (2011). Promoting language and literacy in young dual language learners: Research, practice, and policy. Child development perspectives, 5(1), 15–21. [Google Scholar]
  19. Chall JS (1996). Stages of reading development (2nd ed.). Orlando, FL: Harcourt Brace. [Google Scholar]
  20. Cheung ACK, & Slavin RE (2012). Effective reading programs for Spanish-dominant English language learners (ELLs) in the elementary grades: A synthesis of research. Review of Educational Research, 82(4), 351–395. doi: 10.3102/0034654312465472 [DOI] [Google Scholar]
  21. Cirino PT, Vaughn S, Linan-Thompson S, Cardenas-Hagan E, Fletcher JM, & Francis DJ (2009). One-year follow-up outcomes of Spanish and English interventions for English language learners at risk for reading problems. American Educational Research Journal, 46, 744–781. [Google Scholar]
  22. Collins BA (2014). Dual language development of Latino children: Effect of instructional program type and the home and school language environment. Early Childhood Research Quarterly, 29(3), 389–397. doi: 10.1016/j.ecresq.2014.04.009 [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Common Core State Standards Initiative. (2010). Common Core State Standards for English language arts & literacy in history/social studies, science, and technical subjects. Washington, DC: Authors. [Google Scholar]
  24. Connor CM (2011). Child by Instruction interactions: Language and literacy connections. In Neuman SB & Dickinson DK (Eds.), Handbook on early literacy (3rd ed., pp. 256–275). New York: Guilford. [Google Scholar]
  25. Connor CM (2016). A lattice model of the development of reading comprehension. Child Development Perspectives, 10(4), 269–274. doi: 10.1111/cdep.12200 [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Connor C, Barrus A, & Fellows J (2014). Making individualized literacy instruction available to all teachers: Adapting the assessment to instruction (A2i) software for multiple real-world contexts. In Society for Information Technology & Teacher Education International Conference (pp. 1220–1226). Association for the Advancement of Computing in Education (AACE). [Google Scholar]
  27. Connor CM, & Morrison FJ (2016). Individualizing student instruction in reading: Implications for policy and practice. Policy insights from the behavioral and brain sciences, 3(1), 54–61. doi: 10.1177/2372732215624931 [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Connor CM, Day SL, Phillips B, Sparapani N, Ingebrand SW, McLean L, … Kaschak MP (2016). Reciprocal effects of self-regulation, semantic knowledge, and reading comprehension in early elementary school. Child Development, 87(6), 1813–1824. doi: 10.1111/cdev.12570 [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Connor CM, Morrison FJ, Fishman B, Giuliani S, Luck M, Underwood P, … Schatschneider C (2011). Classroom instruction, child × instruction interactions and the impact of differentiating student instruction on third graders’ reading comprehension. Reading Research Quarterly, 46(3), 189–221. doi: 10.1598/RRQ.46.3.1/epdf [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Connor CM, Morrison FJ, Fishman BJ, Crowe EC, Al Otaiba S, & Schatschneider C (2013). A longitudinal cluster-randomized controlled study on the accumulating effects of individualized literacy instruction on students’ reading from first through third grade. Psychological Science, 24(8), 1408–1419. doi: 10.1177/0956797612472204 [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Connor CM, Morrison FJ, Fishman BJ, Schatschneider C, & Underwood P (2007). The early years: Algorithm-guided individualized reading instruction. Science, 315(5811), 464–465. doi: 10.1126/science.1134513 [DOI] [PubMed] [Google Scholar]
  32. Connor CM, Morrison FJ, & Underwood P (2007). A second chance in second grade? The independent and cumulative Impact of first and second grade reading Instruction and students’ letter-word reading skill growth. Scientific Studies of Reading, 11(3), 199–233. [Google Scholar]
  33. Connor CM, Morrison FJ, & Katch LE (2004). Beyond the reading wars: Exploring the effect of child × instruction interactions on growth in early reading. Scientific Studies of Reading, 8(4), 305–336. Doi: 10.1207/s1532799xssr0804_1 [DOI] [Google Scholar]
  34. Connor CM, Morrison FJ, Schatschneider C, Toste J, Lundblom EG, Crowe E, & Fishman B (2011). Effective classroom instruction: Implications of child characteristic by instruction interactions on first graders’ word reading achievement. Journal of Research on Educational Effectiveness, 4(3), 173–207. doi: 10.1080/19345747.2010.510179
  35. Connor CM, Piasta SB, Fishman B, Glasney S, Schatschneider C, Crowe E, … Morrison FJ (2009). Individualizing student instruction precisely: Effects of child × instruction interactions on first graders’ literacy development. Child Development, 80(1), 77–100. doi: 10.1111/j.1467-8624.2008.01247.x
  36. Connor CM, Ponitz CC, Phillips BM, Travis QM, Glasney S, & Morrison FJ (2010). First graders’ literacy and self-regulation gains: The effect of individualizing student instruction. Journal of School Psychology, 48(5), 433–455.
  37. Crevecoeur YC, Coyne MD, & McCoach DB (2013). English language learners and English-only learners’ response to direct vocabulary instruction. Reading and Writing Quarterly, 30, 51–78.
  38. Cronbach LJ (1957). The two disciplines of scientific psychology. American Psychologist, 12(11), 671–684. doi: 10.1037/h0043943
  39. Cunningham AE, & Stanovich KE (1997). Early reading acquisition and its relation to reading experience and ability 10 years later. Developmental Psychology, 33(6), 934.
  40. Fishman BJ, Marx R, Blumenfeld P, Krajcik JS, & Soloway E (2004). Creating a framework for research on systemic technology innovations. The Journal of the Learning Sciences, 13(1), 43–76. doi: 10.1207/s15327809jls1301_3
  41. Fixsen D, Blase K, Metz A, & Van Dyke M (2013). Statewide implementation of evidence-based programs. Exceptional Children, 79(2), 213–230.
  42. Florit E, & Cain K (2011). The simple view of reading: Is it valid for different types of alphabetic orthographies? Educational Psychology Review, 23, 553–576. doi: 10.1007/s10648-011-9175-6
  43. Francis DJ, Lesaux N, & August D (2006). Language of instruction. In August D & Shanahan T (Eds.), Developing Literacy in Second-Language Learners (pp. 365–413). Mahwah, NJ: Lawrence Erlbaum Associates.
  44. Fuchs LS, Fuchs D, & Phillips N (1994). The relation between teachers’ beliefs about the importance of good student work habits, teacher planning, and student achievement. Elementary School Journal, 94(3), 331–345.
  45. Gathercole SE, & Baddeley AD (1993). Phonological working memory: A critical building block for reading development and vocabulary acquisition? European Journal of Psychology of Education, 8(3), 259.
  46. Gersten R, & Baker S (2000). What we know about effective instructional practices for English-language learners. Exceptional Children, 66(4), 454–470.
  47. Giambo DA, & McKinney JD (2004). The effects of a phonological awareness intervention on the oral English proficiency of Spanish-speaking kindergarten children. TESOL Quarterly, 38(1), 95–117.
  48. Goldenberg C (2008). Teaching English language learners: What the research does—and does not—say. American Educator, 32, 7–23, 42–44. https://www.aft.org/sites/default/files/periodicals/goldenberg.pdf
  49. Gottardo A (2002). The relationship between language and reading skills in bilingual Spanish-English speakers. Topics in Language Disorders, 22(5), 46–70.
  50. Gunn B, Biglan A, Smolkowski K, & Ary D (2000). The efficacy of supplemental instruction in decoding skills for Hispanic and non-Hispanic students in early elementary school. The Journal of Special Education, 34(2), 90–103.
  51. Haager D, & Windmueller MP (2001). Early reading intervention for English language learners at-risk for learning disabilities: Student and teacher outcomes in an urban school. Learning Disability Quarterly, 24(4), 235–250.
  52. Hammer CS, Jia G, & Uchikoshi Y (2011). Language and literacy development of dual language learners growing up in the United States: A call for research. Child Development Perspectives, 5(1), 4–9.
  53. Hantula DA (2019). Replication and reliability in behavior science and behavior analysis: A call for a conversation. Perspectives on Behavior Science, 42, 1–11. doi: 10.1007/s40614-019-00194-2
  54. Hill CJ, Bloom HS, Black AR, & Lipsey MW (2008). Empirical benchmarks for interpreting effect sizes in research. Child Development Perspectives, 2(3), 172–177. doi: 10.1111/j.1750-8606.2008.00061.x
  55. Hirsch ED (2006). Building knowledge: The case for bringing content into the language arts block and for a knowledge-rich curriculum core for all children. American Educator, 30(1), 8.
  56. Hoover WA, & Gough PB (1990). The simple view of reading. Reading and Writing, 2(2), 127–160.
  57. Kamhi AG, & Catts HW (2017). Epilogue: Reading comprehension is not a single ability—Implications for assessment and instruction. Language, Speech, and Hearing Services in Schools, 48(2), 104–107.
  58. Kamps D, Abbott M, Greenwood C, Arreaga-Mayer C, Wills H, Longstaff J, … & Walton C (2007). Use of evidence-based, small-group reading instruction for English language learners in elementary grades: Secondary-tier intervention. Learning Disability Quarterly, 30(3), 153–168.
  59. Kim Y-S (2017). Why the Simple View of Reading is not simplistic: Unpacking component skills of reading using a Direct and Indirect Effect Model of Reading (DIER). Scientific Studies of Reading, 21(4), 310–333. doi: 10.1080/10888438.2017.1291643
  60. Kim Y-SG (2020). Hierarchical and dynamic relations of language and cognitive skills to reading comprehension: Testing the direct and indirect effects model of reading (DIER). Journal of Educational Psychology, 112(4), 667–684. doi: 10.1037/edu0000407
  61. Kramer VR, Schell LM, & Rubison RM (1983). Auditory discrimination training in English of Spanish-speaking children. Reading Improvement, 20, 162–168.
  62. Lervåg A, & Aukrust VG (2010). Vocabulary knowledge is a critical determinant of the difference in reading comprehension growth between first and second language learners. Journal of Child Psychology and Psychiatry, 51(5), 612–620.
  63. Lesaux NK (2006). Building consensus: Future directions for research on English language learners at risk for learning difficulties. The Teachers College Record, 108(11), 2406–2438.
  64. Lesaux N, & Geva E (2006). Synthesis: Development of literacy in second-language learners. In August DL & Shanahan T (Eds.), Developing literacy in second-language learners: Report of the National Literacy Panel on Language-Minority Children and Youth (pp. 53–74). Mahwah, NJ: Lawrence Erlbaum.
  65. Lindsey KA, Manis FR, & Bailey CE (2003). Prediction of first-grade reading in Spanish-speaking English-language learners. Journal of Educational Psychology, 95(3), 482.
  66. MacGinitie WH, & MacGinitie RK (2006). Gates-MacGinitie Reading Tests (4th ed.). Iowa City: Houghton Mifflin.
  67. Mancilla-Martinez J, & Lesaux NK (2010). Predictors of reading comprehension for struggling readers: The case of Spanish-speaking language minority learners. Journal of Educational Psychology, 102(3), 701–711. doi: 10.1037/a0019135
  68. Mancilla-Martinez J, & Lesaux NK (2017). Early indicators of later English reading comprehension outcomes among children from Spanish-speaking homes. Scientific Studies of Reading, 21(5), 428–448. doi: 10.1080/10888438.2017.1320402
  69. McKeown MG, Beck I, Omanson RC, & Perfetti C (1983). The effects of long-term vocabulary instruction on reading comprehension: A replication. Journal of Reading Behavior, 15(1), 3–18.
  70. Metsala JL, & Walley AC (1998). Spoken vocabulary growth and the segmental restructuring of lexical representations: Precursors to phonemic awareness and early reading ability. In Metsala JL & Ehri LC (Eds.), Word recognition in beginning literacy (pp. 89–120). Lawrence Erlbaum Associates Publishers.
  71. Nakamoto J, Lindsey KA, & Manis FR (2007). A longitudinal analysis of English language learners’ word decoding and reading comprehension. Reading and Writing, 20(7), 691–719. doi: 10.1007/s11145-006-9045-7
  72. Nation P, & Waring R (1997). Vocabulary size, text coverage and word lists. In Schmitt N & McCarthy M (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 6–19). New York, NY: Cambridge University Press.
  73. Nelson JR, Vadasy P, & Sanders EA (2011). Efficacy of a Tier 2 supplemental root word vocabulary and decoding intervention with kindergarten Spanish-speaking English learners. Journal of Literacy Research, 43(2), 184–211.
  74. Nielsen J (1993). Usability engineering. San Francisco: Morgan Kaufmann Publishers Inc.
  75. Nielsen J, & Mack RL (Eds.). (1994). Usability inspection methods. New York: John Wiley & Sons.
  76. Odom SL, Hall LJ, & Suhrheinrich J (2019). Implementation science, behavior analysis, and supporting evidence-based practices for individuals with autism. European Journal of Behavior Analysis, 21(1), 55–73. doi: 10.1080/15021149.2019.1641952
  77. O’Reilly T, Sabatini J, & Deane P (2013). Preliminary reading research assessment framework: Foundation and rationale for assessment and system design (ETS Research Report, RR-13-30). Princeton, NJ: Educational Testing Service. Retrieved from https://www.ets.org/research/topics/reading_for_understanding/assessments/gisa_samples/
  78. Ouellette GP (2006). What’s meaning got to do with it: The role of vocabulary in word reading and reading comprehension. Journal of Educational Psychology, 98(3), 554.
  79. Páez M, & Rinaldi C (2006). Predicting English word reading skills for Spanish-speaking students in first grade. Topics in Language Disorders, 26(4), 338.
  80. Paris SG (2005). Reinterpreting the development of reading skills. Reading Research Quarterly, 40(2), 184–202. doi: 10.1598/RRQ.40.2.3
  81. Perfetti CA, & Hart L (2002). The lexical quality hypothesis. In Verhoeven L, Elbro C, & Reitsma P (Eds.), Precursors of functional literacy. John Benjamins Publishing Company.
  82. Proctor CP, August D, Carlo M, & Snow CE (2006). The intriguing role of Spanish vocabulary knowledge in predicting English reading comprehension. Journal of Educational Psychology, 98, 159–169. doi: 10.1037/0022-0663.98.1.159
  83. Proctor CP, Carlo M, August D, & Snow C (2005). Native Spanish-speaking children reading in English: Toward a model of comprehension. Journal of Educational Psychology, 97(2), 246.
  84. Roehrig AD, Duggar SW, Moats LC, Glover M, & Mincey B (2008). When teachers work to use progress monitoring data to inform literacy instruction: Identifying potential supports and challenges. Remedial and Special Education, 29, 364–382.
  85. Royer JM, & Carlo MS (1991). Transfer of comprehension skills from native to second language. Journal of Reading, 34(6), 450–455.
  86. Rubin J (1994). Handbook of usability testing. New York: Wiley.
  87. Scarborough HS (2001). Connecting early language and literacy to later reading (dis)abilities: Evidence, theory, and practice. In Neuman SB & Dickinson DK (Eds.), Handbook of early literacy research (pp. 97–110). New York: Guilford Press.
  88. Sénéchal M, Ouellette G, & Rodney D (2006). The misunderstood giant: On the predictive role of early vocabulary to future reading. In Neuman SB & Dickinson D (Eds.), Handbook of early literacy research: Vol. 2 (pp. 173–182). New York: Guilford Press.
  89. Shadish WR, Cook TD, & Campbell DT (2002). Experimental and quasi-experimental designs for generalized causal inference. New York, NY: Houghton Mifflin Company.
  90. Shanahan T, & Beck IL (2006). Effective literacy teaching for English-language learners. In August D & Shanahan T (Eds.), Developing literacy in second-language learners: Report of the National Literacy Panel on Language-Minority Children and Youth (pp. 415–488). Lawrence Erlbaum Associates Publishers.
  91. Snow CE, & Matthews TJ (2016). Reading and language in the early grades. The Future of Children, 57–74.
  92. Solari E, Terry NP, Gaab N, Hogan TP, Nelson N, Pentimonti J, … Sayko S (2020). Translational science: A roadmap for the science of reading. Reading Research Quarterly, 55(S1), S347–S360. doi: 10.35542/osf.io/8z7e6
  93. Squires D, & Preece J (1999). Predicting quality in educational software: Evaluating for learning, usability and the synergy between them. Interacting with Computers, 11, 467–483.
  94. Supplee LH, & Metz A (2015). Opportunities and challenges in evidence-based social policy. Social Policy Report: Sharing Child and Youth Development Knowledge, 28(4), 3–19. doi: 10.1002/j.2379-3988.2015.tb00081.x
  95. Thomas E, & Sénéchal M (2004). Long-term association between articulation quality and phoneme sensitivity: A study from age 3 to age 8. Applied Psycholinguistics, 25(4), 513.
  96. Vaughn S, Mathes PG, Linan-Thompson S, & Francis DJ (2005). Teaching English language learners at risk for reading disabilities to read: Putting research into practice. Learning Disabilities Research & Practice, 20(1), 58–67. doi: 10.1111/j.1540-5826.2005.00121.x
  97. Wanzek J, Wexler J, Vaughn S, & Ciullo S (2010). Reading interventions for struggling readers in the upper elementary grades: A synthesis of 20 years of research. Reading and Writing, 23(8), 889–912.
  98. Willingham DT (2006). How knowledge helps: It speeds and strengthens reading comprehension, learning – and thinking. American Educator, 30(1), 30.
  99. Woodcock RW, McGrew KS, & Mather N (2001). Woodcock-Johnson III Tests of Achievement. Itasca, IL: Riverside.
