Abstract
Today’s students live in a world filled with complexity, uncertainty, and misinformation, so we need to help all learners, including students with learning disabilities, comprehend complex information about the natural world and make credible evidence-based claims. Our study is a first step in making this possible. In this investigation, our goal was for students to make a claim, support it with evidence and content-specific vocabulary, and then use reasoning to link the claim with the evidence in an argument. These outcomes were assessed using a pre-post design with a scenario-based assessment administered twice before and once after instruction that used multimodal STEM text sets to fuse science and literacy learning. Data indicate that both students with disabilities and those without disabilities made significant gains in argumentation and that the effects of instruction were similar for both groups. Gains for students with disabilities suggest the multimodal STEM text sets provided important scaffolding that enabled this group of students to learn important content in the general education classroom.
Keywords: special education, reading, argumentation, reasoning, scaffolding, learning disability
Science Literacy: Using Multimodal STEM Text Sets to Help Students with Disabilities Engage in Argumentation
Making evidence-based claims, that is, argumentation, is central to scientific literacy and responsible citizenship (Llewellyn, 2013; Sjöström & Eilks, 2018). Because scientific writing is a key mode by which scientists communicate and construct new knowledge, written argumentation deserves emphasis in the science classroom. Further, practices to support learning argumentation have become increasingly important in both the science (i.e., Next Generation Science Standards [NGSS]) and literacy (i.e., Common Core State Standards-English Language Arts & Literacy in History/Social Studies, Science and Technical Subjects [CCSS.ELA-LITERACY.RST]) standards for all students (NGSS Lead States, 2013).
The ability to engage in argumentation involves “important learning processes related to interpreting information, synthesizing personal understanding, and learning to make one’s own thinking visible to others” (Bricker et al., 2017, p. 261). Scientific argumentation can be thought of as a problem-solving process in which students propose explanations based on data derived from their observations of the world. Hence, argumentation can be framed as an iterative process of communication (i.e., make a claim and justify it using evidence and reasoning) and intellectual growth, where a student simultaneously becomes more confident in their own ideas and more open to considering alternative perspectives (McNeill & Krajcik, 2008).
Building on the work of McNeill and Krajcik (2008) in understanding students’ claim-evidence-reasoning processes, more recent work on the assessment of argumentation frames it in terms of increasing complexity of competence (Berland & McNeill, 2010; Osborne et al., 2016; Deane et al., 2019). Osborne et al. (2016) frame their learning progression-based assessment in the context of analysis and explanation of scientific phenomena using Toulmin’s (1958) model for argumentation, which is reflected within the claim-evidence-reasoning (CER) assessment framework proposed by Gotwals and Songer (2010). Osborne et al. (2016) suggest that the ability to make a claim is at the lower level of their progression. As students progress higher, they begin to draw upon evidence and provide a warrant that explains how the evidence supports the claim. At the highest levels, students are expected to negotiate multiple competing arguments (Osborne et al., 2016). Similarly, Deane et al. (2019) propose that argumentation starts with the ability to make a claim and look at multiple sides of an argument (Level 1). Once these are mastered, students begin identifying evidence and supporting ideas backing up the claim (Level 2). The next level (Level 3) focuses on the process of articulating the argument in writing, and Level 4 focuses on evaluation and critique of an argument.
This research suggests that argumentation can be framed as a progression of competencies along a hierarchy of cognitive complexity with respect to the CER framework (Gotwals & Songer, 2010). Within this framework, it makes sense to address the core argumentation practices (1: making a claim, 2: identifying evidence, 3: including reasoning that connects the evidence to the claim). In addition, when writing is used to communicate the argument, it makes sense to assess competencies related to communication practices (1: use of discipline-specific vocabulary, and 2: quality of the written communication, i.e., clarity and mechanics). In an assessment of CER, the simplest task involves making a claim that expresses an understanding of the purpose of the argument. Once students understand the purpose, they can begin to engage in meaningful writing about the issue and begin using vocabulary in their writing that draws from their content knowledge within the domain of the issue. Through this process of written argumentation, students are given the opportunity to express their reasoning, which is the most complex of the competencies, and students with the highest levels of CER ability can incorporate evidence from the scenario into their reasoning.
Student Engagement with Argumentation
In general, it has been found that students of middle school age and younger can engage in argumentation (Berland & Hammer, 2012); however, they often have difficulty constructing and critiquing arguments with evidence, and they have often been found to rely on personal beliefs or past experiences for evidence (Kuhn, 2010). Unfortunately, there has been a dearth of research examining the ability of students with disabilities (SWD) to make scientific arguments, because they are often overlooked by science education researchers and are underrepresented in science and mathematics education (Moon et al., 2012). However, the limited available research yields findings similar to Kuhn’s (2010): students who were academically low achieving or had a disability, when engaged in argumentation, tended to describe what occurred as opposed to why (Klein & Rose, 2010), often drew on their own prior knowledge in critiquing claims, and did more poorly on reasoning tasks as compared to their peers without disabilities (De La Paz & Levin, 2018).
Compounding these difficulties with argumentation, SWD have also been observed to experience challenges with understanding cause and effect relationships, as well as with planning, writing, editing, and revising their writing, due to limitations in working memory, processing, memorization, and information recall that derive from challenges in executive functioning and metacognition (De La Paz & Levin, 2018; Levin et al., 2021). These challenges are further magnified by trouble with writing skills such as spelling, mechanics, and transcription (De La Paz, 1999; De La Paz & Levin, 2018). Overall, this suggests that SWD need instruction that not only comprehensively addresses the practice of argumentation but also provides opportunities to develop the knowledge and skills necessary to engage in written argumentation.
Supporting Development of Argumentation with Text Sets
Literacy in reading and writing is central to argumentation and inference and to development of a scientifically literate mindset in general (Norris & Phillips, 2003; Pearson et al., 2010). The use of text sets (collections of resources from different genres, media, and levels of reading difficulty, strategically sequenced to build vocabulary and content knowledge) is a way to help students develop the literacy skills needed for argumentation regardless of their backgrounds and abilities (Cappiello & Dawes, 2021). Simply put, students’ funds of knowledge, or schemata, provide a framework for classifying information within a text; the more an individual knows about something, the stronger the framework or schema for classifying that information (Kimball & Holyoak, 2000; Pearson et al., 1979). Reading or listening to a series of texts on the same topic can yield as much as four times the vocabulary growth of reading unrelated pieces (Landauer & Dumais, 1997). This sequencing of texts as a text set to build vocabulary and knowledge on a topic draws on schema theory, which highlights the importance of domain knowledge for enhancing reading comprehension (Kim et al., 2021a; Spiro et al., 2003). This allows the reader to connect old words to new and develop nuanced understandings based on these connections. Given that schemas reflect the ability to instantiate networks of vocabulary and knowledge, the task is to determine how to organize instruction to improve literacy skills, including argumentation.
Previous research supports the use of text sets or “linked texts” to promote development of concepts and vocabulary (e.g., Elish-Piper et al., 2014; Lupo et al., 2020). More recently, however, the recommendation for text set organization has been to develop a coherent set centered on a particular concept as opposed to an overarching topic or theme (e.g., Giorgis & Johnson, 2002). This approach has the potential to help students build a more substantive understanding of a concept and better leverage that knowledge, setting a foundation for stronger reading comprehension of related texts and use of that knowledge for task completion (Cervetti et al., 2016). A few studies have been conducted involving text sets centered on a concept.
Cervetti et al. (2016) studied the impact of text sets with fourth-grade students, exploring whether knowledge building through the reading of a conceptually coherent set of texts supported comprehension of related texts and increased students' interest in the focus topic. Findings indicated that students who read a series of texts on a topic demonstrated better comprehension of a new text on the topic, as compared to students who read a series of disconnected texts. In the same vein, Kim et al. (2021a) studied the impact of the Model of Reading Engagement (MORE), a content literacy intervention that incorporated use of complex texts (a series of texts that converged on the same concept), on the science domain knowledge, reading engagement, and reading comprehension of first graders (38 first-grade classrooms; N = 674 students). Results indicated that the intervention had a positive and significant effect on science domain knowledge for the treatment group as compared to the control group.
Building on their initial work (Kim et al., 2021a), Kim et al. (2021b) studied the use of the MORE model to improve not only the vocabulary and reading comprehension but also the argumentative writing of first graders in science and social studies contexts (2,886 treatment and 2,608 control students nested within 144 treatment and 136 control classrooms). The treatment consisted of a text-based curriculum which, as a part of the MORE model, engaged students in reading conceptually related, discipline-based texts in science and social studies of varying complexity to support growth in students with various levels of reading literacy. The findings indicated positive and significant effects on vocabulary knowledge depth and argumentative writing in both science and social studies contexts. Vocabulary knowledge depth partially mediated the treatment effects on argumentative writing. Interestingly, improvements in reading comprehension were not replicated, but that may be because more time was spent on science and social studies content as opposed to reading activities.
Purpose of the Current Study
Students' vocabulary and language backgrounds affect their ability to formulate and communicate scientific arguments (Gilkerson et al., 2017; Hart & Risley, 2003; Kim et al., 2021b; Sperry et al., 2019). Readers who have prior knowledge and greater experience with scientific topics can more readily make connections between what they are reading and what they know (Cervetti & Hiebert, 2015; McLaughlin & Overturf, 2012; Pearson et al., 1979). Because argumentation involves recursive thinking between evaluation of data and justification of reasoning, followed by synthesis of one’s position using discipline-appropriate language and vocabulary (Berland & McNeill, 2010; Kim et al., 2021b), utilizing effective strategies that also shore up and accelerate content knowledge and vocabulary continues to be essential (Boyle et al., 2020; Cervetti et al., 2016; Graham et al., 2020).
In this study, we take the position that students can learn to comprehend and evaluate scientific information and formulate arguments using that information. However, unlike the work by Kim et al. (2021a, 2021b) and Cervetti et al. (2016), we extend this position to include middle school students and SWD. To meet this challenge, we focus on the use of multimodal STEM (science, technology, engineering, and mathematics) text sets that integrate science and literacy practices to facilitate meaningful engagement in and comprehension of scientific information in complex texts. Our research question is: What is the efficacy of using multimodal STEM text sets in helping SWD learn to evaluate and use scientific information to construct arguments?
Method
We used a quasi-experimental pre-post design for this study that controlled for test-retest effects. In this design, we administered our outcome measure, the scenario-based assessment (SBA), initially at least 2 weeks before the beginning of the text set implementation. We then administered the SBA a second time directly before the text set implementation (resulting in two pretests) and a final time after instruction ended.
Professional Development Program: Context and Multimodal STEM Text Sets
The Linking Science and Literacy for All Learners program is a collaboration among faculty and professionals from diverse backgrounds and experiences, including biochemistry, science education, literacy education, and special education, as well as 6th-8th grade teachers of science, English, or special education. Teachers from districts and schools with varied needs throughout the state apply and are then selected to participate in a year-long program. The overarching goals of the professional development (PD) program were to work with the teachers to develop grade-level multimodal STEM text sets that addressed NGSS and CCSS.ELA-LITERACY.RST standards and to support the teachers in implementing the multimodal STEM text sets with their students, with particular attention to diverse learners, including SWD.
The year-long PD program began with a 1-day orientation session followed by a weeklong summer workshop and four 1-day follow-up sessions during the academic year. All activities, except for the last follow-up session, which was held online due to COVID-19, were held on a university campus. The PD focused on building the teacher knowledge needed to learn about and apply a multimodal STEM text set with their learners, specifically: (a) the nature of and need for disciplinary literacy and complex text, (b) what a multimodal STEM text set is and how to develop one, (c) what scaffolds are and how to integrate them into multimodal text sets for diverse learner needs, and (d) argumentation (i.e., claim-evidence-reasoning) as a crosscutting practice.
Multimodal STEM Text Sets
There are many ways of organizing text sets (e.g., Lupo et al., 2018). For our program, a multimodal STEM text set comprises a coherent sequence of “texts” (e.g., online resources, printed text, video) and “activities” (e.g., lab activities) centered on a specific line of inquiry or phenomenon (i.e., standard) as found in an “anchor text,” a rich, complex informational text based on published primary scientific literature (see Figure 1). Included in the multimodal STEM text set are a series of learning cycles/lessons that incorporate selected texts and activities. Importantly, the texts and activities serve as scaffolds whose primary goal is to engage all learners in the anchor text. The texts and activities are scaffolded in two ways. First, they serve as content scaffolds focused on “what” is to be learned and are designed to build knowledge about the content. Content scaffolds are conceptually organized (e.g., based on content standards) and are multimodal, such as video, audio, websites, printed text, and hands-on experiences (e.g., lab activities). Second, they serve as instructional scaffolds focused on “how” to learn (e.g., simpler text, graphic organizers, writing strategies) to support engagement with and comprehension of the content (the “what”) to be learned. All multimodal STEM text sets include a focus on the practice of argumentation, as it cuts across the disciplines of science and literacy (Cheuk, 2013).
Figure 1.
Organization of a Multimodal STEM Text Set
Two grade 6-8 band anchor texts were used in this study. The first anchor, Flight of the Bumblebee, was developed from Miller-Struttmann et al.’s (2017) study, which investigated the use of bees’ acoustic sound waves to monitor bumble bee behavior and pollination services in connection to current ecological problems such as the decline of bees. The second anchor, Heat Waves in Missouri, was developed based on Steinweg and Gutowski’s (2015) study, which highlights the increasing number of heat waves in the St. Louis, Missouri area, the outcomes of recent climate models, and the negative impact of hotter temperatures on human health and daily activities. See Figure 2 for an overview of the two anchor multimodal STEM text sets.
Figure 2.
Overview of Multimodal STEM Text Set
*PS = Physical Sciences; ESS = Earth and Space Sciences; LS = Life Sciences; MS = Middle school
Participants
Fourteen middle school teachers enrolled in the year-long PD. Due to COVID-19, we report on data collected from eight science teachers and three English Language Arts (ELA) teachers from eight school districts in a Midwestern state. Nine teachers identified as female and two as male; ten identified as White and one as Black and/or African American. Six of the teachers taught sixth or seventh grade and five taught eighth grade. Six taught in a rural school district, three in a suburban district, and two in an urban district. The teachers’ educational backgrounds and years of experience varied (one teacher did not disclose this information): one teacher had a bachelor’s degree, seven had a master’s degree, and two had a specialist’s certificate. Four teachers had 1-10 years of teaching experience and six had 11 or more years. All participating teachers had SWD in their classes (6.5%-38.5% of their students).
At least one assessment was completed by 1,046 students across the 11 teachers. Of these 1,046 students, 63% took the SBAs at all time points; attrition was due to students being absent from school during one or more of the three testings. Thus, the 663 students with complete cases were included in the data analysis in lieu of using an imputation method (WWC, 2020). Of the students reporting their gender, 49% reported female and 51% reported male. Most students were White (78%) or Black/African American (12%); fewer were Hispanic/Latino (6.0%), Asian (1.1%), or multiracial (2.6%). About one third were in sixth grade (N = 226, 34%), with proportional numbers in seventh (N = 195, 29%) and eighth grade (N = 242, 37%). Twenty-one (3.2%) of the students were English Language Learners (ELL), and 73 (11%) were SWD who received special education services and had an individualized education program (IEP). Although we were not able to verify a learning disability (LD) classification, we would suggest that many of these students had a LD: nationally, 64% of students with disabilities spend at least 80% of their day in the general education classroom, and a large majority of that 64% are students with a LD (72.3%) (US Department of Education, 2020). Due to COVID-19, a blended learning environment was offered for 62 (9.4%) of the students, while the vast majority received face-to-face instruction.
Teacher Multimodal STEM Text Sets Development and Implementation
Each teacher selected one anchor text as the focus of their multimodal STEM text set. Their selection was connected to one or more standards to be taught in their classroom during the academic year per district requirements. The teachers used a design activity approach (McNeill et al., 2017), in which they selected texts and activities matched to the needs of their classroom context, to develop a multimodal STEM text set. It was expected that each multimodal STEM text set be organized around and comprise the key components defined by the program (e.g., anchor text, focus on key concepts aligned with standards, multimodal content and instructional scaffolds, integrated CER). A review of the teachers’ multimodal STEM text sets confirmed that all teachers included these components. Examples of the key components and learning activities can be found in Table 1.
Table 1.
Key Components and Example Learning Activities Found In Multimodal STEM Text Sets
| Key component | Definition | Example learning activities from text sets |
|---|---|---|
| Anchor text (complex text) | Foundational text that serves as the line of inquiry/phenomenon the text set is conceptually organized around. | |
| Claim-Evidence-Reasoning (CER) | Primary strategy used for the practice of argumentation. | |
| Multimodal scaffolds | Content and instructional scaffolds designed as intentional supports to increase content knowledge and text comprehension. | |
Ten of the teachers (seven science teachers and three ELA teachers, including the ELA/science teacher pair) developed a multimodal STEM text set around the anchor Heat Waves in Missouri, and one teacher (science) developed a multimodal STEM text set around the anchor Flight of the Bumblebee. Conceptually, all the teachers (ELA and science) focused on one of the following science topics: (a) human body systems, cells (i.e., the body is a system of interacting subsystems composed of groups of cells); (b) human body systems, impact of climate (e.g., how environmental and genetic factors influence the growth of organisms); (c) climate, impact of humans (e.g., how changes to physical or biological components of an ecosystem can affect populations); and (d) waves (e.g., describe a simple model for waves that includes how the amplitude of a wave is related to the energy in a wave). Eight of the teachers (two ELA teachers, including the ELA/science pair, and six science teachers) also focused on ELA standards, in particular: (a) reading complex scientific texts (e.g., citing textual evidence to support analysis of what the text says) and/or (b) writing (e.g., writing arguments to support claims with reasons and evidence).
The teachers were provided scheduled opportunities throughout the year-long PD to develop their multimodal STEM text sets. These scheduled opportunities occurred during the last half of the week-long summer workshop and the first two follow-up sessions. The teachers also worked on their multimodal STEM text sets on their own time, with the requirement that the text sets be completed before implementation. Faculty were available for support during the scheduled opportunities and at other times upon request. The teachers were given flexibility as to when to implement their multimodal STEM text set during their academic calendar so that they could connect it to their district/school required scope and sequence. However, they were required to implement all learning cycles/lessons consecutively until completed. Two teachers (one ELA and one science) shared the same group of students, worked as a team, and implemented their multimodal STEM text sets consecutively. Four teachers implemented in the first half of the academic year, five in the second half, and two (the ELA/science teacher pair) at the end of the first half and the beginning of the second half of the year.
Instrumentation, Scoring, and Measure Validation
We created an SBA to measure argumentation using the CER framework (Gotwals & Songer, 2010). Each SBA scenario included evidence, designation of an audience, and a situation that the student needed to address by stating a claim, presenting evidence, and connecting the claim and evidence through logical reasoning. The students were given one of three possible scenarios, which were scored for five argumentation competencies (1: making a claim, 2: identifying evidence, 3: including reasoning that connects the evidence to the claim, 4: use of discipline-specific vocabulary, and 5: quality of the written communication, i.e., clarity and mechanics) and one holistic score, using a rubric developed by the project team (see Supplemental Materials Table S1 for the rubric). The topics included: (a) monitoring of aggressive bees (Scenario A), (b) Meramec River flooding (Scenario B), and (c) scheduling of football practices (Scenario C).
Development and Administration of SBAs for Argumentation
The SBAs used to measure argumentation in this study are researcher-developed; however, a planning template and sample SBAs from the STEM Literacy Project, an Improving Teaching Quality Grant program, were used to guide their development. Development of the SBAs began with a research team discussion of potential topics. The third author developed an initial draft, with a focus on ensuring the content was scientifically rigorous, followed by several iterations of feedback and editing from the research team. One SBA, Monitoring of Aggressive Bees, was developed and implemented during the first year of the project. The data were used to revise the SBA (i.e., task prompt clarity and uniformity) and to inform the process for developing the other two SBAs during the project's second year. Each SBA comprises a scenario (1-2 pages) and a task prompt with lines on which to respond.
The SBAs were administered by the teachers in their classrooms. To ensure fidelity of implementation, all teachers were provided with directions on how to administer the SBA, along with a script to use with the students; the administration process was also detailed in a PD session. Administration of the SBA involved two steps. First, to ensure that reading did not prevent access to the content and task, the students were given 10 minutes to read the text with a partner, or the teacher read the text aloud. After reading, the students had a chance to ask for clarification of words or phrases with which they were not familiar. Second, students were instructed to complete the task on their own as completely as they could; they were given 20 minutes to do so. The SBA took one class session (40 minutes) to implement. One teacher administered Scenario A (142 students), six teachers administered Scenario B (643 students), and three teachers administered Scenario C (261 students). The SBAs were administered at three time points. First, the SBA was administered within a two-week period during September to measure the students’ CER skills at the beginning of the school year. The SBA was then administered immediately prior to the teacher’s text set implementation to measure the students’ CER skills just before instruction. Lastly, the SBA was administered immediately after the teacher’s text set implementation to measure the change in students’ CER skills due to instruction.
Scoring Procedure
A rubric was used to evaluate and score students’ SBA responses on the following competencies: 1) making a claim, 2) using evidence, 3) drawing upon content-specific vocabulary, 4) using reasoning to explain how data support a claim, and 5) communicating an argument through writing. Responses were also scored holistically. Scores ranged from 1 to 4 points, with 1 representing the lowest score and 4 the highest. A score of “1” was given to writing that was readable and coherent but contained no evidence of meeting the competency. A score of “4” was given to writing that exemplified the competency at a level expected in middle school (e.g., Reasoning: effectively demonstrates reasoning using accurate explanation of information; makes appropriate inferences and conclusions based on and referencing data). Scores of “2” or “3” were assigned to writing that demonstrated partial understanding of the competency (e.g., a score of 2 for Reasoning demonstrates some reasoning in attempting to explain data and makes appropriate inferences and conclusions based on that information, but may include some incorrect conclusions). Responses that were blank or not readable were given a “not scorable” designation and were treated as missing data.
Students’ responses were scored by a team of 17 raters; each response was assigned an ordinal score from 1 to 4 on each of the six competencies by two raters. Because the combination of raters was not necessarily the same across students’ responses, traditional inter-rater reliability analyses could not adequately address psychometric concerns around rater agreement. We instead elected to utilize the multi-faceted Rasch model to account for natural differences in rater severity and to evaluate potential biases associated with individual raters.
Scorers participated in a 3-day training event. During the training, project team members provided information regarding the program, its goals, and the SBAs. Trainers addressed the types of biases that can affect scores and how to be aware of them; this also set the groundwork for why norming the scoring is so important. Scorers then reviewed the attributes on the scoring rubric and discussed anchor papers selected to represent the different ranges of scores on the rubric. To evaluate inter-rater consistency, 10.4% of the SBAs were double scored. Agreement values for the six CER competencies ranged between 0.64 and 0.73, which exceeds the guideline of 0.6 suggested by the What Works Clearinghouse (2020). These moderate values reflect our treatment of the scorers as a panel of independent experts (i.e., perfect agreement is not necessarily encouraged), which is a suggested best practice in studies that utilize a Rasch measurement framework (Boone et al., 2013).
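The paper does not name the statistic behind the 0.64-0.73 agreement values. As a purely illustrative sketch (not the authors' analysis), one common choice for ordinal 1-4 rubric scores is quadratic-weighted Cohen's kappa, computed on the double-scored subset; the rater arrays below are invented:

```python
# Hypothetical illustration of an ordinal agreement statistic on
# double-scored responses; quadratic weighting penalizes large
# disagreements (e.g., 1 vs. 4) more than adjacent ones (2 vs. 3).
from sklearn.metrics import cohen_kappa_score

rater_a = [1, 2, 2, 3, 4, 3, 2, 1, 4, 3]  # hypothetical rubric scores
rater_b = [1, 2, 3, 3, 4, 2, 2, 2, 4, 3]

kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
print(f"Quadratic-weighted kappa: {kappa:.2f}")
```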
Construct Validity of the SBA Measures
Before deriving conclusions from the data, due diligence was needed to ensure that the SBAs and the scoring process produced unbiased measures of CER. Given that the six competencies (five core competencies plus the holistic score), 17 scorers, and three scenarios each contributed to variation in the measures, it is conceivable that bias could be introduced by any one of these variables. We used a 4-faceted Rasch modeling procedure (facets were “student,” “competency” [e.g., claim, evidence, reasoning], “scenario,” and “scorer”) to determine (1) the efficacy of the competencies in generating reliable and unidimensional measures and (2) the absence of systematic bias with respect to how the specific competencies, scenarios, and scorers generated the students’ measures of CER (Linacre & Wright, 1988). Unlike traditional measures of test performance derived from Classical Test Theory, Rasch models provide linear measures for each facet on a common log-odds (logit) scale so that they can be compared to each other directly. For example, if a student’s measure is located below the difficulty level of the “reasoning” competency on the common logit scale, then we can predict that the student has not yet mastered reasoning. However, that student’s measure may be located above the difficulty level of an easier competency such as “making a claim,” in which case the model allows us to predict that the student has mastered that competency.
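For readers unfamiliar with many-facet Rasch modeling, one standard rating scale formulation of a 4-faceted model of this kind (after Linacre) can be written as below; this is a sketch of the general form, not necessarily the exact parameterization fit in this study:

```latex
\ln\!\left(\frac{P_{ncsjk}}{P_{ncsj(k-1)}}\right) = B_n - D_c - E_s - C_j - F_k
```

Here $B_n$ is the ability of student $n$, $D_c$ the difficulty of competency $c$, $E_s$ the difficulty of scenario $s$, $C_j$ the severity of scorer $j$, and $F_k$ the threshold for moving from rubric category $k-1$ to $k$. A student whose $B_n$ exceeds a competency's $D_c$ is modeled as more likely than not to reach the higher rubric categories on that competency, which is what licenses the mastery predictions described above.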
To explore whether the competencies form a unidimensional CER construct, we used principal components analysis (PCA) on the residuals with respect to a 2-faceted model containing the student and competency facets (M. Linacre, personal communication, July 8, 2017). If the competencies measure CER as a single construct, then a first eigenvalue at or below 2, which indicates randomly distributed residuals, is expected (Linacre & Tennant, 2009). For all facets, mean square infit and outfit indices were used to indicate the extent to which responses to each item, across each scenario and testing instance, fit the values expected under the 4-faceted model. These indices have expected values of 1, but Wright (1994) suggests that values between 0.5 and 1.5 indicate useful items and scenarios.
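As a sketch of what these fit indices compute (not the authors' code; the function and data below are illustrative), infit is an information-weighted mean square of residuals, while outfit is unweighted and therefore more sensitive to outlying ratings:

```python
import numpy as np

def fit_statistics(observed, expected, variance):
    """Mean square infit and outfit for one facet element (e.g., one scorer).

    observed: array of ratings involving this element
    expected: model-expected values for those observations
    variance: model variance of each observation
    """
    residual = observed - expected
    z_squared = residual**2 / variance   # squared standardized residuals
    outfit = z_squared.mean()            # unweighted mean square: outlier-sensitive
    infit = (residual**2).sum() / variance.sum()  # information-weighted mean square
    return infit, outfit

# Hypothetical example: ratings hovering near expectation give values near 1
obs = np.array([3.0, 2.0, 4.0, 1.0])
exp = np.array([2.8, 2.2, 3.5, 1.4])
var = np.array([0.8, 0.9, 0.7, 0.6])
print(fit_statistics(obs, exp, var))
```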
Evaluation: Effects of the Curricula on Gains for SWD and Non-SWD Students
We used a univariate repeated measures ANCOVA procedure to test the significance of pre-post gains during the instruction, as well as to evaluate the role of a student’s teacher and IEP status (SWD) in the effect of the instruction. The between-subjects model included two factors, IEP status and Teacher; the IEP-by-Teacher two-way interaction; and the covariate of prior CER ability, measured from an SBA administered at least 2 weeks before the intervention began, which we included to control for the effects of prior knowledge and test-retest effects. The within-subjects model included Time (the pre-post effect), three two-way interactions (Time-by-Prior knowledge, Time-by-IEP, Time-by-Teacher), and one three-way interaction (Time-by-IEP-by-Teacher). The statistical significance of these effects was evaluated at the 95% confidence level using Type 3 F-tests. Where significant effects were found, the significance of gains for students with and without an IEP was evaluated using paired t-tests (2-tailed, 95% confidence level).
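As a rough sketch of this evaluation model (not the authors' code), the same fixed-effects structure can be approximated with a linear mixed model on long-format data, using a random intercept per student in place of the classical repeated measures machinery; all file and column names here are hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format file: one row per student per time point, with
# columns score (CER logit), time (0 = pre, 1 = post), iep (0/1),
# teacher (ID), prior (measure from the early-September SBA), student (ID).
df = pd.read_csv("cer_measures_long.csv")

# time * C(iep) * C(teacher) expands to all main effects and interactions,
# covering Time, IEP, Teacher, IEP x Teacher, Time x IEP, Time x Teacher,
# and Time x IEP x Teacher; prior and time:prior add the covariate terms.
model = smf.mixedlm(
    "score ~ time * C(iep) * C(teacher) + prior + time:prior",
    data=df,
    groups="student",  # random intercept per student handles repeated measures
)
print(model.fit().summary())
```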
Results
Construct Validity of the CER Measures
The multi-faceted Rasch modeling procedure showed promising evidence for the construct validity of the measures. The competencies produced CER measures with a reliability above 0.9 (r = 0.93). PCA on the residuals from the 2-faceted model yielded a first eigenvalue of 1.55, supporting the unidimensionality of the competencies making up the CER scale (Linacre & Tennant, 2009). Further, although the competencies, scenarios, and scorers showed a range of difficulty or severity (Figure 3), no serious systematic biases were detected through excessive misfit with the 4-faceted Rasch model. Mean square fit values less than 1.5 for all competencies, scenarios, and scorers demonstrate that the tasks, rubrics, and scoring team generated measures that were able to differentiate between multiple levels of CER consistently. Readers interested in measures of misfit for specific competencies, scenarios, and scorers are encouraged to consult the Supplementary Materials (Tables S2-S4).
Figure 3.
Person-Facet Map (Wright Map) of Pre- and Post-Test Measures for IEP and Non-IEP Students, Item Difficulty, Rater Severity, and Scenario Difficulty on a Common Linear Scale
Differential Effects between SWDs and Students without Disabilities
Differential Effects of the Text Sets
Being a SWD had a significant effect on students’ argumentation scores (F(1,642) = 11.47, p = 0.001, η²partial = 0.02; Table 2); however, the differential effect of instruction between SWDs and students without disabilities was non-significant (F(1,642) = 1.83, p = 0.176, η²partial < 0.01; Table 3), meaning that the effect of the instruction was similar for all students. In addition, the Time x IEP x Teacher three-way interaction was non-significant (F(9,642) = 0.56, p = 0.832, η²partial = 0.01); this indicates that there was no systematic difference between SWDs and students without disabilities in the effects of each individual teacher’s implementation of the multimodal STEM text sets.
Table 2.
Tests of Between-Subjects Effects in the Evaluation Model
| Between-Subjects Variable | SS | df | MS | F | p | η²partial |
|---|---|---|---|---|---|---|
| Prior | 2596.78 | 1 | 2596.78 | 179.93 | 0.000 | 0.22 |
| IEP | 165.60 | 1 | 165.60 | 11.47 | 0.001 | 0.02 |
| Teacher | 1051.75 | 9 | 116.86 | 8.10 | 0.000 | 0.10 |
| Teacher x IEP | 415.82 | 9 | 46.20 | 3.20 | 0.001 | 0.04 |
| Error | 9265.34 | 642 | 14.43 | | | |
Table 3.
Tests of Within-Subjects Effects in the Evaluation Model
| Within-Subjects Variable | SS | df | MS | F | p | η²partial |
|---|---|---|---|---|---|---|
| Time | 114.80 | 1 | 114.80 | 17.13 | 0.000 | 0.03 |
| Time x Prior | 4.81 | 1 | 4.81 | 0.72 | 0.397 | 0.00 |
| Time x IEP | 12.29 | 1 | 12.29 | 1.83 | 0.176 | 0.00 |
| Time x Teacher | 138.10 | 9 | 15.34 | 2.29 | 0.016 | 0.03 |
| Time x IEP x Teacher | 33.68 | 9 | 3.74 | 0.56 | 0.832 | 0.01 |
| Error | 4303.48 | 642 | 6.70 | | | |
Given that the effect of Time (the pre-post effect) was significant (F(1,642) = 17.13, p < 0.001, η²partial = 0.03) and that measures at the end of instruction were higher than those at the beginning of instruction (Tables 3 and 4, Figure 3), we can support the claim that both groups made significant gains from instruction. Table 4 indicates that SWDs started at an average of −2.72 logits and after instruction scored at an average of −1.27 logits, a significant gain of 1.45 logits (t(72) = 3.23, p = 0.002, SMD = 0.33). A standardized mean difference (SMD) of 0.33 indicates that the intervention moved the average SWD from the 50th percentile to the 63rd percentile among SWDs. Students without disabilities started higher, at −0.44 logits, and ended at 0.27 logits, a gain of 0.71 logits (t(589) = 4.62, p < 0.001, SMD = 0.19); an SMD of 0.19 indicates that the intervention moved the average student without disabilities from the 50th percentile to the 58th percentile among students without disabilities. Although the magnitude of the effect size of the change was higher for SWDs, the non-significant Time x IEP interaction indicates that the shift from the 50th percentile to the 63rd percentile among SWDs is not statistically significantly greater than the shift from the 50th percentile to the 58th percentile among students without disabilities; hence, there is insufficient evidence to suggest that the ability to make gains over the course of instruction was affected by whether or not the student had a disability and was on an IEP.
Table 4.
Pre-Post Gains in CER For SWD and Students Without Disabilities
| IEP status | N | Pre | SD | Post | SD | Gain | SD | SMD | t | p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| Yes | 73 | −2.72 | 4.32 | −1.27 | 4.53 | 1.45 | 3.82 | 0.33 | 3.23 | 0.002 |
| No | 590 | −0.44 | 3.69 | 0.27 | 3.96 | 0.71 | 3.73 | 0.19 | 4.62 | 0.000 |
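The percentile interpretations of the SMDs reported above follow from the standard normal cumulative distribution function; a minimal check of that arithmetic, assuming normally distributed measures, is:

```python
from scipy.stats import norm

# Percentile of the post-test mean within the pre-test distribution,
# assuming normality, for the two SMDs reported in Table 4.
for group, smd in [("SWD", 0.33), ("Without disabilities", 0.19)]:
    percentile = norm.cdf(smd) * 100
    print(f"{group}: SMD = {smd:.2f} -> percentile {percentile:.0f}")
# SWD: SMD = 0.33 -> percentile 63
# Without disabilities: SMD = 0.19 -> percentile 58
```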
Zones of Change for Students with and without Disabilities
Given that significant gains were experienced by both SWDs and students without disabilities, it becomes instructive to utilize the person-facet map (Figure 3), which maps the student CER measures and the facet difficulty/severity distributions onto a common linear scale, to better understand the degree to which students improved their CER ability. For the SWDs, the average scores on the pre- and post-measures were below the Rasch model-derived logit difficulty levels for all competencies. Because the average post-test ability measure of SWDs sits below the difficulty level of these competencies, all of them were still difficult for the average SWD even after instruction. However, the 3rd quartile of the SWDs’ boxplot on the pre-test sits at the difficulty levels of the competencies “claim” and “communication,” indicating that the model predicts that 3rd quartile SWDs had an equal probability of scoring a 1 or a 4 on the rubric for these competencies and were more likely to score a 1 on the more difficult competencies. After instruction, these 3rd quartile students’ measures exceeded the difficulty levels of all competencies, including the most difficult competency (using evidence). This means that by the post-test, these 3rd quartile SWDs were more likely to score a 4 on the rubric for all elements of CER measured by the SBAs.
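The “equal probability of scoring a 1 or a 4” reading follows from the rating scale model: when a student's measure equals a competency's difficulty and the step thresholds are symmetric, the lowest and highest rubric categories are equally probable. A small illustrative sketch, using hypothetical thresholds rather than the calibrated values from this study:

```python
import numpy as np

def category_probs(theta, delta, taus):
    """Category probabilities under the Rasch rating scale model.

    theta: student measure; delta: competency difficulty;
    taus: step thresholds (categories 0..len(taus) map to rubric 1..4).
    """
    cumulative = np.concatenate(([0.0], np.cumsum(theta - delta - np.asarray(taus))))
    weights = np.exp(cumulative)
    return weights / weights.sum()

# With theta equal to delta and symmetric (hypothetical) thresholds,
# rubric scores 1 and 4 come out equally probable.
print(category_probs(theta=0.0, delta=0.0, taus=[-1.0, 0.0, 1.0]))
# -> approximately [0.134, 0.366, 0.366, 0.134]
```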
Discussion
Our efforts to integrate multimodal STEM text sets into general education science and English classrooms are underpinned by the hypothesis that purposeful sequencing and scaffolding of texts with varied complexity provide a mechanism not only to increase comprehension, vocabulary, and reasoning ability, but also to differentiate instruction. The result is that students at differing ability levels, including SWD, can actively engage with the reading, write about what they read, and discuss what they read with their peers and instructor, and thereby engage more effectively in argumentation. With respect to science classrooms, the goal was to integrate more literacy contexts, such as opportunities to read and write, into the activities. For the English classrooms, the text sets provided an opportunity to pursue these activities in a discipline-based context, namely science.
Our study's outcomes support the use of multimodal STEM text sets with SWDs and students without disabilities. It is interesting to note that the magnitude of the effect size for the difference between the pre- and post-tests of CER was greater for SWDs, although the difference in effect sizes was not statistically significant at the 95% confidence level. The SWDs made gains in their CER ability at a rate similar to students without disabilities. This is encouraging, as the instruction could be considered equally effective for SWDs as for students without disabilities. Although more research is needed, this type of instruction can potentially help reduce the academic gap. Further, like Kim et al. (2021a, 2021b) and Cervetti et al. (2016), we suggest the use of a multimodal STEM text set can be a powerful intervention for building argumentation skills. However, our findings extend the work of Kim et al. (2021a, 2021b) and Cervetti et al. (2016) by showing that SWDs also benefit from the use of multimodal STEM text sets. This finding is tremendously encouraging given the few studies focused on classroom-based science interventions for diverse learners in general and, more specifically, studies that disaggregate data for subgroups such as students with disabilities (e.g., Lee & De La Paz, 2021).
Although SWDs made gains in all elements of CER, making a claim and written communication showed the least gain. We had suggested that the “simplest” task in the CER process may be making a claim; yet, for the SWDs in this study, this task appears to have been more difficult. This seems contrary to findings from other studies suggesting that SWDs did not have difficulty making claims and instead had more difficulty with reasoning and evidence following instruction in the CER process (De La Paz et al., 2022; Klein & Rose, 2010). This may be an artifact of the task and the way the claim was scored. To receive a high score on the rubric, students were required, in their claim, to indicate the complexities of the issue and/or consider other viewpoints (Friedrich et al., 2018). Recognizing that there are multiple points of view or complex layers as part of a claim is not typically included when scoring claim statements (e.g., Bulgren et al., 2014). Further, definitions of claim typically presented to students draw from Toulmin’s (1958) model, in which a claim is “a conclusion whose merits we are seeking to establish” (p. 90). It may be that additional instructional support, for teachers and students, that includes opportunities to incorporate multiple perspectives and viewpoints when writing a claim needs to be included in the multimodal STEM text sets.
For SWDs, written communication (e.g., clarity; mechanics such as sentence construction) also appears to need additional instructional support. Despite some gain in CER processes following instruction, poor written communication skills may have impacted their scores, particularly with claim, as was observed in a study by Deane et al. (2019). Several studies have noted that difficulty with written communication can compound SWDs’ difficulties with CER (Boyle et al., 2016; De La Paz & Levin, 2018; Hebert et al., 2018; Levin et al., 2021). To support SWDs’ development in written communication, it may be important to add instructional components as scaffolds, for example, focused explicit instruction on how to plan, revise, and self-monitor writing (e.g., Mason et al., 2011).
Limitations and Future Research
Although the findings were encouraging, there are some limitations of the study to consider. Foremost, we did not carry out the research within a specific disciplinary context (e.g., science only) and therefore did not measure content knowledge directly; rather, we measured it indirectly through the rating of students’ use of domain-specific vocabulary. Reader familiarity with the content plays a key role in comprehension (Lupo et al., 2019; Smith et al., 2021). Based on the indirect measure of domain-specific vocabulary, we believe content knowledge improved during the instruction, which helped support the higher-level CER competencies. However, a future study that restricts the measurement of CER to a specific disciplinary context and measures content knowledge within that same context using a validated knowledge instrument would shed more light on how CER and content knowledge work together to support improvement in literacy.
The second limitation to discuss is that our study design did not permit the identification of students with specific learning disabilities; rather, we identified SWD based on whether a student was on an IEP. Although being on an IEP is a standard way to identify a student with disabilities and it is reasonable to assume that most of the SWD are students with LD, this still needs further investigation. Studies which focus on SWD in a more fine-grained way would shed light on how these types of curricula support or hinder students with specific disabilities.
The final limitation to discuss is that there were no observations of teachers to ensure fidelity of implementation of the multimodal STEM text sets as developed; rather, we examined teachers’ multimodal STEM text sets to ensure key program components were included. Although such a fine-grained approach to fidelity was beyond the scope of the current work, including these types of observational data in future studies may help shed more light on which aspects of the program worked best and which did not work as well for SWD.
Supplementary Material
Acknowledgments
This project is funded by the National Institutes of Health Science Education Partnership Award (NIH-SEPA).
Contributor Information
W. Romine, Wright State University
D. van Garderen, University of Missouri
W. Folk, University of Missouri
A. Lannin, University of Missouri
R. Juergensen, University of Missouri
C. Smith, University of Missouri
H. Abedelnaby, University of Missouri
T. Milarsky, University of Missouri
References
- Bricker LA, Bell P, Van Horne K, & Clark TL (2017). Obtaining, evaluating, and communicating information. In Helping students make sense of the world using the Next Generation Science Standards (pp. 259–281).
- Berland LK, & McNeill KL (2010). A learning progression for scientific argumentation: Understanding student work and designing supportive instructional contexts. Science Education, 94(5), 765–793.
- Berland LK, & Hammer D (2012). Framing for scientific argumentation. Journal of Research in Science Teaching, 49(1), 68–94.
- Boone WJ, Staver JR, & Yale MS (2013). Rasch analysis in the human sciences. Springer Science & Business Media.
- Boyle S, Rizzo KL, & Taylor JC (2020). Reducing language barriers in science for students with special educational needs. Asia-Pacific Science Education, 6(2), 364–387.
- Boyle JR, Rosen SM, & Forchelli G (2016). Exploring metacognitive strategy use during note-taking for students with learning disabilities. Education 3–13, 44(2), 161–180.
- Cappiello MA, & Dawes ET (2021). Text sets in action: Pathways through content area literacy. Stenhouse Publishers.
- Cervetti GN, & Hiebert EH (2015). The sixth pillar of reading instruction: Knowledge development. The Reading Teacher, 68(7), 548–551.
- Cervetti GN, Wright TS, & Hwang H (2016). Conceptual coherence, comprehension, and vocabulary acquisition: A knowledge effect? Reading and Writing, 29(4), 761–779.
- Cheuk T. (2013). Relationships and convergences among the mathematics, science, and ELA practices. Refined version of diagram created by the Understanding Language Initiative for ELP Standards. Palo Alto, CA: Stanford University.
- Deane P, Song Y, van Rijn P, O’Reilly T, Fowles M, Bennett R, … & Zhang M (2019). The case for scenario-based assessment of written argumentation. Reading and Writing, 32(6), 1575–1606.
- De La Paz S, Butler C, Levin DM, & Felton MK (2022). Effects of a cognitive apprenticeship on transfer of argumentative writing in middle school science. Learning Disability Quarterly. Advance online publication. https://doi.org/10.1177/07319487221119365
- De La Paz S, & Levin DM (2018). Beyond “they cited the text”: Middle school students and teachers’ written critiques of scientific conclusions. Research in Science Education, 48(6), 1433–1459.
- De La Paz S. (1999). Self-regulated strategy instruction in regular education settings: Improving outcomes for students with and without learning disabilities. Learning Disabilities Research & Practice, 14(2), 92–106.
- Fisher D, & Frey N (2014). Addressing CCSS anchor standard 10: Text complexity. Language Arts, 91(4), 236–250.
- Freeman S, Eddy SL, McDonough M, Smith MK, Okoroafor N, Jordt H, & Wenderoth MP (2014). Active learning increases student performance in science, engineering, and mathematics. Proceedings of the National Academy of Sciences, 111(23), 8410–8415.
- Gilkerson J, Richards JA, Warren SF, Montgomery JK, Greenwood CR, Kimbrough Oller D, … & Paul TD (2017). Mapping the early language environment using all-day recordings and automated analysis. American Journal of Speech-Language Pathology, 26(2), 248–265.
- Giorgis C, & Johnson NJ (2002). Children's books: Text sets. The Reading Teacher, 56(2), 200–208.
- Gotwals AW, & Songer NB (2010). Reasoning up and down a food chain: Using an assessment framework to investigate students' middle knowledge. Science Education, 94(2), 259–281.
- Hart B, & Risley TR (2003). The early catastrophe: The 30 million word gap by age 3. American Educator, 27(1), 4–9.
- Individuals with Disabilities Education Improvement Act of 2004, 20 U.S.C. § 614 et seq.
- Kim JS, Burkhauser MA, Mesite LM, Asher CA, Relyea JE, Fitzgerald J, & Elmore J (2021a). Improving reading comprehension, science domain knowledge, and reading engagement through a first-grade content literacy intervention. Journal of Educational Psychology, 113(1), 3–26. https://doi.org/10.1037/edu0000465
- Kim JS, Relyea JE, Burkhauser MA, Scherer E, & Rich P (2021b). Improving elementary grade students’ science and social studies vocabulary knowledge depth, reading comprehension, and argumentative writing: A conceptual replication. Educational Psychology Review, 33(4), 1935–1964.
- Kimball DR, & Holyoak KJ (2000). Transfer and expertise. In Tulving E & Craik FIM (Eds.), The Oxford handbook of memory (pp. 109–122). New York: Oxford University Press.
- Klein PD, & Rose MA (2010). Teaching argument and explanation to prepare junior students for writing to learn. Reading Research Quarterly, 45(4), 433–461.
- Kuhn D. (2010). Teaching and learning science as argument. Science Education, 94(5), 810–824.
- Landauer TK, & Dumais ST (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211.
- Lannin A, van Garderen D, Abdelnaby HZ, Smith CM, Juergensen RL, Folk W, Palmer T, & Romine W (under review). Scaffolding learning via multimodal STEM text sets for students with learning disabilities. Learning Disability Quarterly.
- Levin DM, De La Paz S, Lee Y, & Escola EN (2021). Use of cognitive apprenticeship models of instruction to support middle school students’ construction and critique of written scientific explanations and arguments. Learning Disabilities: A Multidisciplinary Journal, 26(1), 58–72.
- Linacre JM (2011). A user’s guide to WINSTEPS [Computer manual]. Chicago: Winsteps.
- Linacre JM, & Wright BD (1988). Facets. Chicago, IL: MESA.
- Linacre JM, & Tennant A (2009). More about critical eigenvalue sizes (variances) in standardized residual principal components analysis (PCA). Rasch Measurement Transactions, 23(3), 1228.
- Llewellyn D. (2013). Teaching high school science through inquiry and argumentation. Corwin Press.
- Lupo SM, Berry A, Thacker E, Sawyer A, & Merritt J (2020). Rethinking text sets to support knowledge building and interdisciplinary learning. The Reading Teacher, 73(4), 513–524.
- Lupo SM, Strong JZ, Lewis W, Walpole S, & McKenna MC (2018). Building background knowledge through reading: Rethinking text sets. Journal of Adolescent & Adult Literacy, 61(4), 433–444.
- Lupo SM, Tortorelli L, Invernizzi M, Ryoo JH, & Strong JZ (2019). An exploration of text difficulty and knowledge support on adolescents' comprehension. Reading Research Quarterly, 54(4), 457–479. https://doi.org/10.1002/rrq.247
- Mason LH, Harris KR, & Graham S (2011). Self-regulated strategy development for students with writing difficulties. Theory Into Practice, 50(1), 20–27.
- McLaughlin M, & Overturf BJ (2012). The common core: Insights into the K-5 standards. Reading Teacher, 66(2), 153–164.
- McNeill KL, González-Howard M, Katsh-Singer R, & Loper S (2017). Moving beyond pseudoargumentation: Teachers’ enactments of an educative science curriculum focused on argumentation. Science Education, 101(3), 426–457.
- McNeill KL, & Krajcik J (2008). Inquiry and scientific explanations: Helping students use evidence and reasoning. In Luft J, Bell R, & Gess-Newsome J (Eds.), Science as inquiry in the secondary setting (pp. 121–134). National Science Teachers Association Press.
- Miller-Struttmann NE, Heise D, Schul J, Geib JC, & Galen C (2017). Flight of the bumble bee: Buzzes predict pollination services. PloS One, 12(6), e0179273.
- Moon NW, Todd RL, Morton DL, & Ivey E (2012). Accommodating students with disabilities in science, technology, engineering, and mathematics (STEM) (pp. 8–21). Atlanta: Center for Assistive Technology and Environmental Access, Georgia Institute of Technology.
- NGSS Lead States. (2013). Next generation science standards: For states, by states. Washington, DC: The National Academies Press.
- Norris SP, & Phillips LM (2003). How literacy in its fundamental sense is central to scientific literacy. Science Education, 87(2), 224–240.
- Osborne JF, Henderson JB, MacPherson A, Szu E, Wild A, & Yao SY (2016). The development and validation of a learning progression for argumentation in science. Journal of Research in Science Teaching, 53(6), 821–846.
- Pearson PD, Hansen J, & Gordon C (1979). The effect of background knowledge on young children's comprehension of explicit and implicit information. Journal of Reading Behavior, 11(3), 201–209.
- Pearson PD, Moje E, & Greenleaf C (2010). Literacy and science: Each in the service of the other. Science, 328(5977), 459–463.
- Shanahan T. (2019). Why children should be taught to read with more challenging texts. Perspectives on Language and Literacy, 45(4), 17–19.
- Sjöström J, & Eilks I (2018). Reconsidering different visions of scientific literacy and science education based on the concept of bildung. In Dori Y, Mevarech Z, & Baker D (Eds.), Cognition, metacognition, and culture in STEM education (pp. 65–88). Springer.
- Smith R, Snow P, Serry T, & Hammond L (2021). The role of background knowledge in reading comprehension: A critical review. Reading Psychology, 42(3), 214–240. https://doi.org/10.1080/02702711.2021.1888348
- Sperry DE, Sperry LL, & Miller PJ (2019). Reexamining the verbal environments of children from different socioeconomic backgrounds. Child Development, 90(4), 1303–1318.
- Spiro RJ, Collins BP, Thota JJ, & Feltovich PJ (2003). Cognitive flexibility theory: Hypermedia for complex learning, adaptive knowledge application, and experience acceleration. Educational Technology, 43(5), 5–10.
- Steinweg C, & Gutowski WJ (2015). Projected changes in greater St. Louis summer heat stress in NARCCAP simulations. Weather, Climate, and Society, 7(2), 159–168.
- Swaminathan H, & Rogers HJ (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361–370.
- Toulmin S. (1958). The uses of argument. Cambridge: Cambridge University Press.
- US Department of Education (2020). Office of Special Education Programs, Individuals with Disabilities Education Act (IDEA) database. https://nces.ed.gov/programs/coe/indicator_cgg.asp
- What Works Clearinghouse. (2020). What Works Clearinghouse standards handbook (Version 4.1). National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. https://ies.ed.gov/ncee/wwc/handbooks
- Wright B. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8, 370.