Author manuscript; available in PMC: 2020 Dec 1.
Published in final edited form as: Res Sci Educ. 2017 Nov 21;49(6):1783–1808. doi: 10.1007/s11165-017-9675-6

Supporting high school student accomplishment of biology content using interactive computer-based curricular case studies

J Steve Oliver 1, Georgia W Hodges 1, James N Moore 2, Allan Cohen 3, Yoonsun Jang 3, Scott A Brown 5, Kyung-A Kwon 6, Sophia Jeong 1, Sara P Raven 7, Melissa Jurkiewicz 8, Tom P Robertson 2,4
PMCID: PMC7413601  NIHMSID: NIHMS922300  PMID: 32773911

Introduction

There is a shift underway that is moving the content of science curricula to virtual formats (Smetana & Bell, 2012). As a result, science content is being depicted and illustrated in a manner that permits more dynamic and interactive opportunities for learning and authentic inquiry. The opportunities afforded by virtual formats will assist in meeting the goals set forth in the Next Generation Science Standards (NGSS Lead States, 2013). Hickey et al. (2012) noted, “This synergy between technology and theory is particularly apparent in science education, where the visualization technologies that have become central to inquiry in many domains are ideal for learning how to engage in inquiry” (p. 1240). Thus, the underlying purpose of the dynamic visualizations is not simply to make the invisible aspects of science concepts more accessible, but also to enable students to experience scientific practices.

The need for research on innovative instructional technologies to accomplish new standards was captured by McElhaney et al. (2015) who wrote: “Emerging instructional frameworks that call for students to view science as more than the recounting of scientific facts will benefit from insights concerning how dynamic visualizations can promote authentic investigations and provide valid student assessment” (p. 49). The research reported here had two main purposes related to the design, development and study of such materials. The first is to examine student achievement across an introductory high school biology unit on cell structure and function that incorporated modules containing dynamic visualizations within interactive environments. The second purpose is to present a preliminary analysis of how that achievement is associated with specific features of those modules. Information gleaned from in-depth analyses will form the basis for subsequent manuscripts. During two consecutive academic years, our research team studied how six teachers of introductory biology planned, taught and assessed student learning about diffusion, osmosis and filtration within a curricular unit regarding cell structure and function. In Year 2, the modules were included in the cell structure and function curricular unit replacing laboratory activities covering the same biological processes.

This study was conducted as part of a project to create modules featuring dynamic visualizations that place basic biological concepts into the context of real-world problems. Within each module, students assume the role of a science-based professional and are presented with a scenario through which they solve a relevant problem. In an effort to engage students, the modules were built with interactive videogame-like interfaces. These interfaces immerse students into representations of tissues, cells, molecules and ions in animals, people and medical equipment.

When the research team formed in the early 2000s, our primary goal was to encourage engagement and persistence in STEM learning among high school students. Our team assembled around a belief shared across stakeholders in the science education community that “visual affordances improve the teaching and learning of science by generating student interest and increasing motivation levels” (Waight et al., 2014, p. 468). We were further motivated by our team’s prior work demonstrating the efficacy of accurate dynamic visualizations in improving veterinary student learning of complex cell signaling processes (Author, 2005).

The research team always proceeded from the perspective that instructional materials should be developed with teachers, rather than “for” teachers. Consequently, the team included experienced biology teachers. While these teachers agreed that improved visualizations could enhance instruction, their stronger objective was for more interactive learning experiences. The teachers described how their students enjoyed learning through case studies. We decided that students would be highly engaged by experiencing the case studies by role-playing as a scientist and/or medical practitioner.

The resulting case study modules qualify as serious games under the definition of Sadler et al. (2015). Further, our work is consonant with efforts to show “how, what, and the conditions under which students can learn from game-based science curricula” (Sadler et al., 2015, p. 699). Tying the case studies to role playing within a game-type environment was a means to promote students’ realization that a science-related career was within their reach. These linkages are in keeping with the idea that “identity congruence is likely to be a consideration in STEM persistence plans” (Andersen & Ward, 2014, p. 237).

One main research question served to focus the study being reported here. What impacts on student learning can be associated with the introduction of computer-based interactive modules in the Cell Unit of an introductory high school biology course?

Background and Conceptual Framework

In order to put forward a coherent conceptual framework for this study, we examined the growing body of research literature connecting dynamic visualizations and student achievement in science. Most of the research related to the implementation of game-type modules in the science curriculum is nested within that connection. Facilitating students to construct deep understanding of the subject matter in science courses is the outcome of greatest interest to our overall efforts. Like Fisher et al. (2011), we believe that a primary impediment to students’ development of “a deeper understanding of diffusion and osmosis may be that instructors … rely on more traditional, didactic-style methods of teaching these abstract concepts” (p. 426). As a result, we have sought to construct a conceptual framework in the way that Maxwell (2013) uses the term: “a visual or written product…a tentative theory of the phenomena that you are investigating” (p. 39). The framework presented here links research-based factors of the creation and classroom use of science visualizations to student understanding. Our synthesis of the literature will examine five areas of related research that informed our work and connected students’ response characteristics to the construction of modules featuring dynamic visualizations. These are: (1) the nature of the visualization; (2) instructional use of multiple representations; (3) module design; (4) contextualization of content; and (5) considerations for assessment of student learning (see Figure 1). We have chosen, as one component of this framework, to use research based on the assessment of learning within learning progressions. This research provides support for assessment efforts within the project even though we are not studying a learning progression. This choice was made because our work examines student learning within curricular units and not just as a product of brief encounters with dynamic visualizations. Thus, we found that research on learning progressions provided the most insightful guide to interpretation of our findings. Our work is informed by, and contributes to, the body of work regarding how assessments create understanding of the progress of student learning (e.g., Fulmer et al., 2014).

Figure 1. Factors connecting student learning and the use of dynamic visualizations

Visualization and animations

Ainsworth and VanLabeke (2004) defined dynamic visualizations as those representations that “display processes that change with respect to time” (p. 241) and considered animations as the “prototypical dynamic representations.” McElhaney et al. (2015) described how dynamic visualizations can help learners develop models of phenomena and generate “rich explanatory accounts” of those aspects of science that are not directly observable. Most of the concepts of cell biology fall into this “unobservable” category. Oztas (2014) wrote that “one reason students may have difficulty with the concepts of diffusion and osmosis is because the concepts require students to visualize and think about chemical processes at the molecular level” (p. 3680).

In a study of the instructional use of dynamic visualizations in science instruction, Cook (2006) reported that animations were generally found to be better learning supports than static graphics even though the results were not consistent from study to study. In a more recent review of the literature, McElhaney et al. (2015) suggested that: “studies show that dynamic visualizations are better than static visuals at promoting conceptual inferences about science, consistent with the success of inquiry instruction in science” (p. 49). However, these authors pointed out that much of what is known about the implementation of dynamic visualizations arose from small studies covering a single concept and not from larger studies in which dynamic visualizations were used to teach more complex biological functions.

A number of studies have been published that involved longer use of dynamic visualizations. In one of these studies, assessing the efficacy of dynamic visualizations as a means to improve middle school students’ understanding of photosynthesis, Ryoo and Linn (2012) created and implemented three inquiry activities. The primary aims of their study were to integrate the students’ existing knowledge regarding photosynthesis with the new information being explored, address misconceptions the students might have, and provide opportunities for them to refine their own ideas about how photosynthesis functions in the world. The effectiveness of the dynamic visualizations was compared with that of the same material taught using sequences of static images captured from the dynamic visualizations. While students taught using both approaches made significant improvements in their understanding of energy transformation, students in the dynamic visualizations group significantly outperformed the students in the static visualization group in all areas. The results of this study suggest that dynamic visualizations, when incorporated into a framework that provides appropriate, well-timed support, have the potential to significantly improve student understanding of complex concepts that otherwise may be difficult to envision.

In contrast, Naah and Sanger (2013) determined that the opposite is true for students tasked with understanding the process of dissolving different chemical compounds at the symbolic and particulate levels. They compared the effectiveness of teaching modules that incorporated either static pictures or dynamic visualizations prior to or immediately after presentation of symbolic equations depicting the possible end result of the process. They determined that students performed best when they viewed the static images first and then the balanced equation, and performed more poorly when dynamic visualizations were viewed before the balanced equation. They concluded that balanced equations typically are viewed as “before and after” snapshots of the reactants and final products, and that dynamic visualizations can be distractive if the concept at hand does not involve motion or trajectory. Thus, it is important for instructors to consider the mechanisms underlying the concept being covered and whether or not movement and directionality are important components of those mechanisms.

In a study designed to link scientific explanations of static electricity with everyday observations, Shen and Linn (2011) used the knowledge integration framework to analyze how dynamic visualizations might help students deal with abstract concepts. While they did not compare the effectiveness of dynamic visualizations with static images, they concluded that dynamic visualizations help students integrate and refine ideas about electrostatics.

In two of these studies, Ryoo and Linn (2012) and Shen and Linn (2011), an inquiry learning environment was used as a means to immerse students in dynamic visualizations. Donnelly et al. (2014) conducted an extensive review of similar environments and discerned that the largest gains in student accomplishment were reported for those studies that used dynamic visualization.

Multiple representations in authentic learning environments

Scholarship regarding the use of dynamic visualizations and animations is closely linked to the large body of work on the use of multiple external representations in teaching science (Treagust & Tsui, 2013). Ainsworth and VanLabeke (2004) proposed that multiple representations can support learning in three main ways: (1) allow for complementary computational processes or information; (2) place constraints on possible interpretations of individual representations, and (3) encourage students to pursue deeper understanding. They proposed that deeper understandings result when a learner connects representations by means of an abstraction, thereby identifying “shared invariant features of a domain” (p. 248) rather than just by features that are the sole properties of individual representations.

While Tsui and Treagust (2013) posit that “learners are likely to benefit when information is presented in more than one representation” (p. 4), the structure of biology is unique among the sciences and must be taught as such. “Whereas how the biological knowledge is represented by different modes rests on the expertise of the designers—of the curriculum, educational software, or classroom instruction—whether these multi-representational learning environments can serve one or more of the pedagogical functions largely depends on the way they are deployed in teaching” (p. 9). Research by Malinska and colleagues (2016) expounded the challenges learners face about the specific content of the work reported here. They reviewed the literature regarding student learning of concepts related to water regulation and reported that “it is thought that problems with understanding diffusion and osmosis are the result of a number of causes: a confusion regarding vernacular and scientific usage of such terms as pressure, concentration, and quantity; misunderstanding technical concepts such as solution, semi-permeability, and molecular and net movement; and insufficient abilities in terms of formal reasoning, visualization, and thinking at the molecular level” (p. 2). The nature of the learning difficulties reported here is validation of the importance of teaching this specific content using multiple representations.

Eilam (2013) examined constraints on the use of visual representations as a component of teaching with multiple representations, and concluded that four major sources of student difficulties arise within the “complex learning environment of multiple representations” (p. 56). Learner characteristics, such as students’ cognition, a major component of which is prior knowledge, powerfully influence how students interface with the technologies used to present the visualizations. Characteristics of the representations are the second source, including “complexity, structure, abstractness, cognitive load, attention, spatial arrangement, and ease of processing” (p. 56). Eilam’s third source of difficulty relates to the characteristics of the pedagogy applied to the visual representations. Examples of these include whether there is interaction vs passivity, the instructional approach, explicitness, supports available for the pedagogy, and the degree to which teachers were trained in the use of the representations. The “contextual characteristics” are the fourth source of difficulty, and include such factors as students’ demographic backgrounds and the attitudes and beliefs of students and teachers. Eilam also suggested a fifth source of student difficulty which is based on the “status and placement” of the visual representation in the curriculum. In the research reported here, we have attempted to align the design and implementation considerations for the interactive case studies with these sources of student difficulty.

Similarly, Waight et al. (2014) concluded that “having access to concrete representations reduces the cognitive load and releases learners from having to contend with imaginary images” (p. 468). One explanation for this was studied using eye-tracking software. Cook et al. (2008) determined that students with lower prior knowledge tended to transition more frequently between macroscopic representations and molecular representations and had “more difficulty forming linkages across the different representational levels” (p. 863). These authors attributed those difficulties to lower prior knowledge because those students had more difficulty “mapping the underlying content of one representation onto another [and] relied heavily on surface features that may not be easily coordinated between macroscopic and molecular levels” (p. 863).

Assessment of learning and contextualization

This final section of the framework makes connections between research on contextualization of science instruction and assessment of science learning. Our work is consistent with the view that high quality student-generated explanations are more likely to be the product of deep science learning (Kang, Thompson, & Windschitl, 2014) than any that might arise from the “reproduction” of what is said in class or read in a textbook.

There are many challenges inherent in getting students to generate high quality explanations. Alonzo and Steedle (2009) identified two of those challenges in the cognitive science literature. The first of those problems “is the consistency with which students respond to problem situations in which the same underlying principles apply. Second is the role of language…” (p. 393). The role of language is subdivided by these authors into the role it plays as students attempt to describe their understanding, and the role it plays when they attempt to interpret the meaning for an assessment. In their study, two types of assessment items, open-ended items and ordered multiple-choice items, were used and compared. The authors concluded that it frequently is impossible to discern the level of student accomplishment given their answers on open-ended responses. Taking this conclusion a bit further, Alonzo and Steedle (2009) suggested that it would not be possible to create open-ended items “to elicit responses including sufficiently detailed explanations of the phenomena” (p. 416). For our work, this issue arose as a challenge of how to get students to provide the information in their open-ended responses that best linked the biological processes of interest to the medical issues underlying the case studies. Kang and colleagues (2014) suggested that “using contextualized phenomena is the strongest single predictor of the quality of student explanation” (p. 695) and hypothesized that contextualization helped students “engage in deeper forms of reasoning and demonstrate in-depth explanations in four ways” (p. 698): (1) “contextualization problematizes a generic set of conditions” (p. 698) which means that a “generic phenomenon” can be described within the variables collected at multiple sites; (2) “contextualized phenomenon supports students in moving beyond reproductions of textbook explanations about general phenomena” (p. 698); (3) “situating the phenomena in the everyday experiences of students … helps them draw in additional intellectual resources from observations and relevant accounts of others” (p. 698); and (4) “contextualizing a phenomenon in a particular local community helps students relate to the problem, which allows them to become engaged in the work actively and emotionally” (p. 698). We believe these four outcomes characterize the introductory biology students’ responses to the case study situations in our modules.

Fulmer et al. (2014) described how efforts to understand student accomplishment “requires not only observing students’ growth with respect to understanding of the content assessment tools… [but also] requires examination of whether students exhibit the hypothesized relationships among forms of reasoning and of students’ ability” (p. 2921). This rationale for more sophisticated assessment tools is consistent with the conclusion reached by Donnelly et al. (2014) in their study of interactive learning environments. They described how much work is needed to “broaden the range of inquiry assessments” (p. 592). Drawing on the work of Hickey and colleagues, Donnelly et al. (2014) described how embedded assessments can provide opportunities for student learning: “Research and development is needed to show how these items can become valid and reliable options for high-stakes assessment” (p. 593).

Similar to Furtak et al. (2014), our work is “consistent with the perspective that student knowledge develops over time” (p. 653) and recognizes the importance of connecting the ideas that students bring to class from their everyday experiences to “canonical, scientifically accurate descriptions” (p. 653). We considered the student responses to open-ended embedded items within the modules as examples of accomplishment that were “multifaceted, context-dependent, and developing over time” (p. 653). As described by Kang et al. (2014), “the construction of explanations is an essential feature of science, as well as a fundamental classroom activity that engages students in epistemic practices of the discipline” (p. 677) and creates a space in which students link phenomena to “proposed explanatory mechanisms.”

The research literature cited here builds a conceptual framework that links dynamic visualizations to student science learning. Through this linkage, better understanding is gained of how students can be supported to construct deep understanding and to demonstrate that understanding through their creation of rich explanatory accounts. The description of our process for creating, testing and using the interactive case study modules that follows builds from and elaborates this conceptual framework.

Design of the Interactive Case Study Modules

The project team consisted of experts from specialty fields (veterinary medicine, mammalian physiology, science education, instructional design, biology, computer science, dramatic media, videogame design, music and educational technology) within the academic community of a large southeastern US university. Team members shared the responsibilities of designing and creating all aspects of the interactive case studies including writing scripts, ensuring scientific accuracy, obtaining video footage, creating original background sounds and music, and designing and programming the computer interface. Our strategy was to relate the science concepts to real world problems, engage students with interactive learning environments containing dynamic visualizations and immerse students in these environments to enable them to experience authentic inquiry.

Using the scale proposed by Blanchard et al. (2010), we structured inquiry activities to meet either level 1 (i.e. question provided by the teacher but interpretation of the results was open to the student) or in some cases approaching level 2 (i.e. question provided by the student and both data collection and interpretation open to student). Although we designed the modules prior to the work reported by Tsui and Treagust (2013), the modules engage the students in biology across all levels of representation, which they identified as necessary for a “full understanding of biological phenomena” (p. 8).

An iterative design approach was used to create a workable prototype of each module that was then tested by students in biology classrooms at schools other than those ultimately involved in the study reported here. This iterative design process allowed the team to respond rapidly to feedback from teachers and students. Each module uses a variety of means to represent the subject matter content in “complementary” ways (Ainsworth, 1999). This was done to encourage students to develop deeper understandings by connecting and interrelating the different representations. For instance, Yarden and Yarden (2013) described how “active processing” by learners would be most likely to occur when the learners have complementary visual and narrative representations in their active memory. Additional aspects of the modules also support deep accomplishment by students (Chin & Brown, 2000): applying the biological concepts to real-life situations, complementing the dynamic visualizations with labels and narratives, encouraging intrinsic motivation, and making connections among different concepts.

One major goal of the current project was to create interactive modules within which students conducted authentic inquiry, rather than animations to be watched passively. This approach required creating virtual 3-dimensional models of the structures (i.e., cell membranes, blood vessels, alveoli, and dialysis filter membranes) involved in the three basic biological processes. The visualizations allowed the students to “see” each process in action, such as the movement of water through aquaporins in a cell membrane. These models were then incorporated into interactive environments, developed using a popular computer game engine (Unity-3D).

Each module was built around a case study of an animal or human that has been medically impacted by a specific disease or health condition related to one or more of the three biological processes. These scenarios were proposed by the clinicians and scientists and then discussed by the team as a whole. For instance, the topic of the first case study module was osmosis and the subject was a young (five day old) calf that was suffering from seizures due to hyponatremia. Within the module, the students are placed in the role of a veterinarian and are provided with the following scenario: a farmer sends a text message regarding the sick calf that includes a brief video clip showing the calf having a seizure. To ensure that the student is adequately supported to assume his/her role as a veterinarian, he/she will review background information about osmosis, aquaporins, tonicity, hyponatremia and other diseases causing seizures presented in a “seizure manual.” There is also an embedded tutorial to teach students how to navigate within the module. After reviewing the background material, the students see video of the calf and then “fly into” the dynamic visualization of its brain. Once inside this interactive environment, the student is introduced to the different cell types and parts of the environment. Students are tasked with collecting data (e.g., pressure, nerve activity level, sodium concentration) from within the dynamic visualization and are provided with virtual meters that display the data.

Once these data were collected and uploaded to the “patient record” (akin to a virtual laboratory notebook), the students were prompted to analyze and interpret the data in light of their learning related to the background material. They were then tasked with interpreting these data to form a hypothesis and then predict which treatment option would stop the seizures. Returning to the interactive virtual space, they would initiate the treatment and observe what happened on the data collection meters and in the visualizations. If the students selected an ineffective treatment, they observed deleterious changes in the physiological or mechanical system but were then alerted and requested to reconsider. After successfully treating the calf, students wrote a case summary that described the medical problem faced by the calf, the ultimate resolution of this problem, and the role of osmosis therein.

Ainsworth and VanLabeke (2004) wrote that “for multiple dynamic representations to support deeper understanding, learners must relate the representations to each other” (p. 250). The interactive case study lends itself to this form of connection-making as all activities are directed back to the medical issues facing a person or animal suffering from a health condition. Ainsworth and VanLabeke (2004) also described how “time-persistent” representations could play an important role in allowing learners to see the “relationship of current and past values” (p. 249). Within the modules, this was accomplished through the use of virtual meters in the interactive visualization that registered a value when students collected a sample and then allowed them to see how their chosen treatments affected the original values. Because the values changed as they would in an actual biological system, the students were able to see how changes in one parameter (e.g., concentration of sodium ions in the blood) result in changes in other parameters (e.g., diffusion of water molecules into or out of the brain).
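The idea of a time-persistent meter can be illustrated with a minimal sketch. This is not the modules' actual implementation (which was built in Unity-3D and is not described at the code level here); the class name `VirtualMeter`, its methods, and the sodium values are all hypothetical and chosen only for illustration. The sketch shows a meter that keeps every reading so that a current value can be compared with the value registered when the first sample was collected.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VirtualMeter:
    """Hypothetical time-persistent meter (illustrative only).

    It stores every reading rather than just the latest one, so the
    relationship between past and current values stays visible.
    """
    name: str
    unit: str
    readings: List[float] = field(default_factory=list)

    def record(self, value: float) -> None:
        # Register a value, e.g. when a student collects a sample.
        self.readings.append(value)

    @property
    def baseline(self) -> float:
        # First recorded value (raises IndexError if nothing was recorded).
        return self.readings[0]

    @property
    def current(self) -> float:
        # Most recent recorded value.
        return self.readings[-1]

    def change_from_baseline(self) -> float:
        # Difference between the latest reading and the original one.
        return self.current - self.baseline


# Made-up numbers, not data from the study.
sodium = VirtualMeter(name="blood sodium", unit="mmol/L")
sodium.record(118.0)  # sample collected before treatment
sodium.record(135.0)  # reading after a chosen treatment
print(f"{sodium.name}: {sodium.baseline} -> {sodium.current} {sodium.unit}")
```

Keeping the full list of readings, rather than overwriting a single value, is what makes the representation "time-persistent" in the sense quoted above.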

In their synthesis of research related to dynamic visualization, McElhaney et al. (2015) used three broad categories (placement, scaffolding inquiry component, and type) of design characteristics in order to group different curricular materials. Using their scheme, our modules would be classified as having: (1) the description of the activities occurring concurrently with the visualization; (2) the scaffolding inquiry component focused on “sense-making” by the student; (3) the features of the “type” were the presence of visual cues, inquiry prompts, interactivity, and 3-D visualizations; and (4) to a smaller degree having a personalized narrative. The visual cues and 3-D representations were created in different ways depending on where in the module they were used. Within the case study, the animations were more scientifically accurate representations of tissues and other features.

Approximately 30 assessment items of different types were embedded in each module. These assessments call on the student to: (1) respond to a prompt; (2) make a prediction after being introduced to a situation; (3) make hypotheses or select from among a set of hypotheses; (4) put events into the correct sequence; (5) describe what is on the screen; (6) describe the connection between variables; and (7) write a case summary. Based on the scale provided by Ruiz-Primo et al. (2012), the embedded items would be classified as “close” assessments.

Methodology

Study design and data collection

Mindful of research demonstrating that the teacher is the most significant variable in the classroom, we sought teachers who would participate in two consecutive years of the study. In Year 1, only traditional inquiry activities were used to teach the concepts that were addressed within the modules. This was done to ensure that the instructional unit in which the modules were to be included in Year 2 could be compared to a curricular unit that included activities for teaching the same topics. To minimize diffusion (Cook and Campbell, 1975) of the treatment effect, teachers were not shown the modules until after the Year 1 research data collection. This quasi-experimental design is labeled an “institutional cycles design” by Cook and Campbell (1975, p. 132). All students were informed that the pre- and post-test assessments would not contribute to their grade for the course, but were encouraged by their teachers to make an effort to do well.

The research team considered several sources of variability before selecting a school in a district in suburban Atlanta, GA. These sources of variability included consistency/collaboration in planning for teaching introductory biology, assurance that the same teachers were highly likely to be present in both years of the study; evidence of the teachers’ competency (e.g. apparent knowledge of the subject matter), and consistency in the presentation of the material among the biology teachers in the school. Ultimately, six teachers in one school agreed to participate in the study, and these teachers had a practice of common planning within the different levels of the biology courses.

At the participating school, introductory biology was taught on four distinctive levels labeled: Gifted (required test scores as evidence of high aptitude); Honors (admission was largely based on course grades, apparent motivation, recommendations, etc.); College Preparatory (CP - the general level course in introductory biology); and CP-collaborative (for students meeting special education guidelines). The CP-collaborative level was an inclusion class co-taught by a biology teacher and a special education teacher; the pre- and post-tests were read to these students.

Permission to collect data regarding race and ethnic group membership was not sought and so these data were not collected. Examination of the participating school’s student demographics revealed the following: 13% African-American, 62% White, 7% Asian, 13% Latino/Hispanic and 5% other; 22% received free and reduced lunch. In the Year 2 sample, 211 females and 182 males took part in the study. Gender was not found to be a significant source of variability in the analysis and is not presented as a variable in these findings.

Year 1 of the study began with a summer professional workshop during which the six teachers participated in sessions that covered relevant areas of science, but not specifically about the science content or instructional use of the modules to be used in Year 2. For instance, while one of the modules generally dealt with the impact of diabetes on kidney function, this module was specifically aimed at the students’ learning relative to filtration, semi-permeable membranes, diffusion, osmosis, and counter-current exchange. In the workshop, the teachers learned about diabetes in people and pet populations of the US, the medical impacts of diabetes and obesity, and typical misconceptions that students have about these processes.

Student data collected in Year 1 consisted of assessments of student content knowledge that were given as pre-tests and post-tests before and after the Cell Unit, respectively. The teachers of Honors and Gifted biology sections started this unit in early October and continued for about 12 instructional days. The teachers of CP and CP-collaborative sections started about two weeks later and finished on a similar timeframe.

Before the start of Year 2, the teachers again attended a one-week professional development session regarding both the science content and the use of the modules in their courses. During this workshop, university science professors led discussion/lecture sessions with the teachers about diffusion, osmosis, filtration, and homeostasis as they related to each of the case studies. The participating teachers also did a walk-through of the modules and discussed lesson structure for the implementation. Student data were collected during the same time of the year as Year 1.

Well in advance of data collection, the research team constructed and validated the pre- and post-test items. The items were written by a team consisting of science content experts and science educators, edited and validated by high school science teachers, and validated with cognitive interviews with students and with psychometric analyses of students’ answers. Additionally, the items were validated by examining the responses of several hundred students who used the modules, were of a comparable level of education, and were not participants in the study reported here. Two versions of the assessments were constructed, Form A and Form B. A student who took Form A as a pre-test took Form B as a post-test and vice versa. For Year 2, when the modules were in use, a third set of items, Form C, was included with the post-tests regardless of which form had been used. The Form C items were used as anchor items to equate the two forms of the test.

Data analysis

Data from the pre- and post-tests were recorded on mark-sense answer sheets and read directly into computer files. In the initial data analysis, the scores on the Form A and Form B tests were equated using mean and sigma equating (Marco, 1977) on the data from the Form C items that all students took in conjunction with the post-test. A statistical technique called differential item functioning analysis (Pines, 1977) was used to screen the Form C items for use as anchors for establishing a common scale on which to express the scores from Forms A and B. The common scale was necessary so that scores could be compared directly between the two forms and between the pre-test and post-test. The transformed scores were used for conducting subsequent higher level analyses. The analyses in this study are presented in terms of these transformed standardized scores.
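The general idea of mean and sigma equating can be sketched as follows. This is a minimal illustration only, not the procedure or software used in the study: the anchor-item values and the function names are hypothetical, and the sketch assumes that the anchor items have been estimated separately on each scale so that a linear transformation can be derived from their means and standard deviations.

```python
from statistics import mean, pstdev


def mean_sigma_constants(anchor_from, anchor_to):
    """Return (slope, intercept) of the linear map from the 'from' scale to the 'to' scale.

    The slope is the ratio of the anchor standard deviations, and the
    intercept aligns the anchor means after rescaling.
    """
    slope = pstdev(anchor_to) / pstdev(anchor_from)
    intercept = mean(anchor_to) - slope * mean(anchor_from)
    return slope, intercept


def transform(scores, slope, intercept):
    """Express scores from the 'from' scale on the 'to' scale."""
    return [slope * s + intercept for s in scores]


# Hypothetical anchor-item estimates on two scales (illustrative values only,
# not data from the study).
anchor_on_form_b_scale = [-1.0, 0.0, 1.0]
anchor_on_form_a_scale = [-0.5, 1.0, 2.5]

slope, intercept = mean_sigma_constants(anchor_on_form_b_scale, anchor_on_form_a_scale)

# Hypothetical Form B scores re-expressed on the Form A scale.
form_b_scores = [-0.4, 0.2, 0.9]
form_b_on_a_scale = transform(form_b_scores, slope, intercept)
print(slope, intercept, form_b_on_a_scale)
```

Once both forms are expressed on one scale in this way, differences between forms and between testing occasions can be read directly from the transformed scores.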

In Year 1, 22 sections of biology were taught by the six teachers: Honors – 5, Gifted – 5, CP – 10, CP-collaborative – 2. In Year 2, the same teachers taught 18 sections of biology: Honors – 6, Gifted – 5, CP – 5, CP-collaborative – 2. Approximately the same number of students participated in each year, with differences in the number of sections reflecting changes in class sizes (primarily in CP classes) that occurred as the result of decreases in State funding for public education. Data for 407 students who completed the pre- and post-tests in Year 1 were available for analysis, and for 393 students in Year 2. Because of technical problems, we were unable to document that each of the 393 students used every module: the computer system failed to record some students’ responses to the embedded items, and a few computers crashed while students were in the middle of completing a module. We had planned before the study began to have the modules available through the school system’s web server such that each student would download the modules directly to their computer and the responses to the embedded items would be electronically transferred back to our server at the university. At the last moment, we became aware that the school system’s computer security software would not allow this transfer of data. Thus, we put the modules on USB memory drives and gave them to the students as they entered their classroom. The data were to be written back to the memory drives, but in some cases this failed due to hardware or operator errors. Analyses were run using the entire sample of 393 students and also using the subsample of 308 students for whom we had clear evidence of attempting at least two of the modules. The analyses produced very similar results, suggesting that the errors due to computer malfunctions were random and thus applied equally across the class levels and periods of the school day. Nonetheless, here we present the analyses using only the 308 students whose participation in the treatment can be validated.

A mixture Rasch model in the context of latent transition analysis (LTA-MixRM: Cho, Cohen, Kim, & Bottge, 2010) was used to assess individual differences in the effects of the instructional intervention. The LTA-MixRM was used to determine different patterns of responses that reflected the effects of the instructional intervention on answering the pre-test and post-test items. This model detects latent subgroups in the student level data and then examines the different transition paths between latent subgroups from pre-test to post-test.

The LTA-MixRM was used to determine what changes in the students’ test scores might indicate about changes in student learning as a result of the instructional intervention. In this way, changes in the stages of student learning (i.e., as reflected in the pre-test and post-test scores) were evaluated. Fulmer and colleagues (2014) used a similar technique in order to identify “clusters of students into latent classes based on similarities in their responses to items” (p. 2924). We chose to use the LTA-MixRM for a similar purpose in examining the pre- to post-test change, as it was the pattern of responses that was of most interest in understanding the students’ accomplishment.

Next, a multivariate analysis of variance (MANOVA) was used to examine the simple effects, main effects and interactions using students’ scores on the pre-test and post-test as the dependent variables. This analysis was conducted to more fully understand the significance of these variables relative to the latent group transitions noted in the LTA-MixRM analysis.

The final segment of the findings reported in this article examines how the assessment by the embedded and post-test items supported understanding of the LTA. Inductive analysis of student responses to the embedded items during pilot testing led to the development of more than 90 rubrics used to score the items. We developed a comprehensive classification scheme following the guidance of Patton (2015) that included detailed parameters for awarding points for each of the more than 90 embedded items and refined these until the research team could apply the rubrics to new data with inter-rater agreement above 85%. In this study, only the quantitative results from the scoring of the embedded items were considered. Finally, we made the further assumption that the development of the students’ knowledge across their experiences in completing the module could be assessed by the embedded items. As a result, these embedded assessments serve a formative function for the summative assessment occurring at the conclusion of the unit (Hickey et al., 2012). Although a close assessment would be more instructionally sensitive than a summative distal assessment, we hypothesized that a robust relationship would exist between the embedded and post-test assessments.

Results

The LTA-MixRM created novel characterizations of the academic performances of the introductory biology students who participated in this research during the curricular unit on “the Cell”. Although these characterizations were calculated from responses to the pre- and post-tests, the LTA-MixRM provided insight that could not be acquired solely from examining gain scores calculated between pre- and post-test results. More importantly, after considering students’ change across the entire curricular unit in terms of membership in and transition between latent groups, a powerful impetus developed to link these characterizations to responses given to individual assessment items, both those embedded within modules as well as items on the post-test. Using these characterizations (e.g., latent transition pathway), statistical significance of these variables as well as that of other main effects (i.e., year and biology class level) and related interactions was established by a MANOVA model that used the pre- and post-test scores as the dependent variables. After presenting a model of the main effects and the interactions between them, descriptive statistics and the simple effects within the model will be used to support an explanation of how differences based on biology class level and latent transition pathway are related to student achievement. Finally, examination of student responses sorted by membership in the latent transition pathways will provide implications for future module design and assessment.

An exploratory MixRM analysis was conducted to determine the best fitting MixRM model. Candidate models with 1 to 5 latent groups were considered. Results of the MixRM analysis identified two latent groups of students. This model was determined to be the best fitting based on the Bayesian Information Criterion (BIC) index (Schwartz, 1978). BIC has been shown to be more accurate at detecting the best fitting mixture IRT model among candidate models being considered (Li, Cohen, Kim & Cho, 2009). The two latent groups were characterized as high- and low-achieving based on their response patterns and scores on the pre- and post-tests. The latent transition model was initially built from the post-test data and then applied to the entire dataset. Results of the MixRM for the pre-test data for Year 1 revealed that 402 of the 407 students were classified in the low-achieving latent group. Based on analysis of the post-test data after completion of the Cell Unit, 31% of those students had transitioned to the high-achieving group; the five students initially in the high-achieving latent group remained in that group.

In Year 2, MixRM analysis of the pre-test data placed 251 of the 308 students in the low-achieving latent group. Based on analysis of the post-test data after completion of the Cell Unit in which the modules were used, 67% of those students had transitioned to the high-achieving group; 88% of the students who were initially classified in the high-achieving group remained in the high-achieving group. Figure 2 shows the distribution of students from Year 2 across the span of the Cell Unit.

Figure 2.

Latent group transition analysis for Year 2. *Number of students within each test or transition pathway. **Proportion of students within each test or transition pathway.
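The Year 2 transition percentages can be recovered from the pathway counts reported in Tables 2 and 3. The short sketch below is illustrative only (the variable and function names are our own, not part of the original analysis); it divides each pathway count by the number of students who started in the same latent group.

```python
# Year 2 pathway counts as reported in Tables 2 and 3:
# (starting latent group, ending latent group) -> number of students
counts = {
    ("low", "low"): 84,
    ("low", "high"): 167,
    ("high", "low"): 7,
    ("high", "high"): 50,
}


def transition_proportions(pathway_counts):
    """Return each pathway's share of the students who began in the same latent group."""
    totals = {}
    for (start, _end), n in pathway_counts.items():
        totals[start] = totals.get(start, 0) + n
    return {path: n / totals[path[0]] for path, n in pathway_counts.items()}


proportions = transition_proportions(counts)
for (start, end), share in proportions.items():
    print(f"{start} -> {end}: {share:.1%}")
```

With these counts, 167 of the 251 initially low-achieving students (about 67%) moved to the high-achieving group, and 50 of the 57 initially high-achieving students (about 88%) remained there.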

MANOVA was used to examine main effect variables exogenous to the students’ achievement on the pre- and post-tests. Three variables were used as main effects and interactions: year, latent transition pathway, and biology class level. The first predictor variable was year as a categorical variable to indicate two cohorts of students (i.e. Year 1 and Year 2). The second predictor was latent transition pathway. The LTA model had detected two latent groups in the data and four transition patterns. These patterns were initially characterized as high-high (i.e., transitioning from the high achieving latent group on the pre-test to high achieving latent group on the post-test), high-low, low-high, and low-low. The third predictor was biology class level, namely CP-collaborative, CP, Honors, and Gifted.

When each predictor variable was examined in the MANOVA model, year, latent group transition pattern and biology class level were significant (p < 0.05). As presented in Table 1, Wilks’ lambda was used as a measure of the proportion of variance in the dependent variables that remained unexplained by the model containing the predictor variables. The proportion of explained variance is obtained by taking 1 minus Wilks’ lambda. For the transition pathway, this proportion was 15.5% (= 1 − .845) of the total variation in the repeated measures (i.e., the difference between pre-test and post-test scores). Similarly, about 12% (= 1 − .877) of the total variation can be explained by the level of class.
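The arithmetic behind these proportions is simple enough to check directly. The sketch below applies the 1 minus Wilks’ lambda rule described above to the main-effect values listed in Table 1; the dictionary and variable names are illustrative only.

```python
# Wilks' lambda values for the three main effects, as listed in Table 1.
wilks_lambda = {
    "Year": 0.991,
    "Transition": 0.845,
    "Level of Class": 0.877,
}

# Proportion of variation explained, taken as 1 minus Wilks' lambda.
explained = {name: round(1 - lam, 3) for name, lam in wilks_lambda.items()}

for name, share in explained.items():
    print(f"{name}: {share:.1%} explained")
```

Under this rule the transition pathway accounts for 15.5% of the variation and the level of class for 12.3%, while year accounts for less than 1%.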

Table 1.

Summary of MANOVA with year, transition pattern, and level of class predictors

| Predictors | Wilks’ Lambda | F value | Hypothesis df | Error df | p-value |
|---|---|---|---|---|---|
| Year | .991 | 3.05 | 2 | 689 | .048* |
| Transition | .845 | 19.83 | 6 | 1,378 | <.0001*** |
| Level of Class | .877 | 15.58 | 6 | 1,378 | <.0001*** |
| Year × Transition | .992 | 1.38 | 4 | 1,378 | .238 |
| Transition × Level of Class | .976 | 1.05 | 16 | 1,378 | .400 |
| Year × Level of Class | .977 | 2.72 | 6 | 1,378 | .012* |
| Transition × Level of Class | .976 | 1.05 | 16 | 1,378 | .083 |

Note. * p<.05, *** p<.001.

In the MANOVA model, the interaction between year and biology class level was significant at the 0.05 level. The interaction between transition pathway and biology class level, however, was not significant. This lack of significance in transition pathway by biology class level is particularly informative. The analysis statistically confirms the result shown with standardized descriptive statistics (see Table 2) in a cross tabulation of latent transition group and introductory biology class levels. Specifically, this table shows that the students who made the transition from low to high achieving were found within all of the four biology class levels, but the proportion of these students increased in the higher level classes. A smaller proportion of Gifted students than of Honors students made the low to high achieving transition. This was due to the much higher percentage of the Gifted students who were initially in the high achieving group on the pre-test and who remained in that group. This table also validates other aspects of the findings. As would be expected: (1) there were very few students in the CP-Collaborative level who were initially in the high achieving group and (2) there were no students in the Gifted sections who made the transition from high to low achieving. The graphs in Figure 3 show a consistent progression of pre-test and post-test scores from the lowest to the highest biology class level across the different transition paths. This representation also indicates that those students who were initially high achieving and subsequently moved to low achieving exhibited little change from pre- to post-test.

Table 2.

Descriptive statistics for ability by Transition Pattern and Type of Class (Year 2)

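| Transition | Test | CP-collaborative (n=34) N (%) | CP-collaborative Mean | CP-collaborative SD | CP (n=87) N (%) | CP Mean | CP SD | Honors (n=87) N (%) | Honors Mean | Honors SD | Gifted (n=100) N (%) | Gifted Mean | Gifted SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| High-low (n=7) | Pre | 2 (5.9) | −.25 | .89 | 3 (3.4) | −.24 | .21 | 2 (2.3) | .37 | .38 | 0 (0.0) | - | - |
| | Post | | −.12 | .26 | | −.01 | .19 | | .00 | .09 | | - | - |
| Low-low (n=84) | Pre | 19 (55.9) | −.69 | .42 | 43 (49.4) | −.78 | .52 | 14 (16.1) | −.70 | 1.87 | 8 (8.0) | −.63 | 2.64 |
| | Post | | −.51 | .44 | | −.40 | .48 | | .18 | .26 | | .19 | .47 |
| Low-high (n=167) | Pre | 12 (35.3) | −1.37 | 2.67 | 34 (39.1) | −.38 | .57 | 60 (69.0) | −.32 | 1.31 | 61 (61.0) | .09 | 1.06 |
| | Post | | .30 | .60 | | .22 | .32 | | .71 | .48 | | .96 | .52 |
| High-high (n=50) | Pre | 1 (2.9) | −.03 | . | 7 (8.0) | −.07 | .57 | 11 (12.6) | .47 | .40 | 31 (31.0) | .38 | .34 |
| | Post | | .44 | . | | .64 | .40 | | 1.03 | .46 | | 1.15 | .46 |
| Total | | n=34 | | | n=87 | | | n=87 | | | n=100 | | |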

Figure 3.

Pre- to post-test standardized score differences broken out by latent transition and class level.

Accumulated across the four biology class levels, the magnitude of these changes provides one measure of the difference in achievement between the students grouped within each of the four latent transition pathways (see Table 3). For instance, those students who transitioned from low to high achieving exhibited the largest average gain, with a standardized mean change of 0.932 standard deviations. This contrasts with the high to high achieving group, which had a somewhat smaller gain of 0.711. This analysis also shows that those students who were initially low achieving and remained low achieving did improve, making an average gain of 0.462 standard deviations.

Table 3.

Standardized scores separated by transition path

Standardized score for the initially low-achieving group by transition pattern (N=251)

| | Retained in low-achieving: Mean | Std. Dev. | Sample size | Moved to high-achieving: Mean | Std. Dev. | Sample size | Difference of moved compared to retained |
|---|---|---|---|---|---|---|---|
| Pre-test | −.734 | 1.144 | 84 | −.259 | 1.300 | 167 | .475 |
| Post-test | −.272 | .513 | 84 | .673 | .554 | 167 | .945 |
| Difference | .462 | | | .932 | | | |

Standardized score for the initially high-achieving group by transition pattern (N=57)

| | Moved to low-achieving: Mean | Std. Dev. | Sample size | Retained in high-achieving: Mean | Std. Dev. | Sample size | Difference of moved compared to retained |
|---|---|---|---|---|---|---|---|
| Pre-test | −.067 | .485 | 7 | .327 | .416 | 50 | .394 |
| Post-test | .069 | .265 | 7 | 1.038 | .480 | 50 | .969 |
| Difference | .136 | | | .711 | | | |
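The “Difference” rows of Table 3 are the post-test mean minus the pre-test mean for each transition pathway. The following sketch reproduces them from the tabled means; the dictionary layout and names are illustrative only.

```python
# Standardized pre- and post-test means by transition pathway, from Table 3.
table3 = {
    "low-low": {"pre": -0.734, "post": -0.272, "n": 84},
    "low-high": {"pre": -0.259, "post": 0.673, "n": 167},
    "high-low": {"pre": -0.067, "post": 0.069, "n": 7},
    "high-high": {"pre": 0.327, "post": 1.038, "n": 50},
}

# Gain for each pathway: post-test mean minus pre-test mean.
gains = {path: round(row["post"] - row["pre"], 3) for path, row in table3.items()}

# Pathways ordered from largest to smallest gain.
ranked = sorted(gains, key=gains.get, reverse=True)

for path in ranked:
    print(f"{path}: gain = {gains[path]:.3f} SD (n = {table3[path]['n']})")
```

Ordering the pathways by gain places low-high first (0.932), followed by high-high (0.711), low-low (0.462) and high-low (0.136).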

How the results obtained in Year 2 differed from those in Year 1 is presented below using both multivariate statistics and descriptive statistics divided by biology class level. When the MANOVA was run with the simple effects of the biology class level within year, as shown in Table 4, the p values for the pre- and post-tests were significant. The standardized descriptive statistics associated with these simple effects are shown in Table 5. The difference between the Year 2 and Year 1 pre- to post-test gains ranged from 0.291 standard deviations for the CP-Collaborative students to 0.581 for the Honors students. The CP and Gifted students showed similar differences of 0.400 and 0.411 standard deviations, respectively. A pair of line graphs built from the data of this table (see Figure 4) illustrates the descriptive statistics. In general, the pre-test scores for each biology class level in Year 1 were similar to those in Year 2, with the standardized means differing by no more than approximately 0.15 standard deviations. The post-test mean scores as presented in the graphs provide vivid evidence of growth in achievement for both the CP and Honors students. As originally indicated by statistical significance in the MANOVA analyses for the simple effects, students in each of the four biology class levels made much larger gains during the second year, when the intervention was used. For instance, the mean post-test score for the Honors students in Year 2 was approximately equal to the post-test score of the Gifted students in Year 1, who were taught without benefit of the intervention. It is important to note that in both years of the study all groups of students ended the Cell Unit with higher scores on the post-test than on the pre-test. Reasons for this shift will be discussed in the Discussion of Results section below, but prior to that, we present results of how students within the latent groups responded differently to assessments, both those embedded within the modules and the post-tests.

Table 4.

Simple effects for the level of class within year

| Between Factor | Level of Year | Pre-test F value | Pre-test p-value | Post-test F value | Post-test p-value |
|---|---|---|---|---|---|
| Level of Class | Year 1 | 30.89 | <.0001*** | 135.78 | <.0001*** |
| | Year 2 | 16.16 | <.0001*** | 82.45 | <.0001*** |

Note. *** p<.001.

Table 5.

Standardized means and standard deviations across Class Level within Year.

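| Year | Test | CP Collab Mean | CP Collab SD | CP Collab N | CP Mean | CP SD | CP N | Honor Mean | Honor SD | Honor N | Gifted Mean | Gifted SD | Gifted N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Pre | −.799 | .752 | 45 | −.668 | .553 | 133 | −.288 | .545 | 112 | .265 | .504 | 117 |
| 1 | Post | −.378 | .588 | | −.576 | .656 | | .061 | .531 | | .692 | .499 | |
| | Within year diff | .421 | | | .092 | | | .349 | | | .427 | | |
| 2 | Pre | −.886 | 1.632 | 34 | −.551 | .580 | 87 | −.280 | 1.352 | 86 | .120 | 1.127 | 100 |
| 2 | Post | −.174 | .622 | | −.059 | .534 | | .650 | .500 | | .958 | .553 | |
| | Within year diff | .712 | | | .492 | | | .930 | | | .838 | | |
| | Across year diff | .291 | | | .400 | | | .581 | | | .411 | | |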
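The “Within year diff” and “Across year diff” rows of Table 5 can be recomputed from the tabled means. The sketch below is illustrative only (names and data layout are our own): the within-year difference is the post-test mean minus the pre-test mean, and the across-year difference is the Year 2 gain minus the Year 1 gain.

```python
# Standardized (pre, post) means by class level and year, from Table 5.
table5 = {
    "CP-collaborative": {1: (-0.799, -0.378), 2: (-0.886, -0.174)},
    "CP": {1: (-0.668, -0.576), 2: (-0.551, -0.059)},
    "Honors": {1: (-0.288, 0.061), 2: (-0.280, 0.650)},
    "Gifted": {1: (0.265, 0.692), 2: (0.120, 0.958)},
}


def within_year_gain(pre, post):
    """Post-test mean minus pre-test mean, rounded as in the table."""
    return round(post - pre, 3)


summary = {}
for level, years in table5.items():
    gain_year1 = within_year_gain(*years[1])
    gain_year2 = within_year_gain(*years[2])
    summary[level] = {
        "year1": gain_year1,
        "year2": gain_year2,
        # How much larger the Year 2 gain was than the Year 1 gain.
        "across": round(gain_year2 - gain_year1, 3),
    }

for level, row in summary.items():
    print(level, row)
```

Every class level shows a larger gain in Year 2 than in Year 1, with the Honors students showing the largest across-year difference.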

Figure 4.

Changes from Year 1 to Year 2 across level of course

Assessing knowledge growth across the Cell Unit

The first part of the report on assessment within the modules draws on analysis of the embedded items. The forced-choice items were scored by the software while the open-ended items were scored using rubrics that were developed and validated by scientists, teachers and science educators. One problem that became evident was that the difficulty level of each embedded item was confounded with its placement within the module. In other words, although we seek to understand the growth of knowledge across the module using the embedded items, a thorough understanding will require detailed qualitative analysis wherein the nature of the students’ open-ended responses can be analyzed for changes across the modules. And yet, there is information within the quantitative analysis that illuminates distinctions between students who made the transition from the low to the high achieving group and those who did not. The following analysis will use only the quantitative outcomes from that scoring. Future research reports will detail qualitative analysis of the embedded items.

As was reported earlier, the LTA showed that in Year 2, 67% of the students who were initially in the low achieving latent group transitioned to the high achieving latent group based on the pre- to post-test analysis. This analysis begins with an examination of contrasts between initially low achieving students that remained in that group as compared to those making the transition to the high achieving latent group. Later, these contrasts will be used to examine differences in responses to the post-test items.

Examining transition group response differences using embedded and post-test items

Due to space limitations, all examples in this section will be drawn from the osmosis module. Student responses on three example embedded items with open-ended responses will serve to illustrate how the students that transitioned from the low to high achieving groups distinguished themselves from the students that remained in the low achieving latent group. The first example, which is the twelfth embedded item, is located near the beginning of the case study. It follows items embedded in the “seizure manual.” This item was worded as follows: “Using the sodium concentration data you collected, and what you learned from the seizure manual, [answer the following] why are free water molecules diffusing in the direction they are?” The item was scored on a three point scale, where one point was awarded for “connecting the sodium concentration data to the free water”, one point was awarded for “identifying differences in free water concentration between the blood and the brain”, and one point was awarded for “indicating that water diffused into the brain [matrix].” In Table 6, the percentages of students within the low to high achieving latent group and those that remained in the low achieving latent group are shown for each score category. Although there was a large portion of students ultimately placed in the low to high achieving latent group who scored 0 points (51.0%), far more of the students that remained in the low achieving latent group (80.3%) scored 0 points. Almost all of the other students (47.7%) who were in the low to high achieving latent group scored 1 point. Virtually all of the other students that remained in the low achieving latent group (19.1%) also scored 1 point, yet there is a contrast to be made because of the magnitude of the difference (i.e., 28.6 percentage points).

Table 6.

Contrast of responses to selected embedded items based on LTA student group membership

| Embedded Item Number | Score range for the item | Percentage of low achieving students who transitioned to high achieving in each score category | Percentage of low achieving retained in low achieving in each score category |
|---|---|---|---|
| Item 12: Using the sodium concentration data you collected, and what you learned from the seizure manual, [answer the following question] why are free water molecules diffusing in the direction they are? | 0 | 51.0 | 80.3 |
| | 1 | 47.7 | 19.1 |
| | 2 | 0.0 | 1.4 |
| | 3 | 1.3 | 0.0 |
| Item 30: Based on what you have learned, summarize the relationship between solute concentrations on opposite side of a semi-permeable membrane and the direction of movement of free water molecules. | 0 | 35.0 | 66.7 |
| | 1 | 12.9 | 14.0 |
| | 2 | 24.3 | 8.8 |
| | 3 | 15.7 | 8.8 |
| | 4 | 8.6 | 0 |
| | 5 | 3.6 | 1.8 |
| Item 34: Describe how osmosis was used to stop Clark’s seizures. | 0 | 37.0 | 70.8 |
| | 1 | 10.9 | 6.3 |
| | 2 | 15.2 | 16.7 |
| | 3 | 2.2 | 0.0 |
| | 4 | 16.3 | 6.3 |
| | 5 | 3.3 | 0.0 |
| | 6 | 8.7 | 0.0 |
| | 7 | 0.0 | 0.0 |
| | 8 | 6.5 | 0.0 |

Moving further into the module, the contrast becomes more sharply defined than in the example above. One item near the conclusion of the module (item 30) asks the students to summarize how solute concentration differences are related to water movement across a semipermeable membrane. Point values identified by the rubric were as follows: up to two points for a component of their answer connecting the solute concentration to free water concentration; up to two points for identifying the water concentration on each side of the membrane as a part of determining in which direction the net movement of water will be; and one point for mentioning concentration gradient. The results of this analysis are shown in Table 6 and demonstrate that (1) the proportion of students in the low to high achieving latent group who received points was dramatically greater than for those students who remained in the low achieving latent group, and (2) the distribution of the scores across the range of possible scores showed that the students who were in the low to high achieving latent group were represented in much greater proportion in the higher scores.

Although not entirely compelling by itself, analysis of embedded item 34 points to one additional factor, the role of stimulated recall of the subject of the case study. This item asked students to make the connection between osmosis and the reason that the treatment stopped the seizures being experienced by the calf. As in other examples, large numbers of students for both groups scored 0 points, though the percentage is nearly doubled by those students that remained in the low achieving latent group (i.e., 37% vs. 70.8%, respectively). Likewise, scores were spread across the range of possible scores much more for students in the low to high achieving latent group than for their peers in the low to low achieving latent group. The highest score categories show the most dramatic distinctions. Of those students who transitioned from the low to high achieving groups, 37% scored 3 or more points compared to just 6.3% of the students that remained in the low achieving latent group. None of those students that remained in the low achieving latent group scored 5 or more points compared with 18.5% of the students who transitioned from low to high achieving latent groups. In the Discussion of the Results (below), reasons for these differences will be considered.
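The cumulative percentages quoted for item 34 follow from summing the score-category percentages in Table 6. The sketch below shows that calculation; the variable and function names are illustrative only.

```python
# Percentage of students in each score category (0 through 8) for item 34, from Table 6.
item34 = {
    "low_to_high": [37.0, 10.9, 15.2, 2.2, 16.3, 3.3, 8.7, 0.0, 6.5],
    "low_to_low": [70.8, 6.3, 16.7, 0.0, 6.3, 0.0, 0.0, 0.0, 0.0],
}


def percent_at_or_above(distribution, min_score):
    """Sum the percentages for all score categories at or above min_score."""
    return round(sum(distribution[min_score:]), 1)


for group, distribution in item34.items():
    three_plus = percent_at_or_above(distribution, 3)
    five_plus = percent_at_or_above(distribution, 5)
    print(f"{group}: {three_plus}% scored 3 or more, {five_plus}% scored 5 or more")
```

The same function can be applied to the item 12 and item 30 rows of Table 6 to compare the two groups at any score threshold.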

The post-test responses were also a source of data for exploring the characterization of the low to high achieving group of students as compared to those that remained in the low achieving latent group. Stimulated recall related to the calf at the center of the case study is an important factor. As stated earlier, in addition to the Form A and B items on the pre- and post-tests, three additional items regarding the osmosis case study were used to assess (only on the post-test) module-specific student knowledge. One of the three Form C items does not serve a discriminating role for separating the students that remained in the low achieving latent group from those who transitioned from the low to the high achieving group. This item queries student knowledge related to a dynamic visualization from the initial portion of the module and asks students to identify what happens to a red blood cell that is placed in a saline solution. Of the students that remained in the low achieving latent group, 33% responded correctly as compared to 39% of the low to high achieving students. However, responses on the two other Form C items showed a powerful capacity to differentiate the students who made the low to high achieving transition from the students that remained in the low achieving latent group. Each of these assessment items referenced the calf by name and called on the students, after reading a narrative description of a situation, to match the direction of movement of water to a change in interstitial pressure. In one item, an increase in matrix pressure was associated with the onset of the calf’s cerebral edema; in the other, a decrease in pressure was associated with the remedy of the condition. This direction of water movement was an important aspect of the visualization within the module, but more importantly, the students’ recall of those visualizations was contextualized within the case study of the calf. The ability of these two assessment items to discriminate between the low to high achieving and low to low achieving groups is quite dramatic. In each case, 100% of the CP and CP-Collaborative students who made the low to high achieving transition responded with the correct answer compared to 39% and 11% correct, respectively, for the CP and CP-Collaborative students that remained in the low achieving latent group on the post-test.

Discussion of Results

This research examined how interactive case study modules featuring dynamic visualization support student learning in biology. Four major findings will be highlighted in this discussion: (1) the proportion of students as identified by LTA who transitioned to the high achieving group from the low achieving group was significantly higher when the interactive case study modules were used as an intervention; (2) students across the biology class levels were found to have made the low to high achieving transition, thus validating the efforts to create materials to support science learning for all students; (3) the consonance of the modules’ structure with the contemporary scholarship regarding supports for student knowledge growth is an important reason for the student accomplishment; and (4) the use of contextualized case studies is a primary means to support and identify student knowledge growth.

The LTA-MixRM model showed that two identifiable latent groups emerged as a product of analysis of the pre- and post-test assessment scores. These groups were labeled as low and high achieving due to their statistical characterizations. Most importantly, after experiencing the intervention, 67% of those students who had been initially grouped in the low achieving latent group had transitioned to high achieving and 84% of those students who were initially high achieving remained in that latent group.
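The transition percentages reported above are row-normalized proportions: of the students in a given latent group on the pre-test, what fraction ended in each group on the post-test. The sketch below illustrates only that final tabulation step and assumes the group memberships have already been estimated by the model. The function name and the six memberships are hypothetical, not study data.

```python
from collections import Counter


def transition_proportions(pre_groups, post_groups):
    """Proportion of each pre-test group that ends in each post-test group."""
    pair_counts = Counter(zip(pre_groups, post_groups))
    pre_totals = Counter(pre_groups)
    return {
        (pre, post): count / pre_totals[pre]
        for (pre, post), count in pair_counts.items()
    }


# Hypothetical latent group memberships for six students (illustrative only).
pre = ["low", "low", "low", "low", "high", "high"]
post = ["high", "high", "high", "low", "high", "low"]

props = transition_proportions(pre, post)
print(props[("low", "high")])   # share of initially low students who moved up
print(props[("high", "high")])  # share of initially high students who stayed
```

Each of the four pathways (low to low, low to high, high to low, high to high) appears as one key in the result, and the proportions for a given starting group sum to one.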

The statistical interaction between biology class level and latent transition pathway was not significant in the MANOVA. This is a particularly important finding as it supports the assertion that students in each latent transition pathway were present in all biology class levels. This was also a validating finding for the team given that our goal was to create materials that would be supportive of the biology learning of all students. Yet within this assertion, the descriptive statistics also illustrated how the relative frequency of students appearing in each of the transition pathways was appropriate and expected for each biology class level. For instance, although the CP-Collaborative students are represented across each of the four latent transition pathways, the number of students from that biology class level is small in the initially high achieving group. Given that this class level is taught as special education inclusion, this result was expected. Likewise, students from the Gifted sections, which require high aptitude scores as entry criteria, made up a much higher proportion of the high achieving latent group on the pre-test.

The finding that latent transition pathway was a significant main effect led to further analyses to understand how this group of learners who transitioned between low and high achieving groups could be characterized. One aspect of this characterization can be seen in pre- to post-test score changes. The students who were initially low achieving and transitioned to high achieving made the greatest mean gain in score. Complementing this finding is the related result that the group of students who were initially low achieving were identified as having the lowest average prior knowledge of the biology content. As Cook et al. (2008) explained, students with low prior knowledge have more difficulty creating linkages between forms of representation. Based on results of this study, we believe that a large portion of these students were successful in becoming high achieving as a result of the organization of the multiple representations within the modules. Specifically, we believe that the way the modules were constructed ameliorated many of the sources of student difficulty as described by Eilam (2013). For instance, by virtue of the iterative design process in which a version of the module was tested with students as soon as a reliably working version was available, we were able to understand how to support students who possessed low prior knowledge. The “manuals” added to the beginning of each module as well as the medical reference dictionaries were examples of design compensations that were enacted to support this very issue. Further, we were highly aware of the problems that could arise from “contextual characteristics” and spent a great deal of time fine-tuning how the case studies were illustrated and how complementary representations of the subject matter were included and sequenced.
All of these efforts were directed toward the creation of materials that would support deep student learning (Chin and Brown, 2000) that would be evidenced on the embedded items, on the post-tests, and on other assessments (both formal and informal) used by the teachers.

Our results provide strong support for the assertion by Kang and her colleagues (2014) that the use of “contextualized phenomena is the strongest single predictor of the quality of student explanation” (p. 695). In these student-generated responses we sought to find a means to understand how the modules supported the growth of student knowledge. Our analyses demonstrated that the responses to contextualized items, within both the post-test and embedded assessments, provided the most powerful insight into which aspects of the modules were associated with the transition to the high achieving latent group. Understanding more about the group of students who made that transition became the focus of an intense effort.

This effort led the team to several realizations. First, the use of the case study approach in which a person or animal was afflicted with a medical condition created a context with which students could easily identify. Many of the students in the first class to use the osmosis module developed a deep attachment to the calf and his medical issues. We observed this in every subsequent usage of that module. We assert that this connection between the students and the animal served to make the module more meaningful and thus more easily remembered. Second, the students who made the low to high transition were often the students who most fervently made this connection. Evidence from the analysis of the Form C items related to the osmosis case clearly points to recognition of case study-specific knowledge as a major factor in item response differences. On two of those Form C items, 100% of the CP and CP-Collaborative students who made the low to high achieving transition responded correctly, greatly outpacing the students who remained in the low achieving latent group.

Within the analysis of responses to embedded items, the quantitative distinction between the students who transitioned to high achieving as compared to those who remained in the low achieving latent group was strongly linked to the students’ ability to respond to items that posed an open-ended query. Each of the three embedded items presented in the results from the osmosis module called on students to do this. The item with the greatest ability to illuminate a distinction between the students who transitioned to the high achieving group and those who remained in the low achieving group required students to connect the process of osmosis to reasons why the treatment stopped the seizures experienced by the calf. This item brought together all aspects of the module including students’ awareness of the visualization, hypothesis making, data collection, data analysis, and application of treatments. The greatest importance of this finding arises from the item’s challenge to the students to elaborate the linkage between the scientific concept of interest, osmosis, and the real-life understanding of the medical condition from which the calf was suffering. It is the accomplishment of these students who made the transition to high achieving and who were able to generate the rich explanatory accounts that the work of Kang et al. (2014), Ainsworth and VanLabeke (2004), Eilam (2013) and others pointed to as possible when the use of dynamic visualizations is combined with multiple representations within an inquiry learning environment.

Limitations

The greatest limitation imposed upon educational research arises from the need to implement the research within schools that have many other priorities. Although there are scholars who will argue this point, it is not prudent to attempt to create research designs that either disrupt the actions of the school day or impose restrictions on how teachers and schools make decisions about the education of children. If a researcher agrees to this assertion, then it is not always possible to have all class sections complete all aspects of an intervention or complete it in an optimal manner. In this study, the teachers and school administration were highly cooperative, but the intervention was not applied with equal fidelity to all of the participating students. As noted above, technical problems with the use of technology within the school site contributed to this issue.

Assessment of student knowledge is always an area for concern and especially when the students are told that the pre- and post-tests are not part of their graded assignments. However, the constraints of the IRB require the researchers to inform the students about this and the related requirement, on all other aspects of the project, that their cooperation and participation are voluntary. Assessment scores and the motivation for participation are, of course, impacted, but again, this research was carried out in a cooperative setting and that included the students.

The concern expressed by Alonzo and Steedle (2009) about the consistency of student responses across a number of open-ended items was very much on our minds as the students completed the embedded assessments within the modules. Our analysis suggests that the contextualized nature of those items has reduced the impact of this concern to a minimum.

Although the MANOVA demonstrated that level of class was a significant variable as linked to the score on distal assessments, we chose to group together the Gifted and Honors students, as well as the CP and CP-Collaborative students, in the final section of analysis and findings. We did this to maintain sample sizes within groups of students, who had made or failed to make the low to high achieving transition, for the purpose of comparing responses to assessment items. Without a sufficient sample size, there would be little to say about differences in response patterns. In the future, exploration of differences among responses across the four biology class levels may provide greater insight.

Conclusions

As Sadler et al. (2015) wrote, “Games designed for and implemented in classrooms necessarily become part of a broader curricular intervention…games become part of a curriculum woven through the complex environment of a classroom community” (p. 699) and this more integrated approach is a great improvement over “simpler articulations of learning as an outcome of game play” (p. 699). Studying this kind of longer term intervention is a highly complex endeavor and is subject to the vagaries of the public school day. Nonetheless, it is important to try to understand realistic interventions. Therefore, over the course of two years, the introductory biology students of the same six teachers were studied with regard to their learning of a curricular unit on “the Cell.” This type of curricular unit is found in essentially all introductory biology courses and standards. Educational researchers have a powerful incentive to understand which factors of the intervention are related to the students’ higher level of achievement. As we created these immersive case study-based modules, we attempted to enact the design so that the intersection of dynamic visualizations with the students’ expression of language around the science concepts would powerfully and meaningfully impact students. We now believe that the students’ rich explanatory accounts are a product of this impact.

One of the age old issues in science education is how to encourage students to make connections between in-school learning and the lives they lead out of school and on into their adulthood. So often, student science learning that happens in a formal classroom is not seen by the learner to be applicable to any out-of-school context. Resnick (1987) stated it this way: “In school, however, symbolic activities tend to become detached from any meaningful context” (p. 15). The case study modules that were created within this project seem to have moved a step in the right direction toward resolution of this problem.

However, in this day of constant high stakes testing, any effort to enact highly creative instructional supplement materials can be met with skepticism due to the demand that all efforts in the classroom be directed toward the learning that is absolutely identifiable as related to scores on those tests. We created materials in sync with Blanchard et al. (2010), who found that inquiry science activities can also promote the “recognizable” student achievement assessed by the high stakes tests. Evidence presented in this research report supports the notion that this is also happening with the use of the modules. The learning that happened during the modules, as recorded and evidenced by the “close” embedded assessments, was retained and displayed on the distal assessments. These results show a predictive relationship between student accomplishment on embedded items and subsequent membership in the low to high achieving transition group.

Finally, we hoped that with this project we could address the issues that Odum (1995) examined with his study of student learning of the same biological concepts addressed by our modules (i.e., osmosis, diffusion, etc.). He wrote: “an important outcome of this research study is [understanding of] the extent to which biology concepts in diffusion and osmosis are not being comprehended by students from secondary and university campuses. Based on the results of this study, biology teachers appeared not to be teaching for comprehension of diffusion and osmosis concepts, but rather for emphasizing the acquisition of facts” (p. 413). The work reported here, in both module development and research, provides significant reasons to believe that this project has moved the field forward.

Acknowledgments

This research has been funded by the National Institutes of Health grant #1R25RR025061, but should not be construed to represent the opinions or positions of the National Institutes of Health (NIH) nor the Science Education Partnership Awards (SEPA) program. Responsibility for the content of this document rests entirely with the authors.

References

  1. Ainsworth S. The functions of multiple representations. Computers & Education. 1999;33:131–152.
  2. Ainsworth S, VanLabeke N. Multiple forms of dynamic representation. Learning and Instruction. 2004;14:241–255.
  3. Author. (2005).
  4. Alonzo AC, Steedle JT. Developing and assessing a force and motion learning progression. Science Education. 2009;93:389–421.
  5. Andersen L, Ward TJ. Expectancy-value models for the STEM persistence plans of ninth-grade, high-ability students: A comparison between Black, Hispanic, and White students. Science Education. 2014;98(2):216–242.
  6. Blanchard MR, Southerland SA, Osborne JW, Sampson VD, Annetta LA, Granger EM. Is inquiry possible in light of accountability?: A quantitative comparison of the relative effectiveness of guided inquiry and verification laboratory instruction. Science Education. 2010;94(4):577–616.
  7. Chin C, Brown DE. Learning in science: A comparison of deep and surface approaches. Journal of Research in Science Teaching. 2000;37(2):109–138.
  8. Cho SJ, Cohen AS, Kim SH, Bottge BA. Latent transition analysis with a mixture item response theory measurement model. Applied Psychological Measurement. 2010;34:583–604.
  9. Cook MP. Visual representations in science education: The influence of prior knowledge and cognitive load theory on instructional design principles. Science Education. 2006;90:1073–1091.
  10. Cook MP, Weibe EN, Carter G. The influence of prior knowledge on viewing and interpreting graphics with macroscopic and molecular representations. Science Education. 2008;92(5):848–867.
  11. Cook TD, Campbell DT. Quasi-Experimentation: Design & Analysis Issues for Field Settings. New York: Houghton Mifflin; 1979.
  12. Donnelly DF, Linn MC, Ludvigsen S. Impacts and characteristics of computer-based science inquiry learning environments for precollege students. Review of Educational Research. 2014;84(4):572–608.
  13. Eilam B. Possible constraints of visualization in biology: Challenges in learning with multiple representations. In: Treagust DF, Tsui CY, editors. Multiple Representations in Biological Education. Dordrecht: Springer; 2013. pp. 55–74.
  14. Fisher KM, Williams KS, Lineback JE. Osmosis and diffusion conceptual assessment. CBE Life Science Education. 2011;10(4):418–429. doi: 10.1187/cbe.11-04-0038.
  15. Fulmer GW, Liang LL, Liu X. Applying a force and motion learning progression over an extended time span using the force concept inventory. International Journal of Science Education. 2014;36(17):2918–2936.
  16. Furtak EM, Morrison D, Kroog H. Investigating the link between learning progressions and classroom assessment. Science Education. 2014;98:640–673.
  17. Hickey DT, Taasoobhirazi G, Cross D. Assessment as learning: Enhancing discourse, understanding, and achievement in innovative science curricula. Journal of Research in Science Teaching. 2012;49(10):1240–1270.
  18. Kang H, Thompson J, Windschitl M. Creating opportunities for students to show what they know: The role of scaffolding in assessment tasks. Science Education. 2014;98:674–704.
  19. Li F, Cohen AS, Kim SH, Cho SJ. Model selection methods for dichotomous mixture IRT models. Applied Psychological Measurement. 2009;33:353–373.
  20. McElhaney KW, Chang HY, Chiu JL, Linn MC. Evidence for effective uses of dynamic visualisations in science curriculum materials. Studies in Science Education. 2015;51(1):49–85.
  21. Malinska L, Rybska E, Sobieszczuk-Nowicka E, Adamiec M. Teaching about water relations in plant cells: An uneasy struggle. CBE-Life Sciences Education. 2016;15:1–12. doi: 10.1187/cbe.15-05-0113. ar78.
  22. Marco GL. Item characteristic curve solutions to three intractable testing problems. Journal of Educational Measurement. 1977;14:139–160.
  23. Maxwell JA. Qualitative research design: An interactive approach. Los Angeles: Sage Publications Ltd; 2013.
  24. National Research Council. Preparing teachers: Building evidence for sound policy. Washington, DC: The National Academies Press; 2010.
  25. NGSS Lead States. Next Generation Science Standards: For States, By States. 2013 Downloaded from: http://www.nextgenscience.org/ September 7, 2015.
  26. Odum AL. Secondary & College Biology Students’ Misconceptions about Diffusion & Osmosis. The American Biology Teacher. 1995;57(7):409–415.
  27. Oztas F. How do high school students know diffusion and osmosis? High school students’ difficulties in understanding diffusion and osmosis. Procedia – Social and Behavioral Science. 2014;116:3679–3682.
  28. Patton MQ. Qualitative Research & Evaluation Methods. 4th. Los Angeles: Sage; 2015.
  29. Pine SM. Applications of item characteristic curve theory to the problem of test bias. In: Weiss DJ, editor. Applications of computerized adaptive testing: Proceedings of a symposium presented at the 18th annual convention of the Military Testing Association. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program; 1977. pp. 37–43. (Research Rep. No. 77-1).
  30. Resnick L. Learning in school and out. Educational Researcher. 1987;16(9):13–20. 54.
  31. Ruiz-Primo MA, Li Min, Wills K, Giamellaro M, Lan MC, Mason H, Sands D. Developing and evaluating instructionally sensitive assessments in science. Journal of Research in Science Teaching. 2012;49(6):691–712.
  32. Ryoo K, Linn MC. Can dynamic visualizations improve middle school students’ understanding of energy in photosynthesis? Journal of Research in Science Teaching. 2012;49(2):218–243.
  33. Sadler TD, Romine WL, Menon D, Ferdig RE, Annetta L. Learning biology through innovative curricula: A comparison of game and nongame-based approaches. Science Education. 2015;99(4):696–720.
  34. Schwartz G. Estimating the dimension of a model. Annals of Statistics. 1978;6:461–464.
  35. Shen J, Linn MC. A technology-enhanced unit of modeling static electricity: Integrating scientific explanations and everyday observations. International Journal of Science Education. 2011;33(12):159–1623.
  36. Smetana LK, Bell RL. Computer Simulations to Support Science Instruction and Learning: A critical review of the literature. International Journal of Science Education. 2012;34(9):1337–1370.
  37. Waight N, Liu X, Gregorius R, Smith E, Park M. Teacher Conceptions and Approaches Associated with an Immersive Instructional Implementation of Computer-Based Models and Assessment in a Secondary Chemistry Classroom. International Journal of Science Education. 2014;36(3):467–505.
  38. Yarden H, Yarden A. Learning using dynamic and static visualizations: Students’ comprehension, prior knowledge and conceptual status of a biotechnological method. Research in Science Education. 2010;40:375–402.
  39. Yarden H, Yarden A. Learning and teaching biotechnological methods using animations. In: Treagust DF, Tsui C-Y, editors. Multiple representations in biological education. Amsterdam: Springer Science + Business Media; 2013. pp. 93–108.
