CBE Life Sciences Education. 2026 Spring 1;25(1):ar10. doi: 10.1187/cbe.25-08-0172

VENOMventure, an Immersive Escape-Style Game, Teaches Families the Foundations of Phylogenetics

Anastasia Thanukos†,*, Teresa MacDonald, Claire Quimby§, Lisa D White
Editor: Nicole C Kelp
PMCID: PMC12974113  PMID: 41770893

Abstract

Phylogenetics is an essential component of science literacy, but research demonstrates the challenges of interpreting evolutionary trees. While a variety of classroom interventions have been shown to be effective in this field, the potential of educational games is underexplored. We investigate the efficacy of an escape-style game for teaching tree-thinking skills. VENOMventure immerses English- and Spanish-speaking families with kids ages 8 years and up in a biomedical mystery during a 30–45-min game. Participants (N = 466) at two natural history museums, one urban library, and one rural library played VENOMventure and took part in research that assessed learning through a pre-test, post-test, and 4-week follow-up test. Players of all ages, from rural and urban settings, demonstrated significant learning gains, which persisted for at least 4 weeks. Groups with child-led or balanced puzzle-solving styles showed larger overall learning gains than groups with adult-led play. Furthermore, the experience was perceived as fun and memorable, and led to extended interactions with the science concepts from the game. This research provides insight into the variety of interventions that support phylogenetics learning and represents a rare case in which an escape-style game purported to be educational has generated robust evidence supporting that claim.

INTRODUCTION

Phylogenetics, the study of the evolutionary relationships among living things, is critical to an understanding of modern biology (Baum et al., 2005; Donoghue, 2005; Brooks, 2010; Wiley and Lieberman, 2011). Evolutionary relationships, which are represented as evolutionary trees, or phylogenies, provide insight into the diversity and unity of life at a range of scales, from variation in single nucleotide polymorphisms to patterns of diversification and extinction, as well as the processes that underlie these patterns (Glor, 2010; Stadler, 2013; Leaché and Oaks, 2017; Morlon et al., 2024). Phylogenetics is increasingly relevant to everyday life because of its applications in forensics, agriculture, conservation, medicine, and the policies that govern these (Scaduto et al., 2010; Thanukos, 2010; Naxerova and Jain, 2015; Pellens and Grandcolas, 2016; Tucker et al., 2016; Zhukova et al., 2017; Janies, 2019; Pekar et al., 2022; Menalled et al., 2023). Its diverse applications also mean that an understanding of phylogenetics can be an important aspect of STEM career preparedness, particularly in bioinformatics (Jungck and Weisstein, 2013; Kovarik et al., 2013). Understanding phylogenies may also contribute to the acceptance of evolutionary theory (Walter et al., 2013; Gibson and Hoefnagels, 2015). Today, evolutionary trees are found in textbooks (Catley and Novick, 2008), the media (Wade, 2017), popular science books (Tweet, 2016), and exhibits at museums and other informal science institutions (ISIs; MacDonald and Wiley, 2012), reflecting their importance in communicating biology.

Research shows that basic concepts in phylogenetics, such as common ancestry, can be difficult for all age groups to grasp (Evans et al., 2010; Catley et al., 2012; Evans et al., 2012; Spiegel et al., 2012; Dees et al., 2017), and investigation of learners’ ability to interpret evolutionary trees, a skill set known as “tree thinking” (Baum et al., 2005), often reveals misconceptions, even among those taking upper-level college biology courses (Meir et al., 2007; Omland et al., 2008; Sandvik, 2008; Halverson et al., 2011; Phillips et al., 2012; Kummer et al., 2016). Nevertheless, research has shown that children as young as 7 years old are capable of understanding basic phylogenetic concepts (Ainsworth and Saffer, 2013), and learners’ grasp of phylogenetics can be improved through appropriately designed interventions, including museum exhibits, classroom instruction, simulations, and table-top interactives (Perry et al., 2008; Giusti, 2012; Schneider et al., 2012; Eddy et al., 2013; Horn et al., 2016; Daniel et al., 2024).

Educational games engage learners and foster understanding (Bochennek et al., 2007; Burney et al., 2010; National Research Council, 2011; Porcello et al., 2017), but they are an underexplored approach to teaching phylogenetics. Games have been shown to increase knowledge acquisition, conceptual understanding, and motivation (Connolly et al., 2012). Much of the research on educational games has focused on digital games (Connolly et al., 2012; Clark et al., 2016; Hainey et al., 2016; Lamb et al., 2018; Mayer, 2019), but physical games such as board games and card games also have educational value (Bayeck, 2020; Sousa et al., 2023). While some tabletop games have been developed to build students’ tree-thinking skills, they have not, to our knowledge, been formally evaluated for effectiveness (Gibson and Cooper, 2017; Marcy, 2023).

Compared with the extensive research on educational digital games, limited educational research has been performed on immersive, physical games, such as escape games. Escape games bring multiple players into a hypothetical storyline and a real game environment, where they interact with objects and each other to solve puzzles and achieve a goal (Wiemker et al., 2015; Nicholson, 2016). Escape rooms are typically stand-alone businesses in which teams solve a series of puzzles in a specially constructed space. However, they can also be created with a set of materials and props to be used in any space. Escape games are mainly designed as entertainment, but recent versions have experimented with the educational possibilities of this game format.

Content-rich interactions among ISI visitor groups, as immersive games can initiate, have been shown to support STEM learning in informal environments (Andre et al., 2017), and several elements of such games have the potential to foster learning. At the most basic level, escape games provide motivation for players to actively engage with puzzles in a problem-solving mode. Active learning strategies have been found to be more effective at teaching STEM concepts than passive strategies at the college level (Freeman et al., 2014); best practices for K-12 STEM teaching emphasize active student engagement with “meaningful investigation, problem solving, and/or design experiences” (National Academies of Sciences, Engineering, and Medicine, 2025) and hands-on/minds-on activities at the elementary level (National Academies of Sciences, Engineering, and Medicine, 2023); and informal STEM learning environments like museums have long emphasized interactivity as a key design feature supporting learning and engagement (National Research Council, 2009).

While an educational escape game is a unique format for active engagement, it bears important similarities to other established and effective pedagogical approaches. For example, clicker case studies engage students with a narrative interspersed with prompts, which request that students express an opinion or intuition, apply a concept, or solve a problem (Beatty et al., 2006; Caldwell, 2007; Serrada-Sotil et al., 2025). Similarly, an escape game engages participants with a narrative, and players must solve a series of problems as they progress in the narrative. Peer instruction (Crouch and Mazur, 2001) and cooperative or collaborative learning techniques (O'Donnell and Hmelo-Silver, 2013) like think-pair-share (Lyman, 1981; Tanner, 2009) offer learners the opportunity to try to answer a problem based on their previous learning and intuition, and then revise their thinking based on discussion with another learner who may bring different intuitions and information to the table. Similarly, in an escape game, different players bring different intuitions and information to each puzzle, and correctly solving a puzzle often involves players sharing that knowledge with others and then revising their ideas about the solution. Inquiry-based teaching (Pedaste et al., 2015) engages students with a question that they investigate with the tools, resources, and research methods available, often by using a testing procedure to collect data. Similarly, in an escape game, players are presented with puzzles that they must solve using the tools and resources available in the game, often testing different possible interpretations of the puzzle and incorporating that immediate feedback on their way to the solution. Escape games may also leverage worked examples, scaffolding, and productive struggle, all of which have analogues in the classroom (Atkinson et al., 2000; Hmelo-Silver et al., 2007; Young et al., 2024). 
Such embedded opportunities to support learning have been recognized by those who study educational escape rooms (Veldkamp et al., 2020; Rawlinson and Whitton, 2024; Yachin and Barak, 2024).

The dimensionality of the immersive game (i.e., its engagement of a physical space and use of physical props) may also contribute to learning. This is consistent with evidence that digital games that allow exploration of a three-dimensional realistic universe (a mere representation of the actual three-dimensional environment leveraged by an immersive game) may be more effective than two-dimensional games (Lamb et al., 2018), and that virtual reality-based digital games, which have many similarities to physical immersive games, can be effective teaching tools (Merchant et al., 2014). This may be especially important for learning in the domain of phylogenetics, as evidence suggests that engaging with phylogenies kinesthetically and physically (not just visually), as is encouraged by an immersive game with touchable props, fosters understanding (Halverson, 2010; Laurentino et al., 2024).

Finally, the narrative of an escape game offers an opportunity to motivate learning about phylogenetics, as it allows the real-world applications of the scientific concepts to be communicated. Previous research has found that learners are more interested in science when its implications are connected to their lives (Hulleman and Harackiewicz, 2009; Ainley and Ainley, 2011; National Foundation for Educational Research [NFER], 2011), and this is thought to be an important part of increasing motivation to learn about evolution (Thanukos, 2010; Nelson, 2012; Infanti and Wiles, 2015; Pobiner, 2016).

The research that has been performed on educational escape-style games demonstrates the need for better documentation of learning to establish their educational value (Fotaris and Mastoras, 2019; Veldkamp et al., 2020; Lathwesen and Belova, 2021). Evaluation of escape games’ use in formal learning environments often focuses on engagement, interest, and motivation, with less attention to learning (e.g., Borrego et al., 2017; Glavas and Stascik, 2017). For example, a recent meta-analysis examined 39 studies on educational escape rooms in formal settings, and only three actually assessed learning outcomes through pre/post-tests (Veldkamp et al., 2020). Another review focusing on STEM-oriented games identified a similarly small number of escape games with direct assessment of learning outcomes (Lathwesen and Belova, 2021). In fact, escape games in formal learning environments are often used as an assessment of learning that takes place through more traditional means (e.g., as a capstone activity at the end of a unit) and may not be designed to build understanding through gameplay (Veldkamp et al., 2020). It should also be noted that these games are often implemented in classrooms using basic props unrelated to the game narrative (e.g., a standard set of locked boxes) due to constraints on resources. Because of this, classroom-based escape games may not be particularly exemplary of the genre of immersive games, which often deploy custom-built props in a game-specific environment, all designed to support the narrative.

ISIs, such as museums, often have the resources and facilities to create more immersive escape game experiences, but research on the educational value of these ISI-based games is similarly limited. Only a few escape-style games for informal settings aim to develop STEM knowledge as a primary goal. These include BranchOut's outdoor environmental education escape games (BranchOut, 2020), NISE's Moon Adventure game kit (Lussenhop et al., 2023), ENIGMA Bio at the National Museum of Natural Sciences in Madrid (a hybrid digital/immersive game on climate change and biodiversity; Gonzalez-Calero et al., 2024), Infestation at the Science Museum of Minnesota (a narrative-driven interactive theater, game, and puzzle room experience; Pryor, 2019; Pryor et al., 2021), and the Natural History Mystery backpack escape game at the University of Kansas Natural History Museum (MacDonald and Thanukos, 2023). Documentation of learning outcomes varies across these projects. Moon Adventure’s evaluation focused on perceptions of ISI staff and self-reports of knowledge level and learning (Lussenhop et al., 2023). ENIGMA Bio’s evaluation used pre/post testing, among other lines of evidence, but attribution of learning outcomes to the game was confounded by the inclusion of a 30-min didactic lecture on the content (Gonzalez-Calero et al., 2024). The available evaluation for Infestation focuses only on game development and description (Pryor et al., 2021). Of particular relevance, Natural History Mystery was designed to teach about evolutionary relationships. Its evaluation included a post-only test and suggested that the game was an effective pedagogical intervention, but was limited by a small sample size (MacDonald and Thanukos, 2023). All these games show promise for teaching STEM concepts, but the evidence still falls short of delineating exactly what and how much visitors are learning through them.

Based on the literature described above, we hypothesized that an appropriately designed immersive, escape-style game can teach fundamental STEM concepts related to phylogenetics and tree thinking in informal settings. To investigate, we designed VENOMventure, an immersive, bilingual (English/Spanish) game that pops up at rural and urban public libraries and natural history museums. It engages families with youth ages 8 years and up in a narrative- and puzzle-based adventure laden with foundational concepts in phylogenetics. The game takes 20–45 min to play. Importantly, the game involves no didactic introduction or pre-reading, allowing examination of the effects of gameplay itself on learning in this domain. We collected data relevant to our hypothesis through quantitative pre/post and longitudinal data from four geographically disparate sites. A fundamental question addressed by this research is whether this sort of immersive game is effective for a wide range of participants. Hence, we use the data to investigate select subsidiary hypotheses (described below) regarding learning among different subgroups of participants, with particular focus on subgroup characteristics relevant to the design of a traveling immersive game for families (i.e., characteristics of geographic site, age group, group size, collaborative game play style, and previous escape game experience).

METHODS

Study design, recruitment, and participants

The study collected data at four sites: one urban library in California, one rural library in California, one urban natural history museum in California, and one urban natural history museum in Kansas. Participants were recruited in a variety of ways, including through the listservs, webpages, and social media pages of host sites, as well as in local media outlets that aggregate family activities. Our recruitment materials targeted families. Furthermore, because of our requirements for minors to participate in the research, all groups that included a minor also included at least one immediate family member or guardian of that minor. Hence, we shall sometimes refer to participating groups as families, since the vast majority included at least one child and their guardian or parent, as described below. The host sites and research team made efforts to recruit Spanish-speaking participants (e.g., by distributing Spanish-language recruitment flyers at a bilingual elementary school).

Data collection took place in two phases. Phase 1 recruitment sought groups with at least one adult and at least one child in the target age range of 9–13 years. Groups were selected to represent a range of group sizes and backgrounds. Phase 1 used a mixed methods approach that incorporated surveys, interviews, observations, and a reflection banner where participants could write their ideas in an open-ended format (Figure 1). A total of 51 groups and 174 individuals took part in Phase 1. Of those, 48 groups included at least one child in the target age range. Phase 1 participants also received a follow-up survey by email and text message to study the long-term effects of the game. This survey was distributed one month after participation in VENOMventure. In total, 66 individuals, ∼39% of the sample, responded to the follow-up survey. Phase 2 recruitment targeted groups with at least one minor but did not require the participation of a minor. Groups were not selected to represent a range of group sizes and backgrounds. Phase 2 data collection included only the pre/post survey, administered by host site staff when the research team could not be present (Figure 1). A total of 84 groups and 292 individuals took part in Phase 2. A total of 75 of these groups included at least one minor. Overall, the study included a total of 466 individuals: 217 adults and 249 youth. Race/ethnicity reporting was optional, and 4.1% of groups did not report this information. Of those who did, 2% reported as American Indian or Alaska Native, 20% as Asian, 2% as Black or African American, 13% as Hispanic, 1% as Native Hawaiian or other Pacific Islander, 67% as White, and 3% as other. All protocols and instruments received prior approval from the Institutional Review Board of the University of California, Berkeley, under protocol number 2019-02-11879.
Results associated with qualitative data and affective components will be reported elsewhere (manuscript in preparation), but associated methods are summarized here to capture the full participant experience.

FIGURE 1:

Summary of study design, instruments, and sample sizes.

The intervention: VENOMventure

The VENOMventure experience begins with two simple warm-up activities to set the stage for participants. The first is a play on the classic TWISTER game, using a large mat that depicts an evolutionary tree of venomous snakes with shared ancestors labeled. A spinner provides instructions that require players to interpret the evolutionary relationships, for example, to identify the ancestor of two species on the tree and place their left hand on that ancestor. The activity does not provide didactic instruction on the meaning of the evolutionary tree but prepares participants for the trees they will encounter once the escape game begins. The second warm-up activity is an oral explanation of how escape games in general work and a demonstration of the combination locks used in the game. Participants then solve a sample puzzle involving basic mathematical facts and practice opening a lock.

The VENOMventure escape game takes place within an 11 × 22-foot inflatable “room” that resembles a mobile lab (Figure 2). The lab has an open top and is divided into two interior sections. Teams of two to six players enter the room and watch a short video that introduces their mission. A fantastical venomous plant is causing the community to break out in itchy purple splotches, and players must determine which antivenom will save the day by reconstructing the research of plant scientist Leticia Lopez. When the video stops, the countdown timer begins, and the team is left on their own to find and solve the game's puzzles, though they are provided with a briefcase of hints for use as needed. An attendant is available to ensure that the technology within the room works as intended.

FIGURE 2:

VENOMventure inflatable during playtesting at California Academy of Sciences.

The VENOMventure escape game is designed to teach basic tree-thinking skills (Baum et al., 2005). These skills do not require memorizing a set of facts or even understanding how evolutionary trees are reconstructed. Instead, tree-thinking involves understanding what features of a tree correspond to different aspects of the evolutionary process of lineage splitting, as well as the overall patterns this process generates. Because interpreting trees follows a relatively simple formal logic, it lends itself to instantiation in puzzles through which one can deduce this internal logic.
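That "relatively simple formal logic" can be made concrete with a short sketch. The following is a minimal illustration under assumed names (a parent-pointer tree of invented taxa; nothing here reproduces the game's actual materials): tracing a lineage back in time and locating the shared ancestor of two tips are both short walks toward the root.

```python
# Illustrative parent-pointer tree (hypothetical taxa, not from the game).
# Each node maps to its parent; the root maps to None.
PARENT = {
    "root": None,
    "ancestor_A": "root",
    "ancestor_B": "root",
    "kraken": "ancestor_A",
    "leviathan": "ancestor_A",
    "hydra": "ancestor_B",
}

def lineage(taxon):
    """Trace a lineage back in time: the path from a tip to the root."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path

def most_recent_common_ancestor(a, b):
    """The first node of b's lineage that also lies on a's lineage."""
    ancestors_of_a = set(lineage(a))
    for node in lineage(b):
        if node in ancestors_of_a:
            return node

print(lineage("kraken"))                               # ['kraken', 'ancestor_A', 'root']
print(most_recent_common_ancestor("kraken", "hydra"))  # root
```

Because every tree-reading question reduces to walks like these, a puzzle can present the question without didactic instruction and let players deduce the underlying logic from feedback on their answers.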

The VENOMventure escape game consists of seven tree-thinking puzzles (Figure 3; Quimby, 2024). Three of the puzzles are accessible at the start of the game and can be solved in any order. Solving these unlocks a sequence of four more puzzles. Solving each puzzle involves demonstrating the application of a target concept or tree-reading skill. As a team progresses through the puzzles, the required skills become more challenging and/or are required in combination with one another. The puzzles were designed, through formative testing with youth ages 9–13 years, to be challenging. Within the game, all evolutionary trees are presented in accordance with design principles informed by learning research on evolutionary tree interpretation in informal settings (Novick et al., 2014).

FIGURE 3:

Summary of VENOMventure puzzle structure and learning objectives.

Though the puzzles in the game are related to common misconceptions about evolutionary trees, the game does not target individual tree-thinking misconceptions with particular puzzles. Instead, the game starts by encouraging participants to consider the most basic concepts about evolutionary trees (e.g., the ways that they represent time and evolutionary history) and then aims to integrate these into a more sophisticated understanding of trees. The puzzles are fully described elsewhere (Quimby, 2024), but here we provide a summary:

  • The first puzzle most players encounter, the sea monster puzzle (Puzzle 1), presents an evolutionary tree of cephalopod sea monsters, with branches marked by letters, and asks players to trace the lineage of a particular terminal taxon back in time, yielding a five-letter code. Understanding the direction of time on a tree is foundational to the tree-thinking skills described elsewhere (Novick and Catley, 2017) and is closely related to common misinterpretations of evolutionary trees, such as the idea that some terminal taxa are older than others (Baum et al., 2005; Meir et al., 2007; Gregory, 2008; Dodick, 2009).

  • The dragon tree puzzle (Puzzle 2) presents an evolutionary tree of dragons with colored branches and asks participants to identify the branches that represent the shared ancestors of different groups of terminal taxa, yielding a three-color code. Recognizing that branches on a phylogenetic tree represent shared ancestral lineages is foundational to all tree-thinking.

  • The fantastical horse tree puzzle (Puzzle 3) presents players with a three-dimensional tree sculpture with rotatable branches representing the relationships among fantastical horses. The corresponding puzzle shows three pairs of fantastical horse evolutionary trees. One tree of each pair is topologically identical to the three-dimensional tree and can be identified by rotating branches on the three-dimensional tree. The puzzle asks players to identify which of the two-dimensional trees is “the same” as the three-dimensional tree for each of the pairs, yielding a three-number code. This puzzle addresses the “rotation” tree-thinking skill (Novick and Catley, 2017).

  • The venomous plant trait matrix puzzle (Puzzle 4) asks players to complete a trait matrix of four dichotomous traits for four species of plant. The traits include two visually assessed traits (wart color and leaf shape), one experimentally assessed behavioral trait (dietary preference), and one genetic trait (assessed with a barcode scanner). The trait matrix provides the foundation for reasoning about shared inherited characteristics, which supports the “identify characters” and “identify taxa” tree-thinking skills (Novick and Catley, 2017).

  • The traits on venomous plant tree puzzle (Puzzle 5) presents players with a tree of four venomous plant species, with the traits of terminal taxa and their shared ancestor indicated visually. Players must then place four trait-change placards (wooden triangular blocks that depict the change from the ancestral to descendent trait) on the branches in positions that explain the distribution of traits among terminal taxa. In this puzzle players practice 1) identifying a trait that multiple taxa share due to inheritance from their most recent common ancestor, and 2) identifying the taxa that share a particular character state (Novick and Catley, 2017).

  • The traits on venom tree puzzle (Puzzle 6) presents players with the tree of a larger clade of venomous plants with the venom type of the clade's shared ancestor marked and trait changes marked on branches. Players must deduce from the trait changes which of the five venom types each terminal taxon has inherited, yielding one four-letter and one five-letter code. This requires the same set of tree-thinking skills as the prior puzzle, but players are asked to infer inherited character states instead of character state changes.

  • The venom history deduction puzzle (Puzzle 7) refers to the tree of the larger clade of venomous plants used in the last puzzle and asks players to trace the history of a particular lineage back in time and identify its venom type at different times in its history. This puzzle targets another tree-thinking skill: identifying the order in which character states arose on a particular evolutionary path (Novick and Catley, 2017).

The puzzles described above are designed to build upon one another conceptually in a way that supports the narrative of the game, not to target discrete concepts or misconceptions. For example, the notion that time flows from root to tip on a phylogeny is first addressed by Puzzle 1 and is then leveraged again in Puzzles 2, 5, 6, and 7. The idea that nontip branches on a tree represent shared ancestral lineages is introduced in Puzzle 2 and reinforced by Puzzles 5 and 6. Examining the distribution of shared characters is first introduced in Puzzle 4, and is then connected to shared ancestry by Puzzles 5 and 6. In short, as is appropriate to its narrative, the game is designed to help players build and integrate conceptual understanding of evolutionary trees, not to break this understanding down into component skills and elements, assessing them discretely.
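The inheritance reasoning that Puzzles 5–7 exercise can be sketched in the same spirit: given an ancestral state and trait changes placed on branches, each tip's state follows from a root-to-tip walk. The sketch below uses a hypothetical tree, venom names, and trait changes of our own invention, not the game's actual venom tree.

```python
# Illustrative tree (child -> parent); hypothetical taxa and venom types.
PARENT = {
    "anc1": None,       # root: its venom type is given
    "anc2": "anc1",
    "plant_A": "anc2",
    "plant_B": "anc2",
    "plant_C": "anc1",
}
ROOT_STATE = "itching venom"
# A change on a node's branch (the branch just above that node) means the
# lineage switched to a new venom type there.
BRANCH_CHANGE = {"anc2": "numbing venom", "plant_B": "burning venom"}

def tip_state(tip):
    """Infer a tip's inherited state: walk root to tip, applying each change."""
    path = []
    node = tip
    while node is not None:       # collect tip-to-root path
        path.append(node)
        node = PARENT[node]
    state = ROOT_STATE
    for node in reversed(path):   # replay history in root-to-tip order
        if node in BRANCH_CHANGE:
            state = BRANCH_CHANGE[node]
    return state

for plant in ("plant_A", "plant_B", "plant_C"):
    print(plant, "->", tip_state(plant))
# plant_A -> numbing venom; plant_B -> burning venom; plant_C -> itching venom
```

Note how the same root-to-tip logic answers both the forward question (Puzzle 6: which state did each tip inherit?) and, read in reverse, the historical question (Puzzle 7: which states did a lineage pass through, and in what order?).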

After the game, all participants were invited to take a picture to celebrate their victory. Phase 2 participants were then invited to participate in a debriefing period, shown an evolutionary tree from the game, and asked the following questions as a group:

  1. What do you think the game was about?

  2. What do you think this diagram is showing?

  3. What is happening at the splitting points?

  4. Where are the oldest plants on this diagram? Where are the plants that are alive today?

  5. What are the triangles (trait changes) showing?

They were also given the opportunity to ask the game facilitator questions. Because these questions were part of the Phase 1 interview, Phase 1 participants did not have a separate debriefing period. The debriefing period is an opportunity for metacognition and reflection on learning that occurred during the game and is often viewed as a design element that supports learning (Sanchez and Plumettaz-Sieber, 2019; Veldkamp et al., 2022).

The VENOMventure experience was designed to be memorable enough to encourage extended interactions about the game and its science concepts among players for weeks following the game, with the intention that this would reinforce tree-thinking skills and enthusiasm for the subject matter fostered in the game. To encourage these interactions, youth who participated were offered a free comic book to take home afterwards, available in both English and Spanish (Frankel et al., 2023). The comic book extends the fictional venomous plant story from the game while introducing readers to five real-world researchers who work at the intersection of evolution and medicine. The book also includes tree-thinking puzzles, as well as art and craft activities designed to help readers envision themselves as future scientists and expose them to medical applications of evolutionary research. Additionally, researchers sent a follow-up email that provided links to free, family-friendly evolution activities and websites to one adult from each group the day after they played the game.

Instruments and other data sources

Figure 1 summarizes the instruments and other data sources for this study. Each is described below, and the instruments and protocols are included in the Supplemental Materials. Several established instruments exist to assess tree-thinking skills and misconceptions (refer to Blacquiere & Hoese, 2016; Kummer et al., 2019; Blacquiere et al., 2020; Jenkins et al., 2022). Unfortunately, these instruments were not appropriate for our purposes because they were developed with and for undergraduates and so assume reading skills and vocabulary our younger participants do not possess, focus on tree-thinking skills that are not addressed by our intervention (e.g., comparing the closeness of evolutionary relationships), and/or are quite long. The surveys and interview protocols we developed for this study use simple language to be accessible to our target age group (ages 9–13 years). Furthermore, our survey instruments were designed to take less than 5 min to complete to help younger participants maintain focus throughout the entire survey and because Phase 1 participants were often tired after the game and research experience (refer to Figure 1). Hence, the instruments we use here are not designed to assess particular misconceptions or to serve as a concept inventory of tree-thinking skills. Instead, they assess a subset of the tree-thinking concepts addressed by the game and serve as a very simple indicator of participants’ grasp of the most basic elements of tree thinking. In addition, the burden on host sites to administer surveys to obtain the large and geographically diverse sample we sought meant that we could not deploy multiple versions of pre or post surveys and were constrained to a single survey version for each data collection timepoint.

The knowledge items on the survey address the direction of time on a tree, the representation of ancestors and descendants, the notion of shared ancestry, and the inheritance of traits along branches of the tree. These are concepts addressed by Puzzles 1, 2, 4, 5, and 6 in the game (Figure 3). Items were initially developed by subject matter experts and piloted through two rounds of informal testing with adult and youth playtest participants during game development at one rural and one urban site. During these pilots, a subset of participants was asked to explain their answers orally, and feedback was solicited from participants about items that they found confusing. After each round of pilot testing, minor changes were made to items to clarify intent and align item performance with intended learning outcomes, and the instrument was re-reviewed and edited by a subject matter expert.

Pre-survey. The survey included five knowledge items that required participants to mark on an evolutionary tree where particular ancestors/descendants are located or where evolutionary changes occurred (“tree-annotation knowledge items” in Figure 1; refer to Figure 4 for a sample item). In addition, participants could check a box marked “no guess.” Participants also rated their confidence in their responses to those items and wrote a response to the question, “What kinds of information do you think these diagrams show?” (the “written-response item” in Figure 1).

FIGURE 4:


Two sample tree-annotation knowledge items from the pre-survey.

Post-survey. This survey began with six attitudinal and self-report Likert questions about the game experience, including one for adults only. These addressed enjoyment, collaboration, learning, and interest. Participants then completed five isomorphic tree-annotation knowledge items that mirrored the pre-survey items but used different taxa/traits and reflected the trees horizontally relative to the pre-survey (each time showing one tree with its basal branch extending to the right and one with its basal branch extending to the left). These changes were intended to prevent participants from simply recalling their pre-survey answers and to balance subtle biases that might be caused by tree topology. Confidence and written-response items were identical to those on the pre-survey.

Follow-up survey. This survey was sent to Phase 1 participants one month after they played the game. It included the knowledge quiz a final time, again with isomorphic items. This survey also asked about longer-term impacts of the game and whether participants had used the supporting resources and comic book.

Demographics survey. This survey requested one adult per group to report the gender (open-ended), age (open-ended), and race/ethnicity (check boxes, with more than one check allowed to indicate multiracial/ethnic identities) of each person in the group. Participants were verbally reminded that completing this form was optional, but most groups opted to fill it out.

Observation protocol. During Phase 1 data collection, a researcher used an observation form to document: 1) the balance of adult- versus child-led activity in solving each puzzle, 2) participants’ use of vocabulary tied to the science concepts in the game, 3) the use of hints or answers to solve puzzles, and 4) basic game statistics, such as the time it took to complete the game. The balance of adult- versus child-led activity was determined by watching who initiated interaction with the puzzle first, who handled and manipulated the puzzle props the most, and who did the most talking about the puzzle concepts and their interpretation of the puzzle. For example, in some cases, adults in the room would stand back and watch their children solve the puzzle on their own, without offering any advice or opinion; this would be classified as “child-led.” If the adult made one or two comments or touched the puzzle components briefly, the observer would instead classify the interaction as “mostly child-led.” Observers also took notes on interaction and conversation related to learning, such as moments when participants traced their fingers along the tree diagrams, explained concepts to one another, or asked scaffolding questions to help another player solve a puzzle.

Additional qualitative research activities. After gameplay ended, Phase 1 participants engaged in a set of qualitative research activities that included writing on a reflection banner in response to written prompts, verbally answering a question about whether they had played an escape game prior to this experience, and participating in an interview. Qualitative results from the banner and interview are not reported here.

Data scoring and coding

The five tree-annotation knowledge items on the pre-, post-, and follow-up surveys were each coded for correctness on a scale of 0 to 1 point by a content expert. Partial credit was given for answers demonstrating partial understanding. For example, participants received full credit (1 point) for the first item shown in Figure 4 if they circled the names, images, or tree tips corresponding with the dog, bear, cat, and rat, whether individually or together. They received 0.5 points for circling the correct answer and marking “no guess.” Participants received 0 points for circling extra animals, not circling all the correct animals, circling nothing, or marking “no guess” alone. A full description of this coding rubric is included in the Supplemental Materials. However, because nearly all responses were scored as either fully correct or fully incorrect, these scores were rounded to 0 or 1 for analyses and visual presentation. The sum of these five rounded values represents participants’ knowledge quiz score, with a maximum value of 5. Pairwise differences were calculated between pre-, post-, and follow-up knowledge quiz scores to yield change scores (with a negative number indicating a lower score from one timepoint to the next and a positive change score suggesting a knowledge gain).
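As an illustration of the scoring scheme just described, the sketch below derives quiz scores and change scores from hypothetical item scores. The rounding rule for half-credit responses is an assumption for illustration; the text does not state the direction in which 0.5 scores were rounded.

```python
# Sketch of the knowledge-quiz scoring described above (hypothetical data,
# not the authors' analysis code). Five items are coded 0, 0.5, or 1;
# item scores are rounded to 0 or 1, summed into a 0-5 quiz score, and
# differenced across timepoints to yield change scores.

def round_item(score):
    # Assumption: half-credit (0.5) rounds up; the text does not specify.
    return 1 if score >= 0.5 else 0

def quiz_score(item_scores):
    """Sum of five rounded item scores (maximum 5)."""
    assert len(item_scores) == 5
    return sum(round_item(s) for s in item_scores)

def change_score(earlier, later):
    """Positive values suggest a gain; negative values, a loss."""
    return later - earlier
```

For example, under this rounding assumption a participant with item scores [1, 1, 0, 0.5, 0] receives a quiz score of 3, and a pre score of 2 with a post score of 4 yields a change score of +2.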

The written-response knowledge item was also coded for correctness by a content expert. Full credit (1 point) was given to answers that referred to the relationships conveyed in the diagram among ancestors and descendants, although participants did not have to use these exact terms. Partial credit was given to responses that mentioned evolution, lineage, or ancestry, but did not refer to the ways different organisms on the diagram are connected. A full description of this coding rubric is included in the Supplemental Materials. The pairwise difference was calculated between pre and post scores to yield a change score (with a negative number indicating a lower score from one timepoint to the next and a positive change score suggesting a knowledge gain).

Likert scale survey items where participants had circled more than one answer (e.g., “Not sure” and “Yes”) were recoded using a conservative approach, selecting the lesser level of agreement. Participants’ ages were coded into four groups: the target age range (9–13 years old), younger children (6–8 years old), older children (14–17 years old), and adults (18+ years old). When statistical tests found that older children and adult responses did not show meaningful differences, these two groups were combined.

Observational notes on the extent to which children versus adults led the puzzle solving were transformed into a whole-number value between −2 (adult led) and +2 (child led) depending on the balance of puzzle solving among participants. These scores were added together across the seven puzzles to get the puzzle-solving style score, with possible values between −14 and +14. For statistical tests, puzzle-solving style scores were further categorized into an “adult led” category (scores less than −2), a “balanced” category (scores between −2 and +2), and a “child led” category (scores greater than +2).
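The puzzle-solving style score and its categorization can be sketched as follows. The per-puzzle ratings are hypothetical, and `style_category` is an illustrative name, not from the study.

```python
# Sketch of the puzzle-solving style score: each of the seven puzzles is
# rated from -2 (adult led) to +2 (child led); the ratings are summed
# (possible range -14 to +14) and binned into three categories.

def style_category(per_puzzle_ratings):
    assert len(per_puzzle_ratings) == 7
    total = sum(per_puzzle_ratings)
    if total < -2:
        return "adult led"
    if total > 2:
        return "child led"
    return "balanced"
```

A group rated [2, 2, 1, 0, 1, 2, 2] across the seven puzzles sums to +10 and is categorized as child led.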

Statistical analyses

For comparisons of pre- to post- and follow-up survey scores, survey data were cleaned by removing unmatched pre/post surveys, the one instance where a parent and child completed a survey together instead of individually, and survey sets for which participant age data were unreported, leaving 446 data sets. The inter-item consistency among the tree-annotation knowledge items was calculated separately for the pre-, post-, and follow-up surveys, yielding Kuder-Richardson Formula 20 scores (KR20) of 0.746, 0.721, and 0.705, respectively, indicating a reasonable level of internal reliability for the knowledge quiz portion of the instrument. Responses missing data on game experience survey items were excluded from the analysis of the affected item.
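The KR20 values reported above follow Kuder-Richardson Formula 20 for dichotomously scored items. A generic sketch of the computation on a hypothetical 0/1 response matrix (not the study data):

```python
# Kuder-Richardson Formula 20:
#   KR20 = (k / (k - 1)) * (1 - sum(p_j * (1 - p_j)) / var(total scores))
# where k is the number of items and p_j is the proportion answering
# item j correctly.

def kr20(responses):
    """responses: one list of 0/1 item scores per participant."""
    n = len(responses)
    k = len(responses[0])
    totals = [sum(r) for r in responses]
    mean_total = sum(totals) / n
    # Population variance of the total scores.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    pq = 0.0
    for j in range(k):
        p = sum(r[j] for r in responses) / n  # proportion correct on item j
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_total)
```

When every participant answers all items the same way (all correct or all incorrect), item responses are perfectly consistent and KR20 is 1; when items vary independently of the totals, KR20 falls toward 0.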

Because the tree-thinking quiz score and knowledge change score data are discrete rather than continuously and normally distributed (e.g., most participants scored exactly 0, 1, 2, 3, 4, or 5 on the tree-annotation items on the quiz, and distributions of scores were often skewed), violating assumptions for parametric contrasts, we examine them using nonparametric statistics. The related-samples Wilcoxon signed-rank test is used to compare differences between two sets of scores when individual score pairs are related (e.g., because they are pre- and post-scores from the same participant). This test assumes a symmetric distribution of differences between data point pairs, so if visual examination of a box plot for our data for a planned analysis revealed an asymmetric distribution, the less powerful sign test was used instead. The Mann-Whitney U test is used to compare two distributions of data when the groups are independent of one another (e.g., when comparing the knowledge change scores of participants who played at urban versus rural sites). When visual inspection indicated that the two distributions have similar shapes, this statistic is interpreted as indicating whether there is a difference in the median of the two distributions; otherwise, it is interpreted as indicating an unspecified difference in the underlying distributions. A nonsignificant outcome in this test indicates insufficient evidence of any difference in the two underlying distributions, whether median-related or dispersion-related. An independent-samples Kruskal-Wallis test is used when comparing three or more independent groups (e.g., when comparing knowledge change scores of participants among the four data collection sites); it requires assumptions similar to those of the Mann-Whitney U test and is interpreted in a similar way. Effect sizes for Mann-Whitney U tests and Wilcoxon signed-rank tests are reported as the rank-biserial correlation r.
Effect sizes for Kruskal-Wallis tests are reported as eta squared, which is transformed to Cohen's f for comparison to standards for effect size cut-offs (Cohen, 1992). These tests rely on signs, signed ranks, and sums of ranks; however, because such statistics are unfamiliar to most readers, we instead report medians (Mdn) and interquartile ranges (IQR) as more easily grasped nonparametric descriptors of the data, along with frequency distributions wherever possible. We report parametric descriptive statistics—means and standard deviations (SD)—only for the time-to-completion of the game.
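As a concrete illustration, the tests and effect sizes named above can be run with SciPy on hypothetical score vectors. The data, the eta-squared formula for the Kruskal-Wallis H, and the rank-biserial computation shown here are common formulations chosen for illustration, not the authors' analysis code.

```python
import numpy as np
from scipy import stats

# Hypothetical paired pre/post quiz scores on the 0-5 scale.
pre = np.array([1, 2, 2, 3, 0, 4, 2, 1, 3, 2])
post = np.array([3, 4, 2, 5, 2, 5, 3, 2, 4, 3])

# Related-samples Wilcoxon signed-rank test on the paired scores.
wilcoxon = stats.wilcoxon(pre, post)

# Sign test fallback for asymmetric difference distributions: a binomial
# test on the direction of the nonzero differences.
diffs = post - pre
nonzero = diffs[diffs != 0]
sign_test = stats.binomtest(int((nonzero > 0).sum()), n=len(nonzero), p=0.5)

# Mann-Whitney U for two independent groups, with rank-biserial r
# computed as 1 - 2U/(n1*n2) (one common formulation).
a, b = [1, 2, 0, 1, 2], [0, 1, 1, 2, 0]
mwu = stats.mannwhitneyu(a, b)
rank_biserial = 1 - 2 * mwu.statistic / (len(a) * len(b))

# Kruskal-Wallis for three independent groups; eta squared from H
# (one common formulation) and its conversion to Cohen's f.
groups = [[0, 0, 1], [1, 1, 2], [3, 3, 2]]
h, p_kw = stats.kruskal(*groups)
n, k = sum(len(g) for g in groups), len(groups)
eta_sq = (h - k + 1) / (n - k)
cohens_f = (eta_sq / (1 - eta_sq)) ** 0.5
```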

In addition to examining evidence for or against the existence of a learning gain, our data present an opportunity to compare the size of any learning gain among groups with different traits. However, statistically comparing learning gains among groups presents a challenge because individuals begin with different levels of knowledge (as indicated by variation among pre-test scores) and so have different amounts of learning that they could potentially demonstrate on their post-test. Some participants are likely to experience a ceiling effect; that is, they have reached the maximum score of our instrument (the ceiling), beyond which it cannot measure changes in knowledge. Some address this by examining normalized gains, defined as the ratio of the actual gain (post score minus pre score) to the maximum possible gain (maximum score minus pre score; Hake, 1998). However, for our comparisons, this transformation would give unfair weight to certain learning gains. For example, the normalized gain of a participant with a pre-test score of 2 and post-test score of 4 would be 66.7%, while the normalized gain of a participant with a pre-test score of 1 and a post-test score of 3 would be only 50%, which does not fairly reflect the comparable learning of the latter participant. Both participants have scored 2 more points on the post than on the pre, and neither has maxed out the scale. In addition, using normalized gains for individual participants means eliminating data from those with perfect scores on the pre-test and those who scored lower on the post-test than on the pre-test (refer to Miller et al., 2010). An ideal approach to addressing this challenge would involve examining raw score changes only among participants with the same pre-test score; however, that would necessitate multiple contrasts with very low power for each comparison.
Instead, we approach this challenge by first examining the pre-test scores of the groups for which we wish to compare learning gains; if they are not significantly different, then we expect the effects on the contrast of different starting levels of knowledge to be minimal and perform the contrast on the full set of raw scores.
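The worked example above can be checked directly. A minimal sketch of the individual normalized gain (Hake, 1998) on the 5-point quiz:

```python
# Individual normalized gain: actual gain over the maximum possible gain
# from a given pre score, on a quiz with a maximum score of 5.

def normalized_gain(pre, post, max_score=5):
    return (post - pre) / (max_score - pre)

# The two participants from the example above: identical 2-point raw
# gains, but different normalized gains.
gain_a = normalized_gain(2, 4)  # 2/3, i.e., 66.7%
gain_b = normalized_gain(1, 3)  # 2/4, i.e., 50.0%
```

Both participants gained 2 raw points, yet the transformation weights them differently, which is the distortion described above; note also that the formula is undefined (division by zero) for a perfect pre score.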

In cases where the pre-test scores vary significantly among groups (i.e., when evidence suggests that our analysis of learning gains is likely to be impacted by varying levels of starting knowledge), we examine learning gains in two different ways. First, we consider only data from participants who did not reach the maximum score on our instrument and so did not experience a ceiling effect; that is, we eliminate data from participants who scored a 5 on the pre-test (and so have nothing to gain) and data from participants who scored a 5 on the post-test (and so have reached the limit of the amount of learning that they could have demonstrated) before performing the contrast with the raw scores. This analysis mainly captures data from participants with lower starting levels of knowledge and/or smaller gain scores. Second, focusing on participants with larger gains, we use a chi-square test to examine whether individuals who achieved learning gains of 3 or more points from pre to post (the “largest gainers”), and so were more likely to have been excluded from the preceding analysis, were randomly distributed among groups, first eliminating participants who could not have achieved a gain of 3+ points because their pretest score was greater than 2.
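The largest-gainer analysis described above amounts to a chi-square test of independence on a 2 × 3 contingency table. A sketch with hypothetical counts (not the counts in the study's Supplemental Table S2):

```python
from scipy.stats import chi2_contingency

# Rows: largest gainers (gains of 3+ points) vs. other participants;
# columns: three age groups. Counts are hypothetical, restricted to
# participants with pre-test scores of 2 or less (those who could have
# gained 3+ points).
table = [[10, 30, 40],
         [40, 50, 62]]
chi2, p, dof, expected = chi2_contingency(table)
# A small p-value would indicate that largest gainers are nonrandomly
# distributed across the age groups.
```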

In addition, we report normalized gains at the group level (Bao, 2006), that is:

group normalized gain = (mean post score − mean pre score) / (maximum possible score − mean pre score)

which allows us to include data from individuals who contribute to ceiling effects and from individuals with negative score changes. Note, however, that this metric more heavily weights learning gains that occur among groups with high levels of starting knowledge. We also report normalized loss at the group level using the corresponding formula when applicable.
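The group-level metrics can be sketched as follows. The normalized-loss denominator used here (the mean post score, i.e., the amount that could be lost down to zero) is our reading of "the corresponding formula," not an equation given in the text.

```python
# Sketch of group-level normalized gain (Bao, 2006) and an assumed
# corresponding normalized loss, on a quiz with a maximum score of 5.

def group_normalized_gain(pre_scores, post_scores, max_score=5):
    """Gain in group means over the maximum possible mean gain."""
    mean_pre = sum(pre_scores) / len(pre_scores)
    mean_post = sum(post_scores) / len(post_scores)
    return (mean_post - mean_pre) / (max_score - mean_pre)

def group_normalized_loss(post_scores, followup_scores):
    """Assumed analogue: mean loss over the mean amount that could be lost
    (i.e., from the mean post score down to zero)."""
    mean_post = sum(post_scores) / len(post_scores)
    mean_fu = sum(followup_scores) / len(followup_scores)
    return (mean_post - mean_fu) / mean_post
```

Because the computation uses group means rather than individual ratios, participants with perfect pre-test scores or negative score changes contribute to the result rather than being dropped.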

RESULTS

Game outcomes and engagement

Data suggest that VENOMventure successfully engaged a wide range of participants. All but two of the 51 observed groups in Phase 1 successfully completed the puzzles within the 45-min time limit. Among these groups, the average time to complete the game was 28 min (SD = 7.76). Thirty-one (61%) of the observed groups completed the game without using the provided hints or answers.

After the experience, participants of all ages (N = 452) were overwhelmingly likely to agree with the statements “I had a lot of fun playing this game” (98% responding “YES!” or “Yes”) and “I'd be excited to play a game like this (with science puzzles) again” (96% responding “YES!” or “Yes”; Figure 5).

FIGURE 5:


Distribution of responses to Likert items on fun and engagement. (a) Count of participants responding NO!, No, Not sure, Yes, or YES! when asked to rate their agreement with the statement “I had a lot of fun playing this game,” broken out by whether the participant was a minor child or an adult. (b) Count of participants responding NO!, No, Not sure, Yes, or YES! when asked to rate their agreement with the statement “I'd be excited to play a game like this (with science puzzles) again,” broken out by whether the participant was a minor child or an adult.

Learning

Several lines of evidence suggest that VENOMventure supports learning about phylogenetics, including self-reported survey items and pre/post quiz items. After the game, participants of all ages were likely to agree that the game helped them learn about evolutionary trees (85% responding “YES!” or “Yes”, N = 446), and adults were likely to perceive the game as being of educational value for the children in their group (98% responding “YES!” or “Yes”, N = 194; Figure 6).

FIGURE 6:


Distribution of responses to Likert items on perceptions of learning and educational value. (a) Count of participants responding NO!, No, Not sure, Yes, or YES! when asked to rate their agreement with the statement “This game helped me understand evolutionary trees,” broken out by whether the participant was a minor child or an adult. (b) Count of adult participants responding NO!, No, Not sure, Yes, or YES! when asked to rate their agreement with the statement “I think this was an educational experience for the kids in my group.”

Analysis of the knowledge quiz on participants’ pre/post surveys confirmed these perceptions. A related-samples Wilcoxon signed-rank test found a significant difference of large effect size between pre- (Mdn = 2, IQR = 1–4) and post-test scores (Mdn = 4, IQR = 2–5) on the tree-annotation items in the direction consistent with learning (N = 446, Z = −12.230, p<0.001, r = 0.579); nearly every individual with a different score from pre to post had a positive change consistent with learning (Figure 7a). From pre to post, the number of participants receiving a score of 0 on the knowledge quiz was more than halved and the number receiving a perfect score of 5 more than doubled (Figure 7b). The overall normalized gain for the tree-annotation items on the knowledge quiz was 36.4%.

FIGURE 7:


Pre to post tree-thinking knowledge quiz score comparison. (a) Count of participants with different pre and post score combinations. On the 5-item quiz, 5 represents a perfect score. Score combinations representing learning (higher post than pre score) appear on the half of the graph closer to the viewer. The score combinations running diagonally from 0,0 to 5,5 represent participants with no score change. The very few score combinations on the opposite side of that diagonal represent participants with learning losses, as indicated by the instrument. (b) Distribution of tree-thinking quiz scores across pre- and post-game tests. As reported in the text, pre and post scores are significantly different.

To check whether the additional qualitative research activities of Phase 1 participants (answering post-game interview questions and completing the post-game banner reflection) had an effect on their learning, we compared the size of the knowledge quiz score change (where 1 indicates a gain of 1 point on the 5-point scale, and −1 indicates a 1-point decrease in score from pre to post) across Phase 1 (Mdn = 1, IQR = 0–2) and Phase 2 participants (Mdn = 1, IQR = 0–2) using an independent-samples Mann-Whitney U test, first confirming that pre-test score did not vary significantly by phase (refer to Supplemental Table S1 for pre-test comparison statistics throughout). The difference in raw pre/post quiz score change between the two groups was not significant (N = 446, U = 23310, p = 0.907), providing no evidence that participating in the qualitative research activities meaningfully impacted learning gains. The normalized gain was 36.7% for Phase 1 participants and 36.2% for Phase 2 participants.

Comparing pre- to post-game surveys, participants also showed improvement in their responses to the written-response question, “What kinds of information do you think these diagrams show?”, on which scores could range between 0 and 1. A related-samples Wilcoxon signed-rank test found a significant difference of small effect size between pre- (Mdn = 0.75, IQR = 0–0.75) and post-test scores (Mdn = 0.75, IQR = 0.75–0.75) in the direction consistent with learning (N = 325, Z = −5.248, p<0.001, r = 0.291). The normalized gain for the written-response item was 27.1%.

Results from the 4-week follow-up survey suggested that Phase 1 participants retained most of their learning gains over the longer term (Figure 8). For Phase 1 participants, box plots revealed the knowledge quiz score change distribution to be asymmetric, so sign tests were used to examine differences between scores. As expected, the difference between pre (Mdn = 3, IQR = 2–4) and post (Mdn = 4, IQR = 3–5) remained significant (N = 66, Z = −5.092, p<0.001). There was also a significant difference between pre and 4-week-follow-up knowledge quiz scores (Mdn = 4, IQR = 2.75–5), suggesting that participants retained knowledge gained during the game at least four weeks later (N = 66, Z = −2.586, p = 0.010). As might be expected, there was a slight downward shift in scores between the post-test immediately after the game and the 4-week follow-up, but this did not reach statistical significance at this sample size (N = 66, Z = −1.327, p = 0.185). For these Phase 1 participants, the normalized gain was 48.3% for the pre/post comparison and 30.3% for the pre/follow-up comparison, and the average loss from post-test to follow-up in comparison to the average amount that could have been lost (normalized loss) was 10.2%.

FIGURE 8:


Tree-thinking quiz scores across pre-, post-, and follow-up tests among Phase 1 participants who took the longitudinal survey. On the 5-item quiz, 5 represents a perfect score.

The VENOMventure program was designed to inspire extended interactions about the science concepts in the game. To see if those post-game interactions might have fostered retention of the STEM concepts learned during the game, we examined participants’ responses to a multiple-choice item in the longitudinal follow-up survey asking about the behaviors and activities participants engaged in after playing the game. Participants reported a wide variety of activities after the game, with most participants doing at least one follow-up activity (Figure 9). We classified activities as science-concept-related or -unrelated (Figure 9) and classified participants as having either high post-game engagement (reporting two or more activities) or low post-game engagement (reporting zero or one post-game activity), for all activities and for science-concept-related activities specifically. We then confirmed that, in each of these cases, post-test score did not significantly vary across the engagement levels (Supplemental Table S1) before performing independent samples Mann-Whitney U tests, which found no difference in the size of post/follow-up raw change scores across the high and low engagement groups for all activities (N = 66, Z = −0.197, p = 0.844), and for science-concept-related activities specifically (N = 66, Z = −0.734, p = 0.463). The normalized loss was 12.5% for the low engagement group, 9.8% for the high engagement group, 8.7% for the low science-concept-related engagement group, and 12.6% for the high science-concept-related engagement group. These results are consistent with the idea that retention of knowledge gains is a function of the game experience itself, not the follow-up activities.

FIGURE 9:


Percent of participants (N = 66) who reported engaging in various related activities after the game. Stars indicate those classified as science-concept-related activities.

Learning by site and rural/urban setting

The game was formatively tested with rural and urban audiences at museums and libraries, and so we hypothesized that the learning gains would be of similar sizes across all four sites. The pre-test scores did not significantly vary by site (Supplemental Table S1), so we examined raw pre/post knowledge quiz change scores across all four sites (one rural library, one urban library, and two urban museums) using an independent-samples Kruskal-Wallis test, finding no significant difference among them (H[3] = 2.177, p = 0.537). The normalized gain was 41.6% for the rural library, 40.5% for the urban library, 29.2% for one urban museum, and 36.4% for the other urban museum. The pre-test scores also did not vary by library versus museum setting, rural versus urban setting, or rural versus urban library (Supplemental Table S1), so we examined raw pre/post knowledge quiz change scores in these contrasts and found no significant differences among them (Table 1). The normalized gain was 41.0% for libraries, 33.4% for museums, and 35.3% for urban sites.

TABLE 1:

Independent samples Mann-Whitney U test statistics for comparisons of knowledge quiz change scores across different types of site.

Comparison N Z p
Libraries vs. museums 446 −1.157 .247
Rural vs. urban sites 446 −1.313 .189
Rural library vs. urban library 179 −0.871 .384

Learning by age group

Because the game was designed to engage youth ages 9–13 years (the “target group”), we hypothesized that this group would show the largest learning gains. To examine this, we compared quiz scores across three groups of participants: the target group, those younger, and those older (Figure 10). First, we sought to establish that each individual age group exhibited learning. The distribution of pre/post knowledge quiz score changes was asymmetric for younger children and those older than the target group, so we used sign tests to examine the significance of pre/post differences for each group. These tests showed that the knowledge quiz score increased from pre to post in each individual age group (Table 2). In addition, because the distribution of pre/post knowledge quiz score changes was symmetric for the target group, we performed a related-samples Wilcoxon signed-rank test on the data for this group in order to obtain an effect size, and found that it was large (N = 157, Z = −7.031, p < 0.001, r = 0.561). The normalized gain was 32.4% for the target group, 23.8% for those younger, and 45.3% for those older.

FIGURE 10:


Tree-thinking quiz scores across pre- and post-game tests by age group. On the 5-item quiz, 5 represents a perfect score.

TABLE 2:

Difference in pre/post knowledge quiz scores by age group through sign tests.

Age group N Pre median scorea Pre IQ rangea,b Post median scorea Post IQ rangea,b Z p
Younger children (8 years and younger) 53 1 0–2 2 1–3 −4.603 <0.001
Target group (9–13 years) 157 2 1–4 4 2–4 −7.035 <0.001
Older children and adults (14+ years) 236 3 2–4 4 3–5 −8.952 <0.001

aOn the 5-item quiz, 5 represents a perfect score.

bIQ range indicates the interquartile range.

Next, we compared the size of learning gains across the age groups. An independent-samples Kruskal-Wallis test found that, as one might expect, the pre knowledge quiz score distributions varied by age group, with the oldest age group having higher pre scores with a left-skewed tail and the youngest age group having lower pre scores with a right-skewed tail, an effect of medium size (H[2] = 59.098, p < 0.001, η2 = 0.129; Supplemental Table S1). Hence, we eliminated participants whose scores could contribute to a ceiling effect before examining the raw knowledge quiz change scores with a Kruskal-Wallis test (Table 3). This analysis found that score change among those remaining did not differ across the age groups (H[2] = 2.665, p = 0.264): all groups showed a similar degree of improvement (Table 3).1 The distributions of change scores are similarly shaped among these groups, so this indicates no significant difference in the median raw change score across the age groups when participants who could contribute to a ceiling effect are eliminated.

TABLE 3:

Pre/post knowledge quiz score changes by age group, excluding participants with perfect scores on the pre or post.a

Descriptive statistics after elimination of potential ceiling effect contributors
Age group N Number scoring 5 on pre Number scoring 5 on post N Median score changeb IQ rangec
Younger children (8 years and younger) 53 1 1 51 1 0–2
Target group (9–13 years) 157 10 37 117 0 0–2
Older children and adults (14+ years) 236 54 101 124 1 0–2

aNo significant difference among the knowledge quiz change scores across the three age groups was detected.

bKnowledge quiz score change is calculated by subtracting the pre score from the post, so, for example, 1 indicates a gain of 1 point on the 5-point scale, and −1 indicates a 1-point decrease in score from pre to post.

cIQ range indicates the interquartile range.

However, recognizing that older participants were more likely to be eliminated from the previous analysis, we further investigated how age relates to learning gains by instead considering individuals who increased their score by 3 or more points (the “largest gainers”). Many of these participants reached the ceiling of the instrument (a score of 5) and so were eliminated by the prior analysis. Are the “largest gainers” unusually likely to come from a particular age group? Among the participants who could potentially have increased their score by 3+ points (those with a pre-test score of 2 or less), the largest gainers were nonrandomly distributed across the three age groups (χ2[2, N = 232] = 10.109, p = 0.006), with the largest gainers underrepresented among younger children and overrepresented among older children and adults (Supplemental Table S2).

Learning by player group size

Player group size ranged from two to six (Figure 11). We hypothesized that larger groups would have smaller learning gains, since we expect larger groups to dilute direct engagement with the puzzles and content across more people. However, another plausible hypothesis is that larger groups would have larger learning gains if, for example, larger groups bring more shareable knowledge of evolution to the collaborative process. To examine these possibilities, we compared knowledge quiz change scores across player group size, first confirming that pre-test score did not vary significantly by group size (Supplemental Table S1). An independent-samples Kruskal-Wallis test found no significant difference in the raw change scores on the knowledge quiz across the group sizes (H[4] = 4.811, p = 0.307; Figure 12). We note, however, that our data show a nonsignificant trend in which larger groups exhibited smaller learning gains (Table 4 and Figure 12), and this is also apparent in the normalized gains: the normalized gain was 44.0% for two-person groups, 37.3% for three-person groups, 37.5% for four-person groups, 31.7% for five-person groups, and 15.4% for six-person groups.

FIGURE 11:


Distribution of group sizes across all games in Phases 1 and 2.

FIGURE 12:


Change in tree-thinking quiz score from pre to post across individuals with different player group sizes. Knowledge quiz score change is calculated by subtracting the pre score from the post, so, for example, 1 indicates a gain of 1 point on the 5-point scale, and −1 indicates a 1-point decrease in score from pre to post. Only participants with matched pre and post scores are included, so the number of participants in each panel may not be a multiple of the group size.

TABLE 4:

Pre/post knowledge quiz score changes by group size.a

Group size N Median score changeb IQ range
2 50 1 0–2
3 136 1 0–2
4 262 1 0–2
5 83 1 0–2
6 15 0 −1 to 1

aNo significant difference in the change scores on the knowledge quiz across the group sizes was detected.

bKnowledge quiz score change is calculated by subtracting the pre score from the post, so, for example, 1 indicates a gain of 1 point on the 5-point scale, and −1 indicates a 1-point decrease in score from pre to post.

Learning by collaborative puzzle solving style

Our puzzle-solving style measure, as noted above, could theoretically range between −14 (entirely adult driven) and +14 (entirely child driven). The distribution of scores on this measure for Phase 1 suggests that in most groups, adults and children contributed relatively equally to solving (Figure 13). Because children were often observed taking the lead in solving puzzles and explaining the puzzle to an adult, we hypothesized that individuals in child-led groups would show larger learning gains than those in balanced or adult-led groups. To examine this, we compared knowledge quiz change scores across puzzle-solving style, categorizing groups as adult-led (if they scored −2.5 or lower on our scale), balanced (−2 to +2), or child-led (+2.5 or greater), first confirming that pre-test score did not vary significantly by solving style (Supplemental Table S1). An independent-samples Kruskal-Wallis test found a significant difference of small effect size in the distribution of raw change scores on the knowledge quiz across puzzle-solving style (H[2] = 9.506, p = 0.009, η2 = 0.045; Figure 14), with child-led and balanced solving style groups having distributions that appear to include more gains of large size in comparison to adult-led groups (Dunn's Test Bonferroni adjusted p = 0.041 and 0.008, respectively; Figure 14; Table 5). The normalized gain was 14.0% for individuals in adult-led groups, 47.1% for individuals in balanced groups, and 37.8% for individuals in child-led groups.

FIGURE 13:

Distribution of collaborative puzzle solving styles across the player groups in Phase 1.

FIGURE 14:

Change in tree-thinking quiz score from pre to post across individuals in Phase 1 groups with different child/adult gameplay balance. Knowledge quiz score change is calculated by subtracting the pre score from the post, so, for example, 1 indicates a gain of 1 point on the 5-point scale, and −1 indicates a 1-point decrease in score from pre to post.

TABLE 5:

Pre/post knowledge quiz score changes by collaborative puzzle solving style.

Group puzzle solving style N Median score changec IQ ranged
Adult-leda,b 36 0 0–1
Balanceda 65 1 0–2.5
Child-ledb 69 1 0–2

aDunn's Test Bonferroni adjusted p = 0.041

bDunn's Test Bonferroni adjusted p = 0.008

cKnowledge quiz score change is calculated by subtracting the pre score from the post, so, for example, 1 indicates a gain of 1 point on the 5-point scale, and −1 indicates a 1-point decrease in score from pre to post.

dIQ range indicates the interquartile range.

To examine this further, we considered the possibility that the balance of gameplay is meaningfully related to the learning of players of different ages. To investigate, we examined knowledge quiz change scores across gameplay balance and age, first confirming that pretest score did not vary significantly by gameplay style either among adults or among children (Supplemental Table S1). Independent-samples Kruskal-Wallis tests found no significant difference in the raw change scores on the quiz across gameplay style among children (H[2] = 3.292, p = 0.193), but did find a significant difference of small effect size among adults (H[2] = 6.207, p = 0.045, η2 = 0.055), with adults in adult-led groups having a narrower distribution of gains toward the lower end of the change scale compared with adults in balanced-play groups (Dunn's Test Bonferroni adjusted p = 0.045; Figure 15; Table 6). Among children, the normalized gain was 18.6% for individuals in adult-led groups, 40.4% for individuals in balanced groups, and 35.2% for individuals in child-led groups. Among adults, the normalized gain was 3.7% for individuals in adult-led groups, 56.3% for individuals in balanced groups, and 43.3% for individuals in child-led groups. These findings suggest many interesting possibilities for how child/adult dynamics can influence learning in an escape game.

FIGURE 15:

Change in tree-thinking quiz score from pre to post across Phase 1 individuals in groups with different child/adult gameplay balance and ages. Knowledge quiz score change is calculated by subtracting the pre score from the post, so, for example, 1 indicates a gain of 1 point on the 5-point scale, and −1 indicates a 1-point decrease in score from pre to post.

TABLE 6:

Pre/post knowledge quiz score changes by age (adult vs. child) and collaborative puzzle solving style.

Adult or child Group puzzle solving style N Median score changeb IQ rangec
Adult Adult-leda 18 0 −0.25 to 1
Balanceda 34 1 0–2.25
Child-led 27 1 0–2
Child Adult-led 18 0 −0.25 to 1.25
Balanced 31 1 0–3
Child-led 42 1 0–2

aDunn's Test Bonferroni adjusted p = 0.045

bKnowledge quiz score change is calculated by subtracting the pre score from the post, so, for example, 1 indicates a gain of 1 point on the 5-point scale, and −1 indicates a 1-point decrease in score from pre to post.

cIQ range indicates the interquartile range.

Learning by previous escape room experience

Of the 171 participants who responded to the inquiry about previous game experience, 47 had played an escape game before (in person or virtual) and 124 had not. We hypothesized that previous escape room experience might correspond with larger learning gains, since players with previous experience might be less distracted by surface features of this style of game and better able to focus on the scientific content. To examine this, we compared quiz scores across participants who had and had not done an escape room before (Figure 16), first confirming that the knowledge quiz score increased from pre to post in both groups with sign tests (Table 7). A Mann-Whitney U test found that the pre knowledge quiz score distributions varied by previous escape room experience, with the experienced group having higher pre scores with a left-skewed tail and the no experience group having lower pre scores with a flat distribution, an effect of small size (N = 167, Z = −3.303, p < 0.001, r = 0.256; Supplemental Table S1). Hence, we eliminated participants whose scores could contribute to a ceiling effect before examining the raw knowledge quiz change scores with a Mann-Whitney U test, which found that there was no difference in the quiz change scores across the two groups (N = 103, Z = −0.735, p = 0.462): both groups’ scores have similar distributions. However, recognizing that those with previous experience were more likely to be eliminated from the preceding analysis, we further investigated how escape room experience relates to learning gains by instead considering individuals who increased their score by 3 or more points (the “largest gains” group). Among the participants who could potentially have increased their score by 3+ points (those with a pretest score of 2 or less), there was insufficient evidence of the largest gainers being nonrandomly distributed across the two experience groups (χ2[1, N = 92] = 2.066, p = 0.151). 
The normalized gain was 34.1% for the group with no previous escape game experience and 48.3% for those with escape game experience.
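The two checks in this comparison can be sketched with SciPy as below. The scores and counts are synthetic stand-ins for illustration, not the study's data.

```python
# Sketch of the experience-group analyses above on synthetic stand-in data.
from scipy.stats import mannwhitneyu, chi2_contingency

# Hypothetical pre-test scores (0-5 scale) by prior escape game experience.
pre_experienced = [3, 4, 5, 3, 5, 4, 2, 5]
pre_novice = [1, 2, 3, 2, 1, 4, 2, 3]

# Mann-Whitney U test: do the pre-score distributions differ by experience?
U, p_pre = mannwhitneyu(pre_experienced, pre_novice, alternative="two-sided")

# Chi-square test on a hypothetical 2x2 table: rows are experience groups,
# columns are counts of "largest gainers" (3+ point gain) vs. others, among
# players who could potentially gain 3+ points (pre-test score of 2 or less).
table = [[5, 12],   # experienced: largest gainers, others
         [18, 57]]  # no experience: largest gainers, others
chi2, p_gain, dof, expected = chi2_contingency(table)
```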

FIGURE 16:

Tree-thinking quiz scores across pre- and post-game tests by experience group. On the 5-item quiz, 5 represents a perfect score.

TABLE 7:

Difference in pre/post knowledge quiz scores by prior escape game experience through sign tests.

Experience group N Pre median scorea Pre IQ rangea,b Post median scorea Post IQ rangea,b Z p
No previous escape game experience 120 2 1–4 4 2–5 −5.926 <0.001
Previous escape game experience 47 3 2–5 5 4–5 −3.714 <0.001

aOn the 5-item quiz, 5 represents a perfect score.

bIQ range indicates the interquartile range.

DISCUSSION

Our results show that VENOMventure is a fun, engaging experience that leads to significant learning gains regarding phylogenetics for a diverse group of players, ages six through adult, in museum and library settings. Furthermore, these learning gains largely persist at least 4 weeks after the game. This conclusion is supported by evidence from pre, post, and longitudinal surveys described here (see also Quimby (2024) for qualitative data on this issue). Our findings are consistent with literature suggesting that basic concepts in phylogenetics are accessible to children as young as 7 years old (Ainsworth and Saffer, 2013) and add to the list of interventions that have been shown to improve understanding in an area rife with misconceptions (Perry et al., 2008; Giusti, 2012; Schneider et al., 2012; Eddy et al., 2013; Horn et al., 2016; Daniel et al., 2024). To our knowledge, this is the first direct evidence of a non-classroom-based game supporting learning about phylogenetics.

From our perspective, the finding that the game is overwhelmingly perceived as “a lot of fun” is not a mere bonus but is intertwined with the documented learning outcomes. The game was designed through formative testing to offer science problems at a level that would motivate learners to engage in productive struggle (Young et al., 2024). VENOMventure invites players to grapple with problems that they are capable of solving, but only with significant effort, the affordances provided by the clues in the room, and/or the knowledge of others in their group. Players are rewarded for their efforts with fun surprises, advances in the storyline, and, of course, the satisfaction of having overcome an obstacle that at first seemed insurmountable. This aligns with an evaluation of another educational escape room that found a correlation between student engagement and self-perceived learning (Lopez-Pernas et al., 2019b) and with one that found a correlation between knowledge gain and appreciation of the gain (Veldkamp et al., 2022). We propose that a game design that feels less fun to players would also fail to motivate them to engage in productive struggle in a free-choice learning environment, resulting in diminished STEM learning.

Our data also meaningfully extend the literature on educational escape games (Pryor, 2019; Pryor et al., 2021; Lussenhop et al., 2023; MacDonald and Thanukos, 2023; Gonzalez-Calero et al., 2024) by providing strong evidence of STEM learning through gameplay alone. Our evidence goes beyond self-reports and qualitative observations, stems from a robust sample size of participants at multiple sites, and takes the form of clear, quantitative evidence of improved understanding. This stands in contrast to most current research on this topic. One recent review found that most publications on educational escape games “describe specific game scenarios with little, if any, evidence on their effectiveness” (Lathwesen and Belova, 2021). Another recent review (Veldkamp et al., 2020) identified only three out of 39 studies of educational escape rooms that used pre/post-tests to assess learning.

The size of the learning gains documented here for VENOMventure, based on a very simple and short instrument, appears to be within the range documented for other educational escape games through pre/post-tests, as measured by the games’ respective instruments. Comparing pre- to post-game, VENOMventure quiz score distributions moved from a median of 40% correct to a median of 80% correct, and most of this gain was retained 4 weeks later. Of the three studies identified by Veldkamp et al. (2020) as using pre/post-tests to assess learning, only one showed a significant gain in knowledge associated with gameplay. In this study, test performance improved from a mean of 56% correct to 81% correct (Eukel et al., 2017). A few other studies of educational escape rooms for formal environments have also documented learning gains of similar magnitude through pre/post-tests: mean of 33% correct to 54% correct (Lopez-Pernas et al., 2019a), median of 57% correct to 71% correct (Mystakidis et al., 2019), mean of 65% correct to 75% correct (Lin et al., 2017), median of 50% correct to 83.3% correct (Caldas et al., 2019), mean of 41% correct to 78% correct (Veldkamp et al., 2022), and mean of 57% correct to 88% correct, with a follow-up survey after one month yielding 80% correct (Berthod et al., 2019). The one museum-based educational escape game evaluation that used a pre/post design that we identified did not find a significant learning gain from gameplay alone (Gonzalez-Calero et al., 2024).

We anticipated that our target age group would show the largest learning gains after playing VENOMventure because the game was formatively tested with this age group. Instead, we found that, without correcting for pretest score, players of all ages learned roughly the same amount. However, we also found that individuals with the largest learning gains were disproportionately older children and adults and that younger children were underrepresented among the largest gainers. This is consistent with the idea that ceiling effects are shaping our outcomes, and that among those with similar levels of starting knowledge, older children and adults tend to learn more and younger children tend to learn less in comparison to our target group. This contrasts with another study of an educational escape room in which students who started off with lower levels of knowledge learned more during the activity (Veldkamp et al., 2022), though it is possible that those results were also shaped by ceiling effects.

While the interaction we detected was not anticipated, in retrospect, it is not surprising. The large number of older children/adults with perfect scores on the pre-test suggests that this age group generally has the cognitive readiness to grasp the tree-reading skills addressed by the instrument (ancestor/descendant relationships, direction of time, shared ancestry, and trait inheritance) when appropriately scaffolded, that is, these tree-reading skills are fully within this age group's zone of proximal development (Vygotsky, 1978). However, for some in this age group, VENOMventure may represent their first opportunity to grapple with these representations, which some evidence suggests may be given little attention in high school and college-level biology coursework. For example, the high-school-level Next Generation Science Standards focus on natural selection and only obliquely reference the existence of evolutionary relationships among species as “common ancestry” (NGSS Lead States, 2013). And while evolutionary trees are commonly found in middle school through undergraduate biology textbooks, these volumes offer little guidance for interpreting the diagrams (Catley and Novick, 2008; Machová, 2021). Hence, older participants playing VENOMventure may be prepared to understand a lot about the diagrams in the game but simply have not yet had a reason to learn about them. If that is the case, we would expect the experience to generate relatively large knowledge gains from pre- to post-test for this age group when these individuals start off with low levels of knowledge. Younger age groups may require more supports or time to understand all the concepts addressed by the game, and a few concepts may be out of reach for some younger players. 
Nevertheless, we assert (and our formative testing suggests) that building the game around concepts that are unfamiliar to many players is key to creating a game that players of all ages perceive as “a lot of fun” and collaborate on joyfully.

We also note that some of the more advanced concepts addressed by puzzles in the game, as well as some of the more simplistic concepts, could not be included in the pre/post-quizzes because of the time constraint that logistical considerations imposed on the instrument. With a more comprehensive instrument, we might have seen a different distribution of learning gains among the age groups.

Our analysis did not detect a significant effect of group size on learning gain size with this sample. However, individual learning seems likely to deteriorate at the largest group sizes because, at some point, additional players will be unable to meaningfully engage with the puzzles; for example, the members of a 10-player team simply could not all physically participate in solving the fantastical horse tree puzzle together because they could not all reach or see the artifacts at the same time. As noted by Veldkamp et al. (2020), “A group size of 4–6 players seems most suitable for immersion, participation and group communication during game play.” In fact, we see evidence in the few very large groups that played VENOMventure that even the upper end of that scale (six-player groups) may be suboptimal, and we would expect this effect to reach significance if more large groups had participated in the study. From a game-design perspective, it is notable that we did not find evidence for a “just right” group size that strongly optimizes learning: groups of two, three, four, and five all performed similarly. It would be illuminating to include singletons in future studies to begin to disentangle the relative contributions of knowledge sharing and puzzle scaffolding to learning gains.

Our analysis found a main effect of puzzle-solving style on learning gain size, with individuals in child-led and balanced solving style groups learning more than individuals in adult-led groups. We also found an interaction in which adults in adult-led groups had a different distribution of gains than adults in balanced-play groups, which appears to be narrower and more towards the low end of the gain scale. Our data cannot tell us if these relationships are causal (e.g., if child-centered or balanced play causes larger learning gains, or if families naturally inclined toward child-led or cooperative play happen to also be primed to take advantage of the learning opportunities embedded in the game in ways that families inclined towards adult-led play are not). Nor do our data illuminate why different families exhibited different solving styles. Nevertheless, the patterns in the data spark many interesting questions to be explored in future research. Are there characteristics of different family configurations (e.g., of ages or previous knowledge levels) or perhaps different attitudes toward learning that are associated with different collaborative gameplay styles? Are behaviors that might differentially foster learning associated with different collaborative gameplay styles? Can targeted pre-game prompts or elements of the game environment shape the collaborative gameplay styles and behaviors that groups exhibit?

In addition, the findings described above align in interesting ways with research on learning through guided play. Guided play is play initiated by adults but controlled by children, who are empowered within the play environment to make decisions about what to do next and how to respond as the play situation and environment evolve (Weisberg et al., 2015). Guided discovery and guided play have been shown to enhance learning more than explicit instruction or unassisted discovery across a range of age groups (Alfieri et al., 2011; Skene et al., 2022). We speculate that adults in the child-led and balanced play groups are doing more of this scaffolding (and that this benefits group learning overall) than are adults in the adult-led groups, who are instead focused on directing the activity and are more concerned with the team's success than the process of exploration. This is consistent with the main effect of collaborative puzzle solving style we found. Growing out of the literature on guided play, we further speculate that children in adult-led groups experience less pride in discovery, autonomy over their learning, and engagement with the material (Weisberg et al., 2016), though they do seem to learn a comparable amount to children in groups with other solving styles. Future work could undertake a more detailed behavioral analysis to determine what sort of scaffolding behaviors individuals in groups with different solving styles exhibit and which of these are most closely related to learning and engagement. Future work could also examine the finding that adults in adult-led groups have different learning outcomes and the explanation for this. Are these adults less open to learning because they already perceive themselves to “know it?” By taking the lead, are they missing out on opportunities to learn from other members of their group? 
Finally, we are interested to see what collaborative and scaffolding behaviors might arise in all-youth teams that play the game, and how this relates to learning.

Learning gains from the game also appear to be consistent across players who have and have not had previous experience with escape games, though there were many a priori reasons to think that this variable might impact learning gains. From a project-design perspective, this finding is reassuring. We wanted all players to be able to learn from and have fun with the game. Because commercial escape rooms are costly, not present in all communities, and often designed for adults, VENOMventure represented most participants’ first escape room experience. Through formative testing, we designed the game to be friendly to first-time players by including an introduction to escape games, general strategies for playing, and guidance on the game's locks. We also crafted puzzles so that the main challenge of each puzzle was a phylogenetics concept and did not use common puzzle types seen in commercial escape rooms (e.g., cooperative action puzzles, riddles and dexterity puzzles). These strategies appear to have been successful.

VENOMventure appears to support STEM learning among a wide range of participants, so perhaps it is not a coincidence that its design corresponds with several of Veldkamp and colleagues’ (2020) recommendations for designing educational escape rooms. These include close alignment of learning goals with puzzles and game mechanics, a debriefing period, and the avoidance of puzzles designed around red herrings not intrinsically tied to the educational goals of the game. We anticipate that future research on VENOMventure and other educational escape games will help disentangle which design elements most contribute to learning, and this work has already begun in formal learning settings (Veldkamp et al., 2022).

One important question is the extent to which the potential educational and affective benefits of an escape game stem from the physical nature of the game and its devoted space. One study (Schneider et al., 2012) compared a scenario-based, problematized, digital tabletop interface designed to teach phylogenetics with the same experience instantiated on paper. They found that the seamless, digital version of the game fostered collaborative learning better than the paper version, leading us to wonder how the outcomes might have changed with a three-dimensional, physically embodied format. Physical interactions may be particularly important when it comes to building tree-thinking skills. Laurentino et al. (2024) reported that physically interacting with an evolutionary tree yielded significantly better tree-reading performance than a condition that only allowed visual observation. VENOMventure provides families with a private STEM learning experience within the public space of a museum or library, and in this space, learners have a high degree of agency to interact with the physical environment as they choose. They are fully immersed in a game environment in which they are the only protagonists. They can touch the representations in the game, approach puzzles in the order that suits them, and move props around, making physical connections between artifacts in different parts of the space. These bodily and agential interactions are potentially powerful parts of the game experience that can contribute to learning and cognition (Georgiou and Ioannou, 2019), but they also limit the game's throughput and pose scheduling challenges for the institution hosting the game. Could similar outcomes be achieved without the devoted game space, with a linear progression of puzzles, or with a digital experience? Future work should illuminate this question.

While this research was robust in comparison to much of the research on educational escape rooms, it has several limitations. Most relevant to our findings regarding learning, the instrument used to gather data on participants’ understanding of evolutionary trees was necessarily simple, including just five tree-annotation items and one written-response item addressing knowledge, in addition to items on perceived learning. An ideal data collection scenario would have leveraged a longer, established instrument (e.g., the BETTSI; Jenkins et al., 2022) and would have incorporated multiple pre/post/follow-up survey versions to balance potential order and tree effects. This would have allowed us to better account for effects related to survey design features, fostered comparison to other tree-thinking literature, and might have allowed us to detect more subtle shifts and effects related to learning. However, obtaining a large sample size across multiple, geographically diverse, informal science education venues, our primary goal here, required a very short instrument appropriate for participants of all ages with a single version at each data collection timepoint, to reduce the burden of survey administration on host sites. Despite its limitations, our instrument consistently detected shifts indicative of learning. Having now collected evidence that the game supports learning in a broad range of environments and locations and for a diverse set of participants of many ages and experience levels, we are better positioned to examine this learning in future studies with subtler and longer instruments in an environment where we can deploy and track multiple versions of the instrument.

Other concerns involve sufficient recruitment from different demographics and for different portions of the research. VENOMventure was developed as a bilingual English/Spanish experience. However, despite targeted recruitment efforts, we were unable to include enough bilingual or primarily Spanish-speaking participants to investigate how well this game serves the Spanish-speaking population. In addition, just 66 people participated in our follow-up, longitudinal survey. With a larger sample for this portion of the research, we might have been able to detect subtler effects regarding the durability of knowledge gains. Furthermore, the players that self-selected into participating in this phase of the research could be nonrepresentative of the larger group. For example, those who opted into the follow-up survey may have felt particularly engaged with the game and so may have been more likely to retain knowledge than a typical participant. Addressing this limitation is methodologically challenging.

In addition, our investigation addressed several pertinent factors that might plausibly affect learning and that are closely related to the design of a pop-up escape game for families at rural and urban locations: level of knowledge about phylogenetics as indicated by pretest score, characteristics of geographic site, age group, group size, collaborative game play style, and previous escape game experience. However, there are a host of other factors less closely related to the design of this program that might also factor into learning outcomes. Investigating such factors (e.g., education level, socio-economic status, gender, etc.) was beyond the scope of this study but could be addressed by future work.

Finally, we note that our focus in this study was on conducting a robust investigation of whether an educational escape game appropriate for a free-choice learning environment, like a museum or library, can teach through gameplay alone. Now that we have shown it is possible, a logical next step would be to compare this experience to other more typical approaches to teaching tree-thinking skills in informal environments (e.g., through an exhibit involving educative panels), to hands-on classroom interventions of similar length, and to didactic instruction.

CONCLUSION

This study serves as proof of concept that escape games can foster learning, filling a major gap in the literature on escape games by providing robust evidence of conceptual STEM learning that persists for a substantial length of time after gameplay. It also contributes to our understanding of phylogenetics education by providing evidence that a game can produce gains in this area, which is rife with misleading intuitive conceptions. The overall learning gains demonstrated by our quantitative analysis are of a large effect size, which is especially meaningful given the short nature of the intervention, as is players’ perception of the game as a fun experience that would be happily repeated in a free-choice learning environment. This establishes a baseline for future work investigating and optimizing the features of escape-game design and player interactions/characteristics that best support learning, including exploration of group size, mix of player ages, and gameplay dynamics.

SUPPLEMENTAL MATERIALS

Institutions interested in hosting the game can complete this form. Hosting opportunities are likely to depend on grant funding. PDFs of the comic book are freely available in English and Spanish. A manuscript to submit our data to BMC Research Notes is in preparation; the data are also available upon request from the corresponding author.

Supporting information

cbe-25-ar10-s001.pdf (813.2KB, pdf)

ACKNOWLEDGMENTS

This work was supported by the National Institute of General Medical Sciences of the National Institutes of Health under SEPA award number R25GM132971. Any opinions, findings, conclusions, or recommendations expressed in this report are those of the research team and do not necessarily reflect the views of the National Institutes of Health. We thank Adam Moylan and Kristin Bass for advice on statistical analyses; UCMP support staff and contractors Helina Chin, Trish Roque, Chris Mejia, and Josh Frankel for their contributions to the project; project advisors Emma Coleman, Greta Binford, Melinda Owens, Shannon Bennett, Jack Baur, Satish Pillai, Elda Sanchez, Brent Holman, Amber O'Brien-VerHulst, Amy Miller, Folashade Agusto, and Sarah Dentan; volunteer tech advisors Dan Egnor, Kirill Shklovsky, Michael Seidel, and Michel Dedeo; volunteer game facilitators Derrick Leong, Emma Teng, Kayli Stowe, Angeles Ramirez, Kat Magoulick, Tanner Frank, James Pinto, Brianna Hernandez, Lina Geragiyour, Maria Merza, Stephenos Sada, Angel Reyes, Jeanette Pirlo, and Charles Cano; and volunteer playtesters Yossi and Amir Fendel, Alexander Kilimnik, families from Stanislaus County Public Library, the afterschool kids at Tarea Hall Pittman South Branch, the afterschool kids at North Branch, and the afterschool teen group at Claremont Branch, Berkeley Public Library.

Footnotes

1The same is true of the change scores if participants are simply divided into adults and children (independent samples Mann-Whitney U test: N = 446, U = 24733, p = 0.971).

REFERENCES

  1. Ainley, M., & Ainley, J. (2011). Student engagement with science in early adolescence: The contribution of enjoyment to students' continuing interest in learning about science. Contemporary Educational Psychology, 36, 4–12.
  2. Ainsworth, S., & Saffer, J. (2013). Can children read evolutionary trees? Merrill-Palmer Quarterly, 59(2), 221–247.
  3. Alfieri, L., Brooks, P. J., Aldrich, N. J., & Tenenbaum, H. R. (2011). Does discovery-based instruction enhance learning? Journal of Educational Psychology, 103(1), 1–18. 10.1037/a0021017
  4. Andre, L., Durksen, T., & Volman, M. L. (2017). Museums as avenues of learning for children: A decade of research. Learning Environments Research, 20, 47–76.
  5. Atkinson, R. K., Derry, S. J., Renkl, A., & Wortham, D. (2000). Learning from examples: Instructional principles from the worked examples research. Review of Educational Research, 70(2), 181–214.
  6. Bao, L. (2006). Theoretical comparisons of average normalized gain calculations. American Journal of Physics, 74(10), 917–922.
  7. Baum, D. A., DeWitt-Smith, S., & Donovan, S. (2005). The Tree-Thinking Challenge. Science, 310, 979–980.
  8. Bayeck, R. Y. (2020). Examining board gameplay and learning: A multidisciplinary review of recent research. Simulation & Gaming, 51(4), 411–431. 10.1177/1046878119901286
  9. Beatty, I., Leonard, W., Gerace, W., & Dufresne, R. (2006). Question driven instruction: Teaching science (well) with an audience response system. In Banks, D. A. (Ed.), Audience Response Systems in Higher Education: Applications and Cases (pp. 96–115). Hershey, PA: Idea Group Inc. 10.4018/978-1-59140-947-2.ch007
  10. Berthod, F., Bouchoud, L., Grossrieder, F., Falaschi, L., Senhaji, S., & Bonnabry, P. (2019). Learning good manufacturing practices in an escape room: Validation of a new pedagogical tool. Journal of Oncology Pharmacy Practice, 26(4), 853–860. 10.1177/1078155219875504
  11. Blacquiere, L. D., Fawaz, A., & Hoese, W. J. (2020). Who's related to whom? Use published phylogenies and make customized tree-thinking assessments. Evolution: Education and Outreach, 13(1), 20. 10.1186/s12052-020-00134-8
  12. Blacquiere, L. D., & Hoese, W. J. (2016). A valid assessment of students' skill in determining relationships on evolutionary trees. Evolution: Education and Outreach, 9(5), 1–12.
  13. Bochennek, K., Wittekindt, B., Zimmermann, S., & Klingebiel, T. (2007). More than mere games: A review of card and board games for medical education. Medical Teacher, 29(9–10), 941–948. 10.1080/01421590701749813
  14. Borrego, C., Fernandez, C., Blanes, I., & Robles, S. (2017). Room escape at class: Escape games activities to facilitate the motivation and learning in computer science. Journal of Technology and Science Education, 7(2), 162–171.
  15. BranchOut. (2020). BranchOut Outdoor Escape Rooms. Retrieved May 6, 2025 from https://www.branchoutparks.com/
  16. Brooks, D. R. (2010). Sagas of the children of time: The importance of phylogenetic teaching in biology. Evolution: Education and Outreach, 3, 495–498.
  17. Burney, J., Powers, R. G., & Carnes, M. (2010). Reacting to the past: A new approach to student engagement and to enhancing general education. White paper report submitted to the Teagle Foundation.
  18. Caldas, L. M., Eukel, H. N., Matulewicz, A. T., Fernández, E. V., & Donohoe, K. L. (2019). Applying educational gaming success to a nonsterile compounding escape room. Currents in Pharmacy Teaching and Learning, 11(10), 1049–1054. 10.1016/j.cptl.2019.06.012
  19. Caldwell, J. E. (2007). Clickers in the large classroom: Current research and best-practice tips. CBE—Life Sciences Education, 6(1), 9–20. 10.1187/cbe.06-12-0205
  20. Catley, K. M., & Novick, L. R. (2008). Seeing the wood for the trees: An analysis of evolutionary diagrams in biology textbooks. BioScience, 58(10), 976–987.
  21. Catley, K. M., Novick, L. R., & Funk, D. J. (2012). The promises and challenges of introducing tree thinking into evolution education. In Rosengren, K. R., Brem, S., Evans, E. M., & Sinatra, G. (Eds.), Evolution Challenges: Integrating research and practice in teaching and learning about evolution (pp. 93–118). New York: Oxford University Press.
  21. Catley, K. M., Novick, L. R., & Funk, D. J. (2012). The promises and challenges of introducing tree thinking into evolution education. In Evolution Challenges: Integrating research and practice in teaching and learning about evolution ed. Rosengren KR, Brem S, Evans EM, and Sinatra G, New York: Oxford University Press, 93–118. [Google Scholar]
  22. Clark, D. B., Tanner-Smith, E. E., & Killingsworth, S. S. (2016). Digital games, design, and learning. Review of Educational Research, 86(1), 79–122. 10.3102/0034654315582065 [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159. 10.1037/0033-2909.112.1.155 [DOI] [PubMed] [Google Scholar]
  24. Connolly, T. M., Boyle, E. A., MacArthur, E., Hainey, T., & Boyle, J. M. (2012). A systematic literature review of empirical evidence on computer games and serious games. Computers & Education, 59(2), 661–686. [Google Scholar]
  25. Crouch, C. H., & Mazur, E. (2001). Peer instruction: Ten years of experience and results. American Journal of Physics, 69(9), 970–977. [Google Scholar]
  26. Daniel, K. L., Ferguson, D., Leone, E. A., & Bucklin, C. J. (2024). A comparison of measured outcomes across tree-thinking interventions. The American Biology Teacher, 86(2), 71–77. 10.1525/abt.2024.86.2.71 [DOI] [Google Scholar]
  27. Dees, J., Momsen, J. L., Niemi, J., & Montplaisir, L. (2017). Student interpretations of phylogenetic trees in an introductory biology course. CBE—Life Sciences Education, 13(4). 10.1187/cbe.14-01-0003 [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Dodick, J. (2009). Phylogeny exhibits and understanding geological time. Presented in: Understanding the Tree of Life Conference, Carnegie Museum of Natural History, Pittsburgh, PA. [Google Scholar]
  29. Donoghue, M. J. (2005). Comparisons, phylogeny, and teaching evolution. Revised proceedings of the BSCS/AIBS Symposium, November 2004, Chicago, IL. [Google Scholar]
  30. Eddy, S. L., Crowe, A. J., Wenderoth, M. P., & Freeman, S. (2013). How should we teach tree-thinking? An experimental test of two hypotheses. Evolution: Education and Outreach, 6(13). 10.1186/1936-6434-6-13 [DOI] [Google Scholar]
  31. Eukel, H. N., Frenzel, J. E., & Cernusca, D. (2017). Educational gaming for pharmacy students–Design and evaluation of a diabetes-themed escape room. American Journal of Pharmaceutical Education, 81(7), 6265. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Evans, E. M., Rosengren, K. S., Lane, J. D., & Price, K. L. S. (2012). Encountering counterintuitive ideas: Constructing a developmental learning progression for evolution understanding. In Evolution Challenges: Integrating Research and Practices in Teaching and Learning about Evolution ed. Rosengren KR, Brem S, Evans EM, and Sinatra G. New York: Oxford University Press. [Google Scholar]
  33. Evans, E. M., Spiegel, A., Gram, W., Frazier, B. N., Tare, M., Thompson, S., & Diamond, J. (2010). A conceptual guide to natural history museum visitors' understanding of evolution. Journal of Research in Science Teaching, 47(3), 326–353. [Google Scholar]
  34. Fotaris, P., & Mastoras, T. (2019). Escape rooms for learning: A systematic review. 13th European Conference on Games Based Learning (ECGBL 2019), Odense, Denmark. [Google Scholar]
  35. Frankel, J., MacDonald, T., & Thanukos, A. (2023). Plant on a Rampage. The University of California Museum of Paleontology. Retrieved from https://evolution.berkeley.edu/wp-content/uploads/2023/06/PlantOnARampage_English.pdf [Google Scholar]
  36. Freeman, S., Eddy, S. L., McDonough, M., Smith, M. K., Okoroafor, N., Jordt, H., & Wenderoth, M. P. (2014). Active learning increases student performance in science, engineering, and mathematics. Proceedings of the National Academy of Sciences, 111(23), 8410–8415. 10.1073/pnas.1319030111 [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Georgiou, Y., & Ioannou, A. (2019). Embodied learning in a digital world: A systematic review of empirical research in K-12 education. In Learning in a Digital World: Smart Computing and Intelligence, ed. Díaz P., Ioannou A., Bhagat K., and Spector J., Singapore: Springer, 155–177. 10.1007/978-981-13-8265-9_8 [DOI] [Google Scholar]
  38. Gibson, J. P., & Cooper, J. T. (2017). Botanical Phylo-Cards: A tree-thinking game to teach plant evolution. The American Biology Teacher, 79(3), 241–244. 10.1525/abt.2017.79.3.241 [DOI] [Google Scholar]
  39. Gibson, J. P., & Hoefnagels, M. H. (2015). Correlations between tree thinking and acceptance of evolution in introductory biology students. Evolution: Education and Outreach, 8(1), 15. 10.1186/s12052-015-0042-7 [DOI] [Google Scholar]
  40. Giusti, E. (2012). Yale Peabody Museum of Natural History's “Travels in the Great Tree of Life.” Evolution: Education and Outreach, 5(1), 68–75. 10.1007/s12052-012-0397-y [DOI] [Google Scholar]
  41. Glavas, A., & Stascik, A. (2017, May). Enhancing positive attitudes towards mathematics through introducing Escape Room games. Mathematics Education as a Science and a Profession, Osijek. [Google Scholar]
  42. Glor, R. E. (2010). Phylogenetic insights on adaptive radiation. Annual Review of Ecology, Evolution, and Systematics, 41(1), 251–270. [Google Scholar]
  43. Gonzalez-Calero, P. A., Camps-Ortueta, I., Gutiérrez-Sánchez, P., & Gómez-Martín, P. P. (2024). On the importance of contextualizing an educational escape room activity. Electronic Journal of e-Learning, 22(4), 43–56. 10.34190/ejel.22.4.3199 [DOI] [Google Scholar]
  44. Gregory, T. R. (2008). Understanding evolutionary trees. Evolution: Education and Outreach, 1, 121–137. [Google Scholar]
  45. Hainey, T., Connolly, T. M., Boyle, E. A., Wilson, A., & Razak, A. (2016). A systematic literature review of games-based learning empirical evidence in primary education. Computers & Education, 102, 202–223. 10.1016/j.compedu.2016.09.001 [DOI] [Google Scholar]
  46. Hake, R. R. (1998). Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses. American Journal of Physics, 66(1), 64–74. [Google Scholar]
  47. Halverson, K. L. (2010). Using pipe cleaners to bring the tree of life to life. The American Biology Teacher, 72(4), 223–224. [Google Scholar]
  48. Halverson, K. L., Pires, J. C., & Abell, S. K. (2011). Exploring the complexity of tree thinking expertise in an undergraduate systematics course. Science Education, 95, 794–823. [Google Scholar]
  49. Hmelo-Silver, C. E., Golan, D. R., & Chinn, C. A. (2007). Scaffolding and achievement in problem-based and inquiry learning: A response to Kirschner, Sweller, and Clark (2006). Educational Psychologist, 42(2), 99–107. 10.1080/00461520701263368 [DOI] [Google Scholar]
  50. Horn, M., Phillips, B., Evans, E. M., Block, F., Diamond, J., & Shen, C. (2016). Visualizing biological data in museums: Visitor learning with an interactive tree of life exhibit. Journal of Research in Science Teaching, 53(6), 895–918. 10.1002/tea.21318 [DOI] [Google Scholar]
  51. Hulleman, C. S., & Harackiewicz, J. M. (2009). Promoting interest and performance in high school science classes. Science, 326, 1410–1412. [DOI] [PubMed] [Google Scholar]
  52. Infanti, L. M., & Wiles, J. R. (2015). “Evo in the News:” Understanding Evolution and students’ attitudes towards the relevance of evolutionary biology. Bioscene, 40(2), 9–14. [Google Scholar]
  53. Janies, D. (2019). Phylogenetic concepts and tools applied to epidemiologic investigations of infectious diseases. Microbiology Spectrum, 7(4). 10.1128/microbiolspec.ame-0006-2018 [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Jenkins, K. P., Mead, L., Baum, D. A., Daniel, K. L., Bucklin, C. J., Leone, E. A., … Naegle, E. (2022). Developing the BETTSI: A tree-thinking diagnostic tool to assess individual elements of representational competence. Evolution, 76(4), 708–721. 10.1111/evo.14458 [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Jungck, J. R., & Weisstein, A. E. (2013). Mathematics and evolutionary biology make bioinformatics education comprehensible. Briefings in Bioinformatics, 14(5), 599–609. 10.1093/bib/bbt046 [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Kovarik, D. N., Patterson, D. G., Cohen, C., Sanders, E. A., Peterson, K. A., Porter, S. G., & Chowning, J. T. (2013). Bioinformatics education in high school: Implications for promoting science, technology, engineering, and mathematics careers. CBE—Life Sciences Education, 12(3), 441–459. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Kummer, T. A., Whipple, C. J., Bybee, S. M., Adams, B. J., & Jensen, J. L. (2019). Development of an evolutionary tree concept inventory. Journal of Microbiology & Biology Education, 20(2). 10.1128/jmbe.v20i2.1700 [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Kummer, T. A., Whipple, C. J., & Jensen, J. L. (2016). Prevalence and persistence of misconceptions in tree thinking. Journal of Microbiology & Biology Education, 17(3). 10.1128/jmbe.v17i3.1156 [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Lamb, R. L., Annetta, L., Firestone, J., & Etopio, E. (2018). A meta-analysis with examination of moderators of student cognition, affect, and learning outcomes while using serious educational games, serious games, and simulations. Computers in Human Behavior, 80, 158–167. 10.1016/j.chb.2017.10.040 [DOI] [Google Scholar]
  60. Lathwesen, C., & Belova, N. (2021). Escape rooms in STEM teaching and learning—prospective field or declining trend? A literature review. Education Sciences, 11(6), 308. 10.3390/educsci11060308 [DOI] [Google Scholar]
  61. Laurentino, T. G., Scheller, M., Glover, G., Proulx, M. J., & de Sousa, A. A. (2024). Thinking on your feet: Potentially enhancing phylogenetic tree learning accessibility through a kinaesthetic approach. Evolution: Education and Outreach, 17(1), 19. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Leaché, A. D., & Oaks, J. R. (2017). The utility of single nucleotide polymorphism (SNP) data in phylogenetics. Annual Review of Ecology, Evolution, and Systematics, 48(1), 69–84. 10.1146/annurev-ecolsys-110316-022645 [DOI] [Google Scholar]
  63. Lin, F. J., Wang, C. P., Zhung, H. C., Wang, H. Y., Wang, S. M., Li, C. T., … Hou, H. T. (2017, July 9–13). Paper Romance©: An educational simulation game for learning papermaking with contextual scaffoldings for elementary students: The evaluation of learning performance and flow state. 2017 6th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI).
  64. Lopez-Pernas, S., Gordillo, A., Barra, E., & Quemada, J. (2019a). Analyzing learning effectiveness and students’ perceptions of an educational escape room in a programming course in higher education. IEEE Access, 7, 184221–184234. 10.1109/access.2019.2960312 [DOI] [Google Scholar]
  65. Lopez-Pernas, S., Gordillo, A., Barra, E., & Quemada, J. (2019b). Examining the use of an educational escape room for teaching programming in a higher education setting. IEEE Access, 7, 31723–31737. 10.1109/access.2019.2902976 [DOI] [Google Scholar]
  66. Lussenhop, A., Atwood, A., & Kollmann, E. K. (2023). Summative Evaluation of the Moon Adventure Game. NISE. Retrieved from https://www.nisenet.org/sites/default/files/catalog/uploads/moon_adventure_game_summative_evaluation_report_2023_july.pdf [Google Scholar]
  67. Lyman, F. T. (1981). The responsive classroom discussion: The inclusion of all students. Mainstreaming Digest, 109(1), 113. [Google Scholar]
  68. MacDonald, T., & Thanukos, A. (2023). Natural History Mystery: Immersing families in a problem-solving game using natural history exhibits. Journal of STEM Outreach, 6(1). 10.15695/jstem/v6i1.08 [DOI] [Google Scholar]
  69. MacDonald, T., & Wiley, E. O. (2012). Communicating phylogeny: Evolutionary tree diagrams in museums. Evolution Education and Outreach, 5, 14–28. 10.1007/s12052-012-0387-0 [DOI] [Google Scholar]
  70. Machová, M. (2021). Phylogenetic trees and other evolutionary diagrams in biology textbooks and their importance in secondary science education. Scientia in Educatione, 12(1), 16–36. 10.14712/18047106.1923 [DOI] [Google Scholar]
  71. Marcy, A. E. (2023). Go Extinct! An award-winning evolution game that teaches tree-thinking as students pursue the winning strategy. CourseSource, 10, 1–14. 10.24918/cs.2023.9 [DOI] [Google Scholar]
  72. Mayer, R. E. (2019). Computer games in education. Annual Review of Psychology, 70(1), 531–549. 10.1146/annurev-psych-010418-102744 [DOI] [PubMed] [Google Scholar]
  73. Meir, E., Perry, J., Herron, J. C., & Kingsolver, J. (2007). College students’ misconceptions about evolutionary trees. American Biology Teacher, 69, 71–76. [Google Scholar]
  74. Menalled, U. D., Smith, R. G., Cordeau, S., Ditommaso, A., Pethybridge, S. J., & Ryan, M. R. (2023). Phylogenetic relatedness can influence cover crop-based weed suppression. Scientific Reports, 13(1). 10.1038/s41598-023-43987-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Merchant, Z., Goetz, E. T., Cifuentes, L., Keeney-Kennicutt, W., & Davis, T. J. (2014). Effectiveness of virtual reality-based instruction on students' learning outcomes in K-12 and higher education: A meta-analysis. Computers & Education, 70, 29–40. 10.1016/j.compedu.2013.07.033 [DOI] [Google Scholar]
  76. Miller, K., Lasry, N., Reshef, O., Dowd, J., Araujo, I., & Mazur, E. (2010). Losing it: The influence of losses on individuals' normalized gains. AIP Conference Proceedings 2010 Physics Education Research Conference, Portland. [Google Scholar]
  77. Morlon, H., Andréoletti, J., Barido-Sottani, J., Lambert, S., Perez-Lamarque, B., Quintero, I., … Veron, P. (2024). Phylogenetic insights into diversification. Annual Review of Ecology, Evolution, and Systematics, 55, 1–22. [Google Scholar]
  78. Mystakidis, S., Cachafeiro, E., & Hatzilygeroudis, I. (2019, July). Enter the serious e-scape room: A cost-effective serious game model for deep and meaningful e-learning. 10th International Conference on Information, Intelligence, Systems and Applications, Patras, Greece. [Google Scholar]
  79. National Academies of Sciences, Engineering, and Medicine. (2023). Rise and Thrive with Science: Teaching PK-5 Science and Engineering. The National Academies Press. 10.17226/26853 [DOI] [PubMed] [Google Scholar]
  80. National Academies of Sciences, Engineering, and Medicine. (2025). K-12 STEM Education and Workforce Development in Rural Areas. The National Academies Press. 10.17226/28269 [DOI] [Google Scholar]
  81. National Foundation for Educational Research (NFER). (2011). Exploring young people's views on science education. Retrieved from https://www.nfer.ac.uk/exploring-young-peoples-views-on-science-education/ [Google Scholar]
  82. National Research Council. (2009). Learning Science in Informal Environments: People, Places, and Pursuits. The National Academies Press. Retrieved from https://www.nap.edu/read/12190 [Google Scholar]
  83. National Research Council. (2011). Learning Science Through Computer Games and Simulations. The National Academies Press. Retrieved from https://www.nap.edu/read/13078 [Google Scholar]
  84. Naxerova, K., & Jain, R. K. (2015). Using tumour phylogenetics to identify the roots of metastasis in humans. Nature Reviews Clinical Oncology, 12(5), 258–272. 10.1038/nrclinonc.2014.238 [DOI] [PubMed] [Google Scholar]
  85. Nelson, C. E. (2012). Why don't undergraduates really “get” evolution? What can faculty do? In Evolution Challenges: Integrating Research and Practices in Teaching and Learning about Evolution ed. Rosengren K. R., Brem S., Evans E. M., & Sinatra G.. New York: Oxford University Press, 311–347. [Google Scholar]
  86. NGSS Lead States. (2013). Next Generation Science Standards: For States, By States. Washington, DC: The National Academies Press. Retrieved from http://www.nextgenscience.org/ [Google Scholar]
  87. Nicholson, S. (2016, October 20–22). The state of escape: Escape room design and facilities. Meaningful Play 2016, Lansing, MI. Retrieved from http://scottnicholson.com [Google Scholar]
  88. Novick, L. R., & Catley, K. M. (2017). Fostering 21st-Century evolutionary reasoning: Teaching tree thinking to introductory biology students. CBE—Life Sciences Education, 15(4). 10.1187/cbe.15-06-0127 [DOI] [PMC free article] [PubMed] [Google Scholar]
  89. Novick, L. R., Pickering, J., MacDonald, T., Diamond, J., Ainsworth, S., Aquino, A. E., … Scott, M. (2014). Depicting the tree of life: Guiding principles from psychological research. Evolution: Education and Outreach, 7(1), 1–13. 10.1186/s12052-014-0025-0 [DOI] [Google Scholar]
  90. O'Donnell, A. M., & Hmelo-Silver, C. E. (2013). Introduction: What is collaborative learning? An overview. In The international handbook of collaborative learning ed. Hmelo-Silver C., Chinn C., Chan C., & O'Donnell A.. New York: Routledge, 1–15. [Google Scholar]
  91. Omland, K. E., Cook, L. G., & Crisp, M. D. (2008). Tree thinking for all biology: The problem with reading phylogenies as ladders of progress. BioEssays, 30, 854–867. [DOI] [PubMed] [Google Scholar]
  92. Pedaste, M., Mäeots, M., Siiman, L. A., de Jong, T., van Riesen, S. A. N., Kamp, E. T., … Tsourlidaki, E. (2015). Phases of inquiry-based learning: Definitions and the inquiry cycle. Educational Research Review, 14, 47–61. 10.1016/j.edurev.2015.02.003 [DOI] [Google Scholar]
  93. Pekar, J. E., Magee, A., Parker, E., Moshiri, N., Izhikevich, K., Havens, J. L., … Wertheim, J. O. (2022). The molecular epidemiology of multiple zoonotic origins of SARS-CoV-2. Science, 377(6609), 960–966. 10.1126/science.abp8337 [DOI] [PMC free article] [PubMed] [Google Scholar]
  94. Pellens, R., & Grandcolas, P. (eds.). (2016). Biodiversity Conservation and Phylogenetic Systematics: Preserving our evolutionary heritage in an extinction crisis. Cham, Switzerland: Springer. [Google Scholar]
  95. Perry, J., Meir, E., Herron, J. C., Maruca, S., & Stal, D. (2008). Evaluating two approaches to helping college students understand evolutionary trees through diagramming tasks. CBE—Life Sciences Education, 7(2), 193–201. 10.1187/cbe.07-01-0007 [DOI] [PMC free article] [PubMed] [Google Scholar]
  96. Phillips, B. C., Novick, L. R., Catley, K. M., & Funk, D. J. (2012). Teaching tree thinking to college students: It's not as easy as you think. Evolution: Education and Outreach, 5, 595–602. [Google Scholar]
  97. Pobiner, B. (2016). Accepting, understanding, teaching, and learning (human) evolution: Obstacles and opportunities. Yearbook of Physical Anthropology, 159, 232–274. [DOI] [PubMed] [Google Scholar]
  98. Porcello, D., McCarthy, C., & Ostman, R. (2017). Gaming and the NISE Network: A Gameful Approach to STEM Learning Experiences. Science Museum of Minnesota for the NISE Network. [Google Scholar]
  99. Pryor, L. (2019, February). Can theatrical games improve museum visitors’ understanding of complex topics requiring conceptual shifts? NSF AISL PI Conference, Alexandria, VA. [Google Scholar]
  100. Pryor, L., King, Z., Ronning, E., Long, S., & Nordberg, T. (2021). Infestation: The evolution begins. Science Museum of Minnesota. [Google Scholar]
  101. Quimby, C. (2024). VENOMventure | aVENENOtura 2024 Summative Report. Retrieved from https://evolution.berkeley.edu/wp-content/uploads/2025/05/VENOMventure-Y4-Report.pdf
  102. Rawlinson, R. E., & Whitton, N. (2024). Escape rooms as tools for learning through failure. Electronic Journal of e-Learning, 22(4), 19–29. [Google Scholar]
  103. Sanchez, E., & Plumettaz-Sieber, M. (2019). Teaching and learning with escape games from debriefing to institutionalization of knowledge. In Games and Learning Alliance. GALA 2018. Lecture Notes in Computer Science ed. Gentile M., Allegra M., & Söbke H.. Cham: Springer, 242–253. 10.1007/978-3-030-11548-7_23 [DOI] [Google Scholar]
  104. Sandvik, H. (2008). Tree thinking cannot be taken for granted: Challenges for teaching phylogeny. Theory in Biosciences, 127(1), 45–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
  105. Scaduto, D. I., Brown, J. M., Haaland, W. C., Zwickl, D. J., Hillis, D. M., & Metzker, M. L. (2010). Source identification in two criminal cases using phylogenetic analysis of HIV-1 DNA sequences. Proceedings of the National Academy of Sciences, 107(50), 21242–21247. 10.1073/pnas.1015673107 [DOI] [PMC free article] [PubMed] [Google Scholar]
  106. Schneider, B., Strait, M., Muller, L., Elfenbein, S., Shaer, O., & Shen, C. (2012). Phylo-Genie: Engaging students in collaborative ‘tree-thinking’ through tabletop techniques. Conference on Human Factors in Computing Systems, Austin, Texas. [Google Scholar]
  107. Serrada-Sotil, J., Huertas Martínez, J. A., & Granado-Peinado, M. (2025). Do audience response systems truly enhance learning and motivation in higher education? A systematic review. Humanities and Social Sciences Communications, 12(1), 1767. 10.1057/s41599-025-06042-w [DOI] [Google Scholar]
  108. Skene, K., O'Farrelly, C. M., Byrne, E. M., Kirby, N., Stevens, E. C., & Ramchandani, P. G. (2022). Can guidance during play enhance children's learning and development in educational contexts? A systematic review and meta-analysis. Child Development, 93(4), 1162–1180. 10.1111/cdev.13730 [DOI] [PMC free article] [PubMed] [Google Scholar]
  109. Sousa, C., Rye, S., Sousa, M., Torres, P. J., Perim, C., Mansuklal, S. A., & Ennami, F. (2023). Playing at the school table: Systematic literature review of board, tabletop, and other analog game-based learning approaches [Review]. Frontiers in Psychology, 14. Retrieved from https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2023.1160591 [DOI] [PMC free article] [PubMed] [Google Scholar]
  110. Spiegel, A., Evans, E. M., Frazier, B. F., Hazel, A., Tare, M., Gram, W., & Diamond, J. (2012). Changing museum visitors' concepts of evolution. Evolution: Education and Outreach, 5, 43–61. [Google Scholar]
  111. Stadler, T. (2013). Recovering speciation and extinction dynamics based on phylogenies. Journal of Evolutionary Biology, 26(6), 1203–1219. 10.1111/jeb.12139 [DOI] [PubMed] [Google Scholar]
  112. Tanner, K. D. (2009). Talking to learn: Why biology students should be talking in classrooms and how to make it happen. CBE—Life Sciences Education, 8(2), 89–94. 10.1187/cbe.09-03-0021 [DOI] [PMC free article] [PubMed] [Google Scholar]
  113. Thanukos, A. (2010). Evolutionary trees from the tabloids and beyond. Evolution: Education and Outreach, 3(4), 563–572. 10.1007/s12052-010-0290-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  114. Tucker, C. M., Cadotte, M. W., Carvalho, S. B., Davies, T. J., Ferrier, S., Fritz, S. A., … Mazel, F. (2016). A guide to phylogenetic metrics for conservation, community ecology and macroecology. Biological Reviews. 10.1111/brv.12252 [DOI] [PMC free article] [PubMed] [Google Scholar]
  115. Tweet, J. (2016). Grandmother Fish: A Child's First Book of Evolution. Feiwel & Friends. Retrieved from https://www.amazon.com/Grandmother-Fish-Childs-First-Evolution/dp/1250113237 [Google Scholar]
  116. Veldkamp, A., Rebecca Niese, J., Heuvelmans, M., Knippels, M. C. P. J., & Van Joolingen, W. R. (2022). You escaped! How did you learn during gameplay? British Journal of Educational Technology, 53(5), 1430–1458. 10.1111/bjet.13194 [DOI] [Google Scholar]
  117. Veldkamp, A., van de Grint, L., Knippels, M.-C. P. J., & van Joolingen, W. R. (2020). Escape education: A systematic review on escape rooms in education. Educational Research Review, 31, 100364. 10.1016/j.edurev.2020.100364 [DOI] [Google Scholar]
  118. Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes (Vol. 86). Cambridge, MA: Harvard University Press. [Google Scholar]
  119. Wade, N. (2017, March 22). Shaking up the dinosaur family tree. New York Times. Retrieved from https://www.nytimes.com/2017/03/22/science/dinosaur-family-tree.html
  120. Walter, E. M., Halverson, K. M., & Boyce, C. J. (2013). Investigating the relationship between college students’ acceptance of evolution and tree thinking understanding. Evolution: Education and Outreach, 6(1), 26. 10.1186/1936-6434-6-26 [DOI] [Google Scholar]
  121. Weisberg, D. S., Hirsh-Pasek, K., Golinkoff, R. M., Kittredge, A. K., & Klahr, D. (2016). Guided play: Principles and practices. Current Directions in Psychological Science, 25(3), 177–182. 10.1177/0963721416645512 [DOI] [Google Scholar]
  122. Weisberg, D. S., Kittredge, A. K., Hirsh-Pasek, K., Golinkoff, R. M., & Klahr, D. (2015). Making play work for education. Phi Delta Kappan, 96(8), 8–13. 10.1177/0031721715583955 [DOI] [Google Scholar]
  123. Wiemker, M., Elumir, E., & Clare, A. (2015). Escape room games: “Can you transform an unpleasant situation into a pleasant one?.” In Game Based Learning-Dialogorientierung & spielerisches Lernen analog und digital ed. Weibenböck J. H. J.. Vienna, Austria: Ikon, 55–68. Retrieved from https://thecodex.ca/wp…/2016/08/00511Wiemker-et-al-Paper-Escape-Room-Games.pdf [Google Scholar]
  124. Wiley, E. O., & Lieberman, B. S. (2011). Phylogenetics: Theory and Practice of Phylogenetic Systematics. Hoboken: John Wiley & Sons. [Google Scholar]
  125. Yachin, T., & Barak, M. (2024). Science-based educational escape games: A game design methodology. Research in Science Education, 54(2), 299–313. 10.1007/s11165-023-10143-4 [DOI] [Google Scholar]
  126. Young, J. R., Bevan, D., & Sanders, M. (2024). How productive is the productive struggle? Lessons learned from a scoping review. International Journal of Education in Mathematics, Science and Technology, 12(2), 470–495. [Google Scholar]
  127. Zhukova, A., Cutino-Moguel, T., Gascuel, O., & Pillay, D. (2017). The role of phylogenetics as a tool to predict the spread of resistance. The Journal of Infectious Diseases, 216(suppl_9), S820–S823. 10.1093/infdis/jix411 [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

cbe-25-ar10-s001.pdf (813.2KB, pdf)

Articles from CBE Life Sciences Education are provided here courtesy of American Society for Cell Biology
