ABSTRACT
For STEM faculty to approach teaching as scientists, they must develop tools for data collection and analysis of student learning outcomes. Here, we report a methodology for the development of a learning outcomes assessment instrument and statistical analysis of that instrument that can be undertaken in a short amount of time by a few faculty members with little to no funding. Our team of instructors at a public land‐grant university developed an instrument for our single‐semester nonmajors biochemistry course. The instrument consists of eight sets of multiple true/false questions assessing learning objectives covering topics within protein structure and function, thermodynamics, and metabolism. We employed the instrument as a pre‐ and postcourse evaluation for several semesters. We conducted statistical analyses on overall exam scores and on individual questions. The results indicate that between the beginning and the end of the semester students achieved statistically significant increases in their cumulative scores. Finer‐grained analysis revealed that students displayed little to no improvement in specific content areas and concepts. These findings point to areas in need of pedagogical interventions.
Keywords: active learning, assessment instrument, concept inventory, learning outcomes
1. Introduction
Over the past several decades, a growing body of literature has illustrated the effectiveness of active‐learning pedagogical approaches in enhancing learning gains in a variety of STEM disciplines [1, 2, 3]. Given the comprehensive nature of this body of literature, a variety of scientific associations have developed teaching standards to emphasize active‐learning pedagogical approaches, and in 2004 Handelsman et al. postulated that science instructors should approach teaching as they do their research, coining the phrase “scientific teaching” [4].
The growing awareness of this body of literature has resulted in an increasing number of faculty adapting their teaching methods to include multiple forms of active learning. In most cases, instructors assume that such methods increase student learning outcomes. Unfortunately, few objective means exist for measuring such learning gains. Thus, instructors cannot measure if the time‐intensive adaptations and changes they make increase learning outcomes. In recognition of this, faculty in several STEM disciplines have developed, published, and employed concept inventories (CIs) to measure student learning gains [5, 6, 7, 8, 9, 10, 11]. The first of these developed was the Force Concept Inventory (FCI), which has been widely used by instructors of introductory physics [5]. This inventory, used as a pre/posttest, has successfully identified effective pedagogical adaptations and changes from the traditional lecture‐based model [12]. These successes have led to the development of many other CIs in the STEM disciplines. In the molecular life sciences, CIs have been developed for a variety of courses and biomolecular topics [13, 14, 15, 16, 17, 18, 19].
While the number and diversity of CIs in the molecular life sciences have increased over the past two decades, many molecular life science instructors looking for a means to evaluate their pedagogical practices and adaptations are still left without a viable option. This is particularly true for biochemistry and molecular biology instructors. Biochemistry and molecular biology are rapidly progressing fields in which new scientific discoveries are reported regularly. This rate of discovery, along with the multidisciplinary nature of the field, poses challenges to life science educators who must choose what content to present in their courses [20]. This selection process can lead to content that does not align with existing CIs. For instance, the learning objectives for our single‐semester biochemistry course (described below) differ significantly from those of the molecular life science courses assessed by the previously cited CIs. Given these factors, the cohort of published molecular life science CIs may cover only a portion of the learning objectives put forth for any one course at any given college or university. Furthermore, many STEM instructors do not have the time, the funding, or the expertise to produce a CI that addresses the learning objectives developed for their course.
To bridge this gap, we report here a methodology for the development of a reliable instrument for measuring learning outcomes over the course of a few semesters. The methodology falls short of that required to develop a fully vetted CI which requires a Delphi study, student interviews, test administration at multiple institutions, and other validation methods [21]. However, the methodology can produce an instrument that has minimized the use of language that confuses students, leading to question misinterpretation. This instrument can be used to:
- Track student learning gains over the course of a semester.
- Identify student misunderstanding at the outset of a semester.
- Identify instructional weaknesses, leading to interventions and pedagogical adaptations.
- Probe student misconceptions.
Through our approach, an instrument can be developed in a relatively short period of time (three semesters), giving STEM instructors a means to collect data to evaluate their pedagogical approaches.
The learning assessment instrument produced in this study consists of 42 multiple true/false (MTF) questions grouped into eight sets with their own “narrative stem” [22] that we wrote and revised in a manner to avoid the use of language that confused students, leading to question misinterpretation. We employed the instrument as a pre/postcourse evaluation in six separate sections of a single‐semester biochemistry course spanning three semesters of data collection. We statistically analyzed the data obtained with the instrument at two different levels. First, we assessed the total test scores, highlighting student overall improvement from precourse evaluations (pretest) to postcourse evaluation (posttest). Second, we assessed data on individual questions, comparing pre‐ to postcourse performance. We report here the methods used to produce the instrument and the statistical analyses of the data collected. We also discuss how the data collected from the instrument can be used to identify content areas in which pedagogical interventions are needed.
2. Methods
The methods for developing a learning assessment instrument detailed below were utilized for a single‐semester nonmajors biochemistry course taught at Colorado State University. Topics covered within the course include thermodynamics, intermolecular interactions, protein structure, enzymology, ligand binding, protein cooperativity, protein allostery, membrane proteins and transport, carbohydrate and lipid metabolism, and cellular respiration. The course is typically presented in two different sections in the fall, spring, and summer sessions with enrollment around 750–850 students a year. The teaching burden is shared by three different instructors, each of whom has a different teaching style, ranging from traditional lecture to nontraditional active‐learning pedagogical methodologies. We developed our instrument with the goal of designing an assessment that could be used to measure student learning gains both over the course of an entire semester and for specific content areas. Further, we sought to design a method (Figure 1) that could easily be replicated by other instructors to develop a tool specific to their course and teaching style (traditional vs. nontraditional), one that would yield data useful in identifying content areas in need of instructional interventions and provide a means to evaluate such interventions.
FIGURE 1.

Method flow chart illustrating steps taken within each semester of instrument development. Following semester three, the instrument was administered to sections of the course as a MTF (multiple true/false) assessment at the beginning and end of the semester, followed by statistical analysis.
With these goals in mind, we began the development of our instrument by carefully defining learning objectives for our single‐semester biochemistry course. We developed learning objectives (Table 1) via an iterative process with discussion amongst ourselves, as well as other instructors who had taught the course. We also consulted the list of learning objectives published by the American Society for Biochemistry and Molecular Biology (ASBMB) [23]. We chose learning objectives for our instrument that represent those concepts directly related to previously identified threshold concepts [24]. Threshold concepts are defined as concepts that are transformative (altering a student's conception of the field), not easily forgotten, integrative, troublesome, and bounded [25]. Our single‐semester biochemistry course focuses primarily on the following threshold concepts: the physical basis of interactions, thermodynamics of macromolecular structure formation, free energy, steady state, and biochemical pathway dynamics and regulation [23, 24].
TABLE 1.
Learning objectives, corresponding questions, and threshold concepts.
| Learning objective | Questions | Threshold concepts |
|---|---|---|
| Predict the polarity of a functional group and what type of noncovalent interaction it will participate in. | 1–5 | The physical basis of interactions. |
| Compare and contrast the interactions driving the formation of secondary, tertiary, and quaternary structure of proteins. | 6–10 | The physical basis of interactions. Thermodynamics of macromolecular structure formation. |
| Predict whether reactant or product will accumulate given a reaction's actual free energy change or Q and K eq. | 11–15 | Free energy |
| Predict the effects of mutations or ligand structural changes on the activity, structure, or stability of a protein. | 16–20 | The physical basis of interactions. |
| Apply laws of thermodynamics and explain/illustrate instances when energy is converted from one form to another in biological processes. | 21–25 | Free energy |
| Apply the principles of kinetics, equilibrium, and Le Châtelier's principle to biological steady states, metabolic flux, and pathway design. | 26–30 | Steady state. Biochemical pathway dynamics and regulation. |
| Compare and contrast various mechanisms for regulating the function of a macromolecule, enzymatic reaction rate, or pathway. | 31–35 | Biochemical pathway dynamics and regulation |
| Describe, illustrate, and differentiate the stages, pathways, and steps in cellular metabolism and diagram their interconnectedness. | 36–40 | Biochemical pathway dynamics and regulation |
Upon successful identification of course learning objectives, we generated MTF questions consisting of a “stem” and five answer choices for each learning objective (a total of 40 questions, in eight sets of five). We chose the MTF format over that of traditional multiple‐choice questions (MCQs) because MCQs can misrepresent student mastery of a particular question/content area [22].
Once we developed the initial MTF questions, we administered the instrument as a “two‐tier” quiz to an initial cohort of volunteer students (n = 8) at the end of a semester. In this initial administration, we asked students to answer the statement as true or false (Tier 1) and then state, in writing, their reasoning for their selection (Tier 2). We evaluated student responses not only for the correct answer but also for correct reasoning. This evaluation made it clear that the language of some of the questions led students to select the incorrect answer despite correct reasoning regarding the concept. Alternatively, some students selected the correct response despite incorrect reasoning. We rewrote questions that exhibited such problems (a total of 19 questions) to avoid these outcomes.
After the first round of revisions, we administered the instrument again to an entire section of students as a two‐tier quiz. The instrument was divided into two sections of 20 questions each, to reduce the cognitive burden on the student while taking the quiz. After the successful administration of both sections of the quiz (n = 128 for Questions 1–20 and n = 117 for Questions 21–40) student reasoning was evaluated as either “valid,” “invalid,” or “indeterminate.” The “indeterminate” ranking was applied to students who either did not fill out the reasoning portion of the instrument or who gave insufficient data to determine whether they had a sufficient understanding of the material to support their answer. These students' data were removed from further analysis.
The “valid” ranking of student reasoning was given when a student either answered the question correctly or incorrectly for valid reasons. By this, we mean that the student's explanation illustrated that they understood the question. These students were given a score of one.
We scored student reasoning as “invalid” when a student correctly or incorrectly answered the question for invalid reasons. By this, we mean that the student's explanation revealed a misinterpretation of the question and that they were not addressing the concept being evaluated. In these cases, the student's response was given a score of zero. Following this analysis, we further evaluated questions scored with less than 85% “valid” reasoning, and in some cases we rewrote them using cues from students who had clearly misunderstood the question. This analysis yielded a total of eight questions needing revision, while the other 32 questions were considered “validated.” The eight invalidated questions were rewritten and once again presented to students (n = 137) in the following semester. Using the analysis scheme described above, it was determined that all eight revised questions met validation criteria. Importantly, two “read‐through” questions were then placed strategically in the exam (making for a total of 42 questions on the exam). These questions state, “If you read this question answer ‘true’” and “If you read this question answer ‘false’.” An incorrect answer on either of these questions indicates that the student did not read it. It was then assumed that this student did not seriously engage with the instrument, and their data were excluded during statistical analysis.
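The screening rules described above (dropping “indeterminate” responses, flagging questions with less than 85% “valid” reasoning, and excluding students who miss a read‐through question) can be sketched in a few lines. This is a minimal illustration in Python, not the procedure or scripts used in the study; the function names, the rating labels as strings, the read‐through identifiers `RT1` and `RT2`, and all sample data are hypothetical.

```python
def valid_fraction(ratings):
    """Share of 'valid' ratings after removing 'indeterminate' ones."""
    kept = [r for r in ratings if r != "indeterminate"]
    if not kept:
        return None  # nothing left to judge the question on
    return sum(1 for r in kept if r == "valid") / len(kept)


def questions_needing_revision(ratings_by_question, threshold=0.85):
    """Return question numbers whose valid-reasoning share is below threshold."""
    flagged = []
    for question, ratings in ratings_by_question.items():
        fraction = valid_fraction(ratings)
        if fraction is None or fraction < threshold:
            flagged.append(question)
    return sorted(flagged)


def passed_read_through(answers, read_through_key):
    """True only if every read-through question was answered as instructed."""
    return all(answers.get(q) == expected for q, expected in read_through_key.items())


# Hypothetical Tier 2 ratings for two questions.
ratings_by_question = {
    1: ["valid"] * 9 + ["invalid"],                             # 90% valid
    2: ["valid"] * 8 + ["invalid"] * 2 + ["indeterminate"] * 3,  # 80% valid
}
flagged = questions_needing_revision(ratings_by_question)

# Hypothetical read-through key: RT1 must be answered True, RT2 False.
read_through_key = {"RT1": True, "RT2": False}
engaged = passed_read_through({"RT1": True, "RT2": False}, read_through_key)
skimmed = passed_read_through({"RT1": True, "RT2": True}, read_through_key)
print(flagged, engaged, skimmed)
```

In this sketch a question is flagged for review rather than rewritten automatically, which mirrors the text: questions below the threshold were evaluated further and only rewritten in some cases.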
Once the above validation process was complete, we employed the instrument as a pre‐ and posttest in a single semester biochemistry course entitled “Principles of Biochemistry.” In each semester two separate sections were taught by two different instructors making for a total of six sections analyzed. The demographics of each section can be seen in Table 2. After each semester, statistical analysis was performed on individual sections. Additionally, statistical analysis was performed on the combined data from all six sections.
TABLE 2.
Section demographics.
| | S‐01 | S‐02 | S‐03 | S‐04 | S‐05 | S‐06 | Totals |
|---|---|---|---|---|---|---|---|
| n | 121 | 202 | 86 | 183 | 96 | 170 | 858 |
| Female a | 62% | 83% | 68% | 76% | 68% | 78% | 72% |
| Male a | 38% | 17% | 32% | 24% | 32% | 22% | 28% |
| Non‐White a | 24% | 16% | 27% | 25% | 34% | 30% | 27% |
| White a | 76% | 84% | 73% | 75% | 66% | 70% | 73% |
| Animal science | 8% | 9% | 10% | 8% | 5% | 6% | 8% |
| Biology | 26% | 57% | 39% | 42% | 25% | 51% | 42% |
| Biomedical sciences | 19% | 17% | 16% | 18% | 17% | 21% | 18% |
| Engineering | 19% | 0.1% | 10% | 9% | 35% | 1% | 10% |
| FSHN b | 8% | 4% | 8% | 3% | 9% | 6% | 6% |
| HES c | 5% | 0.2% | 2% | 5% | 0% | 2% | 3% |
| MIP d | 11% | 5% | 8% | 3% | 3% | 9% | 6% |
| Other | 4% | 7.7% | 7% | 12% | 6% | 4% | 7% |
a Gender and ethnicity data were taken from enrollment data collected by Colorado State University.
b Food science and human nutrition.
c Health and exercise science.
d Microbiology, immunology, and pathology.
Statistical analysis was done in collaboration with Colorado State University's Institute for Research in the Social Sciences (IRISS). To begin, data on pre‐ and posttest scores, as well as student demographic data, were combined using a script developed by statisticians within IRISS in RStudio. These combined data were then analyzed using additional scripts written in RStudio. Statistical analysis began with a Spearman's rho correlation test as well as a Shapiro–Wilk normality test. We ran these two tests to determine whether the pre‐ and posttest data were dependent upon one another and whether they exhibited a Gaussian distribution. Once we completed these tests, we executed a paired t‐test (if the data were normally distributed) or a Wilcoxon signed‐rank test (if the data were not normally distributed) for hypothesis testing and to determine the statistical significance of the differences in the pre‐ and posttest summative data [26]. In addition to summative data analysis, we performed statistical analysis on each individual question to identify weaknesses in instruction within specific areas. We performed a McNemar test to determine the statistical significance of differences in pre‐ and posttest performance on individual questions.
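For readers unfamiliar with the McNemar test, the per‐question comparison can be illustrated as follows. This is a sketch in Python using only the standard library, not the R scripts used in the study. It implements the exact (binomial) form of the test, which is an assumption: the text does not state which variant was run. The function name and the response data are hypothetical. Only students whose answer changed between pre‐ and posttest (the discordant pairs) contribute to the p value.

```python
from math import comb


def mcnemar_exact(pre_correct, post_correct):
    """Exact two-sided McNemar test on paired correct/incorrect responses.

    Returns (gained, lost, p_value), where `gained` counts students who went
    from incorrect to correct and `lost` counts the reverse.
    """
    if len(pre_correct) != len(post_correct):
        raise ValueError("pre and post responses must be paired")
    pairs = list(zip(pre_correct, post_correct))
    gained = sum(1 for pre, post in pairs if not pre and post)
    lost = sum(1 for pre, post in pairs if pre and not post)
    n = gained + lost
    if n == 0:
        return gained, lost, 1.0  # no student changed their answer
    # Under the null hypothesis each discordant pair is a fair coin flip.
    tail = sum(comb(n, i) for i in range(min(gained, lost) + 1)) / 2 ** n
    return gained, lost, min(1.0, 2 * tail)


# Hypothetical responses from 20 students on one question.
pre = [False] * 10 + [True] * 2 + [True] * 5 + [False] * 3
post = [True] * 10 + [False] * 2 + [True] * 5 + [False] * 3
gained, lost, p_value = mcnemar_exact(pre, post)
print(gained, lost, round(p_value, 4))
```

Because the test conditions on discordant pairs, a question that most students already answer correctly on the pretest can show a nonsignificant result simply because few students are left to improve.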
3. Learning Outcome Assessment Instrument
The instrument produced from the above methodologies is currently being used as a research tool within our undergraduate courses. In the interest of this research, we have decided not to make the instrument publicly available to ensure that future undergraduate students taking our course do not have access to it, confounding our evaluation of such students. The instrument can, however, be shared with fellow instructors through a request to the corresponding author.
4. Results
4.1. Lessons From Early Administration of the Exam
After validation of all questions on the instrument, we gave students the exam both pre‐ and postcourse. In our initial administration, we were concerned about the effect student “guessing” would have on the instrument's summative scores. The nature of a true/false question format underscores this concern since each student has a one‐in‐two chance of guessing the correct answer. Given this concern, we gave one section of students (n = 92) the precourse instrument with three possible answer options: true, false, and “I don't know,” while we gave another section (n = 173) only the traditional true/false option. We scored the “I don't know” option identically to an incorrect answer, giving the student that selected this option a score of 0/1. To alleviate students' concerns with the impact of this scoring upon their semester grade within the class, we gave the pre‐ and posttest as participation assignments, and all students that took both the pre‐ and posttest received 10 points regardless of their overall performance.
Interestingly, in the section in which the “I don't know” option was included, the mean score was 47.3% with a low score of 25% and a high score of 75%, as compared to the section without this option, which had a mean score of 60.7%, a low of 32.5%, and a high of 77.5%. The number of students selecting the “I don't know” option ranged from 48 of 92 students (on Question 19) to as few as 3 of 92 students (on Question 21), with an average of ~22 of 92 students (standard deviation of ~12) stating “I don't know” on any question. Given these data, it was decided very early on that the pre‐ and posttest would be given with the “I don't know” option to avoid artificially high pre‐ and posttest scores due to student guessing.
4.2. Pre‐ and Postcourse Test Summative Scores
We performed a statistical analysis of the summative score of the pre‐ and posttests for six individual sections taught across three different semesters (two sections per semester). While the instructor and content delivery differed across each section, the learning objectives and pre‐ and posttest remained constant. Individual sections had pretest score means ranging from 16.2 (40.5%) to 19.8 (49.5%) and had posttest score means ranging from 23.8 (59.5%) to 26.2 (65.5%) (Table 3). In all cases, the mean score differences between pre‐ and posttests were statistically significant (p < 0.0001) when analyzed by a paired t test. Additionally, combining data from all six sections resulted in a mean score of 17.8 (44.5%) on the pretest and a mean score of 25.0 (62.5%) on the posttest (Table 3). The differences in these mean scores, pre‐ and postinstruction, are statistically significant (p < 0.0001) according to a Wilcoxon signed‐rank test. The effect size (0.5664) for this difference is large. These results are not unexpected, as instruction targeted at specific learning objectives should be effective in improving student mastery of content within any subject.
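The paired comparison and effect size reported above can be illustrated with a short sketch. This is a Python illustration using only the standard library, not the R scripts used in the study. It uses the normal approximation to the Wilcoxon signed‐rank test without a continuity correction and computes the effect size as r = |Z|/√N, with N taken as the number of nonzero paired differences. Both choices are assumptions: the text does not state how the reported effect size was calculated. The function name and the scores are hypothetical.

```python
from math import erfc, sqrt


def wilcoxon_signed_rank(pre, post):
    """Wilcoxon signed-rank test (normal approximation) on paired scores.

    Returns (w_plus, z, p_value, r), where r = |z| / sqrt(n) and n is the
    number of pairs with a nonzero difference.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # tied values share the mean rank
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        group_size = j - i + 1
        tie_term += group_size ** 3 - group_size
        i = j + 1
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p from the standard normal
    return w_plus, z, p_value, abs(z) / sqrt(n)


# Hypothetical pre/post scores (out of 40) for six students.
pre_scores = [16, 18, 20, 15, 22, 17]
post_scores = [24, 25, 23, 19, 27, 23]
w_plus, z, p_value, r = wilcoxon_signed_rank(pre_scores, post_scores)
print(w_plus, round(z, 3), round(p_value, 3), round(r, 3))
```

Under Cohen's commonly used benchmarks for r (roughly 0.1 small, 0.3 medium, 0.5 large), a value above 0.5 would be read as a large effect, which is consistent with the description of 0.5664 as large.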
TABLE 3.
Mean pretest and posttest scores for all sections.
| Section | n | Mean pretest | Mean posttest |
|---|---|---|---|
| S‐01 | 121 | 19.8 | 26.2 a |
| S‐02 | 202 | 16.4 | 25.3 a |
| S‐03 | 86 | 19.6 | 23.8 a |
| S‐04 | 183 | 18.1 | 24.1 a |
| S‐05 | 96 | 18.8 | 25.1 a |
| S‐06 | 170 | 16.2 | 25.3 a |
| All sections | 858 | 17.8 | 25.0 a |
Note: Pretest and posttest scores reflect the mean score out of 40.
a Posttest scores are significantly better than pretest scores (p < 0.0001, paired t test and Wilcoxon).
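As a consistency check, the “All sections” row of Table 3 can be recovered from the per‐section rows as an enrollment‐weighted mean. The short Python sketch below uses only the values printed in Table 3; the variable and function names are illustrative.

```python
# (n, mean pretest, mean posttest) for each section, copied from Table 3.
sections = {
    "S-01": (121, 19.8, 26.2),
    "S-02": (202, 16.4, 25.3),
    "S-03": (86, 19.6, 23.8),
    "S-04": (183, 18.1, 24.1),
    "S-05": (96, 18.8, 25.1),
    "S-06": (170, 16.2, 25.3),
}
MAX_SCORE = 40  # scores are reported out of 40


def pooled_mean(column):
    """Enrollment-weighted mean of a column (1 = pretest, 2 = posttest)."""
    total_students = sum(row[0] for row in sections.values())
    weighted_sum = sum(row[0] * row[column] for row in sections.values())
    return weighted_sum / total_students


total_n = sum(row[0] for row in sections.values())
pooled_pre = pooled_mean(1)
pooled_post = pooled_mean(2)
print(total_n, round(pooled_pre, 1), round(pooled_post, 1))
print(round(100 * pooled_pre / MAX_SCORE, 1), round(100 * pooled_post / MAX_SCORE, 1))
```

The weighted means round to the 17.8 and 25.0 reported for the combined data, and the section sizes sum to the 858 students reported in Table 2.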
4.3. Pre‐ and Postcourse Test Score on Individual Questions
Summative test scores before and after instruction indicate overall improved performance for all students in each of the six sections analyzed as well as their combined data (as described above). These results, while encouraging, lack the specific detail needed for instructors to gauge the effectiveness of their instruction on learning outcomes for specific topics and learning objectives. Therefore, we performed quantitative analysis on the questions individually, looking first at the proportion of students answering a question correctly in both the pre‐ and posttest. Following such analysis, a McNemar test was used to evaluate the statistical significance of the differences in student performance on each individual question. It should be noted that the statistical analysis could only be completed for five of the six sections, as one section had too few students to lead to statistically valid results.
Precourse evaluation of individual questions indicates that students lack knowledge in specific content areas at the beginning of the course. Fewer than 40% of students (looking at the combined data of all six sections) scored well on questions (17 in total) regarding the following content areas:
- Forces that stabilize protein structure.
- The relationship between standard and actual free energy changes.
- The effect that mutation will have on protein structure and function.
- The mechanisms utilized to regulate enzyme function and maintain homeostasis.
Conversely, greater than 65% of students come into the course able to predict functional group polarity (two questions), the effect ion gradients can have in inducing conformational changes in proteins (one question), and the efficiency of aerobic versus anaerobic metabolism (one question). Such data can be used to inform future instruction by highlighting the various strengths and weaknesses students possess at the beginning of the course.
Statistical evaluation of pre‐ and posttest scores on individual questions illustrates that there was a statistically significant change in pre‐/postcourse performance on 19 of 40 questions in all five sections as well as the combined data from all three semesters. These data suggest that the instructional interventions employed over the course of the semester can be deemed successful. However, deeper analysis of these data from three semesters suggests room for improvement remains since less than 60% of all students answered correctly on six of these 19 questions upon completion of the course (Figure 2). Interestingly, all six of these questions had a low percentage of students answering them correctly prior to any instruction.
FIGURE 2.

Pre‐ and posttest scores on individual questions with statistically significant changes (p < 0.0001) but low posttest score performance. Data were taken for 858 students spanning six sections of instruction over the course of three semesters. The percentage of students responding correctly to the question on the pre‐ (green) and posttest (blue) is given. Of the 19 questions that illustrated statistically significant improvement in all sections, the six questions shown here had less than 60% of the students answering them correctly on the posttest.
Of the remaining 21 questions, we saw no statistically significant improvement from pre‐ to posttest in at least one of the five sections analyzed. Of these 21, 10 questions showed no improvement in three or more sections. Of these 10, three showed no statistically significant improvement in at least four of the five sections as well as the combined data from all three semesters (Figure 3). These data, in combination with those in Figure 2, indicate a need for developing instructional interventions to improve student learning outcomes for these specific content areas within our course.
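The sorting of questions described in this section (significant gains in every section but a posttest score below 60%, versus no significant gain in one or more sections) can be expressed as a simple triage rule. The Python sketch below is illustrative only: the function name, the 0.05 significance cutoff, and every p value and percentage in the example are hypothetical and are not taken from the study's data.

```python
def triage_question(p_by_section, post_percent, alpha=0.05, post_floor=60.0):
    """Classify one question from its per-section McNemar p values.

    p_by_section: p values, one per analyzed section.
    post_percent: percentage of students answering correctly on the posttest.
    """
    not_significant = sum(1 for p in p_by_section if p >= alpha)
    if not_significant == 0:
        if post_percent < post_floor:
            return "improved in all sections, low posttest"
        return "improved in all sections"
    return f"no significant gain in {not_significant} of {len(p_by_section)} sections"


# Hypothetical results for three questions across five sections.
examples = {
    "Q_a": ([0.0001, 0.0002, 0.001, 0.0001, 0.003], 72.0),
    "Q_b": ([0.0001, 0.0001, 0.002, 0.0004, 0.001], 48.0),
    "Q_c": ([0.40, 0.21, 0.03, 0.66, 0.09], 35.0),
}
labels = {name: triage_question(p, post) for name, (p, post) in examples.items()}
for name, label in labels.items():
    print(name, label)
```

Questions falling in the second and third categories are the ones this section identifies as candidates for instructional intervention.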
FIGURE 3.

Pre‐ and posttest scores on individual questions with statistically insignificant changes (p > 0.05). Data were taken for 858 students spanning six sections of instruction over the course of three semesters. The percentage of students responding correctly to the question on the pre‐ (green) and posttest (blue) is given. The data shown here represent the only questions that did not show statistically significant improvement in all sections analyzed. These data point to content in need of instructional interventions for students to show improvement from pre‐ to posttest performance.
5. Discussion
In this study, we sought to develop a fast and effective method for producing an assessment instrument that molecular life science instructors could employ within their individual courses. A multitude of previous studies have led to the development of CIs in various STEM disciplines [5, 6, 7, 8, 9, 10, 11], including the molecular life sciences [13, 14, 15, 16, 17, 18, 19]. Use of these CIs is limited because their questions are based on learning objectives not found in all molecular life science courses. Here we show that within three semesters instructors can develop an assessment instrument tailored to their specific learning objectives and consisting of questions that greater than 85% of their students will understand. Further, we demonstrate that the instrument can be employed as a pre‐ and posttest and, through some simple statistical methods, be used to analyze students' incoming knowledge, as well as student learning gains over the course of a semester.
Our findings produced using our instrument illustrate that students in our course are indeed leaving with a greater overall knowledge of biochemistry content than they started the semester with. In each semester, students demonstrated significant improvement in mean scores of the posttest compared to the pretest. This is expected, but also good to confirm, especially as we seek to approach our instruction as scientists. We also illustrate that our instrument can be used in a more nuanced way, giving insight into precourse knowledge. Further, we demonstrate that the data can be analyzed at both the individual instructor and section level, as well as multiple sections and semesters.
Students taking our course are required to have previously taken a single semester of general chemistry, a single semester of organic chemistry, and a single semester of introductory cell biology. Not surprisingly, results from our pretest indicate that our students lack knowledge of topics that are specific to a biochemistry course and covered in little to no detail in the required prerequisite courses, including protein structure, function, and stability as well as enzymatic and metabolic regulation. Results from our pretests also illustrate that students lack a grasp of the thermodynamic concept of free energy. While the required prerequisite courses do cover free energy, along with other thermodynamic concepts, our results indicate the importance of extensively reviewing these concepts in a biochemical context to ensure student success when applying such concepts to biochemical topics, consistent with previous findings [27].
Additionally, we used our instrument to point to specific knowledge gaps students leave the course with and areas in which little to no learning gains occur. The proportion of students answering correctly showed significant improvement across all sections analyzed for concepts related to the forces that stabilize protein structures, how proteins function, and how enzymes are regulated. Nevertheless, fewer than 6 of 10 students can correctly answer questions on these concepts upon exiting the course. Furthermore, our data across all sections analyzed indicate that students show little to no improvement in interpreting an image depicting a protein structure and the relationship of its structure to its function. Such results indicate that students struggle with conceptualizing protein structure using the traditional means of presenting such structures both in the literature and in textbooks, a finding consistent with previous reports [28, 29, 30]. Finally, across all sections analyzed, students show little to no improvement in their grasp of the concept of free energy, specifically as it relates to chemical equilibria (Question 14). This result indicates the inherent challenges students face conceptualizing thermodynamics, equilibrium, and free energy [27].
Our instrument and analysis also provide information to individual instructors. Each instructor in our group has employed information from the inventory regarding the prior knowledge base of students to tailor specific course content and instructional interventions to ensure improvement. Moreover, analysis of student learning gains in individual sections identifies potential gaps within instruction for each instructor based on scores on individual questions. Indeed, our data indicate that students performed poorly on groups of questions only within specific sections, whereas they performed well in other sections on the same grouping of questions.
6. Limitations of Our Instrument
We awarded participation points to students for completing the pre‐ and postassessment with total points for participation amounting to approximately 4% of the student's total course grade. Such a practice does not come without its limitation as student motivation plays a role in how well or how poorly they may perform as pointed out by Shi et al. [17]. A low‐stakes assessment is less likely to be taken seriously by students as compared to a high‐stakes assessment. Whether or not this has impacted the results reported here is unknown. We decided to implement this instrument as a participation grade to allow incorporation of the non‐traditional “I don't know” option along with the binary true/false option. This option reduced artificially high scores mitigating the effect of student guessing on a true/false assessment. For students to be free to take this third option the penalty for its selection had to be removed through administration of the pre‐ and posttest for a participation grade only. Interesting avenues of research that warrant further investigation relate to the number of students selecting this third option on both pre‐ and posttests and how instructional interventions affect students' confidence in their answers.
While our instrument development method seeks to reduce student misinterpretation of each question, this cannot be achieved with certainty. As such, any conclusions we make regarding the implications of our results must be confirmed through additional methods. For instance, we observed low pre‐ and posttest scores on Question 21 in all sections analyzed as well as the combined sections data. Furthermore, the pre‐ and posttest student performance did not change significantly. This question asks students to interpret two different images of protein structure and relate these images to the protein's function. Currently, we interpret this result as pointing to the difficulty students have with conceptualizing protein structure and its relationship to protein function. However, further analysis must be performed to obtain a clearer understanding of this result.
7. Conclusions
We report here a method for developing an instrument that can be used as a pre‐ and posttest assessment for any set of learning objectives developed by an instructor or educational unit. Our method aims to communicate questions reliably to students by removing jargon and confusing language that lead to misinterpretation. Such an instrument, in combination with the analysis reported here, can inform instructional changes and adaptations, giving college instructors the tools they need to approach their teaching as scientists.
Ethics Statement
The collection of pre‐ and posttest student performance data, as well as demographic data, was approved by the Institutional Review Board at Colorado State University as following appropriate ethical standards (exempt status: Protocol No. 2204).
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
We thank Anne Mook and her team at IRISS for their help and advice with statistical analysis and RStudio. We also thank Heather Novak and Lee Tyson at Institutional Research (IR) at Colorado State University for obtaining student demographics for each section analyzed. Financial support for this research was provided by the College of Natural Sciences and the Department of Biochemistry at Colorado State University. Finally, we extend special thanks to the students who enrolled in our course sections and were willing to participate in this research.
Laybourn P. J., Kalet B., and Sholders A. J., “Development and Analysis of a Learning Outcomes Assessment Instrument for a Single‐Semester Nonmajors Biochemistry Course,” Biochemistry and Molecular Biology Education 53, no. 5 (2025): 546–554, 10.1002/bmb.21913.
Data Availability Statement
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
References
- 1. Prince M., “Does Active Learning Work? A Review of the Research,” Journal of Engineering Education 93 (2004): 223–231.
- 2. Freeman S., Eddy S. L., McDonough M., and Wenderoth M. P., “Active Learning Increases Student Performance in Science, Engineering, and Mathematics,” Proceedings of the National Academy of Sciences of the United States of America 111 (2014): 8410–8415.
- 3. Theobald E. J., Hill M. J., and Freeman S., “Active Learning Narrows Achievement Gaps for Underrepresented Students in Undergraduate Science, Technology, Engineering, and Math,” Proceedings of the National Academy of Sciences of the United States of America 117 (2020): 6476–6483.
- 4. Handelsman J., Ebert‐May D., Beichner R., et al., “Scientific Teaching,” Science 304 (2004): 521–522.
- 5. Hestenes D., Wells M., and Swackhamer G., “Force Concept Inventory,” Physics Teacher 30 (1992): 141–158.
- 6. Zeilik M., “Birth of the Astronomy Diagnostic Test: Prototest Evolution,” Astronomy Education Review 1 (2002): 46–52.
- 7. Anderson D. L., Fisher K. M., and Norman G. J., “Development and Evaluation of the Conceptual Inventory of Natural Selection,” Journal of Research in Science Teaching 39 (2002): 952–978.
- 8. Steif P. S. and Dantzler J. A., “A Statics Concept Inventory: Development and Psychometric Analysis,” Journal of Engineering Education 94 (2005): 363–371.
- 9. Epstein J., “Development and Validation of the Calculus Concept Inventory,” in Proceedings of the Ninth International Conference on Mathematics Education in a Global Community (2007).
- 10. Smith M. K., Wood W. B., and Knight J. K., “The Genetics Concept Assessment: A New Concept Inventory for Gauging Student Understanding of Genetics,” CBE Life Sciences Education 7 (2008): 422–430.
- 11. Libarkin J., Ward E., Anderson S., Kortemeyer G., and Raeburn S., “Revisiting the Geoscience Concept Inventory: A Call to the Community,” GSA Today 21 (2011): 26–28.
- 12. Hake R. R., “Interactive‐Engagement vs Traditional Methods: A Six‐Thousand‐Student Survey of Mechanics Test Data for Introductory Physics Courses,” American Journal of Physics 66 (1998): 64–74.
- 13. Bretz S. L. and Linenberger K. L., “Development of the Enzyme‐Substrate Interactions Concept Inventory,” Biochemistry and Molecular Biology Education 40, no. 4 (2012): 229–233.
- 14. Wright T., Hamilton S., Rafter M., Howitt S., Anderson T., and Costa M., “Assessing Student Understanding in the Molecular Life Sciences Using a Concept Inventory,” FASEB 23 (2009): LB307.
- 15. Howitt S., Anderson T., Costa M., Hamilton S., and Wright T., “A Concept Inventory for the Life Sciences: How Will It Help Your Teaching Practice?,” Aust Biochem 29 (2008): 14–17.
- 16. Garvin‐Doxas K. and Klymkowsky M. W., “Understanding Randomness and Its Impact on Student Learning: Lessons Learned From Building the Biology Concept Inventory (BCI),” CBE Life Sciences Education 7 (2008): 227–233.
- 17. Shi J., Wood W. B., Martin J. M., Guild N. A., Vicens Q., and Knight J. K., “A Diagnostic Assessment for Introductory Molecular and Cell Biology,” CBE Life Sciences Education 9 (2010): 453–461.
- 18. Wright T. and Hamilton S., “Assessing Student Understanding in the Molecular Life Sciences Using a Concept Inventory,” in ATN Assessment Conference Proceedings, eds. Duff A., Green M., and Quinn D. (Australian Technology Network (ATN) Assessment Conference: Engaging Students in Assessment, 2008).
- 19. Marbach‐Ad G., Briken V., El‐Sayed N. M., et al., “Assessing Student Understanding of Host Pathogen Interactions Using a Concept Inventory,” Journal of Microbiology & Biology Education 10 (2009): 43–50.
- 20. Tibell L. and Rundgren C.‐J., “Educational Challenges of Molecular Life Science: Characteristics and Implications for Education and Research,” CBE Life Sciences Education 9, no. 1 (2010): 25–33.
- 21. Nelson M. A., Geist M. R., Miller R. L., Streveler R. A., and Olds B. M., “How to Create a Concept Inventory: The Thermal and Transport Concept Inventory,” in Annual Conference of the American Educational Research Association (2007).
- 22. Couch B. A., Hubbard J. K., and Brassil C. E., “Multiple–True–False Questions Reveal the Limits of the Multiple–Choice Format for Detecting Students With Incomplete Understandings,” Bioscience 68 (2018): 455–463.
- 23. ASBMB.org, ASBMB, “Foundational Concepts,” accessed April 15, 2024, https://www.asbmb.org/education/core‐concept‐teaching‐strategies/foundational‐concepts.
- 24. Loertscher J., Green D., Lewis J. E., Lin S., and Minderhout V., “Identification of Threshold Concepts for Biochemistry,” CBE Life Sciences Education 13 (2014): 517–528.
- 25. Meyer J. H. F. and Land R., “Threshold Concepts and Troublesome Knowledge,” in Improving Student Learning: Ten Years on, ed. Rust C. (Oxford Centre for Staff and Learning Development, 2003), 412–424.
- 26. Riina M. D., Stambaugh C., Stambaugh N., and Huber K. E., “Chapter 28: Continuous Variable Analyses: T‐Test, Mann–Whitney, Wilcoxin Rank,” in Handbook for Designing and Conducting Clinical and Translational Research, eds. Eltorai A. E. M., Bakal J. A., Kim D. W., and Wazer D. E. (Academic Press, 2023), 153–163.
- 27. Villafañe S. M., Bailey C. P., Loertscher J., Minderhout V., and Lewis J. E., “Development and Analysis of an Instrument to Assess Student Understanding of Foundational Concepts Before Biochemistry Coursework,” Biochemistry and Molecular Biology Education 39 (2011): 102–109.
- 28. Villafañe S., Loertscher J., Minderhout V., and Lewis J., “Uncovering Students' Incorrect Ideas About Foundational Concepts for Biochemistry,” Chemistry Education Research and Practice 12 (2011): 210–218.
- 29. Harle M. and Towns M. H., “Students' Understanding of External Representations of the Potassium Ion Channel Protein. Part I: Affordances and Limitations of Ribbon Diagrams, Vines, and Hydrophobic/Polar Representations,” Biochemistry and Molecular Biology Education 40 (2012): 349–356.
- 30. Robic S., “Mathematics, Thermodynamics, and Modeling to Address Ten Common Misconceptions About Protein Structure, Folding, and Stability,” CBE Life Sciences Education 9 (2010): 189–195.
