Abstract
Teaching evaluation at many institutions is insufficient to support, recognize, and reward effective teaching. We developed a long-term intervention to support science, technology, engineering, and mathematics (STEM) department heads in advancing teaching evaluation practices. We describe the intervention and systematically investigate its impact on departmental practices within a research-intensive university. The outcomes varied considerably by department, with four departments achieving extensive teaching evaluation reform and seven departments achieving more limited reform. We used qualitative content analysis of interviews and meetings to investigate department head readiness for change and how it related to the reforms they achieved. All department heads perceived inadequacies in their current evaluation practices, but this dissatisfaction did not reliably predict the changes they pursued. Heads only pursued changes that they perceived to have clear benefits. All heads worried that faculty might resist new practices, but heads who were most successful in facilitating change saw ways to work around resistance. Heads who led the most change questioned their own expertise for reforming teaching evaluation and delegated the work of developing new evaluation practices to knowledgeable colleagues. We discuss emergent hypotheses about factors that support heads in challenging the status quo with more robust and equitable evaluation practices.
INTRODUCTION
Reward systems at many higher education institutions inadequately support, recognize, and incentivize effective teaching and teaching improvement (e.g., Brickman et al., 2016; Dennin et al., 2017). Research-intensive institutions often make reward decisions (e.g., promotion, tenure, raises, titled positions) based primarily on an individual's research contributions, missing an opportunity to incentivize faculty investment in improving teaching (Bradforth et al., 2015). Even if an institution or department wants to give meaningful weight to teaching in reward decisions, the evidence they collect to evaluate teaching is often limited and unreliable (Bradforth et al., 2015; Dennin et al., 2017).
Teaching evaluation in most institutions of higher education fails to support teaching improvement or provide robust evidence of teaching quality. Student evaluations are the most common form of teaching evaluation (e.g., Brickman et al., 2016), yet student ratings show bias based on instructors' social identities (e.g., race, gender, native language) and may not correlate with student learning outcomes (e.g., Bedard and Kuhn, 2008; Boring, 2017; Fan et al., 2019; Esarey and Valdes, 2020; Aragón et al., 2023). Additionally, instructors feel dissatisfied with student evaluations because these instruments measure only satisfaction, often suffer from low response rates, and fall short of providing constructive and insightful feedback that instructors can use to improve their teaching (e.g., Bouwma-Gearhart and Hora, 2016; Brickman et al., 2016). Peer evaluation is another form of teaching evaluation available to some faculty, but these processes typically lack structure or trained observers and, as a result, tend to provide feedback on superficial aspects of teaching rather than the substantive feedback instructors desire (e.g., Bouwma-Gearhart and Hora, 2016; Brickman et al., 2016). Not surprisingly then, instructors often do not perceive information from teaching evaluation as meaningful and may not rely on these data to inform their teaching (e.g., Blaich and Wise, 2010; Bouwma-Gearhart and Hora, 2016). If institutions aim to support, recognize, and incentivize high-quality teaching, teaching evaluation practices must be reconsidered.
Multiple projects have responded to the need to advance teaching evaluation practices, often in the context of science, technology, engineering, and mathematics (STEM) departments. At a national scale, the Association of American Universities and Cottrell Scholar Collaborative have gathered institutional leaders and change agents to propose ways that teaching could be better recognized and rewarded in research-intensive universities (e.g., Bradforth et al., 2015; Dennin et al., 2017). At the institutional level, the TEval project has developed new approaches to teaching evaluation and supported their implementation at four institutions (e.g., Finkelstein et al., 2020; Weaver et al., 2020). Key outcomes of TEval include a common framework and rubric that define dimensions of effective teaching and articulate criteria for each dimension in a rubric, as well as tools for collecting evidence of teaching effectiveness from multiple perspectives. Researchers at Boise State University also created a framework to define effective teaching and evaluate teaching formatively and summatively (Simonson et al., 2022). Outside of the peer-reviewed literature, numerous departments and institutions have developed resources for teaching evaluation (see a list in Supplemental Materials Appendix B from Krishnan et al., 2022). These efforts have contributed to national conversations about the importance of shifting teaching evaluation practices, produced resources, and demonstrated that change to teaching evaluation practices is feasible in research-intensive institutions and STEM departments.
An important next step for the work of transforming teaching evaluation is developing, testing, and reporting the effectiveness of different models of facilitating change. Currently, evidence of the impact of such efforts remains scarce. Here we begin to address this gap by presenting the design, implementation, and evaluation of a novel Leadership Action Team (LAT) intervention at one research-intensive university. The LAT is a facilitated working group of STEM department heads who develop and pilot new teaching evaluation practices in their departments over 2–3 years.
In this paper, we start by describing the changes to teaching evaluation practices that we aimed to achieve with the LAT intervention, its theoretical underpinnings, and how the LAT worked. Then we present evidence about changes to teaching evaluation practices in participating departments and how the changes achieved related to department heads' readiness for change.
Desired Change: Robust and Equitable Teaching Evaluation
Instead of relying solely on student end-of-course evaluations, a more robust and equitable approach to teaching evaluation involves collecting evidence from multiple sources (Andrews et al., 2020; Weaver et al., 2020; Krishnan et al., 2022). Considerable work has pointed toward more holistic ways of evaluating teaching (e.g., Glassick et al., 1997; Smith, 2008; Lyde et al., 2016; Dennin et al., 2017; Weaver et al., 2020). The three-voice framework for teaching evaluation stems from this work and uses evidence from students, trained peers, and the instructor to create a holistic picture of teaching effectiveness and improvement (Reinholz et al., 2018). The student voice can include end-of-course evaluations, as well as midsemester evaluations, assessments of student outcomes, and student interviews (Weaver et al., 2020). Data about student learning, such as from comparisons of preassessments and postassessments, can be an especially powerful source of evidence. Notably, student evaluations do not necessarily correlate positively with student learning and therefore cannot be assumed to serve as evidence of learning (e.g., Bedard and Kuhn, 2008; Boring, 2017; Fan et al., 2019; Esarey and Valdes, 2020; Aragón et al., 2023). Nonetheless, students, as the intended beneficiaries of teaching, can offer important perceptions of their experiences in the course (Krishnan et al., 2022). The peer voice collects evidence about an instructor's teaching from other instructors, usually through teaching observations or review of teaching materials (Weaver et al., 2020). Peers have relevant expertise in the discipline and teaching experience, situating them to provide more constructive feedback than students (Thomas et al., 2014). The self-voice involves an instructor reflecting on their teaching and teaching improvement (Weaver et al., 2020). The instructor best knows their goals for the course and the changes they have made, and can synthesize and contextualize evidence from the student and peer voices to characterize teaching accomplishments, strengths, and opportunities for improvement (Krishnan et al., 2022). Using three voices provides more holistic evidence because it draws on important yet distinct perspectives and mitigates the potential bias present in any single perspective.
Research-based teaching evaluation practices for each voice are structured, reliable, and longitudinal (Krishnan et al., 2022). Structured evaluation involves formalizing processes and expectations, such as using standard forms and consistently enacting processes across faculty (Table 1). Adding structure standardizes the experiences of faculty and makes processes and expectations transparent, both of which can result in more equitable experiences for faculty. Reliable evaluation draws from multiple sources of evidence and appropriately analyzes and interprets this evidence, making the findings more trustworthy (Table 1). Considering reliability is critical to mitigating potential biases and engendering trust in the conclusions drawn from evidence of teaching effectiveness. Longitudinal evaluation documents change over time (Table 1). These practices make it possible to value improvement, not just excellence, in teaching.
TABLE 1.
Research-based target practices for peer voice, which align with three characteristics of robust and equitable evaluation (Krishnan et al., 2022). See the target practices for student and self-voice in Krishnan et al. (2022) or the Supplemental Materials, Appendix B.
| Characteristic | Target practice |
|---|---|
| Structured | Department uses a formal observation form to guide what is observed and which other data are being collected (e.g., class materials, assessments, preobservation meeting). |
| | Department has a formal template for writing a report based on peer review, potentially distinguishing between formative and summative review. |
| | Department uses formal processes or criteria to select peer observer(s) for all instructors. |
| | Department enacts policy about the number of peer observations and observers during a review period and/or across review periods. |
| | Department designates a coordinator, leader, or committee to carry out and refine peer observation practices. |
| | Department has a process for allocating and recognizing workload related to coordinating and conducting observations. |
| | Department periodically discusses and improves peer evaluation practices to maximize utility to instructors and the department. |
| | Department provides or arranges formal training about the departmental peer review process for peer observers. |
| Reliable | Department relies on multiple observations for all instructors, such as using multiple observers, observing multiple lessons, and/or observing multiple courses. |
| | Department specifies which class materials (e.g., syllabi, exams, homework, slides, handouts) are collected and evaluated as part of peer observation. |
| | Department expects observers to talk with instructors to properly contextualize observations and review of materials. This might include discussing course goals, lesson goals, class structure, and students. |
| Longitudinal | Department conducts peer observation over multiple time points in a review period for all instructors to document teaching improvements. |
| | Department ensures that the peer observation process provides feedback to instructors via follow-up discussion that covers strengths and areas for improvement. |
Guiding Theory Related to Change
Second-order change is needed to achieve the long-term goal of STEM departments enacting robust and equitable teaching evaluation. Second-order change occurs when organizations question and then alter their operating systems, underlying values, and culture (Schön and Argyris, 1996; Kezar, 2018). In contrast, first-order change modifies existing practices without shifting the status quo, and is therefore easier to achieve but less likely to have lasting effects. For example, altering the questions asked in student end-of-course evaluations is likely to be a first-order change to teaching evaluation practices. This change could produce less biased ratings from students (e.g., Peterson et al., 2019), but without changing the approach to evaluation as a whole, it may do little to change how these less-biased ratings inform evaluations of instructors. In contrast, adopting a three-voice approach with research-based practices for each voice would be a second-order change in many STEM departments because it would alter the department's operating systems, underlying values, and assumptions about teaching evaluation.
To move beyond implicit assumptions about how change occurs, we used theory to ground the design of the LAT intervention and research (Connolly and Seymour, 2015; Reinholz et al., 2021). Achieving second-order change in STEM higher education requires flexibly drawing on multiple change perspectives (Corbo et al., 2016; Kezar, 2018). Two theoretical perspectives on change are best suited for achieving second-order change: social cognition and cultural perspectives. These change perspectives served as anchors in our design and implementation.
Social Cognition Perspective Guided LAT Design
A social cognition perspective defines change as the development of new ways of thinking among individuals (i.e., learning; Kezar, 2018). Therefore, facilitating change requires surfacing underlying ideas, providing feedback and new information, challenging prior beliefs, and pointing out leaps in logic (Kezar, 2018). Developing new ways of thinking may involve the use of new language and concepts, such as the contrast between formative and summative evaluation (Eckel and Kezar, 2003). It may also involve attaching new meaning to familiar concepts, such as viewing teaching evaluation as involving three distinct voices, rather than equating teaching evaluation with student evaluations (Eckel and Kezar, 2003). Guided by this perspective, the LAT was designed to engage department heads in repeated opportunities for reflection on current thinking and learning about robust and equitable teaching evaluation.
Cultural Perspective Guided LAT Design
A cultural perspective defines change as a gradual shifting of values, beliefs, and underlying assumptions within an organization (e.g., department). Underlying assumptions and values in an organization tend to be implicit, rarely challenged, and hard for members to articulate (Schein, 1996; Kezar, 2018), yet they influence an organization's functioning both implicitly and explicitly. Cultural change tends to be slow, involves many members of an organization, and requires long-term intervention (Kezar, 2018). Change agents create opportunities to make underlying assumptions visible to members of the organization, so they can be reconsidered. Change agents can also promote scholarly engagement, shared decision-making, and rationality around the change initiative (Bergquist and Pawlak, 2007; Kezar, 2018). Guided by this perspective, the LAT was designed to engage department heads in reflecting on underlying values, beliefs, and assumptions that they held and that undergirded departmental teaching evaluation practices.
Readiness for Change Guided Research
We drew on a specific change theory, the readiness for change framework, to ground our research into what distinguished department heads who achieved more change. Readiness for change is the extent to which an individual has positive beliefs and attitudes about 1) the need for a given change in their organization and 2) the ability of the organization to achieve the change (Armenakis et al., 1993). The readiness for change framework was originally developed by organizational management researchers to explain why so many organizational change efforts failed (e.g., Armenakis et al., 1993). The framework has also been used as a tool to help leaders more successfully engage employees in organizational change (e.g., Armenakis and Harris, 2002; Holt et al., 2007). An individual's or group's readiness reflects their thoughts and is considered a precursor to engaging productively in a change process (Armenakis et al., 1993). Therefore, leaders hoping to lessen resistance to change can work to increase readiness for change (Holt et al., 2007). This framework relates to both the social cognition and cultural perspectives on change in that it centers individuals’ thoughts and learning about change (i.e., social cognition) and is influenced by underlying values, beliefs, and assumptions about the organization (i.e., cultural).
Five components contribute to an individual's readiness for change (Rafferty et al., 2013). In this study, we examined readiness for change among department heads, and we describe each component in that context. Discrepancy is the head's belief that changing teaching evaluation in the department is necessary (Armenakis and Harris, 2002). Discrepancy can result from comparing the current status of departmental practices with some goal or value. Appropriateness is the head's sense that the solution under consideration (e.g., three-voice framework for teaching evaluation) will address the identified need for change. Next, efficacy is a department head's assessment of their own capacity and the department's capacity to effectively advance teaching evaluation practices. Valence is the head's perception of the costs and benefits of reformed teaching evaluation practices, and could focus on themselves, the department, or faculty (Holt et al., 2007). Finally, principal support is the head's belief that the university will provide necessary support for advancing teaching evaluation (Rafferty et al., 2013). We used the readiness for change framework to guide our investigation of how heads who achieved more change in their department differed from those who achieved less change.
Department Heads as the Focus
We designed the LAT to engage department heads for several reasons. We chose to target change in departments because these are the organizational units that most impact hiring, mentoring, evaluating, and rewarding faculty in research-intensive institutions. Departments are also often the organizational unit that enacts practices for peer and self-voice and that relies on data from student evaluations to make decisions. Furthermore, departments have unique histories, cultures, and practices, even within the same institution, and may opt to pursue different changes and take different paths toward a change (Bouwma-Gearhart et al., 2016).
We chose to prioritize department heads because of their roles and power. At our institution, department heads are appointed by the dean and serve an indefinite number of 3-year appointments, though they often serve no more than two or three terms. Department heads are central to teaching evaluation and reward decisions in our institution. They conduct annual evaluations of faculty and lead evaluations for promotion and tenure. Heads also have the power to set agendas within the department, populate departmental committees, and orient new faculty toward departmental priorities. As a result, they can prioritize particular changes to departmental practices and steer the department away from other changes. Therefore, changing thinking among department heads (i.e., social cognition perspective) has the potential to impact faculty thinking and departmental direction. Cultural change requires more than changing individual thinking, and department heads can create conditions favorable to cultural change. Department heads have the positioning and power to elevate and concentrate on particular values and priorities, to advocate for and direct resources toward changing existing structures, and to navigate the complexity of individual differences within the department, each of which is an important component of culture and culture change (Reinholz and Apkarian, 2018).
LAT Design
We convened STEM department heads in an LAT with the goal of supporting them in learning about and enacting more robust and equitable teaching evaluation in their departments. We assumed such support would be needed because many individuals who become department heads have not served in similar positions previously and feel unprepared to step into the role (Wolverton et al., 2005). In addition, department heads often report concerns about a lack of adequate resources within the department (Cipriano and Riccardi, 2017) and difficulty balancing their various responsibilities (Gmelch et al., 2017).
In alignment with our guiding theory, we designed the LAT to create space for learning and reflection about robust and equitable teaching evaluation, to prompt critical examination of assumptions that underlie existing departmental practices, and to scaffold and provide differentiated support for learning and action to develop new departmental practices (Table 2). We engaged department heads in considering the three-voice framework and curated teaching evaluation tools, and in discussing with one another the changes they wanted to pursue and the challenges they faced. We used directive facilitation to keep conversation on track and to challenge ideas that acted as barriers. We also used meeting time to accomplish work, knowing that department heads had very limited time outside of meetings to make progress.
TABLE 2.
Typical LAT meeting structure, including activities and their intended purposes
| Length (min.) | Activity | Purposes |
|---|---|---|
| 5–10 | Information sharing from lead facilitator | • Frame the work of the LAT • Frame the focus of the meeting • Introduce new information and resources • Provide instructions for small group discussion |
| 30–45 | Small group discussion with 2–4 department heads and facilitators. Individual thinking time about prompting questions or example materials, followed by discussion. Facilitators kept discussions on topic, asked probing questions, and moderated participation to quiet dominators and invite listeners to share. | • Prompt reflection on current thinking and current department practices • Provide a chance to critically consider teaching evaluation materials and practices and culture in other departments • Create conditions to recognize underlying assumptions • Troubleshoot barriers and challenges with peers |
| 5–15 | Share out from small group discussion in larger group. Share out led by facilitator and/or department head from each group. Brief period of questions and answers. | • Provide a chance to hear ideas from other department heads • Amplify ideas from small groups to highlight progress, point toward next steps, or problematize what is needed to make progress |
| 5 | Next steps offered by the lead facilitator and/or participating department heads asked to commit to one small next step to accomplish before the next meeting. | • Provide accountability for making progress • Provide concrete ideas for next steps • Frame the upcoming work of the LAT |
The LAT met for 1-h facilitated sessions five times per year for 3 years. A lead facilitator (P.P.L.) started each meeting with a brief presentation (<10 min) to frame the work of the meeting and then engaged department heads in small-group and whole-group discussion (Table 2). Additional team members (E.L.D., P.B., and two other faculty) facilitated small-group discussions and contributed to the whole-group discussion. In addition to LAT meetings, we provided individualized support based on the interests and desires of each head. This included one-on-one meetings, small working sessions, and joining faculty meetings to present and/or facilitate discussions.
Department heads willingly participated in the LAT. We recruited by meeting with heads to explain the project goals and inquire about their interest. Every head we asked agreed to participate and attended most meetings. We did not provide personal incentives, nor did heads receive institutional resources or goodwill as a result of their participation. The external credibility of National Science Foundation (NSF) funding and institution-level celebration of the grant may have encouraged participation, but upper administrators, to our knowledge, did not directly encourage departmental participation. One head declined to participate in one element of the research, but otherwise heads consented to all LAT activities and research.
Relevant Context
Institutions function as important contexts for departmental reform because institutions set expectations and priorities that departments must respond to in order to secure resources. The LAT was designed, enacted, and investigated at the University of Georgia (UGA), a large, public land-grant and research-intensive university. Though research is a key priority at UGA, the institution also prides itself on providing quality undergraduate education. The project both benefited from and contributed to interest in teaching evaluation reform among upper administrators. Prior to our intervention, the President charged a Task Force on Student Learning and Success that led to 12 recommendations to further enhance the educational experiences of undergraduates inside and outside the classroom. One of the 12 recommendations made by the group was to “Strengthen systems to document and promote effective teaching” (Task Force on Student Learning and Success, 2017). Based on this recommendation, the Office of Instruction convened a committee to consider changes to teaching evaluation. A member of our team served on the task force and then on the more focused committee. When the committee work stalled due to a few members’ concerns about faculty workload, our team stepped in to contribute to a new teaching evaluation policy and help shepherd it through faculty governance structures. The policy formally went into effect just before the end of the 3-year intervention reported here. This series of events demonstrates interest in teaching evaluation reform among some upper administrators. Andrews et al. (2021) describes in more detail the university-level work that our team undertook alongside this department-level intervention.
The LAT also occurred within a larger, NSF-funded project. Four life sciences faculty (P.P.L., P.B., E.L.D., T.C.A.) led the project, each of whom is a celebrated teacher and discipline-based education researcher. We used a shared leadership approach, in which each leader spearheaded one part of the project, in alignment with their strengths and spheres of influence, and the team worked together closely to plan and enact all project components. P.P.L. led the LAT and the larger project, E.L.D. and P.B. cofacilitated LAT meetings and led other aspects of the larger project, and T.C.A. led project research. We secured support for the project from department heads and upper administrators as part of our proposal to NSF. When we received the award, we used formal communication channels within the institution to promote the project. As the project proceeded, we annually reported on successes to a wide range of upper administrators. The positive perception of the project among administrators may have contributed to some department heads’ willingness to stay involved, though not, we suspect, to the change they actually achieved. The larger project also included efforts to support the uptake of evidence-based teaching that engaged over 65 faculty. This raised awareness of the project within and beyond STEM departments. Additionally, some heads may have perceived that their involvement in the LAT supported the involvement of their faculty, which may have encouraged their continued engagement.
MATERIALS AND METHODS
This section and the subsequent results address two questions by examining departmental practices and thinking among department heads:
- RQ1: To what extent did participating departments change their teaching evaluation practices to align with research-based practices?
- RQ2: How did department head readiness for change relate to change achieved in departmental practices?
Participants
We studied 11 STEM departments, including a range of disciplines (e.g., life sciences, physical sciences, engineering) and the heads that served during the 3-year duration of the study (n = 16). This work was determined to be exempt by the University of Georgia Institutional Review Board (PROJECT00009085).
Collecting Data about Teaching Evaluation Practices (RQ1)
We gathered data about departmental teaching evaluation practices using interviews with department heads and other faculty. We conducted initial interviews at the start of the project and a second round of interviews 3 years later. Each interview used a semistructured protocol that asked department heads to describe how their department evaluated teaching (see Supplemental Materials, Appendix A). The initial interviews asked more general questions because department heads were not yet familiar with the details of robust and equitable teaching evaluation. Questions included “Can you please talk me through how teaching effectiveness is evaluated for promotion and tenure?” and “How is the evaluation of teaching effectiveness different for annual review than for promotion & tenure?” The interviewer specifically prompted participants to address whether and how each voice was used. After several years in the LAT, department heads were much more familiar with the range of research-based teaching evaluation practices, and we were able to ask more specific questions about the nuances of their departmental practices. The second interview included both general questions, such as “Can you walk me through how peer voice works in your department?”, and more specific questions, such as “How do observers provide feedback to instructors who are observed?”
In addition to interviews with department heads, we relied on two additional data sources. First, recordings of LAT meetings provided information about departmental practices because several meetings invited each department head to share their progress on changing practices. Occasionally, what a department head said in an LAT meeting differed from what they reported in an interview. In those cases, we sought an additional perspective on the current departmental practices. Second, two department heads were not available for interviews 3 years into the project, so we relied on interviews of other faculty likely to be knowledgeable about teaching evaluation practices, such as associate department heads or faculty involved in departmental teaching evaluation reforms. These additional sources ensured we had comprehensive and accurate information about practices for each department.
Analyzing Teaching Evaluation Practices (RQ1)
We retrospectively assigned a score to each department's use of robust and equitable teaching evaluation practices using the Guides to Advance Teaching Evaluation (GATEs; Krishnan et al., 2022). The GATEs include lists of research-based target practices for each of the three voices (see examples in Table 1). The target practices in the GATEs are comprehensive and aspirational, so we did not expect most departments to achieve all target practices for each voice. We made minor modifications to the GATEs lists so we could reliably judge the presence or absence of each target practice for each voice (see Supplemental Materials, Appendix B). For example, we divided one target practice into two because these practices are distinct and needed to be evaluated separately. The divided practices were “Department recognizes known biases, such as bias against women, marginalized groups, and large class size,” and “Department limits comparisons of mandatory student evaluations between instructors.”
We characterized teaching evaluation practices in two steps, analyzing each voice separately. First, we characterized whether a department used a voice to evaluate teaching or did not use that voice. A department could use a voice (e.g., peer observation) as part of teaching evaluation and not meet any of the target practices. Therefore, the second step involved determining which research-based target practices a department had in place, for those departments using a given voice. Two researchers made these characterizations independently and discussed any disagreements to consensus. Interrater reliability was high for both steps of the analysis (weighted Cohen's kappa = 0.82 and 0.75, respectively).
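For readers who wish to run this kind of reliability check themselves, the following is a minimal sketch of computing a weighted Cohen's kappa in Python. The ratings are illustrative placeholders rather than our study data, and the use of scikit-learn is an assumption for demonstration, not a record of our analysis pipeline.

```python
# Minimal sketch of an interrater reliability check with weighted Cohen's
# kappa, assuming two coders assigned ordinal scores (e.g., the number of
# target practices judged present) to the same departments.
# The ratings below are illustrative placeholders, not the study's data.
from sklearn.metrics import cohen_kappa_score

coder_1 = [0, 2, 1, 3, 0, 2, 1, 1, 3, 0, 2]  # hypothetical scores, one per department
coder_2 = [0, 2, 1, 2, 0, 2, 1, 1, 3, 1, 2]

# Linear weights penalize disagreements in proportion to their distance on
# the ordinal scale, so near-misses count less against agreement than large
# discrepancies, while chance agreement is still accounted for.
kappa = cohen_kappa_score(coder_1, coder_2, weights="linear")
print(f"Weighted Cohen's kappa: {kappa:.2f}")
```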
Collecting Data about Readiness for Change (RQ2)
We used two sources of data to identify and characterize readiness for change among department heads: recorded LAT meetings and semistructured interviews. Each data source had distinct affordances. First, we audio-recorded LAT meetings and created verbatim transcripts. These data captured authentic and impromptu interactions among department heads and facilitators. In total, we analyzed transcripts spanning the first seven LAT meetings, which occurred in the first 1.5 years of the intervention. We selected the earlier meetings to analyze for evidence of readiness for change because the latter six meetings focused more narrowly on discussing specific resources for teaching evaluation and provided scant information about readiness for change.
Second, we interviewed LAT members at the end of the 3-year span reported in this paper. Interviews allowed us to collect targeted, comprehensive information from each participating LAT member; some heads talked more than others in LAT meetings, so the meetings alone yielded uneven quantities of data across heads. In addition to asking about department teaching evaluation practices (see Collecting Data about Teaching Evaluation Practices), these interviews aimed to elicit department heads’ readiness for advancing teaching evaluation. We designed the protocol based on the readiness for change framework, asking questions targeted at each component (see Supplemental Materials, Appendix A). For example, a question focused on the dimension of valence asked, “How would changes to teaching evaluation benefit your department, or faculty in your department?” A question about efficacy asked, “How comfortable do you feel leading changes to teaching evaluation in your unit?” Interviews were transcribed verbatim for analysis.
Qualitative Analyses of Readiness for Change (RQ2)
We conducted qualitative content analysis to address our second research question. We began by conducting provisional coding of LAT meeting transcripts, which is a deductive process using codes based on an established framework (i.e., readiness for change) (Saldaña, 2013). We developed and refined codes to capture the variation in our data for each readiness for change component. During this process, we determined that the data we had elicited about appropriateness did not capture the full range of ideas that heads likely held. Within meetings and interviews, heads sometimes commented positively about the fit between a voice or practice and their department, but only rarely commented about a lack of fit with their department. We were concerned that they had refrained from sharing more negative views of fit, meaning that our data could not accurately represent appropriateness. Therefore, we excluded data about appropriateness from further analyses. After we refined the codebook for the remaining components of readiness for change, we recoded all previously coded data, first coding independently, then discussing all disagreements to consensus. H.C.E., T.C.A., and an undergraduate researcher completed all coding.
We next analyzed transcripts from the interviews with LAT members. We started with the codebook created for the LAT meeting recordings and refined it to better align with data in interviews. The interviews provided more comprehensive information, allowing us to add new codes and refine existing codes. We again worked iteratively and collaboratively, with two coders working independently and then discussing to consensus. To draw on both the meeting and interview data for further analysis, we recoded all meeting recordings using the final codebook created with the interview data (see Supplemental Materials, Appendix C).
Readiness for Change Profile Development and Analysis (RQ2)
We constructed profiles for each department head to organize and synthesize qualitative data and enable comparisons. This approach serves a similar purpose as creating a case record in case study research: it records the data comprehensively in a manner that is manageable (e.g., Patton, 1990; Yin, 2014). Each profile summarized the data related to the included components of readiness for change from LAT meeting recordings and one-on-one interviews. For each LAT member, one researcher read and briefly summarized every coded data segment across both types of data, and then summarized all data related to a single readiness component in a brief paragraph. This process resulted in a 4- to 10-page profile for each department head. We then considered each component of readiness for change separately. Two researchers read the entirety of evidence for each department head and noted trends in the data. We specifically looked for similarities and differences between the department heads who achieved more and less change, and for relationships between readiness components and the changes that a department head achieved. As we noticed an emergent pattern, we revisited all relevant data to scrutinize whether the evidence supported or refuted the pattern. The process was iterative and highly collaborative. After agreeing on emergent patterns, the researchers returned to consider each profile as a whole, as an additional opportunity to identify patterns and test conclusions against the data. The outcome of these analyses is a set of patterns, by readiness component, in how readiness for change related to the teaching evaluation changes achieved.
Trustworthiness
Trustworthiness in a qualitative analysis encompasses four components: credibility, transferability, dependability, and confirmability (Anfara et al., 2002; Shenton, 2004). In addressing both research questions, we used well-established research methods and collected data from participants in several different ways to obtain a more cohesive picture of their thoughts and ideas (triangulation), which helps ensure the credibility of our work. To support the potential for transferability, we present details about the context of our study and the LAT and later discuss potential implications of our context. This allows an understanding of our unique situation to contextualize our results. We took several steps to improve the dependability and confirmability of our work. First, we provided a detailed description of our methods to enable potential replication. Second, we engaged in constant comparison to help mitigate the biases of any one researcher. We completed analyses in two phases, with researchers first working independently and then discussing to reach consensus. We also protected against inconsistency over time in our coding by repeatedly comparing quotes within a code to each other. Third, we intentionally masked the names and departments of each department head while coding, relying instead on randomized pseudonyms in each transcript, to lessen the impact of any impressions we had formed through our work with the LAT.
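As a concrete illustration of this masking step, the sketch below shows one way transcripts could be pseudonymized before coding. The helper function, names, and pseudonym scheme are hypothetical; they demonstrate the general idea rather than our actual procedure.

```python
# Minimal sketch of masking identifying information in transcripts with
# randomized pseudonyms before coding. All identifiers and the pseudonym
# scheme here are hypothetical, not the project's actual materials.
import random

def mask_transcript(text: str, identifiers: list[str]) -> str:
    """Replace each real identifier with a randomly assigned pseudonym."""
    pseudonyms = random.sample(
        [f"Pseudonym-{i:02d}" for i in range(1, 100)], len(identifiers)
    )
    for real, alias in zip(identifiers, pseudonyms):
        text = text.replace(real, alias)
    return text

transcript = "Dr. Smith noted that the Physics department piloted peer review."
print(mask_transcript(transcript, ["Dr. Smith", "Physics"]))
```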
RESULTS
Our systematic investigation of the impact of the LAT on departmental teaching evaluation practices revealed considerable variation. Here we present these findings and the results of our investigation of how department head readiness for change related to the outcomes achieved in a department.
- RQ1: To what extent did participating departments change their teaching evaluation practices to align with research-based practices?
Four departments (A, B, C, and D) achieved extensive changes to their teaching evaluation practices (Figure 1). These departments made progress on multiple voices and enacted a range of research-based target practices (Table 1). They are hereafter referred to as “High Change” departments. The remaining seven departments (E, F, G, H, I, J, and K) achieved more limited change to teaching evaluation practices, often without adopting target practices; we refer to these as “Low Change” departments. In the first few months of the LAT, we asked three of the participating departments (A, C, and F) to serve as “pilot” departments for adopting new teaching evaluation practices. We hoped that work in pilot departments could chart a path forward for other departments and that the heads of pilot departments would feel accountable for making changes. Two of these departments are categorized as High Change and one is not (Figure 1). Willingness to serve as a pilot department may have been indicative of a head's readiness for changing teaching evaluation practices in their unit, but it was not a perfect predictor.

Departments advanced the three voices differently. Departments that made progress with peer observation had used this voice prior to their involvement in the LAT (Figure 1). Each had a few target practices in place at the start and then tripled or quadrupled the number of target practices. In contrast, no departments used self-voice at the start of the project. Of those that added self-voice, half enacted target practices (Figure 1). Most commonly, departments began providing support for instructors engaging in self-reflection, such as sharing examples of self-reflections or providing a guide for engaging in the reflection process. All departments used student voice at the beginning of the project, because the institution required end-of-course student evaluations, but none had target practices in place. Seven departments advanced this voice, typically by adding a single target practice (Figure 1). Departments most commonly began expecting faculty to take steps to achieve higher response rates on their end-of-course evaluations. Next, we describe the changes that took place in one High Change and one Low Change department in more detail.
FIGURE 1.

Use of research-based teaching evaluation practices in 11 STEM departments (A–K) at the start of the LAT (squares) and 3 years later (circles), by voice. The x-axis denotes the number of target practices used. The dark gray band labeled “NA” denotes the absence of a voice in departmental practices, and the light gray band labeled “0” denotes use of a voice with no target practices (Krishnan et al., 2022). Blue symbols highlight departments that advanced their practices in multiple voices and with multiple research-based target practices, referred to as “High Change” departments. Yellow symbols highlight departments that achieved more limited change, referred to as “Low Change” departments.
High Change Example: Department B
At the start of their involvement, Department B used two voices to evaluate teaching, student and peer, with two research-based target practices. Three years later, the department used all three voices, each with multiple target practices (Figure 1). The department integrated all three voices via written self-reflection, which was new to the department, and also added new target practices to their use of peer voice.
The change process in this department started when the head joined the LAT and asked a faculty member to lead changes to teaching evaluation. This faculty member, who was seen as a teaching leader in the department, developed an initial plan for both peer evaluation and written self-reflection in collaboration with a few departmental colleagues. At the same time, the head raised the topic of teaching evaluation at multiple faculty meetings. At two meetings, the head invited members of the project team to present about robust and equitable teaching evaluation and to answer faculty questions. At a later meeting, the faculty leader presented the pilot plan and solicited feedback from faculty. After revisions based on feedback, the faculty voted in favor of adopting the proposed teaching evaluation practices.
Department B's new practices included a teaching self-reflection form that all faculty submit as part of their annual evaluation. The 2-page form leveraged student and peer voices, and included two or three sections: 1) Faculty reflection on what went well in their teaching and what was challenging for them in the prior year; 2) Data from faculty's student end-of-course evaluations from the last 3 years (including response rates and reflection on student comments); and 3) Information from a peer observation if one was performed during the year.
The department also developed and piloted a new system of peer observation. Their peer observation process relied on a standard form, which specified that peer evaluation should include a preobservation meeting (with some guiding instructions) and a postobservation meeting. The form included guidance for the meetings and details about the focus of observations. It contained a table for the observer to fill out, prompting them to look at specific aspects of the lesson, such as content, clarity, and student engagement. The faculty also agreed on designated timepoints in a faculty member's career when their teaching would be evaluated by peers.
Low Change Example: Department F
Department F changed one voice (self) and did not implement any target practices (Figure 1). The department head decided to add self-reflection to the annual evaluation process. In the first year, the head allowed faculty to decide whether to include a written self-reflection. In subsequent years, the head conveyed the expectation that all faculty submit a written self-reflection. In neither case did the head provide guidance about what questions faculty should answer or what process they should use to reflect, which falls short of any research-based target practice (Krishnan et al., 2022). The department did not advance student or peer voice. Their peer evaluation practices remained ad hoc, with no structures in place to guide the process. During the 3-year period of this study, a new head took over, and they described peer evaluation practices with even fewer target practices than those the prior head described. The department's use of student evaluations remained stagnant, without target practices.
The data presented in this section demonstrate that the LAT had mixed results. About one-third of the participating departments made impressive progress in transforming their teaching evaluation practices to be more robust and equitable. And they did so during a global pandemic that affected every aspect of how academic departments function and the work of teaching. During the same time period, two-thirds of participating departments achieved limited progress. Because we designed the LAT with the department heads as the learners and leaders of change, we wanted to better understand how differences among heads might have impacted the change they achieved. We investigated this question using the readiness for change framework.
- RQ2: How did department heads’ readiness for change relate to changes in departmental practices?
Some components of readiness for change predicted the changes department heads achieved and others did not. Each head described inadequacies they saw in their current teaching evaluation practices (i.e., discrepancy), but this did not reliably lead them to pursue change. However, one particular discrepancy distinguished the two heads who led the most change. Considering valence, heads pursued changes to teaching evaluation practices only when they saw clear potential benefits. Each head expressed concerns about potential costs of changing teaching evaluation practices, but heads of High Change departments did not see these as insurmountable barriers. The heads of High Change departments most distinguished themselves when it came to efficacy. Their personal efficacy for changing teaching evaluation was low: they reported that they personally lacked relevant expertise, yet they believed the department could rely on knowledgeable colleagues, and they delegated accordingly. Last, heads' perceptions of principal support from upper administration neither persuaded them to pursue changes in their departments nor dissuaded them from doing so. We do not report findings about appropriateness because the evidence we collected did not provide sufficient detail (see Qualitative Analyses of Readiness for Change).
The remainder of this section elaborates each of these findings and provides supporting evidence in the form of quotes. Quotes have been edited lightly for grammar and we have used ellipses to indicate portions cut from quotes for the sake of brevity and clarity. Brackets indicate text added to convey the meaning implied by the quote or previous utterances that are not shown here. Quotes from meeting recordings often require these additions for clarity because the speaker may be referencing something said by others in the room.
Discrepancy: Seeing Problems with Teaching Evaluation Practices Did Not Reliably Translate into Action
Department heads all saw a need for changes to teaching evaluation in their departments (i.e., discrepancy), but this did not reliably translate into action. Department heads frequently expressed a desire to add a voice that their department did not currently use to evaluate teaching. For example, the head of Department E explained that they wanted to move beyond sole reliance on student evaluations:
“We had a faculty meeting a couple of weeks ago and discussed this… We're starting from the point that none of us are happy with student evaluations. It's more or less useful, but as responsible faculty, we don't want to rely upon that. We want to have feedback from our peers, but we're not used to that.”
Despite this stated need for change, Department E did not add peer or self-voice to their teaching evaluation practices (Figure 1).
Heads also often talked about problems they saw with the specific way in which the department used a voice. For example, heads lamented low response rates on course evaluations, as described here by the head of Department B:
“I think the biggest challenge is always getting students to participate [in end-of-course evaluations] because what usually happens is you get 10–15% of your class who have an ax to grind one way or the other, and then you don't really get information from the bulk of the class.”
Though this head expressed dissatisfaction with response rates to student evaluations, they did not set an expectation that faculty take steps to achieve a higher response rate, which is a target practice for increasing the reliability of student voice. As another example, the head of Department D shared that their department needed to more regularly conduct peer evaluation, a dissatisfaction shared by other participants:
“I think everyone needs feedback [from peer evaluation]. And more periodically instead of at random times. Typically, if there's only two stages of promotion, some people have not been evaluated for many years.”
Yet, Department D did not work to ensure that peer evaluations occurred over multiple timepoints throughout a faculty member's career, a target practice. In addition to wanting peer evaluation to occur more regularly, department heads commonly reported that observers needed training and that the workload for peer observation had not been equitably distributed, but seeing these needs did not necessarily result in pursuing change.
One particular discrepancy was unique to the heads of the two departments that achieved the most change. The heads of Departments A and B felt that their departments undervalued teaching and that teaching evaluation needed to change to address this problem. As the head of Department A explained, historically their department had not known how to evaluate teaching, and this had consequences for how teaching was valued:
“This is… an ongoing issue in the department… We don't have a good sense of how to evaluate people's teaching… If the majority of your appointment is classroom teaching, how do you evaluate that?… [This] creates some logical resentments within the department because there's not a good appreciation [for teaching]. We've spent a fair amount of time trying to appreciate each other's science, and giving each other the benefit of the doubt for science. But we don't have a similar way of doing that for teaching because we just don't think of teaching as an activity that has known skills that make you good at it.”
This head felt that faculty tended to view teaching and research differently in the department. Whereas faculty viewed research as requiring particular expertise, they did not necessarily see teaching as requiring the same level of expertise. The head felt this resulted in the department underappreciating the work invested in teaching. For this head, advancing teaching evaluation practices would enable better recognition of teaching, which had the potential to improve teaching effectiveness. Similarly, the head of Department B aimed to set an expectation in their department that teaching would be valued and rewarded,
“We have an amazing research enterprise here, but sometimes teaching is given a short shrift. It's viewed as a necessary evil, or it's a bitter pill to swallow. You get done with it in two or three weeks and you survive. I personally enjoy and value highly the part of our job that is related to instruction. And I think that developing a system for evaluation is a way to positively highlight the importance of that job and the value that we place on it in the department and in the university.”
These two heads believed that teaching was a critical component of faculty work, that effective teaching follows from expertise and investment in improving, and that existing systems did not allow for tangible recognition and reward of effective teaching. Though these ideas align with national calls to improve undergraduate education (e.g., AAAS, 2011), most participating department heads did not espouse these views.
Valence: Heads Only Pursued Changes They Perceived to Offer Clear Benefits, and Heads of High Change Departments Saw Ways to Mitigate Costs
Department heads pursued changes in teaching evaluation practices only when they perceived concrete benefits for faculty in their department. The heads of High Change departments perceived benefits to changing multiple voices, and then pursued these changes. As an example, the head of Department C discussed the benefits that they saw in implementing new peer and self-evaluation practices. In this quote, they described how peer evaluation could be beneficial for the observer and the observed:
“I actually learned a lot [participating in peer evaluations]. Especially because I teach a non-majors class with 250 students, and I evaluated somebody [else] teaching it in the other semester. I learned a lot about the classroom dynamic in the corners of the classroom that you can't see and things like that. It actually gave me a lot of valuable perspective. I think there's value in the evaluating that goes on. I think that it's beneficial for people to watch others teach and get ideas, see what works and what doesn't and think about things in new ways.”
The head of Department C also saw concrete benefits to self-reflection, including helping faculty focus on how they could improve, even if they believed they did not need to improve:
“Just because things [in the classroom] are going well doesn't mean that they can't be improved. For some people, I think they think things are going perfectly well, but maybe they're not. I see student evaluations, I see [the students’ responses], and sometimes those don't agree [with a faculty's evaluation of their own teaching]. I think asking people [in a written self-reflection] to identify an area that they would like to work on, or a new practice that they would like to implement would have some value.”
In alignment with the benefits that the head of Department C anticipated, Department C made progress in implementing new target practices in the peer voice, and initiated a system of faculty self-reflection on teaching (Figure 1).
In contrast, the heads of Low Change departments anticipated benefits from more limited changes to teaching evaluation. For example, the head of Department G described student end-of-course evaluations as a valuable source of data for them:
“I read all of the evaluations from the students. If I see, for instance, a good number of students saying the same thing, then I really pay attention to that one thing. Sometimes, if there's a problem, I talk to that faculty member. I think it is extremely helpful for the faculty. And we can actually help prevent issues by going back to the students.”
This head added one target practice to their use of the student voice (Figure 1): they began expecting their instructors to take all necessary steps to improve their response rates on end-of-course evaluations. This head did not discuss benefits of any of the other voices and did not pursue changes to them.
All department heads had concerns about the potential costs of changing their department's teaching evaluation practices. However, these concerns did not dissuade the heads of High Change departments from leading change. Heads raised a variety of concerns, but the most common dealt with faculty resistance and, relatedly, the potential to increase faculty workload. Some heads had already experienced resistance, whereas others anticipated future resistance. For example, when considering implementing self-reflection as a part of annual evaluations, the head of Department K said:
“I don't think [my department] will ever be in a place where I can tell all the faculty they have to do [self-reflections]. There will be a significant fraction that will just simply refuse this, not do it… This actually requires some effort on their part. I think for some people, some of the senior faculty that have already been promoted to full professor, for example, I can imagine some of them being like ‘Well, okay, this is just some other thing that the department is asking me to do…’ I'm not entirely optimistic that I could get buy-in from everyone at all levels.”
The quote above illustrates that the department head anticipated that faculty would resist new teaching evaluation practices because these practices would increase their workload. More than half of the participating heads shared this concern. For example, when asked whether they thought their faculty would be supportive of making changes to their peer evaluation practices, the head of Department E said:
“I want [peer evaluation] to be an efficient process that doesn't add significant burden to the lives of faculty… It's a burden to do [peer evaluations]… There'll be a perception that this is one more thing that needs to be done on top of everything else.”
Even though all heads had similar concerns about potential costs, heads of High Change departments saw ways to move forward productively. Heads of High Change departments tended to view resistance as a normal part of making change. For example, the head of Department B explained that most efforts to change how things worked in a department would engender resistance from some faculty. They felt they could balance costs and benefits for faculty, and as a result, most faculty would be supportive. They explained,
“I think the thing that people would be most worried about is the amount of time required [for them] to spend in other people's lectures. That could add up. We're trying to strike a balance when doing this. It has to be often enough so that it's part of our lives… I'm worried even doing it once a year for everyone in the department, it's going to be tough… I mean, there's nothing that doesn't engender complaints. In the entire world, that's the way it is. And in our little microcosm of [Department B], that's the way too. So yeah, I think most people will be mostly supportive, and some people will complain. It's life.”
The head of Department A similarly anticipated resistance from some faculty and felt equipped to successfully navigate the resistance by having open conversations with faculty. When asked how they might handle resistance from faculty as they pursued new teaching evaluation practices, the head of Department A explained,
“The [faculty] who come to you directly… are the easiest ones [to deal with] because they've already started the dialogue… You can say, ‘Look, I'm sorry that you find this to not be very valuable. If you want to be involved in the process, we could try and make it a more valuable experience for you… I could put you on the committee.’ Or just tell them ‘Look, I'm sorry that you don't find this valuable, but we have to have a way to fairly evaluate instruction. This is a way that is relatively well supported with data and that provides us with actual feedback, as opposed to the haphazard way this has been done in the past. If you don't want to be engaged in your teaching, that's up to you, but it will be reflected in your annual evaluations going forwards.’ … The key thing is not to just tell people ‘Shut up and do it,’ because that never works with academics… It's to try and have the dialogue and engage with people.”
This department head was able to describe the responses they anticipated from faculty and different ways they could respond. Thus, they were not dissuaded from pursuing change they felt was important for the department. Other heads similarly anticipated resistance from some faculty but did not articulate clear ideas about how they could foster buy-in among faculty or work with or around a few resistant individuals.
Efficacy: Heads of High Change Departments Questioned Their Own Knowledge and Delegated to Others
Heads of High Change departments reported that they lacked expertise in teaching and changing teaching evaluation and sought help. For example, when asked whether they had felt prepared to lead changes to teaching evaluation, the head of Department A replied,
“Not even a little bit. This is so far outside my area of expertise that when I started on this project, I was myself resistant to it. I had no idea what this was going to look like. I didn't have the vocabulary to talk about it. I had no idea.”
Each head of a High Change department expressed similar doubts about their knowledge of teaching and teaching evaluation. In contrast, not a single head of a Low Change department questioned their own expertise in teaching or preparation for leading change to teaching evaluation. For example, when asked whether they felt comfortable leading these changes, the head of Department H responded,
“I feel very comfortable… We've implemented a number of changes over the past five years for our academic programs… I think change management is a fairly large topic, there is a skill to change management.”
Heads of Low Change departments generally felt that they were equipped to lead change, but our data suggest that they did not choose to use these leadership skills to engage in the work of changing teaching evaluation practices.
Following their assessment that they lacked relevant expertise, heads of High Change departments delegated the work of developing and piloting new teaching evaluation practices to knowledgeable faculty. For example, the head of Department C explained,
“In terms of leading change in this specific area, I'm not, by training, a pedagogical expert. And so I would like to have partners in it. In our department, we have a lot of really good instructional faculty, tenure track and non-tenure track… I wouldn't be comfortable implementing everything or deciding what needs to be implemented, but I would be perfectly comfortable supporting the implementation and prompting change.”
The heads of High Change departments each identified one or two faculty who were interested and willing to spearhead the development of new departmental practices, and who the head felt had expertise that they personally lacked. Of the four High Change departments, one created a new committee to undertake teaching evaluation reform, one charged an existing committee with this work, and in the other two, the work largely fell to individual faculty.
Principal Support: Department Heads’ Pursuit of Change Was Unaffected by Their Perceptions of Administrative Support
Department heads did not rely on their perceptions of administrative support in their decisions to pursue changes to teaching evaluation. Each head, including those of both High and Low Change departments, felt that the upper administration of the university supported advancements to teaching evaluation. For example, the head of Department D stated,
“I think they're supportive at the college level and even at the Provost level. I can't say that I've seen a push per se. There are new directives now that we have to adjust our annual evaluation and promotion documents to include [the three voice framework]. So in that way they're supportive.”
This head and others pointed toward policy changes as evidence that the upper administration would be supportive of departmental work to advance teaching evaluation. Two heads perceived that resources would not be allocated for potential changes, but they still felt that the administration was generally in favor of changes to teaching evaluation.
DISCUSSION
One clear finding of this work is that achieving second-order change in departmental teaching evaluation practices is challenging. The LAT intervention facilitated extensive teaching evaluation reform in some departments, but a greater number of departments accomplished only first-order change. Pursuing extensive reforms depended on seeing a need for change, perceiving concrete benefits of the change, seeing ways to address potential costs, exercising humility about one's own expertise, and tapping human resources in the department. These findings have informed our ongoing approaches to advancing teaching evaluation within our own institution. In this section, we elaborate on hypotheses that emerge from this work and that warrant consideration in other change efforts and testing in future research.
We hypothesize that department heads who leverage the human resources in their department will more successfully advance teaching evaluation. This requires a department head to 1) acknowledge the limitations of their own expertise related to teaching and teaching evaluation, and 2) delegate the work to departmental colleagues who are more knowledgeable and/or eager to learn. Though most department heads in this study had a lot to learn about robust and equitable teaching evaluation, only a few reflected openly on what they knew or had room to learn in this area. This humility played a role in their ability to lead change because it prompted them to rely on other departmental faculty who were more knowledgeable or eager to learn. A team with diverse knowledge and skills can more successfully lead change within an organization (e.g., Kotter and Cohen, 2012).
Delegating the work of developing new teaching evaluation practices to other faculty had two additional benefits. First, it meant that someone besides the department head was working to make progress. This is likely critical because department heads often struggle to balance the workload of their positions (Gmelch and Miskin, 1993) and therefore may also struggle to find time to work on teaching evaluation. We observed this firsthand in the LAT. Second, including other faculty can ultimately foster faculty buy-in because the process and products are guided by what faculty view as both useful and feasible (e.g., Cheldelin, 2000; Lucas, 2000). Relatedly, second-order change requires cultural shifts, and culture change involves many members of a department shifting their thinking and practices, not just the leader. Department heads who felt they needed to do all of the work of developing new teaching evaluation practices themselves, rather than delegating, made limited progress in enacting new practices and likely had no impact on the culture surrounding teaching evaluation.
We also hypothesize that department heads need to be adequately prepared to effectively lead change. Many department heads take the job out of a sense of duty to serve their department (Gmelch and Miskin, 1993), which is commendable. In turn, they deserve adequate preparation to maximize their effectiveness and efficiency in this role. Department heads occupy a difficult position, caught between serving their faculty and their administration (Wolverton et al., 2005; Kruse, 2022). This challenge can be exacerbated by a lack of preparation for the department head role (Gmelch et al., 2017). Department heads have both management and leadership responsibilities, which require different skills. Management involves maintaining day-to-day operations of the department (e.g., budgeting, supervising staff, facilities management, interfacing with upper administrators, addressing problems), whereas leadership involves defining a vision for the future and inspiring faculty to make it happen (Gmelch and Miskin, 1993; Kotter, 2012). Change leadership is a specific type of leadership, and requires a distinct set of skills (e.g., advocating for a change, spreading an inspiring vision, encouraging faculty involvement) (Yukl, 2012). Most department head training provides instruction on campus policies and procedures (Normore and Brooks, 2014), rather than building leadership capacity. In the absence of leadership training, department heads resourcefully rely on their own experiences and ideas about leadership to guide their actions, which vary considerably from person to person.
The heads of High Change departments successfully led teaching evaluation reform and their cases may be instructive regarding the skills and strategies that leadership training should foster. In particular, High Change department heads functioned in two ways that align with scholarship about organizational change and departmental leadership. First, High Change heads built a team to do the change work; team building is an essential step in organizational change and change teams are most effective when they include individuals with political power, expertise, credibility, and leadership skills (Kotter, 2012). Furthermore, academia values shared decision-making and governance, requiring heads to be equipped to manage these processes (Lucas, 2000). Building a team ensures that these values are built into the change process. Second, High Change heads dealt effectively with resistance. Resistance is a normal part of any organizational change. Change involves letting go of how things have worked previously and uncertainty about how things will work in the future, both of which can cause stress (Cheldelin, 2000). High Change heads recognized different forms of resistance, heard concerns without taking them personally, offered reassurances about shared values and a clear vision for the future, and moved forward (Cheldelin, 2000). These ideas about effective leadership and others, if put into practice, could help department heads lead change. Therefore, we see a need for leadership training to be encouraged and incentivized for department heads.
Finally, we hypothesize that achieving second-order change in teaching evaluation is more likely when a department head perceives that teaching is undervalued and interprets this as a problem. This hypothesis is based on the two departments that achieved the most change and on national conversations that have called on institutions and departments to reconsider how teaching is valued and rewarded (Dennin et al., 2017). Inadequate teaching evaluation is not the only structure in the academy that systematically places lower value on teaching than on research. Reward systems often also devalue teaching. Within the institution where this work occurred, faculty who have teaching as most or all of their responsibility are paid less than tenure-track faculty. There are also fewer awards and titles available for teaching-focused faculty, which are symbols of prestige in the academy. Last, teaching-focused faculty are excluded from consideration for particular leadership roles, limiting their power to address systemic structures that undervalue teaching. All of our participants had the opportunity to notice that the systems of higher education in which they work undervalue teaching relative to research, yet only two expressed this idea. It may be incumbent upon change agents to create learning opportunities for leaders to critically consider the systemic devaluing of teaching. The department heads in this study expressed value for teaching, but we struggled to help everyone see how evaluation practices were misaligned with these espoused values, a step that may be necessary for cultural change.
The findings of this project have informed our ongoing work to advance teaching evaluation. We expanded our approach to include faculty leaders in addition to department heads. We continue to convene department heads in an LAT, and focus their time on learning about teaching evaluation and leading change. We now support their work by making delegation part of our intervention, since this was the strongest predictor of being a High Change department. As a condition of their participation, we ask heads to identify one or more faculty to lead the development and piloting of new evaluation practices. We gather these faculty bi-weekly for an academic year to support their learning and development of new practices, and compensate them for their time. The department head helps facilitate piloting new practices and fostering buy-in among faculty.
Readiness for Change as a Guiding Theory
Though it was developed to describe change in the context of private businesses, the readiness for change framework provided a valuable lens for examining change in the context of STEM higher education. We gained new insights about the department heads’ motivations to advance teaching evaluation and how we might more directly foster motivation in future interventions. Though it shares similarities with theories of personal motivation, such as situated expectancy-value theory (Eccles and Wigfield, 2020), the readiness for change framework encompasses organizational-level components and therefore likely has more explanatory power for work on changing departments or institutions. For example, within the readiness for change framework, valence considers an individual's perceptions of the costs and benefits of a change for themselves and for the organization. Department heads prioritized the department (and the faculty therein) over personal costs and benefits when they considered teaching evaluation change, suggesting the need for theories that reach beyond the individual. Though no participants in this study were familiar with readiness for change, they talked extensively about securing “buy-in” from faculty for teaching evaluation change. Some of the ways they talked about buy-in align with components of readiness for change. Accordingly, one addition we have made to the LAT is a series of meetings focused on fostering buy-in, in which we help department heads craft messaging about change that addresses components of readiness for change (e.g., Armenakis and Harris, 2002).
The component of readiness for change that most distinguished heads of High Change departments was efficacy, but the pattern was the opposite of what the framework predicts. The readiness for change framework stipulates that employees are more likely to engage productively with a change if they feel capable of implementing it (e.g., Armenakis et al., 1993; Rafferty et al., 2013). Yet in this study, leaders achieved more change when they lacked personal efficacy and sought out others they viewed as more capable than themselves in the domain of the change. This deserves attention in future research, as it may be specific to the organizational structure of higher education, where departments rely on more shared governance and responsibility than may be common in businesses.
A Multitude of Factors May Influence Departmental Change
Though useful, readiness for change alone is insufficient to explain the differences in teaching evaluation reform achieved by participating departments. We must also consider the larger context surrounding departments and department heads. We discuss some potential contextual influences on departmental change that future interventions and research may wish to consider.
Departmental history and culture can influence how change unfolds. For example, a department whose culture prioritizes full consensus for major decisions may take longer to gain approval for new practices and policies than a department that relies on majority votes to finalize decisions. In contrast, a department that has hired a cohort of new faculty whom they are eager to support may see teaching evaluation change as a promising mechanism for supporting these new faculty with constructive feedback and valuable evidence for their promotion dossiers.
Factors at the college and institutional level could also impact changes in teaching evaluation. For example, in a department experiencing pressure from their college to increase research productivity, the head and faculty may feel that they are already overburdened with carrying out other changes and cannot also pursue teaching evaluation reform. Indeed, experiencing too many changes in a short amount of time can create stress for employees, and may negatively impact the outcome of those changes (Bernerth et al., 2011). In contrast, a department experiencing pressure from the college to reduce DFW rates in introductory courses may see research-based peer evaluation practices as a way to provide constructive feedback to faculty, spark more conversations across sections of a course, and demonstrate to upper administration that they are taking action.
Institutional and/or college policies and structures may also influence departmental autonomy in teaching evaluation. For example, college or institutional promotion and tenure guidelines could require that faculty include their quantitative student evaluation scores compared with a departmental average. This approach is highly problematic because many factors that influence course evaluation scores are out of the instructor's control, including instructor race, gender, and country of origin, as well as course time and size (e.g., Bedard and Kuhn, 2008; Boring, 2017; Fan et al., 2019; Esarey and Valdes, 2020; Aragón et al., 2023). Yet a department may have to comply until policies at a higher level change. Faculty unions represent another institutional factor that can influence change, for example by mandating or limiting the use of peer evaluations in faculty promotion and tenure decisions.
Disciplinary context also matters. For example, some disciplines undergo extensive accreditation of their undergraduate programs, which can be a barrier to change processes (e.g., Laursen et al., 2019). The Accreditation Board for Engineering and Technology, Inc. (ABET) is an accrediting agency for engineering and computing disciplines that expects departments to define their program's educational objectives and student outcomes and to focus on continuous improvement over time. This requires considerable planning and ongoing work for departments, which may then feel both overburdened by assessment and convinced that they already evaluate teaching sufficiently. Yet assessments for ABET do not focus on individual instructors; these data may not provide feedback that helps individual faculty improve or evidence on which to base reward decisions. In these ways and others, accreditation may act as a barrier to teaching evaluation reform (Laursen et al., 2019).
Limitations
Given that this work examines 11 STEM departments at a single research-intensive institution, the findings are best considered exploratory and most appropriately used to generate and test hypotheses in other contexts. We caution readers about generalizing these findings to other disciplines, institutions, or institution types. Additionally, all of our department heads agreed to participate voluntarily. We note that multiple participants made limited progress, so a willingness to attend meetings appears unrelated to a willingness to change.
The timing of this work creates another set of limitations. We examined change over 3 years, but change in academic departments is slow and this project is ongoing. Change to teaching evaluation is continuing in some of these departments, so our findings may underrepresent the change ultimately accomplished and privilege departments that were able to change more quickly. We also did not follow departments long enough to study how faculty use the information provided by more robust teaching evaluation practices. It may be the case that faculty need additional support to build their confidence and skills for relying on these data to inform their teaching (Hora et al., 2017; Lenhart and Bouwma-Gearhart, 2021). Additionally, the years of this work included the two academic years most negatively impacted by the COVID-19 pandemic, which undoubtedly slowed progress. In later iterations of the LAT, we have been able to facilitate learning much more quickly, which may reflect our own development as change agents, the greatly reduced impact of the pandemic on academic functioning, and growing awareness of the successes of High Change departments.
Our research is also limited by the fact that we did not have identical data from each department. Several department heads were unavailable during the second round of interviews due to turnover, meaning we relied on other sources of information to characterize practices and had only data from LAT meetings to assess readiness for change. Two of these departments, F and G, appeared to regress in their peer evaluation practices (Figure 1). It is unclear whether the initial or the second report of practices was inaccurate, or whether these departments actually moved away from research-based practices.
Last, we were not able to fully apply the readiness for change framework in this context. Our data lacked sufficient resolution to compare departments on appropriateness, the sense that the solution under consideration will address the identified need for change. Future work can consider how to elicit data about appropriateness and relate it to the change achieved.
CONCLUSIONS
Without robust and equitable teaching evaluation practices, departments are unable to determine who is teaching effectively and who is working to improve their teaching over time. As a result, they cannot reward investments in teaching and teaching improvement. Evidence-based teaching will only become widespread when evaluation and reward systems illuminate and incentivize effective teaching. Changing the status quo in these systems requires long-term interventions. As a community, we will be best equipped to succeed if we take a scholarly approach to these changes and share what we learn. We hope that others can learn from our efforts, including what went well and where our efforts fell short.
ACKNOWLEDGMENTS
Thank you to Craig Wiegert and Malcolm Adams for assistance facilitating LAT meetings, Mellinda Craig for scheduling meetings, and Janette Hill for valuable formative evaluation data and insights about context. Thank you also to Kiyanie Fedrick, Nichole Mumuney, Ryan O'Donnell, Cloe Reynolds, Abbie Vaughn, and Mina Ziai, who all served as undergraduate researchers for this project and were invaluable throughout the coding process. Support for this work was provided by the NSF's Improving Undergraduate STEM Education (IUSE) program under award 1821023. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.
REFERENCES
- American Association for the Advancement of Science. (2011). Vision and change in undergraduate biology education: A call to action. Washington, DC.
- Anderson, T., & Shattuck, J. (2012). Design-based research: A decade of progress in education research? Educational Researcher, 41(1), 16–25.
- Andrews, S. E., Keating, J., Corbo, J. C., Gammon, M., Reinholz, D. L., & Finkelstein, N. (2020). Transforming teaching evaluation in disciplines: A model and case study of departmental change. In White, K., Beach, A., Finkelstein, N., Henderson, C., Simkins, S., Slakey, L., Stains, M., Weaver, G., & Whitehead, L. (Eds.), Transforming institutions: Accelerating systemic change in higher education. Pressbooks. Retrieved May 27, 2022, from http://openbooks.library.umass.edu/ascnti2020/
- Andrews, T. C., Brickman, P., Dolan, E. L., & Lemons, P. P. (2021). Every tool in the toolbox: Pursuing multilevel institutional change in the DeLTA project. Change: The Magazine of Higher Learning, 53(2), 25–32. 10.1080/00091383.2021.1883974
- Anfara, V. A., Brown, K. M., & Mangione, T. L. (2002). Qualitative analysis on stage: Making the research process more public. Educational Researcher, 31(7), 28–38.
- Aragón, O. R., Pietri, E. S., & Powell, B. A. (2023). Gender bias in teaching evaluations: The causal role of department gender composition. Proceedings of the National Academy of Sciences, 120(4), e2118466120. 10.1073/pnas.2118466120
- Armenakis, A. A., & Harris, S. G. (2002). Crafting a change message to create transformational readiness. Journal of Organizational Change Management, 15(2), 169–183. 10.1108/09534810210423080
- Armenakis, A. A., Harris, S. G., & Mossholder, K. W. (1993). Creating readiness for organizational change. Human Relations, 46(6), 681–703. 10.1177/001872679304600601
- Barab, S. (2014). Design-based research: A methodological toolkit for engineering change. The Cambridge Handbook of the Learning Sciences, 2, 151–170.
- Bedard, K., & Kuhn, P. (2008). Where class size really matters: Class size and student ratings of instructor effectiveness. Economics of Education Review, 27(3), 253–265. 10.1016/j.econedurev.2006.08.007
- Bergquist, W. H., & Pawlak, K. (2007). Engaging the Six Cultures of the Academy: Revised and Expanded Edition of the Four Cultures of the Academy. San Francisco, CA: John Wiley & Sons.
- Bernerth, J. B., Walker, H. J., & Harris, S. G. (2011). Change fatigue: Development and initial validation of a new measure. Work & Stress, 25(4), 321–337. 10.1080/02678373.2011.634280
- Blaich, C. F., & Wise, K. S. (2010). Moving from assessment to institutional improvement. New Directions for Institutional Research, 2010(S2), 67–78.
- Boring, A. (2017). Gender biases in student evaluations of teaching. Journal of Public Economics, 145, 27–41. 10.1016/j.jpubeco.2016.11.006
- Bouwma-Gearhart, J. L., & Hora, M. T. (2016). Supporting faculty in the era of accountability: How postsecondary leaders can facilitate the meaningful use of instructional data for continuous improvement. Journal of Higher Education Management, 31(1), 44–56.
- Bouwma-Gearhart, J., Sitomer, A., Fisher, K. Q., Smith, C., & Koretsky, M. (2016). Studying organizational change: Rigorous attention to complex systems via a multi-theoretical research model. Paper presented at the American Society for Engineering Education Annual Conference, New Orleans, LA.
- Bradforth, S. E., Miller, E. R., Dichtel, W. R., Leibovich, A. K., Feig, A. L., Martin, J. D., ... & Smith, T. L. (2015). University learning: Improve undergraduate science education. Nature, 523(7560), 282–284. 10.1038/523282a
- Brickman, P., Gormally, C., & Martella, A. M. (2016). Making the grade: Using instructional feedback and evaluation to inspire evidence-based teaching. CBE—Life Sciences Education, 15(4), ar75. 10.1187/cbe.15-12-0249
- Cheldelin, S. I. (2000). Handling resistance to change. In Lucas, A. F. (Ed.), Leading Academic Change: Essential Roles for Department Chairs. San Francisco, CA: Jossey-Bass Publishers.
- Chi, M. T., & Wylie, R. (2014). The ICAP framework: Linking cognitive engagement to active learning outcomes. Educational Psychologist, 49(4), 219–243. 10.1080/00461520.2014.965823
- Choi, M. (2011). Employees' attitudes toward organizational change: A literature review. Human Resource Management, 50(4), 479–500. 10.1002/hrm.20434
- Cipriano, R. E., & Riccardi, R. L. (2017). The Department Chair: A decade-long analysis. The Department Chair, 28(1), 10–13. 10.1002/dch.30144
- Connolly, M. R., & Seymour, E. (2015). Why Theories of Change Matter (Working Paper No. 2015-2). Retrieved from http://wcer-web.ad.education.wisc.edu/docs/working-papers/Working_Paper_No_2015_02.pdf
- Corbo, J. C., Reinholz, D. L., Dancy, M. H., Deetz, S., & Finkelstein, N. (2016). Framework for transforming departmental culture to support educational innovation. Physical Review Physics Education Research, 12(1), 010113.
- Dawson, S. M., & Hocker, A. D. (2020). An evidence-based framework for peer review of teaching. Advances in Physiology Education, 44(1), 26–31. 10.1152/advan.00088.2019
- Dennin, M., Schultz, Z. D., Feig, A., Finkelstein, N., Greenhoot, A. F., Hildreth, M., ... & Miller, E. R. (2017). Aligning practice to policies: Changing the culture to recognize and reward teaching at research universities. CBE—Life Sciences Education, 16(4), es5. 10.1187/cbe.17-02-0032
- Eccles, J. S., & Wigfield, A. (2020). From expectancy-value theory to situated expectancy-value theory: A developmental, social cognitive, and sociocultural perspective on motivation. Contemporary Educational Psychology, 61, 101859. 10.1016/j.cedpsych.2020.101859
- Eckel, P. D., & Kezar, A. (2003). Key strategies for making new institutional sense: Ingredients to higher education transformation. Higher Education Policy, 16, 39–53. 10.1057/palgrave.hep.8300001
- Edelson, D. C. (2002). Design research: What we learn when we engage in design. The Journal of the Learning Sciences, 11(1), 105–121. 10.1207/S15327809JLS1101_4
- Esarey, J., & Valdes, N. (2020). Unbiased, reliable, and valid student evaluations can still be unfair. Assessment & Evaluation in Higher Education, 45(8), 1106–1120. 10.1080/02602938.2020.1724875
- Fan, Y., Shepherd, L. J., Slavich, E., Waters, D., Stone, M., Abel, R., & Johnston, E. L. (2019). Gender and cultural bias in student evaluations: Why representation matters. PLoS One, 14(2), e0209749. 10.1371/journal.pone.0209749
- Finkelstein, N., Greenhoot, A. F., Weaver, G., & Austin, A. E. (2020). A department-level cultural change project: Transforming evaluation of teaching. In White, K., Beach, A., Finkelstein, N., Henderson, C., Simkins, S., Slakey, L., Stains, M., Weaver, G., & Whitehead, L. (Eds.), Transforming Institutions: Accelerating Systemic Change in Higher Education. Pressbooks. Retrieved May 27, 2022, from http://openbooks.library.umass.edu/ascnti2020/
- Glassick, C. E., Huber, M. T., & Maeroff, G. I. (1997). Scholarship Assessed: Evaluation of the Professoriate. San Francisco, CA: John Wiley & Sons.
- Gmelch, W. H., & Miskin, V. D. (1993). Leadership Skills for Department Chairs. Bolton, MA: Anker Publishing Company, Inc.
- Gmelch, W. H., Roberts, D., Ward, K., & Hirsch, S. (2017). A retrospective view of department chairs: Lessons learned. The Department Chair, 28(1), 1–4. 10.1002/dch.30140
- Holt, D. T., Armenakis, A. A., Harris, S. G., & Feild, H. S. (2007). Toward a comprehensive definition of readiness for change: A review of research and instrumentation. Research in Organizational Change and Development, 289–336. 10.1016/S0897-3016(06)16009-7
- Hora, M. T., Bouwma-Gearhart, J., & Park, H. J. (2017). Data driven decision-making in the era of accountability: Fostering faculty data cultures for learning. The Review of Higher Education, 40(3), 391–426.
- Kezar, A. (2014). Higher education change and social networks: A review of research. The Journal of Higher Education, 85(1), 91–125. 10.1080/00221546.2014.11777320
- Kezar, A. (2018). How Colleges Change: Understanding, Leading, and Enacting Change. New York, NY: Routledge.
- Kotter, J. P. (2012). Leading Change. Boston, MA: Harvard Business Press.
- Kotter, J. P., & Cohen, D. S. (2012). The Heart of Change: Real-Life Stories of How People Change Their Organizations. Boston, MA: Harvard Business Press.
- Krishnan, S., Gehrtz, J., Lemons, P. P., Dolan, E. L., Brickman, P., & Andrews, T. C. (2022). Guides to Advance Teaching Evaluation (GATEs): A resource for STEM departments planning robust and equitable evaluation practices. CBE—Life Sciences Education, 21(3), ar42. 10.1187/cbe.21-08-0198
- Kruse, S. D. (2022). Department chair leadership: Exploring the role's demands and tensions. Educational Management Administration & Leadership, 50(5), 739–757. 10.1177/1741143220953601
- Laursen, S., Andrews, T., Stains, M., Finelli, C. J., Borrego, M., McConnell, D., & Malcom, S. (2019). Levers for Change: An Assessment of Progress on Changing STEM Instruction. Washington, DC: American Association for the Advancement of Science. Retrieved May 27, 2022, from www.aaas.org/sites/default/files/2019-07/levers-for-change-WEB100_2019.pdf
- Lenhart, C., & Bouwma-Gearhart, J. (2021). STEM faculty instructional data-use practices: Informing teaching practice and students' reflection on students' learning. Education Sciences, 11(6), 291. Retrieved from https://www.mdpi.com/2227-7102/11/6/291
- Lucas, A. F. (2000). A teamwork approach to change in the academic department. In Lucas, A. F. (Ed.), Leading Academic Change: Essential Roles for Department Chairs (pp. 7–32). San Francisco, CA: Jossey-Bass Publishers.
- Lyde, A. R., Grieshaber, D. C., & Byrns, G. (2016). Faculty teaching performance: Perceptions of a multi-source method for evaluation. Journal of the Scholarship of Teaching and Learning, 16(3), 82–94.
- Mohr, D. C., Cuijpers, P., & Lehman, K. (2011). Supportive accountability: A model for providing human support to enhance adherence to eHealth interventions. Journal of Medical Internet Research, 13(1), e30. 10.2196/jmir.1602
- Normore, A. H., & Brooks, J. S. (2014). The department chair: A conundrum of educational leadership versus educational management. In Lahera, A. I., Hamdan, K., & Normore, A. H. (Eds.), Pathways to Excellence: Developing and Cultivating Leaders for the Classroom and Beyond (pp. 3–19). Bingley, UK: Emerald Group Publishing Limited.
- Patton, M. Q. (1990). Qualitative Evaluation and Research Methods. Thousand Oaks, CA: SAGE Publications, Inc.
- Peterson, D. A. M., Biederman, L. A., Andersen, D., Ditonto, T. M., & Roe, K. (2019). Mitigating gender bias in student evaluations of teaching. PLoS One, 14(5), e0216241. 10.1371/journal.pone.0216241
- Rafferty, A. E., Jimmieson, N. L., & Armenakis, A. A. (2013). Change readiness: A multilevel review. Journal of Management, 39(1), 110–135. 10.1177/0149206312457417
- Reinholz, D. L., & Andrews, T. C. (2020). Change theory and theory of change: What's the difference anyway? International Journal of STEM Education, 7, 2. 10.1186/s40594-020-0202-3
- Reinholz, D. L., & Apkarian, N. (2018). Four frames for systemic change in STEM departments. International Journal of STEM Education, 5, 3.
- Reinholz, D. L., Corbo, J. C., Bernstein, D. J., & Finkelstein, N. D. (2018). Evaluating scholarly teaching: A model and call for an evidence-based approach. In Lester, J., Klein, C., Johri, A., & Rangwala, H. (Eds.), Learning Analytics in Higher Education (pp. 69–92). New York, NY: Routledge.
- Reinholz, D. L., White, I., & Andrews, T. (2021). Change theory in STEM higher education: A systematic review. International Journal of STEM Education, 8(1), 1–22. 10.1186/s40594-021-00291-2
- Saldaña, J. (2013). The Coding Manual for Qualitative Researchers (2nd ed.). Thousand Oaks, CA: SAGE Publications Ltd.
- Schein, E. H. (1996). Kurt Lewin's change theory in the field and in the classroom: Notes toward a model of managed learning. Systems Practice, 9, 27–47. 10.1007/BF02173417
- Schön, D., & Argyris, C. (1996). Organizational Learning II: Theory, Method and Practice. Reading, MA: Addison-Wesley.
- Scott, E. E., Wenderoth, M. P., & Doherty, J. H. (2020). Design-based research: A methodology to extend and enrich biology education research. CBE—Life Sciences Education, 19(2), es11. 10.1187/cbe.19-11-0245
- Shenton, A. K. (2004). Strategies for ensuring trustworthiness in qualitative research projects. Education for Information, 22(2), 63–75.
- Simonson, S. R., Earl, B., & Frary, M. (2022). Establishing a framework for assessing teaching effectiveness. College Teaching, 70(2), 164–180. 10.1080/87567555.2021.1909528
- Smith, C. (2008). Building effectiveness in teaching through targeted evaluation and response: Connecting evaluation to teaching improvement in higher education. Assessment & Evaluation in Higher Education, 33(5), 517–533. 10.1080/02602930701698942
- Tanner, K. D. (2013). Structure matters: Twenty-one teaching strategies to promote student engagement and cultivate classroom equity. CBE—Life Sciences Education, 12(3), 322–331. 10.1187/cbe.13-06-0115
- Task Force on Student Learning and Success. (2017). Report of progress and recommendations. Retrieved December 17, 2024, from https://president.uga.edu/wp-content/uploads/final_task_force_report.pdf
- Thomas, S., Chie, Q. T., Abraham, M., Jalarajan Raj, S., & Beh, L.-S. (2014). A qualitative review of literature on peer review of teaching in higher education: An application of the SWOT framework. Review of Educational Research, 84(1), 112–159. 10.3102/0034654313499617
- Weaver, G. C., Austin, A. E., Greenhoot, A. F., & Finkelstein, N. D. (2020). Establishing a better approach for evaluating teaching: The TEval project. Change: The Magazine of Higher Learning, 52(3), 25–31. 10.1080/00091383.2020.1745575
- Wolverton, M., Ackerman, R., & Holt, S. (2005). Preparing for leadership: What academic department chairs need to know. Journal of Higher Education Policy and Management, 27(2), 227–238. 10.1080/13600800500120126
- Yin, R. K. (2014). Case Study Research: Design and Methods (5th ed.). Thousand Oaks, CA: Sage.
- Yukl, G. (2012). Effective leadership behavior: What we know and what questions need more attention. Academy of Management Perspectives, 26(4), 66–85. 10.5465/amp.2012.0088