Abstract
Introduction
The University of Manitoba’s ambulatory pediatric clerkship transitioned from single in-training evaluation reports (ITERs) to daily encounter cards (DECs). The impact of this change on the quality of student assessment was unknown. Using the validated Completed Clinical Evaluation Report Rating (CCERR) scale, we compared the assessment quality of the single ITER to that of the DEC-based system.
Methods
Block randomization was used to select from cohorts of ITER- and DEC-based assessments completed at equivalent points in clerkship training. Data were transcribed, anonymized, and scored by two blinded raters using the CCERR.
Results
Inter-rater reliability for total CCERR scores was substantial (> 0.6). The mean total CCERR score for the DEC cohort was significantly higher than for the ITER cohort (25.2 vs. 16.8, p < 0.001), as was the overall mean per-item score (2.81 vs. 1.86, p < 0.05 for each item). Multivariate linear regression supported the significant influence of assessment method on assessment quality.
Conclusions
The transition from an ITER-based system to a DEC-based system was associated with an improvement in the average quality of student assessments. However, mean CCERR scores for the DEC cohort remained only in the "average" range, suggesting an unmet need for faculty development.
Keywords: Daily evaluation cards, Feedback, Medical students, Student assessment, Quality improvement
Introduction
In most clerkship and residency programs, student assessment has traditionally been given in the form of a single, summative in-training evaluation report (ITER). ITERs have a dual role; they are used by program and clerkship directors to ensure clinical competence and are a major consideration in determining whether a student will pass or fail a given rotation. They also serve as a structured means of providing feedback to medical students, one of only a few program-mandated pieces of feedback that the learner might receive [1, 2]. However, the comprehensive ITER has several limitations. As a summative assessment, the ITER relies heavily on preceptor recall and is influenced by the amount of contact between the preceptor and the student [2–4]. Furthermore, a lack of faculty training and guidance as to what constitutes good feedback for an ITER has also been frequently identified as a limitation [4, 5]. Finally, as a feedback tool, the ITER is significantly limited by a lack of timeliness and/or consistently detailed comments [2, 3, 6], which risks omitting context and specific recommendations regarding areas of weakness or strength, impairing the ability of trainees to demonstrate improvement [7].
The pediatric undergraduate medical education (UGME) program at the University of Manitoba (Winnipeg, Canada) has an ambulatory rotation in which students attend five half-day general pediatric clinics, emergency and minor treatment area shifts, and sub-specialty clinics. Historically, only one ITER was completed, usually by the preceptor for the five general pediatric half-day clinics, without any input from other preceptors. The ambulatory pediatrics rotation transitioned to a system of daily encounter cards (DECs) in 2017, using three versions of the assessment tool focused on specific CanMEDS roles. This transition coincided with the start of the 2017–2018 clerkship year and was not preceded by any specific student or faculty training on the use of DECs or on the components of good feedback. DECs, in which preceptors complete a brief student assessment after every clinical encounter rather than retrospectively at the end of a rotation, are a feasible and reliable method of student assessment [8, 9]. DECs are less dependent on preceptor recall and provide immediate, actionable feedback for students. By design, DECs incorporate all aspects of the STOP acronym, which summarizes the components of effective feedback: specific, timely, objective, and based on observed behaviors, with a plan for improvement [7]. Over the entire ambulatory rotation, the current DEC-based method of assessment allows for 15 assessments per student, rather than the single ITER generated by the previous method. Within the general pediatric half-day clinics specifically, the preceptor now completes five assessments for the ambulatory rotation rather than one. This transition aimed to provide equitable representation of an individual preceptor's in-the-moment assessment of the learner's progress and to enable timely and actionable feedback.
However, it was unknown what impact transitioning to DECs, each of which prescribes specific CanMEDS roles, would have on the quality of student assessments compared with the single ITER system, which aims to evaluate the entire CanMEDS portfolio. The Completed Clinical Evaluation Report Rating (CCERR) scale is a validated instrument that considers nine components of a student assessment to objectively assess the quality of feedback provided [10]. A five-point Likert scale is used for each of the nine items, which assess various aspects of high-quality feedback, such as citing specific instances or actions that were areas of strength or areas requiring improvement. A total CCERR score is obtained by summing the scores of all nine individual items; thus, the total CCERR score can range from a minimum of nine to a maximum of 45. First published in 2008, the CCERR has since been used and validated for both ITERs and DECs [10, 11]. Using historical full ITERs and prospectively collecting and averaging the role-specific DECs, we aimed to determine whether the quality of assessments changed.
Methods
The pediatric UGME program at the University of Manitoba consists of eight periods of 6-week duration, with at least 14 students per period. Each student completes a 3-week inpatient rotation and a 3-week ambulatory rotation, starting on either the ambulatory rotation or the inpatient wards and switching at the midpoint of the period. During the ambulatory rotation, each student attends multiple ambulatory experiences, including five half-day general pediatric clinics supervised by a staff pediatrician affiliated with the University of Manitoba. These clinics take place variably in community settings around the city of Winnipeg or in the ambulatory children's clinic located at our children's hospital. The supervising pediatrician completes an assessment of each medical student at the end of each clinic (in the case of the DEC cohort) or retrospectively after all five clinics have been completed (in the case of the ITER cohort). These assessments are then combined by the UGME office to generate a final performance report on each student, incorporating the available narrative feedback and numerical scoring. For our study, we examined only student assessments based on performance in the five half-day general pediatric clinics, because, of all the outpatient preceptors in the ambulatory pediatrics rotation, the preceptors for these clinics generally had the most consistent clinical contact with each student.
In the original paper describing the CCERR scale [10], the difference between a “poor” and an “average” quality assessment was 8 ± 11.5 points on the 45-point CCERR scale. Based on this, and using a 95% confidence level and 80% power, we estimated that at least 33 student assessments in each cohort were required to detect a statistically significant difference between the ITER and DEC cohorts. Compared with the published CCERR score differences for DECs and with the difference required to move an ITER from “average” to “high” quality [10, 11], the sample size needed to detect the difference between a “poor” and an “average” assessment was the largest, and was therefore considered the most conservative basis for establishing a sample size capable of detecting a statistically significant difference in CCERR scores between our two cohorts.
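As an illustration of this calculation, the following minimal sketch uses base R's `power.t.test` with the 8-point difference and 11.5-point standard deviation cited above; these inputs are taken from the text rather than from raw data, and the original estimate may have been produced with different software or assumptions.

```r
# Sample size sketch for detecting a "poor" vs. "average" difference in total
# CCERR score (assumed delta = 8 points, SD = 11.5 points, two-sided alpha =
# 0.05, power = 0.80). Inputs come from the text above, not from raw data.
power.t.test(delta = 8, sd = 11.5, sig.level = 0.05, power = 0.80,
             type = "two.sample", alternative = "two.sided")
# Returns n of approximately 33 assessments per cohort.
```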
In an attempt to control for temporal influences on assessment quality, in particular any bias related to student performance in the first or last clinical rotation of the year, we sought to select student assessments completed during equivalent points in clerkship training. To that end, we identified student assessments completed in each half of periods 2 through 7, for both the last year of the ITER-based system and the first year of the DEC-based system, as eligible for inclusion in our analysis. We included assessments from first-time rotation takers regardless of rotation outcome. A total of 84 ITERs and 87 DEC-based assessments were thus eligible. There was only one borderline pass assessment in the ITER cohort, and no failures in either cohort.
From these totals, we then removed any assessment that was incomplete or missing components (missing categories for the ITER cohort or missing any daily encounter cards for the DEC cohort). One ITER and eight DEC-based assessments were therefore removed. We also removed any assessments that had been completed for students who were remediating their pediatrics rotation, anticipating that the special circumstances for these students would naturally require more detailed and thorough assessment compared to a first-time rotation taker. This resulted in zero ITERs and two DEC-based assessments being removed. The final sample consisted of 83 ITERs and 75 DEC-based assessments, all from first-time rotation takers.
We then randomly selected four student assessments from each half of each period, for each of the ITER and DEC cohorts, giving a sample of 48 student assessments per cohort with equal representation from each half of each period. To avoid confounding by the potential effects of single versus multiple preceptors, we then removed any assessment that had been completed by more than one preceptor. Five student assessments in the ITER cohort and six in the DEC cohort were removed, leaving 43 student assessments in the ITER cohort and 42 in the DEC cohort. As our remaining sample was consequently unbalanced in terms of representation across the six periods between the two cohorts, we used linear regression to analyze the effects of potential confounding factors on total CCERR score, including a specific analysis of period effect. Figure 1 contains a schematic representation of our sampling process.
Fig. 1.
Selection process for ITER- and DEC-based student assessments
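The stratified random selection described above can be sketched as follows; the roster `assessments` and its columns are illustrative placeholders rather than the actual study database, and the real selection may have been performed differently (for example, with a random number table).

```r
# Illustrative sketch of the stratified selection: four assessments drawn from
# each half of each period, separately for the ITER and DEC cohorts.
set.seed(2018)  # for reproducibility of this example only

# Placeholder roster: ~7 eligible assessments per cohort/period/half stratum.
assessments <- expand.grid(cohort = c("ITER", "DEC"), period = 2:7,
                           half = c("first", "second"), id = 1:7)

sample_stratum <- function(df, n = 4) df[sample(nrow(df), min(n, nrow(df))), ]

strata   <- split(assessments,
                  interaction(assessments$cohort, assessments$period,
                              assessments$half, drop = TRUE))
selected <- do.call(rbind, lapply(strata, sample_stratum))

table(selected$cohort)  # 48 assessments per cohort (4 x 2 halves x 6 periods)
```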
A research assistant transcribed de-identified student assessments (numeric scores, written comments, and student and preceptor characteristics) into a database. Spelling and grammatical mistakes were left unaltered. Student assessments were given a unique alpha-numeric code; only the research assistant had access to the master list of student codes.
We collected student and preceptor demographics to analyze any potential confounding effects on assessment quality. Student and preceptor characteristic data were separated from the numeric scores and written comments when assessment data were initially presented to the evaluators; in this way, preceptor data were analyzed without specific preceptors being linked by name to any one student assessment, reducing potential bias when assigning CCERR scores. Student and preceptor characteristics included gender (M or F), rotation outcome (pass, borderline pass, or fail), preceptor years in practice (< 5 years, 5–10 years, or > 10 years), and preceptor in-hospital clinical teaching unit (CTU) responsibilities (yes or no; a surrogate marker for academic pediatricians with possibly more student assessment experience). Student characteristics were limited to protect their status as a vulnerable population within the context of this project. Preceptor characteristics were obtained through the College of Physicians and Surgeons of Manitoba physician directory, a public domain source. We did not evaluate the number of unique assessments conducted by particular preceptors for either cohort. Student and preceptor characteristics were included in a multivariate linear regression of their contribution to CCERR scores; however, given our sample size, these sub-analyses were not adequately powered.
The original CCERR scale was used without modification for all scoring [10]. The two-member scoring team (the authors) first performed CCERR scoring on student assessment data collected from a clerkship period not included in the analysis. These calibration rounds were scored individually and then compared between the two raters; discrepancies were discussed, and further rounds were completed until both raters consistently scored within two points of one another. Each rater then performed independent CCERR scoring on all student assessments included in the study period and assigned a quality score (“poor”, “average”, or “high”) to each assessment. Cohen’s kappa was used to assess inter-rater reliability between the two raters once all student assessments had been scored. Weighted Cohen’s kappa scores were calculated for total CCERR scores, for per-item scores, and for qualitative quality scores, for each cohort.
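One way to compute the weighted Cohen's kappa described here is sketched below using the `irr` package; the choice of package and weighting scheme is an assumption for illustration (our methods do not specify the implementation), and the rater score vectors are simulated placeholders rather than study data.

```r
# Weighted Cohen's kappa for two raters' total CCERR scores (illustrative).
library(irr)  # provides kappa2(); assumed implementation, not necessarily ours

set.seed(1)
rater1 <- sample(9:45, 40, replace = TRUE)                              # placeholder scores
rater2 <- pmin(45, pmax(9, rater1 + sample(-3:3, 40, replace = TRUE)))  # second rater

kappa2(data.frame(rater1, rater2), weight = "squared")  # quadratically weighted
# Kappa values between 0.61 and 0.80 are conventionally interpreted as
# substantial agreement (Landis & Koch).
```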
Student’s t test was used to compare mean total and mean per-item CCERR scores between the cohorts. Mean scores were obtained by averaging the two raters’ scores and rounding to the closest whole number. Multivariate linear regression was used to assess the impact of time period and student and preceptor characteristics on mean total CCERR scores. Fisher’s exact test was used to compare qualitative quality scores with the method of assessment (ITER or DEC). Quality scores were determined by unanimous agreement between both raters; in cases of inter-rater disagreement, the lower quality score was used. Chi-squared and Fisher’s exact tests were used to compare group proportions between the two cohorts, as appropriate for the count data. Statistical analysis was performed using R (R Foundation for Statistical Computing, Vienna, Austria; https://www.R-project.org/).
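A minimal sketch of these comparisons in base R is shown below; the data frame `dat` and its columns are illustrative placeholders with simulated values, not the study data, and the actual regression included additional covariates.

```r
# Illustrative data: method of assessment, clerkship period, CTU
# responsibilities, qualitative quality score, and mean total CCERR score.
set.seed(1)
n   <- 85
dat <- data.frame(
  method  = factor(rep(c("ITER", "DEC"), length.out = n)),
  period  = factor(sample(2:7, n, replace = TRUE)),
  ctu     = factor(sample(c("Y", "N"), n, replace = TRUE)),
  quality = factor(sample(c("poor", "average", "high"), n, replace = TRUE))
)
dat$mean_ccerr <- 17 + 8 * (dat$method == "DEC") + rnorm(n, sd = 4)

# Student's t test: mean total CCERR score, DEC vs. ITER
t.test(mean_ccerr ~ method, data = dat)

# Multivariate linear regression: method of assessment plus potential confounders
summary(lm(mean_ccerr ~ method + period + ctu, data = dat))

# Fisher's exact test: qualitative quality score vs. method of assessment
fisher.test(table(dat$quality, dat$method))
```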
Ethical approval for this project was obtained through the University of Manitoba Health Research Ethics Board (HS21694 (H2018:143)).
Results
Student and preceptor characteristics for each cohort are included in Table 1, as is a breakdown of period representation. There was no significant difference between the ITER and DEC cohorts in student or preceptor gender, preceptor years in practice, preceptor CTU responsibilities, period representation, or the number of students with multiple preceptors.
Table 1.
Student and preceptor characteristics in each cohort of student assessments completed by single preceptors using an ITER-based or DEC-based method of assessment
| | ITER (n = 43) | DEC (n = 42) | p | |
|---|---|---|---|---|
| Student characteristics | Gender | |||
| M | 25 | 25 | 1 | |
| F | 18 | 17 | ||
| Preceptor characteristics | Gender | |||
| M | 16 | 15 | 1 | |
| F | 27 | 27 | ||
| Years in practice | ||||
| < 5 | 5 | 4 | 0.4981 | |
| 5–10 | 15 | 20 | ||
| > 10 | 23 | 18 | ||
| CTU responsibilities | ||||
| Y | 10 | 15 | 0.3067 | |
| N | 33 | 27 | ||
| Period | ||||
| 2 | 7 | 8 | 0.9247 | |
| 3 | 8 | 8 | ||
| 4 | 8 | 8 | ||
| 5 | 7 | 7 | ||
| 6 | 6 | 7 | ||
| 7 | 7 | 4 | ||
Comparison of proportions by Chi-squared test and Fisher’s exact test as appropriate for count data
Inter-rater reliability for mean total CCERR scores was substantial (> 0.6 but < 0.8, p < 0.05) for each cohort. Inter-rater reliability for per-item CCERR scoring was variable, but always greater than 0. Inter-rater reliability for quality scores was substantial for the ITER cohort (> 0.6 but < 0.8, p < 0.05) and fair for the DEC cohort (> 0.2 but < 0.4, p < 0.05).
The mean total CCERR score for the DEC cohort was significantly higher than for the ITER cohort, 25.2 versus 16.8 (p < 0.05). Mean per-item CCERR scores were also significantly higher for the DEC cohort than for the ITER cohort for all nine CCERR items (p < 0.05 for all items). The mean per-item score for the DEC cohort was 2.81, significantly higher than the 1.86 seen in the ITER cohort (p < 0.05). Of note, mean per-item scores were at least one whole point higher in the DEC cohort for items 2, 6, and 7, which assess the presence of balanced comments, clear examples of areas for improvement, and concrete recommendations for improvement, respectively. Weighted Cohen kappa scores for per-item scoring were variable, but consistently above 0 for all nine CCERR items in both cohorts. Mean total and mean per-item scores for each cohort are included in Table 2. The proportion of assessments unanimously scored as “average” or better was substantially higher in the DEC cohort than in the ITER cohort, 90.4% versus 16.3% (p < 0.05) (Table 3).
Table 2.
Mean total and mean per-item CCERR scores for both cohorts of student assessments, including weighted Cohen kappa and Student’s t test scores
| | ITER | | DEC | | |
|---|---|---|---|---|---|
| | CCERR score | Weighted Cohen kappa | CCERR score | Weighted Cohen kappa | p (CCERR score) |
| Total | 16.8 | 0.677, p < 0.001 | 25.2 | 0.648, p < 0.001 | < 0.001 |
| Per item | |||||
| Item 1 | 2.31 | 0.480, p < 0.001 | 2.86 | 0.339, p < 0.05 | < 0.001 |
| Item 2 | 1.72 | 0.805, p < 0.001 | 3.67 | 0.373, p < 0.001 | < 0.001 |
| Item 3 | 1.28 | 0.808, p < 0.001 | 1.74 | 0.316, p < 0.001 | < 0.05 |
| Item 4 | 2.43 | 0.494, p < 0.001 | 2.80 | 0.442, p < 0.05 | < 0.05 |
| Item 5 | 1.55 | 0.052, p < 0.05 | 2.49 | 0.203, p < 0.05 | < 0.001 |
| Item 6 | 1.12 | 0.096, p = 0.361 | 2.61 | 0.350, p < 0.05 | < 0.001 |
| Item 7 | 1.28 | 0.501, p < 0.001 | 2.80 | 0.461, p < 0.001 | < 0.001 |
| Item 8 | 3.08 | 0.239, p = 0.058 | 3.39 | 0.213, p = 0.069 | < 0.05 |
| Item 9 | 2.00 | 0.179, p < 0.05 | 2.94 | 0.302, p < 0.05 | < 0.001 |
| Average | 1.86 | | 2.81 | | < 0.05 |
Table 3.
Qualitative scores for both cohorts
| | ITER | DEC | p^a |
|---|---|---|---|
| Poor | 36 | 4 | < 0.001 |
| Average | 7 | 33 | |
| High | 0 | 5 | |
| Cohen kappa^b | 0.7, p < 0.001 | 0.39, p < 0.001 | |
^a Fisher’s exact test for comparing group proportions of assessment qualitative score between the ITER and DEC cohorts
^b Cohen’s kappa reflects inter-rater reliability of qualitative scores given by the two raters within each cohort
Multivariate linear regression identified the method of student assessment as the most significant factor influencing mean total CCERR score (β = 5.67, p < 0.001), although CTU responsibilities were also associated with a higher mean total CCERR score (β = 1.46, p < 0.05), albeit to a lesser degree. No other student or preceptor characteristic was associated with a significant impact on mean total CCERR score, nor was clerkship period. It is important to note, however, that we did not meet the sample size required for adequately powered sub-analyses of student characteristics, preceptor characteristics, or clerkship period. The results of the multivariate linear regression are included in Table 4.
Table 4.
Results from multiple variable linear regression of mean CCERR scores, controlling for clerkship period, and student and preceptor characteristics of interest
| Variable | Reference parameter | β | p |
|---|---|---|---|
| Method of assessment | DEC vs. ITER | − 5.67 | < 0.001 |
| Clerkship period | Period 2 through 7 | range: − 1.32 to 0.91 | 0.61 |
| Student gender | F vs. M | − 0.39 | 0.67 |
| Preceptor gender | F vs. M | − 0.45 | 0.84 |
| Years in practice | < 5 through > 10 | range: − 0.54 to − 0.45 | 0.39 |
| CTU responsibilities | N vs. Y | 1.57 | < 0.05 |
Having found a significant impact of preceptor CTU responsibilities on total CCERR score in the multivariate linear regression, we sought to further analyze the relationship between CTU responsibilities and total CCERR score using Student’s t test. There were substantially fewer preceptors with CTU responsibilities than without; thus, these groups were neither balanced nor sufficiently powered based on our sample size calculations for either cohort. Nevertheless, the results of this sub-analysis are included in Table 5. Mean CCERR scores were significantly higher in the DEC cohort regardless of CTU responsibilities (p < 0.005); however, when examining only assessments from the ITER cohort, the difference in mean CCERR scores between preceptors with and without CTU responsibilities approached, but did not reach, significance (p = 0.06).
Table 5.
Sub-analysis of the effects of preceptor CTU responsibilities on mean total CCERR scores
| CTU responsibilities | Mean total CCERR score | | t test |
|---|---|---|---|
| | ITER (n) | DEC (n) | |
| Y (n) | 19.0 (10) | 26.1 (15) | p value < 0.05, 95%CI: 3.64, 10.55 |
| N (n) | 16.1 (33) | 24.7 (27) | p value < 0.05, 95%CI: 6.46, 10.73 |
| t test | p value = 0.06, 95%CI: − 6.03, 0.18 | p value = 0.31, 95%CI: − 4.13, 1.34 | |
Discussion
We observed that student assessments completed using a DEC-based system were significantly higher in quality than those completed using a single, summative ITER-based system. This occurred without substantive professional development on improving student assessment quality. To our knowledge, this is the first study in which these two methods of student assessment have been directly compared using an objective quality assessment tool. Our results suggest that the method of assessment itself was the key factor responsible for the improvement in quality, as our two populations were remarkably similar.
It is worthwhile to note that the most substantial increases in per-item quality scores came on items 2, 6, and 7; these CCERR items assess the quality of constructive feedback given by a preceptor to a trainee: whether balanced comments are given, whether specific areas for improvement are addressed, and whether concrete recommendations for improvement are provided. A lack of constructive feedback in trainee assessments is a well-known phenomenon within pediatrics [12], and in medical education at large [1, 2, 7].
While our study was not sufficiently powered to detect a significant difference in assessment quality based on student or preceptor characteristics, we did observe in the multivariate linear regression that having a preceptor with CTU responsibilities was associated with a small but significant increase in assessment quality as measured by the total CCERR score. Because the single summative ITER was still in use for the pediatric inpatient/CTU portion of the rotation for both the ITER and DEC cohorts, these mixed CTU/ambulatory preceptors would, in general, have had more opportunities to provide feedback and assessment to trainees than preceptors who worked only in ambulatory settings. We also note that, when we examined only the impact of CTU responsibilities on total CCERR score, assessments completed using the DEC had higher CCERR scores regardless of preceptor CTU responsibilities. It appears that additional assessment opportunities, such as those afforded to preceptors with CTU responsibilities, can lead to higher-quality feedback; however, in terms of a robust, easily implementable tool, a DEC-based system is clearly associated with higher-quality student assessments across all preceptors. It is therefore reasonable to continue using a DEC-based system of assessment for our ambulatory pediatrics rotation, and it may be beneficial to consider its implementation in other parts of the clerkship curriculum.
Student assessments are not only used to document trainee progress and identify struggling students; they are also a means of providing feedback that trainees can use to improve. Trainees consistently report wanting more feedback, and often feel as though their formal assessments are one of the only sources of feedback that they receive [2]. Unfortunately, comments such as “keep reading” and “a pleasure to work with” are not meaningful or particularly useful [7, 12]. In-the-moment assessments seem to allow for easier recall of specific trainee strengths and weaknesses, which can be used to provide recommendations and guidance. Furthermore, negative or constructive feedback carries less weight overall [8]. The increase in the amount of constructive feedback given to students in the DEC-based cohort was not associated with more rotation failures or borderline passes. When constructive feedback does not mean the difference between passing and failing a student (or is not interpreted as such by a preceptor), it can serve its intended purpose of pushing learners forward rather than punishing them. Specific feedback is also meaningful for curriculum directors: if preceptors can identify areas where students struggle, program directors and designers can respond with meaningful curriculum changes. Finally, medical education is moving towards a competency-based framework, in which assessment and promotion are linked to entrustment [13]. The first cohorts of Canadian residents training in a competency-based model are already established, as are their international counterparts. Competency-based medical education is also a natural fit for medical student education. As DECs are essentially another name for the work-based assessments used in competency-based medical education, our study lends support for this transition, at least in terms of the feedback being provided to medical students.
Our results are meaningful and applicable beyond ambulatory pediatrics. We have shown that the transition to a DEC-based system is associated with higher-quality student assessments, implying that the arguably increased time and effort associated with completing a brief assessment every day, rather than a single assessment at the end of a rotation, is worthwhile. However, despite the improvement in quality observed with the switch from an ITER-based system to a DEC-based system, there is still work to be done. The improvement in assessment quality among the DEC-based cohort still corresponded only to a mean total CCERR score in the “average” range described in the original CCERR study [10]. This implies that the tool itself is not enough; continued faculty development is required to keep improving the overall quality of student assessments. For example, faculty development that uses the individual CCERR items as an instructional framework may help ensure that trainees consistently receive high-quality feedback.
While part of the value of DECs is that they are a reliable means of student assessment in multiple-preceptor environments [14], we wondered whether there may be pedagogical implications, and therefore confounding factors, of using a mix of single- and multiple-preceptor assessments when our primary outcome was to compare two different methods of student assessment. In particular, we wondered about the necessary but potentially confounding role that academic handover could have on the ability of preceptors and students to create, implement, and observe action plans based on the feedback provided, as response to feedback is a component of the CCERR score. Because the vast majority of our assessments in ambulatory pediatrics are completed by single preceptors, we did not have the sample size to divide our DEC and ITER cohorts into single- and multiple-preceptor sub-cohorts to assess this effect. We did, however, have enough single-preceptor assessments to adequately power our primary analysis of the effect of DEC versus ITER assessment on the quality of narrative feedback. Our study had other important limitations. Our sub-analyses were under-powered, so it is possible that preceptor characteristics apart from CTU responsibilities may have had more of an impact on assessment quality than we were able to observe. However, we believe that the robust effect observed in our primary analysis (and documented in the accompanying tables), namely the change in CCERR score associated with a change in method of assessment, supports our conclusion that changing the method of student assessment from ITERs to DECs has played a significant role in improving the quality of student assessments in our setting (Table 3). Finally, as this study was conducted on student assessments completed in the first year of the DEC-based system, it is not yet clear whether the increase in feedback quality that we observed will persist over time. However, given that the effect we observed was independent of any targeted faculty development, and thus appears to be inherent to the tool used to provide feedback, we hypothesize that we will continue to see the benefits of our transition to a DEC-based system. In any event, we look forward to continuing to survey and analyze the quality of feedback provided to our medical clerks.
In summary, we have observed that a method of student assessment based on daily encounter cards is associated with an objective improvement in the quality of assessment provided. This has important implications for students, preceptors, and clerkship curriculum directors in terms of determining the most valuable means of student assessment for their clerkship rotations, and for directing future faculty development.
References
1. Hewson MG, Little ML. Giving feedback in medical education: verification of recommended techniques. J Gen Intern Med. 1998;13:111–116. doi: 10.1046/j.1525-1497.1998.00027.x.
2. Sender Liberman A, Liberman M, Steinert Y, et al. Surgery residents and attending surgeons have different perceptions of feedback. Med Teach. 2009;27:470–472. doi: 10.1080/0142590500129183.
3. Watling CJ, Kenyon CF, Zibrowski EM, Schulz V, Goldszmidt MA, Singh I, Maddocks HL, Lingard L. Rules of engagement: residents’ perceptions of the in-training evaluation process. Acad Med. 2008;83:S97–S100. doi: 10.1097/ACM.0b013e318183e78c.
4. Watling CJ, Kenyon CF, Schulz V, et al. An exploration of faculty perspectives on the in-training evaluation of residents. Acad Med. 2010;85:1157–1162. doi: 10.1097/ACM.0b013e3181e19722.
5. Dudek NL, Marks MB, Regehr G. Failure to fail: the perspectives of clinical supervisors. Acad Med. 2005;80:S84–S87. doi: 10.1097/00001888-200510001-00023.
6. Patel R, Drover A, Chafe R. Pediatric faculty and residents’ perspectives on in-training evaluation reports (ITERs). Can Med Educ J. 2015;6:e41–e53. doi: 10.36834/cmej.36668.
7. Gigante J, Dell M, Sharkey A. Getting beyond “good job”: how to give effective feedback. Pediatrics. 2011;127:205–207. doi: 10.1542/peds.2010-3351.
8. Hatala R, Norman GR. In-training evaluation during an internal medicine clerkship. Acad Med. 1999;74:S118–S120. doi: 10.1097/00001888-199910000-00059.
9. Kogan JR, Holmboe E. Realizing the promise and importance of performance-based assessment. Teach Learn Med. 2013;25:S68–S74. doi: 10.1080/10401334.2013.842912.
10. Dudek NL, Marks MB, Wood TJ, Lee AC. Assessing the quality of supervisors’ completed clinical evaluation reports. Med Educ. 2008;42:816–822. doi: 10.1111/j.1365-2923.2008.03105.x.
11. Cheung WJ, Dudek N, Wood TJ, Frank JR. Daily encounter cards: evaluating the quality of documented assessments. J Grad Med Educ. 2016;8:601–604. doi: 10.4300/JGME-D-15-00505.1.
12. Lye PS, Biernat KA, Bragg DS, et al. A pleasure to work with: an analysis of written comments on student evaluations. Ambul Pediatr. 2001;1:128–131. doi: 10.1367/1539-4409(2001)001<0128:APTWWA>2.0.CO;2.
13. Royal College of Physicians and Surgeons of Canada. Competence by Design: the rationale for change. n.d. http://www.royalcollege.ca/rcsite/cbd/rationale-why-cbd-e. Accessed 10 Feb 2019.
14. Cheung WJ, Dudek NL, Wood TJ, Frank JR. Supervisor-trainee continuity and the quality of work-based assessments. Med Educ. 2017;51:1260–1268. doi: 10.1111/medu.13415.

