2021 Jun 22;36(9):2745–2754. doi: 10.1007/s11606-021-06916-0

Table 3.

Results of the 17 Included Empirical Studies

No.	Source	Teaching/learning method	Assessment/measurement tools	Improvement in CR?	KPM level
1.	Mamede et al., 2012	- Reflective reasoning	- Participants solved 12 clinical cases using either: (1) Non-analytical reasoning (2) Reflective reasoning (3) Deliberation without attention - Their results for non-analytical reasoning and reflective reasoning were evaluated by 5 board-certified internists	No (students did not demonstrate an increased number of correct diagnoses)	2
2.	Mamede et al., 2012	- Structured reflection	- Medical students diagnosed six clinical cases using either: (1) Structured reflection (2) Immediate diagnosis (3) Differential diagnosis generation - Students’ diagnostic performance during their learning phase, immediate post-test phase, and delayed (1 week) post-test phase were compared	Yes (improvement in performances for delayed test phase was observed)	2
3.	Mamede et al., 2014	- Structured reflection	- Medical students diagnosed four clinical cases of two criterion diseases using either: (1) Structured reflection (2) Single diagnosis (3) Differential diagnosis generation - One week later, students were requested to diagnose: (a) Two novel exemplars of each criterion disease (b) Four cases of new diseases that were plausible alternative diagnoses to the criterion diseases in the learning phase	Yes (students obtained a higher mean diagnostic accuracy score when diagnosing new exemplars of criterion diseases)	2
4.	Mamede et al., 2019	- Structured reflection	- Medical students diagnosed nine clinical cases using either: (1) Free reflection (2) Cued reflection (3) Modeled reflection - Two weeks later, students were requested to diagnose: (a) Four novel exemplars of diseases studied in first phase (b) Four cases of adjacent disease (c) Two fillers	Yes (students performed better in terms of diagnostic accuracy measured by the number of correctly diagnosed cases)	2
5.	Myung et al., 2013	- Structured reflection	- Mean diagnostic scores from four objective structured clinical examinations (OSCE) were compared between the control and intervention groups	Yes (students had a significantly higher mean diagnostic accuracy score)	2
6.	Chamberland et al., 2015	SE - Residents’ SEs with prompts - Residents’ SEs without prompts	- Students’ diagnostic performances were compared pre- and post-intervention, both immediately and after 1 week	Yes (students obtained a higher diagnostic accuracy and performance score)	2
7.	Chamberland et al., 2015	SE - Provided by peers - Provided by experts	- Students’ diagnostic performances were compared pre- and post-intervention, based on 4 clinical cases, both immediately and after 1 week	No (diagnostic performance did not show any difference on transfer cases)	2
8.	Chamberland et al., 2011	- SE	- Students’ diagnostic performances were compared pre- and post-intervention, both immediately and after 1 week, using a scoring grid developed to mark the questions	Yes (intervention group demonstrated better diagnostic performances for less familiar cases)	2
9.	Peixoto et al., 2017	SE - on pathophysiological mechanisms of disease	- Medical students solved 6 criterion cases with or without SE - One week later, the students solved 8 new cases of the same syndrome - Students’ performances were compared between both groups for: (1) Accuracy of initial diagnosis for the cases in the training phase (2) Accuracy of final diagnosis after SE had taken place (3) Accuracy of the initial diagnosis provided for the cases in the initial assessment	No (no improvement in diagnostic performances for all diseases)	2
10.	Chai et al., 2017	Generating DDx using: - Surgical sieve - Compass medicine (handheld tool)	- 30-min written test to generate possible DDx - The numbers of DDx generated before and after the teaching were compared	Yes (a significantly greater number of differential diagnoses was generated)	2
11.	Chew et al., 2017	Generating DDx using: - TWED checklist	- Script concordance tests	No (intervention group did not score significantly higher)	2
12.	Lambe et al., 2018	Generating DDx using: - Short guided reflection process - Long guided reflection process	- A series of clinical cases were diagnosed by using first impressions, or by using a short or long guided reflection process - Participants were asked to rate their confidence at intervals	No (did not elicit more accurate final diagnoses than diagnoses based on first impressions)	2
13.	Shimizu et al., 2013	Generating DDx using: (1) DDXC Structured framework using: (1) GDBC	- Medical students were tasked to complete 5 diagnostic cases (2 difficult, 3 easy) using either DDXC or GDBC - The diagnostic mean score was compared between both groups	Yes (increase in the proportion of correct diagnoses using DDXC)	2
14.	Sawanyawisuth et al., 2015	SNAPPS - In ambulatory clinic	- 12 outcome measures were used to assess an audio-recorded case presentation of medical students - Relevant measures were: (1) Number of basic clinical attributes of the chief complaint and history of present illness (2) Number of differential diagnoses considered (3) Number of justified diagnoses in the differential diagnosis (4) Number of basic attributes used to support the diagnosis (5) Number of presentations containing both history of present illness and physical exam findings	Yes (students had a greater number of differential diagnoses and more features to support the differential diagnosis)	2
15.	Wolpaw et al., 2009	- SNAPPS	- The content of each audio recording was coded for 10 presentation elements - Relevant measures were: (1) Basic attributes of chief complaint and history of present illness (2) Inclusion of both history and examination findings (3) Formulation of a differential diagnosis (4) Justification of the hypothesis in the differential (5) Comparing and contrasting hypotheses	Yes (students had a greater number of differential diagnoses and more features to support the differential diagnosis)	2
16.	Lee et al., 2010	3-h workshop on CR that used illness scripts, conducted with small-group teaching	- Using DTI to assess students’ CR style and attitude pre- and post-intervention - Students’ diagnostic performances in solving CR problems were compared pre- and post-intervention	Yes (improvement in diagnostic performances for CRP score)	2
17.	Blissett et al., 2012	- Use of schemas	- Written and practical tests to assess diagnostic accuracy measured by percentage of questions answered correctly	Yes (students performed better on structured knowledge questions and had higher diagnostic success)	2

SE self-explanation, DDx differential diagnoses, DDXC differential diagnoses checklist, GDBC general debiasing checklist, DTI diagnostic thinking inventory