GMS Journal for Medical Education. 2016 Aug 15;33(4):Doc65. doi: 10.3205/zma001064

Assessment formats in dental medicine: An overview

Prüfungsformate für die Zahnmedizin: eine Übersicht

Susanne Gerhard-Szep 1,*, Arndt Güntsch 2, Peter Pospiech 3, Andreas Söhnel 4, Petra Scheutzel 5, Torsten Wassmann 6, Tugba Zahn 7
PMCID: PMC5003142  PMID: 27579365

Abstract

Aim: At the annual meeting of German dentists in Frankfurt am Main in 2013, the Working Group for the Advancement of Dental Education (AKWLZ) initiated an interdisciplinary working group to address assessments in dental education. This paper presents an overview of the current work being done by this working group, some of whose members are also actively involved in the German Association for Medical Education's (GMA) working group for dental education. The aim is to present a summary of the current state of research on this topic for all those who participate in the design, administration and evaluation of university-specific assessments in dentistry.

Method: Based on a systematic literature search, the testing scenarios listed in the National Competency-based Catalogue of Learning Objectives (NKLZ) have been compiled and presented in table form according to assessment value.

Results: The different assessment scenarios are described briefly in table form with respect to validity (V), reliability (R), acceptance (A), cost (C), feasibility (F), and educational impact (EI), i.e. the influence on teaching and learning, as presented in the current literature. Infoboxes were deliberately chosen to give readers quick access to the information and to facilitate comparisons between the various assessment formats. Each description is followed by a list summarizing the uses in dental and medical education.

Conclusion: This overview provides a summary of competency-based testing formats. It is meant to have a formative effect on dental and medical schools and provide support for developing workplace-based strategies in dental education for learning, teaching and testing in the future.

1. Starting point

A deliberate alignment of teaching and testing (constructive alignment) is crucial for imparting dental competencies during university study [1].

The definition of competency-based learning objectives also entails selecting appropriate testing formats that measure, for each particular task, the requisite combination of knowledge, practical skills and professional decision-making (see Figure 1).

Figure 1. Examples of assessment scenarios depending on competency level according to the requirements in Miller and the National Competency-based Catalogue of Learning Objectives for Undergraduate Dental Education (NKLZ).


2. Method

A survey of the literature was undertaken between January 17 and December 17, 2014, in the databases of the German National Library (DNB), MEDLINE using the PubMed interface, Excerpta Medica Database (EMBASE), Education Resources Information Center (ERIC), Cochrane Library, Science Citation Index, and Google Scholar. The search was conducted automatically and supplemented manually. In addition, available dissertations, open-access publications by German Medical Science (GMS) and BEME (Best Evidence Medical and Health Professional Education), as well as German-language conference proceedings such as those of the AKWLZ and GMA, were evaluated. The search terms included the German equivalents for “MCQ”; “MEQ”; “multiple choice”; “MC”; “multiple-choice questionnaire”; “SMP”; “structured oral examination”; “SOE”; “key feature”; “OSCE”; “OSPE”; “standardized patient”; “CEX”; “miniCEX”; “entrustable professional activities”; “DOPS”; “portfolio”; “multi-source feedback” in combination with “AND” and “dental”; “medicine”; “education”; “assessment.”

In an initial step, literature was selected based on title and abstract in accordance with pre-defined inclusion and exclusion criteria (inclusion criteria: published 1966-2013 in German or English with topical relevance; exclusion criteria: failure to meet inclusion criteria, full text not available in English or German, lack of topical relevance). The selected publications were then evaluated in terms of their relevance to the issue at hand and excluded, as required.

The articles were analyzed, and the results were described according to categories based on the value of assessments [2]. These categories cover the parameters of validity (V), reliability (R), acceptance (A), cost (C), feasibility (F) and educational impact (EI), i.e. the influence on teaching and learning. The results were then organized according to these evaluation parameters. The criteria were further developed in 2011 by the working group headed by Norcini, which added the parameters of “equivalence” (assessments conducted at different sites) and “catalytic effect” (consequences for the medical school) [3]. In this overview, both of these additional parameters are included and discussed in the text. The analysis itself focuses on the six criteria listed above (V, R, A, C, F, EI), presented in table form for clarity and to allow comparisons.

3. Results

In total, n=223 publications were identified using the search strategy outlined above and drawn upon as the basis for the following analyses.

4. Discussion

Structured oral examinations and the multiple-choice questionnaire (MCQ) are suited for testing theoretical knowledge, meaning descriptive knowledge at competency level 1 [4], [5], [6], [7]. The MCQ is a written assessment with several response options (closed questions), of which a single option or several options (multiple choice, multiple select) can be correct. A brief presentation of the content and the question is followed by the response options, which include the correct answer(s) and distracters (wrong answers). Multiple-choice exams can be paper-based, combined with computer-assisted grading, or administered entirely at computer workstations [4].

Use in medical and dental education

  • MCQs are presently used in both medical and dental study programs [6], [8].

  • The most important preliminary and final examinations include multiple-choice questions: the preliminary exam in natural science (NVP), the preliminary dental exam (ZVP), the dental exam (ZP), and the first and second state medical exams (ÄP). Moreover, MCQs are found in all pre-clinical and clinical subjects in both study programs; this type of question represents one of the most traditional and predominant assessment formats [4], [8].

To assess factual knowledge, the MCQ offers a cost-efficient testing format with high reliability and validity, provided the questions meet the quality criteria. With MCQs it is possible to objectively test a large amount of content in a short period of time. However, this type of assessment can lead to superficial learning of facts.

Multiple-choice Questionnaire

Validity

  • Classified as high [9]

  • Quality criteria for questions must be met to have sufficient validity [10].

  • A high construct validity can be achieved if questions are subjected to a review process (e.g. via Item Management System [IMS]) [11].

Reliability

  • Classified as high [9], [11]

  • A minimum of 40 high-quality questions is needed to yield a Cronbach’s α of 0.8 [6] (see the note below).
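
For orientation: Cronbach’s α, the internal-consistency coefficient cited throughout this overview, is conventionally calculated as α = [k/(k−1)] · (1 − Σ σ_i² / σ_X²), where k is the number of items, σ_i² the variance of item i and σ_X² the variance of the total test score. Because α rises with the number of sufficiently homogeneous items, a certain minimum number of questions is required to reach values around 0.8.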

Acceptance

  • Scoring is objective [4].

  • MCQs are considered fair if what has been taught corresponds with what is tested [12].

  • The possibility of passing by giving “strategic” responses, guessing, or picking up on cues is viewed critically by teachers [13], [14], [15], [16].

Cost

  • Given the number and frequency of tests, it is an efficient assessment format [9], [17].

  • A broad range of content can be assessed on one test [5].

  • Proportionally low costs [18]

  • Positive cost-benefit ratio

  • An existing question pool can be kept current at relatively little cost [19].

Feasibility

  • Effort is primarily involved in generating questions; administering and grading tests require much less time and resources.

  • Creating the question pool entails considerable cost [4].

  • Online assessment with digital scoring is possible [5].

  • A question pool shared by multiple universities increases efficiency via synergies (e.g. IMS) [20].

Influence on teaching and learning

  • Can lead to superficial learning [21]

  • Theoretical knowledge is weighted more heavily than practical skills [4].

  • Correct responses are already given, making passive recognition possible [14].

Structured oral examinations (SOE) are oral assessments which are conducted by an individual examiner or a panel of examiners.

Structured Oral Examination

Validity

  • Direct dependence on the degree of structuredness [22]

  • Validity increases with planning, design and conditions of testing [23], [24].

  • Validity is more dependent on the examiners than the method.

Reliability

  • Reliability increases with the number of questions and the length of the assessment, and decreases with strongly differentiated scoring [10]

  • Reliability and objectivity increase with several examiners [10], [17].

  • Absolute verification of reliability is practically impossible [10].

  • With a Cronbach’s α of 0.65-0.88 [25], [26], [27], SOEs come out ahead of conventional exams (Cronbach’s α of 0.24-0.50) [25], [28], [29].

Acceptance

  • Performance-inhibiting stress, anxiety and other disruptive factors play a larger role compared to MCQs [12].

  • Acceptance by teachers and students is reduced by:

  • Intensive supervision by examiner

  • Justification of scores

  • Limited information during limited time

  • Without a written test to refer to, students may raise questions or objections; discrepancies between the content asked and the form of the response required can give rise to this [12].

Cost

  • More cost-intensive than MCQ exams [10]

  • This is put into perspective in high-stakes exams, where the emphasis is on reliability and validity rather than on cost-effectiveness [30], [31]

Feasibility

  • More effort is required than for MCQs; a high financial burden results from the need for staff, rooms and logistics [10].

Influence on teaching and learning

  • Alongside facts, clinical reasoning, professional thinking, self-confidence and self-assurance can be assessed [12], [22].

  • Since students adapt their behavior to fit a test [4], [5], [18], extensive preparation can be assumed.

If an examination is taken before a panel, the examiners consult and agree on their evaluation of the examinee’s performance. Ideally, the final grades are assigned according to a blueprint governing exam content [7].

The SOE is a testing format that enables assessment of competency level 1 (NKLZ) and beyond within the scope of usual interactions in dental care. However, the higher expenses connected with the greater need for time and personnel should be noted, as well as the potential for performance-inhibiting stress in examinees.

Use in medical and dental education

  • Oral examinations with different degrees of structuredness are used in dental and medical study programs [8].

  • The most important preliminary and final assessments (high-stakes exams) in both study programs (NVP, ZVP, ZP, first and second ÄP) include SOEs in various settings. Furthermore, the SOE is represented in all pre-clinical and clinical subjects in both study programs; it is one of the traditional, predominant assessment formats [4], [32].

Assessments that do not just measure factual knowledge (= descriptive knowledge: knows) [33], [34], but also capture the ability to apply theoretical knowledge in a specific context to solve a problem or reach a clinical decision (= procedural knowledge: knows how), require a special testing format that is actually capable of representing this skill. It must be noted that the ability to solve problems or reason is highly context-specific and always depends on the particular context-related factual knowledge [2], [35]. In addition to the SOE, other assessment formats for evaluating procedural knowledge are the written modified essay question (MEQ) and the key features exam. These are case-based, written assessments that evaluate active knowledge recall, problem-solving and higher-order cognitive skills while simulating clinical situations in which decisions are made in the course of a physical examination, diagnosis and therapy. A patient’s history is presented in stages, each followed by several questions that are answered in writing or by selecting the best of several possible responses. Because the answers to previous questions are partly revealed in the following sections, examinees are not permitted to page back and forth.

Use in medical and dental education

  • Developed in Great Britain in the 1970s for the membership examination of the Royal College of General Practitioners [36], [37], [38], [39], [40], [41].

  • Used internationally in the field of medicine, from undergraduate education to post-graduate training [42], [43], [44], [45], [46], [47], [48], [49], [50].

  • Used in Germany as an undergraduate testing format and as a written exam that replaces the state examination [51], [52] in model study programs (Witten/Herdecke, Cologne, Bochum, etc.).

  • Hardly any examples of use in dental education; potential areas of application include assessing problem-solving skills within problem-oriented learning (POL) and independent learning using case-based, problem-based learning [53], as well as practical, case-based testing with virtual patient cases (e.g. in connection with procedures for handling acute toothache in endodontics) [54].

The MEQ is a reliable instrument for assessing context-specific, procedural knowledge in clinical situations if several basic rules are adhered to: 1. inclusion of the largest number of cases possible; 2. quality control of the pre-defined grading criteria for the write-in (WI) format by several evaluators; 3. a computer-based short-menu (SM) or long-menu (LM) response format. Through the simulation of clinical decision making with questions that build on each other, learning paired with feedback becomes part of the test experience. The MEQ format is a significant addition to the written tests commonly used at present in dental education, but it entails distinctly higher costs than an MCQ exam measuring purely factual knowledge.

Modified Essay Question

Validity

  • Higher validity than for the MCQ format through case-based, context-rich question format [48], [55], [56]

  • Contradictory results for the correlation (γ) between MEQ scores and both the results of the final exam (NBME) and post-graduate performance in the first year of professional medical practice: γ=0.3/0.3–0.26 [57], γ=0.51 [56].

Reliability

  • Reliability (Cronbach’s α)=0.57–0.91 [38] depends on multiple factors [38], [39], [40], [47], [48], [58], [59]:

  • Quality of the predetermined performance scale

  • Response format (open-ended responses poorer than selecting from a given list)

  • Number of cases and questions

  • Number of graders

  • → e.g. an increase of Cronbach’s α from 0.7 to 0.8 by increasing the number of questions from 7 to 12 or the number of graders from 1 to 4 [40] (see the note below).
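
For orientation: such gains are consistent with the classical Spearman-Brown prophecy formula, r_new = (k · r_old) / (1 + (k − 1) · r_old), where r_old is the original reliability and k the factor by which the number of questions is increased. With r_old = 0.7 and k = 12/7 (lengthening from 7 to 12 questions), the formula predicts r_new = 0.8, matching the increase cited above.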

Acceptance

  • Students generally rate the MEQ positively [41], [51] since the MEQ format reflects practice more closely than the MCQ [60].

  • Teachers/examiners: greater effort involved in creating tests, coordination challenges [51]

Cost

  • Drafting and grading an MEQ are very time consuming and require personnel [36], [41], [51].

  • Grading effort can be minimized by using a computer-based testing format [61].

Feasibility

  • Generating and grading MEQs is distinctly more involved than for MCQs; difficult to design questions that actually measure the ability to solve problems or make clinical decisions and do not simply test factual knowledge [37], [41], [42], [43], [44], [45], [46], [50], [52], [53], [54], [62].

Influence on teaching and learning

  • MEQs simulate clinical reasoning processes enabling feedback and learning during the test [39], [51], [60].

In the key features exam (KFE), a case unfolds in a specific clinical situation about which multiple questions are asked, focusing very closely on only those critical actions or decisions (key features) that are central to the key feature problem or that are often performed incorrectly [34], [63]. Key feature cases are developed in eight defined steps [34], [64], [65]:

  1. identification of the domain or context;

  2. selection of a clinical situation;

  3. identification of the critical elements of the situation (key features [KF] of the problem);

  4. selection and description of the clinical scenario (case vignette);

  5. drafting of the questions about the key features of the problem (1-3 questions per KF);

  6. determination of the response format (open-ended text = write-in, selection = short menu or long menu);

  7. generation of the evaluation scale; and

  8. content validation.

Use in medical and dental education

  • The KF assessment format proposed by Bordage and Page was developed to replace the commonly used written assessment of procedural knowledge using patient management problems (PMP) in medical specialty examinations [64], [65].

  • Transfer to undergraduate education by Hatala & Norman [66], used worldwide since in medical education as a written assessment format to evaluate context-specific procedural knowledge during the study phase and post-graduate education [67], [68].

  • Recognized testing format in the German-speaking countries in the field of medicine (see the detailed information on the design and implementation of assessments published by the medical schools at the Universities of Bern and Graz [34], [60], [69]).

  • Studies and reports on the use of the KFE as a written assessment at German medical schools, including internal medicine (Universities of Freiburg, Heidelberg, and Munich [70], Universities of Heidelberg, Tübingen [71]), hematology and oncology (University of Düsseldorf [72]), communication skills (University of Witten-Herdecke [73]).

  • Extensive pilot project in veterinary medicine at the school of veterinary medicine at the University of Hanover [74].

  • Only a few reports of KF problems used as a written assessment format in dental education [75], [76].

The key feature exam is a valid and reliable instrument for assessing context-specific, procedural knowledge in connection with solving a clinical problem and represents a meaningful addition to the written testing formats currently used in dental education. KFEs can also be used in independent learning with virtual patient cases. For practical reasons, the computer-based format with the long-menu response format is preferable to the paper-based version; it also makes it easier to prevent examinees from returning to previous pages or working through the pages out of order. To increase reliability, it is better to use many short KF cases (at least 15) with a maximum of three questions each than to use fewer, more in-depth cases with four or more questions.

Key Features Exam

Validity

  • High content validity (92-94%) when graded by teachers/examiners [63], [65], [67].

  • Piloting and regular review of the key features by students, teachers/examiners is a pre-requisite for high content validity [34], [63], [65].

  • When a LM format is intended, a WI format is recommended for the pilot to improve the quality of the LMs (supplement missing answers and distracters) [34].

  • Correlation between KFE scores and other assessment scores (e.g. MCQ) is only moderate (γ=0.35-0.54) [66], [70], which can be explained by the fact that different competency levels are addressed.

Reliability

  • Reliability of the KF format is higher than for the PMP format [65].

  • Due to greater case specificity [48], reliability is directly dependent on the number of KF problems (KFP=cases) → number of cases should be as high as possible; number of questions on each case should not exceed three items, since four or more reduces reliability [77].

  • The selected response format appears to influence reliability, when the same number of KF cases are used:

  • 15 KFPs with 1-4 questions, 2h length, WI format: Cronbach’s α=0.49 [66]

  • 15 KFPs with 3-5 questions, 1.5h length, computer-based LM format: Cronbach’s α=0.65 [70] → α=0.75 is possible with 25 KFs!

Acceptance

  • Students: relatively high acceptance [74], [78]: evaluated as realistic and supportive of practical learning.

Cost

  • Generation and validation of a KFE involves great amounts of time and staff [67].

Feasibility

  • Generating KFEs is more difficult and requires more time than an MCQ [60], [69].

  • Necessary testing time depends on the selected response format: LM>WI>SM>MC [79].

  • The advantages of LM response format (lower cueing effect than MCQ/SM, higher inter-rater reliability than WI) can be realized by using computer-based testing with a moderate testing time [70], [72], [79].

  • Testing time for 15 KFPs with 3-5 questions is 90 minutes for a computer-based exam [70] and 120 minutes for a paper-based test with a WI response format [66].

  • Practical examples exist [68], [71], [75], [76], [80].

Influence on teaching and learning

  • KFE format is closer to a real patient situation, promotes the learning of clinically relevant material and practical case-based learning [81].

While study programs in dental medicine do impart advanced theoretical knowledge, they also require students to develop manual skills. Consequently, suitable assessment formats are needed that not only measure factual and procedural knowledge but also give students an opportunity to demonstrate their practical abilities (shows how, [33]) and allow these to be evaluated objectively. The level of “knows how” is thus raised to “shows how”.

When creating such assessments, the learning objectives should be selected in advance and only those which represent a practical competency level should be employed. Standardization of test and examiner allows for an objective assessment of student performance. Suitable assessment formats for this are objective structured clinical examinations (OSCE), objective structured practical examinations (OSPE) and the use of simulated, or standardized, patients (SP).

An OSCE is appropriate for evaluating practical skills and the ability to communicate [14]. Students pass through different stations where particular practical skills are demonstrated (including partial treatments) or mock medical consultations are conducted. Evaluations are documented using a checklist created by a group of experts according to how the exam content is weighted. Test time per station is around five minutes; two minutes need to be planned for the examinee to change stations and for the examiner to make final notes or give feedback.

Use in medical and dental education

  • Widely used internationally in all clinical subjects since its introduction.

  • Can be used in undergraduate and post-graduate programs [82], [83], [84], [85].

  • There are many examples of use in dental disciplines: pre-clinical phase [86], [87], [88], orthodontics [89], [90], oral and maxillofacial surgery [91], [92], [93], restorative dentistry [87], [94], [95], [96], periodontology [97], clinical prosthetics [86], pediatric dentistry [98], radiology [99], microbiology [94], [97].

  • Interdisciplinary OSCE [87], [94], [100], [101].

  • Integration of an OSCE in the preliminary dental exam [102].

  • Also used in dental education to evaluate communication skills [103], [104], problem-solving skills, and critical thinking [105].

  • If possible, feedback should be included as part of the exam.

The OSCE is a reliable and valid testing format for assessing individual competencies; it enjoys a high level of acceptance by students and teachers.

Objective Structured Clinical Examination

Validity

  • Predictive validity

  • Significant correlation between OSCE scores and performance on practical tests and on preliminary practical medical exams (p<0.01) [87]

  • No correlation between OSCE and MCQ [105]

  • High content and construct validities [4], [106]

  • High face validity [107]

  • Acceptable predictive validity [108]

  • Caution required if students have a language problem or suffer from high levels of stress [109]

  • Attention must be paid to the blueprint [106], [110]

  • Determine content areas early [110]

  • Define questions within the content areas [110]

Reliability

  • Cronbach’s α between 0.11-0.97 [4]

  • Reliability depends on the number of stations: approximately 0.56 with fewer than n=10 stations and 0.74 with more than n=10 stations [111]

  • Varying recommendations on station number:

  • at least 19 [4]

  • 14-18 for 5-10 minutes each [106], [112]

  • Stations with an SP should be assessed for at least 15 minutes [110]

  • The more examiners, the higher the values.

  • The method of evaluation is critical [111], [113]: global assessments yield high values, combinations of global assessments and checklists are good, and checklists alone are least suitable.

  • Post-OSCE tests increase the reliability [110].

Acceptance

  • Students: high acceptance, appropriate testing format for functional skills [96]

  • Teachers/examiners: high acceptance [112], [114], [115], [116]

Cost

  • 181 Euros/examinee

  • 86-130 Euros/examinee [4], [117]

  • 2.5h/examinee [118]

  • $15-200/examinee [117], [119], [120]

  • Higher costs when including SPs [121]

  • $21-200/examinee [122]

Feasibility

  • Testing format demands great amounts of time and resources [106], [119]

  • Thorough preparation needed:

  • Establishing shared structures helps on interdisciplinary OSCEs [100].

  • Evaluation by external examiners is recommended.

  • Ensure the quality of SPs

  • Station content should be selected to match the OSCE scenario.

  • Peer reviews pre- and post-OSCE (psychometric analysis of difficulty, discrimination, etc.) are recommended.

  • Take the extent of the examiner’s experience, field of expertise, sex, and level of fatigue into consideration [3], [106], [112].

  • Practical examples exist [95], [102].

Influence on teaching and learning

  • Positive influence on learning [106], [108], [123]

  • Stimulates learning [112]

  • Learning at the stations has little to do with the reality of patients [112].

  • Allot time for feedback [110]

Due to the extensive preparation involved before and after its administration, interdisciplinary cooperation is recommended to minimize this disadvantage. OSCEs can be substituted for previously used assessment formats or supplement them in meaningful ways. A sufficient number of stations (n>10), a blueprint, peer review of station content and the scoring criteria, as well as a balance among the modes of evaluation (global, checklist, combination), training the examiners and, if needed, conducting a pilot OSCE should be taken into account when designing an OSCE. A special type of OSCE is embodied in the objective structured practical examination (OSPE) during which practical skills, knowledge and/or interpretation of data are demonstrated in a non-clinical situation [124]. These assessments can be conducted in labs or simulated stations in SimLab. In contrast to the OSCE, an entire process can be evaluated through to the end result (for instance, a dental filling).

It is possible to confidently assess practical skills and/or the interpretation of clinical data with the OSPE. This format is a reliable and valid assessment method for evaluating individual competencies; the OSPE enjoys a high level of acceptance by students and teachers.

Objective Structured Practical Examination

Validity

  • High validity, γ>0.7

  • High construct validity

Reliability

  • High reliability among the stations, Cronbach’s α=0.8 [125]

  • Inter-rater reliability ICC>0.7

  • High inter-rater reliability with equivalent levels of experience and knowledge among examiners, γ=0.79-0.93; p<0.001

Acceptance

  • Students: high acceptance [126], [127]

  • felt to be a “fair test” [128]

  • preferred over traditional exam formats [126]

  • Teachers: relevant, fair, objective and reliable testing format

Cost

  • No information available

Feasibility

  • Requires extensive planning and teamwork [128]

Influence on teaching and learning

  • Individual competencies can be assessed; the need to demonstrate factual and procedural knowledge influences learning behavior [128].

  • Makes strengths and weaknesses in practical skills discernible [129]

  • Stimulates learning [129]

  • Positive learning experience [130]

Defined grading criteria for each step within a process are necessary.

Use in medical and dental education

  • OSPEs are administered around the world in medicine, including pharmacology [128], physiology, forensic medicine [130], and dentistry [131], [132].

  • In Germany they are primarily used in the pre-clinical phase of dental education [133].

Simulated, or standardized, patients in dental education are specially trained (lay) actors who are capable of acting out common clinical pictures or typical occasions for dental consultations. They are used for both practicing and assessing doctor-patient consultations and examination techniques; the use of an SP also provides opportunities to learn how to conduct physical examinations and acquire better communication skills. It is also possible to incorporate SPs into assessments, most frequently in OSCE scenarios.

Standardized patients can be used to assess doctor-patient interactions and examination techniques. They are especially suited for evaluation of clinical competencies and communication skills within the scope of an OSCE. When implementing this, the complexity of the case should be tailored to match the testing scenario.

Standardized Patients

Validity

  • Assesses clinical competencies [134]

Reliability

  • Consistent examination

  • (No significant differences between exam cohorts and time points) [135]

Acceptance

  • Use of standardized patients (SP) within the scope of an OSCE station [136]

Cost

  • 10-18 Euro/examinee [136]

Feasibility

  • Case complexity can be controlled and adjusted to reflect educational level [137]

  • Faculty members can determine relevant learning objectives and coordinate role creation.

  • Greater need for time and staff to select and train SPs and to monitor for quality [137]

  • Checklists to record all SP observations of the doctor-patient consultation [138]

  • Practical examples exist [139].

Influence on teaching and learning

  • Improves students’ clinical skills [140]

Use in medical and dental education

  • This method has been used in clinical education since the 1960s [138].

  • Patient contact can be simulated under standardized conditions [139].

  • SPs can also provide feedback and critique the examinee’s abilities [139].

The term “workplace-based assessment” (WBA) encompasses a wide variety of testing scenarios meant to assess practical skills associated with treating patients in complex situations.

The clinical evaluation exercise (CEX) is a workplace-based assessment in the clinical setting that stretches over a longer period of time (several hours to days) and covers treatment processes during which an examinee conducts a consultation with a single patient, records a health history and carries out a physical examination. A maximum of two assessors should participate, but they generally do not have to be present the entire time. Often the data is collected from the patient without the assessor being present. This assessment format, also known as the tCEX (traditional CEX), represents a single event measure.

Use in medical and dental education

  • Originally developed in the 1960s as an assessment in internal medicine by the American Board of Internal Medicine (ABIM), it replaced the oral examination as the standard method in 1972 [141], [142].

  • Replaced by the mini-CEX around 1995 [143], [144].

  • No documented examples of use in dental education are found in the literature.

This assessment format is an instrument of low validity and poor reliability for testing practical skills in complex situations. It is possible to improve the assessment by using the greatest number of patients possible (cases), the greatest number of assessors possible, and the most structured evaluation instruments possible. In addition, providing feedback as part of this testing format should be mandatory. Overall, it can be asserted that in dental education the CEX is a reasonable assessment format for measuring practical competencies in complex situations only if the previously mentioned attempts at improvement have been made.

Clinical Evaluation Exercise

Validity

  • Insufficient content validity; does not completely cover curricular learning objectives [145]

  • Simulated situation, does not correspond with the reality of medical practice since it is too long and detailed [144]

Reliability

  • Questionable reliability since only few exercises can be done due to the great amount of time needed [146]

  • Low inter-rater reliability [147]

  • Cronbach’s α is 0.24 for one case and even for two cases only 0.39 [141].

Acceptance

  • Low level of acceptance since it is very dependent on the assessor [148]

Cost

  • Less costly than the OSCE because real patients are used who do not need to be trained [145]

Feasibility

  • Relatively simple since no special preparation is necessary [141]

  • Practical examples exist [142], [143].

Influence on teaching and learning

  • Patient-oriented, real-life situations [141]

The mini-clinical evaluation exercise (mCEX) is a patient-centered assessment format in the clinical setting that, in contrast to the CEX, requires a shorter amount of time and always includes feedback (approximately 15 minutes of assessment and 10 minutes of feedback). This testing format can be described as having three phases: observation, documentation and feedback. Over the course of the assessment, several assessors observe the examinee and evaluate what they see according to pre-defined criteria. Care is provided to more than one patient under routine conditions, with a focus on communication and clinical examination [144]. Evaluations are generally formulated according to defined criteria that apply to every examinee; these can consist of a rating scale and/or short written comments. A remaining difficulty is that the patients undergoing physical examination differ from case to case. Viewed in terms of Miller’s pyramid, a high level of practical skill is attained. Strictly speaking, it is a structured clinical observation.

Mini-Clinical Evaluation Exercise

Validity

  • Higher validity than CEX [149]

  • Acceptable validity and reliability have been demonstrated [146], [150].

  • Able to validly differentiate between competency levels (first year, second year, etc.) [151]

Reliability

  • Low inter-rater reliability [149]

  • A minimum of 10 evaluations are necessary to yield reliable results; a larger number is better [151]

  • At least 12-14 evaluations are recommended per year if there are different assessors to increase inter-rater reliability [152].

  • Reliability of G=0.4 for 10 evaluations; G=0.8 for 50 evaluations [151]

  • Dependent on number of assessors: if there is one examiner, a minimum of eight observations of different patients are necessary for a reliability of 0.8, in the case of two, four are necessary, and for three examiners, three observations [153].

  • Nine items are better than five to cover differences in competencies [154].

Acceptance

  • High level of satisfaction for students and teachers [151], [155], [156]

  • Implementation is at present slow, since it involves something new [156].

  • Partially problematic due to discrepancies between self-assessment and external assessment [157].

Cost

  • Substantial expense as a consequence of the amount of time needed [158], [159].

Feasibility

  • Observations of authentic doctor-patient interactions by different educators in different situations; feedback on different clinical pictures at different locations each with a different focus [155]

  • Thorough planning is necessary because giving feedback takes 8-17 minutes [155], [160].

  • Relatively simple to implement with enough flexibility in the dental setting [161]

  • Practical examples exist [162].

Influence on teaching and learning

  • Improvement in competency through regular feedback from experts [163]

  • Examiner/examinee receive feedback or a clear impression of clinical work making targeted mentoring possible [156].

  • Giving constructive feedback must be learned and practiced; teaching skills are needed [164].

  • No new discoveries or knowledge in comparison with traditional evaluation procedures [158]

  • No influence in comparison with control groups [153]

  • Learning objectives must reflect teaching content [165].

  • Predictive validity between OSCE and mCEX cannot be demonstrated [165].

This assessment format is frequently referred to as the mCEX (mini-CEX) and represents a single event measure.

Use in medical and dental education

  • Developed in 1995 by Norcini [144]; replaced the tCEX in the 1990s.

  • Reliability depends heavily on the number of assessors and cases [151], [153].

  • Several documented instances in the literature of use in dental medicine (Dental Foundation Training in Great Britain), however, often without any precise information on the evaluation instruments [161], [162].

The mCEX is a valid and reliable instrument to assess practical skills in complex situations. Options for improvement include 1. increasing the number of response items (nine are better than five) or increasing the number of observations (a minimum of 10 observations are needed) and 2. offering train-the-teacher programs (for instance in the form of video demonstrations and role playing). Longitudinal use is recommended with implementation conceivable in a wide variety of different settings (including high-stakes exams). The mCEX format is a good testing format for use in dental education to measure practical competencies in dental medicine.

Entrustable professional activities (EPAs) close the gap between the theory of competency-based education and patient-centered practice in a clinical context [166]. This method first became known for its use in post-graduate education; since 2013 it has also appeared in undergraduate medical education [167], [168]. The integration of theoretical and practical knowledge to solve complex problems is assessed (e.g. anamnesis and clinical examination of a patient in connection with different reasons for seeking medical advice) using existing competency-based roles, such as those defined by CanMEDS or ACGME. During the assessment it is determined whether the examinee is able to perform the activity while receiving directions, under supervision, with occasional assistance, or independently [169], [170]. As a result, different performance levels can be identified [171]. It is not individual learning objectives that are assessed, but rather an overall activity centering on a patient [172]. In order to differentiate EPAs from general learning objectives, it is recommended that the following sentence be completed: One day, the doctor/dentist will be expected to do (insert particular activity) without direct supervision [166]. By definition, an EPA should include activities that are important to daily practice, are very often subject to error when being performed, and integrate multiple competencies [172], [173]. Consequently, an EPA consists of diverse roles, each role, in turn, of multiple learning objectives, and each learning objective of different performance levels. The assessment can be a direct or indirect observation and include feedback. It is crucial that the observed performance of the examinee is combined with the performance evaluation over a defined period of time.

Entrustable Professional Activities

Validity

  • High face validity [174]

Reliability

  • Low inter-rater reliability [175]

Acceptance

  • Potential for wide acceptance [166]

  • Helps those learning to develop their own study schedule [176]

  • Helps the entire faculty to maintain transparency in education [176]

Costs

  • No information available

Feasibility

  • Initially requires intensive, well thought-out preparation while EPAs are being designed [177]

  • 20-30 EPAs are recommended for a degree program [177]

  • Practical examples exist [178], [179]

Influence on teaching and learning

  • EPAs require numerous competencies in an integrated, holistic manner [177].

  • Methods of evaluation that focus on the required degree of supervision [180]

  • Feedback is vital [174].

  • Support from the faculty is necessary [175].

  • Enables a broad (panoramic) view of the educational program [174].

A commonly reported combination is that of the mCEX with MSF (Multi-source feedback). Strictly speaking, this involves a multiple event measure.

Use in medical and dental education

  • Introduced in the Netherlands by ten Cate in 2005; since then it has been used in the fields of surgery, family medicine, internal medicine, neurology, emergency medicine, pediatrics, urology, and is used widely by the Royal Australian and New Zealand College of Psychiatrists [178], [179].

  • Initially in the pilot phase in German medical education [165].

  • No documented instances of use in dental medicine

EPAs are a relatively new, little researched instrument for assessing practical skills in complex situations. The implementation of EPAs requires extensive and well thought-out preparation when determining the focus. To the extent possible, a maximum of 30 interdisciplinary EPAs per curricular unit should be defined drawing upon input from university instructors and practicing physicians or dentists. EPAs create a realistic link between competency-based learning objectives and higher level activities. Train-the-teacher programs (with practice giving feedback) should improve implementation. Longitudinal use is recommended. Implementation is conceivable in a wide variety of settings, including high-stakes exams. The EPA format represents an innovative approach with great future potential in terms of assessing practical skills in complex situations in dental education.

Similar to the mCEX, Directly Observed Procedural Skills (DOPS) entail a short workplace-based assessment in a clinical setting that includes feedback (approximately 15 minutes of assessment and 10 minutes of feedback). This also involves a three-phase assessment in which observation, documentation and feedback occur. As with the mCEX, treatment is given to (multiple) patients under conditions typical of a medical practice, but with a focus on manual skills and interventions, which are observed by several assessors and evaluated according to defined criteria. This assessment format also represents a single event measure.

Use in medical and dental education

  • Originally introduced in the United Kingdom by the General Medical Council in 2002 [144].

  • Use reported in the fields of general medicine, surgery, and internal medicine [181].

  • International reports of use in dentistry in Iran (universities of Shiraz and Mashad) and at Kings College in London [182], [183].

DOPS is a valid and reliable instrument for evaluating practical skills in complex situations. It is possible to improve this format by having at least three assessors observe the examinee in at least two different case scenarios and by holding train-the-teacher sessions. Overall, longitudinal use is recommended. Implementation is conceivable in diverse settings, including high-stakes exams. The DOPS format is a very reasonable testing format for capturing practical skills in complex situations during dental education.

Directly Observed Procedural Skills

Validity

  • High face validity [181]

  • Formative assessment tool [182]

  • Significantly different from MCQ; provides different assessments of student performance [182]

  • Separate assessment tool that does not enable an overall evaluation; a system with different possibilities is needed [184].

  • DOPS efficiently evaluates practical skills [182].

Reliability

  • To achieve a high reliability, at least three assessors should observe a student during two different case scenarios [181].

  • G=0.81 [185]

  • Internal consistency is 0.94 and inter-rater reliability is 0.81

  • Students do not view it as suitable for improving inter-rater reliability [186].

  • Substantial differences between the assessors can influence the validity of the results if there has not been strict standardization [187].

  • Good reliability and consensus among assessors is possible [188].

  • Fewer assessors are needed in comparison with the mCEX [160].

  • Fewer assessors and cases are needed in comparison with the mCEX [181].

  • Higher item correlation values than for the mCEX: 0.7-0.8 versus 0.5-0.8 [150], [189]

  • Reliability depends on the case [181].

  • Reliability independent of process [160]

Acceptance

  • High acceptance by students [186]

  • Examinees find the scenarios to be stressful, but appreciate the feedback [190].

Cost

  • Substantial expense is to be expected [159], [191].

Feasibility

  • Great amount of time needed [163], [182]

  • Great amount of time needed for preparing DOPS, including giving feedback [160]

  • To increase the learning effect, it is necessary to give feedback directly after the assessment and to address strengths and weaknesses [192].

  • Assessors must be trained in advance [12].

  • It is feasible to use only one assessor [193].

Influence on teaching and learning

  • Examinees perceive a positive influence on independence and the learning process [186].

  • DOPS assessment improves practical clinical skills [192].

  • Positive effect through directly observing the learner [192]

  • Promotes an in-depth approach to learning in the clinical context [21]

  • Positive influence on student reflections [181]

  • Seventy percent of those observed believe that DOPS is helpful for improving practical skills [194].

  • Compared to control groups there are significantly better results for DOPS regarding practical skills [195].

  • Can also be used in peer arrangements in the pre-clinical and clinical context [183]

The Portfolio as an assessment tool is a pre-defined, objectives-centered collection of student learning activities with assigned self-reflection exercises, as well as feedback [20]. Portfolio contents are developed in alignment with the learning process; the following aspects can be taken into consideration: personal experiences (what was done, seen, written, created?), learning process (awareness that what has been experienced is relevant to future medical or dental practice), documentation (certificates, etc.), future goals regarding learning (looking ahead), and learning environments [196]. Portfolios are a multiple event measure.

Use in medical and dental education

  • Portfolio-based learning was introduced in 1993 by the Royal College of General Practitioners; portfolio assessment was described by Shulman in 1998 [197], [198].

  • Publications in the fields of general medicine, otorhinolaryngology, internal medicine, pediatrics, public health at universities in Maastricht (NL), Nottingham (GB), and Arkansas (USA) [196].

  • Found in German medical education in Cologne [196].

  • International reports of use in dentistry [199], [200], [201].

The portfolio is a highly valid and reliable instrument for evaluating practical skills in complex situations, one that assesses collected, cumulative information about performance and development. Possibilities for optimization exist when more than one neutral grader is used, the student’s mentor is not one of these graders, and train-the-teacher sessions on giving feedback are held. Longitudinal use is recommended. Implementation is conceivable in diverse settings, including high-stakes exams. The portfolio represents a valuable assessment format for evaluating practical skills in complex situations in dental education.

Portfolio

Validity

  • Good validity if there is an appropriate selection of all required competency areas [202], [203].

Reliability

  • Cronbach’s α is 0.8 with four graders [204]

  • Cronbach’s α is 0.8 with 15 portfolio entries and two graders [202].

  • Use of a clear, competency-based master plan, clear grading criteria, inclusion of guidelines and experienced graders for development and evaluation [202], [203]

  • Uniform and consistent grading is difficult [200].

Acceptance

  • Portfolios are viewed as time consuming, a source of anxiety and not very effective [205].

  • The acceptance of portfolios decreases the longer students spend time on them [205].

Cost

  • No information available

Feasibility

  • A portfolio typically includes seven case reports, two presentations, three self-reflections [202].

  • Typical content includes diagnoses and treatment plans [202].

  • Problematic since there is a conflict when portfolios are used for both assessment and learning [205].

  • Difficulties being self-critical and honest [205]

  • Conducting interviews with students about portfolio content improved feasibility [206]

  • Practical examples exist [199], [201].

Influence on teaching and learning

  • Allows the assessment of competencies that could not otherwise be measured [200]

  • Portfolio content must be aligned with the learning objectives [202].

  • Increases self-knowledge and encourages critical thinking [205]

  • Improves the ability to learn independently and connects theory with practice [205]

  • Time consuming for grader and student [200], [207]

  • Students receive constructive feedback [207].

  • Calibration and validation are critically important [200].

  • Provides cumulative information on performance and progress [205]

  • When it is known that the portfolio will be graded, students attempt to fulfill expectations which, in turn, affects the portfolio’s content and educational value [205].

  • Positive effects are heavily dependent on the support, direction, time commitment and feedback given by the teacher [205].

Multi-source feedback, also known as 360-degree feedback (MSF, multi-rater feedback), involves a workplace-based assessment in a clinical setting incorporating different groups of people associated with that particular work setting and the examinee (peers, dentists, nursing staff, patients, administrators, etc.). The focus of the observations is on professional conduct and teamwork, as well as the examinee taking responsibility as the person in charge [208], [209]. These aspects are observed by several assessors and evaluated according to defined criteria. The “supervisor” is given a special role in this testing scenario: this person collects all the results and gives them to the examinee. As a result, the individuals who have given feedback remain anonymous. The student receives a comprehensive picture based on all the input from different sources. High acceptance is achieved through selection of the assessors. Narrative comments and metric rating scales can be combined. This format entails a multiple event measure.

Use in medical and dental education

  • Used in medicine since 1970, widespread in North America (Canada and USA), Europe (England, Holland), and Asia [210], [211].

  • Reports of use in the fields of general medicine, internal medicine, surgery, gynecology, psychiatry, pathology, and radiology, etc. [210].

  • Used in dental medicine by the Royal College of Surgeons of England, University of Bristol, UK Committee of Postgraduate Dental Deans.

  • Validated instruments exist for evaluation (PAR: Physicians Achievement Review, SPRAT: Sheffield Peer Assessment Tool).

This method is a highly valid and reliable instrument for evaluating practical skills in complex situations.

Multisource Evaluations

Validity

  • Can make it easier to evaluate inter-personal and communicative skills in particular [212]

  • Good validity [213]

Reliability

  • Review: to reach a value of 0.9 minimum for Cronbach’s α, eight medical assessors, eight non-medical assessors and 25 patients must participate [210]

  • High internal consistency (0.8) with five assessors on two observed occasions [214]

  • To reach a value of 0.8 for Cronbach’s α, a minimum of 11 assessors must participate [215].

  • Value for Cronbach’s α is 0.98 [216].

  • Problematic due to the number of assessors required [217]

Acceptance

  • Rated 4.5 by examinees on a scale of 1-7 [214]

  • Rated 5.3 by assessors on a scale of 1-7 [214]

  • Evaluations are possibly too positive since anonymization is not fully trusted [217]

Cost

  • Expense needs to be taken into account before implementation [159].

Feasibility

  • Rated 4.4 by examinees on a scale of 1-7 [214]

  • Rated 5.1 by assessors on a scale of 1-7 [214]

  • Evaluations are generally verified via questionnaires making the process simple [159].

  • To achieve a valid assessment, a certain number of evaluations is necessary; however, this number cannot always be obtained [217].

  • Ideally, feedback is gathered over a longer period of time [217].

  • Can be easily implemented, even in a busy hospital [211], [218]

Influence on teaching and learning

  • General improvement in clinical work, communication with co-workers and patients [219]

  • Rated 4.2 by examinees on a scale of 1-7 [214]

  • Rated 4.4 by assessors on a scale of 1-7 [214]

  • Improvement of the evaluation process, advantage of receiving more detailed information and being exposed to different perspectives [217]

  • Varying results: improvement in communication and conduct after receiving 360° feedback [220].

  • Immensely time consuming and no improvement in assessment as a consequence of the feedback [221]

  • It is possible to identify weak performers at an early stage [218].

  • Feedback from SPs for students also possible [222].

Success factors include a clear definition of the objectives and of the sources of feedback. An important role is played by the selection of the assessors, their credibility and their familiarity with the situation under evaluation, along with the anonymity of the individuals supplying the feedback. This format can be optimized by using approximately five assessors for two observed situations and by holding train-the-teacher sessions on constructive feedback. The combination of external feedback with self-evaluation by the examinee can be helpful, as can jointly determining specific learning objectives for the future, including the discussion and documentation of concrete learning opportunities and supports. Longitudinal use is recommended. Implementation is also conceivable in diverse settings, including high-stakes exams. The MSF format represents a valuable assessment format for evaluating practical skills in complex situations in dental education.

5. Conclusion

The range of assessment methods presented in this overview significantly broadens the spectrum of already established university-specific exams—mostly MCQs and (structured) oral exams. Each of the methods outlined here meets different requirements and thus covers different competency levels. This must be taken into particular consideration by those who are involved in designing, administering and evaluating assessments in dental medicine.

When developing and implementing a curriculum, not only the choice of assessment format is critical but also attention to the general functions of an exam, which in turn affect the curriculum [223]: assessments can be summative or formative. Summative assessments usually come at the end of a semester or after a skill has been taught in order to evaluate learning outcomes. Formative assessments reflect the learning process itself and do not determine whether a student passes or fails a course or is ultimately successful in demonstrating mastery of a particular competency. Such an assessment shows students their current level of proficiency and is meant to support the learning process by prompting students to reflect on their weaknesses. Purely formative assessments are rare given limited staffing resources and time constraints, but they are an ideal tool for fostering the learning process.

Within the scope of drafting the NKLZ it became clear that in the future other assessment formats will be needed in addition to the established methods such as oral examinations and MC exams; these new formats will need to measure required practical skills in dental medicine, not just in the Skills Lab, but also in patient treatment. Each assessment format should correspond with the targeted competency levels.

The presentation of the assessment formats in this overview enables quick orientation within each method and makes reference to relevant literature for those who wish to know more. Including even more detailed information on each of the assessment formats would have compromised the intended character of this article as an overview. Along with theoretical knowledge of an assessment format, it is important to engage in direct exchange with colleagues in higher education who are already following a particular method. For this reason, it is desirable, and perhaps the task of the relevant working groups, to establish a network of professionals who have already gathered experience with special assessment formats and who are willing to make themselves available to those with questions. Depending upon demand, continuing education programs could emerge from such a network providing substantial assistance in implementing new assessment formats.

6. Outlook

With the new licensing regulations for dentists (Approbationsordnung), German dental education will be brought up to date and more closely linked to medical education. The assessment methods mentioned as examples in the NKLZ and outlined in this paper demonstrate the various options for assessing at the competency level. After experience has been gathered with university examinations in dental education and following scientific analysis of these testing methods, additional appropriate assessment methods should be included in the licensing requirements for dentistry. These should also be used to improve the quality of the state examinations.

Together with the introduction of the NKLZ, compiling experience in organizing, preparing, administering, conducting and evaluating the assessment formats profiled here will be an important task in the coming years, whereby dental medicine can make good use of the competencies under development for medical students since 2002. Dental medicine can also bring to bear its own experience and expertise in the assessment of practical skills. Our shared goal should be to continue developing assessment formats for the different competency levels in dental and medical education in cooperation with the German medical schools.

Acknowledgements

The authors wish to extend their gratitude to all those who have helped to write, edit and finalize this article. Special thanks go to the executive board of the AKWLZ, especially Prof. P. Hahn, MME (University of Freiburg) and Prof. H.-J. Wenz, MME (University of Kiel), for their detailed feedback and suggestions for improvement.

Competing interests

The authors declare that they have no competing interests.

Authors

Authors are listed in alphabetical order.

References

