| Strengths | |
| --- | --- |
| Project group | Consisted of professionals with profound content knowledge, a medical educationalist, and a statistician with experience in test development. |
| Test content | Based on nationally defined learning objectives, which generated relevant and coverable test content. |
| Test blueprint | Predefined and based on nationally developed learning objectives. |
| Test format | MCQs, which can test more than simple facts, are suitable for large groups and are time- and cost-effective. They assess competences at the two lower levels of Miller's triangle: "knows" and "knows how". |
| Language | Predefined spelling and abbreviations ensured consistency in wording and terms. |
| Proofreading | Several proofreaders reviewed content, language, and structure/format. |
| Pilot test participants | A large sample that partly represented the intended test-takers. |
| Pilot testing | Written and verbal feedback gave insight into the pilot participants' thought processes during testing. |
| Standard setting | An acknowledged method was used. The passing score was adjusted to minimize false positives and was validated on initial test responses. |
| Psychometric properties | Evaluated on both the pilot test responses and the responses from the real test-takers (see the illustrative sketch after this table). |
| Test-takers | A high number of participants enabled the use of advanced statistical analyses such as Rasch analysis. |
| No. of options per item | Three or four options were chosen depending on the number of plausible distractors. |
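The table does not specify how the psychometric properties were computed. As a hedged illustration only, and not the study's actual analysis, the sketch below runs a classical item analysis (item difficulty as proportion correct, corrected point-biserial discrimination, and Cronbach's alpha) on a simulated dichotomous response matrix drawn from a simple Rasch-type model. The function name `item_analysis`, the sample size, the number of items, and the simulated data are assumptions made for the example.

```python
import numpy as np


def item_analysis(responses: np.ndarray):
    """Classical item analysis for a dichotomously scored (0/1) test.

    responses: (n_examinees, n_items) array of 0/1 item scores.
    Returns per-item difficulty (proportion correct), corrected
    point-biserial discrimination, and Cronbach's alpha.
    """
    n_items = responses.shape[1]
    total = responses.sum(axis=1)

    # Item difficulty: proportion of examinees answering each item correctly.
    difficulty = responses.mean(axis=0)

    # Discrimination: correlation of each item with the rest-score
    # (total score excluding that item).
    discrimination = np.empty(n_items)
    for j in range(n_items):
        rest = total - responses[:, j]
        discrimination[j] = np.corrcoef(responses[:, j], rest)[0, 1]

    # Cronbach's alpha: internal-consistency reliability estimate.
    item_var = responses.var(axis=0, ddof=1)
    total_var = total.var(ddof=1)
    alpha = (n_items / (n_items - 1)) * (1.0 - item_var.sum() / total_var)

    return difficulty, discrimination, alpha


# Illustrative run on simulated data (NOT the study's responses):
# 200 examinees and 30 items generated from a Rasch-type model.
rng = np.random.default_rng(0)
theta = rng.normal(size=(200, 1))   # person ability
b = rng.normal(size=(1, 30))        # item difficulty parameters
p_correct = 1.0 / (1.0 + np.exp(-(theta - b)))
responses = (rng.random((200, 30)) < p_correct).astype(int)

diff, disc, alpha = item_analysis(responses)
print(f"mean difficulty {diff.mean():.2f}, "
      f"mean discrimination {disc.mean():.2f}, alpha {alpha:.2f}")
```

In practice, a Rasch analysis as mentioned in the table would additionally fit a latent-trait model to the response matrix; the classical statistics above are shown only because they are compact and self-contained.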
| Challenges | |
| --- | --- |
| Test format | A written assessment cannot assess competences at the two higher levels of Miller's triangle, "shows how" and "does" (i.e. clinical performance). |
| Number of items | More items would likely have increased reliability (see the worked formula after this table) and would have allowed for the development of an item bank. |
| Item difficulty | More difficult items would likely have increased reliability and made the test more challenging. |
| Pilot test participants | Medical and midwifery students did not represent the intended test-takers, which lowered the percentage of correct answers. |
| Relations to other variables | No existing test was available for comparison. |
| Context | The contexts of pilot testing and real testing differed; pilot participants did not attend a one-day teaching course prior to testing, so the test was more challenging for them than in the real setting. |
| Time devoted to assessment | More items, and items of higher difficulty, require more time devoted to assessment in an education program. |
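The expectation that more items would increase reliability can be made concrete with the Spearman–Brown prophecy formula, under the assumption (not stated in the table) that the added items are of comparable quality. If a test with observed reliability $\rho_1$ is lengthened by a factor $k$, the predicted reliability is

$$
\rho_k = \frac{k\,\rho_1}{1 + (k - 1)\,\rho_1}.
$$

For an illustrative reliability of $\rho_1 = 0.60$ (not a value from the study), doubling the number of comparable items ($k = 2$) would be predicted to yield $\rho_2 = \frac{2 \times 0.60}{1 + 0.60} = 0.75$.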