| Strengths | |
| --- | --- |
| Project group | Consisted of professionals with profound content knowledge, a medical educationalist, and a statistician with experience in test development. |
| Test content | Based on nationally defined learning objectives, which generated relevant and coverable test content. |
| Test blueprint | Predefined and based on nationally developed learning objectives. |
| Test format | MCQs, which can test more than simple facts, are suitable for large groups and are time- and cost-effective. They assess competences at the two lower levels of Miller's triangle: "knows" and "knows how". |
| Language | Predefined spelling and abbreviations ensured consistency in wording and terms. |
| Proofreading | Several proofreaders reviewed content, language, and structure/format. |
| Pilot test participants | A large sample that partly represented the intended test-takers. |
| Pilot testing | Written and verbal feedback gave insight into the pilot participants' thought processes during testing. |
| Standard setting | An acknowledged method was used. The passing score was adjusted to minimize false positives and was validated on initial test responses. |
| Psychometric properties | Evaluated on both the pilot test responses and the responses from the real test-takers (see the illustrative sketch after this table). |
| Test-takers | A high number of participants enabled the use of advanced statistical analyses such as Rasch analysis. |
| No. of options per item | Three or four options were chosen depending on the number of plausible distractors. |
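The table does not specify how the psychometric properties were computed. As a hedged illustration only, and not the study's actual analysis, the sketch below runs a classical item analysis (item difficulty as proportion correct, corrected point-biserial discrimination, and Cronbach's alpha) on a simulated dichotomous response matrix drawn from a simple Rasch-type model. The function name `item_analysis`, the sample size, the number of items, and the simulated data are assumptions made for the example.

```python
import numpy as np


def item_analysis(responses: np.ndarray):
    """Classical item analysis for a dichotomously scored (0/1) test.

    responses: (n_examinees, n_items) array of 0/1 item scores.
    Returns per-item difficulty (proportion correct), corrected
    point-biserial discrimination, and Cronbach's alpha.
    """
    n_items = responses.shape[1]
    total = responses.sum(axis=1)

    # Item difficulty: proportion of examinees answering each item correctly.
    difficulty = responses.mean(axis=0)

    # Discrimination: correlation of each item with the rest-score
    # (total score excluding that item).
    discrimination = np.empty(n_items)
    for j in range(n_items):
        rest = total - responses[:, j]
        discrimination[j] = np.corrcoef(responses[:, j], rest)[0, 1]

    # Cronbach's alpha: internal-consistency reliability estimate.
    item_var = responses.var(axis=0, ddof=1)
    total_var = total.var(ddof=1)
    alpha = (n_items / (n_items - 1)) * (1.0 - item_var.sum() / total_var)

    return difficulty, discrimination, alpha


# Illustrative run on simulated data (NOT the study's responses):
# 200 examinees and 30 items generated from a Rasch-type model.
rng = np.random.default_rng(0)
theta = rng.normal(size=(200, 1))   # person ability
b = rng.normal(size=(1, 30))        # item difficulty parameters
p_correct = 1.0 / (1.0 + np.exp(-(theta - b)))
responses = (rng.random((200, 30)) < p_correct).astype(int)

diff, disc, alpha = item_analysis(responses)
print(f"mean difficulty {diff.mean():.2f}, "
      f"mean discrimination {disc.mean():.2f}, alpha {alpha:.2f}")
```

In practice, a Rasch analysis as mentioned in the table would additionally fit a latent-trait model to the response matrix; the classical statistics above are shown only because they are compact and self-contained.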
| Challenges | |
| --- | --- |
| Test format | A written assessment cannot assess competences at the two higher levels of Miller's triangle, "shows how" and "does" (i.e. clinical performance). |
| Number of items | More items would likely have increased reliability (see the worked formula after this table) and would have allowed for the development of an item bank. |
| Item difficulty | More difficult items would likely have increased reliability and made the test more challenging. |
| Pilot test participants | Medical and midwifery students did not represent the intended test-takers, which lowered the percentage of correct answers. |
| Relations to other variables | No existing test was available for comparison. |
| Context | The contexts of pilot testing and real testing differed; pilot participants did not attend a one-day teaching course prior to testing, so the test was more challenging for them than in the real setting. |
| Time devoted to assessment | More items, and items of higher difficulty, require more time devoted to assessment in an education program. |
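The expectation that more items would increase reliability can be made concrete with the Spearman–Brown prophecy formula, under the assumption (not stated in the table) that the added items are of comparable quality. If a test with observed reliability $\rho_1$ is lengthened by a factor $k$, the predicted reliability is

$$
\rho_k = \frac{k\,\rho_1}{1 + (k - 1)\,\rho_1}.
$$

For an illustrative reliability of $\rho_1 = 0.60$ (not a value from the study), doubling the number of comparable items ($k = 2$) would be predicted to yield $\rho_2 = \frac{2 \times 0.60}{1 + 0.60} = 0.75$.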