2022 Mar 1;27(2):405–425. doi: 10.1007/s10459-022-10092-z

Table 3.

AIG validity assessment

| Inferences | (Gierl et al., 2012a, b) | (Gierl & Lai, 2013a) | (Gierl & Lai, 2013b) | (Gierl et al., 2016) | (Gierl & Lai, 2016) | (Gierl & Lai, 2018) | (Lai et al., 2016a, b) | (Pugh et al., 2016) | (Pugh et al., 2020) | (Shappell et al., 2020) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Proposed use of AIG | Generate MCQs for medical licensure testing. | Generate MCQs for medical licensure testing. | Generate MCQs for medical licensure testing. | Generate MCQs for medical licensure testing. | Generate MCQs for medical assessment. | Generate MCQs and rationales for medical formative testing. | Generate MCQs and distractors for medical licensure testing. | Generate MCQs for medical assessment. | Generate MCQs for medical assessment. | Generate MCQs for medical mastery learning assessment. |
| Scoring: existing evidence | Cognitive and item models were developed and reviewed by specialists. | Items were blindly evaluated for quality by a panel of experts. | Cognitive and item models were developed and reviewed by specialists. | Cognitive and item models were developed and reviewed by specialists. | Experts evaluated the content and the logic specified in the cognitive model and in the item model. | Experts blindly reviewed the rationales generated for formative testing. | Cognitive and item models were developed and reviewed by specialists. | Cognitive and item models were developed and reviewed by specialists. | Quality of items generated was evaluated by experts. | Item models were developed and reviewed by specialists. |
| Generalisation: existing evidence | UN | UN | UN | Item response theory was used, but not reported. CTT was used. Generated items measured a broad range of difficulty levels. | UN | UN | CTT was used. Generated items measured a broad range of difficulty levels. | UN | UN | No significant differences in item difficulty between tests were found. |
| Extrapolation: existing evidence | UN | UN | UN | Consistent levels of item discrimination. | UN | UN | Consistent levels of item discrimination. | UN | UN | No significant differences in mean item discrimination between tests were found. |
| Implications: existing evidence | UN | UN | UN | UN | UN | UN | UN | UN | UN | UN |

*UN - Unclear / Unreported