2024 Oct 30;26:e53636. doi: 10.2196/53636

Table 3.

Comparison of different electronic health record (EHR) question answering (QA) datasets on unstructured data.

| Dataset | Mode of dataset generation | Total questions, n | Unanswered questions, n | Average question length (tokens), n | Total articles, n | Average article length (tokens), n |
|---|---|---|---|---|---|---|
| emrQA [26] | Semiautomatically generated | 1,295,814 | 0 | 8.6 | 2425 | 3825 |
| RxWhyQA [27] | Automatically derived from the n2c2^b 2018 ADE^c NLP^d challenge | 96,939 | 46,278 | ^a | 505 | |
| Raghavan et al [34] | Human-generated (medical students) | 1747 | 0 | | 71 | |
| Fan [35] | Human-generated (author) | 245 | 0 | | 138 | |
| RadQA^e [37] | Human-generated (physicians) | 6148 | 1754 | 8.56 | 1009 | 274.49 |
| Oliveira et al [38] | Human-generated (author) | 18 | 0 | | 9 | |
| Yue et al [42,74] | Trained question generation model paired with a human in the loop | 1287 | 0 | 8.7 | 36 | 2644 |
| DiSCQ^f [43] | Human-generated (medical experts) | 2029 | 0 | 4.4 | 114 | 1481 |
| Mishra et al [45] | Semiautomatically generated | 6 per article | | | 568 | |
| Yue et al [46] | Human-generated (medical experts) | 50 | 0 | | | |
| CLIFT^g [47] | Validated by human experts | 7500 | 0 | 6.42, 8.31, 7.61, 7.19, and 8.40 for the smoke, heart, medication, obesity, and cancer datasets, respectively | | 217.33, 234.18, 215.49, 212.88, and 210.16 for the smoke, heart, medication, obesity, and cancer datasets, respectively |
| Hamidi and Roberts [48] | Human-generated | 15 | 5 | | | |
| Mahbub et al [50] | Combination of manual exploration and rule-based NLP methods | 28,855 | | 6.22 | 2336 | 1003.98 |
| Dada et al [51] | Human-generated (medical student assistants) | 29,273 | Unanswered questions available | | 1223 | |

^a Not applicable.

^b n2c2: National NLP Clinical Challenges.

^c ADE: adverse drug event.

^d NLP: natural language processing.

^e RadQA: Radiology Question Answering Dataset.

^f DiSCQ: Discharge Summary Clinical Questions.

^g CLIFT: Clinical Shift.