
Table 2. Items used to assess the quality of reporting criteria in the current review

| Quality heading | Quality criteria | Definition |
| --- | --- | --- |
| Data source | (1) Sampling | Reported details of the sampling strategy for radiology reports, including whether they were from consecutive patients |
| | (2) Consistent imaging acquisition | Reported whether radiology reports were from images taken on one imaging machine or more and, if more, whether the machines were of comparable specification |
| Dataset criteria | (3) Dataset size | Reported a dataset size of > 200 |
| | (4) Training dataset | Reported the training dataset size: the part of the initial dataset used to develop an NLP algorithm |
| | (5) Test dataset | Reported the test dataset size: the part of the initial dataset used to evaluate an NLP algorithm |
| | (6) Validation dataset | Reported the validation dataset size: a separate dataset used to evaluate the performance of an NLP algorithm in a clinical setting (may be internal or external to the initial dataset) |
| Ground truth criteria | (7) Annotated dataset | Reported the annotated dataset size: data marked up by humans to provide the ground truth |
| | (8) Domain expert for annotation | Reported use of a domain expert for annotation, i.e. annotation carried out by a radiologist or specialist clinician |
| | (9) Number of annotators | Reported the number of annotators |
| | (10) Inter-annotator agreement | Reported the agreement between annotators, if more than one annotator was used (a kappa sketch follows this table) |
| Outcome criteria | (11) Precision | Reported precision (positive predictive value) |
| | (12) Recall | Reported recall (sensitivity); a worked computation of both metrics follows this table |
| Reproducibility criteria | (13) External validation | Reported whether the NLP algorithm was tested on external data from another setting (a separate healthcare system, hospital, or institution) |
| | (14) Availability of data | Reported whether the dataset is available for use (preferably with a link provided in the paper) |
| | (15) Availability of NLP code | Reported whether the NLP code is available for use (preferably with a link provided in the paper) |
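
Items (11) and (12) correspond to standard information-retrieval metrics: precision is TP / (TP + FP) and recall is TP / (TP + FN). As a purely illustrative sketch (the label values and counts below are hypothetical, not drawn from any study in the review), both metrics can be computed from gold-standard and predicted report labels as follows:

```python
def precision_recall(gold, predicted, positive="fracture"):
    """Compute precision (PPV) and recall (sensitivity) for one label.

    gold      -- ground-truth labels, e.g. from an annotated dataset
    predicted -- labels produced by an NLP algorithm
    positive  -- label treated as the positive class (hypothetical here)
    """
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # PPV: TP / (TP + FP)
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # sensitivity: TP / (TP + FN)
    return precision, recall

# Hypothetical report-level labels for five radiology reports
gold      = ["fracture", "normal", "fracture", "normal", "fracture"]
predicted = ["fracture", "fracture", "fracture", "normal", "normal"]
print(precision_recall(gold, predicted))  # (0.666..., 0.666...)
```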
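
Item (10) asks only that agreement between annotators be reported; the criterion does not prescribe a particular statistic. Cohen's kappa is one common choice for two annotators, sketched below on hypothetical labels (with more than two annotators, a statistic such as Fleiss' kappa would be needed instead):

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators labelling the same reports."""
    n = len(labels_a)
    # Observed agreement: fraction of reports on which the annotators agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical annotations of six reports by two annotators
a = ["positive", "negative", "positive", "negative", "positive", "negative"]
b = ["positive", "negative", "negative", "negative", "positive", "negative"]
print(round(cohen_kappa(a, b), 3))  # 0.667
```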