| DOMAIN | Low risk/concern | Unclear | High risk/concern |
| --- | --- | --- | --- |
| PATIENT SELECTION | Describe methods of patient selection; describe included patients (prior testing, presentation, intended use of index test and setting): | | |
| Was a consecutive or random sample of patients enrolled? | Consecutive or random sampling of patients seeking refractive error correction or refractive surgery in eye services. | Unclear whether consecutive or random sampling was used. | Selection of non‐consecutive patients. |
| Was a case‐control design avoided? | No selective recruitment of people with or without keratoconus. | Unclear selection mechanism. | Selection of either cases or controls in a predetermined, non‐random fashion; or enrichment of cases from a selected population. |
| Did the study avoid inappropriate exclusions? | Exclusions are detailed and felt to be appropriate (e.g. people already diagnosed with keratoconus or with other corneal diseases). | Exclusions are not detailed (pending contact with study authors). | Inappropriate exclusions are reported (e.g. of people with borderline index test results). |
| Risk of bias: could the selection of patients have introduced bias? | 'Yes' for all of the above. | | 'No' for any of the above. |
| Concerns regarding applicability: are there concerns that the included patients do not match the review question? | Inclusion of patients seeking refractive error correction or refractive surgery in primary or secondary care eye services. | Unclear inclusion criteria. | Inclusion of patients attending cornea services for known disease, population‐based studies, registry‐based studies. |
| INDEX TEST | Describe the index test and how it was conducted and interpreted: | | |
| Were the index test results interpreted without knowledge of the results of the reference standard? | Statements that the index test was performed "blind" or "independently and without knowledge of" the reference standard results are sufficient, and full details of the blinding procedure are not required; or a clear temporal pattern to the order of testing precludes the need for formal blinding. | Unclear whether results were interpreted independently. | Reference standard results were available to those who conducted or interpreted the index test. |
| If a threshold was used, was it prespecified? | The study authors declare that the selected cut‐off used to dichotomize data was specified a priori, or a protocol is available with this information. | No information on preselection of index test cut‐off values. | The study authors defined the optimal cut‐off post hoc, based on their own study data. |
| Risk of bias: could the conduct or interpretation of the index test have introduced bias? | 'Yes' for all of the above. | | 'No' for any of the above. |
| Concerns regarding applicability: are there concerns that the index test, its conduct, or interpretation differ from the review question? | Tests used and testing procedure clearly reported, and tests executed by personnel with sufficient training. | Unclear execution of the tests, or unclear profile, background, and training of study personnel. | Tests used are not validated, or study personnel are insufficiently trained. |
| REFERENCE STANDARD | Describe the reference standard and how it was conducted and interpreted: | | |
| Is the reference standard likely to correctly classify the target condition? | Topography and/or tomography interpreted independently by 2 or more cornea specialists. | Topography and/or tomography interpreted by cornea specialists, but not enough details to adjudicate 'yes' or 'no'. | Topography and/or tomography interpreted by only one cornea specialist. |
| Were the reference standard results interpreted without knowledge of the results of the index test? | Statements that the reference standard was performed "blind" or "independently and without knowledge of" the index test results are sufficient, and full details of the blinding procedure are not required; or a clear temporal pattern to the order of testing precludes the need for formal blinding. | Unclear whether results were interpreted independently. | Index test results were available to those who conducted the reference standard. |
| Risk of bias: could the reference standard, its conduct, or its interpretation have introduced bias? | 'Yes' for all of the above. | | 'No' for any of the above. |
| Concerns regarding applicability: are there concerns that the target condition as defined by the reference standard does not match the review question? | Same or similar definition of the target condition as described in the protocol. | Unclear definition of the target condition diagnosed by the reference standard. | Definition of the target condition differs from that defined in the protocol. |
| FLOW AND TIMING | Describe any patients who did not receive the index test(s) and/or reference standard or who were excluded from the 2 × 2 table (refer to flow diagram); describe the time interval and any interventions between index test(s) and reference standard: | | |
| Was there an appropriate interval between index test(s) and reference standard? | No more than three months between index and reference test execution. | — | More than three months between index and reference test execution. |
| Did all patients receive a reference standard? | All participants receiving the index test are verified with the reference standard. | — | Not all participants receiving the index test are verified with the reference standard. |
| Did all patients receive the same reference standard? | Not applicable for this review. | | |
| Were all patients included in the analysis? | The number of participants included in the study matches the number included in the analyses. | — | The number of participants included in the study does not match the number in the analyses, or participants with undefined or borderline test results are excluded. |
| Risk of bias: could the patient flow have introduced bias? | 'Yes' for all of the above. | | 'No' for any of the above. |
| ADDITIONAL QUESTIONS | These questions concern direct comparisons between AI tests: | | |
| Were different AI tests developed and interpreted without knowledge of each other? | Different AI tests were developed and interpreted "blind" or "independently and without knowledge of" the results of each other. | — | Different AI tests were developed or their results interpreted with knowledge of the results of each other. |
| Are the proportions and reasons for missing data similar for all index tests? | Missing data and their causes were similar for each AI test. | — | The amount of missing data or their causes differed between AI tests. |
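
The domain-level judgements in the table follow a simple aggregation rule over the signalling questions ('Yes' for all of the above → low risk; 'No' for any of the above → high risk; otherwise unclear). The following is a minimal illustrative sketch of that rule only; the function and variable names are hypothetical and are not part of QUADAS‐2 or of this review's methods.

```python
# Illustrative sketch: domain-level risk-of-bias judgement implied by the table.
# Low risk when every signalling question is answered 'yes', high risk when any
# is answered 'no', otherwise unclear. All names here are hypothetical.

def domain_risk_of_bias(signalling_answers):
    """Map a domain's signalling-question answers ('yes'/'no'/'unclear') to a judgement."""
    answers = [a.strip().lower() for a in signalling_answers]
    if any(a == "no" for a in answers):
        return "high"      # 'No' for any of the above
    if all(a == "yes" for a in answers):
        return "low"       # 'Yes' for all of the above
    return "unclear"       # at least one 'unclear' and no 'no'


# Example: PATIENT SELECTION domain with its three signalling questions.
print(domain_risk_of_bias(["yes", "unclear", "yes"]))  # -> "unclear"
```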
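The FLOW AND TIMING rows refer to the 2 × 2 table from which accuracy estimates are derived; participants who are not verified with the reference standard or who are excluded from this table can distort those estimates. Below is a minimal sketch, under assumed binary test results, of how a complete 2 × 2 cross‐tabulation yields sensitivity and specificity; the function names and data are hypothetical and not taken from the review.

```python
# Illustrative sketch: the 2 x 2 table referred to in FLOW AND TIMING, built from
# paired index-test and reference-standard results, and the accuracy estimates it
# yields. All names and data are hypothetical.

def two_by_two(index_results, reference_results):
    """Cross-tabulate paired boolean index-test and reference-standard results."""
    tp = fp = fn = tn = 0
    for idx, ref in zip(index_results, reference_results):
        if idx and ref:
            tp += 1
        elif idx and not ref:
            fp += 1
        elif not idx and ref:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity_specificity(tp, fp, fn, tn):
    return tp / (tp + fn), tn / (tn + fp)


# Example: 10 participants, all verified with the reference standard.
index = [True, True, False, True, False, False, True, False, False, True]
reference = [True, True, False, False, False, False, True, True, False, True]
print(sensitivity_specificity(*two_by_two(index, reference)))  # (0.8, 0.8)
```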