2023 Nov 15;2023(11):CD014911. doi: 10.1002/14651858.CD014911.pub2

Chen 2021.

*Study characteristics*
Patient Sampling	Multicentre retrospective case‐control study comparing 4 convolutional neural network methods. It included 1926 images from the Pentacam (Oculus GmbH, Wetzlar, Germany) of keratoconic and healthy volunteers' eyes provided by 3 centres (UK, Iran, New Zealand).
Patient characteristics and setting	Keratoconic scans were classified according to the Amsler‐Krumeich classification. Only scans of acceptable quality were included. Control group: subjects with a BAD‐D < 1.6 SDs from normative values.
Index tests	Convolutional neural network method that uses 4 colour‐coded corneal maps obtained by a Scheimpflug camera (Pentacam).
Target condition and reference standard(s)	The definition of keratoconus is unclear. Unclear who performed the classification. However, cases were classified before inclusion.
Flow and timing	All cases were included in the reference standard and index test. All data were included in a 2 × 2 table.
Comparative	Unclear whether different AI tests were developed and interpreted blind or independently and without knowledge of the results of each other. Missing data and their causes were similar for each AI test.
Notes	The study authors have not declared a specific grant for this research from any funding agency in the public, commercial, or not‐for‐profit sectors.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	No
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Unclear
Was the model designed in an appropriate manner?	Yes
Could the conduct or interpretation of the index test have introduced bias?		Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		Unclear risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative
Were different AI tests developed and interpreted without knowledge of each other?	Unclear
Are the proportions and reasons for missing data similar for all index tests?	Yes
		Unclear risk