Artificial intelligence for detecting keratoconus

Magali MS Vandevenne; Eleonora Favuzza; Mitko Veta; Ersilia Lucenteforte; Tos TJM Berendschot; Rita Mencucci; Rudy MMA Nuijts; Gianni Virgili; Mor M Dickman

doi:10.1002/14651858.CD014911.pub2

2023 Nov 15;2023(11):CD014911. doi: 10.1002/14651858.CD014911.pub2

Editor: Cochrane Eyes and Vision Group

PMCID: PMC10646985 PMID: 37965960

Abstract

Background

Keratoconus remains difficult to diagnose, especially in the early stages. It is a progressive disorder of the cornea that starts at a young age. Diagnosis is based on clinical examination and corneal imaging; though in the early stages, when there are no clinical signs, diagnosis depends on the interpretation of corneal imaging (e.g. topography and tomography) by trained cornea specialists. Using artificial intelligence (AI) to analyse the corneal images and detect cases of keratoconus could help prevent visual acuity loss and even corneal transplantation. However, a missed diagnosis in people seeking refractive surgery could lead to weakening of the cornea and keratoconus‐like ectasia. There is a need for a reliable overview of the accuracy of AI for detecting keratoconus and the applicability of this automated method to the clinical setting.

Objectives

To assess the diagnostic accuracy of artificial intelligence (AI) algorithms for detecting keratoconus in people presenting with refractive errors, especially those whose vision can no longer be fully corrected with glasses, those seeking corneal refractive surgery, and those suspected of having keratoconus. AI could help ophthalmologists, optometrists, and other eye care professionals to make decisions on referral to cornea specialists.

Secondary objectives

To assess the following potential causes of heterogeneity in diagnostic performance across studies.

• Different AI algorithms (e.g. neural networks, decision trees, support vector machines)
• Index test methodology (preprocessing techniques, core AI method, and postprocessing techniques)
• Sources of input to train algorithms (topography and tomography images from Placido disc system, Scheimpflug system, slit‐scanning system, or optical coherence tomography (OCT); number of training and testing cases/images; label/endpoint variable used for training)
• Study setting
• Study design
• Ethnicity, or geographic area as its proxy
• Different index test positivity criteria provided by the topography or tomography device
• Reference standard, topography or tomography, one or two cornea specialists
• Definition of keratoconus
• Mean age of participants
• Recruitment of participants
• Severity of keratoconus (clinically manifest or subclinical)

Search methods

We searched CENTRAL (which contains the Cochrane Eyes and Vision Trials Register), Ovid MEDLINE, Ovid Embase, OpenGrey, the ISRCTN registry, ClinicalTrials.gov, and the World Health Organization International Clinical Trials Registry Platform (WHO ICTRP). There were no date or language restrictions in the electronic searches for trials. We last searched the electronic databases on 29 November 2022.

Selection criteria

We included cross‐sectional and diagnostic case‐control studies that investigated AI for the diagnosis of keratoconus using topography, tomography, or both. We included studies that diagnosed manifest keratoconus, subclinical keratoconus, or both. The reference standard was the interpretation of topography or tomography images by at least two cornea specialists.

Data collection and analysis

Two review authors independently extracted the study data and assessed the quality of studies using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS‐2) tool. When an article contained multiple AI algorithms, we selected the algorithm with the highest Youden's index. We assessed the certainty of evidence using the GRADE approach.
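As an illustration of the selection rule for articles reporting several algorithms, the following is a minimal Python sketch, not the review authors' code. Youden's index is sensitivity + specificity − 1. The class `TwoByTwo`, the function `select_best`, and all counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a diagnostic 2 x 2 table for one algorithm (hypothetical helper)."""
    name: str
    tp: int  # true positives
    fn: int  # false negatives
    tn: int  # true negatives
    fp: int  # false positives

    @property
    def sensitivity(self) -> float:
        # Proportion of eyes with keratoconus that the algorithm flags
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Proportion of eyes without keratoconus that the algorithm clears
        return self.tn / (self.tn + self.fp)

    @property
    def youden(self) -> float:
        # Youden's index J = sensitivity + specificity - 1
        return self.sensitivity + self.specificity - 1.0


def select_best(algorithms: list[TwoByTwo]) -> TwoByTwo:
    """Return the algorithm with the highest Youden's index."""
    return max(algorithms, key=lambda a: a.youden)


# Hypothetical counts for three algorithms reported in a single article.
candidates = [
    TwoByTwo("neural network", tp=95, fn=5, tn=90, fp=10),
    TwoByTwo("random forest", tp=92, fn=8, tn=97, fp=3),
    TwoByTwo("logistic regression", tp=90, fn=10, tn=94, fp=6),
]
best = select_best(candidates)
print(best.name, round(best.youden, 3))  # random forest 0.89
```

In this made-up example the random forest is retained because its index (0.89) exceeds that of the neural network (0.85) and the logistic regression (0.84), even though the neural network has the highest sensitivity.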

Main results

We included 63 studies, published between 1994 and 2022, that developed and investigated the accuracy of AI for the diagnosis of keratoconus. There were three different units of analysis in the studies: eyes, participants, and images. Forty‐four studies analysed 23,771 eyes, four studies analysed 3843 participants, and 15 studies analysed 38,832 images.

Fifty‐four articles evaluated the detection of manifest keratoconus, defined as a cornea that showed any clinical sign of keratoconus. The accuracy of AI seems almost perfect, with a summary sensitivity of 98.6% (95% confidence interval (CI) 97.6% to 99.1%) and a summary specificity of 98.3% (95% CI 97.4% to 98.9%). However, accuracy varied across studies and the certainty of the evidence was low.

Twenty‐eight articles evaluated the detection of subclinical keratoconus, although the definition of subclinical varied. We grouped subclinical keratoconus, forme fruste, and very asymmetrical eyes together. The tests showed good accuracy, with a summary sensitivity of 90.0% (95% CI 84.5% to 93.8%) and a summary specificity of 95.5% (95% CI 91.9% to 97.5%). However, the certainty of the evidence was very low for sensitivity and low for specificity.

In both groups, we graded most studies at high risk of bias, with high applicability concerns, in the domain of patient selection, since most were case‐control studies. Moreover, we graded the certainty of evidence as low to very low due to selection bias, inconsistency, and imprecision.

We could not explain the heterogeneity between the studies. The sensitivity analyses based on study design, AI algorithm, imaging technique (topography versus tomography), and data source (parameters versus images) showed no differences in the results.

Authors' conclusions

AI appears to be a promising triage tool in ophthalmologic practice for diagnosing keratoconus. Test accuracy was very high for manifest keratoconus and slightly lower for subclinical keratoconus, indicating a higher chance of missing a diagnosis in people without clinical signs. This could lead to progression of keratoconus or an erroneous indication for refractive surgery, which would worsen the disease.

We are unable to draw clear and reliable conclusions due to the high risk of bias, the unexplained heterogeneity of the results, and high applicability concerns, all of which reduced our confidence in the evidence.

Greater standardization in future research would increase the quality of studies and improve comparability between studies.

Keywords: Humans, Artificial Intelligence, Case-Control Studies, Cross-Sectional Studies, Keratoconus, Keratoconus/diagnostic imaging, Physical Examination

Plain language summary

How accurate is artificial intelligence for diagnosing keratoconus?

Key messages

• The included studies suggest that artificial intelligence (AI) can identify keratoconus. This may lead to early detection and prevention of vision loss.
• Estimates were similar for different types of AI algorithms.
• We have little confidence in the evidence; there is a need for more research on this topic.

What is keratoconus and why is (early) diagnosis so important?

Keratoconus is a disease of the cornea (the clear window at the front of the eye) that affects people between the ages of 10 and 40 years. In those affected, the cornea weakens and thins over the years, gradually bulging into the typical cone‐like shape, which leads to reduced vision. Glasses can resolve this problem in the early stages of keratoconus, but no longer offer a satisfying solution as the disease becomes more severe. Early diagnosis is imperative to ensure follow‐up and treatment and thus prevent loss of vision.

The diagnosis of keratoconus is based on an eye exam (measuring the eye and evaluating the cornea with a vertical beam of light and a microscope) and imaging (computer‐assisted techniques that create three‐dimensional pictures or maps of the cornea). Interpreting the images can be challenging, especially in primary eye care settings and in the early stages of the disease. Not recognizing keratoconus could lead to worsening of the disease and worsening of vision. For example, people at risk of developing keratoconus who undergo refractive surgery (surgery to correct their vision) could end up with worse vision.

What is artificial intelligence and how can it help detect keratoconus?

Detecting keratoconus based on images is challenging, especially for untrained clinicians. AI gives machines the ability to adapt, reason, and find solutions. Algorithms can be developed and trained to analyse images of the cornea and recognize keratoconus. These tests could help ophthalmologists, optometrists, and other eye care professionals to make a diagnosis and refer people with keratoconus to cornea specialists in time to preserve their vision. There are many different types of algorithms, but they all distinguish between healthy eyes and keratoconus based on images of the cornea.

What did we want to find out?

The aim of the review was to find out whether AI can correctly diagnose keratoconus in people seeking refractive surgery and people whose vision can no longer be corrected fully with glasses.

What did we do?

We searched for studies that investigated the accuracy of AI for diagnosing keratoconus, preferably in people seeking refractive surgery or people whose vision can no longer be corrected fully with glasses. We compared and summarized the results of the studies to calculate two measures of accuracy: sensitivity (the ability of AI to correctly identify keratoconus) and specificity (the ability of AI to correctly rule out keratoconus). The closer sensitivity and specificity were to 100%, the better the algorithm.

What did we find?

We found 63 studies that used three different units (eyes, participants, and images) to analyse the accuracy of AI for detecting keratoconus: 44 studies analysed 23,771 eyes, four studies analysed 3843 participants, and 15 studies analysed 38,832 images.

The accuracy of AI for detecting manifest keratoconus (keratoconus that can be detected through a clinical examination) was high. If 1000 people were tested, 30 people with keratoconus would be correctly referred to a cornea specialist, and none would be missed. Of the remaining 970 people (without keratoconus), only 16 would be wrongly referred. These people would receive additional non‐invasive tests to verify whether they had keratoconus.

The accuracy of AI for detecting early keratoconus was lower. If 1000 people were tested, nine people with keratoconus would be correctly referred to a cornea specialist and one would be missed. If this person received refractive surgery, it would aggravate the disease and worsen their vision. Of the remaining 990 people (without keratoconus), 945 would be reassured that they did not have the disease and would receive refractive surgery or glasses; 45 people would be wrongly referred.

The evidence suggests that AI may be good at detecting manifest keratoconus but may not be ideal for screening early keratoconus.

What are the limitations of the evidence?

We have little confidence in the evidence on the accuracy of AI for detecting manifest keratoconus, and we have little to no confidence in the evidence related to early keratoconus. There were problems with how the studies were conducted, which may result in AI appearing more accurate than it really is.

How up‐to‐date is this evidence?

The evidence is up‐to‐date to 29 November 2022.

Summary of findings

Summary of findings 1. Summary of findings: artificial intelligence for the detection of keratoconus in refractive surgery candidates and people with refractive errors.

Review Question	What is the diagnostic accuracy of AI algorithms in the detection of keratoconus in people presenting with refractive errors, people seeking corneal refractive surgery, or people suspected of having keratoconus?
Population	People presenting with refractive errors, especially those whose vision can no longer be corrected fully with glasses, people seeking corneal refractive surgery, or people suspected of having keratoconus
Index test	AI algorithms (e.g. neural network, logistic regression, support vector machine) analysing topography and tomography images
Target condition	Keratoconus
Reference standard	Topography and tomography images interpreted by at least two cornea specialists
Action	(Early) referral of people suspected of having keratoconus to a cornea specialist by ophthalmologists, optometrists, and other eye care professionals.
Quantity of evidence	63 studies
Outcome	Effect (95% CI)	Number of participants (studies)	Test result	Number of results per 1000 participants tested		Certainty of evidence
				Real clinical setting*	Included studies**
Manifest keratoconus	Summary sensitivity 98.6% (97.6% to 99.1%)	21,330 (54)	True positive	30 (29 to 30)	493 (488 to 496)	⊕⊕⊝⊝ Low^a
			False negative	0 (0 to 1)	7 (5 to 12)
	Summary specificity 98.3% (97.4% to 98.9%)	29,189 (54)	True negative	954 (945 to 959)	492 (487 to 495)	⊕⊕⊝⊝ Low^a
			False positive	16 (11 to 27)	9 (6 to 13)
Subclinical keratoconus	Summary sensitivity 90.0% (84.5% to 93.8%)	2758 (28)	True positive	9 (8 to 9)	225 (211 to 235)	⊕⊝⊝⊝ Very low^b
			False negative	1 (1 to 2)	25 (16 to 39)
	Summary specificity 95.5% (91.9% to 97.5%)	6750 (28)	True negative	945 (911 to 970)	716 (687 to 731)	⊕⊕⊝⊝ Low^a
			False positive	45 (20 to 79)	34 (19 to 63)
*Estimated prevalence in the real clinical setting was 3% for manifest keratoconus and 1% for subclinical keratoconus.
**Prevalence calculated from the included studies was 50% for manifest keratoconus and 25% for subclinical keratoconus.
AI: artificial intelligence; CI: confidence interval.
GRADE Working Group grades of evidence
High certainty: we are very confident that the true effect lies close to that of the estimate of the effect.
Moderate certainty: we are moderately confident in the effect estimate; the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
Low certainty: our confidence in the effect estimate is limited; the true effect may be substantially different from the estimate of the effect.
Very low certainty: we have very little confidence in the effect estimate; the true effect is likely to be substantially different from the estimate of effect.
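
The point estimates in the 'Real clinical setting' column can be approximately reproduced from the summary sensitivity and specificity and the assumed prevalences (3% for manifest and 1% for subclinical keratoconus). The following is a minimal Python sketch. The function name `per_1000` and the rounding to whole people are assumptions made for illustration, not the authors' method, and confidence limits are not reproduced.

```python
def per_1000(sensitivity: float, specificity: float,
             prevalence: float, n: int = 1000) -> dict[str, int]:
    """Convert accuracy estimates into expected counts among n people tested.

    Hypothetical helper: counts are rounded to whole people.
    """
    diseased = round(n * prevalence)      # people with keratoconus
    healthy = n - diseased                # people without keratoconus
    tp = round(diseased * sensitivity)    # correctly referred
    fn = diseased - tp                    # missed
    tn = round(healthy * specificity)     # correctly reassured
    fp = healthy - tn                     # wrongly referred
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp}


# Summary estimates and assumed real-world prevalences taken from the table above.
manifest = per_1000(sensitivity=0.986, specificity=0.983, prevalence=0.03)
subclinical = per_1000(sensitivity=0.900, specificity=0.955, prevalence=0.01)

print(manifest)     # {'TP': 30, 'FN': 0, 'TN': 954, 'FP': 16}
print(subclinical)  # {'TP': 9, 'FN': 1, 'TN': 945, 'FP': 45}
```

Because prevalence is low in the intended population, false positives outnumber true positives for subclinical keratoconus (45 versus 9 per 1000), even with a specificity of 95.5%.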

Study ID	Study design	Study population	Sample size	Country	Instrument	Index test	Reference standard
Abdelmotaal 2020	Retrospective, single‐centre, case‐control	Refractive surgery candidates, subclinical KC, and manifest KC	3218 images (3218 eyes of 1669 participants)	Egypt	Pentacam	Convolutional neural network	2 cornea specialists
Accardo 2002	Retrospective, single‐centre, case‐control	Healthy controls, KC, and other ocular diseases	396 images (396 eyes of 198 participants)	Italy	Eyesys	Neural network	Unclear
Almeida 2022	Retrospective, multicentre, case‐control	Healthy controls (who underwent PRK or LASIK), subclinical KC, manifest KC	2893 eyes (2893 participants)	Brazil	Pentacam	Multiple logistic regression	1 cornea specialist
Al‐Timemy 2021	Retrospective, multicentre, case‐control	Healthy controls, subclinical KC, and manifest KC	1050 images (150 eyes of 85 participants)	Brazil	Pentacam	Convolutional neural network	3 cornea specialists
Arbelaez 2012	Retrospective, multicentre, case series	Healthy controls, subclinical KC, and manifest KC	3502 eyes	Oman, Italy	Sirius	Support vector machine	Unclear
Bessho 2006	Retrospective, multicentre, case‐control	Healthy controls and KC	165 eyes (120 participants)	Japan	Orbscan IIz	Logistic regression	Unclear
Cao 2020	Retrospective, case‐control	Healthy controls and subclinical KC	88 participants	Australia	Pentacam	Random forest	Unclear
Cao 2021a	Retrospective, case‐control	Healthy controls and subclinical KC	267 eyes (226 participants): 186 training, 81 test	Australia	Pentacam	Random forest	2 cornea specialists
Carvalho 2005	Retrospective, single‐centre, case‐control	Instrument database	80 eyes: 40 training, 40 test set	Brazil	Eyesys	Neural network	2 cornea specialists
Castro‐Luna 2020	Retrospective, single‐centre, case‐control	Control group and KC	60 eyes (60 participants)	Spain	CSO topography system	Bayesian network	Unclear
Cavas‐Martinez 2017	Retrospective, single‐centre, case‐control	Control group and KC	464 eyes (464 participants)	Spain	Sirius	Logistic regression	Unclear
Chan 2015	Retrospective, single‐centre, case‐control	Database of KC of the Singapore National Eye Center	128 images (128 participants)	Singapore	Orbscan II	Discriminant analysis	1 cornea specialist
Chandapura 2019	Retrospective, multicentre, case‐control	Healthy controls, subclinical KC, and manifest KC	439 eyes	India, Brazil	OCT RTVue + Pentacam	Random forest	1 cornea specialist
Chastang 2000	Retrospective, single‐centre, case‐control	Control group (e.g. healthy, regular astigmatism, radial keratotomy) and KC	208 eyes: 104 training, 104 validation set	France	Eyesys	Binary decision tree	2 cornea specialists
Chen 2021	Retrospective, multicentre, case‐control	Healthy controls and KC	1926 images	UK, Iran, New Zealand	Pentacam	Convolutional neural network	Unclear
Cohen 2022	Retrospective, single‐centre, case‐control	Healthy controls, KC, and subclinical KC	8526 corneal tomography examinations (2525 participants)	Israel	Galilei Dual Scheimpflug Analyzer	Random forest	1 cornea specialist
Consejo 2020	Prospective, single‐centre, case‐control	Control group and KC	50 eyes	Belgium	Corvis‐ST	Support vector machine	1 ophthalmologist
De Almeida Jr 2021	Prospective, single‐centre, case‐control	LASIK or PRK candidates and KC	Training 777 eyes, validation 237 eyes	Brazil	Pentacam	Support vector machine	1 cornea specialist
Elsawy 2021	Prospective, single‐centre, case‐control	Control group (healthy, dry eye, Fuchs' endothelial dystrophy) and KC	158,220 images (879 eyes, 478 participants): training 134,460, validation 23,760	USA	Envisu R2210 (AS‐OCT)	Neural network	6 cornea specialists
Feizi 2016	Prospective, single‐centre, case‐control	Refractive surgery candidates, subclinical KC, and manifest KC	210 eyes (207 participants)	Iran	Galilei Dual Scheimpflug Analyzer	Logistic regression	Unclear
Gairola 2022	Retrospective, single‐centre, case‐control	Healthy controls and KC	2224 images	India	Topography (Keratron and KC‐smart device)	Convolutional neural network	1 ophthalmologist
Gao 2022	Retrospective, single‐centre, case‐control	Healthy controls, KC, and subclinical KC	1040 images (208 participants)	China	Pentacam	Neural network	Unclear
Ghaderi 2021	Retrospective, single‐centre, case‐control	Healthy controls and KC (single‐centre database)	450 eyes (separated into training, validation, and test sets)	Iran	Pentacam	Ensemble learning system	Unclear
Issarti 2019	Retrospective, single‐centre, case‐control	Healthy controls and KC (single‐centre database)	851 eyes	Belgium	Pentacam	Feedforward neural network	1 ophthalmologist, 1 optometrist
Issarti 2020	Retrospective, multicentre, case‐control	Healthy controls and KC (multicentre database)	812 eyes	Belgium	Pentacam	Feedforward neural network	1 ophthalmologist, 1 optometrist
Kalin 1996	Prospective, consecutive, cross‐sectional study	Refractive surgery candidates and KC	106 eyes (53 participants)	USA	TMS‐1	Binary decision tree	1 ophthalmologist
Kamiya 2019	Retrospective, single‐centre, case‐control	Refractive surgery candidates, contact lens fitting candidates, and KC	543 eyes	Japan	CASIA SS‐1000	Convolutional neural network	Cornea specialists
Kamiya 2021	Retrospective, single‐centre, case‐control	Refractive surgery candidates, contact lens fitting candidates, and KC	349 eyes	Japan	TMS‐4 topographer	Convolutional neural network	Cornea specialists
Kojima 2020	Retrospective, multicentre, case‐control	Healthy controls and KC	329 eyes	Japan	Auto‐keratometer	Logistic regression	2 cornea specialists
Kojima 2021	Retrospective, single‐centre, case‐control	Healthy controls, KC, and subclinical KC	647 eyes (335 participants)	Japan	Auto‐keratometer (ARK‐1)	Regression algorithm	2 cornea specialists
Kovacs 2016	Retrospective, single‐centre, case‐control	Refractive surgery candidates, normal eye of unilateral KC, and KC	135 eyes: training 70%, test set 30%	Hungary	Pentacam	Neural network	Unclear
Kuo 2020	Retrospective, single‐centre, case‐control	Refractive surgery candidates and KC	354 images (206 participants)	Taiwan	TMS‐4	Convolutional neural network	4 cornea specialists
Lavric 2021	Retrospective, case‐control	Controls and KC	5881 eyes (2800 participants)	Brazil	Pentacam	Support vector machine	Unclear
Lopes 2018	Retrospective, multicentre, case‐control	LASIK cases, post‐LASIK ectasia, and KC	3648 eyes	USA, Brazil, UK, Italy	Pentacam	Random forest	1 cornea specialist
Lucena 2021	Retrospective, case‐control	Control group and KC	1172 images: training 960, test set 212	Brazil	Topographers	Convolutional neural network	1 cornea specialist
Maeda 1994	Single‐centre, case‐control	Control group and KC	200 eyes: training 100, test 100	USA	TMS‐1	Combined discriminant analysis and classification tree	3 cornea specialists
Maeda 1995a	Single‐centre, case‐control	Control group and KC	176 eyes (125 participants)	USA	TMS‐1	Combined discriminant analysis and classification tree	Unclear
Maeda 1995b	Single‐centre, case‐control	Control group and KC	183 eyes: training 108, test set 75	USA	TMS‐1	Neural network	Unclear
Mohammadpour 2022	Prospective, diagnostic test accuracy study	Healthy controls, subclinical KC, and manifest KC	217 eyes (212 participants)	Iran	Sirius	Neural network	2 cornea specialists
Mahmoud 2013	Retrospective, multicentre, case‐control	Healthy controls and KC	407 eyes	Colombia, Switzerland, USA	Galilei Dual Scheimpflug‐Placido tomographer	Logistic regression	Unclear
Mahmoud 2021	Case‐control	Healthy controls and KC	250 eyes	Unclear	CASIA SS‐1000	Convolutional neural network	1 ophthalmologist
Pavlatos 2020	Prospective, multicentre, case‐control	Healthy controls, subclinical KC, and manifest KC	215 eyes	USA, China	OCT RTVue or Avanti	CTN index	Unclear
Rabinowitz 1999	Retrospective, single‐centre, case‐control	Healthy controls and KC	281 participants	USA	TMS‐1	KISA% index	Unclear
Ruiz 2016	Retrospective, single‐centre, case‐control	Healthy controls, refractive surgery candidates, irregular astigmatism, subclinical KC, and manifest KC	860 eyes	Belgium	Pentacam	Support vector machine	1 cornea specialist, 1 optometrist
Ruiz 2017	Retrospective, multicentre, case‐control	Healthy controls, post‐refractive surgery candidates, subclinical KC, and manifest KC	131 eyes (102 participants)	Belgium, France	Topographers	Support vector machine	Unclear
Saad 2014	Prospective, single‐centre, case‐control	Refractive surgery candidates, subclinical KC, and manifest KC	166 eyes	France	Orbscan IIz	Discriminant analysis	1 cornea specialist
Saad 2016	Prospective, single‐centre, case‐control	Refractive surgery candidates, subclinical KC, and manifest KC	119 eyes (176 participants)	France	Placido disk topographer	Discriminant analysis	Unclear
Saika 2013	Single‐centre case‐control	Healthy controls, LASIK candidates, subclinical KC, and manifest KC	212 eyes	Japan	Placido disk topographer	Linear discriminant analysis	Unclear
Shetty 2015	Retrospective, single‐centre, case‐control	Healthy controls and KC	128 eyes	India	Pentacam	Logistic regression	Unclear
Shi 2020	Prospective, single‐centre, case‐control	Healthy controls and KC	121 eyes (121 participants)	China	Scheimpflug and UHR‐OCT	Neural network	2 cornea specialists
Sideroudi 2017	Prospective, cross‐sectional, non‐randomized observational study	Refractive surgery candidates, subclinical KC, and manifest KC	185 eyes (185 participants)	Greece	Pentacam	Logistic regression	Unclear
Smadja 2013	Retrospective, single‐centre, case‐control	Refractive surgery or routine ophthalmic examination, referrals, subclinical KC, and manifest KC	372 eyes (197 participants)	France	Galilei rotating Scheimpflug tomography	Tree classification	Unclear
Smolek 1997	Retrospective, single‐centre, case‐control	Normal, with‐the‐rule astigmatism, KC, subclinical KC, contact lens‐induced corneal warpage, pellucid marginal degeneration, PRK, radial keratotomy, and keratoplasty	300 examinations (150 training, 150 test)	USA	TMS‐1	Neural network	Unclear
Souza 2010	Retrospective, single‐centre, case‐control	Healthy controls, astigmatism, photorefractive keratectomy, and KC	318 participants	Brazil	Orbscan IIz	Support vector machine	Unclear
Subramaniam 2022	Case‐control study	Healthy controls, subclinical KC, and manifest KC	1500 images	India	Topography images synthesized with SyntEye	Convolutional neural network	Unclear
Twa 2005	Retrospective, single‐centre, case‐control	Refractive surgery candidates and KC	224 eyes	USA	Topography	Decision tree	Unclear
Xie 2020	Retrospective, observational	Refractive surgery candidates, KC	6465 images (1385 participants)	China	Pentacam	Convolutional neural network	3 ophthalmologists
Xu 2017	Prospective, single‐centre, cross‐sectional	Healthy controls, subclinical KC, and manifest KC	363 eyes (363 participants)	China	Pentacam	Discriminant analysis	2 ophthalmologists
Xu 2022a	Retrospective, single‐centre, case‐control	Healthy controls and subclinical KC	92 eyes (80 participants)	China	Sirius	Logistic regression	Unclear
Yang 2021	Cross‐sectional, observational	Healthy controls, refractive surgery candidates, subclinical KC, and manifest KC	176 eyes (124 participants)	USA	OCT	Decision tree (2‐step)	Unclear
Yousefi 2018	Retrospective, multicentre, case‐control	Healthy controls and KC	3156 participants	Japan, USA	CASIA OCT	Density‐based clustering	Unclear
Zeboulon 2020a	Retrospective, case‐control	Healthy controls, refractive surgery candidates, subclinical KC, and manifest KC	3000 examinations	France	Orbscan	Convolutional neural network	1 ophthalmology resident, 1 corneal tomography expert
Zeboulon 2020b	Retrospective, case‐control	Healthy controls, history of myopic refractive surgery, Fuchs' corneal dystrophy, and KC	6979 participants	France	Orbscan	Convolutional neural network	1 ophthalmology resident, 1 corneal tomography expert

Subgroups		No. of studies (participants)	Sensitivity (95% CI)	P‐value for relative sensitivity	Specificity (95% CI)	P‐value for relative specificity
Study design	Clinical series	46 (38,788)	0.987 (0.977, 0.993)	Reference	0.984 (0.975, 0.993)	Reference
Study design	Registries	8 (11,731)	0.975 (0.919, 0.993)	0.458	0.975 (0.936, 0.990)	0.464
AI algorithm	Logistic regression	8 (2,889)	0.983 (0.957, 0.993)	Reference	0.992 (0.974, 0.997)	Reference
	Bayesian network	3 (788)	0.994 (0.972, 0.999)	0.260	0.982 (0.834, 0.998)	0.666
	Convolutional neural network	13 (13,452)	0.979 (0.945, 0.991)	0.734	0.978 (0.960, 0.988)	0.110
	Discriminant analysis	3 (462)	0.977 (0.945, 0.990)	0.628	1.000 (0.814, 1.000)	0.093
	Decision tree	5 (896)	0.976 (0.895, 0.995)	0.731	0.978 (0.935, 0.993)	0.299
	Neural network	10 (16,296)	0.973 (0.920, 0.991)	0.561	0.968 (0.931, 0.986)	0.093
	Other	6 (4,338)	0.990 (0.892, 0.999)	0.629	0.968 (0.931, 0.987)	0.068
	Random forest	2 (3,487)	1.000 (0, 1.000)	0.038	0.997 (0.994, 0.999)	0.270
	SVM	4 (7,911)	0.994 (0.982, 0.998)	0.203	0.993 (0.928, 0.999)	0.916
Imaging technique	OCT	6 (19,585)	0.971 (0.941, 0.985)	Reference	0.984 (0.885, 0.998)	Reference
	Tomography	26 (27,267)	0.993 (0.985, 0.996)	0.042	0.986 (0.976, 0.992)	0.910
	Topography	21 (3,579)	0.965 (0.931, 0.983)	0.744	0.978 (0.958, 0.989)	0.756
Data type	Images	13 (27,532)	0.980 (0.950, 0.992)	Reference	0.975 (0.947, 0.988)	Reference
Data type	Parameters	39 (22,792)	0.987 (0.976, 0.947)	0.461	0.984 (0.975, 0.990)	0.342

DOMAIN	Low risk/concern	Unclear	High risk/concern
PATIENT SELECTION	Describe methods of patient selection; describe included patients (prior testing, presentation, intended use of index test and setting):
Was a consecutive or random sample of patients enrolled?	Consecutive or random sampling of people seeking refractive error correction or refractive surgery in eye services.	Unclear whether consecutive or random sampling used.	Selection of non‐consecutive patients.
Was a case‐control design avoided?	No selective recruitment of people with or without keratoconus.	Unclear selection mechanism.	Selection of either cases or control in a predetermined, non‐random fashion; or enrichment of the cases from a selected population.
Did the study avoid inappropriate exclusions?	Exclusions are detailed and felt to be appropriate (e.g. people already diagnosed with keratoconus or with other corneal diseases).	Exclusions are not detailed (pending contact with study authors).	Inappropriate exclusions are reported (e.g. of people with borderline index test results).
Risk of bias: could the selection of patients have introduced bias?	'No' for any of the above.
Concerns regarding applicability: are there concerns that the included patients do not match the review question?	Inclusion of patients seeking refractive error correction or refractive surgery in primary or secondary care eye services.	Unclear inclusion criteria.	Inclusion of patients attending cornea services for known disease, population‐based studies, registry‐based studies.
INDEX TEST	Describe the index test and how it was conducted and interpreted:
Were the index test results interpreted without knowledge of the results of the reference standard?	Test performed "blind" or "independently and without knowledge of" reference standard results are sufficient and full details of the blinding procedure are not required; or clear temporal pattern to the order of testing that precludes the need for formal blinding.	Unclear whether results are interpreted independently.	Reference standard results available to those who conducted or interpreted the index test.
If a threshold was used, was it prespecified?	The study authors declare that the selected cut‐off used to dichotomize data was specified a priori, or a protocol is available with this information.	No information on preselection of index test cut‐off values.	A study is classified at higher risk of bias if the authors define the optimal cut‐off post hoc based on their own study data.
Risk of bias: could the conduct or interpretation of the index test have introduced bias?	'No' for any of the above.
Concerns regarding applicability: are there concerns that the index test, its conduct, or interpretation differ from the review question?	Tests used and testing procedure clearly reported and tests executed by personnel with sufficient training.	Unclear execution of the tests or unclear study personnel profile, background, and training.	Tests used are not validated, or study personnel is insufficiently trained.
REFERENCE STANDARD	Describe the reference standard and how it was conducted and interpreted:
Is the reference standard likely to correctly classify the target condition?	Topography and/or tomography interpreted independently by 2 or more cornea specialists.	Topography and/or tomography interpreted by cornea specialists, but not enough details to adjudicate 'yes' or 'no'.	Topography and/or tomography interpreted by only one cornea specialist.
Were the reference standard results interpreted without knowledge of the results of the index test?	Reference standard performed "blind" or "independently and without knowledge of" index test results are sufficient and full details of the blinding procedure are not required; or clear temporal pattern to the order of testing that precludes the need for formal blinding.	Unclear whether results are interpreted independently.	Index test results available to those who conducted the reference standard.
Risk of bias: could the reference standard, its conduct, or its interpretation have introduced bias?	'No' for any of the above.
Concerns regarding applicability: are there concerns that the target condition as defined by the reference standard does not match the review question?	Same or similar definition of the target condition as described in the protocol.	Unclear definition of the target disease diagnosed by the reference standard.	Different definition of the target condition as defined in the protocol.
FLOW AND TIMING	Describe any patients who did not receive the index test(s) and/or reference standard or who were excluded from the 2 × 2 table (refer to flow diagram): describe the time interval and any interventions between index test(s) and reference standard.
Was there an appropriate interval between index test(s) and reference standard?	No more than three months between index and reference test execution.	—	More than three months between index and reference test execution.
Did all patients receive a reference standard?	All participants receiving the index test are verified with the reference standard.	—	Not all participants receiving the index test are verified with the reference standard.
Did all patients receive the same reference standard?	Not applicable for this review.
Were all patients included in the analysis?	The number of participants included in the study matches the number in analyses, or participants with undefined or borderline test results are excluded.	—	The number of participants included in the study does not match the number in analyses, or participants with undefined or borderline test results are excluded.
Risk of bias: could the patient flow have introduced bias?	'No' for any of the above.
ADDITIONAL QUESTIONS	These questions concern the direct comparisons between AI tests.
Were different AI tests developed and interpreted without knowledge of each other?	Different AI tests were developed and interpreted "blind" or "independently and without knowledge of" the results of each other.	—	Different AI tests were developed or their results interpreted with knowledge of the results of each other.
Are the proportions and reasons for missing data similar for all index tests?	Missing data and their causes were similar for each AI test.	—	The amount of missing data or their causes differed between AI tests.
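
The domain-level rule in the table above ("'No' for any of the above" leads to a high risk of bias) can be sketched as a small function. This is a minimal illustration under stated assumptions, not the review authors' procedure: the table states only the high-risk rule, so treating all-'Yes' answers as low risk and every other combination as unclear risk is an assumption, and the name `domain_risk` is hypothetical. Review authors may also depart from a mechanical rule when making their judgements.

```python
def domain_risk(answers):
    """Map signalling-question answers ('Yes', 'No', 'Unclear') to a domain judgement."""
    normalized = [a.strip().lower() for a in answers]
    if "no" in normalized:
        # Stated in the table: 'No' for any signalling question.
        return "High risk"
    if normalized and all(a == "yes" for a in normalized):
        # Assumption: low risk only when every question is answered 'Yes'.
        return "Low risk"
    # Assumption: any remaining combination is judged unclear.
    return "Unclear risk"


print(domain_risk(["Yes", "Yes"]))
print(domain_risk(["Yes", "Unclear"]))
print(domain_risk(["No", "Yes", "Unclear"]))
```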

Test	No. of studies	No. of participants
1 Artificial intelligence (all studies)	63	56364
2 Artificial intelligence (manifest keratoconus)	54	50519
3 Artificial intelligence (subclinical keratoconus)	28	9508
4 Artificial intelligence (mixed)	11	11644

*Study characteristics*
Patient Sampling	Single‐centre, retrospective, case‐control study. Scheimpflug tomographic (Pentacam (Oculus GmbH, Wetzlar, Germany)) images obtained from non‐consecutive refractive surgery candidates, people with unilateral or bilateral keratoconus, or subclinical keratoconus in Egypt (3218 images from 3218 eyes of 1669 people).
Patient characteristics and setting	1108 healthy eyes selected from non‐consecutive refractive surgery candidates; 1038 keratoconus eyes; 1072 subclinical keratoconus eyes.
Index tests	Convolutional neural network (CNN) for 4‐map selectable display images. The CNN was trained with corneal colour‐coded maps of whole Scheimpflug images.
Target condition and reference standard(s)	The keratoconus class included those with a clinical diagnosis of keratoconus or an irregular cornea (as determined by distorted keratometry mires or distortion of retinoscopic red reflex, or both) and the following topographic findings: focal steepening located in a zone of protrusion surrounded by concentrically decreasing power zones; focal areas with dioptric values > 47.0 D; I‐S asymmetry measured as > 1.4 D, or angling of the hemi‐meridians in an asymmetric or broken bowtie pattern with skewing of the steepest radial axis. The subclinical keratoconus class included subtle corneal topographic changes in the aforementioned keratoconus abnormalities in the absence of slit‐lamp or visual acuity changes typical of keratoconus. The cases were labelled before the analysis with the algorithm by 2 experienced corneal specialists, and any disagreements were reviewed by a 3rd specialist.
Flow and timing	All cases were included in the reference standard and index test. All data were included in a 2 × 2 table.
Comparative	Not applicable
Notes	No funding source mentioned.
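
The 2 × 2 table mentioned under "Flow and timing" cross-classifies index test results against the reference standard; sensitivity and specificity follow directly from its four cells. The sketch below illustrates that arithmetic only. The counts are hypothetical and are not taken from any included study, and the function name `accuracy_from_2x2` is an assumption for illustration.

```python
def accuracy_from_2x2(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the four cells of a 2 x 2 table."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased eye")
    sensitivity = tp / (tp + fn)  # diseased eyes correctly flagged by the index test
    specificity = tn / (tn + fp)  # non-diseased eyes correctly cleared
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = accuracy_from_2x2(tp=90, fp=5, fn=10, tn=95)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```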
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	Unclear
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Unclear
Was the model designed in an appropriate manner?	Unclear
Could the conduct or interpretation of the index test have introduced bias?		Unclear risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			High
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		Low risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative

*Study characteristics*
Patient Sampling	The study design is unclear; it seems to be a case‐control study. 3794 Pentacam (Oculus GmbH, Wetzlar, Germany) corneal images from the University of Sao Paulo were included, and an independent validation subset with 1050 images was collected from 150 eyes of 85 subjects from a separate centre in Brazil.
Patient characteristics and setting	The criteria for keratoconus diagnosis are unclear. People with manifest keratoconus, suspected keratoconus and normal eyes were included.
Index tests	A hybrid deep learning model which integrates multiple convolutional neural network (CNN) models for detecting keratoconus based on corneal topographic maps.
Target condition and reference standard(s)	The criteria for keratoconus diagnosis are unclear. Eyes were labelled as keratoconus suspects if corneal topography included atypical, localized steepening or an asymmetrical bowtie pattern; the keratometric curvature was > 47.00 D, the oblique cylinder was > 1.50 D, the central corneal thickness was < 500 μm, BAD‐D was between 1.6 and 3.0. 3 corneal specialists performed the eye classification, before the analysis with the algorithm.
Flow and timing	All cases were included in reference standard and index test. All data were included in a 2 × 2 table.
Comparative	Not applicable
Notes	Supported by National Institute of Health (NIH), National Eye Institute (NEI), and Bright Focus Foundation.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	Unclear
Was a case‐control design avoided?	Unclear
Did the study avoid inappropriate exclusions?	Unclear
Could the selection of patients have introduced bias?		Unclear risk
Are there concerns that the included patients and setting do not match the review question?			Unclear
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Unclear
Was the model designed in an appropriate manner?	Yes
Could the conduct or interpretation of the index test have introduced bias?		Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		Low risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative

*Study characteristics*
Patient Sampling	Multicentre, case‐control study. All participants were examined at the Visum Eye Center and Rio Claro Eye Institute between January 2012 and January 2019. Exclusion criteria were a history of ocular trauma, corneal scarring, and neurotrophic keratopathy. Normal Eyes: 2296 people who underwent LASIK or photorefractive keratectomy and were stable after at least 18 months of follow‐up. Very Asymmetric Eyes With Normal Topography and Tomography Group: 187 eyes of 187 people with very asymmetric eyes with normal topography (VAE‐NT) in 1 eye and frank ectasia (VAE‐E) in the fellow eye. Ectasia Group: 410 people (1 eye each) with bilateral clinical keratoconus.
Patient characteristics and setting	410 eyes with keratoconus; 2296 healthy corneas; 187 very asymmetric eyes with a normal topography of the cornea.
Index tests	Multiple logistic regression analysis (MLRA) is based on the logistic function that bounds its output within the range of 0 to 1. To build the algorithm extracted from MLRA, 22 variables were used.
Target condition and reference standard(s)	All eyes were examined by rotating Scheimpflug corneal and anterior segment tomography (Pentacam HR, Oculus Optikgerate GmbH). Image quality was checked to ensure that only cases with acceptable‐quality images were included. All cases were reviewed by an experienced fellowship‐trained corneal specialist (G.C.A.J.) for correct classification into keratoconus and VAE‐NT groups. Objective criteria for considering normal topography included objective front surface curvature metrics derived from Pentacam. Normal topography criteria were rigorously considered based on the objective criteria of a maximum keratometry curvature (Kmax; steepest front keratometry) of < 47.2 D, a paracentral I‐S asymmetry value at 6 mm (3 mm radii) of < 1.45, and a keratoconus percentage index score of < 60. An objective criterion for normal tomography criteria was adopted for the control group and the VAE‐NT group, and the maximum values were 3.8 mm for anterior chamber depth, 4 mm for front apical elevation, 5 mm for front corneal elevation at the thinnest point, and 12 mm for front corneal elevation in the central 4.0 mm. The corresponding posterior elevation values were 7 mm, 13 mm, and 25 mm.
Flow and timing	All cases were included in the reference standard and index test. All data were included in a 2 × 2 table.
Comparative	Not applicable
Notes	This work was supported by the Sao Paulo State Research Support Foundation (FAPESP, grant nos: 2015/17226‐7 and 2019/04475‐0) and the National Council for Scientific and Technological Development (CNPq, grant no: 306808/2018‐8.). The funding organizations had no role in design or conduct of this research, and they have no related commercial interests.
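
The index test description above notes that multiple logistic regression rests on the logistic function, which bounds its output between 0 and 1. A minimal sketch of that function follows. The weights, intercept, and feature values are hypothetical placeholders (the study's fitted coefficients are not reported here), and `mlra_probability` is an illustrative name, not the study authors' implementation.

```python
import math


def logistic(z):
    """Logistic function; output always lies between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


def mlra_probability(features, weights, intercept):
    """Probability from a linear combination of features passed through the logistic function."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return logistic(z)


# Hypothetical example with 22 variables, all set to zero.
print(mlra_probability([0.0] * 22, [0.1] * 22, intercept=0.0))
```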
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	Unclear
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	No
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Unclear
If a threshold was used, was it pre‐specified?	Unclear
Was the model designed in an appropriate manner?	Unclear
Could the conduct or interpretation of the index test have introduced bias?		Unclear risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	No
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		High risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative
Were different AI tests developed and interpreted without knowledge of each other?
Are the proportions and reasons for missing data similar for all index tests?

*Study characteristics*
Patient Sampling	Retrospective case series. Clinical data and corneal examinations were retrieved from clinical records from 2 centres (Oman and Italy). 3502 eyes were enrolled.
Patient characteristics and setting	According to the clinical diagnosis, participants were classified into the following 4 groups: keratoconus; subclinical keratoconus (early keratoconus, forme fruste and suspected); other conditions (history of refractive surgery, penetrating keratoplasty, or ocular trauma); and normal eyes, enrolled among subjects undergoing a routine ophthalmological examination for minor refractive defects. Each group was divided into a training set (including 200 eyes) to be used to develop the keratoconus detection program and a validation set (including the remaining eyes). Participants were excluded if tomography scans had poor quality.
Index tests	Classification algorithm based on support vector machine (SVM), a supervised learning technique that can be used for pattern classification. It analysed symmetry index of front and back corneal curvature, best fit radius of the front corneal surface, Baiocchi Calossi Versaci front index (BCVf) and BCV back index (BCVb), root‐mean‐square of front and back corneal surface higher order aberrations, and thinnest corneal point provided by a Scheimpflug camera combined with Placido corneal topography (Sirius, CSO, Italy).
Target condition and reference standard(s)	Unclear who performed the classification of the eyes, which was done before the analysis with the algorithm.
Flow and timing	It is unclear if all cases received the same reference standard. All data were included in a 2 × 2 table.
Comparative	Not applicable
Notes	No funding source mentioned.
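
The per-group division into a 200-eye training set and a validation set of the remaining eyes, described under "Patient characteristics and setting", can be sketched as follows. How the 200 training eyes were chosen is not stated, so the random shuffle, the seed, and the function name `split_group` are assumptions made purely for illustration.

```python
import random


def split_group(eye_ids, n_train=200, seed=0):
    """Split one diagnostic group into training and validation eyes."""
    eyes = list(eye_ids)
    if len(eyes) < n_train:
        raise ValueError("group has fewer eyes than the requested training size")
    random.Random(seed).shuffle(eyes)  # assumption: random selection
    return eyes[:n_train], eyes[n_train:]


# Hypothetical group of 500 eye identifiers.
train, validation = split_group(range(500))
print(len(train), len(validation))
```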
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	No
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Yes
Was the model designed in an appropriate manner?	No
Could the conduct or interpretation of the index test have introduced bias?		Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		Unclear risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Unclear
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Unclear risk
DOMAIN 5: Comparative

*Study characteristics*
Patient Sampling	Retrospective, case‐control study including the following groups: people with subclinical keratoconus recruited from public and private clinics at the Royal Victorian Eye and Ear Hospital and private consulting rooms and optometry clinics in Melbourne, Australia, as part of the Australian Study of Keratoconus; and controls recruited from the GEnes in Myopia study.
Patient characteristics and setting	Subclinical keratoconus was defined as eyes with abnormal corneal topography, including I‐S localized steepening or asymmetric bowtie pattern and no detectable clinical signs. The control group consisted of refractive error subjects with no ocular disease that may affect refraction in the eyes.
Index tests	Random forest method using 11 tomographic parameters (Pentacam) for the diagnosis of subclinical keratoconus.
Target condition and reference standard(s)	Subclinical keratoconus was defined as those eyes with abnormal corneal topography, including I‐S localized steepening or asymmetric bowtie pattern and no detectable clinical signs. It is unclear how the diagnosis was made; however, cases were classified before the inclusion.
Flow and timing	All cases were included in reference standard and index test. All data were included in a 2 × 2 table.
Comparative	It is unclear whether different AI tests were developed and interpreted blind or independently and without knowledge of the results of each other, and whether missing data and their causes were similar for each AI test.
Notes	This study was supported by the Australian National Health and Medical Research Council (NHMRC) project Ideas grant APP1187763 and Senior Research Fellowship (1138585 to PNB), the Louisa Jean De Bretteville Bequest Trust Account, University of Melbourne, the Angior Family Foundation, Keratoconus Australia, Perpetual Impact Philanthropy grant (SS), and a Lions Eye Foundation Fellowship (SS). The Centre for Eye Research Australia (CERA) receives Operational Infrastructure Support from the Victorian Government.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	No
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Unclear
Was the model designed in an appropriate manner?	Yes
Could the conduct or interpretation of the index test have introduced bias?		Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		Unclear risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative
Were different AI tests developed and interpreted without knowledge of each other?	Unclear
Are the proportions and reasons for missing data similar for all index tests?	Unclear
		Unclear risk

*Study characteristics*
Patient Sampling	Retrospective, single‐centre case‐control study. 80 corneal maps were selected from the database of the EyeSys System 2000 (EyeSys Vision, Houston, TX) topographer in Brazil.
Patient characteristics and setting	80 corneal maps of different people were selected according to the following 5 categories (16 corneas for each category): regular cornea; with‐the‐rule astigmatisms; against‐the‐rule astigmatisms; keratoconus; and post‐LASIK. Criteria for diagnosis of keratoconus are unclear. Corneal maps had few or no nose or eyelid shadows; only the right eye of each person was allowed. Right and left eyes of the same person were not used. The investigators excluded corneas with incipient keratoconus, keratoconus with high degrees of symmetrical astigmatism, and other cases for which a single prevailing diagnosis could not be issued. In the case of regular profiles, given that even the most symmetrical corneas have some degree of with‐the‐rule astigmatism, only corneas with simulated keratometry 0.25 D were considered "regular" or "normal."
Index tests	A neural network which used the first 15 Zernike coefficients.
Target condition and reference standard(s)	The selection and classification were performed by 2 eye care specialists, with unclear criteria. However, cases were classified before inclusion.
Flow and timing	All cases were included in the reference standard and index test. All data were included in a 2 × 2 table.
Comparative	Not applicable
Notes	Supported in part by FAPESP (São Paulo Research Foundation) process #03132–8.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	No
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Yes
Was the model designed in an appropriate manner?	Yes
Could the conduct or interpretation of the index test have introduced bias?		Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		Low risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative

*Study characteristics*
Patient Sampling	Retrospective single‐centre case‐control study in 60 eyes from 60 people from the Department of keratoconus of INVISION Ophthalmology clinic in Almería, Spain.
Patient characteristics and setting	Participants were divided into the following 2 groups depending on their preliminary diagnosis based on the classical topographic criteria: a control group without topographic alteration (30 eyes); and a keratoconus group (30 eyes). The keratoconus group included people with an asymmetric bow tie in the topographic image and ≥ 1 sign of keratoconus in the examination with the slit lamp, such as stromal thinning, conical protrusion of the cornea at the apex, Fleischer's ring, Vogt's striae, or anterior stromal scar. Grade 4 keratoconus with excessively distorted corneal topography was excluded. All cases were examined using the CSO topography system (CSO, Firenze, Italy).
Index tests	Bayesian network classifier for keratoconus identification that uses previously developed topographic indices, calculated directly from the digital analysis of the Placido ring images.
Target condition and reference standard(s)	It is unclear who performed the selection and classification. However, cases were classified before inclusion.
Flow and timing	All cases were included in the reference standard and index test. All data were included in a 2 × 2 table.
Comparative	Not applicable
Notes	This research was partially supported by the Andalucian regional government (grant PIN‐0530‐2017). Research of A.M.‐F. and D.R.‐L. was also supported in part by the Spanish Government – European Regional Development Fund (grant MTM2017‐89941‐P), the Andalucian regional government (research group FQM‐229), and the University of Almería (Campus de Excelencia Internacional del Mar CEIMAR). A.M.‐F. acknowledges an additional support from the Carlos I Institute of Theoretical and Computational Physics, while D.R.‐L. thanks the support from CDTIME (Center for Development and Transfer of Mathematical Research to Industry, University of Almería).
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	No
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Yes
Was the model designed in an appropriate manner?	No
Could the conduct or interpretation of the index test have introduced bias?		Unclear risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Unclear
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		Unclear risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative

*Study characteristics*
Patient Sampling	Single‐centre case‐control study in Spain, including 464 eyes of 464 participants.
Patient characteristics and setting	Participants were divided into the following 2 groups: a control group (143 healthy eyes); and a keratoconus group (321 keratoconus eyes).
Index tests	A model of detection of early keratoconus (only grade 1) obtained by logistic regression considering the new parameters defined according to a new geometric approach
Target condition and reference standard(s)	The standard criteria for keratoconus diagnosis were the presence of an asymmetric bowtie pattern in corneal topography, KISA% index ≥ 100%, a central keratometry with different cut‐off values to keratoconus suspect (> 47.2 D), an I‐S asymmetry with a cut‐off value of 1.4 D difference between average inferior and superior corneal powers at 3 mm from the centre of the cornea, as well as other topographic indices and ≥ 1 keratoconus sign on slit‐lamp examination, such as stromal thinning, conical protrusion on the cornea at the apex, Fleischer's ring, Vogt's striae, or anterior stromal scar. Corneal analysis was performed by the Sirius system (CSO, Italy). It seems that a single experienced examiner was involved in selection and classification. However, cases were classified before inclusion.
Flow and timing	All cases were included in the reference standard and index test. All data were included in a 2 × 2 table.
Comparative	Not applicable
Notes	The study was carried out in the framework of the Thematic Network for Co‐Operative Research in Health (RETICS) reference number RD12/0034/0007 and RD16/0008/0012, financed by the Carlos III Health Institute – General Subdirection of Networks and Cooperative Investigation Centers (R&D&I National Plan 2008–2011) and the European Regional Development Fund (FEDER). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	No
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Unclear
Was the model designed in an appropriate manner?	No
Could the conduct or interpretation of the index test have introduced bias?		Unclear risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		Unclear risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative

*Study characteristics*
Patient Sampling	Retrospective case‐control study; 128 topographic images of 128 people were selected at the Singapore National Eye Center
Patient characteristics and setting	The forme fruste keratoconus group (24 images) involved clinically and topographically normal eyes with the contralateral eye showing frank keratoconus. These cases were obtained from the database of people with keratoconus from the Singapore National Eye Center. The diagnosis of keratoconus in the contralateral eye was reconfirmed by clinical examination and evaluation of topographies from the Orbscan IIz corneal topography system (Bausch + Lomb TechnoLas, Munich, Germany) and Tomey keratoconus screening system (TMS, software version 2.4.2J, Tomey TMS‐2N; Tomey Corp, Nagoya, Japan) by a corneal sub‐specialist. The control group (104 images) involved normal preoperative topographies of people who had myopic LASIK (with or without astigmatism) performed at least 4 years before with no resultant ectasia.
Index tests	The SCORE Analyzer is based on a linear regression analysis that constructs a set of linear functions of variables known as discriminant functions. It combines 12 Placido and tomographic indices in a weighted fashion to classify corneas as suspicious for keratoconus or normal.
Target condition and reference standard(s)	Clinically evident keratoconus was defined by evidence of ≥ 1 slit‐lamp biomicroscopic findings including conical protrusion of the cornea at the apex, Fleischer's ring, Vogt's striae, and corneal stromal thinning. The classification was performed by 1 corneal specialist. Cases were classified before inclusion.
Flow and timing	All cases were included in the reference standard and index test. All data were included in a 2 × 2 table.
Comparative	Not applicable
Notes	No funding source mentioned.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	No
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Yes
Was the model designed in an appropriate manner?	Yes
Could the conduct or interpretation of the index test have introduced bias?		Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	No
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		High risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative
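
The 2 × 2 table referred to under "Flow and timing" cross-classifies index test results against the reference standard. As a minimal sketch of how sensitivity and specificity follow from such a table, the snippet below uses hypothetical counts that are not taken from any included study; the class and field names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test results cross-classified against the reference standard."""
    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative

    def sensitivity(self) -> float:
        # Proportion of reference-positive eyes that the index test flags.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of reference-negative eyes that the index test clears.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for illustration only (not data from any included study).
table = TwoByTwo(tp=18, fp=4, fn=2, tn=76)
print(f"sensitivity = {table.sensitivity():.2f}")
print(f"specificity = {table.specificity():.2f}")
```

A study that omits cases from the table would change these denominators, which is why the flow and timing domain asks whether all patients were included in the analysis.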

*Study characteristics*
Patient Sampling	Retrospective case‐control study, involving 439 eyes from 2 centres (India and Brazil). Comparison between 4 AI models.
Patient characteristics and setting	Keratoconus; forme fruste keratoconus; very asymmetric eyes with normal topography (VAE‐NT)
Index tests	Random forest models based on Pentacam (Oculus GmbH, Wetzlar, Germany) or OCT parameters (OCT topography of the Bowman's layer)
Target condition and reference standard(s)	Keratoconus group: corneas with significant inferior steepening, asymmetric astigmatism, and corneal thinning on both OCT (RTVue, Optovue Inc., Irvine) and Scheimpflug (Pentacam, OCULUS Optikgerate). Forme fruste keratoconus: very mild localized steepening and suspicious anterior corneal surface topography on both devices. Very asymmetric eyes with normal topography (VAE‐NT): fellow eyes of people with highly asymmetric keratoconus. Examination of topographies of the anterior surface was performed by only 1 experienced refractive surgeon, who was masked to the information (disease present in 1 or both eyes) about the participants and the eyes. Classification was performed before the index tests.
Flow and timing	All cases were included in the reference standard and index test. All data were included in a 2 × 2 table.
Comparative	It is unclear whether different AI tests were developed and interpreted blind or independently and without knowledge of the results of each other. Missing data and their causes were similar for each AI test.
Notes	Indo‐German Science and Technology Center, Grant/Award Number: SIBAC
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	No
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Unclear
Was the model designed in an appropriate manner?	Yes
Could the conduct or interpretation of the index test have introduced bias?		Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	No
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		High risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative
Were different AI tests developed and interpreted without knowledge of each other?	Unclear
Are the proportions and reasons for missing data similar for all index tests?	Yes
		Unclear risk

*Study characteristics*
Patient Sampling	Retrospective case‐control study, single‐centre (France) involving 208 corneal topographies (EyeSys System 2000) of 208 corneas from 8 groups of participants.
Patient characteristics and setting	Participants were classified by the following diagnoses: normal; regular astigmatism; cataract surgery; radial keratotomy; excimer laser photorefractive keratectomy; non‐freeze myopic keratomileusis; penetrating keratoplasty; keratoconus.
Index tests	Binary decision tree. In the first step, the distribution of keratoconic and non‐keratoconic patterns was studied based on the value of each index in the training set. For each index, corneas with an index value higher than the threshold (or cut‐off value) were classified as keratoconic corneas (positive test), whereas corneas with an index value less than the threshold were classified as non‐keratoconic (negative test). In the second step, binary decision trees were built by combining 2 indices to improve the classification method. The 6 indices with the highest sensitivity and specificity were used in these models. The first index was used to divide the training set into 2 populations (i.e. population with a positive test and population with a negative test) based on the previously calculated optimum threshold. In each of these populations, the distribution of keratoconic and non‐keratoconic patterns was studied based on the value of the second index. In each population, sensitivity and specificity curves as a function of the second index threshold were generated to evaluate the optimum cut‐off value. This resulted in 2 thresholds according to the response to the first test. In fact, the second index's most efficient threshold (i.e. the threshold with maximum sensitivity and specificity) in the population with a positive test was different from that in the population with a negative test. A cornea was classified as keratoconic when the second test was positive.
Target condition and reference standard(s)	Maps were classified by 2 cornea specialists based on clinical records and topographic appearances, before the index test.
Flow and timing	All cases were included in reference standard and index test. All data were included in a 2 × 2 table.
Comparative	Unclear whether different AI tests were developed and interpreted blind or independently and without knowledge of the results of each other, and whether missing data and their causes were similar for each AI test.
Notes	Supported in part by the Fondation Claude Bernard, Paris, France.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	No
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Yes
Was the model designed in an appropriate manner?	Yes
Could the conduct or interpretation of the index test have introduced bias?		Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		Low risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative
Were different AI tests developed and interpreted without knowledge of each other?	Unclear
Are the proportions and reasons for missing data similar for all index tests?	Unclear
		Unclear risk
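
The two-step binary decision tree described under "Index tests" for this study can be sketched as follows. This is a minimal illustration, not the study authors' implementation: the function names are hypothetical, and choosing the cut-off that maximises the sum of sensitivity and specificity is an assumption about what "maximum sensitivity and specificity" means.

```python
from typing import Sequence


def best_threshold(values: Sequence[float], is_kc: Sequence[bool]) -> float:
    """Pick a cut-off for one index from training data.

    Assumed criterion: maximise sensitivity + specificity, with values
    above the cut-off counted as a positive (keratoconic) test.
    """
    n_pos = sum(is_kc)
    n_neg = len(is_kc) - n_pos
    best_t, best_score = float("nan"), -1.0
    for t in sorted(set(values)):
        tp = sum(v > t and k for v, k in zip(values, is_kc))
        tn = sum(v <= t and not k for v, k in zip(values, is_kc))
        sens = tp / n_pos if n_pos else 0.0
        spec = tn / n_neg if n_neg else 0.0
        if sens + spec > best_score:
            best_t, best_score = t, sens + spec
    return best_t


def classify_cornea(index1: float, index2: float,
                    t1: float, t2_if_pos: float, t2_if_neg: float) -> bool:
    """Return True when the cornea is classified as keratoconic.

    The first index only selects which second-index threshold applies;
    the final call rests on the second test, as in the description above.
    """
    first_positive = index1 > t1
    t2 = t2_if_pos if first_positive else t2_if_neg
    return index2 > t2


# Hypothetical training values for one index (arbitrary units).
t1 = best_threshold([1.0, 2.0, 3.0, 4.0], [False, False, True, True])
print(t1)
print(classify_cornea(3.0, 0.5, t1=t1, t2_if_pos=0.4, t2_if_neg=0.9))
print(classify_cornea(1.0, 0.5, t1=t1, t2_if_pos=0.4, t2_if_neg=0.9))
```

In practice `best_threshold` would be applied once to the first index on the whole training set, then separately to the second index within each of the two resulting populations, which is what yields two different second-index thresholds.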

*Study characteristics*
Patient Sampling	Multicentre retrospective case‐control study comparing 4 convolutional neural network methods. It included 1926 images from the Pentacam (Oculus GmbH, Wetzlar, Germany) of keratoconic and healthy volunteers' eyes provided by 3 centres (UK, Iran, New Zealand).
Patient characteristics and setting	Keratoconic scans were classified according to the Amsler‐Krumeich classification. Only scans of acceptable quality were included. Control group: subjects with a BAD‐D < 1.6 SDs from normative values.
Index tests	Convolutional neural network method that uses 4 colour‐coded corneal maps obtained by a Scheimpflug camera (Pentacam)
Target condition and reference standard(s)	The definition of keratoconus is unclear. Unclear who performed the classification. However, cases were classified before inclusion.
Flow and timing	All cases were included in the reference standard and index test. All data were included in a 2 × 2 table.
Comparative	Unclear whether different AI tests were developed and interpreted blind or independently and without knowledge of the results of each other. Missing data and their causes were similar for each AI test.
Notes	The study authors have not declared a specific grant for this research from any funding agency in the public, commercial, or not‐for‐profit sectors.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	No
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Unclear
Was the model designed in an appropriate manner?	Yes
Could the conduct or interpretation of the index test have introduced bias?		Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		Unclear risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative
Were different AI tests developed and interpreted without knowledge of each other?	Unclear
Are the proportions and reasons for missing data similar for all index tests?	Yes
		Unclear risk

*Study characteristics*
Patient Sampling	Single centre, retrospective, case‐control study that evaluated 8526 corneal tomography examinations of 2525 participants obtained between November 2010 and July 2017 with the Galilei dual Scheimpflug/Placido disc analyzer system (software version 5.2.1; Ziemer Ophthalmic Systems, Port, Switzerland). Low‐quality samples were excluded.
Patient characteristics and setting	Of the 7104 included samples: 4088 were labelled as normal; 1299 were labelled as suspect irregular cornea; and 2614 were labelled as keratoconus. Label distribution was similar in train and test sets.
Index tests	Random forest; the model integrated keratoconus prediction indexes of the device in addition to the 94 instrument‐derived output parameters. The model was first trained and tested, then validated with a separate validation set.
Target condition and reference standard(s)	All images were graded by a single cornea specialist (D.V.). A normal cornea would have a regular spherical or spherocylindrical curvature, thinning toward the centre without epicentral posterior or anterior elevation, with relatively normal numerical values. A suspected irregular cornea describes an at‐risk cornea. Such a cornea may have subtle inconclusive signs like I‐S values outside the reference range or aberrant C‐shaped or mild posterior surface elevations. Alternatively, a suspected irregular cornea may have an unusual corneal thinning.
Flow and timing	All cases were included in reference standard and index test. All data were included in a 2 × 2 table.
Comparative	Not applicable
Notes	All study authors declared that they received no grant support or research funding for the study. All study authors certified that they had no affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent‐licensing arrangements), or non‐financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in the manuscript.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	Unclear
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	Unclear
Could the selection of patients have introduced bias?		Unclear risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Unclear
If a threshold was used, was it pre‐specified?	Unclear
Was the model designed in an appropriate manner?	Yes
Could the conduct or interpretation of the index test have introduced bias?		Unclear risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	No
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		High risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative
Were different AI tests developed and interpreted without knowledge of each other?
Are the proportions and reasons for missing data similar for all index tests?

*Study characteristics*
Patient Sampling	Prospective case‐control study, involving 50 eyes selected in a single centre in Belgium
Patient characteristics and setting	Scheimpflug single‐image snapshots obtained with Corvis‐ST (Oculus, Germany) were analysed and grouped as follows. Normal (25 eyes): intraocular pressure of 15–17 mmHg and corneal astigmatism < 0.75 D. Keratoconus (25 eyes): various stages of keratoconus, including clear cornea, non‐severe corneal thinning, Fleischer's ring at the apex, and anterior or posterior corneal steepening.
Index tests	The combination of central corneal thickness with microscopic parameters extracted from statistical modelling of light intensity distribution
Target condition and reference standard(s)	The definition of keratoconus is unclear. The classification was performed by only 1 experienced ophthalmologist, before the index test.
Flow and timing	All cases were included in the reference standard and index test. All data were included in a 2 × 2 table.
Comparative	Not applicable.
Notes	This project received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 779960 and support from the Statutory Funds of Wroclaw University of Science and Technology. MW received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 666295 and from the financial resources for science in the years 2016 to 2019 awarded by the Polish Ministry of Science and Higher Education for the implementation of an international co‐financed project. JJR received a grant from the Flemish Fund for Scientific Research (FWO‐TBM T000416N).
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	No
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Yes
Was the model designed in an appropriate manner?	Yes
Could the conduct or interpretation of the index test have introduced bias?		Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	No
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		High risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative

*Study characteristics*
Patient Sampling	Prospective single‐centre case‐control study. 210 eyes of 210 people were included in 1 centre in Iran.
Patient characteristics and setting	Refractive surgery candidates (normal subjects); people affected by keratoconus; people affected by subclinical keratoconus
Index tests	Logistic regression analysis of sets of parameters obtained with Galilei dual Scheimpflug system (Ziemer Ophthalmic System AG, Port, Switzerland)
Target condition and reference standard(s)	The diagnosis of subclinical keratoconus and keratoconus was based on clinical slit‐lamp findings (stromal thinning, conical protrusion, Fleischer's ring, and Vogt's striae) and characteristic patterns based on Placido disc corneal topography (Tomey, EM‑3000, version 4.20, Nagoya, Japan). Participants who had 1 abnormal biomicroscopic finding and 1 major or 2 minor criteria were diagnosed with keratoconus. Participants with a normal appearing cornea and 1 major or 2 minor topographic criteria were diagnosed with subclinical keratoconus. It seems that a single experienced examiner was involved in classification. However, cases were classified before inclusion.
Flow and timing	All cases were included in the reference standard and index test. All data were included in a 2 × 2 table.
Comparative	Not applicable
Notes	No funds, grants, or other support were received.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	No
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	No
Was the model designed in an appropriate manner?	Yes
Could the conduct or interpretation of the index test have introduced bias?		Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			High
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		Unclear risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative

*Study characteristics*
Patient Sampling	Retrospective, single‐centre (Iran), case‐control study. A data set of 450 people with keratoconus and healthy controls was analysed.
Patient characteristics and setting	Data set of people with keratoconus and healthy controls. Scheimpflug tomographic images (Pentacam HR (Oculus GmbH, Wetzlar, Germany)) were analysed.
Index tests	Ensemble learning system, based on combining multiple initial classifiers as experts for primary classification and a combination rule for combining results of classifiers.
Target condition and reference standard(s)	Unclear definition of keratoconus. Unclear who performed the classification; it seems 1 cornea specialist. Unclear if reference standard results were interpreted without knowledge of the results of the index test.
Flow and timing	All cases were included in the reference standard and index test. All data were included in a 2 × 2 table.
Comparative	Not applicable
Notes	No funds, grants, or other support were received.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	No
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Unclear
Was the model designed in an appropriate manner?	Yes
Could the conduct or interpretation of the index test have introduced bias?		Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?	Unclear
Could the reference standard, its conduct, or its interpretation have introduced bias?		Unclear risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Unclear
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative
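
The ensemble learning system described under "Index tests" for this study combines several initial classifiers ("experts") through a combination rule. The review does not state which rule was used, so the sketch below assumes simple majority voting; the expert functions, feature values, and thresholds are hypothetical and carry no clinical meaning.

```python
from collections import Counter
from typing import Callable, Sequence

# An expert maps a feature vector to a class label.
Classifier = Callable[[Sequence[float]], str]


def majority_vote(experts: Sequence[Classifier], features: Sequence[float]) -> str:
    """Combine expert outputs by majority vote (assumed combination rule)."""
    votes = Counter(expert(features) for expert in experts)
    return votes.most_common(1)[0][0]


# Three hypothetical experts acting on two abstract features (arbitrary units).
experts: list[Classifier] = [
    lambda f: "keratoconus" if f[0] > 0.5 else "normal",
    lambda f: "keratoconus" if f[1] > 0.5 else "normal",
    lambda f: "keratoconus" if f[0] + f[1] > 1.2 else "normal",
]

print(majority_vote(experts, [0.9, 0.2]))
print(majority_vote(experts, [0.9, 0.6]))
```

Using an odd number of experts with two classes avoids tied votes; other combination rules (for example, weighted voting or averaging predicted probabilities) would fit the same interface.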

*Study characteristics*
Patient Sampling	Single‐centre (Japan), retrospective case‐control study, which included a total of 304 keratoconic eyes and 239 healthy eyes (refractive surgery candidates and contact lens fitting candidates)
Patient characteristics and setting	The data of people with keratoconus who underwent corneal tomography obtained by a swept‐source anterior segment OCT (CASIA SS‐1000, Tomey, Aichi, Japan) between March 2013 and April 2018 at Miyata Eye Hospital were retrospectively reviewed. 304 keratoconic eyes with good quality scans of corneal tomography were enrolled and divided according to the Amsler‐Krumeich classification, as follows: grade 1 (108 eyes); grade 2 (75 eyes); grade 3 (42 eyes); grade 4 (79 eyes). The control group comprised 239 eyes in subjects with normal corneal and ocular findings applying for a contact lens fitting or a refractive surgery consultation.
Index tests	Deep learning (convolutional neural network) of the arithmetical mean output data of 6 colour‐coded maps of an anterior segment OCT.
Target condition and reference standard(s)	Diagnosis of keratoconus was performed based on evident findings characteristic of keratoconus (e.g. corneal tomography with asymmetric bowtie pattern with or without skewed axes), and ≥ 1 keratoconus sign (e.g. stromal thinning, conical protrusion of the cornea at the apex, Fleischer's ring, Vogt's striae, or anterior stromal scar) on slit‐lamp examination. It is unclear how many corneal specialists classified the cases. However, classification was performed before inclusion.
Flow and timing	All cases were included in the reference standard and index test. All data were included in a 2 × 2 table.
Comparative	Not applicable
Notes	The study authors declared no specific grant for this research from any funding agency in the public, commercial or not‐for‐profit sectors.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	No
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Yes
Was the model designed in an appropriate manner?	Unclear
Could the conduct or interpretation of the index test have introduced bias?		Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		Low risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative

*Study characteristics*
Patient Sampling	Retrospective, single‐centre (Japan), case‐control study involving 349 keratoconus eyes and 170 normal eyes (refractive surgery candidates, contact lens fitting candidates).
Patient characteristics and setting	A total of 349 eyes with good‐quality images of corneal topography measured with a Placido disc corneal topographer (TMS‐4 TM, Tomey, Aichi, Japan) were included. The disease was graded according to the Amsler‐Krumeich classification, as follows: grade 1 (54 eyes); grade 2 (52 eyes); grade 3 (23 eyes); grade 4 (50 eyes). Control group: 170 eyes in people with normal ocular findings applying for a contact lens fitting or for a refractive surgery consultation, who had a refractive error of < 6 D as well as astigmatism of < 3 D.
Index tests	Deep learning (convolutional neural network) of a single colour‐coded topography map.
Target condition and reference standard(s)	Multiple corneal specialists diagnosed keratoconus based on distinctive features (e.g. corneal colour‐coded map with asymmetric bowtie pattern with or without skewed axes), and ≥ 1 keratoconus sign (e.g. stromal thinning, conical bulging, Fleischer's ring, Vogt's striae, or apical scar). It is unclear how many corneal specialists classified the cases; however, classification was performed before inclusion.
Flow and timing	All cases were included in the reference standard and index test. All data were included in a 2 × 2 table.
Comparative	Not applicable
Notes	This work was in part supported by Grants‐in‐Aid for Scientific Research (Grant Number 21K09706).
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	Unclear
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Yes
Was the model designed in an appropriate manner?	Unclear
Could the conduct or interpretation of the index test have introduced bias?		Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		Low risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative
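
The study tables in this section repeatedly note that all data were included in a 2 × 2 table. As a minimal sketch of what that implies, the Python snippet below shows how sensitivity and specificity follow from the four cells of such a table. The class name `TwoByTwo` and the counts are hypothetical and are used for illustration only; they are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Eyes cross-classified by index test result and reference standard."""

    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative

    def sensitivity(self) -> float:
        # Proportion of reference-positive eyes that the index test detects.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of reference-negative eyes that the index test clears.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for illustration only (not data from the review).
example = TwoByTwo(tp=90, fp=5, fn=10, tn=95)
print(f"sensitivity = {example.sensitivity():.2f}")
print(f"specificity = {example.specificity():.2f}")
```

In a case‐control design, the ratio of cases to controls is fixed by the investigators, so predictive values computed from such a table would not reflect those expected in a screening population.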

*Study characteristics*
Patient Sampling	Multicentre (Japan), retrospective, case‐control study which included 329 eyes (healthy controls and people with keratoconus).
Patient characteristics and setting	People with keratoconus were aged < 50 years, visited 1 of the facilities between January 2015 and December 2018, were diagnosed with keratoconus, Amsler‐Krumeich classification stage 1 (average K value < 48 D and corneal astigmatism < 5 D), had no ocular diseases other than keratoconus and refractive error, and had no history of ocular surgery. Healthy subjects were people aged < 50 years who had undergone an ophthalmic screening.
Index tests	Multivariate logistic regression analysis to create an equation that predicts early keratoconus (keratometer keratoconus index) using auto‐keratometer parameters.
Target condition and reference standard(s)	Keratoconus diagnosis was based on corneal topography or tomography results (corneal steepening and asymmetric astigmatism, protrusion of the posterior cornea, and thinning of the cornea at the area of protrusion) and slit‐lamp findings. 2 corneal specialists classified the cases before inclusion.
Flow and timing	All cases were included in the reference standard and index test. All data were included in a 2 × 2 table.
Comparative	Not applicable
Notes	This research did not receive any specific grant from funding agencies in the public, commercial, or not‐for‐profit sectors.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	Yes
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	No
Was the model designed in an appropriate manner?	No
Could the conduct or interpretation of the index test have introduced bias?		High risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		Low risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative

*Study characteristics*
Patient Sampling	Retrospective, single‐centre (Hungary), case‐control study, which involved 135 eyes of people with bilateral keratoconus (keratoconus group), normal fellow eyes of people with unilateral keratoconus (fellow‐eye group), and eyes of refractive surgery candidates (control group).
Patient characteristics and setting	Keratoconus group: 60 eyes of 30 people with bilateral keratoconus. Fellow‐eye group: 15 normal fellow eyes of people with unilateral keratoconus. Control group: 60 eyes of 30 refractive surgery candidates.
Index tests	Multilayer perceptron classifier (neural network) trained on bilateral data of index of height decentration.
Target condition and reference standard(s)	Keratoconus was diagnosed according to classic corneal biomicroscopic and topographic findings using the criteria of Rabinowitz: the existence of central protrusion of the cornea with Fleischer's ring, Vogt's striae, or both, by slit‐lamp examination in addition to the following topographic findings: a central keratometry (K) value > 47.2 D or an I‐S value > 1.4 D, or KISA% > 100%. Both eyes in the keratoconus group and the affected eye in the unilateral keratoconus group had abnormal keratoconus indices measured by a Scheimpflug camera (Pentacam HR). It is unclear who classified the cases. Classification was performed before the index test.
Flow and timing	All cases were included in the reference standard and index test. All data were included in a 2 × 2 table.
Comparative	Unclear whether different AI tests were developed and interpreted blind or independently and without knowledge of the results of each other. However, missing data and their causes were similar for each AI test.
Notes	Supported by OTKA NN106649 from the Hungarian Scientific Research Fund. The funder had no role in the study design, data collection, analysis, decision to publish, or preparation of the manuscript.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	Yes
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Unclear
Was the model designed in an appropriate manner?	No
Could the conduct or interpretation of the index test have introduced bias?		Unclear risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		Unclear risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative
Were different AI tests developed and interpreted without knowledge of each other?	Unclear
Are the proportions and reasons for missing data similar for all index tests?	Yes
		Unclear risk
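
The reference standard for the Hungarian study above combines slit‐lamp signs with topographic thresholds. A minimal sketch of that rule, as it is worded in the table, might look as follows in Python. The function and parameter names are hypothetical, and the way the conditions are combined (a slit‐lamp sign plus at least one topographic finding) is an assumption based on the table's wording rather than on the original study protocol.

```python
def meets_topographic_criteria(central_k_d: float, i_s_d: float, kisa_pct: float) -> bool:
    """Return True if any topographic threshold quoted in the table is exceeded.

    central_k_d: central keratometry (K) value in dioptres
    i_s_d:       inferior-superior (I-S) value in dioptres
    kisa_pct:    KISA% index, in percent
    """
    return central_k_d > 47.2 or i_s_d > 1.4 or kisa_pct > 100.0


def classified_as_keratoconus(
    slit_lamp_sign: bool, central_k_d: float, i_s_d: float, kisa_pct: float
) -> bool:
    """Assumed combination: a slit-lamp sign AND at least one topographic finding.

    slit_lamp_sign stands for central protrusion with Fleischer's ring,
    Vogt's striae, or both, as judged by the examiner.
    """
    return slit_lamp_sign and meets_topographic_criteria(central_k_d, i_s_d, kisa_pct)


# Illustrative values only; not measurements from the study.
print(classified_as_keratoconus(True, central_k_d=48.0, i_s_d=0.9, kisa_pct=60.0))
print(classified_as_keratoconus(False, central_k_d=48.0, i_s_d=0.9, kisa_pct=60.0))
```

Strict inequalities are used because the table states each threshold with ">".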

*Study characteristics*
Patient Sampling	Retrospective, single‐centre (Taiwan), case‐control study. The investigators retrospectively collected corneal topographies (TMS‐4; Tomey Corporation, Nagoya, Japan) of the study group with clinically manifested keratoconus, the subclinical keratoconus group, and the control group with regular astigmatism (354 images of 206 participants).
Patient characteristics and setting	170 keratoconus pictures; 28 subclinical keratoconus pictures (criteria based on topographic pattern and no slit‐lamp keratoconus findings); 156 normal topographic pictures (from candidates for refractive surgery without any previous manifestations and with regular astigmatism).
Index tests	3 convolutional neural network models
Target condition and reference standard(s)	The diagnosis of keratoconus was based on clinical signs (the existence of central protrusion of the cornea, Fleischer's ring, Vogt's striae, and focal corneal thinning on slit‐lamp examination) and topographic criteria (central K value > 47 D, I‐S value > 1.4 D, KISA% > 100%, and asymmetric bowtie presentation). 4 cornea specialists classified the cases before the index test.
Flow and timing	All cases were included in the reference standard and index test. All data were included in a 2 × 2 table.
Comparative	Unclear whether different AI tests were developed and interpreted blind or independently and without knowledge of the results of each other. However, missing data and their causes were similar for each AI test.
Notes	Supported by Grants 107L891002 and 108L891002 from National Taiwan University.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	Unclear
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Unclear
Was the model designed in an appropriate manner?	Yes
Could the conduct or interpretation of the index test have introduced bias?		Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		Low risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative
Were different AI tests developed and interpreted without knowledge of each other?	Unclear
Are the proportions and reasons for missing data similar for all index tests?	Yes
		Unclear risk

*Study characteristics*
Patient Sampling	Retrospective case‐control study. Pentacam data (Oculus GmbH, Wetzlar, Germany) obtained from people screened for keratoconus disease in Brazil. Elevation, topography, and pachymetry parameters were obtained from 5881 eyes of 2800 participants.
Patient characteristics and setting	Participant characteristics and setting are not clearly described in the study. The data seem to originate from a larger data set that included people with keratoconus and healthy controls.
Index tests	Support vector machine that uses elevation, topography or pachymetry parameters obtained from the raw data of the Pentacam to detect keratoconus. The workflow for the development of the algorithm was as follows: splitting the initial data set in elevation, topography and pachymetry data sets; data cleaning and elimination; feature selection; machine learning validation; and performance evaluation. It is unclear whether different data were used for testing and validating the model.
Target condition and reference standard(s)	The target condition was keratoconus; however, the article provided no definition. Tomography images of the Pentacam were used in this study. It is unclear who interpreted the images and made the diagnosis; however, the diagnosis was made before the algorithms analysed the images.
Flow and timing	The article did not describe the reference standard, nor did it describe whether all participants received the same reference standard. All data were included in a 2 × 2 table.
Comparative	In total, 6 algorithms were developed, tested, and compared: decision tree, discriminant analysis, naïve Bayes, support vector machine, k‐nearest neighbour, and ensemble.
Notes	This work was supported in part by a grant from the Romanian Ministry of Research and Innovation, CCCDI‐UEFISCDI, within PNCDI III, under Project PN‐III‐P2‐2.1‐PTE‐2019‐0642, and in part by the Romania National Council for Higher Education Funding, CNFIS, under Project CNFIS‐FDI‐2021‐0357.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	Unclear
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Unclear
Was the model designed in an appropriate manner?	Unclear
Could the conduct or interpretation of the index test have introduced bias?		Unclear risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?	Unclear
Could the reference standard, its conduct, or its interpretation have introduced bias?		Unclear risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Unclear
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative
Were different AI tests developed and interpreted without knowledge of each other?	Unclear
Are the proportions and reasons for missing data similar for all index tests?	Yes
		Unclear risk

*Study characteristics*
Patient Sampling	Retrospective, case‐control study. A diagnostic pattern bank generated by a specialist physician. The database consisted of 1172 examples of corneal topography, divided into 275 spherical patterns, 302 regular symmetrical astigmatism patterns, 295 regular asymmetrical astigmatism patterns, and 300 irregular astigmatism patterns (keratoconus).
Patient characteristics and setting	This study is a registry‐based study; the registry contained healthy controls and people with keratoconus.
Index tests	Convolutional neural network. The algorithm is a semi‐automatic, manual interference model. It uses a hierarchical system that tries to represent the structure in relation to the recognition of an image, where pixels form edges, edges form patterns, patterns form objects, which in turn describe the scenes. The algorithm analyses the topographic images and decides whether it is regular or irregular astigmatism (keratoconus). The algorithm was developed with a training phase and a validation phase.
Target condition and reference standard(s)	A specialist physician developed the diagnostic pattern bank of images made by topographers. He divided the included topographies into the following 4 groups: spherical patterns; regular symmetrical astigmatism patterns; regular asymmetrical astigmatism patterns; and irregular astigmatism patterns (keratoconus). The cases were divided before the algorithm analysed the images.
Flow and timing	All topographies included in the diagnostic pattern bank were judged by the specialist physician and all images were included in the analysis.
Comparative	Not applicable
Notes	This study was supported by the Government of the State of Ceará (Foundation for the Support to Scientific and Technological Development of Ceará) and the Ophthalmology School of Ceará.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	Unclear
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	No
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Unclear
If a threshold was used, was it pre‐specified?	Unclear
Was the model designed in an appropriate manner?	Unclear
Could the conduct or interpretation of the index test have introduced bias?		Unclear risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			High
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	No
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		High risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Unclear
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative

*Study characteristics*
Patient Sampling	Single‐centre, case‐control study. Videokeratographs were drawn from the Louisiana State University Eye Center patient population and were divided randomly by category into 2 sets. Each set comprised 8 categories: normal, keratoconus, keratoplasty, epikeratophakia, excimer laser photorefractive keratectomy, radial keratotomy, contact lens‐induced warpage, and other.
Patient characteristics and setting	The study included the following categories of participants: healthy controls; people with keratoconus; and people who had undergone refractive surgery or a keratoplasty. The study did not include people seeking refractive surgery.
Index tests	Combined discriminant analysis and classification tree analysing images from the TMS‐1. The keratoconus detection programme was developed using a training set of 100 corneas and evaluated with a validation set of an additional 100 corneas. Maps were first classified as either keratoconus, borderline, or non‐keratoconus. The borderline maps were then divided into keratoconus or non‐keratoconus by certain indices. Next, all keratoconus patterns were classified into either peripheral or central keratoconus using a threshold combination of these indices. Final output of the system was the display of the certainty of keratoconus.
Target condition and reference standard(s)	The included cases were diagnosed by 3 corneal topography researchers based on clinical records and topography. All images were made by a TMS‐1. The diagnosis was made before the algorithm analysed the images.
Flow and timing	All participants received the index and reference test. All data were included in a 2 × 2 table.
Comparative	Not applicable
Notes	Supported in part by National Institutes of Health grants EY03311 and EY02377 and by Computed Anatomy, Inc. and Menicon, Co., Ltd.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	No
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Unclear
Was the model designed in an appropriate manner?	Yes
Could the conduct or interpretation of the index test have introduced bias?		Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		Low risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Unclear
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Yes
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Low risk
DOMAIN 5: Comparative

*Study characteristics*
Patient Sampling	Single‐centre, case‐control study. Corneal topographic maps of the TMS‐1 were drawn from the Louisiana State University Eye Center patient population.
Patient characteristics and setting	Maps from 176 eyes of 125 people were selected and grouped as follows: 44 with topographic features typical of keratoconus; and 132 with topographic features typical of a variety of non‐keratoconus conditions (normal, with‐the‐rule astigmatism, contact lens‐induced warpage, excimer laser photorefractive keratectomy, penetrating keratoplasty, and pellucid marginal degeneration). Maps of eyes with keratoconus were selected from the charts of people previously diagnosed as having keratoconus at the study clinic.
Index tests	Combined discriminant analysis and classification tree, based on topographic images. The algorithm determined whether a keratoconus‐like pattern was seen in a particular map in the binary classification tree and, if so, reported a value between 5% and 95% in proportion to the linear discriminant function to quantify the severity of the keratoconus pattern.
Target condition and reference standard(s)	Topography images were made with the TMS‐1. It is unclear who diagnosed the cases; however, the diagnosis was made before the algorithm analysed the images.
Flow and timing	It is unclear whether all eyes were diagnosed with the same reference standard. All data were included in a 2 × 2 table.
Comparative	Not applicable
Notes	This study was supported in part by US Public Health Service grants EY03311 and EY02377 from the National Eye Institute, National Institutes of Health, Bethesda, Md; an unrestricted departmental grant from Research to Prevent Blindness Inc, New York, NY; and funds from Computed Anatomy Inc, New York, NY, and Menicon Co, Ltd, Nagoya, Japan. The Conecare data analysis software used in this study was provided courtesy of Yaron S. Rabinowitz, MD.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	Unclear
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes
If a threshold was used, was it pre‐specified?	Unclear
Was the model designed in an appropriate manner?	Yes
Could the conduct or interpretation of the index test have introduced bias?		Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		Unclear risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Unclear
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Unclear risk
DOMAIN 5: Comparative

*Study characteristics*
Patient Sampling	Participants were selected from 4 different hospitals: Department of Ophthalmology of the Ohio State University, Columbus, Ohio; Clinica de Oftalmologia de Cali, Pontificia Universidad Javeriana, Cali, Colombia; Cullen Eye Institute, Department of Ophthalmology, Baylor College of Medicine, Houston, Texas; and Department of Ophthalmology, University Hospital of Bern, Inselspital, Bern, Switzerland.
Patient characteristics and setting	People with keratoconus were clinically identified by characteristic refractive and slit‐lamp signs (e.g. unstable refraction; oblique astigmatism; irregular retinoscopic and keratometry mires; and biomicroscopic signs such as Vogt's striae or Fleischer's ring) and no history of corneal surgery. Normal subjects had no documented history of corneal disease or corneal surgery.
Index tests	Logistic regression. The algorithm included both anterior and posterior curvature maps; results were divided into 3 categories: normal (0–0.25), suspect (0.25–0.8), and keratoconus (0.8–1.0).
Target condition and reference standard(s)	Tomography images from the Galilei Dual Scheimpflug‐Placido tomographer were used. It was unclear how the reference standard made the diagnosis; however, all cases were diagnosed before the index test analysed them.
Flow and timing	It was unclear whether all participants received the same reference standard, but all cases were diagnosed before inclusion.
Comparative	Not applicable
Notes	No funding source mentioned.
*Methodological quality*
Item	Authors' judgement	Risk of bias	Applicability concerns
DOMAIN 1: Patient selection
Was a consecutive or random sample of patients enrolled?	No
Was a case‐control design avoided?	No
Did the study avoid inappropriate exclusions?	Unclear
Could the selection of patients have introduced bias?		High risk
Are there concerns that the included patients and setting do not match the review question?			High
DOMAIN 2: Index test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?	Unclear
If a threshold was used, was it pre‐specified?	Unclear
Was the model designed in an appropriate manner?	Yes
Could the conduct or interpretation of the index test have introduced bias?		Unclear risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?			Low concern
DOMAIN 3: Reference standard
Is the reference standard likely to correctly classify the target condition?	Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?		Unclear risk
Are there concerns that the target condition as defined by the reference standard does not match the question?			Low concern
DOMAIN 4: Flow and timing
Did all patients receive the same reference standard?	Unclear
Were all patients included in the analysis?	Yes
Could the patient flow have introduced bias?		Unclear risk
DOMAIN 5: Comparative
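
The index test for the multicentre study above is described as a logistic regression whose output is divided into three categories: normal (0–0.25), suspect (0.25–0.8), and keratoconus (0.8–1.0). A minimal Python sketch of that final step is given below. The function names are hypothetical, the linear predictor `z` stands in for the study's unpublished combination of curvature parameters, and the handling of scores that fall exactly on 0.25 or 0.8 is an assumption, because the table does not say which category owns each boundary.

```python
import math


def logistic(z: float) -> float:
    """Standard logistic function mapping a linear predictor to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def categorise(score: float) -> str:
    """Map a model score in [0, 1] to the three categories quoted in the table.

    Assumption: boundary values are assigned to the higher category
    (0.25 -> "suspect", 0.8 -> "keratoconus").
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if score < 0.25:
        return "normal"
    if score < 0.8:
        return "suspect"
    return "keratoconus"


# Illustrative linear predictors only; not coefficients or data from the study.
for z in (-2.0, 0.0, 2.0):
    score = logistic(z)
    print(f"z = {z:+.1f}  score = {score:.2f}  category = {categorise(score)}")
```

Reducing the three categories to a 2 × 2 table requires a decision on whether "suspect" counts as test positive or test negative, which is one reason the pre‐specification of thresholds is assessed in Domain 2.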