Lopes 2018.
| Study characteristics | |||
| Patient Sampling | Retrospective, multicentre, case‐control study. A total of 3693 people were enrolled from 5 different clinics. Participants were divided into 2 data sets: 1 training set and 1 validation set. The training set included the preoperative data of the following 3 groups.
The algorithm was independently tested in a different set of stable LASIK cases and people with very asymmetric ectasia; these people had clinically diagnosed ectasia in 1 eye and normal topography in the fellow eye. |
||
| Patient characteristics and setting | The participants were grouped as follows.
|
||
| Index tests | Random forest: multiple decision trees were built and merged to improve the accuracy of the prediction. 2 validation steps were used to assess the generalization and clinical validity of the models and their ability to correctly classify new data. The first was a holdout validation, in which the training set was randomly split into 2 data sets: 70% of the data was used to train the models, and the remaining 30% was used to test model accuracy. The second was an independent test with cases that were not part of the training set. The algorithm analysed the raw tomographic data to identify the different patterns and detect keratoconus. | ||
| Target condition and reference standard(s) | All eyes were examined by rotating Scheimpflug corneal and anterior segment tomography (Pentacam HR; Oculus GmbH, Wetzlar, Germany). Image quality was checked, so that only cases with acceptable‐quality images were included in the study. 1 experienced fellowship‐trained corneal specialist reviewed all the cases so that they were correctly classified in the keratoconus and very asymmetric ectasia groups. All cases were diagnosed before the algorithm analysed the images. | ||
| Flow and timing | All eyes received the reference standard and were included in the 2 × 2 table of the index test. | ||
| Comparative | 5 models were developed and compared: regularized discriminant analysis (RDA), support vector machine (SVM), naïve Bayes (NB), neural networks (NN), and random forest (RF). It is unclear if all tests were developed and interpreted without knowledge of each other and if all data were used for each test. | ||
| Notes | No funding or grant support. | ||
| Methodological quality | |||
| Item | Authors' judgement | Risk of bias | Applicability concerns |
| DOMAIN 1: Patient selection | |||
| Was a consecutive or random sample of patients enrolled? | No | ||
| Was a case‐control design avoided? | No | ||
| Did the study avoid inappropriate exclusions? | Unclear | ||
| Could the selection of patients have introduced bias? | High risk | ||
| Are there concerns that the included patients and setting do not match the review question? | High | ||
| DOMAIN 2: Index test (All tests) | |||
| Were the index test results interpreted without knowledge of the results of the reference standard? | Yes | ||
| If a threshold was used, was it pre‐specified? | Unclear | ||
| Was the model designed in an appropriate manner? | Yes | ||
| Could the conduct or interpretation of the index test have introduced bias? | Low risk | ||
| Are there concerns that the index test, its conduct, or interpretation differ from the review question? | Low concern | ||
| DOMAIN 3: Reference standard | |||
| Is the reference standard likely to correctly classify the target condition? | No | ||
| Were the reference standard results interpreted without knowledge of the results of the index tests? | Yes | ||
| Could the reference standard, its conduct, or its interpretation have introduced bias? | High risk | ||
| Are there concerns that the target condition as defined by the reference standard does not match the question? | Low concern | ||
| DOMAIN 4: Flow and timing | |||
| Did all patients receive the same reference standard? | Unclear | ||
| Were all patients included in the analysis? | Yes | ||
| Could the patient flow have introduced bias? | Unclear risk | ||
| DOMAIN 5: Comparative | |||
| Were different AI tests developed and interpreted without knowledge of each other? | Unclear | ||
| Are the proportions and reasons for missing data similar for all index tests? | Unclear | ||
| Unclear risk | |||
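
The 70%/30% holdout validation described under "Index tests" and the 2 × 2 table mentioned under "Flow and timing" can be illustrated with a minimal sketch. This is not the study's code: the function names (`holdout_split`, `two_by_two`, `sensitivity_specificity`), the fixed random seed, and the toy labels are assumptions made purely for illustration, and no study data are used.

```python
import random


def holdout_split(records, train_fraction=0.7, seed=0):
    """Randomly split records into a training subset and a test subset.

    train_fraction=0.7 mirrors the 70%/30% split described in the table;
    the seed is an arbitrary, assumed value used only for reproducibility.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def two_by_two(reference, predicted):
    """Count TP, FP, FN, TN from paired boolean labels.

    reference: reference-standard diagnosis (True = target condition present)
    predicted: index-test result (True = test positive)
    """
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)
    fp = sum(1 for r, p in pairs if not r and p)
    fn = sum(1 for r, p in pairs if r and not p)
    tn = sum(1 for r, p in pairs if not r and not p)
    return tp, fp, fn, tn


def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the 2 x 2 counts."""
    return tp / (tp + fn), tn / (tn + fp)


# Toy example with made-up labels (not study data).
train, test = holdout_split(range(10))
counts = two_by_two([True, True, False, False], [True, False, True, False])
sens, spec = sensitivity_specificity(*counts)
```

In this sketch, the independent test described in the table would correspond to calling `two_by_two` on cases that were never passed to `holdout_split`.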