Author manuscript; available in PMC: 2021 Feb 10.
Published in final edited form as: JAMA Ophthalmol. 2020 May 1;138(5):526–527. doi: 10.1001/jamaophthalmol.2020.0515

Artificial Intelligence for Refractive Surgery Screening: Finding the Balance Between Myopia and Hype-ropia

Travis K Redd 1, J Peter Campbell 1, Michael F Chiang 1
PMCID: PMC7875088  NIHMSID: NIHMS1666041  PMID: 32215599

Research involving applications of artificial intelligence (AI) in health care has expanded rapidly in recent years. In particular, deep learning methods using convolutional neural networks have demonstrated the ability to achieve expert-level performance in image-based diagnosis. Although the highest-profile AI work in ophthalmology has applied this technology to retinal disease, there are several potential applications to corneal pathologic conditions.1,2 Most work to date has focused on keratoconus owing to its high prevalence and primarily image-based diagnosis. Convolutional neural networks have been trained to detect keratoconus from corneal tomographic imaging with high accuracy and also show potential for monitoring progression of disease, but the ability to indicate the risk of postrefractive ectasia or future development of keratoconus remains elusive.3

In this issue of JAMA Ophthalmology, Xie et al4 describe the development of Pentacam InceptionResNetV2 Screening System (PIRSS), a tomographic-based screening tool built on a deep learning architecture to identify candidates for refractive surgery at risk of postoperative ectasia. The potential application of this technology is important in light of the high volume of patients seeking refractive surgery, which is likely to increase substantially with the growing prevalence of myopia, especially in East Asia. The authors collected corneal tomographic imaging data from 1385 patients undergoing refractive surgery evaluation at a single center in China over a 2.5-year period. The PIRSS system was trained using 6465 images from these scans and tested on an external data set of 100 scans. The authors report a sensitivity of 80% for identifying ectasia suspects and 95% for detecting early keratoconus, and an overall diagnostic accuracy of 95% with an area under the receiver operating characteristic curve of 0.99. The system performed comparably to 5 expert clinicians and currently available screening software platforms, such as the Belin-Ambrosio enhanced ectasia display.
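The headline figures above are standard confusion-matrix summaries. As a minimal sketch of how they are derived (the counts below are hypothetical for illustration, not the PIRSS test-set data):

```python
# Illustrative definitions of the reported metrics in confusion-matrix
# terms. All counts here are hypothetical, chosen only to reproduce the
# style of numbers reported (e.g., 80% sensitivity).

def sensitivity(tp, fn):
    """Fraction of truly at-risk eyes that the screen flags."""
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    """Fraction of all screened eyes labeled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical example: 20 ectasia-suspect eyes, 16 flagged.
sens = sensitivity(tp=16, fn=4)          # 0.80
acc = accuracy(tp=16, tn=76, fp=4, fn=4)  # 0.92
```

The area under the receiver operating characteristic curve is a separate, threshold-free summary and requires the model's continuous output scores rather than these counts alone.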

This study takes an important step toward semiautomated screening of candidates for refractive surgery. A universal challenge in AI studies involves establishing the ground truth, or standard label, applied to the images used to train and evaluate the AI system. In this study, the authors describe 3 experts independently assigning each image a label of normal, suspected irregular cornea, early-stage keratoconus, keratoconus, or postrefractive surgery. In cases of discrepant labeling among the graders, the majority vote determined the ground truth, without any adjudication process, such as committee review. This lack of adjudication is a potential challenge because the ground truth label is the primary source of guidance to an otherwise unsupervised machine learning algorithm and therefore its veracity is vital to accurately training and validating the system. Furthermore, in this study, the aim was to determine whether PIRSS could identify individuals at risk of postrefractive surgery ectasia. As such, an interesting future study design will involve defining a ground truth based on the result of longitudinal follow-up of each patient to determine who eventually developed ectasia after refractive surgery, then training an AI system on the initial tomographic images with appropriate labels corresponding to their eventual course. This process would require extensive follow-up data and would be logistically challenging, but such longitudinal data will be required when training a system to predict an outcome.
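The majority-vote labeling the authors describe can be sketched in a few lines; the tie-handling behavior here is an assumption on our part, illustrating why an adjudication step matters when 3 graders disagree three ways:

```python
from collections import Counter

# The five labels described in the study.
LABELS = ("normal", "suspected irregular cornea", "early-stage keratoconus",
          "keratoconus", "postrefractive surgery")

def majority_label(grades):
    """Return the modal label among grader assignments.

    Returns None on a three-way tie, the case in which a simple majority
    vote cannot produce a ground truth and some adjudication process
    (e.g., committee review) would be needed.
    """
    counts = Counter(grades).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no majority; adjudication required
    return counts[0][0]

# Two of three graders agree: their label wins.
label = majority_label(["normal", "normal", "keratoconus"])  # "normal"
```

With only 3 graders and 5 possible labels, a three-way split is entirely possible, which is one concrete reason an adjudication process strengthens the ground truth.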

In addition, the test set was small, hand-picked, and not a population-based sample. With such limitations, this cohort is unlikely to be representative of a real-world refractive screening population, where presumably the prevalence of disease (postoperative ectasia) is very low. In this context, accuracy is not an ideal measure of performance, and the number of false-positive results is likely to exceed the number of true-positive results. Furthermore, a screening test with 80% sensitivity in a population undergoing an elective surgery will likely result in an unacceptably high number of false-negative results. For those reasons, we believe that external validation in a population-based sample, labeled according to longitudinal follow-up, will be necessary before real-world implementation of this technology.
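The low-prevalence argument can be made concrete with a back-of-envelope calculation. Using the reported 80% sensitivity together with an assumed 95% specificity and a hypothetical 1% prevalence of at-risk corneas (the latter two numbers are our assumptions, not figures from the study):

```python
def screening_counts(n, prevalence, sensitivity, specificity):
    """Expected confusion-matrix counts when screening n patients."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = diseased * sensitivity        # at-risk eyes correctly flagged
    fn = diseased - tp                 # at-risk eyes cleared for surgery
    fp = healthy * (1 - specificity)   # healthy eyes incorrectly flagged
    tn = healthy - fp
    return tp, fp, fn, tn

# 10,000 screened patients, 1% prevalence, 80% sensitivity, 95% specificity.
tp, fp, fn, tn = screening_counts(10_000, 0.01, 0.80, 0.95)
ppv = tp / (tp + fp)  # positive predictive value
```

Under these assumptions, the expected 495 false positives dwarf the 80 true positives, for a positive predictive value of roughly 14%, even though overall accuracy would still appear high. This is why accuracy alone is misleading in low-prevalence screening.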

As researchers in AI, we believe that the concept of a “hype cycle” is a useful theoretical framework to examine the adoption of technological innovations and guide our work.5 Within this framework, a new technology experiences a period of inflated expectations shortly after its development, followed by a trough of disillusionment once the system fails to deliver on these overly optimistic promises. Some technologies die in the trough, but others are able to withstand this turbulence and eventually reach a plateau of productivity wherein they experience mainstream adoption, have demonstrated value, and are proven within the marketplace to be a good investment. One could argue that AI has experienced several hype cycles since first being theorized by Alan Turing in the 1930s, but within health care in particular we seem to be approaching a period of inflated expectations as interest and funding in this field expand tremendously. To date, the number of articles describing potential applications of AI far outweighs the number of systems that have achieved an effect in a clinical practice setting, suggesting that we have not yet reached the plateau. We will need to understand how this technology fits into our workflow, evaluate the causes of any erroneous results, understand important biases in the algorithms, and develop business models that not only support getting through regulatory approval processes, but also provide incentive for future innovation and verify the added value of the technology. Beyond a high area under the receiver operating characteristic curve, achieving these ends necessitates a conscientious approach involving clinicians and other stakeholders with expertise in imaging modalities, data science, machine learning methods, diagnostic test validation, and public health implementation.

We certainly accept the hype that the technology developed by Xie et al4 will influence many aspects of eye care by providing automated, objective, and efficient image-based diagnosis and facilitating quantification of disease severity to enable longitudinal monitoring of treatment response in the future. Moreover, we believe that there are substantial potential telemedicine applications in underresourced regions with a scarcity of trained human diagnosticians. Eventually, AI might also help refractive surgeons recognize which patients are most likely to have stable, long-term outcomes of refractive surgery. However, until as a field we work together to solve some of these implementation barriers for AI in ophthalmology, the plateau remains a bit out of focus.

Acknowledgments

Conflict of Interest Disclosures: Dr Chiang is a paid consultant for Novartis (Basel, Switzerland), and an equity owner of Inteleretina. The research in applying artificial intelligence to ophthalmology informing the opinions expressed in this commentary was supported by grants R01EY19474, P30EY10572, and K12EY27720 from National Institutes of Health (Bethesda, Maryland), by grant SCH-1622679 from the National Science Foundation (Arlington, Virginia), and by grant funding from Genentech (South San Francisco, California).

REFERENCES

1. Sayres R, Taly A, Rahimy E, et al. Using a deep learning algorithm and integrated gradients explanation to assist grading for diabetic retinopathy. Ophthalmology. 2019;126(4):552–564. doi:10.1016/j.ophtha.2018.11.016
2. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316(22):2402–2410. doi:10.1001/jama.2016.17216
3. Lavric A, Valentin P. KeratoDetect: keratoconus detection algorithm using convolutional neural networks. Comput Intell Neurosci. Published online January 23, 2019. 2019:8162567. doi:10.1155/2019/8162567
4. Xie Y, Zhao L, Yang X, et al. Screening candidates for refractive surgery with corneal tomographic-based deep learning. JAMA Ophthalmol. Published online March 26, 2020. doi:10.1001/jamaophthalmol.2020.0507
5. Fenn J, Raskino M. Mastering the Hype Cycle: How to Choose the Right Innovation at the Right Time. Harvard Business Press; 2008.