Author manuscript; available in PMC: 2020 Jun 1.
Published in final edited form as: Ophthalmic Epidemiol. 2018 Dec 6;26(3):147–149. doi: 10.1080/09286586.2018.1554160

Techniques for improving ophthalmic studies performed on administrative databases

Durga S Borkar 1, Lucia Sobrin 2, Rebecca A Hubbard 3, John H Kempen 2,4, Brian L VanderBeek 5,6,7
PMCID: PMC6529239  NIHMSID: NIHMS1515373  PMID: 30521404

Over the last several years, there has been a surge in the amount of data at our fingertips with the increased popularity of large electronic health datasets. In a pessimistic response to this, many have touted the adage “garbage in, garbage out” when referring to the pitfalls of using “big data,” citing inaccuracies in the coding and billing process that lead to erroneous conclusions. However, as our experience with using these administrative datasets created from electronic health record (EHR) data and insurance claims has grown, successful efforts have been made to critically evaluate and validate the data available.

One way of assessing insurance claims data is to perform studies that validate the use of billing codes for the identification of ophthalmic diagnoses. In general, International Classification of Diseases (ICD) codes for most common diagnoses have been shown to be highly accurate.1–6 In one study evaluating the accuracy of ICD-9 codes for cataract, primary open angle glaucoma, neovascular age-related macular degeneration, and proliferative diabetic retinopathy, the medical record supported the diagnosis code in 97% of cases.1 Another recent study of Current Procedural Terminology (CPT) and drug codes (J codes) used in the treatment of diabetic retinopathy also found high positive and negative predictive values.3 Thus, with the proper safeguards, big data has become a powerful tool to study these ocular conditions and identify both practice patterns and clinical outcomes on a much larger scale than was previously possible.

As the scope of big data has expanded, we also have been able to broaden the spectrum of diseases studied. Multiple recent studies have aimed to validate the use of ICD coding for identifying cases of uveitis.7–9 While some studies suggest that coding for uveitis has variable reliability based on the specific EHR used and the type of provider,8,9 others have shown that select codes may be useful for research purposes. For example, a study of 893 uveitis cases evaluated by individual chart review found highly variable positive predictive values for specific ICD-9 codes, ranging from 0 to 100 percent. Despite this, 11 of these codes had a positive predictive value exceeding 80%, the typical cut-off used for suitability in scientific research.7
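As a minimal sketch of how code-level positive predictive values might be tabulated from a chart review, the Python below counts confirmed cases per code and keeps the codes above a chosen threshold. The code labels, counts, and function names are hypothetical and are not taken from any cited study; only the 80% cut-off comes from the text.

```python
from collections import defaultdict

def ppv_by_code(reviews):
    """Positive predictive value per billing code.

    reviews: iterable of (code, confirmed) pairs, where confirmed is True
    when chart review supported the diagnosis implied by the code.
    """
    confirmed = defaultdict(int)
    total = defaultdict(int)
    for code, is_confirmed in reviews:
        total[code] += 1
        if is_confirmed:
            confirmed[code] += 1
    return {code: confirmed[code] / total[code] for code in total}

def codes_meeting_threshold(ppv, threshold=0.80):
    """Codes whose PPV exceeds the threshold (80% by default)."""
    return sorted(code for code, value in ppv.items() if value > threshold)

# Hypothetical chart-review results; labels and counts are illustrative only.
reviews = (
    [("CODE_A", True)] * 9 + [("CODE_A", False)] * 1
    + [("CODE_B", True)] * 2 + [("CODE_B", False)] * 3
)
ppv = ppv_by_code(reviews)
usable_codes = codes_meeting_threshold(ppv)
```

In this made-up example only CODE_A (9 of 10 confirmed) would be retained for a case definition.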

Although the results of validation studies for rare diseases, such as uveitis, have been mixed, more sophisticated epidemiological methodology may overcome many of these shortcomings. For example, uveitis is often a chronic or recurrent process; thus, using a second confirmatory diagnosis within a prespecified time interval after the first diagnosis can help improve accurate identification within claims data. This strategy already has been demonstrated to be effective in numerous other chronic diseases including hypertension and diabetes mellitus.10,11
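The confirmatory-diagnosis rule described above can be sketched as follows. This is an illustrative Python outline, not a validated algorithm: the 365-day window, the function name, and the patient records are assumptions chosen for the example, since the appropriate interval must be prespecified for each study.

```python
from datetime import date

def first_confirmed_date(dx_dates, window_days=365):
    """Earliest diagnosis date that is followed by a second diagnosis
    on a later date within window_days; None if no such pair exists.

    window_days is an assumed, study-specific parameter.
    """
    dates = sorted(set(dx_dates))  # one entry per distinct service date
    # Checking adjacent dates is sufficient: the next distinct date is
    # the closest later diagnosis.
    for first, second in zip(dates, dates[1:]):
        if (second - first).days <= window_days:
            return first
    return None

# Hypothetical patients and claim dates.
claims = {
    "p1": [date(2015, 1, 10), date(2015, 4, 2)],   # confirmed within window
    "p2": [date(2015, 1, 10)],                     # single code only
    "p3": [date(2014, 1, 1), date(2016, 6, 1)],    # second code too late
}
cases = {pid: first_confirmed_date(d) for pid, d in claims.items()}
```

Only p1 would count as a case here; p2 and p3 would be excluded by the rule.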

Similarly, in designing most studies, it is important to be able to differentiate a pre-existing condition from a new diagnosis. Requiring a look-back period, or disease-free interval, before the date a code first appeared in the medical record can help improve the accuracy of estimates of incident cases. For example, one study showed that using a look-back period of two years allowed for considerably more accurate estimates of disease incidence than a one-year look-back period for cataract, primary open angle glaucoma, age-related macular degeneration, and nonproliferative diabetic retinopathy.12
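A look-back requirement of this kind might be expressed as below. The sketch assumes continuous enrollment and that `first_code_date` is the earliest date the code appears anywhere in the available data, so the whole observed interval before it is code-free; the function name, parameter names, and dates are hypothetical.

```python
from datetime import date

def is_incident(enrollment_start, first_code_date, lookback_days=730):
    """True when the patient was observed, without the code, for at
    least lookback_days before the code first appeared.

    Assumes continuous enrollment from enrollment_start onward.
    """
    observed_days = (first_code_date - enrollment_start).days
    return observed_days >= lookback_days

# Hypothetical patient: enrolled 1 Jan 2011, first code on 1 Jun 2012
# (517 days of prior observation).
enrolled = date(2011, 1, 1)
first_code = date(2012, 6, 1)
incident_one_year = is_incident(enrolled, first_code, lookback_days=365)
incident_two_year = is_incident(enrolled, first_code, lookback_days=730)
```

This patient would be counted as incident under a one-year look-back but not under a two-year look-back, which illustrates how the longer window trades sample size for fewer misclassified prevalent cases.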

In addition to diagnosis codes, using supplemental billing codes for both therapeutic and diagnostic procedures can enhance the likelihood of a correct diagnosis in large electronic health datasets. For example, a patient newly diagnosed with proliferative diabetic retinopathy who also received panretinal photocoagulation on the same day reasonably can be concluded to truly have proliferative diabetic retinopathy. The procedural codes for diabetic retinopathy have been shown to have both high positive and negative predictive values.3 Outside of ophthalmology, the diagnostic accuracy for both rheumatoid arthritis and diabetes has been shown to be greatly improved when ICD coding is combined with medication use.13 Although this has yet to be proven in ophthalmology, it is reasonable to believe that similar concepts should apply to patients with eye disease. Again, taking uveitis as an example, supplementing a diagnosis code with medication fills of oral corticosteroids or immunosuppressants, or with procedure codes for corticosteroid injections, can be expected to improve the accuracy of uveitis case definitions in administrative datasets.
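One way to picture such a combined case definition is the Python sketch below, which requires a diagnosis code and a supporting procedure or medication code within a set number of days of each other. The labels `DX_PDR` and `PROC_PRP` are placeholders rather than real ICD or CPT codes, and the gap parameter is an assumption for illustration.

```python
from datetime import date

def meets_case_definition(claims, dx_codes, support_codes, max_gap_days=0):
    """True when any diagnosis code has a supporting procedure or drug
    code within max_gap_days (0 means the same day).

    claims: list of (service_date, code) pairs for one patient.
    """
    dx_dates = [d for d, code in claims if code in dx_codes]
    support_dates = [d for d, code in claims if code in support_codes]
    return any(
        abs((s - d).days) <= max_gap_days
        for d in dx_dates
        for s in support_dates
    )

DX = {"DX_PDR"}         # placeholder diagnosis label
SUPPORT = {"PROC_PRP"}  # placeholder procedure label

# Hypothetical patients.
same_day = [(date(2016, 3, 1), "DX_PDR"), (date(2016, 3, 1), "PROC_PRP")]
dx_only = [(date(2016, 3, 1), "DX_PDR")]
later_proc = [(date(2016, 3, 1), "DX_PDR"), (date(2016, 3, 20), "PROC_PRP")]

same_day_case = meets_case_definition(same_day, DX, SUPPORT)
dx_only_case = meets_case_definition(dx_only, DX, SUPPORT)
later_case_strict = meets_case_definition(later_proc, DX, SUPPORT)
later_case_30d = meets_case_definition(later_proc, DX, SUPPORT, max_gap_days=30)
```

The same structure would accept medication fills in `support_codes`, as in the uveitis example, with a wider gap where treatment typically follows diagnosis by days or weeks.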

While it is unlikely that any approach to identifying diagnoses in administrative data will achieve perfect accuracy, a variety of statistical methods are available to account for any residual error in outcome ascertainment. For example, if the sensitivity and specificity of an algorithm are known, they can be incorporated into subsequent statistical analyses to account for the probability that some cases will actually be misclassified controls and vice versa. Methods to account for imperfect sensitivity and specificity have been developed for both binary14 and survival outcomes15 and have been demonstrated to produce valid estimates of the association between exposures and outcomes. An alternative approach is to use quantitative bias analysis methods to construct an interval within which the true association is expected to lie under a variety of assumptions about the magnitude of the misclassification.16
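To make the idea concrete, the sketch below applies a simple algebraic correction for nondifferential outcome misclassification and then repeats it over several assumed sensitivity and specificity pairs to obtain a range for the risk ratio. It is a deliberately simplified stand-in, not the likelihood-based or probabilistic methods cited above, and all proportions and scenario values are hypothetical.

```python
def corrected_proportion(p_obs, sensitivity, specificity):
    """Back-calculate the true outcome proportion from the observed one:
    p_true = (p_obs + specificity - 1) / (sensitivity + specificity - 1).
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    p_true = (p_obs + specificity - 1.0) / denom
    return min(1.0, max(0.0, p_true))  # keep within [0, 1]

def corrected_risk_ratio(p_exposed, p_unexposed, sensitivity, specificity):
    """Risk ratio after correcting both groups with the same error rates."""
    p1 = corrected_proportion(p_exposed, sensitivity, specificity)
    p0 = corrected_proportion(p_unexposed, sensitivity, specificity)
    if p0 == 0:
        raise ValueError("corrected risk in the unexposed group is zero")
    return p1 / p0

def bias_analysis_range(p_exposed, p_unexposed, scenarios):
    """Smallest and largest corrected risk ratio across assumed
    (sensitivity, specificity) scenarios."""
    ratios = [corrected_risk_ratio(p_exposed, p_unexposed, se, sp)
              for se, sp in scenarios]
    return min(ratios), max(ratios)

# Hypothetical observed outcome proportions: 10% exposed, 6% unexposed.
observed_rr = 0.10 / 0.06
corrected_rr = corrected_risk_ratio(0.10, 0.06, sensitivity=0.90, specificity=0.98)
low, high = bias_analysis_range(
    0.10, 0.06, scenarios=[(0.90, 0.98), (0.80, 0.99), (0.95, 0.97)]
)
```

With these made-up inputs the observed risk ratio of about 1.67 corrects to 2.0, consistent with the usual expectation that nondifferential misclassification pulls estimates toward the null; the scenario range shows how sensitive that correction is to the assumed error rates.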

The steps outlined above can increase validity of analyses using diagnoses or outcomes derived from administrative data, but they are only part of the epidemiologic armamentarium used to reduce bias within administrative database studies. Major progress has been made using analytic methods to better address confounding. One example of this is the use of propensity scores.17 While propensity scores can be used in numerous ways, their primary goal is to create comparison groups with balanced measured covariates, and this methodology already has been frequently used in ophthalmic studies.18–20

One of the best examples of the benefits of propensity scores can be seen in the recent publication by Rim et al., who examined the effects of low dose aspirin on neovascular age-related macular degeneration (nAMD).19 When typical methods were used to control for covariates, aspirin was associated with an increased risk for nAMD. However, once a propensity score was used to create comparison groups with balanced covariates, no difference was seen.19 Intuitively, this makes sense, since patients taking low dose aspirin typically have cardiovascular disease and related comorbidities, which also are associated with AMD; the apparent association disappeared once these factors were properly accounted for via the propensity score.
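The mechanism can be illustrated with a toy example. The Python sketch below uses entirely synthetic counts (not data from Rim et al. or any other study) with a single binary confounder, estimates the propensity score as the proportion treated within each confounder stratum (a simplification of the regression models used in practice), and compares the crude risk ratio with one weighted by the inverse probability of treatment.

```python
from collections import defaultdict

def estimate_propensity(records):
    """P(treated | confounder), estimated as the treated fraction in
    each confounder stratum. records: (confounder, treated, outcome)."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for confounder, is_treated, _ in records:
        total[confounder] += 1
        treated[confounder] += is_treated
    return {c: treated[c] / total[c] for c in total}

def iptw_weights(records, propensity):
    """Inverse probability of treatment weights."""
    return [
        1.0 / propensity[c] if t else 1.0 / (1.0 - propensity[c])
        for c, t, _ in records
    ]

def risk_ratio(records, weights=None):
    """(Weighted) outcome risk in treated divided by risk in untreated."""
    if weights is None:
        weights = [1.0] * len(records)
    events = {0: 0.0, 1: 0.0}
    totals = {0: 0.0, 1: 0.0}
    for (_, t, y), w in zip(records, weights):
        totals[t] += w
        events[t] += w * y
    return (events[1] / totals[1]) / (events[0] / totals[0])

def expand(confounder, treated, n, n_events):
    """Build n individual records, n_events of them with the outcome."""
    return ([(confounder, treated, 1)] * n_events
            + [(confounder, treated, 0)] * (n - n_events))

# Synthetic cohort: the confounder raises both treatment use and outcome
# risk, but treatment has no effect within either stratum (10% vs 2%).
records = (
    expand(1, 1, 800, 80) + expand(1, 0, 200, 20)
    + expand(0, 1, 200, 4) + expand(0, 0, 800, 16)
)
crude_rr = risk_ratio(records)
weights = iptw_weights(records, estimate_propensity(records))
weighted_rr = risk_ratio(records, weights)
```

In this constructed cohort the crude risk ratio is about 2.33, while the weighted risk ratio is 1.0, because weighting balances the confounder across the treated and untreated groups. Matching or stratifying on the propensity score are alternative uses of the same score.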

When performing clinical research of any type, we always must consider the advantages and disadvantages of any database or analytic technique used. As such, no single methodology should ever be considered above reproach. The usual considerations regarding the positive and negative aspects of any research methodology hold true for observational studies, including those conducted in administrative datasets. For instance, it is not surprising when a study finds that a disease without a specific ICD-9 or ICD-10 code cannot be accurately ascertained in a database based on ICD-9/10 coding.8 However, the approaches reviewed here are examples of strategies to enhance the validity of observational research using administrative databases. In appropriate applications, such methods should overcome the major difficulties of using administrative databases for epidemiological research. Administrative databases offer a powerful tool for answering previously unanswerable questions, but their limitations must be acknowledged, just as with other study designs.

Extensive work has been done to improve the quality of research performed on administrative databases across all of medicine, offering better methods to validate the exposures, diagnoses, and outcomes studied. Over the past 15–20 years, several new analytic techniques have been developed specifically to better utilize “big data”; however, these methods are just starting to be implemented in ophthalmic studies. As the epidemiologic community continues to develop these tools in other fields of medicine, ophthalmology has an opportunity to take advantage of these advanced analytic techniques to address some of the important clinical questions in vision research.

Acknowledgments

Financial Support: National Institutes of Health K23 Award (1K23EY025729-01) and University of Pennsylvania Core Grant for Vision Research (2P30EY001583). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. Additional funding was provided by Research to Prevent Blindness and the Paul and Evanina Mackall Foundation. Funding from each of the above sources was received in the form of block research grants to the Scheie Eye Institute. Additional funding was provided by the Massachusetts Eye and Ear Global Surgery Program, and Sight for Souls (Philadelphia, PA). None of the organizations had any role in the design or conduct of the study.

Footnotes

Conflicts of Interest: No conflicting relationship exists for any author.

References

1. Muir KW, Gupta C, Gill P, Stein JD. Accuracy of international classification of diseases, ninth revision, clinical modification billing codes for common ophthalmic conditions. JAMA Ophthalmol 2013;131(1):119–120.
2. Bearelly S, Mruthyunjaya P, Tzeng JP, et al. Identification of patients with diabetic macular edema from claims data: a validation study. Arch Ophthalmol 2008;126(7):986–989.
3. Lau M, Prenner JL, Brucker AJ, VanderBeek BL. Accuracy of Billing Codes Used in the Therapeutic Care of Diabetic Retinopathy. JAMA Ophthalmol 2017;135(7):791–794.
4. Quigley HA, Friedman DS, Hahn SR. Evaluation of practice patterns for the care of open-angle glaucoma compared with claims data: the Glaucoma Adherence and Persistency Study. Ophthalmology 2007;114(9):1599–1606.
5. Javitt JC, McBean AM, Sastry SS, DiPaolo F. Accuracy of coding in Medicare part B claims. Cataract as a case study. Arch Ophthalmol 1993;111(5):605–607.
6. Coleman AL, Morgenstern H. Use of insurance claims databases to evaluate the outcomes of ophthalmic surgery. Surv Ophthalmol 1997;42(3):271–278.
7. Pimentel MA, Browne EN, Janardhana PM, et al. Assessment of the Accuracy of Using ICD-9 Codes to Identify Uveitis, Herpes Zoster Ophthalmicus, Scleritis, and Episcleritis. JAMA Ophthalmol 2016;134(9):1001–1006.
8. Palestine AG, Merrill PT, Saleem SM, Jabs DA, Thorne JE. Assessing the Precision of ICD-10 Codes for Uveitis in 2 Electronic Health Record Systems. JAMA Ophthalmol 2018.
9. Uchiyama E, Faez S, Nasir H, et al. Accuracy of the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) as a research tool for identification of patients with uveitis and scleritis. Ophthalmic Epidemiol 2015;22(2):139–141.
10. Tu K, Campbell NR, Chen ZL, Cauch-Dudek KJ, McAlister FA. Accuracy of administrative databases in identifying patients with hypertension. Open Med 2007;1(1):e18–26.
11. Khokhar B, Jette N, Metcalfe A, et al. Systematic review of validated case definitions for diabetes in ICD-9-coded and ICD-10-coded data in adult populations. BMJ Open 2016;6(8):e009952.
12. Stein JD, Blachley TS, Musch DC. Identification of persons with incident ocular diseases using health care claims databases. Am J Ophthalmol 2013;156(6):1169–1175 e1163.
13. Lipscombe LL, Hwee J, Webster L, Shah BR, Booth GL, Tu K. Identifying diabetes cases from administrative data: a population-based validation study. BMC Health Serv Res 2018;18(1):316.
14. Magder LS, Hughes JP. Logistic regression when the outcome is measured with uncertainty. Am J Epidemiol 1997;146(2):195–203.
15. Meier AS, Richardson BA, Hughes JP. Discrete proportional hazards models for mismeasured outcomes. Biometrics 2003;59(4):947–954.
16. Fox MP, Lash TL, Greenland S. A method to automate probabilistic sensitivity analyses of misclassified binary variables. Int J Epidemiol 2005;34(6):1370–1376.
17. Austin PC. An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies. Multivariate Behav Res 2011;46(3):399–424.
18. Kim DH, Addis VM, Pan W, VanderBeek BL. Comparative Effectiveness of Generic Latanoprost Versus Branded Prostaglandin Analogs for Primary Open Angle Glaucoma. Ophthalmic Epidemiol 2018:1–9.
19. Rim TH, Yoo TK, Kwak J, et al. Long-term Regular Use of Low-dose Aspirin and Neovascular Age-related Macular Degeneration: National Sample Cohort 2010–2015. Ophthalmology 2018.
20. Kolomeyer AM, Maguire MG, Pan W, VanderBeek BL. Systemic Beta-Blockers and Risk of Progression to Neovascular Age-Related Macular Degeneration. Retina 2018.
