Author manuscript; available in PMC: 2014 Jul 29.
Published in final edited form as: J Urol. 2013 Apr 20;190(1):17–18. doi: 10.1016/j.juro.2013.04.048

Utility and Pitfalls in the Use of Administrative Databases for Outcomes Assessment

Emilie K Johnson 1, Caleb P Nelson 1
PMCID: PMC4114235  NIHMSID: NIHMS602014  PMID: 23608038

Introduction and Context

In this issue of The Journal Lee et al (page 000) examine rates of reoperation and predictors of complications in patients with hypospadias.1 This article represents an interesting example of the use of administrative data to address a clinical question. We discuss the uses and limitations of administrative databases.

Administrative data (also known as claims data or secondary data) are data collected for nonresearch purposes (often for billing) that can be analyzed retrospectively for research. Examples include Medicare claims, the Nationwide Inpatient Sample, proprietary insurance claims databases and the Pediatric Health Information System Database. When used for research, such data sets typically contain limited, de-identified information regarding hospitals, clinicians and patients (including demographics and clinical information via ICD and/or CPT diagnosis and procedure codes) as these pertain to specific clinical encounters.

Value of Administrative Data

Administrative databases can facilitate research into clinical or health services questions that would be impractical or impossible to study with conventional techniques. Many clinical research studies are limited by small sample size, restricted generalizability and missing data. Studies that adequately address these problems, with large, prospective, systematic observational or experimental designs, are expensive and time-consuming, and available resources can only fund a small fraction of such investigations. In many cases administrative databases can provide large, demographically diverse, multicenter cohorts at a fraction of the time and cost. In addition, for investigations of health care quality, disparities and economics, administrative data may offer the only feasible way to study the broadly representative samples necessary for these topics. Administrative data may also be useful for studying rare conditions, allowing investigators to aggregate clinical information from many sites. Another use for such data is hypothesis generation, in which observed trends or associations may spawn further clinical research.

Pitfalls of Administrative Data

Researchers and clinicians have often expressed skepticism regarding the validity of administrative databases for clinical research. The most common criticism pertains to the accuracy of billing codes used to classify diagnoses and procedures.2 In response, a virtual cottage industry has arisen to validate billing codes by correlating them with medical records. Such studies have mixed findings. Some show reasonable reliability,3, 4 while others have found poor correlation.5, 6 Coding accuracy varies by data source, condition or procedure as well as by disease definitions.7 In general, procedural codes, particularly combined with diagnosis codes, are more accurate and precise than diagnosis codes alone.8, 9 Coding accuracy may also suffer due to clerical error, the limited precision of codes to describe conditions or procedures in detail, and omission of comorbidity codes by billing or coding staff due to perceived irrelevance to a particular encounter.
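To make concrete what a code validation study measures, the following is a minimal sketch that compares billing codes against chart review as the reference standard. The function and field names are hypothetical and are not drawn from any of the cited studies; the metrics shown (sensitivity, specificity and positive predictive value) are the standard ones for this kind of comparison.

```python
from dataclasses import dataclass


@dataclass
class ValidationResult:
    sensitivity: float
    specificity: float
    ppv: float


def validate_codes(coded, chart):
    """Compare billing codes with chart review (the reference standard).

    coded[i] is True if the billing code of interest was present for patient i.
    chart[i] is True if chart review confirmed the condition for patient i.
    Both inputs are hypothetical illustrations, not a real data format.
    """
    if len(coded) != len(chart):
        raise ValueError("coded and chart must have the same length")

    pairs = list(zip(coded, chart))
    tp = sum(1 for c, r in pairs if c and r)          # coded and confirmed
    fp = sum(1 for c, r in pairs if c and not r)      # coded but not confirmed
    fn = sum(1 for c, r in pairs if not c and r)      # confirmed but not coded
    tn = sum(1 for c, r in pairs if not c and not r)  # neither

    def ratio(num, den):
        # Return NaN rather than raising when a denominator is empty.
        return num / den if den else float("nan")

    return ValidationResult(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        ppv=ratio(tp, tp + fp),
    )
```

A code with high positive predictive value but low sensitivity identifies true cases reliably while missing many of them, which matters differently for cohort identification than for incidence estimates.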

Furthermore, claims databases can be unwieldy to work with because data are often stored raw rather than in analysis-ready form. Significant resources and expertise may be required for data cleaning and management. Most administrative data sources also lack followup and outcome data, the available information is of limited granularity, and de-identification precludes the collection of additional variables, all of which increase the risk of unmeasured confounding.

Addressing Limitations of Administrative Data

As clinical and outcomes research has evolved, an increasing breadth of strategies has become available to mitigate the limitations of claims data. Whenever possible, incorporating both diagnostic and procedural codes into case definitions is advisable. When the level of clinical detail is insufficient, linkage of administrative data with clinical data sources such as survey data or clinical data registries can be useful. Advanced analytic techniques including propensity score and instrumental variable analyses are also increasingly being used to balance confounding factors among patient groups that are inherently different at baseline. Finally, sophisticated data warehouse tools such as Informatics for Integrating Biology and the Bedside (https://www.i2b2.org) will allow researchers to harness the advantages of administrative and clinical data elements.
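As an illustration of a case definition that combines diagnosis and procedure codes, consider the minimal sketch below. The encounter layout and the code values ("DX_A", "PX_1" and so on) are hypothetical placeholders, not real ICD or CPT codes, and the function is an assumption made for illustration rather than a method taken from any cited study.

```python
def meets_case_definition(encounter, dx_codes, px_codes, require_both=True):
    """Return True if an encounter satisfies the case definition.

    encounter: dict with "diagnoses" and "procedures" lists of code strings
               (a hypothetical layout, not a real database schema).
    dx_codes, px_codes: sets of qualifying diagnosis and procedure codes.
    require_both: if True, a diagnosis code AND a procedure code must be
                  present (stricter); if False, either one suffices.
    """
    has_dx = bool(dx_codes & set(encounter.get("diagnoses", [])))
    has_px = bool(px_codes & set(encounter.get("procedures", [])))
    return (has_dx and has_px) if require_both else (has_dx or has_px)


# Hypothetical placeholder codes for illustration only.
DX = {"DX_A", "DX_B"}
PX = {"PX_1"}

encounters = [
    {"id": 1, "diagnoses": ["DX_A"], "procedures": ["PX_1"]},  # both present
    {"id": 2, "diagnoses": ["DX_A"], "procedures": []},        # diagnosis only
    {"id": 3, "diagnoses": ["DX_Z"], "procedures": ["PX_1"]},  # procedure only
]

strict_cohort = [e["id"] for e in encounters if meets_case_definition(e, DX, PX)]
loose_cohort = [
    e["id"] for e in encounters
    if meets_case_definition(e, DX, PX, require_both=False)
]
```

Requiring both code types shrinks the cohort but should reduce misclassification, consistent with the observation that procedure codes combined with diagnosis codes tend to be more accurate than diagnosis codes alone.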

Discussion

Lee et al illustrate the promise and limitations of administrative database research. In their study the main end point was the incidence of secondary surgery, defined as any of several CPT codes occurring during the 2-year study period after primary hypospadias repair. However, since many of the same CPT codes could apply to a primary repair or a redo repair, this distinction was based largely on chronology. It is possible that additional (unrecorded) repairs occurred outside of the study period or at another institution. This article also demonstrates perhaps the single biggest weakness of using administrative data for outcomes assessment, which is that such databases rarely include discrete patient outcomes. By choosing secondary surgery as an outcome, the authors are inferring that this end point is an indication of a poor result from the primary repair. While this may be a logical assumption, secondary surgery is only an intermediate end point, serving as a marker for the true clinical outcome (eg voiding function, cosmesis, sexual function etc). It is even possible, albeit unlikely, that the clinical outcome was actually better in those who underwent a secondary surgery. Unfortunately, direct measures of clinical outcomes (objectively measured and/or patient reported) are rarely available using administrative data sources.10 However, in the future, administrative databases may be substantially improved by the incorporation of direct outcome measures such as patient reported symptom scores or health related quality of life survey scores.
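The chronology-based distinction between primary and secondary repair can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of Lee et al: the code values are hypothetical placeholders, the 730-day window approximates the 2-year period, and the earliest recorded repair is assumed to be the primary one.

```python
from datetime import date, timedelta


def had_secondary_surgery(procedures, repair_codes, window_days=730):
    """Classify a patient by chronology alone.

    procedures: list of (date, code) tuples for one patient (hypothetical layout).
    repair_codes: set of codes that may denote either a primary or a redo repair.
    window_days: assumed follow-up window after the first recorded repair.

    Returns None if no repair is recorded, otherwise True or False.
    """
    repair_dates = sorted(d for d, code in procedures if code in repair_codes)
    if not repair_dates:
        return None

    index_date = repair_dates[0]  # assumed to be the primary repair
    cutoff = index_date + timedelta(days=window_days)
    # Repairs on the index date itself are treated as part of the primary surgery.
    return any(index_date < d <= cutoff for d in repair_dates[1:])


REPAIR_CODES = {"REPAIR_A", "REPAIR_B"}  # placeholders, not real CPT codes

patient = [
    (date(2010, 1, 10), "REPAIR_A"),
    (date(2010, 9, 1), "REPAIR_B"),
]
```

The sketch makes the limitations visible: a repair performed before the data begin would cause a redo to be mislabeled as the index procedure, and repairs after the cutoff or at an institution outside the database are never seen.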

Conclusions

In the meantime, the pitfalls of administrative data should not dissuade us from leveraging these remarkable resources when appropriate. All methodologies in clinical research have limitations, from the single surgeon retrospective case series to the multicenter randomized controlled trial. Administrative data sets represent an extremely powerful tool in the armamentarium of clinical researchers, albeit one that must be used with caution and a thorough understanding of its limitations. Used properly, such data can have a key role as we work toward the common goal of accurately assessing and improving patient outcomes.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  • 1. Lee OT, Durbin-Johnson B, Kurzrock EA. Predictors of secondary surgery after hypospadias repair: a population based analysis of 5,000 patients. J Urol. 2013;190 doi: 10.1016/j.juro.2013.01.091. xxx.
  • 2. van Walraven C, Bennett C, Forster AJ. Administrative database research infrequently used validated diagnostic or procedural codes. J Clin Epidemiol. 2011;64:1054. doi: 10.1016/j.jclinepi.2011.01.001.
  • 3. Semins MJ, Trock BJ, Matlaga BR. Validity of administrative coding in identifying patients with upper urinary tract calculi. J Urol. 2010;184:190. doi: 10.1016/j.juro.2010.03.011.
  • 4. Tamariz L, Harkins T, Nair V. A systematic review of validated methods for identifying ventricular arrhythmias using administrative and claims data. Pharmacoepidemiol Drug Saf. 2012;21:148. doi: 10.1002/pds.2340.
  • 5. Woodworth GF, Baird CJ, Garces-Ambrossi G, et al. Inaccuracy of the administrative database: comparative analysis of two databases for the diagnosis and treatment of intracranial aneurysms. Neurosurgery. 2009;65:251. doi: 10.1227/01.NEU.0000347003.35690.7A.
  • 6. Khwaja HA, Syed H, Cranston DW. Coding errors: a comparative analysis of hospital and prospectively collected departmental data. BJU Int. 2002;89:178. doi: 10.1046/j.1464-4096.2001.01428.x.
  • 7. Southern DA, Roberts B, Edwards A, et al. Validity of administrative data claim-based methods for identifying individuals with diabetes at a population level. Can J Public Health. 2010;101:61. doi: 10.1007/BF03405564.
  • 8. Wang MC, Laud PW, Macias M, et al. Strengths and limitations of International Classification of Disease Ninth Revision Clinical Modification codes in defining cervical spine surgery. Spine (Phila Pa 1976). 2011;36:E38. doi: 10.1097/BRS.0b013e3181d273f6.
  • 9. Tanpowpong P, Broder-Fingert S, Obuch JC, et al. Multicenter study on the value of ICD-9-CM codes for case identification of celiac disease. Ann Epidemiol. 2013;23:136. doi: 10.1016/j.annepidem.2012.12.009.
  • 10. Tollefson MK, Gettman MT, Karnes RJ, et al. Administrative data sets are inaccurate for assessing functional outcomes after radical prostatectomy. J Urol. 2011;185:1686. doi: 10.1016/j.juro.2010.12.039.