Introduction
There is increasing interest in the use of administrative data in health services and clinical research. Administrative data are routinely collected during clinic, hospital, laboratory, or pharmacy visits for administrative purposes.1 Administrative databases provide easy and cheap access to large numbers of patients over expansive geographic regions. Although these databases were initially designed to reimburse health care services and to track differences in services and their use for state and national agencies, they are increasingly being used for epidemiological, effectiveness, and safety outcomes research. However, there are several limitations that must be considered, and critical appraisal of studies that utilize administrative databases is important.
Publicly Available Databases
Several administrative databases are available. In the United States, the largest publicly available all-payer inpatient care database is the Nationwide Inpatient Sample (NIS). The NIS was developed for the Healthcare Cost and Utilization Project (HCUP) through funding from the Agency for Healthcare Research and Quality (AHRQ). The NIS contains hospital stay data starting in 1988 that include diagnoses, admission and discharge information, demographics, and outcomes data from a sample of approximately 20% of patients admitted to all community hospitals in the United States (5–8 million patients per year).2
The HCUP also provides several other health care databases besides the NIS, including the Kids' Inpatient Database (KID), the Nationwide Emergency Department Sample (NEDS), as well as a variety of state databases. HCUP has software tools available that allow users to access information from the databases. HCUP was created to provide a robust source of health care data that could be used to further research, improve health care, and inform decision making.
Another large database in the United States is the Centers for Medicare and Medicaid Services (CMS) administrative data files, which contain information on approximately 98% of adults 65 years of age and older enrolled in Medicare (more than 45 million people). Data from these administrative databases are useful in health care research in that they provide clinical validity, information on population coverage, and linkage to other data sets.3
Increasingly, commercially available services such as PearlDiver4 and IMS Health Incorporated5 are being used by universities, medical device manufacturers, and government agencies. These companies utilize large health claims databases composed of records from private insurers, government databases, pharmacy prescriptions, and manufacturers to provide clinical effectiveness and health care management services.
Strengths of Administrative Data
Administrative data sets provide a readily available source of “real-world” health care data on a large population of unselected patients.6 Because of the sheer numbers of patients included in databases such as the NIS, the data are considered to be representative of the populations of interest.7 Administrative databases can serve as useful and inexpensive resources for reliably reported data associated with accepted coding systems, including procedure volumes and length of stay, as well as reliably reported outcomes such as death.1 7 Furthermore, administrative data can be used to evaluate health care utilization as well as outcomes that differ by patient demographics or geographical locale.6
Limitations of Administrative Data
One limitation inherent in administrative data is the reason for their creation. Because they are typically intended for financial and administrative management rather than for research purposes, they may vary in the degree of detail and accuracy.7 8 9 For example, they may prove to be less reliable information sources for events that may not result in a medical visit or use of a diagnostic code, such as nausea. Furthermore, the coding of administrative data may be nuanced in terms of how International Classification of Diseases, Ninth Revision (ICD-9) codes are applied or how physician records are interpreted by the medical reviewer entering the codes.6 7 One recent report showed data suggestive of underreporting of perioperative stroke occurring with carotid endarterectomy and stenting in the NIS data set,8 whereas another reported the complexity in evaluating national rates of mortality from pneumonia due to changing coding practices.10
Critical Appraisal of Administrative Data Studies
Guidelines to govern high-quality administrative database studies are presently under development by the Reporting of Studies Conducted using Observational Routinely Collected Data collaborative.11 12 However, criteria that constitute high-quality administrative database studies have recently been proposed.12 13 Here, we have summarized such proposed criteria for critical appraisal of administrative studies (Table 1). These criteria can act as a checklist of things to consider if you are planning a study using administrative data. As described in previous “Science in Spine” articles, using a focused, answerable research question and the PICOTS/PPOTS framework are important to planning your study.
Table 1. Spectrum Research checklist for evaluating the quality of administrative database studies.
Methodological principle |
---|
•Study design |
Administrative database comparative study |
Administrative database case–control study |
Administrative database case series |
•Why database was created clearly stated |
•Description of database's inclusion/exclusion criteria |
•Description of methods for reducing bias in database |
•Codes and search algorithms reported |
•Rationale for coding algorithm reported |
•Code accuracy reported |
•Code validity reported |
•Clinical significance assessed |
•Is the period of data consistent with the outcome data? |
•Statement regarding whether data stems from single or multiple hospital admissions |
•Statement regarding whether data stems from single or multiple procedures |
•Accounting for clustering |
Number of criteria met (maximum: 12) |
Robust Descriptions of the Data Set
Clear descriptions should be provided regarding how and why the database was created.12 13 To that end, the database's inclusion and exclusion criteria should be clearly stated. The reader then can use these descriptions to assess the potential for biased or missing information as it relates to the study at hand.13
Code Accuracy
Because administrative data are coded, administrative database studies should clearly state the diagnostic and/or procedural codes used in the search algorithm as well as the reason for selecting the codes. In addition, the accuracy of the codes to identify a particular diagnosis or outcome should be reported to provide an estimate of the percentage of misclassified data. This information provides insight as to how well the code(s) represent the actual diagnosis, procedure, or outcome and allows the reader to gauge the level of resulting bias. Code accuracy can be measured using several different types of code validation studies,13 the most reliable of which are “gold standard” validation studies. These studies compare the code to a gold standard known to provide accurate information, such as laboratory test results required for diagnosis. Ideally, code validity statistics will be reported in terms of the probability that a patient identified with a code actually has the condition of interest, although other methods such as positive predictive value, sensitivity and specificity, and positive likelihood ratio may also be used.13
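As a minimal sketch of the summary measures named above, the following Python example computes sensitivity, specificity, positive predictive value, and positive likelihood ratio from the 2×2 table of a gold standard validation study. The class name, field names, and counts are hypothetical and chosen only for illustration; they are not taken from any cited study.

```python
from dataclasses import dataclass


@dataclass
class ValidationTable:
    """Counts from a hypothetical gold standard code validation study."""

    true_pos: int   # code present, condition confirmed by the gold standard
    false_pos: int  # code present, condition absent
    false_neg: int  # code absent, condition confirmed by the gold standard
    true_neg: int   # code absent, condition absent

    def sensitivity(self) -> float:
        # Share of patients with the condition who carry the code.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Share of patients without the condition who lack the code.
        return self.true_neg / (self.true_neg + self.false_pos)

    def positive_predictive_value(self) -> float:
        # Share of coded patients who truly have the condition.
        return self.true_pos / (self.true_pos + self.false_pos)

    def positive_likelihood_ratio(self) -> float:
        # How much more likely the code is in patients with the condition.
        return self.sensitivity() / (1.0 - self.specificity())


# Illustrative counts only (assumption, not real data).
table = ValidationTable(true_pos=80, false_pos=20, false_neg=40, true_neg=860)

print(f"Sensitivity: {table.sensitivity():.3f}")
print(f"Specificity: {table.specificity():.3f}")
print(f"Positive predictive value: {table.positive_predictive_value():.3f}")
print(f"Positive likelihood ratio: {table.positive_likelihood_ratio():.1f}")
```

In this made-up table, one in five coded patients does not have the condition, which gives the reader a direct sense of how much misclassification the code introduces.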
Clinical Significance
Because very small differences between groups can reach statistical significance in large database studies, results should not be interpreted based solely on the p value; such differences may not be clinically relevant. Instead, results should be interpreted based on clinical relevance and on the absolute and relative differences between treatment groups.13
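The effect of sample size on the p value can be illustrated with a short Python sketch. It applies a pooled two-proportion z-test, chosen here only as a simple example of a significance test, to the same hypothetical complication rates (2.0% versus 1.8%) in a small and a very large study. The function name and all numbers are assumptions made for illustration.

```python
import math


def two_proportion_p_value(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """Two-sided p value from a pooled two-proportion z-test."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical complication rates of 2.0% versus 1.8% (illustrative only).
p_small_study = two_proportion_p_value(20, 1_000, 18, 1_000)
p_large_study = two_proportion_p_value(20_000, 1_000_000, 18_000, 1_000_000)

# The effect size is identical in both studies.
absolute_difference = 0.020 - 0.018  # 0.2 percentage points
relative_risk = 0.020 / 0.018        # about 1.11

print(f"n = 1,000 per group:     p = {p_small_study:.3f}")
print(f"n = 1,000,000 per group: p = {p_large_study:.2e}")
print(f"Absolute difference: {absolute_difference:.3f}, relative risk: {relative_risk:.2f}")
```

The absolute and relative differences are the same in both scenarios; only the p value changes, falling far below 0.05 in the large study while remaining well above it in the small one. Whether a 0.2 percentage point difference matters clinically is a separate judgment that the p value cannot make.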
Time-Dependent Bias
Time-dependent patient variables are those which can change during the period of observation. If the values of such variables are unknown at baseline but are assessed as if they were known, time-dependent bias of the results may occur.13 Other factors that should be considered include whether the data set covers a time period consistent with the length of follow-up for the outcome data; whether it includes data from the initial hospital admission alone or also from repeat admissions; and whether it includes data from the first procedure only or also from repeat procedures.
Clustering
Because data obtained from administrative data sets are subject to clustering, a study should properly account for clustering that may be present in the data set. One example of clustering is a specific diagnosis (e.g., acute myocardial infarction) treated by emergency room physicians only within academic hospitals. Multivariate regression models can be used to control for clustering and avoid the potential for misleading conclusions.13
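To give a sense of why clustering matters, the sketch below uses the design effect, a standard survey-sampling approximation, rather than the regression models mentioned above. It estimates how many independent patients a clustered sample is "worth" when patients within the same hospital resemble one another. The function names, the intraclass correlation of 0.05, and the sample sizes are all assumptions made for illustration.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n_patients: int, mean_cluster_size: float, icc: float) -> float:
    """Number of independent patients carrying the same information."""
    return n_patients / design_effect(mean_cluster_size, icc)


# Hypothetical study: 10,000 patients treated in 50 hospitals (200 each),
# with an assumed intraclass correlation coefficient of 0.05.
n_patients = 10_000
patients_per_hospital = 200
icc = 0.05

deff = design_effect(patients_per_hospital, icc)
n_eff = effective_sample_size(n_patients, patients_per_hospital, icc)

print(f"Design effect: {deff:.2f}")
print(f"Effective sample size: {n_eff:.0f} of {n_patients} patients")
```

Under these assumed values, an analysis that treats all 10,000 patients as independent would overstate the precision of its estimates by roughly an order of magnitude, which is how ignoring clustering can produce misleading conclusions.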
Summary
Administrative data provide researchers with relatively inexpensive access to large numbers of patients nationwide and are increasingly being used for epidemiological, effectiveness, and safety outcomes studies. Publicly available databases from sources such as the NIS and CMS provide information on large proportions of medical visits in the United States and are a good source of reliably reported “real-world” health care data. However, because administrative data are primarily gathered for billing purposes rather than research purposes, there are several limitations that must be considered, including the potential for inaccuracy and bias. As for all study types, critical appraisal of administrative database studies is essential to avoid arriving at inaccurate conclusions.
References
- 1.Agency for Healthcare Research and Quality (AHRQ). Methods Guide for Effectiveness and Comparative Effectiveness Reviews. AHRQ Publication No. 10(14)-EHC063-EF. Rockville, MD; 2014. Available at: www.effectivehealthcare.ahrq.gov. Accessed September 4, 2014
- 2.Agency for Healthcare Research and Quality (AHRQ). HCUP Databases, Healthcare Cost and Utilization Project (HCUP): Overview of the Nationwide Inpatient Sample (NIS), 2013. Available at: http://www.hcup-us.ahrq.gov/nisoverview.jsp
- 3.Virnig B, Maderia A D. Strengths and Limitations of CMS Administrative Data in Research. 2012. Available at: http://www.resdac.org/resconnect/articles/156#broad-limitations-of-cms-administrative-data. Accessed May 7, 2013
- 4.PearlDiver, Inc. Available at: http://www.pearldiverinc.com/. Accessed September 4, 2014
- 5.IMS Health. Available at: http://www.imshealth.com. Accessed September 4, 2014
- 6.Sarrazin M S, Rosenthal G E. Finding pure and simple truths with administrative data. JAMA. 2012;307(13):1433–1435. doi: 10.1001/jama.2012.404.
- 7.Romano P S, Hussey P, Ritley D. Selecting quality and resource use measures: a decision guide for community quality collaboratives. Question 2: What are the strengths and weaknesses of using administrative data, medical record data, and hybrid data? Rockville, MD: Agency for Healthcare Research and Quality; May 2010. AHRQ Publication No. 09(10)-0073, 2010. Available at: http://www.ahrq.gov/legacy/qual/perfmeasguide/perfmeaspt1a.htm. Accessed May 7, 2013
- 8.Hertzer N R. Reasons why data from the Nationwide Inpatient Sample can be misleading for carotid endarterectomy and carotid stenting. Semin Vasc Surg. 2012;25(1):13–17. doi: 10.1053/j.semvascsurg.2012.02.003.
- 9.McPhee J T, Schanzer A, Messina L M, Eslami M H. Carotid artery stenting has increased rates of postprocedure stroke, death, and resource utilization than does carotid endarterectomy in the United States, 2005. J Vasc Surg. 2008;48(6):1442–1450.e1.
- 10.Lindenauer P K, Lagu T, Shieh M S, Pekow P S, Rothberg M B. Association of diagnostic coding with trends in hospitalizations and mortality of patients with pneumonia, 2003-2009. JAMA. 2012;307(13):1405–1413. doi: 10.1001/jama.2012.384.
- 11.RECORD 2014. Available at: http://record-statement.org/. Accessed September 4, 2014
- 12.Langan S M, Benchimol E I, Guttmann A. et al. Setting the RECORD straight: developing a guideline for the reporting of studies conducted using observational routinely collected data. Clin Epidemiol. 2013;5:29–31. doi: 10.2147/CLEP.S36885.
- 13.van Walraven C, Austin P. Administrative database research has unique characteristics that can risk biased results. J Clin Epidemiol. 2012;65(2):126–131. doi: 10.1016/j.jclinepi.2011.08.002.