Canadian Urological Association Journal
Letter. 2019 Jul 23;14(2):70–72. doi: 10.5489/cuaj.6158

Routinely collected data for population-based outcomes research

Blayne Welk
PMCID: PMC7012287  PMID: 31348747

Introduction

Routinely collected data (also known as administrative data) is the data source for many studies, addressing questions that range from epidemiological trends over time to clinically relevant associations between risk factors and disease. This data comes from databases that record information for a purpose other than medical research, such as hospital or physician reimbursement.

There are several strengths of routinely collected data studies:

  1. Low study costs

  2. Rapid study completion

  3. Good for estimating incidence/prevalence in a population

  4. Often have large sample sizes and significant statistical power

  5. Better generalizability to the real world

  6. Prolonged retrospective study periods are possible

  7. Longitudinal followup across providers and regions may be possible

  8. Improved feasibility for studying rare populations, exposures, and outcomes

  9. Can study outcomes or exposures that would be unethical in a prospective study

  10. Well-suited for measuring geographical variation

There are also potential limitations that must be considered when conducting or reading a routinely collected data study:

  1. The validity and reliability of the data elements may be poor

  2. Often not all clinically relevant variables are present

  3. Results may not be hypothesis-driven and could represent a spurious association or demonstrate a statistically significant result that is not clinically relevant

  4. Data collection methods or coding practices may change over time, and this may not be evident to the researcher

Epidemiological considerations

Routinely collected data is usually used either to describe something (e.g., incidence of a disease, changes in treatment over time, or resource utilization) or to perform an observational study. Observational studies carry potential biases, a few of which are particularly relevant to studies that use routinely collected data:

  1. Selection bias occurs when the study population is not a random sample of the target population to which you wish to generalize your results. For example, most randomized, controlled trials have strict inclusion/exclusion criteria; however, physicians use the interventions studied in those trials on patients who would not have been eligible for the trial, with the assumption that the results will be similar

  2. Information bias occurs when a variable is not measured accurately, leading to misclassification or measurement error. While prospective studies can explicitly define a method of measurement that maximizes accuracy (for example, taking three blood pressure readings three minutes apart after the patient has rested in the seated position for two minutes), this is usually not the case with routinely collected data variables, because administrative data elements are not created or recorded for the purposes of research. Often, indicator variables are used to represent a clinical condition (for example, in a clinical study, pathology data would be used to determine whether a patient had prostate cancer, whereas in an administrative data study, a physician code for the performance of a radical prostatectomy might be used as a marker for prostate cancer). If misclassification or measurement error is random, it biases the results towards the null, as confidence intervals widen due to more “noise” in the data; if it is not random, it can significantly affect the results and lead to completely mistaken conclusions.1 The sketch below illustrates the random case
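
A minimal simulation (in Python) makes the random, non-differential case concrete. This sketch is not from the letter: the exposure prevalence, outcome risks, and the 85% sensitivity/90% specificity of the hypothetical administrative code are all assumed values chosen for illustration.

    # Simulate non-differential misclassification of an exposure and show that
    # it biases the odds ratio toward the null. All parameters are hypothetical.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000

    # True exposure (30% prevalence) and an outcome with a true odds ratio of ~3
    exposure = rng.random(n) < 0.30
    outcome = rng.random(n) < np.where(exposure, 0.06, 0.02)

    def odds_ratio(exp, out):
        a = np.sum(exp & out)     # exposed with outcome
        b = np.sum(exp & ~out)    # exposed without outcome
        c = np.sum(~exp & out)    # unexposed with outcome
        d = np.sum(~exp & ~out)   # unexposed without outcome
        return (a * d) / (b * c)

    # Misclassify the exposure at random, independent of the outcome:
    # the "code" has 85% sensitivity and 90% specificity
    flip = np.where(exposure, rng.random(n) > 0.85, rng.random(n) > 0.90)
    coded_exposure = exposure ^ flip

    print(f"True OR:  {odds_ratio(exposure, outcome):.2f}")        # ~3.1
    print(f"Coded OR: {odds_ratio(coded_exposure, outcome):.2f}")  # attenuated toward 1

The attenuated estimate is the bias toward the null described above; differential (outcome-dependent) misclassification would instead shift the estimate in an unpredictable direction.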

How well do the key variables (such as the codes used to identify the population, the primary exposure, and the primary outcome) represent what the researcher is actually interested in?

Consider how common the condition is, how likely it is that the coding element would be recorded, how easily the coding element could be confused with another condition or procedure, what measures the database has in place to ensure correct codes are entered, and what motivates the people submitting the coding elements. Ideally, these key variables (such as the primary outcome) should have known measurement characteristics (such as a positive predictive value) so that you can judge how well each code represents what it is meant to represent. This has traditionally been done poorly,2–4 and when it is done well, it elevates administrative data studies to a higher level.
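
The arithmetic behind these measurement characteristics is simple enough to sketch. The counts below are hypothetical chart-review results, not values from any cited validation study.

    # Measurement characteristics of an administrative code against a reference
    # standard (e.g., chart review). Counts are from a hypothetical validation.
    def validation_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
        """tp/fp/fn/tn: code result (positive/negative) vs. reference standard."""
        return {
            "sensitivity": tp / (tp + fn),  # true cases the code captures
            "specificity": tn / (tn + fp),  # non-cases the code correctly excludes
            "ppv": tp / (tp + fp),          # code-positives that are true cases
            "npv": tn / (tn + fn),          # code-negatives that are true non-cases
        }

    # e.g., 90 of 100 code-positive charts were confirmed cases (PPV = 0.90)
    print(validation_metrics(tp=90, fp=10, fn=30, tn=870))

A known positive predictive value lets a reader translate “patients with code X” into “patients who very likely have condition X,” which is exactly the judgment the paragraph above asks for.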

Confounding occurs when the relationship between an exposure and outcome is distorted by another variable, which acts as a confounder. Known confounders can be controlled for; however, unknown or unmeasured confounders can only be properly controlled for with randomization, which is not possible in retrospective administrative data studies. Propensity scores and instrumental variables can help address confounders, but they do not eliminate the risk of residual confounding.5
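
As a rough illustration of the propensity-score idea, the following sketch estimates the probability of exposure from measured confounders and forms stabilized inverse-probability-of-treatment weights. The dataset, column names, and coefficients are invented for the example, and, as noted above, no amount of weighting addresses unmeasured confounders.

    # Propensity-score weighting sketch: balance *measured* confounders between
    # exposure groups. Data and coefficients are synthetic and hypothetical.
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 10_000
    df = pd.DataFrame({
        "age": rng.normal(65, 10, n),
        "charlson": rng.poisson(2, n),  # comorbidity score
    })
    # Exposure depends on measured confounders (confounding by indication)
    logit = -6 + 0.06 * df["age"] + 0.3 * df["charlson"]
    df["exposed"] = rng.random(n) < 1 / (1 + np.exp(-logit))

    # Propensity score: modelled probability of exposure given the confounders
    ps_model = LogisticRegression(max_iter=1000).fit(
        df[["age", "charlson"]], df["exposed"])
    ps = ps_model.predict_proba(df[["age", "charlson"]])[:, 1]

    # Stabilized weights: exposed get P(exposed)/ps, unexposed the complement
    p_exp = df["exposed"].mean()
    df["weight"] = np.where(df["exposed"], p_exp / ps, (1 - p_exp) / (1 - ps))

    # Weighted mean age should now be similar in the two groups
    for exposed, g in df.groupby("exposed"):
        print(exposed, round(np.average(g["age"], weights=g["weight"]), 1))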

Transparent reporting of a routinely collected data study

Most physicians are aware of the reporting standard for randomized clinical trials (CONsolidated Standards Of Reporting Trials, CONSORT), a guideline that has improved the quality of clinical trial reporting. An analogous reporting guideline, RECORD (REporting of studies Conducted using Observational Routinely collected health Data), is available for routinely collected data studies.6 In a similar vein, others have proposed criteria to evaluate the quality of administrative database studies7 (Tables 1, 2).

Table 1.

Methodological principles for evaluating administrative database studies

 Study design clearly described:
  Administrative database comparative study
  Administrative database case-control study
  Administrative database case series
 Why the database was created clearly stated
 Description of the database’s inclusion/exclusion criteria
 Description of methods for reducing bias in the database
 Codes and search algorithms reported
 Rationale for the coding algorithm reported
 Code accuracy reported
 Code validity reported
 Clinical significance assessed
 Period of data collection consistent with the outcome data
 Statement regarding whether data stem from single or multiple hospital admissions
 Statement regarding whether data stem from single or multiple procedures
 Accounting for clustering

Adapted from Hashimoto et al. Administrative database studies: Goldmine or goose chase? Evid Based Spine Care J 2014;5:74–6.
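
Several of the principles in Table 1 (codes and search algorithms reported, and the rationale for the coding algorithm) amount to making the cohort definition reproducible. The following sketch shows what that can look like in practice; the claims table, column names, and specific ICD-10/CPT codes are illustrative assumptions rather than an algorithm recommended by this letter.

    # Reproducible coding algorithm for cohort selection from a (hypothetical)
    # administrative claims extract. Report code lists like these verbatim.
    import pandas as pd

    PROSTATE_CANCER_DX = {"C61"}                            # ICD-10 diagnosis
    RADICAL_PROSTATECTOMY_PX = {"55840", "55842", "55845"}  # CPT procedures

    claims = pd.DataFrame({  # stand-in for a real claims table
        "patient_id": [1, 1, 2, 3],
        "dx_code": ["C61", "N40", "C61", "N40"],
        "px_code": ["55840", None, None, "52601"],
    })

    # Algorithm (rationale: requiring both a diagnosis code and a confirmatory
    # procedure code raises the positive predictive value of the definition)
    dx_ids = set(claims.loc[claims["dx_code"].isin(PROSTATE_CANCER_DX), "patient_id"])
    px_ids = set(claims.loc[claims["px_code"].isin(RADICAL_PROSTATECTOMY_PX), "patient_id"])
    print(sorted(dx_ids & px_ids))  # -> [1]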

Table 2.

Examples and brief overview of routinely collected data sources

 Surveillance, Epidemiology, and End Results Program (SEER)12
  Description: U.S. cancer registry that includes approximately 35% of the U.S. population. Data are representative of the U.S. population and are drawn from 12 state registries, 4 metropolitan multi-county areas, and 3 indigenous registries.
  Major data elements: Patient demographics, primary tumor site, tumor morphology and stage at diagnosis, first course of treatment, and followup for survival.

 Medicare/Medicaid13
  Description: National records of reimbursement related to subsidized care provided to U.S. citizens >65 years of age (Medicare), or to low-income adults, those with a physical disability, and children (Medicaid). Part A covers non-physician inpatient care, Part B covers physician services, and Part D includes optional drug coverage.
  Major data elements: Demographic and geographic information; diagnoses (ICD codes), procedures (CPT or HCPCS codes), and national drug codes are included in each respective part.

 National Inpatient Sample (NIS)
  Description: Nationally representative sample (20%) of discharges of children and adults from all community hospitals (includes patients with Medicare/Medicaid, private insurance, and no insurance).
  Major data elements: Discharge abstracts include ICD codes for admission and discharge diagnoses, demographics, hospital characteristics, payment source, length of stay, and severity and comorbidity measures.

 American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP)
  Description: Voluntary hospital-level program that compares risk-adjusted outcomes after surgical procedures. Over 650 hospitals (primarily in the U.S.) participate in order to compare their post-surgical complications to national averages.
  Major data elements: Demographics, operative procedure (CPT code), selected risk factors (such as diabetes, smoking, and medical comorbidities), preoperative laboratory values, length of stay, and specific complications occurring within 30 days of the initial operation (such as unplanned reoperation, stroke, bleeding, urinary tract infection, and wound infection).

Conclusions

Electronic data is a driving force in our society; it has a compound annual growth rate of 60%, and it was estimated that there would be 35 zettabytes of electronic data by 2020.8 In healthcare, information technology plays a key role in all aspects of practice, from medical records to medication prescribing to communication. This wealth of readily available electronic information will likely continue to drive medical research using routinely collected data. An a priori hypothesis and analytical plan, valid data elements, appropriate statistical techniques, a careful assessment of bias, and high-quality reporting will hopefully continue to improve the quality and impact of these studies in urology. Despite the limitations of observational studies, they often produce results similar to those of randomized, controlled trials.9 Other well-written reviews specific to urologists have been published10,11 and are worth reading for those interested in administrative data research.

Footnotes

Competing interests: The author reports no competing personal or financial interests related to this work.

This paper has been peer-reviewed.

References

  1. Höfler M. The effect of misclassification on the estimation of association: A review. Int J Methods Psychiatr Res. 2005;14:92–101. doi: 10.1002/mpr.20.
  2. Benchimol EI, Manuel DG, To T, et al. Development and use of reporting guidelines for assessing the quality of validation studies of health administrative data. J Clin Epidemiol. 2010:1–9. doi: 10.1016/j.jclinepi.2010.10.006.
  3. van Walraven C, Bennett C, Forster AJ. Administrative database research infrequently used validated diagnostic or procedural codes. J Clin Epidemiol. 2011;64:1054–9. doi: 10.1016/j.jclinepi.2011.01.001.
  4. Welk B, Kwong J. A review of routinely collected data studies in urology: Methodological considerations, reporting quality, and future directions. Can Urol Assoc J. 2017;11:136–6. doi: 10.5489/cuaj.4101.
  5. Normand SLT, Sykora K, Li P, et al. Readers’ guide to critical appraisal of cohort studies: 3. Analytical strategies to reduce confounding. BMJ. 2005;330:1021. doi: 10.1136/bmj.330.7498.1021.
  6. Benchimol EI, Smeeth L, Guttmann A, et al. The REporting of studies Conducted using Observational Routinely collected health Data (RECORD) statement. PLoS Med. 2015;12:e1001885. doi: 10.1371/journal.pmed.1001885.
  7. Hashimoto RE, Brodt ED, Skelly AC, et al. Administrative database studies: Goldmine or goose chase? Evid Based Spine Care J. 2014;5:74–6. doi: 10.1055/s-0034-1390027.
  8. Data Universe Explosion & the Growth of Big Data. CSC; 2016. Available at: http://www.csc.com/insights/flxwd/78931-big_data_universe_beginning_to_explode. Accessed May 24, 2016.
  9. Anglemyer A, Horvath HT, Bero L. Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials. Cochrane Database Syst Rev. 2014;(4):MR000034. doi: 10.1002/14651858.MR000034.pub2.
  10. Schlomer BJ, Copp HL. Secondary data analysis of large data sets in urology: Successes and errors to avoid. J Urol. 2014;191:587–96. doi: 10.1016/j.juro.2013.09.091.
  11. Cole AP, Friedlander DF, Trinh Q-D. Secondary data sources for health services research in urologic oncology. Urol Oncol. 2018;36:165–73. doi: 10.1016/j.urolonc.2017.08.008.
  12. Engels EA, Pfeiffer RM, Ricker W, et al. Use of Surveillance, Epidemiology, and End Results-Medicare data to conduct case-control studies of cancer among the US elderly. Am J Epidemiol. 2011;174:860–70. doi: 10.1093/aje/kwr146.
  13. Mues KE, Liede A, Liu J, et al. Use of the Medicare database in epidemiologic and health services research: A valuable source of real-world evidence on the older and disabled populations in the US. Clin Epidemiol. 2017;9:267–77. doi: 10.2147/CLEP.S105613.
