Author manuscript; available in PMC: 2008 Jul 28.

Published in final edited form as: Med Care. 2007 Oct;45(10 SUPL):S58–S65. doi: 10.1097/MLR.0b013e31805371bf

TABLE 1.

Strengths and Comparative Advantages in Working With Medicaid Claims Data

Dataset Characteristic	Comparative Advantages and Types of Studies Supported
Very large numbers of covered lives, with relatively comprehensive benefits and information on the full continuum of care in most settings.	Strong statistical power. Supports detailed analyses of subgroups, rare conditions and comorbidities, including individuals with complex combinations of diagnoses. Supports study of serious but low-prevalence events, such as severe adverse medication outcomes that clinical trials are not powered to detect. Dataset development is not constrained by per-subject costs of primary data collection; large, comprehensive analytic datasets on clinically diverse populations can be constructed cost-effectively, with potential to support analyses on a range of research questions.
Strong representation of vulnerable populations including racial/ethnic minorities. Race and ethnicity are recorded, in contrast to many commercial databases.	Essential source of knowledge on health care for people with disabilities, minority group members, hard-to-interview subgroups such as mentally ill and substance abusers. Vital resource for research on disparities. As payer for almost 1 in 5 Americans and a higher proportion of vulnerable sub-populations, Medicaid is intrinsically important; quality of care and outcomes for Medicaid beneficiaries are of critical importance for the health of the population.
Unobtrusive data collection on entire covered population; diagnostic and treatment information from providers rather than consumers.	Avoids biases related to self-report and differential study participation. Supports studies that include beneficiaries with limited ability to self-report, such as those with cognitive impairment. Supports characterization of usual care for the full covered population and across the full range of providers and care settings. Supports analysis of off-label medication use and outcomes, and of medication outcomes for types of patients excluded from clinical trials.
Detailed longitudinal histories with dates of healthcare encounters, treatments and diagnoses; multiple years of data can be merged for long-term follow-up; datasets can be updated cost-effectively as newer years of data become available.	Datasets support detailed longitudinal analysis of medication initiation and persistence over time. Long-term follow-up is possible for beneficiaries who are consistently enrolled. Event history analyses of temporal relationships among health care events are supported, such as incidence and timing of hospitalizations and emergency room visits following treatment initiation. Information on dates of healthcare events can be used to construct episodes of care of consistent duration. Vital source of information on secular trends in usual care.
Includes information on care of patients for all participating providers; provides geographic detail.	Individual-level data can be aggregated to create provider-level and area-level estimates of treatment patterns; this information can be used to support multilevel analyses of treatment and outcome patterns. Supports study designs that incorporate linkages to other sources of clinical, contextual and outcome data, such as vital records and claims for other payers.
Provides expenditure information from payer’s perspective.	Supports economic analysis of Medicaid costs of care.
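
The aggregation of individual-level data into provider-level estimates noted in the table can be illustrated with a minimal Python sketch. The records, field names (`beneficiary_id`, `provider_id`, `received_treatment`) and the function `provider_treatment_rates` are hypothetical and chosen only for illustration; they do not reflect any actual Medicaid file layout or the authors' methods.

```python
from collections import defaultdict

# Hypothetical, made-up claim records; field names are illustrative assumptions,
# not an actual Medicaid claims layout.
claims = [
    {"beneficiary_id": "B1", "provider_id": "P1", "received_treatment": True},
    {"beneficiary_id": "B2", "provider_id": "P1", "received_treatment": False},
    {"beneficiary_id": "B3", "provider_id": "P2", "received_treatment": True},
    {"beneficiary_id": "B4", "provider_id": "P2", "received_treatment": True},
]


def provider_treatment_rates(records):
    """Return {provider_id: share of that provider's beneficiaries who were treated}."""
    seen = defaultdict(set)     # all beneficiaries observed for each provider
    treated = defaultdict(set)  # beneficiaries with at least one treated claim
    for record in records:
        provider = record["provider_id"]
        seen[provider].add(record["beneficiary_id"])
        if record["received_treatment"]:
            treated[provider].add(record["beneficiary_id"])
    # Using sets counts each beneficiary once per provider, however many claims they have.
    return {p: len(treated[p]) / len(seen[p]) for p in seen}


rates = provider_treatment_rates(claims)
```

Provider-level rates of this kind could then serve as the higher-level units in a multilevel analysis; area-level estimates would follow the same pattern with a geographic identifier in place of `provider_id`.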