Author manuscript; available in PMC: 2008 Jul 28.
Published in final edited form as: Med Care. 2007 Oct;45(10 SUPL):S58–S65. doi: 10.1097/MLR.0b013e31805371bf

Studying Prescription Drug Use and Outcomes With Medicaid Claims Data: Strengths, Limitations, and Strategies

Stephen Crystal, Ayse Akincigil, Scott Bilder, James T Walkup
PMCID: PMC2486436  NIHMSID: NIHMS49963  PMID: 17909385

Abstract

Medicaid claims and eligibility data, particularly when linked to other sources of patient-level and contextual information, represent a powerful and under-used resource for health services research on the use and outcomes of prescription drugs. However, their effective use poses many methodological and inferential challenges. This article reviews strengths, limitations, challenges, and recommended strategies in using Medicaid data for research on the initiation, continuation, and outcomes of prescription drug therapies. Drawing from published research using Medicaid data by the investigators and other groups, we review several key validity and methodological issues. We discuss strategies for claims-based identification of diagnostic subgroups and procedures, measuring and modeling initiation and persistence of regimens, analysis of treatment disparities, and examination of comorbidity patterns. Based on this review, we discuss “best practices” for appropriate data use and validity checking, approaches to statistical modeling of longitudinal patterns in the presence of typical challenges, and strategies for strengthening the power and potential of Medicaid datasets. Finally, we discuss policy implications, including the potential for the research use of Medicare Part D data and the need for further initiatives to systematically develop and optimally use research datasets that link Medicaid and other sources of clinical and outcome information.

Keywords: Medicaid, prescription drugs, methodology, claims, outcomes, adherence


Medicaid data have great potential for examining patterns of medication use and outcomes, but pose many methodological difficulties. We discuss their strengths, challenges, and strategies to address these challenges, with illustrations from work by our group and others.

STRENGTHS AND COMPARATIVE ADVANTAGES

Medicaid files provide a unique and powerful perspective on key components of health care in the United States. Their strengths and comparative advantages are summarized in Table 1. A key strength is the very large number of individuals represented and corresponding statistical power for fine-grained analyses of important subgroups, rare conditions, complex patterns of comorbidity, and adverse events. The data are indispensable for studying health care for vulnerable subgroups such as minority, low-income and/or disabled individuals. They offer a detailed, longitudinal record of utilization, diagnoses, procedures, and prescriptions across the full range of health care settings. Whereas participants in randomized clinical trials (RCTs) of drugs sometimes fail to represent the full medical and social diversity of the populations who ultimately use them, Medicaid data include a broader population, not limited by clinical-trial exclusions and capturing off-label use. And although commercial-insurance databases typically include mainly individuals well enough to work and their dependents, Medicaid beneficiaries reflect a broad range of health status, including many with disabilities and severe chronic conditions.1–3

TABLE 1.

Strengths and Comparative Advantages in Working With Medicaid Claims Data

Columns: Dataset Characteristic; Comparative Advantages and Types of Studies Supported

Characteristic: Very large numbers of covered lives, with relatively comprehensive benefit and information on full continuum of care in most settings.
Advantages and studies supported: Strong statistical power. Supports detailed analyses of subgroups, rare conditions and comorbidities, including individuals with complex combinations of diagnoses. Supports study of serious but low-prevalence events, such as severe adverse medication outcomes that clinical trials are not powered to detect. Dataset development not constrained by per-subject costs of primary data collection; large, comprehensive analytic datasets on clinically diverse populations can be constructed cost-effectively with potential to support analyses on a range of research questions.

Characteristic: Strong representation of vulnerable populations including racial/ethnic minorities. Race and ethnicity are recorded, in contrast to many commercial databases.
Advantages and studies supported: Essential source of knowledge on health care for people with disabilities, minority group members, hard-to-interview subgroups such as mentally ill and substance abusers. Vital resource for research on disparities. As payer for almost 1 in 5 Americans and a higher proportion of vulnerable sub-populations, Medicaid is intrinsically important; quality of care and outcomes for Medicaid beneficiaries are of critical importance for the health of the population.

Characteristic: Unobtrusive data collection on entire covered population; diagnostic and treatment information from providers rather than consumers.
Advantages and studies supported: Avoids biases related to self-report and differential study participation. Supports studies that include beneficiaries with limited ability to self-report such as those with cognitive impairment. Supports characterization of usual care for the full covered population and across the full range of providers and care settings. Supports analysis of off-label medication use and outcomes, and of medication outcomes for types of patients excluded from clinical trials.

Characteristic: Detailed longitudinal histories with dates of healthcare encounters, treatments and diagnoses; multiple years of data can be merged for long-term follow-up; datasets can be updated cost-effectively as newer years of data become available.
Advantages and studies supported: Datasets support detailed longitudinal analysis of medication initiation and persistence over time. Long-term follow-up is possible for beneficiaries who are consistently enrolled. Event history analyses of temporal relationships among health care events are supported, such as incidence and timing of hospitalizations and emergency room visits following treatment initiation. Information on dates of healthcare events can be used to construct episodes of care of consistent duration. Vital source of information on secular trends in usual care.

Characteristic: Includes information on care of patients for all participating providers; provides geographic detail.
Advantages and studies supported: Individual-level data can be aggregated to create provider-level and area-level estimates of treatment patterns; this information can be used to support multilevel analyses of treatment and outcome patterns. Supports study designs that incorporate linkages to other sources of clinical, contextual and outcome data, such as vital records and claims for other payers.

Characteristic: Provides expenditure information from payer’s perspective.
Advantages and studies supported: Supports economic analysis of Medicaid costs of care.

Medicaid is the predominant payer for low-income Americans and is crucial for people with disabilities. In 2006, excluding State Child Health Insurance Program (SCHIP) participants, an estimated 56.3 million individuals were enrolled,4 or almost 1 in 5 Americans, including 31.1 million children, 16.2 million nonelderly adults, 6.1 million elderly, and 9.7 million blind and disabled persons. Racial and ethnic minorities are strongly represented; racial/ethnic breakdowns for 2003 (the most recent available) include 11.7 million African Americans (almost 1 in 3 nationally) and 9.8 million Latinos (almost 1 in 4 nationally).5,6 Unlike typical commercial insurance databases, race and ethnicity are recorded for most enrollees. Thus, Medicaid files are particularly important for examining outcomes in diverse populations and identifying treatment disparities.

Because beneficiaries have access to a generally similar package of benefits, Medicaid data provide important insight into nonfinancial sources of disparities. Such studies are particularly valuable when they go beyond simple cross-sectional analyses to longitudinal modeling. Within-person analyses of duration and consistency of treatment spells can shed important light on processes and potentially modifiable factors leading to disparate health outcomes. For example, longitudinal analyses of antiretroviral medication use among Medicaid beneficiaries with HIV/AIDS found that racial/ethnic disparities in treatment included both later initiation of regimens and less consistent use after treatment initiation by African American and Hispanic beneficiaries, suggesting the need to examine nonfinancial barriers to adherence in long-term antiretroviral therapy, and implement better strategies for supporting patients in remaining on these regimens consistently.7

Data on treatments and diagnoses come from providers, avoiding self-report and nonresponse biases that are issues in interview-based studies.8 Difficult-to-interview subgroups (eg, those with neurologic, psychiatric, or cognitive incapacities) are fully included. Medicaid is a central payer for persons with mental illness or substance abuse and an essential source for studying their care. Very large datasets can be obtained and updated for a fraction of the cost of comparably sized studies involving primary data collection, providing continuous, longitudinal information on healthcare encounters, their accompanying recorded diagnoses and procedures, and filled prescriptions. Detailed information on the dates of events can be used to construct episodes of care, support close examination of the timing of events in relation to treatment initiation, and facilitate use of event history methods to model hazards of outcomes such as hospitalizations.9 The data support studies of off-label use and outcomes for types of patients excluded from clinical trials.

Power for studying low-prevalence conditions and outcomes is a key comparative advantage. Examples include studies of multiple-diagnosed subgroups (eg, HIV/AIDS patients with mental retardation)10 and uncommon, severe medication outcomes that premarketing trials have insufficient power to capture.11 Use of data from multiple states increases power for such analyses. For example, Olfson and colleagues12 used 50-state Medicaid data in a case–control study of risks of suicide attempts and suicide deaths in severely depressed children and adults; in children and adolescents, antidepressant drug treatment was associated with both outcomes. The statistical power of the 50-state dataset was key to detecting associations between treatment and these infrequent events.

LIMITATIONS, CHALLENGES, AND STRATEGIES

Selecting optimal strategies for research using Medicaid claims involves assessing data quality, understanding limitations, identifying potential sources of error, and adopting effective methods for restricting their impact. Several key challenges are summarized in Table 2. Ideal solutions remain elusive for some problems, and no set of rules can be applied mechanically, but an increasing body of research has demonstrated strategies for addressing these challenges. The International Society for Pharmacoeconomics and Outcomes Research’s Task Force on Retrospective Databases provides a user-friendly checklist for decision makers considering this sort of research.13 Key first steps are thoughtful selection of research questions based on extensive familiarity with both national and state Medicaid program characteristics, and understanding the institutional processes that determine patient eligibility and produce and process claims.

TABLE 2.

Challenges and Best Practices in Working With Medicaid Claims Data

Columns: Challenges; Best Practices

Challenge: Claims are generated for administrative and reimbursement rather than clinical or research purposes.
Best practices: Linkages to external sources of clinical information may be possible in some studies. Understanding non-clinical influences in coding processes, such as reimbursement considerations, is important in interpreting data in claims.

Challenge: Raw data must be organized into meaningful diagnosis and treatment variables.
Best practices: Variable construction should be informed by clinical understanding of relevant conditions, treatments and outcomes. Sensitivity of results to alternative variable specifications should be carefully examined.

Challenge: Completeness of encounter histories must be carefully assessed and inclusion criteria selected to minimize missing information. Beneficiaries who are dually eligible for Medicaid and Medicare will have claims histories in both systems, and their Medicaid claims histories alone may not provide complete diagnostic histories. Detailed service utilization and diagnostic histories are typically missing or incomplete for beneficiaries enrolled in capitated managed care programs.
Best practices: Matching with Medicare claims histories, although it can be labor-intensive and costly, is the best way to assure completeness of claims histories for dual eligibles. Although encounter or “pseudo-claims” data exist for beneficiaries in capitated managed care programs in some states, careful examination of data quality is necessary before such data can be relied upon. Restriction of analyses to non-capitated beneficiaries is typically used to address missing information for those in capitated plans. Plausibility of observed utilization patterns for subgroups of concern should be assessed, and sensitivity of results to alternative inclusion criteria examined.

Challenge: Careful construction and validation of diagnostic indicators and clinical classifications from claims is key for subsequent analyses. Selection of algorithms may involve tradeoffs between sensitivity and specificity.
Best practices: Careful examination of alternative criteria regarding diagnostic coding nets and number and source of observed diagnoses allows researchers to examine trade-offs between sensitivity and specificity on a condition-by-condition basis. Choices of narrower versus broader measures should be consistent with research questions. External validation of diagnostic classifications may be available from studies that link Medicaid coding to chart or other clinical information. Strategies often used to increase specificity include restriction to cases with 2 or more outpatient diagnoses and/or an inpatient diagnosis, or to diagnoses from high-credibility provider types.

Challenge: Prescription drug claims histories reflect complex and often erratic patterns of use over time.
Best practices: Examination of which beneficiaries use which drugs is simply a starting point; further insight can be gained by analysis of duration of treatment spells and consistency of use over time. Choice of measures should be consistent with clinical information on desirable medication use patterns.

Challenge: Observations may be clustered in complex ways, violating assumptions of independence and complicating inferences.
Best practices: Consider use of statistical tools such as generalized estimating equations that are robust in the face of clustered data. Variation at multiple levels of clustering (eg, repeated observations within beneficiaries, beneficiaries within providers and providers within communities) can be modeled explicitly with multilevel methods such as hierarchical linear modeling.

Challenge: Censoring and changes in diagnosis, treatment, and eligibility status complicate analyses.
Best practices: Event history analysis techniques such as Cox regression handle time-to-event data with censoring, and can incorporate time-varying covariates.

Challenge: Use/adherence is not necessarily implied by filling of prescriptions and receipt of services.
Best practices: Remind readers that the services identified in the data (eg, possession of prescription drugs) are generally a necessary, but not a sufficient, condition for use/adherence; they represent an upper-bound estimate. Compare various adherence measures to estimate frequency/types of cases with receipt but not use.

Challenge: Complex diagnostic and treatment histories create potential for confounding. Treated individuals may differ in important ways from untreated individuals.
Best practices: Careful construction of analytic covariates that fully capture information in the data on diagnostic and treatment histories is important. Controlling separately for an extensive vector of relevant comorbid conditions is often preferable to reliance on comorbidity indexes. Techniques such as propensity scoring and instrumental variables can help reduce confounding by indication. Robustness of results across alternative analytic specifications should be examined.

Challenge: Enrollees move in and out of systems as their eligibility changes.
Best practices: Construction of eligibility timelines is used to select intervals to be included in analyses.

Challenge: Complex longitudinal diagnostic and treatment histories complicate development of summary variables.
Best practices: Diagnostic and treatment timelines allow researchers to incorporate interactions among diagnosis, drug prescription, and service histories.

Challenge: State-by-state patterns of missing data and other anomalies complicate analysis.
Best practices: Understanding of state-by-state patterns of missing data and other anomalies is used to fine-tune analyses and inclusion criteria.

Challenge: Results may vary as a function of inclusion and exclusion criteria; generalizability of results needs to be interpreted in relation to the specific population included in analyses.
Best practices: Careful selection of inclusion criteria focuses analyses on subpopulations whose data can properly support inferences. Reports of results should clearly indicate inclusion and exclusion criteria. Robustness of findings under alternative criteria should be examined.

Challenge: Additional information on contextual factors (eg, community characteristics and resources) may be needed to interpret geographic variations.
Best practices: Linkages to geographic data (eg, US Census data, Area Resource File) can be used to complement information on geographic variations within the files themselves, supporting contextual analyses and examination of small-area variations.

Table 2 presents, for each major challenge, relevant strategies and best practices. A central limitation with many implications is that Medicaid data are collected for administrative rather than research purposes. Thus, understanding nonclinical influences in coding processes, such as coverage and reimbursement considerations, is essential. Claims represent raw data that must be organized into meaningful diagnostic and treatment variables, whose validity and sensitivity to alternative methodological choices must be carefully considered before analysis can proceed. Care and thoroughness in this process, informed by a clear understanding of relevant clinical, administrative, and coding issues, is critical to effective inference.

Moving from these broad issues to more specific analytic challenges, several that occur in many if not most Medicaid studies are highlighted in Table 2. Especially important, as discussed below, are assessing completeness of information on health care encounters; identification of relevant clinical subgroups based on diagnostic information in claims; choice and validity of measures of medication use based on filled-prescription data; and selection of statistical modeling techniques appropriate to the structure of the data.

Assessing Data Completeness

As with other claims datasets, information on health care encounters, prescription drug utilization and diagnoses in Medicaid files is based on provider report. Although recall and social desirability biases characteristic of self-report data are avoided,14 conditions that have not come to medical attention are not identified and provider-recorded diagnoses may reflect nonmedical considerations such as reimbursement policies and rates. It is also important to assess the possibility of receipt of services that do not appear in Medicaid claims files and select inclusion criteria that minimize the potential for incomplete data. For example, state psychiatric hospitals do not bill Medicaid for most recipients utilizing their care. Beneficiaries with dual Medicare-Medicaid eligibility and participants in capitated managed care programs may receive services not reflected in Medicaid claims files. Although claims files in some states include “pseudo-claims” for services to capitated beneficiaries, such data cannot be assumed to be complete without validation studies supporting their usability. For dual-eligibles, although many services eligible for Medicare funding will generate a Medicaid copayment, some may not.15 Although matching with Medicare files to retrieve corresponding Medicare claims is a complex procedure, such matching is the best way to assure completeness of care histories in studies that include dual-eligibles.
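As a rough illustration of selecting inclusion criteria that limit incomplete histories, the sketch below keeps only beneficiaries who, for a required number of months, were enrolled, not dually eligible, and not in a capitated plan. The month-level record layout and the field names (`enrolled`, `dual_eligible`, `capitated`) are assumptions made for this example, not an actual Medicaid file format. Excluding dual-eligibles is only one conservative option; matching to Medicare claims, as noted above, is the more complete approach.

```python
def eligible_for_analysis(months, required_months):
    """Return True if the first `required_months` records all describe
    enrolled, non-dual, non-capitated months.

    months: list of dicts, one per calendar month in time order, with
    boolean keys "enrolled", "dual_eligible", and "capitated"
    (hypothetical field names chosen for this sketch).
    """
    if len(months) < required_months:
        return False
    window = months[:required_months]
    return all(
        m["enrolled"] and not m["dual_eligible"] and not m["capitated"]
        for m in window
    )


# Made-up example: one beneficiary with 12 clean months, one with a
# capitated month in the middle of the window.
clean_month = {"enrolled": True, "dual_eligible": False, "capitated": False}
capitated_month = {"enrolled": True, "dual_eligible": False, "capitated": True}

history_a = [clean_month] * 12
history_b = [clean_month] * 5 + [capitated_month] + [clean_month] * 6

keep_a = eligible_for_analysis(history_a, 12)
keep_b = eligible_for_analysis(history_b, 12)
```

Results should then be rechecked under alternative inclusion windows, since the population retained changes with the criterion chosen.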

Identifying Diagnostic Subgroups and Procedures

Diagnostic classification of beneficiaries is a critical foundation for subsequent analyses. Although diagnostic codes in claims serve as a useful proxy for underlying medical conditions, it is helpful to think of them as representing a provider behavior that is also influenced by other factors. Thus, trends in the reported rates of conditions may reflect changes in provider-level patterns of recognition and labeling of conditions, as opposed to true changes in underlying prevalence. For example, analyzing Medicare claims data, Crystal and colleagues16 found that the proportion of elderly beneficiaries diagnosed with depression doubled between 1992 and 1998, likely representing a shift in diagnosing behavior rather than in underlying depression.

Validity of detection algorithms in claims data for diagnostic groups17 and procedures18 can be examined with a 2-by-2 framework familiar from the medical screening literature. Comparisons between algorithm-based decision rules and some specified “gold standard” produce true positives, false positives, true negatives, and false negatives. Sensitivity represents the algorithm’s effectiveness in classifying diagnosis-positive clients (based on the external standard) as diagnosis positive (based on the claims). Specificity is the detection algorithm’s capacity to classify diagnosis-negative clients as diagnosis negative.
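The 2-by-2 framework can be sketched in a few lines. The example below is illustrative only: the function name and the boolean input layout are assumptions, and the data are made up rather than drawn from any of the studies cited here.

```python
def sensitivity_specificity(claims_positive, gold_positive):
    """Compare a claims-based flag with an external "gold standard".

    Both arguments are equal-length sequences of booleans, one entry
    per beneficiary (a hypothetical input layout for this sketch).
    """
    if len(claims_positive) != len(gold_positive):
        raise ValueError("inputs must have the same length")
    tp = fp = tn = fn = 0
    for claim, gold in zip(claims_positive, gold_positive):
        if gold and claim:
            tp += 1  # true positive
        elif gold and not claim:
            fn += 1  # false negative
        elif claim:
            fp += 1  # false positive
        else:
            tn += 1  # true negative
    # Sensitivity: share of gold-standard positives flagged by claims.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of gold-standard negatives not flagged.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "sensitivity": sensitivity, "specificity": specificity}


# Made-up example: 5 gold-standard cases and 5 non-cases.
gold = [True] * 5 + [False] * 5
claims = [True, True, True, True, False,
          False, False, False, False, True]
result = sensitivity_specificity(claims, gold)
```

In this toy example the algorithm misses one true case and flags one non-case, so both sensitivity and specificity are 4/5.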

As an illustration of application of this framework to validation of claims-based diagnostic measures, sensitivity of HIV/AIDS case identification was examined using New Jersey Medicaid data, with the state’s HIV/AIDS Registry as the “gold standard.”19 Building on work by Keyes and others,20–22 claims-based algorithms for identifying AIDS cases were applied to Medicaid data and compared with those constructed by matching to the Registry. An algorithm using nonpharmacy claims produced 88% sensitivity; adding pharmacy claims produced 95% sensitivity.

Similarly, validation studies have been conducted for claims-based measures of psychiatric conditions. Agreement with medical records has been found to be high for more serious disorders, such as schizophrenia,23 but lower for conditions such as minor depression.24 Lurie and colleagues25 tested an algorithm using 2 years of Medicaid claims, classifying a patient as having schizophrenia who had 1 inpatient or 2 outpatient encounters with this diagnosis. Results were compared with an external standard using psychiatrist ratings of symptom profiles, producing an estimated specificity of 87% for the claims-based algorithm.
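A decision rule of the "1 inpatient or 2 outpatient encounters" type can be expressed as a simple counting step. The sketch below is not the published algorithm: the tuple layout, the function name, and the placeholder diagnosis labels (`"SCZ"`, `"OTHER"`) are assumptions for illustration, and each tuple is treated as one distinct encounter.

```python
from collections import defaultdict


def flag_cases(encounters, target_codes, min_inpatient=1, min_outpatient=2):
    """Return the set of beneficiary IDs meeting the encounter-count rule.

    encounters: iterable of (beneficiary_id, setting, diagnosis_code),
    where setting is "inpatient" or "outpatient" (hypothetical layout;
    each tuple is assumed to be a separate encounter).
    """
    counts = defaultdict(lambda: {"inpatient": 0, "outpatient": 0})
    for bene_id, setting, dx in encounters:
        if dx in target_codes and setting in ("inpatient", "outpatient"):
            counts[bene_id][setting] += 1
    return {
        bene_id
        for bene_id, c in counts.items()
        if c["inpatient"] >= min_inpatient
        or c["outpatient"] >= min_outpatient
    }


# Made-up encounters with placeholder diagnosis labels.
encounters = [
    ("A", "outpatient", "SCZ"),
    ("A", "outpatient", "SCZ"),
    ("B", "outpatient", "SCZ"),    # only one outpatient encounter
    ("C", "inpatient", "SCZ"),
    ("D", "outpatient", "OTHER"),
    ("D", "outpatient", "OTHER"),  # two encounters, wrong diagnosis
]
cases = flag_cases(encounters, target_codes={"SCZ"})
```

Raising `min_outpatient` or restricting `target_codes` narrows the net and will generally trade sensitivity for specificity, which is why the thresholds are left as parameters.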

Measures of Prescription Drug Use

Prescription fill data are widely used to examine duration and consistency of medication use spells. Such claims-based measures can be used in conjunction with, or in place of, other adherence measures such as self-reports, pill counts, electronic monitoring, and laboratory results. Measures of refill persistence have been shown to correlate with these other more costly and intrusive measures. Unlike self-reports, refill persistence measures are not affected by inaccurate or biased recall, and have been shown to predict clinical outcomes.7 Grymonpre et al26 reported strong concordance between claims-derived measures of medication availability and pill counts. Nachega et al27 found a significant relationship between a claims-based measure of antiretroviral adherence and survival among South African adults with HIV. In interpreting results, however, it is important to understand persistence measures as reflecting drug availability to the patient, not actual drug-taking behavior. Filled prescriptions, representing a necessary, though not sufficient, condition for drug ingestion, indicate an upper bound for adherence. Also, pharmacy claims will not reflect drug samples obtained from physicians’ offices, drugs received while hospitalized, over-the-counter drugs, and nonformulary drugs purchased out-of-pocket.28

Many measures of refill persistence have been proposed, most based on fill dates and days supplied. Measures can be distinguished on several dimensions including continuous versus dichotomous measures, single or multiple observation intervals, and definition of the “denominator” time period for calculation of medication availability (eg, time from initiation to last filled prescription vs. time from initiation to end of observation period).29 Results using alternative measures seem to vary mainly by choice of denominator rather than other characteristics. In one recent study, 5 measures using total study evaluation period in the denominator produced almost identical results.30
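To make the role of the denominator concrete, the following sketch computes a simple possession ratio for one beneficiary and one drug under the two denominators mentioned above. It is a simplified illustration, not any of the published measures: the function name and input layout are assumptions, overlapping fills are not adjusted for, and ratios are capped at 1.

```python
from datetime import date


def possession_ratios(fills, observation_end):
    """Possession ratios under two alternative denominators.

    fills: list of (fill_date, days_supplied) tuples for one
    beneficiary and drug (hypothetical layout for this sketch).
    """
    if not fills:
        raise ValueError("at least one fill is required")
    fills = sorted(fills)
    start, last_fill = fills[0][0], fills[-1][0]
    total_supply = sum(days for _, days in fills)
    # Supply from the last fill falls outside the initiation-to-last-fill
    # interval, so it is excluded from that numerator.
    supply_before_last = total_supply - fills[-1][1]

    days_to_last = (last_fill - start).days
    days_to_end = (observation_end - start).days

    to_last = (min(supply_before_last / days_to_last, 1.0)
               if days_to_last > 0 else None)
    to_end = (min(total_supply / days_to_end, 1.0)
              if days_to_end > 0 else None)
    return {"to_last_fill": to_last, "to_end_of_observation": to_end}


# Made-up history: three 30-day fills, then nothing through June 30.
fills = [(date(2006, 1, 1), 30), (date(2006, 2, 15), 30),
         (date(2006, 4, 1), 30)]
ratios = possession_ratios(fills, observation_end=date(2006, 6, 30))
```

Here the same fill history yields 60/90 when measured to the last fill and 90/180 when measured to the end of observation, because only the second denominator counts the period after the apparent discontinuation.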

Among available measures, there seems to be no single accepted standard; choices are best made in light of the subpopulations and clinical issues at hand. Clinical and treatment standards considerations will help rule out certain measures and recommend others. In some instances, adequate care would involve continuous treatment for an indefinite period after therapy initiation. In studies of HIV treatment, for example, one would usually expect to observe consistent, indefinite receipt of antiretroviral therapy once started, although specific drugs in the regimen might change. In such a situation, therefore, it would be appropriate to treat both temporary gaps and discontinuation as suboptimal, and examine medication possession from treatment initiation to the last date of observation, as applied in antiretroviral persistence studies among New Jersey Medicaid beneficiaries.31

Clinical circumstances also inform choice of continuous versus binary persistence measures. In HIV disease, strict adherence is necessary to achieve viral suppression and prevent viral resistance, so a binary measure of “strict adherence” would be appropriate. In other circumstances (eg, antipsychotic therapy among beneficiaries with schizophrenia), there may be no clear “threshold” of adherence for effectiveness, and the challenge for quality may be a matter of moving the entire distribution upward. In such cases, continuous measures may be best.32

Clinical guidelines may provide benchmarks for adequate treatment duration. For example, Health Plan Employer Data and Information Set (HEDIS)33 standards specify minimum duration of antidepressant treatment for the acute and continuation phases of care. Although usually used by managed care organizations, such measures can also provide an episode-of-care benchmark for fee-for-service Medicaid data.

Construction of a refill persistence measure must also take into account the complexity of many treatment regimens, including medication switching, polypharmacy, and introduction and diffusion of new drugs. For example, claims histories for persons with bipolar disorder may include changes from lithium to anticonvulsants (eg, gabapentin) and may be further complicated by diagnosis-related concerns such as early misdiagnosis of major depressive disorder and concurrent diagnosis of a seizure disorder. Clinically informed analytic choices are important to distinguish periods of appropriate treatment from nontreatment, and are a necessary condition for meaningful interpretation of findings. Thus, researchers should be wary of simply selecting a single measure and mechanically applying the required numbers. Use of sensitivity analyses to examine robustness of findings across alternative measures will strengthen confidence in results.

Analysis and Measurement Strategies

A key challenge in analyzing outcomes is to address possible sources of confounding due to unobserved clinical differences. It is important to understand patient-level and other sources of variation in treatment and to make optimal use of clinical information that does exist in diagnostic histories in the claims files. As in smaller-scale studies, indices such as the Charlson Comorbidity Index are often used to control for co-occurring conditions.34,35 However, such indices may under-adjust, as they force weights on the various coexisting conditions that may not be suited to the research question at hand. With the extensive statistical power of most claims-based studies, conservation of degrees of freedom may not be an important consideration and it may be appropriate to adjust separately for each relevant co-occurring condition. Propensity score matching (PSM) is gaining popularity to partially address the problem of selection bias into treatment groups, but researchers should understand that biases may remain due to unobserved differences between groups being compared. Another adjustment strategy well-suited to Medicaid claims data involves reducing heterogeneity by focusing on restricted subgroups. For example, in their study of the association between suicidality and antidepressant use, Olfson and colleagues12 limited their analyses to individuals who had previously experienced a hospitalization for depression.

Improvements in statistical and econometric techniques also have promise for more effective and appropriate use of Medicaid claims data. There has been much interest in instrumental variables approaches to address bias due to unobserved clinical factors motivating prescription of treatments (“confounding by indication”).36 Although it is often difficult to identify suitable instruments for this type of analysis (ie, variables strongly correlated with treatment but related to the outcome only through treatment), there is often enormous variability in prescription drug use by Medicaid populations across time, geographical location, and provider that stems from causes that are exogenous with respect to individuals’ clinical characteristics, and may offer the potential for such analyses. For example, provider-level measures of treatment patterns that can be calculated from Medicaid data are probably less subject to confounding by indication than are patient-level measures of treatment. Brookhart and colleagues37 used physicians’ treatment preferences, operationalized as the treatment received by the previous patient with the same condition seen by the physician, as an instrumental variable for exposure to particular prescription drug treatments. Similarly, treatment variations across space (geographical area) and time can be exploited in analyses as relatively exogenous sources of treatment variation.
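The previous-patient idea can be sketched as a data-construction step followed by a basic instrumental variables estimate. The code below is an illustration under stated assumptions, not the cited authors' procedure: the record layout and function names are hypothetical, treatment and instrument are coded 0/1, and the estimator shown is the simple Wald ratio for a binary instrument, with no covariate adjustment or standard errors.

```python
def previous_patient_instrument(visits):
    """Attach to each visit the treatment the same physician gave the
    previous patient (None for a physician's first observed visit).

    visits: list of dicts with keys "physician", "date" (ISO string),
    and "treatment" (0/1); a hypothetical layout for this sketch.
    """
    last_by_physician = {}
    out = []
    for v in sorted(visits, key=lambda v: (v["physician"], v["date"])):
        out.append((v, last_by_physician.get(v["physician"])))
        last_by_physician[v["physician"]] = v["treatment"]
    return out


def wald_estimate(rows):
    """Wald ratio for a binary instrument.

    rows: list of (instrument, treatment, outcome) tuples.
    Estimate = (E[Y|Z=1] - E[Y|Z=0]) / (E[X|Z=1] - E[X|Z=0]).
    """
    def mean(values):
        return sum(values) / len(values)

    x1 = mean([x for z, x, _ in rows if z == 1])
    x0 = mean([x for z, x, _ in rows if z == 0])
    y1 = mean([y for z, _, y in rows if z == 1])
    y0 = mean([y for z, _, y in rows if z == 0])
    if x1 == x0:
        raise ValueError("instrument is unrelated to treatment in these data")
    return (y1 - y0) / (x1 - x0)


# Made-up visits for two physicians.
visits = [
    {"physician": "P1", "date": "2006-01-05", "treatment": 1},
    {"physician": "P1", "date": "2006-01-09", "treatment": 0},
    {"physician": "P1", "date": "2006-02-01", "treatment": 1},
    {"physician": "P2", "date": "2006-01-07", "treatment": 0},
]
instruments = [z for _, z in previous_patient_instrument(visits)]

# Made-up (instrument, treatment, outcome) rows.
rows = [(1, 1, 1), (1, 1, 0), (1, 1, 1), (1, 0, 0),
        (0, 0, 0), (0, 1, 1), (0, 0, 0), (0, 0, 0)]
estimate = wald_estimate(rows)
```

The estimate is only as credible as the assumption that the previous patient's treatment affects the current patient's outcome solely through the current patient's own treatment, which claims data alone cannot verify.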

Changes in formularies, cost-sharing, and other policies also offer opportunities to examine the consequences of natural experiments in Medicaid. For example, Soumerai and colleagues38 examined effects of a 3-prescription monthly cap on noninstitutionalized patients with schizophrenia. Although medication use was reduced, clinic visits and ER use increased. Medicaid data on dual-eligibles also provide a baseline for future studies of the effects of Medicare Part D implementation.

Finally, problems of data censoring are a common challenge in longitudinal analyses of Medicaid files. For example, in analyzing duration of treatment spells, discontinuation often does not take place by the end of the observation period. Survival analysis methods are an appropriate tool in such situations. Kaplan–Meier estimation can be used to examine distribution of treatment duration and bivariate differences, whereas proportional hazards (Cox) regression can be used to examine effects of multiple covariates on time-to-discontinuation. Akincigil et al39 used these models to examine time-to-discontinuation of beta-blocker and angiotensin converting enzyme (ACE) inhibitor therapy after acute myocardial infarction, finding substantial socioeconomic disparities.
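For readers less familiar with how censoring enters the calculation, a minimal product-limit (Kaplan-Meier) estimator is sketched below. It is written out by hand for transparency; the function name and inputs are assumptions for this example, the data are made up, and in practice an established survival analysis package would normally be used.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate of the probability of remaining on therapy.

    durations: days from initiation to discontinuation or censoring.
    events: True if discontinuation was observed, False if censored
            (eg, still on therapy at the end of observation).
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all observations sharing this time point.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Both discontinuers and censored cases leave the risk set.
        n_at_risk -= n_leaving
    return curve


# Made-up spells: two of five are censored.
durations = [30, 60, 60, 90, 120]
events = [True, True, False, True, False]
curve = kaplan_meier(durations, events)
```

Censored spells contribute to the risk set up to their censoring time and then drop out without counting as discontinuations, which is what distinguishes this estimate from a naive proportion of observed discontinuers.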

CONCLUSIONS AND POLICY IMPLICATIONS

An essential foundation for policy development and quality improvement is improved understanding of the ways in which prescription drugs are used in routine clinical care, which often involves populations and diagnoses that differ widely from those in premarketing trials. Despite practical challenges such as time lags in data availability, Medicaid data are particularly well-suited to monitoring rates, predictors, and other aspects of use, including disparities, dosages, durations and associated diagnoses. For example, across a range of chronic health conditions, claims-based studies have frequently documented patterns of inconsistent use or early discontinuation of therapies that represent major missed public health opportunities.7,16,31,32,39,40 Similarly, such studies have documented racial/ethnic disparities in care, even when financial coverage is comparable, providing insight into actionable intervening variables that could reduce disparities,41,42 and have received substantial attention in policy documents such as the National Healthcare Disparities Report.43

With respect to treatment outcomes and drug safety, many methodological challenges present themselves. Nevertheless, Medicaid data offer an extremely important resource for such studies. Although RCTs remain the gold standard, such data are often unavailable for a host of questions that are critically important to better-informed therapeutic decision making and improved safety and treatment outcomes. Even if there is a sharp increase in national investment in needed RCTs, such studies will never be able to answer every important question about outcomes for every beneficiary subgroup, condition, and drug or drug combination, given the many feasibility, economic, ethical, duration, and other constraints on RCT implementation. Absence of needed RCT data is particularly marked for low-income, minority, elderly and disabled populations, often under-represented in RCTs. Medicaid data provide the power to examine use and outcomes in these and other important subpopulations.

To make more effective use of the considerable potential for research on pharmaceutical use and outcomes among Medicaid beneficiaries, including dual-eligibles, several public policy initiatives merit consideration. Many studies to date have been conducted on an ad hoc basis, with datasets assembled to address a single research topic. However, effective use of these powerful but large and complex datasets is labor-intensive, requiring long-term cumulative experience, considerable investment in database development and management, linkage across multiple data sources, documentation, sensitivity analyses, and procedures for data security and rigorous protection of confidentiality, among other steps.

Optimally, costs of such investments would be amortized over multiple studies rather than on a single-study basis. These analyses require multiple types of expertise including knowledge of the conditions, populations and drugs at issue, detailed understanding of the programs and data systems, and advanced statistical and modeling methods; thus, infrastructure is needed to support and sustain multidisciplinary teams. Similarly, there is need for an array of validation studies, such as comparisons of the performance of claims-based algorithms for identification of populations with particular conditions to external criteria based on expert clinical assessment. Such studies have typically been conducted as byproducts of substantive investigations, but they deserve encouragement and funding in their own right. Thus, a more systematic program of support is needed for development and linking of administrative databases, validation studies, and utilization of these data resources to address use and outcomes of prescription drug therapy, particularly for low-income, disabled, and elderly individuals reliant on publicly-funded coverage. Such a program could usefully incorporate systematic initiatives for linkages of claims data to other sources of data on beneficiaries’ health status and outcomes, such as disease registries, surveys, electronic medical record data, and birth and death records. The National Cancer Institute has linked its Surveillance, Epidemiology and End Results (SEER) cancer registry data with Medicare claims, creating a merged database that has been a productive research tool. This represents a promising model that could be further strengthened by linkage to data on prescription drugs beyond those administered in physicians’ offices, such as Part D data. 
More systematic development of research datasets that link Medicaid data to other data files providing clinical and outcome data for beneficiaries will expand the potential of research with these datasets and the ability to address problems of confounding; this is a key strategy for reducing missing-variable bias and improving measurement of outcomes. For example, the potential of data linkages between Medicaid claims and vital-statistics files (eg, birth certificates) has been illustrated in important research by Cooper et al11 on medication effects on birth-defect rates, which prompted a policy response in the form of an FDA safety alert.44

A related concern for the future is the need for systematic data merging, warehousing, and access initiatives to address fragmentation of data for beneficiaries of public coverage that has been a by-product of initiatives aimed at reform. These changes include expansion of capitated Medicaid managed care programs, managed care carve-outs for specific services, such as behavioral health, and the advent of Medicare Part D. Under the Medicare Modernization Act (MMA; Pub. L. 108–173), on January 1, 2006, responsibility for prescription drug benefits for individuals dually eligible for Medicare and Medicaid was shifted from state Medicaid programs to privately-administered Medicare Prescription Drug Plans (PDPs). These individuals include elderly beneficiaries receiving Medicaid, as well as many persons with disabilities. Thus, future studies of prescription drug use and outcomes for elderly and disabled individuals reliant on publicly funded health coverage will need to access prescription drug claims data from the PDPs and merge them with other sources of data, such as Medicare Part A and B claims, and Medicaid claims for non-Medicare-covered services. Such studies will need to reintegrate now-fragmented data from multiple databases on the same individuals, while carefully protecting beneficiary confidentiality. Under MMA, PDPs provide claims-level data to the Centers for Medicare and Medicaid Services (CMS) that are used for tracking out-of-pocket costs and other purposes. These data have the potential to serve as the foundation for a powerful national research database on prescription drug use and outcomes by the elderly and disabled. However, by mid-2007, procedures for research access to Part D data were still a work in progress, and it may be a long and complex road to achieve their great research potential.

Much more needs to be done, but the existing body of work using Medicaid data to examine prescription drug use, although modest in volume, has already done much to demonstrate the power of these data to address critical questions of prescription drug use and safety. For example, the President’s New Freedom Commission on Mental Health made extensive use of Medicaid-based studies.45 Given its growing financial costs ($315 billion in state and federal expenditures in 2005) and public health importance as the payer for almost 1 in 5 Americans, Medicaid is a chronic subject of intense policy debate,46 and it is important that issues of utilization, quality and outcomes of care in Medicaid be adequately addressed with available data as a basis for rational discussion of policy choices. Medicaid data resources represent an important opportunity for needed studies in coming years. Further development of merged research datasets that link claims with other sources of clinical and outcome information; systematic efforts to support and disseminate validation studies; thoughtful exploitation of natural experiments; careful selection of research questions to draw on the data’s strengths while avoiding threats to validity; and further development of statistical and analytic methods to address potential sources of bias are among the tools that will help to realize this potential.

Acknowledgments

Supported by the Agency for Healthcare Research and Quality through a cooperative agreement for the Center for Research and Education on Mental Health Therapeutics at Rutgers (HS016097), as part of AHRQ’s Centers for Education and Research on Therapeutics (CERTs) program; with additional support from NIMH grants MH058984, MH60831, and MH076206; and AHRQ grant HS011825.

Footnotes

An earlier version was presented at the AHRQ-sponsored conference on Comparative Effectiveness and Safety: Emerging Methods Symposium, June 19–20, 2006, Rockville, MD.

References

  • 1. Davis MH, O’Brien E. Profile of persons with disabilities in Medicare and Medicaid. Health Care Financ Rev. 1996;17:179–211.
  • 2. Long SK, King J, Coughlin TA. The health care experiences of rural Medicaid beneficiaries. J Health Care Poor Underserved. 2006;17:575–591. doi: 10.1353/hpu.2006.0111.
  • 3. Rupp K, Davies PS, Newcomb C, et al. A profile of children with disabilities receiving SSI: highlights from the National Survey of SSI Children and Families. Soc Secur Bull. 2005;66:21–48.
  • 4. Centers for Medicare and Medicaid Services. [Accessed February 16, 2007];1996 Data Compendium. 2006 December; Available at: http://www.cms.hhs.gov/DataCompendium/18_2006_Data_Compendium.
  • 5. Centers for Medicare and Medicaid Services. [Accessed September 11, 2006];MSIS Tables FY 2003. 112006 Available at: http://www.cms.hhs.gov/MedicaidDataSourcesGenInfo/02_MSISData.asp.
  • 6. U.S. Census Bureau. [Accessed September 4, 2006];Annual Estimates of the Population by Sex, Race and Hispanic or Latino Origin for the United States: April 1, 2000 to July 1, 2005 (NC-EST2005-03) Available at: http://www.census.gov/popest/national/asrh/NC-EST2005-srh.html.
  • 7. Crystal S, Sambamoorthi U, Moynihan PJ, et al. Initiation and continuation of newer antiretroviral treatments among Medicaid recipients with AIDS. J Gen Intern Med. 2001;16:850–859. doi: 10.1111/j.1525-1497.2001.01025.x.
  • 8. Kendler KS, Gallagher TJ, Abelson JM, et al. Lifetime prevalence, demographic risk factors, and diagnostic validity of nonaffective psychosis as assessed in a US community sample. The National Comorbidity Survey. Arch Gen Psychiatry. 1996;53:1022–1031. doi: 10.1001/archpsyc.1996.01830110060007.
  • 9. Crystal S, Lo Sasso AT, Sambamoorthi U. Incidence and duration of hospitalizations among persons with AIDS: an event history approach. Health Serv Res. 1999;33:1611–1638.
  • 10. Walkup J, Sambamoorthi U, Crystal S. Characteristics of persons with mental retardation and HIV/AIDS infection in a statewide Medicaid population. Am J Ment Retard. 1999;104:356–363. doi: 10.1352/0895-8017(1999)104<0356:COPWMR>2.0.CO;2.
  • 11. Cooper WO, Hernandez-Diaz S, Arbogast PG, et al. Major congenital malformations after first-trimester exposure to ACE inhibitors. N Engl J Med. 2006;354:2443–2451. doi: 10.1056/NEJMoa055202.
  • 12. Olfson M, Marcus SC, Shaffer D. Antidepressant drug therapy and suicide in severely depressed children and adults: a case-control study. Arch Gen Psychiatry. 2006;63:865–872. doi: 10.1001/archpsyc.63.8.865.
  • 13. Motheral B, Brooks J, Clark MA, et al. A checklist for retrospective database studies–report of the ISPOR task force on retrospective databases. Value Health. 2003;6:90–97. doi: 10.1046/j.1524-4733.2003.00242.x.
  • 14. Holt K, Franks P, Meldrum S, et al. Mammography self-report and mammography claims: racial, ethnic, and socioeconomic discrepancies among elderly women. Med Care. 2006;44:513–518. doi: 10.1097/01.mlr.0000215884.81143.da.
  • 15. Hennessy S, Bilker WB, Weber A, et al. Descriptive analyses of the integrity of a US Medicaid claims database. Pharmacoepidemiol Drug Saf. 2003;12:103–111. doi: 10.1002/pds.765.
  • 16. Crystal S, Sambamoorthi U, Walkup JT, et al. Diagnosis and treatment of depression in the elderly Medicare population: predictors, disparities, and trends. J Am Geriatr Soc. 2003;51:1718–1728. doi: 10.1046/j.1532-5415.2003.51555.x.
  • 17. Rawson NS, Malcolm E. Validity of the recording of ischemic heart disease and chronic obstructive pulmonary disease in the Saskatchewan health care datafiles. Stat Med. 1995;14:2627–2643. doi: 10.1002/sim.4780142404.
  • 18. Edouard L, Rawson NS. Reliability of the recording of hysterectomy in the Saskatchewan health care system. Br J Obstet Gynaecol. 1996;103:891–897. doi: 10.1111/j.1471-0528.1996.tb09908.x.
  • 19. Walkup JT, Wei W, Sambamoorthi U, et al. Sensitivity of an AIDS case-finding algorithm: who are we missing? Med Care. 2004;42:756–763. doi: 10.1097/01.mlr.0000132749.20897.46.
  • 20. Keyes M, Andrews R, Mason ML. A methodology for building an AIDS research file using Medicaid claims and administrative data bases. J Acquir Immune Defic Syndr. 1991;4:1015–1024.
  • 21. Fanning TR, Cosler LE. The quality of Medicaid data for HIV/AIDS research: examination of a statewide data base. AIDS Public Policy J. 1995;10:39–47.
  • 22. Thornton C, Turner BJ. MPR Reference Number 8230-400. Princeton, NJ: Mathematica Policy Research; 1997. Methods for Identifying AIDS Cases in Medicare and Medicaid Claims Data.
  • 23. Walkup JT, Boyer CA, Kellermann SL. Reliability of Medicaid claims files for use in psychiatric diagnoses and service delivery. Adm Policy Ment Health. 2000;27:129–139. doi: 10.1023/a:1021308007343.
  • 24. Rawson NS, Malcolm E, D’Arcy C. Reliability of the recording of schizophrenia and depressive disorder in the Saskatchewan health care datafiles. Soc Psychiatry Psychiatr Epidemiol. 1997;32:191–199. doi: 10.1007/BF00788238.
  • 25. Lurie N, Popkin M, Dysken M, et al. Accuracy of diagnoses of schizophrenia in Medicaid claims. Hosp Community Psychiatry. 1992;43:69–71. doi: 10.1176/ps.43.1.69.
  • 26. Grymonpre R, Cheang M, Fraser M, et al. Validity of a prescription claims database to estimate medication adherence in older persons. Med Care. 2006;44:471–477. doi: 10.1097/01.mlr.0000207817.32496.cb.
  • 27. Nachega JB, Hislop M, Dowdy DW, et al. Adherence to highly active antiretroviral therapy assessed by pharmacy claims predicts survival in HIV-infected South African adults. J Acquir Immune Defic Syndr. 2006;43:78–84. doi: 10.1097/01.qai.0000225015.43266.46.
  • 28. Gable CB, Friedman RB, Holzer S, et al. Pharmacoepidemiological studies in automated claims databases: methodological issues. J Res Pharm Econ. 1992;4:53–67.
  • 29. Steiner JF, Prochazka AV. The assessment of refill compliance using pharmacy records: methods, validity, and applications. J Clin Epidemiol. 1997;50:105–116. doi: 10.1016/s0895-4356(96)00268-5.
  • 30. Hess LM, Raebel MA, Conner DA, et al. Measurement of adherence in pharmacy administrative databases: a proposal for standard definitions and preferred measures. Ann Pharmacother. 2006;40:1280–1288. doi: 10.1345/aph.1H018.
  • 31. Walkup JT, Sambamoorthi U, Crystal S. Use of newer antiretroviral treatments among HIV-infected Medicaid beneficiaries with serious mental illness. J Clin Psychiatry. 2004;65:1180–1189. doi: 10.4088/jcp.v65n0905.
  • 32. Bagchi A, Sambamoorthi U, McSpiritt E, et al. Use of antipsychotic medications among HIV-infected individuals with schizophrenia. Schizophr Res. 2004;71:435–444. doi: 10.1016/j.schres.2004.02.021.
  • 33. National Committee for Quality Assurance. HEDIS 2004. Vol 2: Technical Specifications. Washington, DC: NCQA Press; 2003.
  • 34. Charlson ME, Pompei P, Ales KL, et al. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis. 1987;40:373–383. doi: 10.1016/0021-9681(87)90171-8.
  • 35. Deyo RA, Cherkin DC, Ciol MA. Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases. J Clin Epidemiol. 1992;45:613–619. doi: 10.1016/0895-4356(92)90133-8.
  • 36. Martens EP, Pestman WR, de Boer A, et al. Instrumental variables: application and limitations. Epidemiology. 2006;17:260–267. doi: 10.1097/01.ede.0000215160.88317.cb.
  • 37. Brookhart MA, Wang PS, Solomon DH, et al. Evaluating short-term drug effects using a physician-specific prescribing preference as an instrumental variable. Epidemiology. 2006;17:268–275. doi: 10.1097/01.ede.0000193606.58671.c5.
  • 38. Soumerai SB, McLaughlin TJ, Ross-Degnan D, et al. Effects of a limit on Medicaid drug-reimbursement benefits on the use of psychotropic agents and acute mental health services by patients with schizophrenia. N Engl J Med. 1994;331:650–655. doi: 10.1056/NEJM199409083311006.
  • 39. Akincigil A, Bowblis J, Levin C, et al. Refill persistence with beta-blocker, ACE inhibitor, and statin therapy after an acute myocardial infarction. Presented at the Sixth Scientific Forum on Quality of Care and Outcomes Research in Cardiovascular Disease and Stroke, 2005; Washington, DC.
  • 40. Himelhoch S, Moore RD, Treisman G, et al. Does the presence of a current psychiatric disorder in AIDS patients affect the initiation of antiretroviral treatment and duration of therapy? J Acquir Immune Defic Syndr. 2004;37:1457–1463. doi: 10.1097/01.qai.0000136739.01219.6d.
  • 41. Melfi C, Croghan TW, Hanna MP, et al. Racial variation in antidepressant treatment in a Medicaid population. J Clin Psychiatry. 2000;61:16–21. doi: 10.4088/jcp.v61n0105.
  • 42. Litaker D, Koroukian S, Frolkis JP, et al. Disparities among the disadvantaged: variation in lipid management in the Ohio Medicaid program. Prev Med. 2006;42:313–315. doi: 10.1016/j.ypmed.2005.11.016.
  • 43. Agency for Healthcare Research and Quality. 2005 National Healthcare Disparities Report (AHRQ Publication No. 06-0017) Rockville, MD: Agency for Healthcare Research and Quality; 2005.
  • 44. Food and Drug Administration. [Accessed February 15, 2007];Safety Alerts for Drugs, Biologics, Medical Devices, and Dietary Supplements. 2006 Available at: http://www.fda.gov/medwatch/safety/2006/safety06.htm#chronological.
  • 45. Day S. Issues in Medicaid policy and system transformation: recommendations from the President’s Commission. Psychiatric Serv. 2006;57:1713–1718. doi: 10.1176/ps.2006.57.12.1713.
  • 46. Iglehart JK. Medicaid revisited: skirmishes over a vast public enterprise. N Engl J Med. 2007;357:734–740. doi: 10.1056/NEJMhpr066650.
