Journal of Managed Care & Specialty Pharmacy. 2019 May;25(5). doi: 10.18553/jmcp.2019.25.5.538

A Primer for Managed Care Residents: How to Conduct Research Using Live Medical and Pharmacy Claims Data

Anna Hung 1,*, Rodney Gedey 2, Marti Groeneweg 3, Michelle Jay 4
PMCID: PMC10398282  PMID: 31039066

Abstract

Managed care organizations are growing more sophisticated in their ability to analyze data. There are increasing numbers of data analysts at managed care organizations, as well as more types of real-time, or “live,” data available. These data range from pharmacy claims and enrollment files to medical claims, medical records, and linkages to external data. Moreover, the data are often curated in a way that allows for easier data analysis.

Using these data, managed care residents are often required to perform a project to evaluate a utilization management policy or clinical program. Yet, there is a lack of guidance specific to managed care organizations on how to conduct such a research study using “live” claims data. This Viewpoint article provides a primer for managed care residents and other managed care professionals who are seeking to use data to help inform decisions on how to manage their beneficiaries’ health and costs.


Health plans and pharmacy benefit managers are increasingly using their real-time, or “live,” claims data to manage their populations. For example, many managed care residents are required to complete at least 1 major practice-related project, often using claims data.1 However, little guidance exists on how to conduct a claims analysis using health plan pharmacy and medical claims data. This article provides a framework, data request template, and examples for managed care residents and professionals to use when conducting claims analyses.

It is important to note that health plan pharmacy and medical claims data, used by managed care residents and professionals for their analyses, are continually updated as new claims are created and old claims are either paid, rejected, or reversed (Table 1). In general, rejected and reversed claims should be excluded from claims analyses so as not to falsely elevate utilization. Furthermore, the sequence of claims may be of interest, depending on the type of study. For example, one may want to identify the first claim that was denied due to a prior authorization (Table 2, Case Study 2).

TABLE 1.

Common Types of Claims

Claim Type | Definition | Include in Analysis?
Paid | Claim that was adjudicated and paid | Yes
Rejected | Claim that was denied and was not paid | No (a)
Reversed | Claim that was initially processed and paid and later voided or reversed due to a drug utilization review edit (e.g., therapeutic duplication, high dose alert) | No (a)

(a) These claims are often excluded from utilization analyses; however, they may be included depending on the objective of the study.
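
The exclusion rule in Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular data warehouse: the `Claim` record, its field names, and the status labels "paid", "rejected", and "reversed" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    # Hypothetical claim record; real claim tables will have different fields.
    member_id: str
    drug: str
    status: str  # assumed labels: "paid", "rejected", or "reversed"


def paid_claims(claims):
    """Keep only paid claims so rejected and reversed claims do not inflate utilization."""
    return [c for c in claims if c.status == "paid"]


claims = [
    Claim("M1", "drug_a", "paid"),
    Claim("M1", "drug_a", "rejected"),
    Claim("M2", "drug_a", "reversed"),
    Claim("M2", "drug_a", "paid"),
]

utilization = len(paid_claims(claims))
print(f"Paid claims counted toward utilization: {utilization} of {len(claims)}")
```

If the study objective calls for rejected or reversed claims (for example, when counting denials), the filter would be changed rather than removed.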

TABLE 2.

Data Request Template with Examples

[Table 2 appears as two images in the original article (jmcp-025-05-538_g001.jpg and jmcp-025-05-538_g002.jpg).]

The methods used to conduct claims analyses vary widely among managed care organizations. In large organizations, an analytics team is generally available to pull claims data. Moreover, a growing number of managed care organizations have a separate research arm that processes and provides data for external research. Clear and comprehensive data requests are important because confusion often delays receipt of data. Table 2 provides a template that can help communicate what claims data are being requested from data analysts.

Designing a Claims Analysis

The Research Question

When planning a claims analysis, the first step is to craft a research question that may be answered by the results of the study. It is important to spend adequate time carefully scripting and revising the research question for the study. The PICOT framework is a tool that can be used to help define research questions.2 The PICOT framework is also useful in crafting other elements of the claims analysis. The following sections address each of the PICOT categories to follow when designing a claims analysis: population, intervention, comparator, outcome, and time.

Population

The next step when designing a research study is to identify the population of interest by defining inclusion and/or exclusion criteria, which may be based on medical claims data, diagnosis codes, pharmacy claims data, or a combination of all such information. For plans that have access to medical claims data, populations are often defined as having a disease state based on 1-2 medical claims with a diagnosis code, or a set of diagnosis codes, that correspond to the disease state of interest during a given time period. Often, this time period is a fixed period, such as 1 year, immediately before the study period. However, this varies based on the goal of the study. For inpatient admission claims, one may also distinguish between admission and discharge claim diagnosis codes, if the data are available. Diagnosis codes are often specific codes from the International Classification of Diseases, Ninth/Tenth Revisions, Clinical Modification (ICD-9-CM or ICD-10-CM), and can be found on the Centers for Disease Control and Prevention website.3 Of note, the Tenth Revision replaced the Ninth Revision on October 1, 2015.3 One should keep in mind that, depending on the disease state, using diagnosis codes alone could lead to underidentification of the diseased population.4 For example, mental illnesses and addiction disorders are often underdiagnosed.

In addition to diagnosis codes, populations can be identified by whether they had a particular procedure or medication. These codes will be discussed in the Intervention section. Since errors can occur in diagnosis codes, a single medical claim with the diagnosis codes of interest may not be sufficient. Therefore, depending on the desired level of accuracy, the researcher may decide to require more than 1 claim on unique dates with the diagnosis codes of interest. Other combinations, such as at least 1 inpatient claim with the diagnosis codes of interest, at least 2 outpatient claims with the diagnosis codes of interest, and/or at least 2 prescription claims with a medication associated with the clinical condition of interest, may be required as part of the inclusion criteria. While such detailed inclusion criteria may decrease the sample size of the cohort, they will increase the likelihood that each individual captured in the cohort had the condition, procedure, and/or medication of interest.5-7
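
As a rough sketch of the "more than 1 claim on unique dates" criterion, the following Python counts distinct service dates per member for a set of diagnosis codes. The tuple layout, the function name `identify_cohort`, and the codes "DX001" and "DX002" are hypothetical placeholders, not real ICD codes or a validated algorithm.

```python
from collections import defaultdict
from datetime import date


def identify_cohort(medical_claims, dx_codes, min_unique_dates=2):
    """Return members with qualifying diagnosis codes on at least
    `min_unique_dates` distinct service dates."""
    dates_by_member = defaultdict(set)
    for member_id, service_date, dx_code in medical_claims:
        if dx_code in dx_codes:
            # A set ignores repeated claims on the same date.
            dates_by_member[member_id].add(service_date)
    return {
        member_id
        for member_id, dates in dates_by_member.items()
        if len(dates) >= min_unique_dates
    }


# Placeholder codes for illustration only; not real ICD-9-CM or ICD-10-CM codes.
DX_OF_INTEREST = {"DX001", "DX002"}

medical_claims = [
    ("M1", date(2018, 1, 5), "DX001"),
    ("M1", date(2018, 3, 9), "DX002"),
    ("M2", date(2018, 2, 1), "DX001"),
    ("M2", date(2018, 2, 1), "DX001"),  # same date, so counted once
    ("M3", date(2018, 4, 2), "OTHER"),
]

cohort = identify_cohort(medical_claims, DX_OF_INTEREST)
print(sorted(cohort))
```

Lowering `min_unique_dates` to 1 enlarges the cohort at the cost of admitting members whose only qualifying claim may be a coding error.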

To determine what diagnosis codes to use, as well as the number and types of claims to require, one should look at published literature in the area of interest to see what has been done. One should also look to see if there are validation studies testing the inclusion and/or exclusion criteria. If diagnosis codes are not found in the literature, clinical providers and medical coders are good resources to provide guidance. Ideally, testing whether the inclusion and/or exclusion criteria identify a similar cohort to a cohort defined based on medical chart reviews would provide confidence that the inclusion and/or exclusion criteria are valid. Beyond diagnosis codes, managed care organizations sometimes have access to other software that may assign a condition to a member based on proprietary algorithms.

For those who only have access to pharmacy claims data, the population can be identified by searching for members who have filled medications associated with the condition being evaluated. How accurately pharmacy claims alone identify the target patient population varies with the disease state and the drugs used to treat it. For example, a drug with a fairly narrow therapeutic area of use, such as liothyronine, would provide a more accurate patient population than prednisone, which has many different indications. Moreover, off-label use of medications can lead to false positives. For example, bile acid sequestrants are frequently used for gastrointestinal conditions, although they are indicated only for hyperlipidemia and diabetes.

Intervention

Drugs are commonly the intervention of interest. The most common coding systems used to identify drugs are National Drug Code (NDC), Generic Code Number (GCN), Generic Product Identifier (GPI), Hierarchical Ingredient Code List (HICL), American Hospital Formulary Service (AHFS), and RxNorm codes. GPI, HICL, and AHFS codes have a hierarchical structure, where the first few digits refer to larger categories (such as disease states) and later digits refer to smaller subcategories (such as exact drug or exact formulation and strength). GPI codes are subcategorized into formulation and strength, while AHFS codes are not.8,9 Other variations of these codes also exist. Having this hierarchical structure can be helpful in selecting all drug products of a certain subclass, class, or clinical area. The choice of which type of drug code to use is usually dictated by what is available in the data warehouse.
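
To illustrate why a hierarchical code structure is convenient, the sketch below selects drug products by code prefix. The 14-character codes and the prefixes are invented for the example and were not taken from any drug database; the helper name `select_by_prefix` is likewise an assumption.

```python
def select_by_prefix(drug_codes, prefix):
    """Return every code that falls under the category identified by `prefix`."""
    return [code for code in drug_codes if code.startswith(prefix)]


# Made-up hierarchical codes: leading digits stand for broad categories,
# later digits for narrower subcategories (drug, formulation, strength).
drug_codes = [
    "11220010000310",
    "11220010000320",
    "11220020000110",
    "11450030000510",
    "27100010000305",
]

whole_group = select_by_prefix(drug_codes, "11")       # broad category
one_class = select_by_prefix(drug_codes, "1122")       # narrower class
one_drug = select_by_prefix(drug_codes, "1122001000")  # single drug, all strengths

print(len(whole_group), len(one_class), len(one_drug))
```

A shorter prefix captures a whole clinical area in one condition, while a longer prefix narrows the selection to a specific product.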

If drugs are not paid for by the pharmacy benefit, but are paid for by the medical benefit instead, the previously mentioned codes may not be available. In these cases, Level II Healthcare Common Procedure Coding System (HCPCS) J codes may help.10 J codes are not always specific to 1 drug product, leading to problems in identifying when a member has used a specific drug paid for by the medical benefit.

Other ways to identify when patients have used drugs are through utilization management tools such as prior authorization approvals. If a procedure is of interest, Level I HCPCS Current Procedural Terminology codes; ICD-9-CM Procedure codes; and ICD-10 Procedure Coding System codes may be used.11-13 If a device is of interest, other Level II HCPCS codes may be used.14 If a program is of interest, program members are often tracked through a site-specific manual or automated system.

Often researchers are interested in defining a group of members as users of the intervention (i.e., the intervention group). The intervention group can be defined as having at least 1, 2, or more of the previously mentioned codes over a period of time. The number of codes and length of time will depend on the intervention of interest. Previous literature regarding similar interventions and clinical conditions can provide further guidance and confidence in defining the intervention group.

Comparator

If the comparator group is determined by use of another drug, procedure, device, or program, the same types of codes and definitions can often be used as previously discussed. However, if the comparator is a control group that did not receive any “intervention,” this can be more difficult to determine. One needs to consider potential biases that may arise because of how the control group is defined. For example, immortal time bias occurs when, by design, there is a period of time during which the comparison group cannot have the outcome of interest.15

Outcomes

The outcomes of interest depend on the study. Common outcomes include costs, health care resource utilization, and clinical outcomes. When evaluating costs, it is important to define the perspective of the study. If the payer perspective is assumed, one should find the net cost after rebates and discounts. For publication purposes, one may need to request publicly available benchmarks such as wholesale acquisition cost or average wholesale price for drugs, as opposed to actual acquisition costs. One should also consider net costs after payments from other health insurance and whether to separate out the population with other health insurance from those without other health insurance. Resource use is often subdivided into outpatient, inpatient, and emergency room visits. Note that professional services may be billed separately from the facility fees. Measurement of clinical outcomes involves many of the diagnosis, procedure, device, and drug codes previously discussed.

One additional aspect to consider is how one wants to report the outcome. For example, is the average (mean or median) and/or measure of variation (standard deviation or interquartile range) of interest? Is a rate or ratio with a confidence interval of interest? Statistical analyses are a key part of this discussion but fall outside of the scope of this article. It should be noted that cost data are often highly skewed and non-normal.16 Thus, statistical tests need to be carefully selected, with special attention to their assumptions. When dealing with complex statistical issues (e.g., non-normal data or missing data), advice from a statistical team is highly beneficial.
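
The reporting choices above can be made concrete with a small summary function. This sketch uses only Python's standard `statistics` module; the cost values are fabricated for illustration and the function name `summarize_costs` is an assumption.

```python
import statistics


def summarize_costs(costs):
    """Report central tendency and spread for a list of per-member costs."""
    q1, _, q3 = statistics.quantiles(costs, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(costs),
        "median": statistics.median(costs),
        "sd": statistics.stdev(costs),
        "iqr": (q1, q3),
    }


# Illustrative values only: one high-cost member skews the distribution.
costs = [100, 120, 150, 180, 5000]

summary = summarize_costs(costs)
for name, value in summary.items():
    print(name, value)
```

With skewed data such as these, the mean sits far above the median, which is one reason to report both and to check the assumptions of any statistical test applied to costs.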

Time

Traditionally, the study time horizon is the length of time to observe the outcome of interest. However, it also depends on the study type and what is feasible. Often managed care organizations want to examine their claims data and make decisions immediately. However, when working with real-time claims data, one needs to keep in mind that there is a claims lag, especially with medical claims. The claims lag is the amount of time it takes for almost all of the claims to be filled, paid for, reversed (if relevant), and finalized. The claims lag will differ between organizations. Thus, if the claims lag is 3 months and today’s date is December 1, 2018, the most recent complete data one could expect would run through August 31, 2018, which should be reflected in the study time horizon. If more recent data were pulled, these data would likely be incomplete and would underestimate the outcome of interest.
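
The claims lag arithmetic in the example above can be checked with a short calculation. This is a sketch that assumes the lag is expressed in whole calendar months; the helper names `subtract_months` and `last_complete_date` are hypothetical.

```python
import calendar
from datetime import date, timedelta


def subtract_months(d, months):
    """Step back a whole number of calendar months, clamping the day if needed."""
    month_index = d.year * 12 + (d.month - 1) - months
    year, month_zero_based = divmod(month_index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def last_complete_date(today, lag_months):
    """Last service date for which claims are expected to be essentially complete."""
    return subtract_months(today, lag_months) - timedelta(days=1)


cutoff = last_complete_date(date(2018, 12, 1), lag_months=3)
print(cutoff)  # 2018-08-31
```

Using this cutoff as the end of the study time horizon avoids pulling months whose claims are still being adjudicated.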

It is important to note that the time horizon is often different from the time period used to select the patient population and the length of time for the intervention. Furthermore, if relevant, other time periods may be used to check for continuous enrollment with a plan or if a member is a new user of a medication or intervention. Thus, the use of different time periods needs to be clearly communicated (Table 2).
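
One of the additional time periods mentioned above is the continuous enrollment window. The following is a minimal sketch of such a check for a single member, assuming enrollment is stored as inclusive (start, end) date spans and that no gap at all is tolerated; organizations may define continuous enrollment differently.

```python
from datetime import date, timedelta


def continuously_enrolled(spans, window_start, window_end):
    """Return True if the enrollment spans cover every day of the window."""
    covered_through = window_start - timedelta(days=1)
    for span_start, span_end in sorted(spans):
        if span_end <= covered_through:
            continue  # span ends before the part of the window still uncovered
        if span_start > covered_through + timedelta(days=1):
            return False  # at least one uncovered day
        covered_through = span_end
        if covered_through >= window_end:
            return True
    return covered_through >= window_end


window = (date(2017, 7, 1), date(2018, 6, 30))

no_gap = [(date(2017, 1, 1), date(2017, 12, 31)), (date(2018, 1, 1), date(2018, 12, 31))]
with_gap = [(date(2017, 1, 1), date(2017, 9, 30)), (date(2018, 1, 1), date(2018, 12, 31))]

print(continuously_enrolled(no_gap, *window))
print(continuously_enrolled(with_gap, *window))
```

The enrollment window here is deliberately separate from the study time horizon, which matches the point that each time period should be stated explicitly in the data request.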

Other Considerations

Data Format

Depending on how much analysis one plans on doing, one may ask for summary results or patient-level or claims-level data files from data analysts. Files could also be at the provider or plan level. It is most important to be clear in how one wants to receive the data. Providing an Excel (Microsoft, Redmond, WA) template of requested data column titles to the data analysts will help to clarify and streamline the data request process. This will allow the data analysts to visualize what type of data is requested and organize it into an Excel file. For example, requested data column titles could include line of business, unique member ID, sex, age, GPI of medications analyzed, drug name, quantity, day supply, number of paid claims, and drug acquisition costs.
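
A column-title template like the one described above can also be produced programmatically. This sketch writes the example column titles to CSV text, which spreadsheet software can open; the use of CSV rather than a native Excel file and the function name `build_template` are assumptions made to keep the example dependency-free.

```python
import csv
import io

# Column titles taken from the example in the text.
REQUESTED_COLUMNS = [
    "line of business",
    "unique member ID",
    "sex",
    "age",
    "GPI of medications analyzed",
    "drug name",
    "quantity",
    "day supply",
    "number of paid claims",
    "drug acquisition costs",
]


def build_template(columns):
    """Return CSV text containing only a header row of requested column titles."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(columns)
    return buffer.getvalue()


template = build_template(REQUESTED_COLUMNS)
print(template)
```

Sending the header row with the request gives the analyst an unambiguous picture of the expected level of detail (one row per member, per claim, and so on).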

Privacy, Confidentiality, and Data Security

Managed care organizations generally have detailed policies about how to handle protected health information and personally identifiable information. We will not discuss this topic at length, but it is important to be aware of and comply with the organization’s policies. In addition, the Privacy Rule of the Health Insurance Portability and Accountability Act should be consulted to determine whether the scope of the project qualifies it as a research project or a quality improvement project. Managed care organizations often have a committee and/or institutional review board that can help with this determination.

Quality Assurance

Once the data file is created, check against the inclusion and exclusion criteria and look for outliers that may suggest that something is wrong. For example, are there many 5-year-old patients in a data file that is supposed to only contain Medicare enrollees?
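
A simple range check can surface the kind of outlier described above. The row layout, the field name `age`, and the thresholds are assumptions for illustration; the appropriate bounds depend on the line of business and should be set by the researcher.

```python
def flag_implausible_ages(rows, min_age, max_age):
    """Return rows whose age falls outside the range expected for the data file."""
    return [row for row in rows if not (min_age <= row["age"] <= max_age)]


# Hypothetical extract that is supposed to contain only Medicare enrollees.
rows = [
    {"member_id": "M1", "age": 72},
    {"member_id": "M2", "age": 5},   # suspicious for this line of business
    {"member_id": "M3", "age": 68},
    {"member_id": "M4", "age": 130}, # likely a data entry error
]

# Thresholds are illustrative assumptions, not eligibility rules.
flagged = flag_implausible_ages(rows, min_age=18, max_age=110)
print([row["member_id"] for row in flagged])
```

Flagged rows are a prompt to revisit the inclusion and exclusion criteria with the data analyst, not an instruction to delete records.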

Adjustment for Potential Confounding

Consider whether there are potential confounders that need to be addressed due to the study design, especially when trying to evaluate the effect of an intervention. This important issue falls outside the scope of this article. Please consult an expert in quasi-experimental methods or the study design of interest to ensure that biases are minimized.

Data Interpretation

Keep in mind that study results need to be interpreted within the strengths and limitations of the study design. One major limitation of observational studies is the presence of unobserved confounders that are not adjusted for.

Specificity to Managed Care Organization

Each managed care organization has unique features, such as the types of enrollee plans, types of formularies, types of data available, and interaction with other health insurance. Make sure to incorporate these into the study design. For example, integrated health delivery systems may have access to more information. Thus, one may want to specify whether they want all claims or only internal claims.

Publishing

Consider presenting the work at a national meeting and publishing the work in a peer-reviewed journal.

Reporting Study Designs

For guidance on how to report specific study designs, see the following:

Examples

In this section, we provide 3 examples and demonstrate how to request data using a template (Table 2).

Example 1

A managed care resident would like to compare the pharmacy and medical costs of members diagnosed with schizophrenia, schizoaffective disorder, or bipolar I disorder taking injectable versus oral antipsychotic therapy in the first year after initiating therapy. The resident has access to pharmacy and medical claims but no electronic health record data. After evaluating current literature, the resident submits a data request to the data analytics department (Table 2).

Example 2

A managed care resident would like to identify the 1-month effect of a prior authorization policy for pain-related compounds. The resident has access to pharmacy claims only. See Table 2 for the data request submitted to the data analytics department.

Example 3

A managed care resident would like to evaluate the effect of a clinical pharmacist-managed medication therapy management program on diabetes blood sugar control—a Centers for Medicare & Medicaid Services Part C star measure. The resident works in an integrated health delivery system with access to pharmacy claims, medical claims, and electronic health record data. The resident constructs a research question and submits a request to the data analytics department (Table 2).

Conclusions

Managed care organizations are increasingly using their own data to assess their populations and evaluate their interventions. This article serves as a primer for managed care residents and professionals who are conducting analyses using live claims data.

ACKNOWLEDGMENTS

The authors acknowledge Gregory Bresin, Huong Nguyen, Sarah J. Park, Ty Vo, and Tricia Lee Wilkins for their contributions to this article.

REFERENCES


Articles from Journal of Managed Care & Specialty Pharmacy are provided here courtesy of Academy of Managed Care Pharmacy
