Adaptation of Bayesian Data Mining Algorithms to Longitudinal Claims Data: Coxib Safety as an Example

Jeffrey R Curtis; Hong Cheng; Elizabeth Delzell; David Fram; Meredith Kilgore; Kenneth Saag; Huifeng Yun; William DuMouchel

doi:10.1097/MLR.0b013e318179253b

Author manuscript; available in PMC: 2009 Sep 1.

Published in final edited form as: Med Care. 2008 Sep;46(9):969–975. doi: 10.1097/MLR.0b013e318179253b

PMCID: PMC2694945 NIHMSID: NIHMS116307 PMID: 18725852

Abstract

Introduction

Bayesian data mining methods have been used to evaluate drug safety signals from adverse event reporting systems and allow for evaluation of multiple endpoints that are not pre-specified. Their adaptation for use with longitudinal data such as administrative claims has not been previously evaluated or validated.

Methods

In this pilot study, we evaluated the feasibility of adapting data mining methods using the empirical Bayes Multi-Item Gamma Poisson Shrinker (MGPS) algorithm to longitudinal administrative claims data. The Medicare Current Beneficiary Survey (MCBS) was used to identify a cohort of Medicare enrollees who were exposed to cyclooxygenase-2 selective (coxib) or non-selective non-steroidal anti-inflammatory drugs (NS-NSAIDs) from 1999-2003. The empirical Bayes MGPS algorithm was used to simultaneously evaluate 259 outcomes associated with current use of coxibs vs. NS-NSAIDs while adjusting for key covariates and for multiple comparisons. For comparison, a parallel analysis used traditional epidemiologic methods to evaluate the relationship between coxib vs. NS-NSAID use and acute myocardial infarction (AMI), with the goal of establishing the concurrent validity of the data mining approach.

Results

Among 9431 Medicare beneficiaries using NSAIDs and considering all 259 possible outcomes, empirical Bayes MGPS identified an association between current celecoxib use and AMI (empirical Bayes geometric mean [EBGM] ratio 1.91) but not other outcomes. Rofecoxib use was associated with acute cerebrovascular events (EBGM ratio 1.85) and several other diagnoses that likely represented indications for the drug. Results from the analyses using traditional epidemiologic methods were similar, supporting the validity of the data mining results.

Discussion

Bayesian data mining methods appear useful to evaluate drug safety using administrative data. Further work will be needed to extend these findings to different types of drug exposures and to other claims databases.

Introduction

The assessment of pharmaceutical safety after product licensure is of great interest to clinicians, patients, pharmaceutical companies, regulatory agencies, and policymakers. Recent and high-profile examples of drug withdrawals after recognition of safety problems have highlighted existing deficiencies in the current mechanisms by which medication safety is evaluated. The phase 3 studies required for drug approval are rarely powered to detect uncommon adverse events and lack generalizability with respect to the majority of people who eventually receive these medications. Unfortunately, relatively few tools are available to provide rapid detection of previously unrecognized or underappreciated safety signals.

In the U.S., the Adverse Event Reporting System (AERS) is an important mechanism by which hitherto unknown safety concerns are recognized. However, analyses of the voluntary reports submitted through this mechanism have a number of limitations. These include under-reporting, distortion due to reporting trends, biases such as the Weber effect (1), and lack of information on the total number of exposed persons, all of which preclude calculation of valid incidence rates. Despite these limitations, the AERS system is a useful resource that has added substantially to the evaluation of drug safety. There are various mechanisms by which AERS data can be analyzed, including qualitative review and more quantitative methods such as proportional reporting ratios (PRRs) and empirical Bayes methods. These quantitative disproportionality methods compare n, the number of observed reports of a drug-adverse event combination, with e, the number expected under an assumption of independence between drugs and events in the database.
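As a minimal sketch of the disproportionality quantities just described, assuming a spontaneous-report database summarized as a 2x2 table of report counts for one drug-event pair, n, e, and the PRR can be computed as follows. The class and function names and the counts in the usage example are hypothetical, and the chi-squared statistic usually reported alongside the PRR is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Report counts for one drug-event pair (hypothetical field names)."""
    a: int  # reports mentioning both the drug and the event
    b: int  # reports mentioning the drug but not the event
    c: int  # reports mentioning the event but not the drug
    d: int  # reports mentioning neither

    @property
    def total(self) -> int:
        return self.a + self.b + self.c + self.d


def observed_expected(t: TwoByTwo) -> tuple[int, float]:
    """n = observed reports; e = count expected if drug and event were independent."""
    n = t.a
    e = (t.a + t.b) * (t.a + t.c) / t.total
    return n, e


def prr(t: TwoByTwo) -> float:
    """Proportional reporting ratio: event share among drug reports vs. all other reports."""
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


if __name__ == "__main__":
    table = TwoByTwo(a=20, b=980, c=200, d=98_800)  # hypothetical counts
    n, e = observed_expected(table)
    print(f"n={n}, e={e:.2f}, n/e={n / e:.2f}, PRR={prr(table):.2f}")
```

With these made-up counts, e is 2.2, so n/e is about 9.1 while the PRR is 9.9; the two measures differ because e uses all reports containing the drug, whereas the PRR compares against reports without it.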

Analyses that use an empirical Bayes technique, the Multi-item Gamma Poisson Shrinker (MGPS) algorithm, have advantages over alternative methods such as PRR. First, PRR is hard to interpret without simultaneously considering the significance of an associated chi-squared statistic; MGPS does not require a companion chi-squared test. Additionally, MGPS ‘shrinks’ the Bayesian observed-to-expected ratios toward the null value of 1 by an amount that depends on their statistical variability. MGPS produces an empirical Bayes geometric mean (EBGM) estimate with a surrounding confidence interval ([EB05, EB95]), which is designed to be resistant to the post hoc selection fallacy caused by looking at many highly variable statistics. Because a single shrunken ratio incorporates information about both the value of n/e and its variability, many different drug-event combinations can conveniently be sorted in a single dimension for ranking and comparison. Finally, extensions of the method can also adjust for multiple covariates.

Another important statistical issue is the problem of multiple comparisons. When considering the many n/e values computed in a large database, it is natural to focus on the largest ratios. This is an example of post hoc selection, which tends to select ratios biased toward large values because they are based on counts that happen to be large due to sampling variation. Bayesian shrinkage methods are designed to correct for this bias by shrinking estimates toward a prior distribution, which is itself estimated from the ensemble of all (n, e) pairs. As an example, consider a disproportionality analysis of one drug-event combination having (n=3, e=0.03, n/e=100) and another having (n=50, e=5, n/e=10). Both ratios are likely to overestimate their “true values”; how much to shrink each estimate is determined by fitting a Bayesian model to the entire set of (n, e) pairs in the database. Depending on the fit, the first estimate might shrink from 100 down to 5, whereas the more reliable second estimate might shrink only from 10 to 9 (2). Shrinkage is the same for all pairs with the same n and e. Finally, MGPS can evaluate all outcomes simultaneously without requiring any to be specified in advance. Semi-automated software programs have been developed that implement this approach rapidly and visually and provide an adjusted summary relative risk estimate.
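The shrinkage described above can be sketched with the gamma-Poisson model underlying MGPS (3). In this model, n is Poisson with mean λe, the prior on λ is a mixture of two gamma distributions, EBGM is exp(E[log λ | n]), and EB05/EB95 are posterior percentiles. This is a minimal sketch, not the MGPS software. The five prior hyperparameters in `PRIOR` are fixed, hypothetical values; MGPS instead fits them to all (n, e) pairs by maximum likelihood. As a result, the shrunken values will not reproduce the illustrative figures in the text. The sketch assumes SciPy is installed.

```python
import math

from scipy.optimize import brentq
from scipy.special import digamma, logsumexp
from scipy.stats import gamma, nbinom

# Hypothetical prior: weight w on Gamma(a1, rate b1), 1 - w on Gamma(a2, rate b2).
# MGPS estimates these hyperparameters from the whole database; they are fixed here.
PRIOR = {"w": 1 / 3, "a1": 0.2, "b1": 0.1, "a2": 2.0, "b2": 4.0}


def posterior_components(n, e, p=PRIOR):
    """Posterior mixture of gammas for lambda given n ~ Poisson(lambda * e)."""
    comps = [(p["w"], p["a1"], p["b1"]), (1 - p["w"], p["a2"], p["b2"])]
    # Marginal distribution of n under each gamma component is negative binomial.
    log_m = [math.log(w) + nbinom.logpmf(n, a, b / (b + e)) for w, a, b in comps]
    norm = logsumexp(log_m)
    # Conjugate update: Gamma(a, rate b) -> Gamma(a + n, rate b + e).
    return [(math.exp(lm - norm), a + n, b + e) for lm, (_, a, b) in zip(log_m, comps)]


def ebgm(n, e, p=PRIOR):
    """Empirical Bayes geometric mean: exp of the posterior mean of log(lambda)."""
    comps = posterior_components(n, e, p)
    return math.exp(sum(q * (digamma(a) - math.log(b)) for q, a, b in comps))


def eb_quantile(n, e, prob, p=PRIOR):
    """Posterior quantile of lambda, e.g. prob=0.05 for EB05 or 0.95 for EB95."""
    comps = posterior_components(n, e, p)

    def cdf(x):
        return sum(q * gamma.cdf(x, a, scale=1 / b) for q, a, b in comps)

    # Upper bracket where every component's CDF (hence the mixture's) exceeds prob.
    hi = max(gamma.ppf(0.9999, a, scale=1 / b) for _, a, b in comps)
    return brentq(lambda x: cdf(x) - prob, 0.0, hi)


if __name__ == "__main__":
    for n, e in [(3, 0.03), (50, 5.0)]:
        print(f"n={n}, e={e}, n/e={n / e:.0f}, EBGM={ebgm(n, e):.2f}, "
              f"EB05={eb_quantile(n, e, 0.05):.2f}, EB95={eb_quantile(n, e, 0.95):.2f}")
```

With these particular hyperparameters, the sparse first pair is pulled far below its raw ratio of 100, while the second stays just under 10. This matches the qualitative pattern in the text. A fitted prior would give different numbers.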

To date, use of Bayesian data mining methods has largely been restricted to evaluation of adverse event reports and clinical trial results. This type of data can be thought of as ‘packet’ data that does not place much importance on the element of time. An extension of these methods should theoretically be able to incorporate time-dependent exposures and varying durations of times at risk across patients, but this possibility has largely been unexplored, and the implementation of this approach has not been well-characterized.

We therefore conducted a pilot study with the primary goal of evaluating the feasibility and validity of adapting Bayesian data mining methods to analysis of longitudinal administrative claims data. As a framework within which to do this, we studied outcomes associated with cyclooxygenase-2 selective (coxib) non-steroidal anti-inflammatory drugs (NSAIDs) compared to use of non-selective non-steroidal anti-inflammatory drugs (NS-NSAID). For comparison with the results of the data mining analyses, we then conducted a parallel study using traditional epidemiologic methods to evaluate the consistency of results between the two approaches and to assess the likely validity of adapting Bayesian data mining algorithms for use with longitudinal data.

Methods

Overview

We obtained linked survey information, medical claims, and medication use data from the Medicare Current Beneficiary Survey (MCBS) for the years 1999-2003. We identified current NSAID exposure using the MCBS medication data and all medical events using the linked Medicare claims. After mapping the administrative claims data to a classification system that allows simultaneous consideration of all outcome events, we applied the empirical Bayes MGPS data mining algorithm developed by DuMouchel (3-7) to evaluate the relationship between current exposure to celecoxib or rofecoxib, compared to NS-NSAIDs, and the occurrence of all possible outcome events. The MGPS algorithm can adjust for important factors that might confound these relationships, such as age, gender, and comorbidity score, and can also protect against inflated estimates of statistical significance resulting from multiple comparisons. In a separate but parallel analysis, we used traditional epidemiologic methods to evaluate the validity of the MGPS approach.

Medicare Current Beneficiary Survey (MCBS) Data Source and Eligibility

After institutional review board approval from the University of Alabama at Birmingham, we obtained the de-identified Cost and Use Files from the MCBS for 1999-2003. The MCBS is a rotating panel survey of institutionalized and community-dwelling Medicare beneficiaries that collects detailed information on demographics, insurance coverage, comorbidities, medical events, costs, and medication use. Most individuals remain in the panel for three years. Data are collected via in-person interviews conducted in the participant’s home every four months. Each beneficiary’s Medicare claims are linked to the survey, providing a unique combination of survey data and insurance claims. For community-dwelling beneficiaries, the MCBS records at every interview the names of the medications on the containers that the beneficiary is asked to produce and the number of refills of each medication since the last interview. For institutionalized beneficiaries, medication information is obtained by monthly review of medical records.

Eligibility and Exposure Classification

We identified all NSAID use for each person during the study period. Persons who never used NSAIDs were excluded from the analysis because of a previous observation that non-users of NSAIDs have a higher mortality risk than NSAID users, likely due to channeling of sicker patients away from NSAIDs (8). NSAIDs were grouped into three categories: celecoxib, rofecoxib, and NS-NSAIDs. Because valdecoxib use was minimal during the study period, we did not compute separate risk estimates for valdecoxib. Current NSAID exposure was defined as use of an NSAID in the current or prior month relative to each outcome event. The public use MCBS data files were supplemented with more specific medication data obtained directly from the Centers for Medicare & Medicaid Services (CMS) to allow greater precision in defining current NSAID exposure.

Event Classification

We identified events of interest using the linked Medicare claims data, which contain diagnoses coded using the International Classification of Diseases, 9th Revision (ICD-9) system. Because no outcome diagnosis was pre-specified as being of particular interest for the data mining analysis, we needed a mechanism to group all possible ICD-9 codes into a manageable number of unique categories. Using 5-digit ICD-9 codes to represent different events is statistically inefficient, because many clinically similar events have different 5-digit ICD-9 codes. Combining 5-digit codes into higher-level categories, such as 4-digit or 3-digit groups, does not fully address the problem, because these higher-level categories sometimes group dissimilar types of events.

For that reason, we used the Clinical Classifications Software (CCS) developed by the Agency for Healthcare Research and Quality (AHRQ) to classify outcome events into 259 unique groups. The ICD-9 codes from each claim were mapped to the corresponding CCS groups, were identified as being the primary or a secondary diagnosis, and were classified as coming from an inpatient or outpatient setting. For the purposes of this analysis, we considered only those events that were primary diagnoses recorded in claims from an inpatient setting.
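A minimal sketch of this event classification step follows. It assumes each claim diagnosis carries a 5-digit ICD-9-CM code, its position on the claim, and an inpatient flag; the class and field names are hypothetical. The lookup table is a tiny illustrative subset standing in for the AHRQ CCS mapping file, which would be loaded in full in practice. Codes absent from the lookup are simply skipped here, whereas a production pipeline should probably report them.

```python
from dataclasses import dataclass

# Illustrative subset of an ICD-9-CM -> CCS lookup (hypothetical; load the full
# AHRQ CCS mapping file in practice and verify each code against it).
ICD9_TO_CCS = {
    "41001": 100,  # an acute myocardial infarction code -> CCS 100
    "43491": 109,  # a cerebral infarction code -> CCS 109 (acute cerebrovascular disease)
    "71536": 203,  # a localized osteoarthrosis code -> CCS 203 (osteoarthritis)
}


@dataclass(frozen=True)
class ClaimDiagnosis:
    icd9: str        # 5-digit ICD-9-CM code without the decimal point
    position: int    # 1 = primary diagnosis, 2+ = secondary (assumed encoding)
    inpatient: bool  # True if the claim came from an inpatient setting


def ccs_events(diagnoses, lookup=ICD9_TO_CCS):
    """CCS groups for inpatient primary diagnoses only; unmapped codes are skipped."""
    return {
        lookup[d.icd9]
        for d in diagnoses
        if d.inpatient and d.position == 1 and d.icd9 in lookup
    }


if __name__ == "__main__":
    month_claims = [
        ClaimDiagnosis("41001", 1, True),   # inpatient primary: kept
        ClaimDiagnosis("71536", 2, True),   # inpatient secondary: dropped
        ClaimDiagnosis("43491", 1, False),  # outpatient primary: dropped
    ]
    print(ccs_events(month_claims))  # prints {100}
```

Returning a set means that several claims in the same month mapping to one CCS group count as a single event for that month.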

Bayesian Data Mining Analysis using MGPS

We used a simple technique to convert the longitudinal record of each patient into a set of pseudo adverse event reports. Each month of observation for each patient was treated as a separate report, consisting of a list of medical events (unique CCS group codes meeting the inpatient and primary diagnosis requirements) and a list of drugs (recorded as exposed in either the current or the previous month). Unlike the spontaneous reporting scenario, months in which a patient had no medical event (or had no drug exposure recorded in the corresponding two-month window) still generated reports. Thus, the patients in this study generated a total of 243,916 monthly reports containing a total of 7,037 inpatient primary diagnosis CCS group code events. The association of celecoxib and rofecoxib use, compared to current NS-NSAID use, with each outcome was evaluated using the empirical Bayes geometric mean (EBGM) ratios produced by the MGPS method. The EBGM estimates the ratio of the observed to the expected number of adverse events. The expected value e is computed as the number of reports having the event in question multiplied by the overall proportion of reports having the exposure in question. If covariates are used, this computation is repeated separately for each covariate stratum and the results are summed across strata. EBGMs are smoothed (shrunk toward 1 and adjusted for covariates) values of the ratios n/e described above. In the context of spontaneous report databases, these are called relative reporting ratios and are used for exploratory signaling of potential causal relationships. To evaluate the statistical significance of these EBGM ratios, we calculated the 5th percentile of the posterior distribution of the reporting ratio for celecoxib and rofecoxib (EB05) and the 95th percentile of the posterior distribution of the reporting ratio for NS-NSAIDs (EB95). If these two intervals do not overlap, the ratio of the celecoxib or rofecoxib EB05 to the NS-NSAID EB95 exceeds 1.00. We refer to this quantity as the EB05/95 ratio and use a value above 1.00 as an informal criterion for judging that the two reporting ratios differ. We adjusted for covariates of interest (i.e. age, gender, calendar year, and Charlson comorbidity score) by using them to compute the ‘e’ in the n/e ratio. The database of pseudo-reports was then analyzed as if it were a spontaneous report database such as AERS, using the WebVDME software (Lincoln Technologies, Waltham, MA). To preserve the semi-automated nature of the procedure and its software implementation, observation time was not censored after the occurrence of any particular event. This allowed us to consider multiple outcomes simultaneously rather than having to censor observation time at the occurrence of pre-specified events.
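The conversion of patient-months into pseudo-reports, together with the covariate-stratified expected count e, can be sketched as follows. This is a minimal illustration under stated assumptions, not the WebVDME implementation. The record layout, field names, and stratum labels are hypothetical, and months are integer indices. A drug counts as exposed in a month if it was recorded in that month or the previous one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientMonth:
    patient_id: str
    month: int         # integer month index within the study period
    stratum: tuple     # hypothetical covariate stratum, e.g. (age band, sex)
    drugs: frozenset   # drugs recorded as used in this month
    events: frozenset  # CCS groups from inpatient primary diagnoses this month


def build_reports(patient_months):
    """One pseudo-report per patient-month, including months with no events."""
    by_patient = defaultdict(dict)
    for pm in patient_months:
        by_patient[pm.patient_id][pm.month] = pm
    reports = []
    for months in by_patient.values():
        for m, pm in sorted(months.items()):
            prev = months.get(m - 1)
            # Exposure window: drug recorded in the current or the previous month.
            exposed = pm.drugs | (prev.drugs if prev else frozenset())
            reports.append((pm.stratum, exposed, pm.events))
    return reports


def stratified_n_e(reports, drug, event):
    """Observed n and expected e = sum over strata of N_event * (N_drug / N)."""
    n = 0
    counts = defaultdict(lambda: [0, 0, 0])  # [reports, with drug, with event]
    for stratum, exposed, events in reports:
        has_drug, has_event = drug in exposed, event in events
        c = counts[stratum]
        c[0] += 1
        c[1] += int(has_drug)
        c[2] += int(has_event)
        n += int(has_drug and has_event)
    e = sum(n_event * n_drug / total for total, n_drug, n_event in counts.values())
    return n, e


if __name__ == "__main__":
    data = [
        PatientMonth("A", 1, ("65-74", "F"), frozenset({"celecoxib"}), frozenset()),
        PatientMonth("A", 2, ("65-74", "F"), frozenset(), frozenset({100})),
        PatientMonth("B", 1, ("75+", "M"), frozenset({"ibuprofen"}), frozenset()),
        PatientMonth("B", 2, ("75+", "M"), frozenset(), frozenset()),
    ]
    reports = build_reports(data)
    print(len(reports), stratified_n_e(reports, "celecoxib", 100))  # prints 4 (1, 1.0)
```

In this toy example, the month after patient A's celecoxib record still counts as exposed, so the AMI (CCS 100) in that month contributes to n. Because e is accumulated within each stratum, the two patients in different covariate strata never inflate each other's expected counts.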

Analysis using Traditional Statistical Methods

We next conducted a parallel analysis to establish the concurrent validity of the MGPS approach relative to traditional epidemiologic and statistical methods. Cox proportional hazards models were used to estimate hazard ratios for AMI, comparing coxib and NS-NSAID use and adjusting for potential confounding variables at baseline. Both coxib and NS-NSAID use were modeled as time-dependent variables. We evaluated confounding by age, gender, race, education, income, body mass index, tobacco use, and comorbidities. Comorbidities were summarized using the Charlson comorbidity index (9). A potential confounder was included in the final model if its inclusion changed the estimated regression coefficient for coxib use by more than 20%. Although all potential confounders were evaluated by this criterion, only the Charlson comorbidity index met it, so only age and the Charlson comorbidity index were included in the adjusted models. We performed this analysis using the CCS outcome definition and repeated it using a validated claims-based definition of AMI that has been shown to have excellent positive predictive value for confirmed events compared to a gold standard of medical record review (10). Because the outcome was pre-specified, we censored observation time at the first occurrence of AMI in the analysis using traditional methods.
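The change-in-estimate rule used to select confounders can be sketched as below. This is a minimal illustration with several assumptions. The coxib coefficients are on the log-hazard scale, as the text refers to regression coefficients. Each candidate covariate is added one at a time to a base model. The function names and all coefficient values are hypothetical and are not taken from this study. Fitting the Cox models themselves is out of scope.

```python
def relative_change(beta_base: float, beta_adjusted: float) -> float:
    """Relative change in the exposure coefficient after adding a covariate."""
    return abs(beta_adjusted - beta_base) / abs(beta_base)


def select_confounders(beta_base: float, betas_by_covariate: dict[str, float],
                       threshold: float = 0.20) -> list[str]:
    """Keep covariates whose addition moves the coxib coefficient by more than threshold."""
    return [
        name
        for name, beta in betas_by_covariate.items()
        if relative_change(beta_base, beta) > threshold
    ]


if __name__ == "__main__":
    base = 0.80  # hypothetical log-hazard ratio for coxib vs. NS-NSAID in the base model
    candidates = {"race": 0.78, "bmi": 0.79, "charlson": 0.60}  # hypothetical refits
    print(select_confounders(base, candidates))  # prints ['charlson']
```

Measuring the change on the coefficient scale rather than the hazard-ratio scale is a design choice. The two scales can disagree for covariates near the 20% threshold.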

Results

Characteristics of the Medicare beneficiaries who used NSAIDs at any time from 1999-2003 are presented in Table 1. The mean age of the cohort was 72 years, and the majority were White. Approximately one-quarter of those with smoking data described themselves as current smokers. The prevalence of cardiovascular risk factors, including coronary artery disease, hypertension, diabetes, and hyperlipidemia, ranged from 10% to 24%. One-third of the cohort used celecoxib during the study period, and slightly more than half used an NS-NSAID.

Table 1.

Characteristics of 9,431 Medicare Beneficiaries that Ever Used Non-Steroidal Anti-Inflammatory Drugs (NSAIDs) 1999-2003

	Mean or n	Standard Deviation or (%)

Demographics
Age	71.51	13.36
Race
White	7,764	82.3%
Black	1,093	11.6%
Other Race	537	5.7%
Education
Lower than high school	2,905	30.8%
High school	2,192	23.2%
Higher than high school	2,760	29.3%
Gender
Male	3,586	38.0%
Female	5,845	62.0%
Income ($/year)	25,192	41,095

Clinical Characteristics
Body Mass Index
BMI < 25	3,342	35.8%
25 ≤ BMI ≤ 30	3,374	36.2%
BMI > 30	2,618	28.1%
Tobacco Use
Smoke Now	1,312	24.4%
No smoking now	4,065	75.6%
Comorbidities
Old MI
Yes	117	1.2%
No	9,314	98.8%
Coronary artery disease
Yes	938	14.3%
No	8,493	85.7%
Hypertension
Yes	2,268	24.0%
No	7,163	76.0%
Diabetes
Yes	949	10.1%
No	8,482	89.9%
Hyperlipidemia
Yes	1,288	13.7%
No	8,143	86.3%
Renal Failure
Yes	152	1.6%
No	9,279	98.4%
COPD
Yes	411	4.4%
No	9,020	95.6%
Had PTCA
Yes	31	0.3%
No	9,400	99.7%
Had CABG
Yes	33	0.4%
No	9,398	99.6%
Charlson comorbidity index	0.25	0.70

NSAIDs used during study period*
Celecoxib	3,018	32.0%
Rofecoxib	2,255	23.9%
Valdecoxib	193	2.1%
Other NS-NSAID	5,278	56.0%

* Total sums to more than 100% because persons may have used more than one NSAID.

The results from the MGPS approach for celecoxib are shown in Table 2. Because none of the 259 outcomes was prespecified, for brevity and transparency we display all outcomes that approach or exceed conventional levels of statistical significance, using an EB05/95 ratio threshold of > 0.85. The only outcome ‘significantly’ associated with current use of celecoxib (EB05/95 ratio > 1.00) was acute myocardial infarction (AMI) (CCS group 100). Other diagnoses with ratios > 0.85 included osteoarthritis and rehabilitation care, followed by acute cerebrovascular disease and coronary atherosclerosis and other heart disease.
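The display and signaling thresholds applied in Table 2 amount to a simple rule on the EB05/95 ratio. This rule is sketched below, using the celecoxib ratios copied from that table. The helper names are hypothetical; the cut-offs are those stated in the text: > 1.00 for non-overlapping intervals and > 0.85 for display.

```python
# EB05/95 ratios for current celecoxib vs. current NS-NSAID use, copied from Table 2.
TABLE_2 = {
    "Acute myocardial infarction (100)": 1.08,
    "Osteoarthritis (203)": 0.97,
    "Rehabilitation care; fitting of prostheses; and adjustment of devices (254)": 0.92,
    "Acute cerebrovascular disease (109)": 0.91,
    "Coronary atherosclerosis and other heart disease (101)": 0.89,
}


def eb05_95_ratio(eb05_exposed: float, eb95_reference: float) -> float:
    """Coxib EB05 divided by NS-NSAID EB95; > 1 means the intervals do not overlap."""
    return eb05_exposed / eb95_reference


def classify(ratio: float, signal: float = 1.00, display: float = 0.85) -> str:
    """Label an outcome using the thresholds described in the text."""
    if ratio > signal:
        return "signal"
    if ratio > display:
        return "borderline (displayed)"
    return "not displayed"


if __name__ == "__main__":
    for outcome, ratio in TABLE_2.items():
        print(f"{classify(ratio):24} {outcome}")
```

Only the AMI row clears the 1.00 cut-off; the other four rows are shown because they exceed 0.85.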

Table 2.

Adjusted* Empirical Bayes Geometric Mean (EBGM) Ratios Comparing Relative Risks for Outcome Events in Current Celecoxib Users Versus Current Non-Selective Non-Steroidal Anti-Inflammatory Drug (NS-NSAID) Users, Sorted by Significance Based on Overlap of EB05/95 Confidence Intervals

Outcome event (CCS group)	Events (n) in Celecoxib Exposed	Events (n) in NS-NSAID Exposed	EBGM Ratio of Celecoxib to NS-NSAID**	Ratio of Celecoxib EB05 to NS-NSAID EB95†
Acute myocardial infarction (100)	31	20	1.91	1.08
Osteoarthritis (203)	57	53	1.47	0.97
Rehabilitation care; fitting of prostheses; and adjustment of devices (254)	41	35	1.49	0.92
Acute cerebrovascular disease (109)	28	22	1.62	0.91
Coronary atherosclerosis and other heart disease (101)	44	48	1.39	0.89

Numbers in parentheses () in the first column refer to the Agency for Healthcare Research and Quality Clinical Classifications Software (CCS) event group.

* Adjusted for age, gender, calendar year, and Charlson comorbidity index.

** Estimates relative risk.

† Ratios greater than 1.00 indicate that the confidence intervals of the two EBGM ratios did not overlap. All events with ratios > 0.85 are displayed.

For rofecoxib, rehabilitation care, device-related complications, and osteoarthritis were the events most significantly associated with current use (Table 3). Acute cerebrovascular disease was also significant. Additional events that approached statistical significance included nonspecific chest pain and cardiac dysrhythmias. Because of its clinical interest, we also examined the MGPS-estimated association between rofecoxib and AMI. Based on 8 AMI events among rofecoxib users, the EBGM ratio was 1.08, indicating a slightly increased risk among rofecoxib users. However, the EB05/95 ratio was 0.50, which is not significant and indicates substantial overlap of the corresponding confidence intervals.

Table 3.

Adjusted* Empirical Bayes Geometric Mean (EBGM) Ratios Comparing Relative Risks for Outcome Events in Current Rofecoxib Users Versus Current Non-Selective Non-Steroidal Anti-Inflammatory Drug (NS-NSAID) Users, Sorted by Significance Based on Overlap of EB05/95 Confidence Intervals

Outcome event (CCS group)	Events (n) in Rofecoxib Exposed	Events (n) in NS-NSAID Exposed	EBGM Ratio of Rofecoxib to NS-NSAID**	Ratio of Rofecoxib EB05 to NS-NSAID EB95†
Rehabilitation care; fitting of prostheses; and adjustment of devices (254)	40	35	2.10	1.30
Complication of device; implant or graft (237)	30	30	2.00	1.17
Osteoarthritis (203)	45	53	1.72	1.12
Acute cerebrovascular disease (109)	21	22	1.85	1.01
Coronary atherosclerosis and other heart disease (101)	34	48	1.60	0.99
Spondylosis; intervertebral disc disorders; other back problems (205)	28	38	1.52	0.91
Nonspecific chest pain (102)	23	32	1.57	0.90
Cardiac dysrhythmias (106)	23	28	1.57	0.90
Diverticulosis and diverticulitis (146)	14	13	1.76	0.87

Numbers in parentheses () in the first column refer to the Agency for Healthcare Research and Quality Clinical Classifications Software (CCS) event group.

* Adjusted for age, gender, calendar year, and Charlson comorbidity index.

** Estimates relative risk.

† Ratios greater than 1.00 indicate that the confidence intervals of the two EBGM ratios did not overlap. All events with ratios > 0.85 are displayed.

In our parallel analyses using traditional epidemiologic methods, we focused specifically on AMI as the outcome of interest in order to compare with the MGPS findings of a significant association with celecoxib use and no significant association with rofecoxib use. Table 4 shows the association between current celecoxib or rofecoxib use, relative to current NS-NSAID use, and the risk of AMI. In both the age-adjusted and the age- and comorbidity-adjusted analyses, there was a significant association between current celecoxib use and AMI. Results changed minimally when we re-defined AMI using the validated claims algorithm (10). In contrast, there were no significant associations between rofecoxib use and AMI. Hazard ratios from these analyses were similar to the corresponding EBGM ratios described above.

Table 4.

Crude and Adjusted Associations between Current Celecoxib and Rofecoxib Use (Compared to Current Non-Selective Non-Steroidal Anti-Inflammatory Drug (NS-NSAID) Use) and Acute Myocardial Infarction (AMI) Using Traditional Epidemiologic Methods

	Person-months of exposure	Relative Risk	95% CI
Celecoxib, CCS Outcome Definition	19,196
Age-Adjusted		2.19	1.21-3.96
Age and Comorbidity-Adjusted*		2.06	1.05-4.07

Celecoxib, Validated Claims-Based Outcome Definition**	18,033
Age-Adjusted		2.33	1.24-4.38
Age and Comorbidity-Adjusted*		2.42	1.16-5.02

Rofecoxib, CCS Outcome Definition	11,972
Age-Adjusted		0.84	0.35-2.00
Age and Comorbidity-Adjusted*		0.82	0.31-2.16

Rofecoxib, Validated Claims-Based Outcome Definition**	11,217
Age-Adjusted		0.95	0.39-2.33
Age and Comorbidity-Adjusted*		0.98	0.36-2.65

CI = Confidence Interval

* Adjusted for the Charlson comorbidity index (9). Potential confounders that were screened also include all those listed in Table 1.

** Results from analysis using validated AMI definition from (10).

Discussion

In this pilot study, we evaluated the feasibility of applying Bayesian data mining methods to longitudinal survey and administrative claims data. These methods have been most commonly applied to spontaneous adverse event reports such as those from the AERS database. As the main endpoint of this project, we successfully adapted these methods for use with administrative claims data and demonstrated concurrent validity with results from a separate analysis conducted using traditional epidemiologic and statistical methods. In contrast to the usual methods of analysis for observational data, however, the data mining approach did not require us to prespecify the outcomes of interest and identified several important associations between coxib use and cardiovascular and cerebrovascular events out of a total of 259 potential outcomes.

As anticipated from prior research, the data mining analysis identified several outcomes associated with celecoxib and rofecoxib. For celecoxib, the only association with an EB05/95 ratio above 1.0 was with AMI. Associations of borderline significance included acute cerebrovascular disease and coronary atherosclerosis, as well as osteoarthritis and rehabilitation care. Similar patterns were seen for rofecoxib, although without the strong association with AMI. As is known from analyses of adverse event reports, significant associations from data mining methods reflect not only potential safety concerns but also the disease indications for which the drug is prescribed. For that reason, content knowledge must be applied to differentiate these two possibilities. In our study, all significant and near-significant results were either for indications for the drug (e.g. osteoarthritis, rehabilitation diagnoses) or for ischemic vascular events.

Although the clinical findings of this feasibility study are of less interest than its methodologic focus, they nevertheless deserve mention. The approximately two-fold increased risk of AMI that we observed with celecoxib has been reported in some (11-13) but not all (14, 15) studies. Although we observed a significant association between rofecoxib and stroke events, as has been found previously (16, 17), we did not observe a significantly increased risk of AMI; this study was likely underpowered to establish such a relationship.

The principal strength of our study lies in its novelty: we are not aware of prior reports that have successfully adapted Bayesian data mining methods to longitudinal claims data. If these methods could be applied in an automated way to routine signal detection, identifying unrecognized adverse drug events in administrative data soon after product launch, this would be a substantial advance and would fill an important gap in postmarketing drug safety surveillance. We do not view these methods as ever replacing well-designed postmarketing RCTs or observational studies. Rather, we believe they complement traditional methods by providing a tool, adapted to longitudinal data sources such as claims data, for identifying safety signals that need to be pursued.

Despite these encouraging initial results, we are cautious about the broad applicability of these methods without further research on their validity, precision, and power. In evaluating such efforts, it is always desirable to have a gold standard set of results against which data mining analyses can be compared. We intentionally chose to evaluate coxib safety given the recent furor and the numerous studies published on this topic. However, given the lack of consensus even on this subject, well-designed simulation studies, or a pooled analysis of data from randomized controlled trials (in which randomization controls for both measured and unmeasured confounders), may be the optimal next steps to ensure the robustness of our adaptation of data mining methods.

We used AHRQ’s CCS classification system to group similar ICD-9 codes into 259 unique groups. The data mining analysis evaluated all of these outcomes simultaneously and did not require foreknowledge of which of the 259 groups were of particular interest. However, not all important events have ICD-9 codes specific enough to be useful for a claims-based analysis, much less fit into a well-defined CCS category. Moreover, the events of greatest interest in our analysis, and those most likely to represent true safety problems, were AMI and acute cerebrovascular disease. These conditions are relatively homogeneous with respect to the ICD-9 codes they include, and they appeared as the primary diagnosis for a hospitalization. Outcomes included in a more heterogeneous event group may be masked if they are rare compared to the dissimilar events that share the group. Nevertheless, it is possible to use a different event classification system with more or different groups; depending on the size of the data source, roughly 750-1000 event groups is likely to be an upper limit.

We acknowledge a number of limitations of this study. First, neither pharmacy data from an administrative claims database nor medication information collected during MCBS in-home interviews accurately reflects actual medication-taking behavior or precisely identifies the start and end dates of drug use, and we lacked information on drug dose. Additionally, the sample size of the MCBS is relatively small, which may have limited our ability to detect some important associations. Although we evaluated a number of potential confounders, and the MCBS collects data on covariates not routinely found in claims databases (e.g. race, BMI, smoking status, education), residual confounding remains possible. However, of greater interest than our ability to answer content-related questions was the concordance between the Bayesian and traditional epidemiologic methods. We would expect the same sources of confounding to operate under both methods, and the concordant results of the two parallel analyses are reassuring. Finally, we do not expect this or any method to be adequate for detecting significant increases in very rare “sentinel” events (e.g. Stevens-Johnson syndrome), but such events should nevertheless be pursued based on clinical relevance.

In summary, results from this pilot project demonstrated the feasibility of using Bayesian data mining methods to analyze administrative claims data. We also showed concurrent validity between the data mining results and traditional methods in the analysis of one particular outcome, AMI. These techniques appear to hold substantial promise to fill a large niche in the evaluation of drug safety for which the available tools for pharmacovigilance are few in number. However, despite these encouraging results, these approaches will require further validation before they can be recommended for widespread use.

Acknowledgements

This work was funded by the Agency for Healthcare Research and Quality (U18 HS10389-06S1). Some of the investigators (JRC, KGS) also receive support from the National Institutes of Health (AR053351, AR052361) and the Arthritis Foundation (JRC). There was no pharmaceutical support for this project.

Disclosures:

JRC: grant support: Merck, Procter & Gamble, Lilly, Amgen, Novartis; consulting/honorarium: Merck, Procter & Gamble, Roche, Lilly

KGS: grant support: Amgen; consulting/honorarium: Amgen

HC, ED, MK, HY: grant support: Amgen

References

1. Hartnell NR, Wilson JP. Replication of the Weber effect using postmarketing adverse event reports voluntarily submitted to the United States Food and Drug Administration. Pharmacotherapy. 2004;24(6):743–9. doi: 10.1592/phco.24.8.743.36068.
2. Almenoff JS, Pattishall EN, Gibbs TG, DuMouchel W, Evans SJ, Yuen N. Novel statistical tools for monitoring the safety of marketed drugs. Clin Pharmacol Ther. 2007;82(2):157–66. doi: 10.1038/sj.clpt.6100258.
3. DuMouchel W. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system (with discussion). The American Statistician. 1999;53:177–202.
4. Solomon R, DuMouchel W. Contrast media and nephropathy: findings from systematic analysis and Food and Drug Administration reports of adverse effects. Invest Radiol. 2006;41(8):651–60. doi: 10.1097/01.rli.0000229742.54589.7b.
5. Almenoff JS, DuMouchel W, Kindman LA, Yang X, Fram D. Disproportionality analysis using empirical Bayes data mining: a tool for the evaluation of drug interactions in the post-marketing setting. Pharmacoepidemiol Drug Saf. 2003;12(6):517–21. doi: 10.1002/pds.885.
6. Fram DM, Almenoff JS, DuMouchel W. Empirical Bayesian data mining for discovering patterns in post-marketing drug safety. Proc Knowledge Discov Data Proc. 2003:359–68.
7. DuMouchel W, Pregibon D. Empirical Bayes screening for multi-item associations. ACM Press; San Francisco: 2001.
8. Glynn RJ, Knight EL, Levin R, Avorn J. Paradoxical relations of drug treatment with mortality in older persons. Epidemiology. 2001;12(6):682–9. doi: 10.1097/00001648-200111000-00017.
9. Deyo RA, Cherkin DC, Ciol MA. Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases. J Clin Epidemiol. 1992;45(6):613–9. doi: 10.1016/0895-4356(92)90133-8.
10. Kiyota Y, Schneeweiss S, Glynn RJ, Cannuscio CC, Avorn J, Solomon DH. Accuracy of Medicare claims-based diagnosis of acute myocardial infarction: estimating positive predictive value on the basis of review of hospital records. Am Heart J. 2004;148(1):99–104. doi: 10.1016/j.ahj.2004.02.013.
11. Solomon SD, Pfeffer MA, McMurray JJ, et al. Effect of celecoxib on cardiovascular events and blood pressure in two trials for the prevention of colorectal adenomas. Circulation. 2006;114(10):1028–35. doi: 10.1161/CIRCULATIONAHA.106.636746.
12. Lee YH, Ji JD, Song GG. Adjusted indirect comparison of celecoxib versus rofecoxib on cardiovascular risk. Rheumatol Int. 2006. doi: 10.1007/s00296-006-0244-y.
13. Gislason GH, Jacobsen S, Rasmussen JN, et al. Risk of death or reinfarction associated with the use of selective cyclooxygenase-2 inhibitors and nonselective nonsteroidal antiinflammatory drugs after acute myocardial infarction. Circulation. 2006;113(25):2906–13. doi: 10.1161/CIRCULATIONAHA.106.616219.
14. White WB, West CR, Borer JS, et al. Risk of cardiovascular events in patients receiving celecoxib: a meta-analysis of randomized clinical trials. Am J Cardiol. 2007;99(1):91–8. doi: 10.1016/j.amjcard.2006.07.069.
15. Velentgas P, West W, Cannuscio CC, Watson DJ, Walker AM. Cardiovascular risk of selective cyclooxygenase-2 inhibitors and other non-aspirin non-steroidal anti-inflammatory medications. Pharmacoepidemiol Drug Saf. 2006;15(9):641–52. doi: 10.1002/pds.1192.
16. Afilalo J, Coussa-Charley MJ, Eisenberg MJ. Long-term risk of ischemic stroke associated with rofecoxib. Cardiovasc Drugs Ther. 2007;21(2):117–20. doi: 10.1007/s10557-007-6011-9.
17. Andersohn F, Schade R, Suissa S, Garbe E. Cyclooxygenase-2 selective nonsteroidal anti-inflammatory drugs and the risk of ischemic stroke: a nested case-control study. Stroke. 2006;37(7):1725–30. doi: 10.1161/01.STR.0000226642.55207.94.


Adaptation of Bayesian Data Mining Algorithms to Longitudinal Claims Data: Coxib Safety as an Example

Jeffrey R Curtis, MD MPH

Hong Cheng, PhD

Elizabeth Delzell, PhD

David Fram

Meredith Kilgore, PhD

Kenneth Saag, MD MSc

Huifeng Yun, MD MS

William DuMouchel, PhD

