Abstract
Background
Increasingly, the diagnostic codes from administrative claims data are being used as clinical outcomes.
Methods and Results
Data from the Cardiovascular Health Study (CHS) were used to compare event rates and risk-factor associations between adjudicated hospitalized cardiovascular events and claims-based methods of defining events. The outcomes of myocardial infarction (MI), stroke, and heart failure (HF) were defined in three ways: 1) the CHS adjudicated event (CHS[adj]); 2) selected ICD9 diagnostic codes only in the primary position for Medicare claims data from the Center for Medicare and Medicaid Services (CMS[1st]); and 3) the same selected diagnostic codes in any position (CMS[any]). Conventional claims-based methods of defining events had high positive predictive values (PPVs) but low sensitivities. For instance, the PPV of an ICD9 code of 410.×1 for a new acute MI in the first position was 90.6%, but this code identified only 53.8% of incident MIs. The observed event rates were low. For MI, the incidence was 14.9 events per 1000 person years for CHS[adj] MI, 8.6 for CMS[1st] and 12.2 for CMS[any]. In general, CVD risk factor associations were similar across the three methods of defining events. Indeed, traditional CVD risk factors were also associated with all first hospitalizations not due to an MI.
Conclusions
The use of diagnostic codes from claims data as clinical events, especially when restricted to primary diagnoses, leads to an underestimation of event rates. Additionally, claims-based events data represent a composite endpoint that includes the outcome of interest and selected (misclassified) non-event hospitalizations.
Keywords: incidence, myocardial infarction, stroke, heart failure
Increasingly, the diagnostic codes from administrative claims data are themselves being used as clinical outcomes in epidemiological studies. Since 1983 at the Center for Medicare and Medicaid Services (CMS), the principal or primary diagnosis of a hospitalization has been used to determine the diagnosis related group (DRG) for reimbursement1. Over time, changes in the reimbursement practices have tended to influence the coding patterns. For instance, the 2007 implementation of medical-severity DRG codes was associated with a decrease in the hospitalization rate for essential hypertension and an increase in the rate for malignant hypertension: this pattern of coding increased reimbursement to hospitals, and the fact that the mortality rate for malignant hypertension dropped significantly after 2007 suggests that the observed changes in these hospitalization rates were likely the result of trends in coding practices rather than changes in disease prevalence, severity, or treatment2. In other words, administrative claims data serve a dual role, the electronic side effects of both clinical care and reimbursement policies.
Typically, the validity of the diagnostic codes from claims data has been evaluated in terms of their positive predictive value (PPV) in small samples3,4. A high PPV means that a large proportion of cases identified by selected diagnostic codes meets study criteria for an event. In-patient diagnostic codes for many common cardiovascular diseases (CVD), including myocardial infarction (MI), heart failure (HF), and stroke, have high PPVs5–7, and these codes are often used without further review as primary outcomes. Of course, hospitalization data fail to detect out-patient events and out-of-hospital deaths. Nevertheless, diagnostic codes from administrative hospitalization claims are now so commonly used as clinical endpoints that new conventional definitions are beginning to emerge in the published literature. For HF and stroke, selected diagnostic codes as the primary reason for hospitalization, the principal diagnosis, often count as events, but the same diagnostic codes in other secondary positions do not count as events8–10. For MI, while some studies use an International Classification of Diseases, Ninth Edition (ICD9), 410 code only in the primary position to define events3,9, others accept an ICD9 410 in any position as the definition of an event8,10,11.
In both observational studies and clinical trials, cardiovascular-disease event rates vary for a number of biological and methodological reasons. The observed event rate in a study depends not only on the population under study but also on the methods of case identification, the intensity of the surveillance efforts, the aggressiveness of data collection, the criteria for validating events, and the quality control of the overall effort. Such labor-intensive efforts often involve physician review of medical records12–15. One consequence of active-surveillance methods is the assembly of genuine events that reduce the bias from misclassification. The claims-based methods that use a combination of diagnostic codes and positions with high PPVs are also intended to reduce misclassification. The other consequence of active-surveillance events methods is the complete or near-complete identification of clinical outcomes in a study. When the emerging conventions used to define clinical CVD outcomes in claims-based analyses intentionally ignore low-PPV diagnostic codes that are known to harbor some genuine events, they systematically underestimate event rates and the absolute levels of risk. In this analysis of events data from the Cardiovascular Health Study (CHS), we evaluate the degree of both the misclassification and the underestimation of event rates for CVD outcomes identified solely from claims data compared with those identified through CHS active-surveillance procedures.
Methods
Design
CHS is a cohort study designed to evaluate risk factors for coronary heart disease (CHD) and stroke in older adults16. At the four Field Centers, each community sample was obtained from random samples of the Medicare lists. Eligible to participate were persons living in the household of each sampled individual who were: 1) 65 years or older; 2) non-institutionalized; 3) expected to remain in the area for 3 years; and 4) able to give informed consent. In 1989–1990, the Field Centers recruited 5201 participants. In 1992–1993, an additional 687 African Americans were recruited using similar methods. The baseline examinations consisted of a home interview and a clinic examination that assessed traditional risk factors such as blood pressure, weight, height, smoking status, and medication use17; a fasting blood specimen for glucose and lipids; and measures of subclinical disease, including carotid ultrasound, echocardiography, electrocardiography, and pulmonary function. Participants were eligible for CHS regardless of whether they had had prevalent cardiovascular disease at baseline. Baseline reports of prevalent disease were validated18. Semi-annual participant contacts to obtain information about potential events alternated between a telephone interview and an annual clinic examination until 1998–99. Since 2000, participants or their proxies have been contacted every 6 months for this information. The study was approved by institutional review committees at the participating sites, and all participants provided written informed consent.
Primary and secondary outcomes
CHS events criteria for MI, stroke and HF have been published12,19–21. For MI, they include the traditional elements of chest pain, cardiac enzymes, and ECGs. Criteria for HF rely on physician diagnosis, treatment, and diagnostic test results12. Stroke is defined as a clinical event of rapid onset consisting of a focal neurological deficit lasting more than 24 hours unless death supervenes19,21. Due to a broad interest in a variety of health conditions that affect older adults, the goal of the events data collection was not only to ascertain incident cardiovascular events, but also to create a database of all hospitalizations that included discharge diagnoses, procedures and at least a discharge summary. This research resource was designed so that other events such as venous thromboembolism22, hip fracture23, or pneumonia24 could be captured and studied more easily.
Methods of events identification, data collection, and adjudication
Between the 6-monthly contacts to obtain information about cardiovascular events and all hospitalizations, participants and their proxies were asked to call the Field Center to report them. Field-Center investigations for all hospitalizations occasionally identified other unreported hospitalizations. Periodically, CHS also used Medicare data to identify hospitalizations that may have been missed by self-report. Deaths were identified from proxies on follow-up calls, searches in the local obituaries, and periodic searches of the National Death Index. A broad set of diagnostic codes was used to identify potential events12,19. Additionally, Field-Center review of the discharge diagnoses and the text of the discharge summaries of all hospitalizations was used to identify other potential clinical CVD outcomes that had not been self-reported. The initial report form indicated how the event had first been identified. The events data collected for adjudication were matched to the events criteria, and included (where appropriate or available) hospital records and out-patient medical records; copies of ECGs, head computed tomography and magnetic resonance imaging scans; results of diagnostic tests; questionnaires for physicians caring for participants; death certificates, autopsy reports and coroners' reports; and an interview with proxies or witnesses for all out-of-hospital deaths. The data from potential events identified by all methods were reviewed and classified by physician members of the Stroke and Cardiac Events Committees. Adjudicators were blinded to baseline risk-factor data. Each potential CVD event of any type—MI, angina, HF and peripheral arterial disease—was reviewed for all cardiac event types. During each meeting, a sample of events was included for blinded re-review as a quality control effort. 
The agreement between reviews for non-fatal events has been excellent, with kappas of 0.86 (95% CI, 0.78–0.93; n=241) for MI, 0.87 (95% CI, 0.75–0.99; n=62) for stroke, and 0.85 (95% CI, 0.78–0.92; n=241) for HF.
Analysis
CHS participants were sampled from Medicare lists, and CMS Part A data were obtained for all Medicare fee-for-service beneficiaries through December 2012. In addition, CHS conducted surveillance activities and identified incident events and hospitalizations for Medicare beneficiaries enrolled in managed care plans and for those treated at Veterans Administration Hospitals. Discharge summaries from the hospitalized events were obtained so that the CHS hospitalization database would include discharge diagnoses for all participants. Since this CHS active surveillance provided information on the CMS non-fee-for-service hospitalizations, we were able to include all participants and all available follow-up time in the analysis. The main ICD9 codes used to define clinical outcomes in CMS were 410.×1 for MI; 430, 431, 433, 434 and 436 for stroke; and 428, 402.×1, 404.×1 and 404.×3 for HF. Each outcome was defined in three ways: 1) the CHS adjudicated result (CHS[adj]); 2) the ICD9 diagnostic codes only in the first position (CMS[1st]); and 3) the ICD9 diagnostic codes in any position (CMS[any]). For the analysis of incident events, only participants at risk of a first such event were included in the analysis of that event type (n=5326 for MI, 5639 for stroke, and 5616 for HF). Risk-factor data from the baseline examination were summarized as proportions for categorical data and means with standard deviations for continuous data (Supplement Table 1). Associations between risk factors and events were evaluated in Cox proportional hazards models, and because CMS data included only hospitalized events, the comparison with CHS adjudicated events included only hospitalized events. Participants were treated as censored at loss to follow-up or death.
We used bootstrap techniques25 to estimate whether the point estimates for risk-factor associations with the ICD9 code-based definitions were statistically significantly different from the point estimates for the same risk-factor associations with the adjudicated events. Insofar as misclassified events lack any association with the risk factors of interest, risk estimates for an outcome that includes misclassified events will typically be biased toward the null. In this study, all CMS events were hospitalizations; as a result, a misclassified CMS MI represents an illness requiring a hospitalization for another reason. To determine the extent to which CVD risk factors were associated with hospitalization in general and not specifically with an MI hospitalization, we also evaluated the association between traditional CVD risk factors and all first hospitalizations not due to an MI. Statistical analysis and data management were done in SAS version 9.1.4, with forest plots done in STATA.
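The paired bootstrap comparison described above can be illustrated with a minimal sketch. It is not the study's code: the analysis used Cox models in SAS, whereas this sketch substitutes a simple log risk ratio so that it runs with the Python standard library alone. The data are synthetic, and the function names (`log_risk_ratio`, `bootstrap_difference`) and all probabilities are assumptions made for illustration.

```python
import math
import random


def log_risk_ratio(exposed, outcome):
    """Log risk ratio, exposed vs. unexposed, with a 0.5 continuity correction."""
    n1 = sum(1 for x in exposed if x)
    n0 = len(exposed) - n1
    e1 = sum(1 for x, y in zip(exposed, outcome) if x and y)
    e0 = sum(1 for x, y in zip(exposed, outcome) if not x and y)
    risk1 = (e1 + 0.5) / (n1 + 1.0)
    risk0 = (e0 + 0.5) / (n0 + 1.0)
    return math.log(risk1 / risk0)


def bootstrap_difference(exposed, outcome_a, outcome_b, n_boot=200, seed=0):
    """Percentile 95% interval for log RR(outcome_b) - log RR(outcome_a).

    Participants are resampled with replacement; each person's exposure and
    both outcome definitions stay together (a paired bootstrap).
    """
    rng = random.Random(seed)
    n = len(exposed)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        x = [exposed[i] for i in idx]
        a = [outcome_a[i] for i in idx]
        b = [outcome_b[i] for i in idx]
        diffs.append(log_risk_ratio(x, b) - log_risk_ratio(x, a))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return lower, upper


# Synthetic cohort (hypothetical numbers, not CHS data).
_rng = random.Random(1)
exposed = [_rng.random() < 0.4 for _ in range(1000)]
adjudicated = [_rng.random() < (0.30 if x else 0.15) for x in exposed]
# Claims-based definition: captures a random 60% of adjudicated events.
claims = [y and _rng.random() < 0.6 for y in adjudicated]

interval = bootstrap_difference(exposed, adjudicated, claims)
print("95% interval for difference in log risk ratio:", interval)
```

Under this sketch, an interval that excludes zero would correspond to a risk-factor estimate that differs from the adjudicated reference at roughly the 0.05 level.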
Results
Based on information from initial-report forms, Table 1 summarizes how the 28,230 hospitalizations for the 5888 CHS participants were first identified in CHS. Of the 4344 incident cardiovascular events, for instance, 59.0% were first identified by self-report, 29.2% by Field-Center investigation, 3.5% by periodic searches of Medicare data, and 8.4% by a method that was unrecorded on the initial-report form. The percentages of events identified by self-report were generally similar across types of clinical outcomes, ranging from 57.0% for stroke to 66.3% for angina.
Table 1.
Initial source where CHS Field Centers learned about events for all hospitalizations from CHS baseline to Dec 2012
| Event type | Self-Report, N | Self-Report, % | FC Investigation, N | FC Investigation, % | CMS, N | CMS, % | Unspecified, N | Unspecified, % | Total, N | Total, % |
|---|---|---|---|---|---|---|---|---|---|---|
| Incident Events | 2564 | 59.0% | 1267 | 29.2% | 150 | 3.5% | 363 | 8.4% | 4344 | 100.0% |
| Myocardial Infarction | 619 | 61.5% | 278 | 27.6% | 21 | 2.1% | 88 | 8.7% | 1006 | 100.0% |
| Angina | 855 | 66.3% | 267 | 20.7% | 42 | 3.3% | 126 | 9.8% | 1290 | 100.0% |
| Stroke | 547 | 57.0% | 317 | 33.0% | 16 | 1.7% | 80 | 8.3% | 960 | 100.0% |
| Heart Failure | 1031 | 58.6% | 545 | 31.0% | 66 | 3.8% | 117 | 6.7% | 1759 | 100.0% |
| Peripheral Arterial Disease | 172 | 52.4% | 96 | 29.3% | 22 | 6.7% | 38 | 11.6% | 328 | 100.0% |
| Transient Ischemic Attack | 147 | 58.6% | 68 | 27.1% | 12 | 4.8% | 24 | 9.6% | 251 | 100.0% |
| Recurrent Myocardial Infarction | 271 | 47.6% | 219 | 38.5% | 21 | 3.7% | 58 | 10.2% | 569 | 100.0% |
| Recurrent Stroke | 162 | 51.1% | 122 | 38.5% | 6 | 1.9% | 27 | 8.5% | 317 | 100.0% |
| Fatal | 520 | 27.6% | 1229 | 65.2% | 5 | 0.3% | 131 | 6.9% | 1885 | 100.0% |
| Other hospitalizations | 11758 | 54.2% | 6114 | 28.2% | 2197 | 10.1% | 1627 | 7.5% | 21696 | 100.0% |
| Total | 15063 | 53.4% | 8639 | 30.6% | 2372 | 8.4% | 2156 | 7.6% | 28230 | 100.0% |
Note: Self-report of fatal events refers to proxy reports. Fatal hospitalized events are included both in their event category and in the row labelled fatal.
Abbreviations: CVD = cardiovascular disease; FC = Field Center; CMS = Centers for Medicare and Medicaid Services data.
Studies that rely on diagnostic codes from claims data as outcomes use selected codes and their diagnostic position to define events. With the CHS adjudicated results serving as the standard, Table 2 shows the PPVs and the proportions of events identified by commonly used CMS conventions for clinical outcomes. For MI, the PPV of an ICD9 code of 410 in the first position was 90.6%, but this code identified only 53.8% of incident MIs ascertained by active surveillance in CHS. The 410 code as any secondary diagnosis identified an additional 16.6% of MI events with a PPV of 69.8%. For all the other ischemic heart disease codes, the PPVs were low, but because they comprised so many hospitalizations, they accounted for 18.6% of incident MI events; hospitalizations with codes outside the ischemic heart disease range (9.7%) and MIs that were not hospitalized (1.3%) bring the total not captured by a 410 code to 29.6%. For the main stroke codes, the PPV was 80.4% in the first position, the emerging convention for stroke in the literature, but only 44.4% in a secondary position. The main stroke codes as primary diagnoses in claims data identified only 63.8% of the incident stroke events. For HF, the main diagnostic codes in the primary position had a high PPV of 93.2%, but identified only 27.2% of HF events. (Supplemental Table 2 includes the results for HF stratified by ejection fraction.)
Table 2.
Positive predictive value of commonly used codes for cardiovascular events.

Incident Myocardial Infarction (n=1,018)

| ICD9 | Position | Hospitalizations | CHS adjudicated MIs | PPV | Proportion of CHS adjudicated events identified |
|---|---|---|---|---|---|
| 410 | First | 605 | 548 | 90.6% | 53.8% |
| 410 | Secondary | 242 | 169 | 69.8% | 16.6% |
| 411, 412, 413, 414 | First | 408 | 67 | 16.4% | 6.6% |
| 411, 412, 413, 414 | Secondary | 1288 | 122 | 9.5% | 12.0% |
| Other | Any | 20428 | 99 | 0.8% | 9.7% |
| Not hospitalized | n/a | n/a | 13 | n/a | 1.3% |

Incident Stroke (n=1,087)

| ICD9 | Position | Hospitalizations | CHS adjudicated strokes | PPV | Proportion of CHS adjudicated events identified |
|---|---|---|---|---|---|
| 430, 431, 433, 434, and 436 | First | 863 | 694 | 80.4% | 63.8% |
| 430, 431, 433, 434, and 436 | Secondary | 385 | 171 | 44.4% | 15.7% |
| 432, 435, 437, 438 | First | 222 | 44 | 19.8% | 4.0% |
| 432, 435, 437, 438 | Secondary | 330 | 41 | 12.4% | 3.8% |
| Other | Any | 22459 | 10 | 0.6% | 0.9% |
| Not hospitalized | n/a | n/a | 127 | n/a | 11.6% |

Incident Heart Failure (n=1,863)

| ICD9 | Position | Hospitalizations | CHS adjudicated heart failures | PPV | Proportion of CHS adjudicated events identified |
|---|---|---|---|---|---|
| 428, 402.×1, 404.×1 and 404.×3 | First | 544 | 507 | 93.2% | 27.2% |
| 428, 402.×1, 404.×1 and 404.×3 | Secondary | 1375 | 997 | 72.5% | 53.5% |
| Other | Any | 17944 | 255 | 1.8% | 13.7% |
| Not hospitalized | n/a | n/a | 104 | n/a | 5.6% |

Note: ICD9 = International Classification of Diseases, Ninth Edition; PPV = positive predictive value; 410 = acute MI; 411–414 = other forms of acute and chronic ischemic heart disease; Main stroke codes = 430, 431, 433, 434, and 436; Minor codes related to other forms of cerebrovascular disease = 432, 435, 437 and 438; Main codes for HF = 428, 402.×1, 404.×1 and 404.×3.
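As a worked illustration of how the two summary columns in Table 2 relate to the counts, the following minimal sketch recomputes the PPV and the proportion of adjudicated events identified for the ischemic heart disease rows of the incident-MI panel. The counts are copied from the table; the function and variable names are illustrative assumptions and not part of the study's analysis code.

```python
# Counts copied from the incident-MI panel of Table 2.
# Each row: (ICD9 group, position, hospitalizations, CHS-adjudicated MIs)
MI_ROWS = [
    ("410", "First", 605, 548),
    ("410", "Secondary", 242, 169),
    ("411-414", "First", 408, 67),
    ("411-414", "Secondary", 1288, 122),
]
TOTAL_ADJUDICATED_MI = 1018  # panel total, including MIs without these codes


def ppv(hospitalizations, confirmed):
    """Share of code-identified hospitalizations confirmed by adjudication."""
    return confirmed / hospitalizations


def proportion_identified(confirmed, total_adjudicated):
    """Share of all adjudicated events captured by this code and position."""
    return confirmed / total_adjudicated


def summarize(rows, total_adjudicated):
    """Return (code, position, PPV %, proportion identified %) per row."""
    summary = []
    for code, position, hospitalizations, confirmed in rows:
        summary.append((
            code,
            position,
            round(100 * ppv(hospitalizations, confirmed), 1),
            round(100 * proportion_identified(confirmed, total_adjudicated), 1),
        ))
    return summary


for row in summarize(MI_ROWS, TOTAL_ADJUDICATED_MI):
    print(row)
```

The PPV uses the code-identified hospitalizations as its denominator, whereas the proportion identified uses all adjudicated events, which is why a code and position can score high on the first measure and low on the second.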
For the CVD events that lacked qualifying disease codes, we ascertained the primary reason for hospitalization. For MI, the most common were other forms of ischemic heart disease [IHD] (n=57 of 284), HF (n=44), arrhythmia, primarily atrial fibrillation (n=21), pneumonia (n=20), and cerebrovascular disease (n=12); for stroke, they were other forms of cerebrovascular disease (n=36 of 212), IHD (n=21), arrhythmia (n=13), symptoms such as altered consciousness or coma (n=10), HF (n=8), and femur fracture (n=8); and for HF, they were IHD (n=58 of 293), arrhythmia (n=22), pneumonia (n=27), respiratory and chest symptoms (n=12), and rheumatic or valvular heart disease (n=11).
Estimates of disease incidence differed markedly across the three methods for identifying events. For CHS[adj], CMS[1st], and CMS[any], the incidence rates per 1000 person-years by age category are displayed for MI in Figure 1a, stroke in Figure 1b, and HF in Figure 1c. In Figure 1a, for instance, the MI incidence increased with age, though there were only 29 events among those >= 85 years of age. For MI, the overall incidence was 14.9 events per 1000 person-years for CHS[adj], 8.6 for CMS[1st] and 12.2 for CMS[any]. The new conventions use diagnostic codes in the first position for stroke and heart failure. For stroke, the CMS[1st] incidence of 11.9 was lower than the CHS adjudicated incidence rate of 13.4. The CHS HF incidence was more than three times the CMS[1st] incidence (25.6 versus 7.3 events per 1000 person-years).
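To make the size of the underestimation concrete, the following minimal sketch expresses each claims-based rate as a fraction of the adjudicated rate, using only the overall rates quoted in the text. The helper names are assumptions, and the events and person-years passed to `incidence_per_1000` in the last line are hypothetical values chosen to show the unit conversion, not study data.

```python
# Overall incidence rates per 1000 person-years, as reported in the text.
RATES = {
    "MI": {"CHS[adj]": 14.9, "CMS[1st]": 8.6, "CMS[any]": 12.2},
    "stroke": {"CHS[adj]": 13.4, "CMS[1st]": 11.9},
    "HF": {"CHS[adj]": 25.6, "CMS[1st]": 7.3},
}


def incidence_per_1000(events, person_years):
    """Incidence rate expressed per 1000 person-years of follow-up."""
    return 1000.0 * events / person_years


def relative_to_adjudicated(rates):
    """Each claims-based rate divided by the adjudicated (reference) rate."""
    reference = rates["CHS[adj]"]
    return {
        method: round(rate / reference, 2)
        for method, rate in rates.items()
        if method != "CHS[adj]"
    }


for outcome, rates in RATES.items():
    print(outcome, relative_to_adjudicated(rates))

# Hypothetical illustration of the unit conversion only.
print(incidence_per_1000(events=149, person_years=10000))
```

A ratio below 1 means the claims-based definition yields a lower incidence than the adjudicated standard for the same cohort and follow-up.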
Figure 1.



A. Incidence of MI per 1000 person-years by age categories for adjudicated CHS events, CMS events with ICD9 of 410 in first position or in any position; n = number of MI events, probable and definite, adjudicated within each age group. B. Incidence of stroke per 1000 person-years by age categories for adjudicated CHS events, CMS events with ICD9 of main stroke codes in first position or in any position; n = number of stroke events, probable and definite, adjudicated in each age group. C. Incidence of heart failure per 1000 person-years by age categories for adjudicated CHS events, CMS events with ICD9 of main HF codes in first position or in any position; n = number of HF events, probable and definite, adjudicated in each age group.
For each of the three incident events, the association analyses were designed to compare risk-factor associations across the three methods of defining each event type. For all outcomes, Figure 2 shows that the risk-factor associations were generally similar across the three methods of events identification, with some minor differences. (Numerical values for hazard ratios and confidence intervals are given in Supplemental Table 3.) For MI, the CMS[any] association was significantly lower for African American race but significantly higher for total cholesterol, elevated fasting glucose, and drug-treated diabetes. For stroke, the CMS[any] association was significantly higher for men and for smoking. For HF, the CMS[any] associations were significantly stronger for age and weaker for drug-treated diabetes. In general, the misclassified events in the CMS analyses appeared to have little effect on the magnitude of associations for most of the CVD risk factors.
Figure 2.



A. Associations of incident MI with cardiovascular risk factors among 5,326 participants in the Cardiovascular Health Study (n for CHS[adj]=1006, n for CMS[1st] = 605, n for CMS[any] = 847). Risk-factor estimates that were different from those for the adjudicated event estimate (reference) at the p<0.05 level with bootstrap methods are indicated with a [*] symbol. B. Associations of incident stroke with cardiovascular risk factors among 5,639 participants in the Cardiovascular Health Study (n for CHS[adj]=960, n for CMS[1st] = 863, n for CMS[any] = 1248). Risk-factor estimates that were different from those for the adjudicated event estimate (reference) at the p<0.05 level with bootstrap methods are indicated with a [*] symbol. C. Associations of incident heart failure with cardiovascular risk factors among 5613 participants in the Cardiovascular Health Study (n for CHS[adj]=1759, n for CMS[1st] = 544, n for CMS[any] = 1919). Risk-factor estimates that were different from those for the adjudicated event estimate (reference) at the p<0.05 level with bootstrap methods are indicated with a [*] symbol.
Many traditional CVD risk factors are non-specific and themselves associated with conditions that lead to hospitalization. Figure 3 compares the hazard ratios for a first CHS[adj] MI with the hazard ratios for all first hospitalizations not due to an MI. (Numerical values for hazard ratios and confidence intervals are given in Supplemental Table 4.) With the exception of lipids, CVD risk factors were associated with a first non-MI hospitalization. Many of the hazard ratios were similar in magnitude to those for incident MI, and the point estimates for smoking and drug-treated hypertension were larger for non-MI hospitalization than for incident MI.
Figure 3.

Risk factor associations for incident adjudicated MI (n=1006 events) and the same risk-factor associations for all first hospitalizations (n=4981 events) not due to an MI among 5,326 CHS participants free of baseline MI events. Risk-factor estimates that differed between adjudicated incident MI and the first non-MI hospitalization at the p<0.05 level with bootstrap methods are indicated with a [*] symbol.
In sensitivity analyses, the results changed only in trivial ways when we restricted the analysis to fee-for-service beneficiaries, when we restricted the analysis only to definite events, and when we looked at time trends, before and after January 1, 1999 (Supplemental Table 5).
Discussion
For MI, stroke and HF, we compared adjudicated events with two claims-based approaches to defining health endpoints in terms of event rates and risk-factor associations. The multiple methods used for the active surveillance in CHS improved the completeness of events identification (Table 1). Conventional claims-based methods of defining events, especially those for stroke and HF, which use only the principal diagnosis, had high PPVs; but these methods intentionally ignore genuine events that appear in low-PPV diagnostic codes and positions (Table 2). As a result, the event rates based on the emerging conventions for claims-based methods for CMS[any] MI, CMS[1st] stroke, and CMS[1st] HF were lower than those estimated from CHS adjudicated results (Figure 1). In general, CVD risk-factor associations were similar across the three methods of defining events for MI, stroke and HF (Figure 2). Indeed, most CVD risk factors were also associated with the risk of a first non-MI hospitalization (Figure 3).
Event rates are directly related to the intensity of surveillance. Like the Framingham Heart Study (FHS) and the Atherosclerosis Risk in Communities (ARIC) Study13,14,26, CHS is an active-surveillance study. In an effort to develop a research resource for a variety of health outcomes, CHS used multiple methods, including CMS data, to identify potential events. In this setting, self-report alone identified only 61.7% of incident events. Similar findings have been reported by other studies15,27. Although some events were surely missed in CHS, the consequence of broad surveillance efforts is likely to be near-complete identification of key cardiovascular events.
Criteria for a diagnosis of MI, stroke and HF may vary across studies and clinical settings. Nonetheless, claims-based definitions of CVD events have often worked well11, and in a previous publication, the mortality rate of participants with incident HF events identified by both CHS and ICD9 codes was slightly higher than the mortality rates of those identified by either method alone28. The validity of claims-based diagnostic codes also depends importantly on the condition. For instance, administrative data alone, without review of the medical records, do not work well for specialized studies of drug-induced liver injury29 or statin-related rhabdomyolysis30. On October 1, 2015, the tenth revision of the ICD will be implemented in the US, and though the diagnostic performance of the new codes is likely to be similar to that of the old codes, continued vigilance is warranted.
In association analyses, non-differential misclassification generally drives associations toward the null, and the effects of large levels of misclassification can be pronounced30. For MI, stroke, and HF, however, the high positive predictive values of the claims-based methods, particularly those that use the principal diagnosis, tend to minimize the potential for bias from misclassification. With information on sensitivity and specificity, methods of quantitative bias analysis can characterize plausible effects of outcome misclassification bias on estimated associations31–33. These methods all assume, however, that the misclassified event has no association with the risk factors of interest. The use of selected diagnostic codes from hospitalizations as events means that even a misclassified event involves a serious health condition that required hospitalization. In effect, claims-based events data represent a composite endpoint that includes both the outcome of interest and selected (misclassified) non-event hospitalizations. As illustrated in Figure 3, the hazard ratios for all first hospitalizations not due to an MI suggest that hospitalizations randomly misclassified as MIs would have little effect on the strength of associations of MI with a number of CVD risk factors. Under these circumstances, showing that the levels of risk-factor associations for an outcome are consistent with the published literature34 may in fact provide little evidence of validity. For other less promiscuous risk factors or exposures, the effect of misclassification may be more pronounced.
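The composite-endpoint argument can be illustrated with simple arithmetic. Suppose that, among unexposed participants, a fraction `ppv_unexposed` of claims-defined events are genuine, that genuine events carry a risk ratio `true_rr`, and that the misclassified hospitalizations carry their own risk ratio `other_rr`. The risk ratio observed for the claims-defined outcome is then the weighted average of the two. The following minimal sketch encodes this; the function name and all numerical inputs are hypothetical, and the calculation is a simplified illustration rather than the bias analysis methods cited above.

```python
def observed_risk_ratio(true_rr, ppv_unexposed, other_rr=1.0):
    """Risk ratio for a composite of genuine and misclassified events.

    Among the unexposed, genuine events occur at rate t and misclassified
    events at rate f, with ppv_unexposed = t / (t + f). Among the exposed
    the rates are t * true_rr and f * other_rr, so the observed ratio is
    (t * true_rr + f * other_rr) / (t + f).
    """
    if not 0.0 <= ppv_unexposed <= 1.0:
        raise ValueError("ppv_unexposed must lie between 0 and 1")
    return ppv_unexposed * true_rr + (1.0 - ppv_unexposed) * other_rr


# Hypothetical scenarios for a genuine risk ratio of 2.0.
scenarios = [
    ("misclassified events unrelated to exposure", 0.9, 1.0),
    ("lower PPV, still unrelated to exposure", 0.7, 1.0),
    ("misclassified events share the association", 0.7, 2.0),
]
for label, ppv_value, other in scenarios:
    rr = observed_risk_ratio(2.0, ppv_value, other)
    print(f"{label}: observed RR = {rr:.2f}")
```

When `other_rr` is 1, the observed ratio is pulled toward the null in proportion to the share of misclassified events; when `other_rr` equals `true_rr`, misclassification leaves the observed ratio unchanged, so agreement with the published literature says little about how many events were misclassified.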
The use of claims-based methods with high PPVs to define events does come at the price of low sensitivity. For MI and stroke, the conventional claims-based methods identified only about two-thirds of the events in CHS (Table 2). In some settings such as the Mini-Sentinel with 178 million covered lives35, the available sample size is so large or the marginal cost of increasing the sample size is so low that in the absence of known or suspected bias, the effect of missed events on study power may be invisible. On the other hand, in trials or cohort studies evaluating associations with variables measured by costly examination components such as carotid ultrasound36, echocardiography37, or brain magnetic resonance imaging38, study power depends importantly on the observed event rates: to achieve comparable power, the efforts to recruit and examine additional patients may be much more costly than the efforts to improve the ascertainment of the primary events of interest. By design, events data-collection methods should be efficient and fit for purpose.
Observed event rates depend importantly on the intensity of identification and investigation. As a result, active-surveillance studies generate higher estimates of event rates than studies that rely on self-report. When data from active-surveillance studies are used to create risk-prediction algorithms, they also generate higher levels of predicted absolute risk. Data from CHS, FHS and ARIC, all active-surveillance studies, were used to develop the risk calculator39 for the Adult Treatment Panel 4 guidelines for lipids40. While the new guidelines were controversial for a number of reasons, one persistent complaint was the apparent overestimation of the absolute risk produced by the calculator compared with studies that had used self-report to identify events41,42. Missed events in studies that rely on self-report for the ascertainment of outcomes remain a cogent alternative explanation for large differences in the absolute event rates. As demonstrated by the REGARDS study43, additional data from CMS substantially reduced the perceived overestimation in risk.
For studies where the use of high-PPV codes is not suspected to introduce bias, the underestimation of event rates may not be a concern. But for studies where power is at a premium, where bias must be minimized, or where accurate estimates of incidence are of interest, the use of high-quality surveillance methods has much to recommend itself. While events data-collection activities should be appropriate to the purpose of the study, the achieved or observed event rates in both observational studies and clinical trials can also serve as an important quality-control measure of study conduct44,45.
Supplementary Material
Clinical Perspectives.
In both observational studies and clinical trials, cardiovascular-disease event rates vary for a number of biological and methodological reasons. Increasingly, the diagnostic codes from administrative claims data are being used to measure clinical outcomes. The quality of claims data varies according to the condition, and many cardiovascular events are coded with a moderate degree of accuracy. The approach of defining events on the basis of claims data influences the results. Methods that minimize misclassification, such as the use of specific diagnostic codes in the primary diagnosis position, also tend to underestimate event rates. Methods that use broad diagnostic codes in any position tend to capture not only genuine outcomes of interest but also non-events that fail to meet standard criteria. Levels of risk-factor associations with the outcome may not be reliable guides to the amount of misclassification. Observed event rates are directly related to the intensity of surveillance. When data from active-surveillance studies are used, for instance, to create risk-prediction algorithms, they also generate higher levels of predicted absolute risk than studies that rely on passive surveillance or self-report. While events data-collection activities should be appropriate to the purpose of the study, the achieved or observed event rates in both observational studies and clinical trials can also serve as an important quality-control measure of study conduct.
Acknowledgments
Funding Sources: This work was supported in part by National Heart, Lung, and Blood Institute (NHLBI) contracts HHSN268201200036C, HHSN268200800007C, N01HC55222, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, N01HC85086; and NHLBI grant U01HL080295, with additional contribution from the National Institute of Neurological Disorders and Stroke. Additional support was provided through R01AG023629 from the National Institute on Aging. A full list of principal CHS investigators and institutions can be found at CHS-NHLBI.org.
Footnotes
Disclaimer: The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung, and Blood Institute or the National Institutes of Health.
Disclosures: Dr. Psaty serves on the Data Safety and Monitoring Board for a clinical trial of a device funded by Zoll LifeCor and on the steering committee of the Yale Open Data Access Project funded by Johnson & Johnson.
References
- 1. Blumenthal D, Davis K, Guterman S. Medicare at 50--origins and evolution. N Engl J Med. 2015;372:479–486. doi: 10.1056/NEJMhpr1411701.
- 2. Polgreen LA, Suneja M, Tang F, Carter BL, Polgreen PM. Increasing trend in admissions for malignant hypertension and hypertensive encephalopathy in the United States. Hypertension. 2015;65:1002–1007. doi: 10.1161/HYPERTENSIONAHA.115.05241.
- 3. Yeh R, Sidney S, Chandra M, Sorel M, Selby J, Go A. Population trends in the incidence and outcomes of acute myocardial infarction. N Engl J Med. 2010;362:2155–2165. doi: 10.1056/NEJMoa0908610.
- 4. Graham DJ, Reichman ME, Wernecke M, Zhang R, Southworth MR, Levenson M, Sheu TC, Mott K, Goulding MR, Houstoun M, MaCurdy T, Worrall C, Kelman JA. Cardiovascular, bleeding, and mortality risks in elderly Medicare patients treated with dabigatran or warfarin for nonvalvular atrial fibrillation. Circulation. 2015;131:157–164. doi: 10.1161/CIRCULATIONAHA.114.012061.
- 5. Cutrona SL, Toh S, Iyer A, Foy S, Cavagnaro E, Forrow S, Racoosin JA, Goldberg R, Gurwitz JH. Design for validation of acute myocardial infarction cases in Mini-Sentinel. Pharmacoepidemiol Drug Saf. 2012;21(S1):274–281. doi: 10.1002/pds.2314.
- 6. Saczynski JS, Andrade SE, Harrold LR, Tjia J, Cutrona SL, Dodd KS, Goldberg RJ, Gurwitz JH. A systematic review of validated methods for identifying heart failure using administrative data. Pharmacoepidemiol Drug Saf. 2012;21(S1):129–140. doi: 10.1002/pds.2313.
- 7. Andrade SE, Harrold LR, Tjia J, Cutrona SL, Saczynski JS, Dodd KS, Goldberg RJ, Gurwitz JH. A systematic review of validated methods for identifying cerebrovascular accident or transient ischemic attack using administrative data. Pharmacoepidemiol Drug Saf. 2012;21(S1):100–128. doi: 10.1002/pds.2312.
- 8. Graham D, Ouellet-Hellstrom R, Macurdy T, Ali F, Sholley C, Worrall C, Kelman JA. Risk of acute myocardial infarction, stroke, heart failure, and death in elderly Medicare patients treated with rosiglitazone or pioglitazone. JAMA. 2010;304:411–418. doi: 10.1001/jama.2010.920.
- 9. Go AS, Singer D, Cheetham TC, Toh D, Reichman M, Graham D, Southworth MR, Zhang R, Houstoun M, Wu YT, Mott K, Gagne J. Mini-Sentinel Medical Product Assessment: a protocol for assessment of dabigatran. 2013 Dec 30; Posted for public comment, http://mini-sentinel.org/assessments/medical_events/details.aspx?ID=219.
- 10. Graham DJ, Zhou EH, McKean S, Levenson M, Calia K, Gelperin K, Ding Z, MaCurdy TE, Worrall C, Kelman JA. Cardiovascular and mortality risk in elderly Medicare beneficiaries treated with olmesartan versus other angiotensin receptor blockers. Pharmacoepidemiol Drug Saf. 2014;23:331–339. doi: 10.1002/pds.3548.
- 11. Hlatky MA, Ray RM, Burwen DR, Margolis KL, Johnson KC, Kucharska-Newton A, Manson JE, Robinson JG, Safford MM, Allison M, Assimes TL, Barvy AA, Berger J, Cooper-DeHoff RM, Heckbert SR, Li W, Liu S, Martin LW, Perez MV, Tindle HA, Winkelmayer WC, Stefanick ML. Use of Medicare data to identify coronary heart disease outcomes in the Women's Health Initiative. Circ Cardiovasc Qual Outcomes. 2014;7:157–162. doi: 10.1161/CIRCOUTCOMES.113.000373.
- 12. Ives DG, Fitzpatrick AL, Bild DE, Psaty BM, Kuller LH, Crowley PM, Cruise RG, Theroux S. Surveillance and ascertainment of cardiovascular events: The Cardiovascular Health Study. Ann Epidemiol. 1995;5:278–285. doi: 10.1016/1047-2797(94)00093-9.
- 13. Rosamond WD, Chambless LE, Sorlie PD, Bell EM, Weitman S, Smith JC, Folsom AR. Trends in the sensitivity, positive predictive value, false-positive rate, and comparability ratio of hospital discharge diagnosis codes for acute myocardial infarction in four US communities, 1987–2000. Am J Epidemiol. 2004;160:1137–1146. doi: 10.1093/aje/kwh341.
- 14. Jones SA, Gottesman RF, Shahar E, Wruck L, Rosamond WD. Validity of hospital discharge diagnosis codes for stroke: the Atherosclerosis Risk in Communities study. Stroke. 2014;45:3219–3225. doi: 10.1161/STROKEAHA.114.006316.
- 15. Heckbert SR, Kooperberg C, Safford MM, Psaty BM, Hsia J, McTiernan A, Gaziano JM, Frishman WH, Curb JD. Comparison of self-report, hospital discharge codes, and adjudication of cardiovascular events in the Women's Health Initiative. Am J Epidemiol. 2004;160:1152–1158. doi: 10.1093/aje/kwh314.
- 16. Fried LP, Borhani NO, Enright P, Furberg C, Gardin J, Kronmal R, Kuller LH, Manolio T, Mittelmark M, Newman A, O'Leary DH, Psaty B, Rautaharju P, Tracy RP, Weiler PG. The Cardiovascular Health Study: design and rationale. Ann Epidemiol. 1991;1:263–276. doi: 10.1016/1047-2797(91)90005-w.
- 17. Psaty BM, Lee M, Savage PJ, Rutan GH, German PS, Lyles M. Assessing the use of medications in the elderly: method and initial results in the Cardiovascular Health Study. J Clin Epidemiol. 1992;45:683–692. doi: 10.1016/0895-4356(92)90143-b.
- 18. Psaty BM, Kuller LH, Bild D, Burke GL, Kittner SJ, Mittelmark M, Price TR, Rautaharju PM, Robbins J. Methods of assessing prevalent cardiovascular disease in the Cardiovascular Health Study. Ann Epidemiol. 1995;5:270–277. doi: 10.1016/1047-2797(94)00092-8.
- 19. Price TR, Psaty B, O'Leary D, Burke G, Gardin J for the Cardiovascular Health Study Research Group. Assessment of cerebrovascular disease in the Cardiovascular Health Study. Ann Epidemiol. 1993;3:504–507. doi: 10.1016/1047-2797(93)90105-d.
- 20. Gottdiener JS, Arnold AM, Aurigemma GP, Polak JF, Tracy RP, Kitzman D, Gardin JM, Rutledge JE, Boineau RC. Predictors of congestive heart failure in the elderly: The Cardiovascular Health Study. J Am Coll Cardiol. 2000;35:1628–1637. doi: 10.1016/s0735-1097(00)00582-9.
- 21. Longstreth WT, Jr, Bernick C, Fitzpatrick A, Cushman M, Knepper L, Lima J, Furberg CD. Frequency and predictors of stroke death in 5,888 participants in the Cardiovascular Health Study. Neurology. 2001;56:368–375. doi: 10.1212/wnl.56.3.368.
- 22. Tsai AW, Cushman M, Rosamond WD, Heckbert SR, Tracy RP, Aleksic N, Folsom AR. Coagulation factors, inflammation markers, and venous thromboembolism: the longitudinal investigation of thromboembolism etiology (LITE). Am J Med. 2002;113:636–642. doi: 10.1016/s0002-9343(02)01345-1.
- 23. Robbins JA, Biggs ML, Cauley J. Adjusted mortality after hip fracture: From the Cardiovascular Health Study. J Am Geriatr Soc. 2006;54:1885–1891. doi: 10.1111/j.1532-5415.2006.00985.x.
- 24. Corrales-Medina VF, Alvarez KN, Weissfeld LA, Newman AB, Lehr L, Angus DC, Folsom A, Chirinos JA, Elkind MS, Lyles MF, Kronmal R, Yende S. Association between hospitalization for pneumonia and subsequent risk of cardiovascular disease. JAMA. 2015;313:264–274. doi: 10.1001/jama.2014.18229.
- 25. Carpenter J, Bithell J. Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Stat Med. 2000;19:1141–1164. doi: 10.1002/(sici)1097-0258(20000515)19:9<1141::aid-sim479>3.0.co;2-f.
- 26. Parikh N, Gona P, Larson M, Fox CS, Benjamin EJ, Murabito JM, O’Donnell CJ, Vasan RS, Levy D. Long-term trends in myocardial infarction incidence and case fatality in the National Heart, Lung, and Blood Institute's Framingham Heart Study. Circulation. 2009;119:1203–1210. doi: 10.1161/CIRCULATIONAHA.108.825364.
- 27. Yasaitis LC, Berkman LF, Chandra A. Comparison of self-reported and Medicare claims-identified acute myocardial infarction. Circulation. 2015;131:1477–1485. doi: 10.1161/CIRCULATIONAHA.114.013829.
- 28. Schellenbaum GD, Heckbert SR, Smith NL, Rea TD, Lumley T, Kitzman DW, Roger VL, Taylor HA, Psaty BM. Congestive heart failure incidence and prognosis: case identification using central adjudication versus hospital discharge diagnoses. Ann Epidemiol. 2006;16:15–122. doi: 10.1016/j.annepidem.2005.02.012.
- 29. Re VL, III, Kaynes K, Goldberg D, Forde KA, Carbonari DM, Fortier K, Hennessy S, Reddy KR, Pawloski PA, Daniel GW, Cheetham TC, Iyer A, Coughlin KO, Toh D, Boudreau D, Cooper WO, Selvam N, Selvan MS, VanWormer JJ, Avigan M, Houstoun M, Zornberg GL, Racoosin JA, Shoaibi A. Validity of codes to identify cases of severe acute liver injury in the Mini-Sentinel Distributed Database. 2012 Dec 6; doi: 10.1002/pds.3470. http://www.mini-sentinel.org/methods/outcome_validation/details.aspx?ID=103.
- 30. Floyd JS, Heckbert SR, Weiss NS, Carrell DS, Psaty BM. Use of administrative data to estimate the incidence of statin-related rhabdomyolysis. JAMA. 2012;307:1580–1582. doi: 10.1001/jama.2012.489.
- 31. Greenland S. Variance estimation for epidemiologic effect estimates under misclassification. Stat Med. 1988;7:745–757. doi: 10.1002/sim.4780070704.
- 32. Lash TL, Schmidt M, Jensen A, Engebjerg MC. Methods to apply probabilistic bias analysis to summary estimates of association. Pharmacoepidemiol Drug Saf. 2010;19:638–644. doi: 10.1002/pds.1938.
- 33. Fox MP, Lash TL, Greenland S. A method to automate probabilistic sensitivity analyses of misclassified binary variables. Int J Epidemiol. 2005;34:1370–1376. doi: 10.1093/ije/dyi184.
- 34. Setiawan VW, Virnig BA, Porcel J, Henderson BE, Marchand LL, Wilkens LR, Monroe KR. Linking data from the multiethnic cohort study to Medicare data: linkage results and application to chronic disease research. Am J Epidemiol. 2015;181(11):917–919. doi: 10.1093/aje/kwv055.
- 35. Platt R, Carnahan RM, Brown JS, Chrischilles E, Curtis LH, Hennessy S, Nelson JC, Racoosin JA, Robb M, Schneeweiss S, Toh S, Weiner MG. The US Food and Drug Administration's Mini-Sentinel program: status and direction. Pharmacoepidemiol Drug Saf. 2012;21(Suppl 1):1–8. doi: 10.1002/pds.2343.
- 36. O'Leary DH, Polak JF, Kronmal RA, Manolio TA, Burke GL, Wolfson SK, Jr for the Cardiovascular Health Study Collaborative Research Group. Carotid-artery intima and media thickness as a risk factor for myocardial infarction and stroke in older adults. N Engl J Med. 1999;340:14–22. doi: 10.1056/NEJM199901073400103.
- 37. deFilippi CR, Christenson RH, Kop WJ, Gottdiener JS, Zhan M, Seliger SL. Left ventricular ejection fraction assessment in older adults: an adjunct to natriuretic peptide testing to identify risk of new-onset heart failure and cardiovascular death? J Am Coll Cardiol. 2011;58:1497–1506. doi: 10.1016/j.jacc.2011.06.042.
- 38. Bernick C, Kuller L, Dulberg C, Longstreth WT, Jr, Manolio T, Beauchamp N, Price T. Silent MRI infarcts and the risk of future stroke: The Cardiovascular Health Study. Neurology. 2001;57:1222–1229. doi: 10.1212/wnl.57.7.1222.
- 39. Goff DC, Jr, Lloyd-Jones DM, Bennett G, Coady S, D’Agostino RB, Gibbons R, Greenland P, Lackland DT, Levy D, O’Donnell CJ, Robinson JG, Schwartz JS, Shero ST, Smith SC, Jr, Sorlie P, Stone JM, Wilson PWF. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation. 2014;129:S49–S73. doi: 10.1161/01.cir.0000437741.48606.98.
- 40. Stone NJ, Robinson JG, Lichtenstein AH, Bairey CN, Blum CB, Eckel RH, Goldberg AC, Gordon D, Levy D, Lloyd-Jones DM, McBride P, Schwartz JS, Shero ST, Smith SC, Jr, Watson K, Wilson PWF. 2013 ACC/AHA Guideline on the treatment of blood cholesterol to reduce atherosclerotic cardiovascular risk in adults: A report of the ACC/AHA task force on practice guidelines. Circulation. 2014;129:S1–S45. doi: 10.1161/01.cir.0000437738.63853.7a.
- 41. Ridker PM, Cook NR. Statins: new American Guidelines for prevention of cardiovascular disease. Lancet. 2013;382:1762–1765. doi: 10.1016/S0140-6736(13)62388-0.
- 42. Cook NR, Ridker PM. Further insight into the cardiovascular risk calculator: the roles of statins, revascularizations, and underascertainment in the Women's Health Study. JAMA Intern Med. 2014;174:1964–1971. doi: 10.1001/jamainternmed.2014.5336.
- 43. Muntner P, Colantonio LD, Cushman M, Goff DC, Jr, Howard G, Howard VJ, Kissela B, Levitan EB, Lloyd-Jones DM, Safford MM. Validation of the atherosclerotic cardiovascular disease pooled cohort risk equations. JAMA. 2014;311:1406–1415. doi: 10.1001/jama.2014.2630.
- 44. Psaty BM, Prentice RL. Variation in event rates in trials of patients with type 2 diabetes. JAMA. 2009;302:1698–1700. doi: 10.1001/jama.2009.1497.
- 45. Pfeffer MA, Claggett B, Assmann SF, Boineau R, Anand IS, Clausell N, Desai AS, Diaz R, Fleg JL, Gordeev I, Heitner JF, Lewis EF, O’Meara E, Rouleau JL, Probstfield JL, Shaburishvili T, Shah SJ, Solomon SD, Sweitzer NK, McKinlay SM, Pitt B. Regional variation in patients and outcomes in the treatment of preserved cardiac function heart failure with an aldosterone antagonist (TOPCAT) trial. Circulation. 2015;131:34–42. doi: 10.1161/CIRCULATIONAHA.114.013255.
