American Journal of Epidemiology. 2016 Sep 30;184(7):532–544. doi: 10.1093/aje/kww077

Linkage of a Population-Based Cohort With Primary Data Collection to Medicare Claims

The Reasons for Geographic and Racial Differences in Stroke Study

Fenglong Xie, Lisandro D Colantonio, Jeffrey R Curtis, Monika M Safford, Emily B Levitan, George Howard, Paul Muntner *
PMCID: PMC5044809  PMID: 27651383

Abstract

We described the linkage of primary data with administrative claims using the Reasons for Geographic and Racial Differences in Stroke (REGARDS) study and Medicare. REGARDS study data were linked with Medicare claims by use of Social Security numbers. We compared REGARDS participants by Medicare linkage status, having fee-for-service (FFS) coverage or not, and with a 5% sample of Medicare beneficiaries who had FFS coverage in 2005, overall, by age (45–64 and ≥65 years), and by race. Among REGARDS participants who were ≥65 years of age, 80% had data linked to Medicare on their study-visit date (64% with FFS coverage). No differences except race and sex were present between REGARDS participants without Medicare linkage and those with data linked to Medicare with and without FFS coverage. After the age-sex-race adjustment, comorbid conditions and health-care utilization were similar for those with FFS coverage in the REGARDS study and the 5% sample of Medicare beneficiaries. Among REGARDS participants aged 45–64 years, 11% had FFS coverage on their study-visit date. In this age group, differences were present between participants with and without FFS coverage and the Medicare 5% sample with FFS coverage. In conclusion, REGARDS participants aged ≥65 years with FFS coverage are representative of the study cohort and the US population aged ≥65 years with FFS coverage.

Keywords: follow-up studies, insurance claim review, Medicare


Claims data are increasingly being used to supplement primary data collection in large cohort studies ( 1–10 ). Through claims data, investigators can study health-care cost and utilization and efficiently capture information on exposures and outcomes that were not assessed through primary data collection. A common source used for supplementing primary data collection in the United States is Medicare claims. Medicare is a US federal benefit program that provides health insurance to about 93% of US adults 65 years of age or older ( 11 ). It also provides insurance coverage for persons younger than 65 with disabilities or who have end-stage renal disease (about 7 million or 2.6% of the entire US population <65 years of age) ( 12 ).

The Reasons for Geographic and Racial Differences in Stroke (REGARDS) study enrolled a nationwide sample of US adults aged 45 years or older ( 13 ). REGARDS study data, collected through questionnaires and a study examination, have been linked to Medicare claims by use of participants' Social Security numbers (SSNs). Although SSNs are unique identifiers, successful linkages between a cohort and Medicare data may not occur because of recording errors, because SSNs were not provided, or because participants provided incorrect SSNs. In addition, not all US adults aged 65 years or older have Medicare coverage ( 11 ) and, of those who do have coverage, some beneficiaries are enrolled in managed care programs (Medicare Advantage Part C) for which complete claims are not available for research purposes.

The purpose of this article is to describe the process undertaken to link REGARDS study participants' data to Medicare claims and to report the characteristics of REGARDS study participants with and without Medicare fee-for-service (FFS) coverage (i.e., being enrolled in Medicare Part A and Part B but not Part C), with the goal of providing a template for other cohort studies that may seek to link participants' data to Medicare claims in the future. Additionally, we compare the characteristics of REGARDS-Medicare–linked participants with the 5% random sample of Medicare beneficiaries who had FFS coverage to evaluate the representativeness of REGARDS study participants who have Medicare-linked data.

METHODS

REGARDS is a US population-based observational cohort study that enrolled 30,239 non-Hispanic white and black adults ≥45 years of age from all 48 contiguous US states and Washington, DC, between January 2003 and October 2007 ( 13 ). By design, blacks and residents of the southeastern United States were oversampled. At baseline, 21% and 35% of REGARDS study participants resided in the stroke buckle (coastal North and South Carolina and Georgia) and the stroke belt (the rest of North and South Carolina, Georgia, Alabama, Mississippi, Tennessee, and Arkansas), respectively. The institutional review boards of the University of Alabama at Birmingham and other participating institutions approved the REGARDS study, and all participants provided written, informed consent. The consent form included a statement about linking participants' data to Medicare claims, and this linkage was approved by the University of Alabama at Birmingham Institutional Review Board and the Centers for Medicare and Medicaid Services (CMS).

REGARDS-Medicare linking process

To link participants' data to Medicare claims, we sent SSNs collected during the REGARDS study baseline in-home visit to CMS. CMS contractors searched the Medicare registry database for these SSNs and generated a crosswalk file containing SSNs linked to Medicare beneficiary identification numbers. To confirm that REGARDS-Medicare data were correctly linked, we merged REGARDS study data with the Medicare beneficiary summary file using the crosswalk file. We considered the linkage to be successful if the birth date and sex matched between that recorded in the REGARDS study database and the Medicare beneficiary summary file. REGARDS study participants' data were not considered linked to Medicare claims if their SSN was not present in the crosswalk file or the participant had no record in the beneficiary summary file, if their sex did not match between the REGARDS study data collection and the Medicare data, or if at least 2 of the 3 elements of the birth date (day, month, year) did not match between the REGARDS study data collection and Medicare. Moreover, REGARDS study participants with matching SSN, sex, day, and month but not year of birth were only considered successfully matched to Medicare if their birth year recorded in the REGARDS study and Medicare was within 1 year. Participants' data were also considered to not be linked to Medicare claims if they had multiple sexes or birth dates in the Medicare beneficiary summary file across calendar years from 2000 to 2012.
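The sex and birth-date checks described above can be read as a small decision function. The following Python sketch is illustrative only and is not the study's code: `Person` and `is_valid_link` are hypothetical names, and the sketch does not model the crosswalk lookup or the check for multiple sexes or birth dates across beneficiary summary file years. It assumes that a single mismatched day or month with a matching year is accepted, which is one reading of the 2-of-3 rule.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    """Hypothetical record holding the fields compared during linkage."""
    sex: str
    birth_date: date


def is_valid_link(cohort: Person, medicare: Person) -> bool:
    """Apply the sex and birth-date agreement rules to one SSN-matched pair."""
    # Sex must agree between the cohort record and the Medicare record.
    if cohort.sex != medicare.sex:
        return False

    day_ok = cohort.birth_date.day == medicare.birth_date.day
    month_ok = cohort.birth_date.month == medicare.birth_date.month
    year_ok = cohort.birth_date.year == medicare.birth_date.year

    # Two or more mismatched birth-date elements: not linked.
    if [day_ok, month_ok, year_ok].count(False) >= 2:
        return False

    # Day and month agree but year does not: accept only if within 1 year.
    if not year_ok:
        return abs(cohort.birth_date.year - medicare.birth_date.year) <= 1

    # Assumption: zero mismatches, or one mismatch in day or month, is accepted.
    return True
```

For example, a pair that agrees on sex, day, and month but differs by 2 years in birth year would be rejected, whereas a 1-year difference would be accepted.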

Data collection

REGARDS study

As described previously ( 13 ), baseline data were collected through a computer-assisted telephone interview and an in-home study visit. Information on variables from the REGARDS study is provided in Supplementary Data available at http://aje.oxfordjournals.org/.

Medicare

Medicare claims from 2000 to 2012 were obtained for REGARDS study participants with data successfully linked. To evaluate the representativeness of REGARDS study participants with Medicare coverage, we obtained Medicare data for the 5% random sample of beneficiaries ≥45 years of age with FFS coverage in 2005 (the midpoint of REGARDS study enrollment). As pharmacy insurance in Medicare began on January 1, 2006, medication use was assessed for REGARDS study participants enrolled in 2006–2007 and the 5% random sample of Medicare beneficiaries in 2006–2007 with Medicare FFS coverage. Medicare data were obtained from the beneficiary summary files, inpatient base claims files, inpatient revenue center files, outpatient base claims files, outpatient revenue center files, and B carrier line files. Prescription drug event files were used to determine medication use.

For REGARDS study participants, the index date was defined as the date of their in-home study visit in 2003–2007. For analyses comparing REGARDS study participants with the 5% sample of year 2005 Medicare beneficiaries, participants were required to have 6 months of continuous Medicare FFS coverage prior to their index date. To match this requirement, we randomly selected for each Medicare beneficiary an index date in 2005 after they had 6 months of consecutive Medicare FFS coverage. As REGARDS participants were recruited throughout the calendar year, a random index date was chosen rather than using July 1, 2005. This approach avoids using an identical look-back period (January 1–June 30) for all Medicare beneficiaries that may lead to bias if there are seasonal differences in health-care utilization and diagnoses. All available claims in the 3 years before the index date were used for defining comorbid conditions and health-care utilization. Three years was chosen as this represents the maximum possible look-back for the first REGARDS study participant recruited in January 2003. Previously published algorithms were used to define comorbid conditions and health-care utilization ( Supplementary Data ). For identifying medication use in Medicare claims, we compared REGARDS study participants enrolled in 2006–2007 with the 5% sample of 2006–2007 Medicare beneficiaries with 6 months of FFS plus Part D coverage before their in-home visit for REGARDS study participants, and for 6 months prior to a random date in 2006 or 2007 for Medicare beneficiaries. Prescription fills were defined as the presence of a national drug code in Part D claims before the index date.
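The random index date for beneficiaries in the 5% sample can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's SAS code: coverage is tracked by calendar month as a set of `(year, month)` pairs, a date is treated as eligible when the 6 calendar months before its month all have FFS coverage, and the function names are hypothetical.

```python
import calendar
import random
from datetime import date


def prior_months(year, month, n):
    """Return the n (year, month) pairs immediately preceding the given month."""
    months = []
    for _ in range(n):
        month -= 1
        if month == 0:
            year, month = year - 1, 12
        months.append((year, month))
    return months


def eligible_dates(ffs_months, year=2005, lookback=6):
    """List every date in `year` preceded by `lookback` covered months."""
    dates = []
    for month in range(1, 13):
        if all(m in ffs_months for m in prior_months(year, month, lookback)):
            n_days = calendar.monthrange(year, month)[1]
            dates.extend(date(year, month, d) for d in range(1, n_days + 1))
    return dates


def pick_index_date(ffs_months, rng, year=2005, lookback=6):
    """Draw one eligible date uniformly at random, or None if none exists."""
    candidates = eligible_dates(ffs_months, year, lookback)
    return rng.choice(candidates) if candidates else None


# Hypothetical beneficiary with FFS coverage in every month of 2005 only.
coverage_2005 = {(2005, m) for m in range(1, 13)}
example_index = pick_index_date(coverage_2005, random.Random(0))
```

In this example the earliest eligible date is July 1, 2005, because the look-back for any earlier month would reach into 2004, where the hypothetical beneficiary has no coverage.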

Statistical methods

All analyses were conducted by race and by age group (45–64 and 65 years of age or older), the latter to reflect the primary reasons individuals were eligible for Medicare coverage: older age (i.e., age 65 years or older) and disability/end-stage renal disease (i.e., for those younger than 65 years). We used REGARDS study data to analyze the characteristics of participants with data linked to Medicare claims with and without FFS coverage for the calendar month when their REGARDS in-home visit occurred and of those without data linked to Medicare for that calendar month. We determined the prevalence of comorbid conditions and health-care utilization in Medicare for REGARDS study participants with at least 6 months of continuous Medicare FFS coverage prior to their index date. For the analysis of participants aged 65 years or older, we excluded individuals who turned 65 years of age during the 6 months prior to the index date, as they do not represent those with Medicare coverage due to older age. Differences between REGARDS study participants and beneficiaries in the 5% random sample were assessed by using the standardized mean difference (SMD) as described by Yang and Dalton ( 14 ). An SMD >0.1 was considered potentially important (i.e., a large difference in the distributions between groups). Using Part D claims, we calculated the prevalence of medication use for REGARDS study participants enrolled in 2006–2007 and the 5% random sample of Medicare beneficiaries in 2006–2007. In a final analysis, we calculated the standardized prevalence of comorbid conditions, health-care utilization, and medication use for REGARDS study participants with FFS and FFS plus Part D coverage and for Medicare beneficiaries in the 5% random sample. Prevalence estimates were standardized to the age (in 5-year groups from <50 to ≥80 years), sex, race (black, white), and region of residence (Northeast, Midwest, South, and West) distribution of the 5% Medicare sample.
All analyses were conducted by use of SAS, version 9.3, software (SAS Institute, Inc., Cary, North Carolina).
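For a binary characteristic, the standardized mean difference compares two proportions on the scale of their pooled standard deviation. The sketch below is a Python illustration (the analyses themselves were run in SAS); `smd_binary` is a hypothetical helper name, and the formula shown is the usual one for dichotomous variables. The inputs are percentages reported in Table 2, and the results agree with the SMDs reported there after rounding to 2 decimals.

```python
import math


def smd_binary(p1: float, p2: float) -> float:
    """Absolute standardized mean difference between two proportions."""
    # Average the two binomial variances, then scale the difference by its root.
    pooled_var = (p1 * (1 - p1) + p2 * (1 - p2)) / 2
    if pooled_var == 0:
        return 0.0
    return abs(p1 - p2) / math.sqrt(pooled_var)


# Proportions taken from Table 2 (overall columns, REGARDS vs. 5% sample).
smd_female = smd_binary(0.505, 0.634)    # reported SMD: 0.26
smd_black = smd_binary(0.320, 0.082)     # reported SMD: 0.62
smd_dementia = smd_binary(0.011, 0.060)  # reported SMD: 0.27
```

Because the SMD depends only on the two proportions and not on the sample sizes, it flags large imbalances even when one group (here, the 5% sample) is so large that any significance test would reject equality.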

RESULTS

After exclusion of 56 potential participants with anomalies on their informed consent, SSNs were available for 26,183 (87%) of the 30,183 REGARDS study participants (Figure 1). SSNs for these participants were sent to CMS, and a crosswalk file containing 22,221 beneficiaries was returned. Overall, 21,162 REGARDS study participants had records in the Medicare beneficiary summary file between 2000 and 2012. A total of 759 participants with data initially linked to Medicare through SSNs were ultimately considered not linked because they had multiple ages or 2 sexes recorded in the Medicare beneficiary summary file across 2000–2012 ( n = 83) or age and/or sex disagreement between Medicare and the REGARDS study ( n = 676). Of 20,403 participants with data linked to Medicare between 2000 and 2012, 6,769 became Medicare eligible because of age or disability and obtained Medicare coverage after their in-home study visit, and 13,634 had their data linked at the time (i.e., calendar month) of their in-home study visit (11,947 (80%) of participants ≥65 years of age and 1,687 (11%) of those <65 years of age). Of these participants, 10,839 had FFS coverage on the date of their in-home visit, and 10,340 also had at least 6 consecutive months of FFS coverage preceding their in-home visit date. Overall, 628 REGARDS study participants had at least 6 months of consecutive FFS plus Part D coverage prior to their in-home study visit in 2006–2007. The flowcharts showing the linkage stratified by age (≥65 and <65 years of age) are provided in Supplementary Data. The mean look-back period used to determine comorbid conditions and health-care utilization was 2.70 (standard deviation, 0.64) years among REGARDS study participants and 2.77 (standard deviation, 0.61) years for Medicare beneficiaries in the 5% sample with FFS coverage.
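The participant flow can be checked with simple arithmetic. The Python sketch below only restates counts given in the text and in the column headers of Table 1; the variable names are illustrative.

```python
# Counts reported in the text.
enrolled = 30_239
consent_anomalies = 56
cohort = enrolled - consent_anomalies             # analysis cohort

with_ssn = 26_183
in_summary_file = 21_162
multiple_age_or_sex = 83
age_or_sex_disagreement = 676
not_linked = multiple_age_or_sex + age_or_sex_disagreement
linked_2000_2012 = in_summary_file - not_linked

linked_after_visit = 6_769
linked_at_visit = 13_634

# Counts from the Table 1 column headers.
linked_65_plus, linked_under_65 = 11_947, 1_687
ffs_65_plus, ffs_under_65 = 9_528, 1_311

# Share of the cohort with an SSN available, in percent.
pct_with_ssn = round(100 * with_ssn / cohort)
```

Each derived quantity matches the figure stated in the text: 30,183 in the cohort, 759 excluded after initial linkage, 20,403 linked in 2000–2012, and 10,839 with FFS coverage on the visit date.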

Figure 1.


Flowchart of participants through the process of linkage to Medicare claims, Reasons for Geographic and Racial Differences in Stroke Study, 2003–2007. A) Overall population; B) blacks; C) whites. Participants who were excluded for not having records in the Medicare beneficiary summary file (BSF) from 2000 to 2012 are included in the 2013 BSF. Medicare Parts A, B, and D provide coverage for inpatient care, outpatient care, and prescription medication, respectively. Part C is a capitated program, and the services provided are not observable through claims. Fee-for-service (FFS) coverage was defined as being enrolled in Medicare Parts A and B but not in Medicare Part C. BIN, beneficiary identification number; REGARDS, Reasons for Geographic and Racial Differences in Stroke.

Participants 65 years or older

Overall, 80% of REGARDS study participants aged 65 years or older on their in-home study visit date had data linked to Medicare claims (64% and 16% with and without FFS coverage, respectively). Among REGARDS study participants aged 65 years or older, those with data linked to Medicare with FFS coverage, with data linked without FFS coverage, and without data linked to Medicare were similar with respect to all characteristics except race and sex (Table  1 ). Results were similar when analyses were stratified by race ( Supplementary Data ).

Table 1.

Characteristics of Study Participants Enrolled in 2003–2007 With Data Linked to Medicare With and Without Medicare Fee-for-Service Coverage for the Month of Their In-Home Study Visit and Without Data Linked to Medicare Claims, Reasons for Geographic and Racial Differences in Stroke Study, 2003–2007 a

Columns are grouped by age b. Age ≥65 years: with data linked to Medicare claims on date of in-home visit ( n = 11,947), with FFS coverage ( n = 9,528) or without FFS coverage ( n = 2,419), and without data linked on their in-home visit c ( n = 3,014). Age 45–64 years: with data linked to Medicare claims on date of in-home visit ( n = 1,687), with FFS coverage ( n = 1,311) or without FFS coverage ( n = 376), and without data linked on their in-home visit c ( n = 13,535). Values are mean (SD) where an SD is shown in parentheses and % otherwise.

| Characteristics | Age ≥65 y: linked, with FFS | Age ≥65 y: linked, without FFS | Age ≥65 y: not linked c | Age 45–64 y: linked, with FFS | Age 45–64 y: linked, without FFS | Age 45–64 y: not linked c |
| --- | --- | --- | --- | --- | --- | --- |
| Age, years | 72.8 (5.9) | 72.3 (5.9) | 72.9 (6.1) | 58.5 (4.4) | 58.1 (4.7) | 57.2 (5.0) |
| Black | 32.4 | 50.7 | 48.5 | 59.2 | 66.0 | 42.3 |
| Female | 50.8 | 51.6 | 58.9 | 58.7 | 50.0 | 57.7 |
| Less than high school education | 15.4 | 17.3 | 19.2 | 21.3 | 14.7 | 7.4 |
| Health behavior | | | | | | |
| Cigarette smoking | 9.6 | 11.4 | 10.3 | 26.9 | 25.1 | 18.2 |
| Physical activity: none | 36.6 | 36.4 | 38.2 | 45.1 | 42.1 | 30.5 |
| Physical activity: 1–3 times/week | 32.7 | 34.1 | 31.7 | 31.6 | 34.1 | 40.0 |
| Physical activity: ≥4 times/week | 30.7 | 29.5 | 30.1 | 23.3 | 23.7 | 29.5 |
| Comorbid conditions | | | | | | |
| Diabetes | 23.2 | 23.7 | 26.0 | 40.0 | 35.9 | 17.9 |
| Hypertension | 65.2 | 65.4 | 67.7 | 72.6 | 70.7 | 50.3 |
| Myocardial infarction | 17.0 | 15.8 | 15.3 | 18.8 | 16.7 | 8.0 |
| Self-rated health: excellent | 16.1 | 15.6 | 14.8 | 4.6 | 2.9 | 17.6 |
| Self-rated health: very good | 30.6 | 30.9 | 29.8 | 11.9 | 12.8 | 32.7 |
| Self-rated health: good | 35.6 | 36.6 | 36.7 | 33.2 | 36.3 | 34.3 |
| Self-rated health: fair | 14.5 | 14.4 | 15.6 | 35.6 | 36.3 | 12.7 |
| Self-rated health: poor | 3.2 | 2.5 | 3.1 | 14.7 | 11.7 | 2.7 |
| Scores on SF-12 mental | 55.4 (7.4) | 55.2 (7.4) | 54.8 (8.0) | 49.4 (11.7) | 50.0 (11.2) | 53.3 (8.7) |
| Scores on SF-12 physical | 45.9 (10.3) | 46.5 (10.0) | 45.8 (10.3) | 35.4 (11.8) | 35.7 (11.5) | 48.2 (9.9) |
| Medication | | | | | | |
| Antihypertensives | 58.8 | 59.8 | 62.2 | 68.4 | 66.5 | 45.1 |
| Statins | 37.7 | 36.0 | 36.0 | 37.5 | 36.2 | 24.4 |
| Multiple medication use d | 22.0 | 16.1 | 18.2 | 31.7 | 26.9 | 11.4 |
| Laboratory and exam measurements | | | | | | |
| Body mass index e | 28.3 (5.5) | 28.7 (5.8) | 28.7 (5.9) | 32.0 (7.5) | 32.3 (8.2) | 29.9 (6.4) |
| Systolic blood pressure, mm Hg | 130.1 (16.8) | 130.0 (17.0) | 129.4 (17.0) | 128.8 (17.2) | 129.6 (17.2) | 124.9 (16.0) |
| Diastolic blood pressure, mm Hg | 75.2 (9.5) | 76.0 (9.9) | 75.6 (9.6) | 77.9 (10.4) | 78.8 (10.4) | 77.6 (9.6) |
| Total cholesterol, mg/dL | 187.5 (39.6) | 190.0 (39.4) | 189.7 (41.0) | 190.9 (44.9) | 190.3 (41.6) | 196.4 (39.5) |
| HDL cholesterol, mg/dL | 51.5 (16.3) | 52.0 (16.0) | 53.2 (16.5) | 49.1 (15.2) | 48.6 (15.6) | 52.0 (16.1) |
| LDL cholesterol, mg/dL | 109.4 (33.9) | 113.1 (34.5) | 110.7 (34.5) | 112.4 (38.9) | 113.8 (36.0) | 118.1 (34.6) |
| Estimated glomerular filtration rate, mL/minute/1.73 m² | 76.3 (18.9) | 78.0 (18.9) | 77.5 (19.2) | 88.0 (25.6) | 90.1 (24.9) | 93.5 (17.2) |
| Albuminuria f | 19.7 | 21.9 | 21.2 | 24.3 | 26.6 | 13.4 |

Abbreviations: FFS, fee-for-service; HDL, high-density lipoprotein; LDL, low-density lipoprotein; REGARDS, Reasons for Geographic and Racial Differences in Stroke; SD, standard deviation; SF-12, 12-item short form health survey.

a The data in Table  1 were collected as part of the REGARDS study. FFS was defined by being enrolled in Medicare Parts A and B but not in Medicare Part C.

b Age was based on the date of each participant's in-home study visit.

c Not linked for the month of the REGARDS in-home study visit.

d Multiple medication use was defined as 10 or more prescription or over-the counter drugs.

e Weight (kg)/height (m)².

f Albuminuria was defined by a urinary albumin-to-creatinine ratio ≥30 mg/g.

Compared with the 5% random sample of Medicare beneficiaries in 2005, REGARDS study participants aged 65 years or older with 6 consecutive months of FFS coverage prior to their in-home study visit were younger, less likely to be female, and more likely to be black (Table  2 ). Additionally, REGARDS study participants were less likely to have dementia and more likely to have had a prostate-specific antigen test or mammography than Medicare beneficiaries in the 5% random sample. The distributions of other comorbid conditions and health-care utilization were similar. After age, sex, race, and region standardization, the prevalence of comorbid conditions was similar for REGARDS study participants and the 5% random sample of Medicare beneficiaries ( Supplementary Data ). However, a higher percentage of REGARDS study participants had a prostate-specific antigen test and mammography, while a lower percentage of REGARDS study participants had dementia or received state subsidies, an indicator of low income.
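The standardization referred to here is direct standardization: stratum-specific prevalences in the study group are reweighted to the stratum distribution of the reference population (the 5% Medicare sample). The Python sketch below illustrates the calculation with made-up counts for 2 strata; the function name, the stratum keys, and all numbers are hypothetical, and the actual analysis used strata defined by age group, sex, race, and region.

```python
def directly_standardized_prevalence(study_cases, study_totals, reference_totals):
    """Weight study stratum prevalences by the reference stratum distribution."""
    reference_n = sum(reference_totals.values())
    standardized = 0.0
    for stratum, reference_count in reference_totals.items():
        n = study_totals.get(stratum, 0)
        if n == 0:
            # A stratum present in the reference but empty in the study
            # has no prevalence to reweight.
            raise ValueError(f"no study participants in stratum {stratum!r}")
        prevalence = study_cases.get(stratum, 0) / n
        standardized += prevalence * (reference_count / reference_n)
    return standardized


# Hypothetical strata keyed by (age group, sex, race, region).
young = ("65-69", "F", "white", "South")
old = ("80+", "F", "white", "South")
study_cases = {young: 10, old: 30}        # people with the condition
study_totals = {young: 100, old: 100}     # study participants per stratum
reference_totals = {young: 25, old: 75}   # reference population per stratum

crude = sum(study_cases.values()) / sum(study_totals.values())
standardized = directly_standardized_prevalence(
    study_cases, study_totals, reference_totals
)
```

In this toy example the crude prevalence is 20%, but the standardized prevalence is 25% because the reference population is weighted toward the older stratum, where the condition is more common.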

Table 2.

Comorbid Conditions and Health-Care Utilization Among Study Participants Enrolled in 2003–2007 With Data Linked to Medicare and 2005 Medicare 5% Random Sample Aged ≥65 Years With 6 or More Months of Continuous Medicare Fee-for-Service Coverage Prior to Their Index Date, Reasons for Geographic and Racial Differences in Stroke Study, 2003–2007 a

| | Overall: REGARDS-Medicare Linked, % ( n = 9,087) b | Overall: Medicare 5% Random Sample, % ( n = 1,509,427) | Overall: SMD c | Black: REGARDS-Medicare Linked, % ( n = 2,907) b | Black: Medicare 5% Random Sample, % ( n = 123,134) | Black: SMD c | White: REGARDS-Medicare Linked, % ( n = 6,180) b | White: Medicare 5% Random Sample, % ( n = 1,386,293) | White: SMD c |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Age, years | | | 0.25 | | | 0.23 | | | 0.22 |
| 65–69 | 33.4 | 22.9 | | 36.9 | 27.1 | | 31.8 | 22.5 | |
| 70–74 | 29.4 | 23.1 | | 30.3 | 24.9 | | 28.9 | 23.0 | |
| 75–79 | 22.1 | 21.0 | | 19.6 | 19.7 | | 23.2 | 21.1 | |
| ≥80 | 15.1 | 33.0 | | 13.1 | 28.3 | | 16.1 | 33.4 | |
| Black | 32.0 | 8.2 | 0.62 | | | | | | |
| Female | 50.5 | 63.4 | 0.26 | 59.8 | 66.0 | 0.13 | 46.1 | 63.1 | 0.35 |
| Comorbid conditions | | | | | | | | | |
| Myocardial infarction | 6.1 | 6.2 | <0.01 | 5.5 | 5.8 | 0.01 | 6.4 | 6.3 | <0.01 |
| Coronary heart disease | 27.3 | 26.7 | 0.02 | 26.0 | 24.1 | 0.04 | 28.0 | 26.9 | 0.02 |
| Peripheral artery disease | 4.5 | 4.5 | <0.01 | 6.1 | 6.5 | 0.02 | 3.7 | 4.3 | 0.03 |
| Stroke | 5.6 | 5.1 | 0.02 | 7.4 | 7.4 | <0.01 | 4.8 | 4.9 | 0.01 |
| Dementia | 1.1 | 6.0 | 0.27 | 1.5 | 8.0 | 0.31 | 0.9 | 5.8 | 0.28 |
| Diabetes | 21.7 | 18.2 | 0.09 | 32.3 | 29.8 | 0.05 | 16.7 | 17.1 | 0.01 |
| Chronic kidney disease | 5.5 | 6.0 | 0.02 | 7.5 | 10.4 | 0.10 | 4.5 | 5.6 | 0.05 |
| Chronic pulmonary disease | 21.0 | 21.5 | 0.01 | 20.9 | 19.6 | 0.03 | 21.1 | 21.7 | 0.02 |
| Health-care utilization | | | | | | | | | |
| PSA screening (men only) | 54.3 | 48.1 | 0.12 | 55.0 | 44.3 | 0.21 | 54.1 | 48.4 | 0.11 |
| Mammography (women only) | 67.9 | 45.7 | 0.46 | 62.9 | 40.4 | 0.46 | 70.9 | 46.2 | 0.52 |
| All-cause hospitalization | 33.7 | 35.4 | 0.04 | 34.0 | 36.0 | 0.04 | 33.6 | 35.3 | 0.04 |
| Other Medicare variables | | | | | | | | | |
| Receiving Medicare for reasons other than age | 9.5 | 7.7 | 0.06 | 16.2 | 15.4 | 0.02 | 6.5 | 7.0 | 0.02 |
| Receiving state subsidies | 8.3 | 11.1 | 0.09 | 18.8 | 31.5 | 0.29 | 3.3 | 9.3 | 0.25 |

Abbreviations: PSA, prostate-specific antigen; REGARDS, Reasons for Geographic and Racial Differences in Stroke; SMD, standardized mean difference.

a The data in Table  2 were derived from Medicare claims data. The results in this table are restricted to participants aged ≥65 years 6 months prior to their index date. As described in Methods, the index date is the in-home study visit for REGARDS study participants and a randomly selected date in 2005, after 6 months of continuous fee-for-service coverage for individuals in the 5% Medicare random sample. Fee-for-service coverage was defined by being enrolled in Medicare Parts A and B but not in Medicare Part C.

b The total number of participants in the analyses presented in Table  2 differs from that in Table  1 as we excluded participants without 6 months of consecutive Medicare FFS coverage before their in-home study visit.

c SMD >0.1 was considered as potentially important (i.e., a large difference in the distributions between groups).

Compared with Medicare beneficiaries with FFS plus Part D coverage in 2006–2007, REGARDS study participants aged 65 years or older with 6 consecutive months of FFS plus Part D coverage prior to their in-home study visit were younger and more likely to be black (Table 3). In addition, REGARDS study participants were less likely to be taking anticoagulants. After age, sex, race, and region standardization, use of antidiabetes and antihypertensive medications was less common for REGARDS study participants with data linked compared with beneficiaries in the 2006–2007 Medicare 5% sample ( Supplementary Data ).

Table 3.

Drug Utilization Among Study Participants Enrolled in 2006–2007 With Data Linked to Medicare and the 2006–2007 Medicare 5% Random Sample Aged ≥65 Years With 6 or More Months of Continuous Medicare Fee-for-Service Plus Part D Coverage, Reasons for Geographic and Racial Differences in Stroke Study, 2003–2007 a

| | Overall: REGARDS-Medicare Linked, % ( n = 494) b | Overall: Medicare 5% Random Sample, % ( n = 735,922) | Overall: SMD c | Black: REGARDS-Medicare Linked, % ( n = 152) b | Black: Medicare 5% Random Sample, % ( n = 70,798) | Black: SMD c | White: REGARDS-Medicare Linked, % ( n = 342) b | White: Medicare 5% Random Sample, % ( n = 665,124) | White: SMD c |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Age, years | | | 0.29 | | | 0.31 | | | 0.28 |
| 65–69 | 34.0 | 22.5 | | 38.8 | 24.9 | | 31.9 | 22.2 | |
| 70–74 | 30.8 | 21.7 | | 28.3 | 23.2 | | 31.9 | 21.5 | |
| 75–79 | 22.1 | 19.7 | | 20.4 | 19.1 | | 22.8 | 19.8 | |
| ≥80 | 13.2 | 36.2 | | 12.5 | 32.7 | | 13.5 | 36.6 | |
| Black | 30.8 | 9.6 | 0.55 | | | | | | |
| Female | 63.6 | 69.6 | 0.13 | 75.0 | 72.1 | 0.08 | 58.5 | 69.3 | 0.23 |
| Medication | | | | | | | | | |
| Antidiabetes medication | 22.1 | 18.4 | 0.09 | 32.9 | 29.9 | 0.07 | 17.3 | 17.2 | <0.01 |
| Antihypertensives | 58.1 | 55.2 | 0.06 | 69.7 | 66.2 | 0.08 | 52.9 | 54.0 | 0.02 |
| Statin | 45.5 | 41.1 | 0.09 | 46.1 | 39.5 | 0.13 | 45.3 | 41.3 | 0.08 |
| Other lipid-lowering medication | 10.9 | 10.5 | 0.02 | 15.1 | 10.8 | 0.13 | 9.1 | 10.4 | 0.05 |
| Narcotics | 43.3 | 38.8 | 0.09 | 45.4 | 40.2 | 0.10 | 42.4 | 38.7 | 0.08 |
| NSAIDs | 19.8 | 17.9 | 0.05 | 20.4 | 21.5 | 0.03 | 19.6 | 17.5 | 0.05 |
| Proton-pump inhibitor | 24.7 | 23.1 | 0.04 | 28.9 | 24.0 | 0.11 | 22.8 | 22.9 | 0.00 |
| Bisphosphonate | 14.6 | 13.3 | 0.04 | 9.2 | 6.6 | 0.10 | 17.0 | 14.0 | 0.08 |
| Steroid | 38.5 | 35.3 | 0.07 | 36.8 | 29.9 | 0.15 | 39.2 | 35.9 | 0.07 |
| Thyroid medication | 15.6 | 18.7 | 0.08 | 9.2 | 8.9 | 0.01 | 18.4 | 19.7 | 0.03 |
| Antibiotics | 56.7 | 53.9 | 0.06 | 48.0 | 45.2 | 0.06 | 60.5 | 54.9 | 0.11 |
| Antidepressant | 22.9 | 25.5 | 0.06 | 24.3 | 17.9 | 0.16 | 22.2 | 26.3 | 0.09 |
| Anticoagulant | 5.7 | 10.6 | 0.18 | Redacted d | Redacted | 0.22 | 7.0 | 10.9 | 0.14 |

Abbreviations: NSAID, nonsteroidal antiinflammatory drug; REGARDS, Reasons for Geographic and Racial Differences in Stroke; SMD, standardized mean difference.

a The data in Table 3 were derived from Medicare claims data. The results in this table are restricted to participants aged ≥65 years 6 months prior to their index date. As described in Methods, the index date is the in-home study visit for REGARDS participants and a randomly selected date in 2006 or 2007, after 6 months of continuous fee-for-service plus Part D coverage, for individuals in the 5% Medicare random sample. Fee-for-service coverage was defined by being enrolled in Medicare Parts A and B but not in Medicare Part C.

b Restricted to those whose REGARDS study visit date was in 2006 and 2007 as Medicare Part D coverage began on January 1, 2006.

c SMD >0.1 was considered as potentially important (i.e., a large difference in the distributions between groups).

d Redacted cells with a sample size of less than 11 people are not displayed.

Participants 45–64 years of age

Overall, data from 11% of REGARDS study participants less than 65 years of age on their in-home visit date were linked to Medicare claims (9% and 2% with and without FFS coverage, respectively). REGARDS study participants with FFS coverage were more likely to have less than a high school education and to participate in no physical activity compared with those who had data linked without FFS coverage and those without data linked. REGARDS study participants with data linked with or without FFS coverage were more likely to smoke cigarettes; to have diabetes, hypertension, or a history of myocardial infarction; to take antihypertensive medications; and to have multiple medication use (Table  1 ) compared with those without data linked. These participants also had lower physical functioning scores on the 12-item short form health survey compared with those without data linked. Results were similar for whites and blacks ( Supplementary Data ).

Compared with the 5% sample of Medicare beneficiaries aged 45–64 years in 2005 with 6 consecutive months of FFS coverage, REGARDS study participants aged 45–64 years with 6 consecutive months of FFS coverage prior to their in-home study visit were older and were more likely to be female, to be black, to have several comorbid conditions, and to have had a prostate-specific antigen test or mammography (Table 4). With the exception of receipt of state subsidies, the prevalences of comorbid conditions and health-care utilization were similar for REGARDS study participants 45–64 years of age and the Medicare 5% sample after age, sex, race, and region standardization ( Supplementary Data ).

Table 4.

Comorbid Conditions and Health-Care Utilization Among Study Participants Enrolled in 2003–2007 With Data Linked to Medicare and 2005 Medicare 5% Random Sample Aged 45–64 Years With 6 or More Months of Continuous Medicare Fee-for-Service Coverage Prior to Their Index Date, Reasons for Geographic and Racial Differences in Stroke Study, 2003–2007 a

| | Overall: REGARDS-Medicare Linked, % ( n = 1,183) | Overall: Medicare 5% Random Sample, % ( n = 192,415) | Overall: SMD b | Black: REGARDS-Medicare Linked, % ( n = 713) | Black: Medicare 5% Random Sample, % ( n = 38,816) | Black: SMD b | White: REGARDS-Medicare Linked, % ( n = 470) | White: Medicare 5% Random Sample, % ( n = 153,599) | White: SMD b |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Age, years | | | 0.66 | | | 0.75 | | | 0.59 |
| 45–49 | 4.8 | 20.0 | | 4.5 | 21.6 | | 5.3 | 19.6 | |
| 50–54 | 12.0 | 22.9 | | 12.5 | 24.7 | | 11.3 | 22.5 | |
| 55–59 | 37.4 | 27.2 | | 39.8 | 27.1 | | 33.6 | 27.3 | |
| 60–64 | 45.8 | 29.8 | | 43.2 | 26.6 | | 49.8 | 30.6 | |
| Black, % | 60.3 | 20.2 | 0.90 | | | | | | |
| Female, % | 58.3 | 47.7 | 0.21 | 62.4 | 51.0 | 0.23 | 52.1 | 46.9 | 0.10 |
| Comorbid conditions | | | | | | | | | |
| Myocardial infarction | 7.4 | 5.8 | 0.06 | 5.6 | 5.7 | <0.01 | 10.0 | 5.8 | 0.16 |
| Coronary heart disease | 24.6 | 19.4 | 0.12 | 22.9 | 19.5 | 0.08 | 27.2 | 19.4 | 0.19 |
| Peripheral artery disease | 6.0 | 3.9 | 0.10 | 6.6 | 5.0 | 0.07 | 5.1 | 3.7 | 0.07 |
| Stroke | 5.4 | 3.8 | 0.08 | 5.2 | 5.6 | 0.01 | 5.7 | 3.3 | 0.12 |
| Dementia | Redacted c | Redacted | 0.05 | Redacted | Redacted | 0.05 | Redacted | Redacted | 0.06 |
| Diabetes | 35.2 | 22.8 | 0.28 | 40.3 | 29.2 | 0.23 | 27.7 | 21.1 | 0.15 |
| Chronic kidney disease | 10.7 | 7.5 | 0.11 | 13.0 | 12.3 | 0.02 | 7.0 | 6.2 | 0.03 |
| Chronic pulmonary disease | 28.9 | 27.2 | 0.04 | 26.5 | 24.9 | 0.04 | 32.6 | 27.7 | 0.10 |
| Health-care utilization | | | | | | | | | |
| PSA screening (men only) | 29.8 | 24.1 | 0.13 | 34.3 | 26.0 | 0.18 | 24.4 | 23.6 | 0.02 |
| Mammography (women only) | 58.6 | 45.6 | 0.26 | 61.3 | 46.3 | 0.30 | 53.5 | 45.3 | 0.16 |
| All-cause hospitalization | 39.6 | 36.3 | 0.07 | 39.3 | 38.9 | 0.01 | 40.2 | 35.7 | 0.09 |
| Other | | | | | | | | | |
| Receiving Medicare for reasons other than age | Redacted | Redacted | 0.03 | Redacted | Redacted | 0.04 | 100.0 | 100.0 | <0.01 |
| Receiving state subsidies | 30.2 | 40.4 | 0.22 | 36.3 | 50.3 | 0.28 | 20.9 | 37.9 | 0.38 |

Abbreviations: PSA, prostate-specific antigen; REGARDS, Reasons for Geographic and Racial Differences in Stroke; SMD, standardized mean difference.

a The data in Table 4 were derived from Medicare claims data. The results in this table are presented for participants aged 45–64 years on their index date. As described in Methods, the index date is the in-home study visit for REGARDS study participants and a randomly selected date in 2005, after 6 months of continuous fee-for-service coverage, for individuals in the 5% Medicare random sample. Fee-for-service coverage was defined by being enrolled in Medicare Parts A and B but not in Medicare Part C.

b SMD >0.1 was considered as potentially important (i.e., a large difference in the distributions between groups).

c Redacted cells with a sample size of less than 11 people are not displayed.

There were differences in medication use between REGARDS study participants and the 5% sample of Medicare beneficiaries aged 45–64 years with 6 consecutive months of FFS plus Part D coverage prior to their in-home study visit in 2006–2007 (Table  5 ). After standardization for age, sex, race, and region, differences remained present ( Supplementary Data ).

Table 5.

Drug Utilization Among Study Participants Enrolled in 2006–2007 With Data Linked to Medicare and the 2006–2007 Medicare 5% Random Sample Aged 45–64 Years With 6 or More Months of Continuous Medicare Fee-for-Service Plus Part D Coverage Prior to Their Index Date, Reasons for Geographic and Racial Differences in Stroke Study, 2003–2007 a

| | Overall: REGARDS-Medicare Linked, % ( n = 127) b | Overall: Medicare 5% Random Sample, % ( n = 136,938) | Overall: SMD c | Black: REGARDS-Medicare Linked, % ( n = 82) b | Black: Medicare 5% Random Sample, % ( n = 31,230) | Black: SMD c | White: REGARDS-Medicare Linked, % ( n = 45) b | White: Medicare 5% Random Sample, % ( n = 105,708) | White: SMD c |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Age, years | | | 0.52 | | | 0.48 | | | 0.62 |
| 45–49 | 11.0 | 22.5 | | Redacted d | 23.2 | | Redacted | 22.4 | |
| 50–54 | 22.8 | 24.9 | | Redacted | 26.2 | | Redacted | 24.5 | |
| 55–59 | 39.4 | 25.8 | | Redacted | 26.3 | | Redacted | 25.6 | |
| 60–64 | 26.8 | 26.8 | | Redacted | 24.3 | | Redacted | 27.5 | |
| Black | 64.6 | 22.8 | 0.93 | | | | | | |
| Female | 70.1 | 51.8 | 0.38 | 72.0 | 54.1 | 0.38 | 66.7 | 51.1 | 0.32 |
| Medication | | | | | | | | | |
| Antidiabetes medication | 32.3 | 22.6 | 0.22 | 39.0 | 27.5 | 0.25 | Redacted | Redacted | 0.03 |
| Antihypertensives | 52.8 | 43.0 | 0.20 | 58.5 | 53.4 | 0.10 | 42.2 | 39.9 | 0.05 |
| Statin | 40.9 | 34.4 | 0.13 | 41.5 | 31.9 | 0.20 | 40.0 | 35.2 | 0.10 |
| Other lipid-lowering medication | 10.2 | 9.0 | 0.04 | Redacted | Redacted | 0.07 | Redacted | Redacted | 0.06 |
| Narcotics | 60.6 | 53.8 | 0.14 | 57.3 | 51.6 | 0.12 | 66.7 | 54.4 | 0.25 |
| NSAIDs | 34.6 | 25.5 | 0.20 | 30.5 | 27.2 | 0.07 | 42.2 | 24.9 | 0.37 |
| Proton-pump inhibitor | 26.8 | 28.6 | 0.04 | 19.5 | 25.5 | 0.14 | 40.0 | 29.6 | 0.22 |
| Bisphosphonate | Redacted | Redacted | 0.06 | Redacted | Redacted | 0.12 | Redacted | 6.3 | 0.36 |
| Steroid | 44.1 | 38.8 | 0.11 | 40.2 | 34.8 | 0.11 | 51.1 | 39.9 | 0.23 |
| Thyroid medication | 10.2 | 12.5 | 0.07 | Redacted | Redacted | 0.06 | Redacted | Redacted | 0.03 |
| Antibiotics | 54.3 | 56.7 | 0.05 | 51.2 | 51.6 | 0.01 | 60.0 | 58.1 | 0.04 |
| Antidepressant | 40.2 | 43.6 | 0.07 | 30.5 | 29.7 | 0.02 | 57.8 | 47.7 | 0.20 |
| Anticoagulant | Redacted | Redacted | 0.18 | Redacted | Redacted | 0.17 | Redacted | Redacted | 0.19 |

Abbreviations: NSAID, nonsteroidal antiinflammatory drug; REGARDS, Reasons for Geographic and Racial Differences in Stroke; SMD, standardized mean difference.

a The data in Table 5 were derived from Medicare claims data. The results in this table are presented for participants aged 45–64 years on their index date. As described in Methods, the index date is the in-home study visit for REGARDS participants and, for individuals in the 5% Medicare random sample, a randomly selected date in 2006 or 2007 after 6 months of continuous fee-for-service plus Part D coverage. Fee-for-service coverage was defined as being enrolled in Medicare Parts A and B but not in Medicare Part C.

b Restricted to participants with their home visit date in 2006 or 2007 because Medicare Part D coverage began on January 1, 2006.

c SMD >0.1 was considered as potentially important (i.e., a large difference in the distributions between groups).

d Cells with a sample size of fewer than 11 people are redacted and not displayed.
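The SMD values reported for two-level characteristics in Table 5 can be approximately reproduced from the displayed percentages. The sketch below uses the standard standardized-difference formula for a binary variable (difference in proportions divided by the square root of the average of the two binomial variances). That this matches the exact computation used for the table is an assumption, and the function name `smd_binary` is illustrative only. It does not cover multi-category variables such as the age groups, which need a multivariate form of the statistic.

```python
from math import inf, sqrt


def smd_binary(pct_a: float, pct_b: float) -> float:
    """Absolute standardized difference between two percentages (0-100 scale)."""
    p_a, p_b = pct_a / 100.0, pct_b / 100.0
    # Average of the two binomial variances, then its square root.
    pooled_sd = sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    if pooled_sd == 0:
        # Both proportions are 0 or 1: identical groups give 0,
        # completely separated groups have no finite standardized difference.
        return 0.0 if p_a == p_b else inf
    return abs(p_a - p_b) / pooled_sd


# Percentages taken from the "Overall" columns of Table 5.
print(round(smd_binary(70.1, 51.8), 2))  # Female
print(round(smd_binary(64.6, 22.8), 2))  # Black
print(round(smd_binary(40.9, 34.4), 2))  # Statin
```

Because the inputs are percentages rounded to one decimal place, results computed this way can differ from the published values in the last digit.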

DISCUSSION

In the current study, we successfully linked data from 80% of REGARDS study participants and 92% of those who provided their SSN and were aged 65 years or older on their in-home visit date. Among participants with data linked, 80% (64% of all participants) had full FFS coverage on the date of their in-home study visit. There were no substantial differences between REGARDS study participants with data linked to Medicare claims with FFS coverage, with data linked to Medicare claims without FFS coverage, and without data linked to Medicare claims. Additionally, with the exceptions of age, sex, and race, REGARDS study participants aged 65 years or older with FFS coverage were similar to the 5% random sample of Medicare beneficiaries in this age range with similar insurance coverage with respect to comorbid conditions and medication use. We successfully linked data from 11% of REGARDS study participants younger than 65 years of age at baseline to Medicare claims. Many differences were present when comparing REGARDS study participants younger than 65 years with data linked to Medicare with FFS coverage, with data linked without this coverage, and those without data linked.

In addition to the REGARDS study, several other large population-based studies, including the Framingham Heart Study, the Jackson Heart Study, and the Multi-Ethnic Study of Atherosclerosis, have been linked to Medicare claims (8, 15, 16). Data from the Iowa Women's Health Study cohort were linked to Medicare by using the SSN, with 99.2% success among participants aged 65 years or older (8). The Women's Health Initiative was linked to Medicare by using the SSN, the date of birth and, in some cases, the date of death or residential zip code, with 90% of participants aged ≥65 years having data successfully linked (17). Data from the Atherosclerosis Risk in Communities study cohort were linked to Medicare by using the SSN, sex, and date of birth, with 92.3% success (9). We successfully linked data from 92% of REGARDS study participants who provided their SSN and were ≥65 years of age to Medicare claims. The percentage of participants with data linked to Medicare claims is consistent with our a priori expectations, as Medicare provides coverage to 93% of US residents aged 65 years or older (11). We successfully linked data for 11% of REGARDS study participants younger than 65 years to Medicare. The prevalence of Medicare coverage among REGARDS study participants younger than 65 years (11%) was higher than the 2.6% with Medicare coverage in the US population <65 years of age because our cohort included only participants 45–<65 years of age.

Medicare claims data can supplement primary data collection in several domains. They can be used to estimate health-care utilization and expenditures. Claims can also be used to identify outcomes not detected through active surveillance (17). Although Medicare claims data have many strengths, they also have limitations. Because claims data are generated for the purpose of reimbursement and not research, they include only diagnoses and procedures that are billed to Medicare. Diagnosis codes are not always valid for identifying diseases and procedures. The positive predictive value for identifying some diseases is high (e.g., stroke: 92.6%), while it is low for other diseases (e.g., dementia: 58%) (5, 18, 19). Moreover, Medicare claims data usually become available only after a 2-year lag, which delays analyses.

SSNs are unique identifiers, and using them to perform linkages should give a definitive matching status. However, linkage using SSNs may result in incomplete matches. This may occur because some participants refuse to provide their SSN or provide an incorrect one. In the REGARDS study, SSNs were not available for 4,000 (13%) of the participants. Additionally, some participants may not report their exact birth date. Therefore, our requirement that REGARDS study data and CMS match on month and day of birth probably excluded some successful matches. In addition, mismatches could occur because of recording errors in either the primary data source (e.g., REGARDS study) or the claims data (e.g., Medicare). The 80% of REGARDS study participants aged 65 years or older being linked to Medicare is similar to what has been reported previously (3). Finally, it is possible to link inpatient or outpatient cohorts or registries to Medicare data without having SSNs, although the process is more cumbersome (3, 7).
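The sources of incomplete matching described above can be illustrated with a minimal sketch of deterministic linkage. It assumes a simplified rule (exact SSN match, then agreement on month and day of birth) and is not the linkage procedure actually run for REGARDS and CMS. The `Record` fields, the status labels, and the toy records (which use deliberately invalid SSNs) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Record:
    ssn: Optional[str]  # None when the participant did not provide an SSN
    birth_month: int
    birth_day: int


def link_records(cohort: Dict[str, Record], claims: Dict[str, Record]) -> Dict[str, str]:
    """Return a linkage status for every cohort record."""
    # Index the claims side by SSN for direct lookup.
    claims_by_ssn = {rec.ssn: rec for rec in claims.values() if rec.ssn}
    status: Dict[str, str] = {}
    for cohort_id, rec in cohort.items():
        if not rec.ssn:
            status[cohort_id] = "no_ssn"  # refused or missing SSN
            continue
        match = claims_by_ssn.get(rec.ssn)
        if match is None:
            status[cohort_id] = "ssn_not_found"  # wrong SSN or not enrolled
        elif (match.birth_month, match.birth_day) != (rec.birth_month, rec.birth_day):
            status[cohort_id] = "birth_date_mismatch"  # excluded by the date check
        else:
            status[cohort_id] = "linked"
    return status


# Toy records only; none of these values come from the study.
cohort = {
    "p1": Record("000-00-0001", 3, 14),
    "p2": Record(None, 7, 2),
    "p3": Record("000-00-0003", 11, 30),
    "p4": Record("000-00-0004", 1, 9),
}
claims = {
    "c1": Record("000-00-0001", 3, 14),
    "c3": Record("000-00-0003", 11, 3),  # day of birth recorded differently
}
result = link_records(cohort, claims)
print(result)
```

In this toy run only `p1` links. `p3` shares an SSN with a claims record but fails the birth-date check, which is the kind of true match that a strict date requirement can exclude.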

Not all REGARDS study participants' data were linked to Medicare claims, raising the concern that a nonrepresentative sample of REGARDS study participants had Medicare linkage and FFS coverage. A priori, we did not know if differences would exist between REGARDS study participants whose data were versus were not linked to Medicare claims. However, with the exception of race and sex, intentionally oversampled in the REGARDS study, characteristics were similar for participants ≥65 years of age with and without data linked to Medicare and with and without FFS coverage. Additionally, few differences were present between REGARDS study participants and the Medicare 5% random sample with FFS coverage. This suggests that little selection bias is present in the REGARDS-Medicare–linked data for participants ≥65 years of age, and the results of future analyses using these data should be highly generalizable to the older US population with Medicare FFS coverage. However, for those 45–64 years of age, most characteristics were different between REGARDS participants with and without data linked to Medicare claims and between REGARDS study participants and the general Medicare population in this age range with FFS coverage. Therefore, there is probably limited generalizability of results from analyses using Medicare claims among younger REGARDS study participants.

The strengths of using the REGARDS study for the current analysis include its large sample size, inclusion of participants from across the continental United States, and extensive data collection at baseline. The current study has some limitations. The number of variables compared in our study is limited, and variables not compared in the present study (e.g., history of cancer) may be valuable to other studies. Not all the algorithms used to define comorbid conditions from claims data in our study have been validated. In addition, we used 2005 Medicare data when comparing REGARDS study participants and the 5% random sample of Medicare beneficiaries because 2005 is the midpoint of REGARDS enrollment. The addition of Part D may have resulted in differences in the population of Medicare beneficiaries with FFS coverage since 2006. The sample size for assessing medication use was small, and medication use assessed through primary data collection and through Medicare pharmacy claims may not be comparable.

In conclusion, the results presented provide information regarding the generalizability of studies utilizing the REGARDS-Medicare–linked data set. Analyses of Medicare claims for REGARDS study participants aged 65 years or older appear to be highly generalizable to US adults aged 65 years or older with Medicare FFS coverage with little threat of bias due to the inclusion of a select population.

Supplementary Material

Supplementary Data

ACKNOWLEDGMENTS

Author affiliations: Division of Clinical Immunology and Rheumatology, Department of Medicine, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama (Fenglong Xie, Jeffrey R. Curtis); Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, Alabama (Fenglong Xie, Lisandro D. Colantonio, Emily B. Levitan, Paul Muntner); Division of Preventive Medicine, Department of Medicine, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama (Monika M. Safford); and Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, Alabama (George Howard).

This research project is supported by a cooperative agreement (U01 NS041588) from the National Institute of Neurological Disorders and Stroke, National Institutes of Health, Department of Health and Human Services. Additional funding was provided by grants from the Agency for Healthcare Research and Quality (R01-HS-8517) and the National Institutes of Health (R01HL080477 and K24HL111154-K24).

We thank the other investigators for their valuable contributions. A full list of participating REGARDS investigators and institutions can be found at http://www.regardsstudy.org .

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Neurological Disorders and Stroke or the National Institutes of Health.

Conflict of interest: none declared.

REFERENCES

  • 1. Curtis LH, Greiner MA, Hammill BG, et al. Representativeness of a national heart failure quality-of-care registry: comparison of OPTIMIZE-HF and non-OPTIMIZE-HF Medicare patients. Circ Cardiovasc Qual Outcomes. 2009;2(4):377–384.
  • 2. Dokholyan RS, Muhlbaier LH, Falletta JM, et al. Regulatory and ethical considerations for linking clinical and administrative databases. Am Heart J. 2009;157(6):971–982.
  • 3. Hammill BG, Hernandez AF, Peterson ED, et al. Linking inpatient clinical registry data to Medicare claims data using indirect identifiers. Am Heart J. 2009;157(6):995–1000.
  • 4. Jacobs JP, Edwards FH, Shahian DM, et al. Successful linking of the Society of Thoracic Surgeons database to Social Security data to examine survival after cardiac operations. Ann Thorac Surg. 2011;92(1):32–37; discussion 38–39.
  • 5. Kumamaru H, Judd SE, Curtis JR, et al. Validity of claims-based stroke algorithms in contemporary Medicare data: Reasons for Geographic and Racial Differences in Stroke (REGARDS) study linked with Medicare claims. Circ Cardiovasc Qual Outcomes. 2014;7(4):611–619.
  • 6. Setoguchi S, Zhu Y, Jalbert JJ, et al. Validity of deterministic record linkage using multiple indirect personal identifiers: linking a large registry to claims data. Circ Cardiovasc Qual Outcomes. 2014;7(3):475–480.
  • 7. Curtis JR, Chen L, Bharat A, et al. Linkage of a de-identified United States rheumatoid arthritis registry with administrative data to facilitate comparative effectiveness research. Arthritis Care Res (Hoboken). 2014;66(12):1790–1798.
  • 8. Virnig B, Durham SB, Folsom AR, et al. Linking the Iowa Women's Health Study cohort to Medicare data: linkage results and application to hip fracture. Am J Epidemiol. 2010;172(3):327–333.
  • 9. Bengtson LGS, Lutsey PL, Loehr LR, et al. Impact of atrial fibrillation on healthcare utilization in the community: the Atherosclerosis Risk in Communities study. J Am Heart Assoc. 2014;3(6):e001006.
  • 10. Bengtson LG, Kucharska-Newton A, Wruck LM, et al. Comparable ascertainment of newly-diagnosed atrial fibrillation using active cohort follow-up versus surveillance of centers for Medicare and Medicaid services in the Atherosclerosis Risk in Communities study. PLoS One. 2014;9(4):e94321.
  • 11. Administration on Aging, Administration for Community Living. Profile of older Americans: 2013. Washington, DC: US Department of Health and Human Services; 2014. http://www.aoa.gov/Aging_Statistics/Profile/2013/index.aspx . Accessed March 23, 2015.
  • 12. California Health Advocates. Medicare: policy, advocacy and education. http://www.cahealthadvocates.org/disabilities/overview.html . Updated February 27, 2009. Accessed March 23, 2015.
  • 13. Howard VJ, Cushman M, Pulley L, et al. The Reasons for Geographic and Racial Differences in Stroke study: objectives and design. Neuroepidemiology. 2005;25(3):135–143.
  • 14. Yang DS, Dalton JE. A unified approach to measuring the effect size between two groups using SAS®. SAS Global Forum 2012, 2013: paper 335-2012. http://support.sas.com/resources/papers/proceedings12/335-2012.pdf . Accessed March 23, 2015.
  • 15. Benjamin I, Brown N, Burke G, et al. American Heart Association cardiovascular genome-phenome study: foundational basis and program. Circulation. 2015;131(1):100–112.
  • 16. O'Neal WT, Efird JT, Dawood FZ, et al. Coronary artery calcium and risk of atrial fibrillation (from the multi-ethnic study of atherosclerosis). Am J Cardiol. 2014;114(11):1707–1712.
  • 17. Hlatky MA, Ray RM, Burwen DR, et al. Use of Medicare data to identify coronary heart disease outcomes in the Women's Health Initiative. Circ Cardiovasc Qual Outcomes. 2014;7:157–162.
  • 18. Muntner P, Gutiérrez OM, Zhao H, et al. Validation study of Medicare claims to identify older US adults with CKD using the Reasons for Geographic and Racial Differences in Stroke (REGARDS) study. Am J Kidney Dis. 2014;65(2):249–258.
  • 19. Taylor DH Jr, Ostbye T, Langa KM, et al. The accuracy of Medicare claims as an epidemiological tool: the case of dementia revisited. J Alzheimers Dis. 2009;17(4):807–815.

Articles from American Journal of Epidemiology are provided here courtesy of Oxford University Press
