Author manuscript; available in PMC: 2019 Apr 1.
Published in final edited form as: Annu Rev Public Health. 2017 Dec 22;39:437–452. doi: 10.1146/annurev-publhealth-040617-013544

Table 3.

Health Care Claims Databases

Fair Health | MarketScan | Health Care Cost Institute Database | Optum Labs Data Warehouse
Organization that owns the data: Fair Health Inc. (private non-profit) | Truven Health Analytics (private for-profit) | Health Care Cost Institute (private non-profit) | Optum (subsidiary of UnitedHealth Group – public, for-profit)
Data contributors: 60–70 private insurance carriers participating in the program (32, 80) | 150 employers (1), 21 commercial health plans (1), Medicare and Medicaid (24) | Aetna, Humana, Kaiser Permanente and UnitedHealthcare (47) | Affiliated and non-affiliated commercial health plans, provider EMR/EHR systems (82)
Sample size: 150 million covered lives, data gathered from about 15 billion claims. Represents an estimated 23.4% of national payments by privately insured patients (32). Represents 75% of privately insured population (32). | 230 million unique patients (since 1995), most recent year includes 50 million covered lives (1). | Claims related to 50 million unique people including individual, group, and Medicare Advantage members (44) (25.3% of nonelderly population with ESI) | 150 million unique lives: 19% of US population in commercial health plans, 19% of those in Medicare Advantage plans, 24% of those in Medicare PDP plans, and 7% of U.S. population with any health care utilization (82)
Geographic coverage: Covers every locality in U.S. (32) | Wide geographic range, but disproportionately covers South (7). 10–12 unidentified states for Medicaid sample (1). | Unknown | Relatively geographically representative, concentrated in the South and Midwest (82).
Variables of interest
  Geographic level: Geozip (first three digits of ZIP code), can be aggregated to State/MSA level | Geozip (first three digits of ZIP code), can be aggregated to State/MSA level (7) | ZIP Codes with populations greater than 1350 (48). Core Based Statistical Area (only Metro areas with 50,000+ populations included) | Not clear; at least available by Census Region
  Race/ethnicity: Unknown | Only for Medicaid (61) | No | Yes
  Age: Yes (DOB) | Yes (DOB) (1) | Yes (DOB) (48) | Yes
  Other demographics: Some patient and provider information is optional. | Gender, aid category for Medicaid populations (blind/disabled, Medicare eligible), employment status, relationship of patient to beneficiary, urban/rural status (64) | Gender, relationship to policyholder | Gender, sociodemographic characteristics (111). Race, income, education, assets, health risk assessment, mortality available via linked secondary data sources (48).
  Inpatient
  Outpatient
  Pharmacy
  Lab ✓ (41)
  Behavioral ✓ (26) ✓ (49)
  Dental
Type of claim (fee charged vs. paid claim): All claims contain the fee billed by provider. About 50% of claims report “allowed charge” (80) | Claims represent the allowed amount paid by the plan (7). | Claims represent the allowed amount paid by the plan (48). | Claims represent actual paid amounts (12)
Run-out period: Database is updated twice yearly. Claims have a 3-month run-out (31). | Analysts can choose between “Early View data” with no minimum run-out, “Standard Updates” with 3-month minimum run-out, and “Annual File” with at least 6-month run-out (1). | Annual claims submitted at end of CY (44). Claims have a 5–6 month run-out period depending on payer (43). | Unknown
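
Two definitions in the table lend themselves to a short illustration: the Geozip (the first three digits of a ZIP code) and the run-out period (the time allowed for claims to be submitted and processed after the service date). The Python sketch below is only an assumption about how an analyst might apply them; the function names `geozip`, `months_between`, and `has_run_out` are hypothetical and do not come from any of the databases, and counting run-out in whole calendar months is one possible convention rather than a documented rule.

```python
from datetime import date


def geozip(zip_code: str) -> str:
    """Return the Geozip: the first three digits of a 5-digit ZIP code.

    Hypothetical helper; ZIP+4 strings such as "02139-4307" are accepted
    by looking only at the first five characters.
    """
    digits = zip_code.strip()[:5]
    if len(digits) < 5 or not digits.isdigit():
        raise ValueError(f"not a 5-digit ZIP code: {zip_code!r}")
    return digits[:3]


def months_between(start: date, end: date) -> int:
    """Count whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        # The last month is not yet complete.
        months -= 1
    return months


def has_run_out(service_date: date, extract_date: date, min_months: int = 3) -> bool:
    """Check whether a claim has had at least min_months of run-out.

    The 3-month default mirrors the minimum run-out quoted in the table
    for some files; it is an assumption, not a fixed rule.
    """
    return months_between(service_date, extract_date) >= min_months


if __name__ == "__main__":
    print(geozip("02139"))
    print(has_run_out(date(2017, 1, 15), date(2017, 4, 15)))
```

Keeping ZIP codes as strings preserves leading zeros, which would be lost if they were stored as integers.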