2011 Feb 8;26(8):920–929. doi: 10.1007/s11606-010-1621-5

Table 4. Examples of High-Value Datasets

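Columns: Cost, availability, and complexity | Dataset | Description | Sample publications. Each cost, availability, and complexity entry applies to every dataset listed after it until the next such entry.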

Cost, availability, and complexity: Free. Readily available. Population-based survey with cross-sectional design. Does not require special statistical techniques to address complex sampling.

Dataset: Surveillance, Epidemiology and End Results Program (SEER), http://www.seer.cancer.gov/
Description: Population-based multi-regional cancer registry database. SEER data are updated annually. Can be linked to Medicare claims and files (see Medicare below).
Sample publications: Trends in breast-conserving surgery among Asian Americans and Pacific Islanders, 1992–2000 [12]; Treatment and outcomes of gastric cancer among US-born and foreign-born Asians and Pacific Islanders [13]

Cost, availability, and complexity: Free. Readily available. Requires statistical considerations to account for complex sampling design and the use of survey weights.

Dataset: National Ambulatory Medical Care Survey (NAMCS) and National Hospital Ambulatory Medical Care Survey (NHAMCS), http://www.cdc.gov/nchs/ahcd.htm
Description: Nationally representative serial cross-sectional surveys of outpatient and emergency department visits. Survey years can be combined to increase sample sizes (e.g., for uncommon conditions) or to evaluate temporal trends. Provides national estimates. The NAMCS and NHAMCS are conducted annually. Not linkable to other datasets.
Sample publications: Preventive health examinations and preventive gynecological examinations in the US [14]; Primary care physician office visits for depression by older Americans [15]
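
To make "use of survey weights" concrete, the sketch below computes a weighted national visit count and a weighted proportion from visit-level records. It assumes each record carries a hypothetical `weight` field (the number of population visits that the sampled visit represents) and a hypothetical `has_condition` flag; the field names and values are invented and do not follow the actual NAMCS/NHAMCS file layouts. It produces point estimates only: valid standard errors also require the design variables (strata and primary sampling units), which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    # Hypothetical fields for one sampled visit record (not real NAMCS names).
    weight: float          # number of population visits this record represents
    has_condition: bool    # e.g., a diagnosis of interest recorded at the visit


def weighted_total(visits: list[Visit]) -> float:
    """Estimated number of visits in the population: the sum of the weights."""
    return sum(v.weight for v in visits)


def weighted_proportion(visits: list[Visit]) -> float:
    """Share of population visits with the condition, weighting each record."""
    total = weighted_total(visits)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(v.weight for v in visits if v.has_condition) / total


if __name__ == "__main__":
    # Invented records: one of three sampled visits has the condition.
    sample = [
        Visit(weight=12_000.0, has_condition=True),
        Visit(weight=30_000.0, has_condition=False),
        Visit(weight=18_000.0, has_condition=False),
    ]
    print(weighted_total(sample))       # 60000.0
    print(weighted_proportion(sample))  # 0.2 (the unweighted share would be 1/3)
```

Because these surveys sample visits rather than people, weighted totals of this kind estimate numbers of visits, not numbers of patients.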

Dataset: National Health Interview Survey (NHIS), http://www.cdc.gov/nchs/nhis.htm
Description: Nationally representative serial cross-sectional survey of individuals and families, including information on health status, injuries, health insurance, access, and utilization. The NHIS is conducted annually. Survey years can be combined to study rare conditions. Can be linked to National Center for Health Statistics mortality data; Medicare enrollment and claims data; Social Security Benefit History Data; Medical Expenditure Panel Survey (MEPS) data; and National Immunization Provider Records Check Survey (NIPRCS) data from 1997–1999.
Sample publications: Psychological distress in long-term survivors of adult-onset cancer: results from a national survey [16]; Diabetes and Cardiovascular Disease among Asian Indians in the US [17]
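
Combining survey years raises a weighting question: summing the annual weights across pooled years estimates a multi-year total. A commonly recommended approach for pooled NHIS files is to divide each weight by the number of years combined, which yields average annual estimates. The sketch below applies that adjustment; the `Respondent` record, its fields, and the numbers are hypothetical, and the official documentation for the survey years in question should be checked before relying on this adjustment.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    # Hypothetical fields; actual NHIS files use their own variable names.
    year: int
    weight: float
    has_condition: bool


def pool_years(by_year: dict[int, list[Respondent]]) -> list[Respondent]:
    """Pool survey years, dividing each weight by the number of years pooled.

    The adjusted weights estimate an average annual population rather than
    a multi-year total.
    """
    n_years = len(by_year)
    return [
        Respondent(r.year, r.weight / n_years, r.has_condition)
        for records in by_year.values()
        for r in records
    ]


def weighted_count(records: list[Respondent]) -> float:
    """Estimated number of people with the condition (sum of their weights)."""
    return sum(r.weight for r in records if r.has_condition)


if __name__ == "__main__":
    # Invented data for two survey years.
    data = {
        2005: [Respondent(2005, 1000.0, True), Respondent(2005, 3000.0, False)],
        2006: [Respondent(2006, 1400.0, True), Respondent(2006, 2600.0, False)],
    }
    print(weighted_count(pool_years(data)))  # 1200.0, the average of 1000 and 1400
```

Dividing every weight by the same constant leaves weighted proportions unchanged and only rescales totals; variance estimation for pooled years still typically depends on each year's design variables.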

Dataset: Behavioral Risk Factor Surveillance System (BRFSS), http://www.cdc.gov/brfss/
Description: Serial cross-sectional nationally representative survey of health risk behaviors, preventive health practices, and health care access. Provides national and state estimates. Since 2002, the Selected Metropolitan/Micropolitan Area Risk Trends (SMART) project has also used BRFSS data to identify trends in selected metropolitan and micropolitan statistical areas (MMSAs) with 500 or more respondents. BRFSS data are collected monthly. Not linkable to other datasets.
Sample publications: Perceived discrimination in health care and use of preventive health services [18]; Use of recommended ambulatory care services: is the Veterans Affairs quality gap narrowing? [19]


Cost, availability, and complexity: Free or minimal cost. Readily available. More complex studies are possible by combining data from multiple waves and/or records. Accounting for complex sampling design and survey weights can be more complex when using multiple waves, so seek support from a statistician; alternatively, restrict the sample to a single wave for ease of use.

Dataset: Nationwide Inpatient Sample (NIS), http://www.hcup-us.ahrq.gov/databases.jsp
Description: The largest all-payer database of inpatient hospital stays in the US, containing data from approximately 20% of US community hospitals. The sampling frame includes approximately 90% of discharges from US hospitals. NIS data are collected annually. For most states, the NIS includes hospital identifiers that permit linkage to the American Hospital Association (AHA) Annual Survey Database and county identifiers that permit linkage to the Area Resource File (ARF).
Sample publications: Factors associated with patients who leave acute-care hospitals against medical advice [20]; Impact of hospital volume on racial disparities in cardiovascular procedure mortality [21]
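
The hospital-level linkage described for the NIS is, in essence, a join on a shared hospital identifier. The sketch below attaches a hospital attribute to discharge records and counts the discharges that cannot be linked (for example, in states where hospital identifiers are not included). The record types, the identifiers, and the `bed_count` attribute are hypothetical and do not reflect the actual NIS or AHA file layouts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Discharge:
    # Hypothetical NIS-style discharge record.
    discharge_id: int
    hospital_id: Optional[str]   # None where no hospital identifier is released


@dataclass
class LinkedDischarge:
    discharge_id: int
    hospital_id: str
    bed_count: int               # hypothetical hospital-survey attribute


def link_to_hospital_survey(
    discharges: list[Discharge], hospital_beds: dict[str, int]
) -> tuple[list[LinkedDischarge], int]:
    """Inner-join discharges to hospital attributes; also count unlinked records."""
    linked: list[LinkedDischarge] = []
    unlinked = 0
    for d in discharges:
        if d.hospital_id is None or d.hospital_id not in hospital_beds:
            unlinked += 1
            continue
        linked.append(
            LinkedDischarge(d.discharge_id, d.hospital_id, hospital_beds[d.hospital_id])
        )
    return linked, unlinked


if __name__ == "__main__":
    beds = {"H001": 250, "H002": 80}  # invented hospital attributes
    records = [
        Discharge(1, "H001"),
        Discharge(2, "H002"),
        Discharge(3, None),      # identifier withheld
        Discharge(4, "H999"),    # no matching hospital record
    ]
    linked, unlinked = link_to_hospital_survey(records, beds)
    print(len(linked), unlinked)  # 2 2
```

Reporting the unlinked count matters because records that fail to link are dropped from any analysis that uses the hospital attributes.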

Dataset: National Health and Nutrition Examination Survey (NHANES), http://www.cdc.gov/nchs/nhanes.htm
Description: Nationally representative series of studies combining data from interviews, physical examinations, and laboratory tests. NHANES data are collected annually. Can be linked to National Death Index (NDI) mortality data; Medicare enrollment and claims data; Social Security Benefit History Data; Medical Expenditure Panel Survey (MEPS) data; and Dual Energy X-Ray Absorptiometry (DXA) Multiple Imputation Data Files from 1999–2004.
Sample publications: Demographic differences and trends of vitamin D insufficiency in the US population, 1988–2004 [22]; Association of hypertension, diabetes, dyslipidemia, and metabolic syndrome with obesity: findings from the National Health and Nutrition Examination Survey, 1999 to 2004 [23]
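
The DXA files mentioned above are distributed as multiple imputations, and results from multiply imputed data are commonly combined with Rubin's rules: average the per-imputation estimates, then add the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m), where m is the number of imputations. The sketch below applies these rules to a single scalar estimate. It assumes the estimate and its (design-based) variance have already been computed separately within each imputed dataset; the function name and numbers are invented, and whether this is the recommended combination method for a given NHANES release should be confirmed in its documentation.

```python
from statistics import mean, variance


def rubin_combine(estimates: list[float], variances: list[float]) -> tuple[float, float]:
    """Combine per-imputation results with Rubin's rules.

    estimates: point estimate from each of the m imputed datasets
    variances: squared standard error of each estimate
    Returns (combined estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)          # combined point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (denominator m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


if __name__ == "__main__":
    # Invented results from three imputed datasets.
    est, var = rubin_combine([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
    print(est, round(var, 4))  # 11.0 2.3333
```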

Dataset: The Health and Retirement Study (HRS), http://hrsonline.isr.umich.edu/index.php
Description: A nationally representative longitudinal survey of adults older than 50, designed to assess health status, employment decisions, and economic security during retirement. HRS data are collected every 2 years. Can be linked to Social Security Administration data; Internal Revenue Service data; Medicare claims data (see Medicare below); and Minimum Data Set (MDS) data.
Sample publications: Chronic conditions and mortality among the oldest old [24]; Advance directives and surrogate decision making before death [25]

Dataset: Medical Expenditure Panel Survey (MEPS), http://www.meps.ahrq.gov/mepsweb/
Description: Serial nationally representative panel survey of individuals, families, health care providers, and employers covering a variety of topics. MEPS data are collected annually. On request to the Agency for Healthcare Research and Quality, can be linked to numerous datasets, including the NHIS, Medicare data, and Social Security data.
Sample publications: Loss of health insurance among non-elderly adults in Medicaid [26]; Influence of patient-provider communication on colorectal cancer screening [27]

Cost, availability, and complexity: Data costs range from thousands to tens of thousands of dollars. Requires an extensive application, and acquiring the data takes months at a minimum. Databases frequently contain 100,000 to more than 1,000,000 observations. Requires additional statistical considerations to account for complex sampling design, survey weights, or longitudinal analysis. Multiple records per individual. The complex database structure requires a higher degree of analytic and programming skill to create a study dataset efficiently.

Dataset: Medicare claims data (alone), SEER-Medicare, and HRS-Medicare, http://www.resdac.org/Medicare/data_available.asp
Description: Claims data on Medicare beneficiaries, including demographics and resource utilization in a wide variety of inpatient and outpatient settings. Medicare claims data are collected continually and made available annually. Can be linked to other Medicare datasets that use the same unique identifier numbers for patients, providers, and institutions, for example, the Medicare Current Beneficiary Survey, the Long-Term Care Minimum Data Set, the American Hospital Association Annual Survey, and others. SEER and the HRS also offer linkages to Medicare data (as described above).
Sample publications: Long-term outcomes and costs of ventricular assist devices among Medicare beneficiaries [28]; Association between the Medicare Modernization Act of 2003 and patient wait times and travel distance for chemotherapy [29]
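
Because claims files contain multiple records per individual, a common first step in building a study dataset is to collapse claims to one row per beneficiary. The sketch below counts claims and sums payments per beneficiary identifier. The `Claim` record and its `bene_id`, `claim_type`, and `payment` fields are hypothetical placeholders, not actual Medicare variable names.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    # Hypothetical claim record; real Medicare files use different variables.
    bene_id: str
    claim_type: str   # e.g., "inpatient" or "outpatient"
    payment: float


@dataclass
class BeneficiarySummary:
    bene_id: str
    n_claims: int = 0
    n_inpatient: int = 0
    total_payment: float = 0.0


def summarize_by_beneficiary(claims: list[Claim]) -> dict[str, BeneficiarySummary]:
    """Collapse many claims per person into one summary row per beneficiary."""
    summaries: dict[str, BeneficiarySummary] = {}
    for c in claims:
        if c.bene_id not in summaries:
            summaries[c.bene_id] = BeneficiarySummary(c.bene_id)
        s = summaries[c.bene_id]
        s.n_claims += 1
        if c.claim_type == "inpatient":
            s.n_inpatient += 1
        s.total_payment += c.payment
    return summaries


if __name__ == "__main__":
    # Invented claims for two beneficiaries.
    claims = [
        Claim("A", "inpatient", 5000.0),
        Claim("A", "outpatient", 250.0),
        Claim("B", "outpatient", 120.0),
    ]
    a = summarize_by_beneficiary(claims)["A"]
    print(a.n_claims, a.n_inpatient, a.total_payment)  # 2 1 5250.0
```

The resulting one-row-per-person table is the natural unit for joining to other files that share the same beneficiary identifier.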

Dataset: Medicare Current Beneficiary Survey (MCBS), http://www.cms.gov/MCBS/
Description: Panel survey of a nationally representative sample of Medicare beneficiaries, covering health status, health care use, health insurance, socioeconomic and demographic characteristics, and health expenditures. MCBS data are collected annually. Can be linked to other Medicare data.
Sample publications: Cost-related medication nonadherence and spending on basic needs following implementation of Medicare Part D [30]; Medicare beneficiaries and free prescription drug samples: a national survey [31]