Table 4.
Examples of High Value Datasets
Cost, availability, and complexity | Dataset | Description | Sample publications |
---|---|---|---|
Free. Readily available. Population-based survey with cross-sectional design. Does not require special statistical techniques to address complex sampling | Surveillance, Epidemiology and End Results Program (SEER) http://www.seer.cancer.gov/ | Population-based multi-regional cancer registry database. SEER data are updated annually. Can be linked to Medicare claims and files (see Medicare below) | Trends in breast-conserving surgery among Asian Americans and Pacific Islanders, 1992–2000 [12]; Treatment and outcomes of gastric cancer among US-born and foreign-born Asians and Pacific Islanders [13] |
Free. Readily available. Requires statistical considerations to account for complex sampling design and use of survey weights | National Ambulatory Medical Care Survey (NAMCS) & National Hospital Ambulatory Medical Care Survey (NHAMCS) http://www.cdc.gov/nchs/ahcd.htm | Nationally-representative serial cross-sectional surveys of outpatient and emergency department visits. Can combine survey years to increase sample sizes (e.g., for uncommon conditions) or evaluate temporal trends. Provides national estimates. The NAMCS and NHAMCS are conducted annually. Do not link to other datasets | Preventive health examinations and preventive gynecological examinations in the US [14]; Primary care physician office visits for depression by older Americans [15] |
| | National Health Interview Survey (NHIS) http://www.cdc.gov/nchs/nhis.htm | Nationally-representative serial cross-sectional survey of individuals and families including information on health status, injuries, health insurance, access and utilization information. The NHIS is conducted annually. Can combine survey years to look at rare conditions. Can be linked to National Center for Health Statistics Mortality Data; Medicare enrollment and claims data; Social Security Benefit History Data; Medical Expenditure Panel Survey (MEPS) data; and National Immunization Provider Records Check Survey (NIPRCS) data from 1997–1999 | Psychological distress in long-term survivors of adult-onset cancer: results from a national survey [16]; Diabetes and Cardiovascular Disease among Asian Indians in the US [17] |
| | Behavioral Risk Factor Surveillance System (BRFSS) http://www.cdc.gov/brfss/ | Serial cross-sectional nationally-representative survey of health risk behaviors, preventative health practices, and health care access. Provides national and state estimates. Since 2002, the Selected Metropolitan/Micropolitan Area Risk Trends (SMART) project has also used BRFSS data to identify trends in selected metropolitan and micropolitan statistical areas (MMSAs) with 500 or more respondents. BRFSS data are collected monthly. Does not link to other datasets | Perceived discrimination in health care and use of preventive health services [18]; Use of recommended ambulatory care services: is the Veterans Affairs quality gap narrowing? [19] |
Free or minimal cost. Readily available. Can do more complex studies by combining data from multiple waves and/or records. Accounting for complex sampling design and use of survey weights can be more complex when using multiple waves; seek support from a statistician. Alternatively, restrict the sample to a single wave for ease of use | Nationwide Inpatient Sample (NIS) http://www.hcup-us.ahrq.gov/databases.jsp | The largest US database of inpatient hospital stays that incorporates data from all payers, containing data from approximately 20% of US community hospitals. Sampling frame includes approximately 90% of discharges from US hospitals. NIS data are collected annually. For most states, the NIS includes hospital identifiers that permit linkages to the American Hospital Association (AHA) Annual Survey Database and county identifiers that permit linkages to the Area Resource File (ARF) | Factors associated with patients who leave acute-care hospitals against medical advice [20]; Impact of hospital volume on racial disparities in cardiovascular procedure mortality [21] |
| | National Health and Nutrition Examination Survey (NHANES) http://www.cdc.gov/nchs/nhanes.htm | Nationally-representative series of studies combining data from interviews, physical examination, and laboratory tests. NHANES data are collected annually. Can be linked to National Death Index (NDI) mortality data; Medicare enrollment and claims data; Social Security Benefit History Data; Medical Expenditure Panel Survey (MEPS) data; and Dual Energy X-Ray Absorptiometry (DXA) Multiple Imputation Data Files from 1999–2004 | Demographic differences and trends of vitamin D insufficiency in the US population, 1988–2004 [22]; Association of hypertension, diabetes, dyslipidemia, and metabolic syndrome with obesity: findings from the National Health and Nutrition Examination Survey, 1999 to 2004 [23] |
| | The Health and Retirement Study (HRS) http://hrsonline.isr.umich.edu/index.php | A nationally-representative longitudinal survey of adults older than 50 designed to assess health status, employment decisions, and economic security during retirement. HRS data are collected every 2 years. Can be linked to Social Security Administration data; Internal Revenue Service data; Medicare claims data (see Medicare below); and Minimum Data Set (MDS) data | Chronic conditions and mortality among the oldest old [24]; Advance directives and surrogate decision making before death [25] |
| | Medical Expenditure Panel Survey (MEPS) http://www.meps.ahrq.gov/mepsweb/ | Serial nationally-representative panel survey of individuals, families, health care providers, and employers covering a variety of topics. MEPS data are collected annually. Can be linked, by request to the Agency for Healthcare Research and Quality, to numerous datasets including the NHIS, Medicare data, and Social Security data | Loss of health insurance among non-elderly adults in Medicaid [26]; Influence of patient-provider communication on colorectal cancer screening [27] |
Data costs are in the thousands to tens of thousands of dollars. Requires an extensive application; acquiring the data takes months at a minimum. Databases frequently have observations on the order of 100,000 to >1,000,000. Requires additional statistical considerations to account for complex sampling design, use of survey weights, or longitudinal analysis. Multiple records per individual. Complex database structure requires a higher degree of analytic and programming skill to create a study dataset efficiently | Medicare claims data (alone), SEER-Medicare, and HRS-Medicare http://www.resdac.org/Medicare/data_available.asp | Claims data on Medicare beneficiaries including demographics and resource utilization in a wide variety of inpatient and outpatient settings. Medicare claims data are collected continually and made available annually. Can be linked to other Medicare datasets that use the same unique identifier numbers for patients, providers, and institutions, for example, the Medicare Current Beneficiary Survey, the Long-Term Care Minimum Data Set, the American Hospital Association Annual Survey, and others. SEER and the HRS offer linkages to Medicare data as well (as described above) | Long-term outcomes and costs of ventricular assist devices among Medicare beneficiaries [28]; Association between the Medicare Modernization Act of 2003 and patient wait times and travel distance for chemotherapy [29] |
| | Medicare Current Beneficiary Survey (MCBS) http://www.cms.gov/MCBS/ | Panel survey of a nationally-representative sample of Medicare beneficiaries including health status, health care use, health insurance, socioeconomic and demographic characteristics, and health expenditures. MCBS data are collected annually. Can be linked to other Medicare data | Cost-related medication nonadherence and spending on basic needs following implementation of Medicare Part D [30]; Medicare beneficiaries and free prescription drug samples: a national survey [31] |
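
Several rows of Table 4 note that survey weights must be used and that survey years can be combined. The sketch below illustrates only the point-estimate side of that idea, in Python, with entirely hypothetical respondents and weights (they are not drawn from any dataset in the table). The function names `weighted_prevalence` and `pool_years` are illustrative, and dividing each weight by the number of pooled years is an assumed convention; the documentation for each survey should be checked for its own pooling rules. Standard errors are not handled here, since they depend on strata and sampling-unit variables and usually call for dedicated survey software or a statistician.

```python
from typing import Iterable, List, Sequence, Tuple

Record = Tuple[bool, float]  # (has_condition, survey_weight); hypothetical layout


def weighted_prevalence(records: Iterable[Record]) -> float:
    """Return the weighted share of respondents who have the condition."""
    total = 0.0
    positive = 0.0
    for has_condition, weight in records:
        total += weight
        if has_condition:
            positive += weight
    if total == 0:
        raise ValueError("sum of weights is zero")
    return positive / total


def pool_years(years: Sequence[Sequence[Record]]) -> List[Record]:
    """Stack several survey years, dividing each weight by the number of years.

    Assumed convention: rescaling keeps the pooled weights summing to roughly
    one year's population rather than several.
    """
    n_years = len(years)
    return [(cond, w / n_years) for year in years for cond, w in year]


# Hypothetical respondents for two survey years.
year_1 = [(True, 1000.0), (False, 3000.0), (False, 1000.0)]
year_2 = [(True, 1500.0), (True, 500.0), (False, 3000.0)]

# Ignoring weights treats every respondent as representing the same number of people.
unweighted = sum(1 for cond, _ in year_1 if cond) / len(year_1)
weighted = weighted_prevalence(year_1)
pooled = weighted_prevalence(pool_years([year_1, year_2]))

print(f"unweighted year 1: {unweighted:.3f}")
print(f"weighted year 1:   {weighted:.3f}")
print(f"pooled, 2 years:   {pooled:.3f}")
```

In this toy data the unweighted and weighted estimates for the first year differ because respondents carry unequal weights, which is the reason the table flags weights as a requirement for the national surveys.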