Clin Gastroenterol Hepatol. Author manuscript; available in PMC: 2021 Nov 1.
Published in final edited form as: Clin Gastroenterol Hepatol. 2019 Aug 8;18(12):2650–2666. doi: 10.1016/j.cgh.2019.07.060

Table 4.

Potential Data Sources for Liver Disease Epidemiologic Research

Data source Country/Region Strengths Weaknesses
National Health and Nutrition Examination Survey (NHANES)
Country/Region: United States
Strengths:
  • Nationally representative sample of non-institutionalized individuals
  • Cirrhosis definition based on interview, examination, and laboratory data
  • Provides estimates of undiagnosed cirrhosis
Weaknesses:
  • Cross-sectional design
  • Small sample size (~5,000)
  • Potential for selection bias
  • Potential misclassification of mild liver disease

Veterans Affairs (VA)
Country/Region: United States
Strengths:
  • Large sample size
  • Includes clinical notes, laboratory data, and imaging
  • Well-validated ICD coding strategies for liver disease and its associated complications
Weaknesses:
  • Predominantly male population
  • VA enrollees may differ from the general population in access to or delivery of care
  • Limited information on care received outside the VA system

Medicare
Country/Region: United States
Strengths:
  • Large sample size
  • Nationally representative of the population aged ≥65 years
  • Patients followed until death
Weaknesses:
  • No laboratory data available
  • Relies on diagnosis and procedure codes alone and is subject to misclassification

Medicaid
Country/Region: United States
Strengths:
  • Large sample size
  • Includes patients with low socioeconomic status
Weaknesses:
  • No laboratory data available
  • Relies on diagnosis and procedure codes alone and is subject to misclassification

Private-insurance claims data
Country/Region: United States
Strengths:
  • Large sample size
  • Nationally representative of the privately insured population
Weaknesses:
  • Unable to ascertain death
  • Enrollment depends on ongoing insurance coverage
  • No laboratory data available
  • Relies on diagnosis and procedure codes alone and is subject to misclassification

National Inpatient Sample (NIS)
Country/Region: United States
Strengths:
  • Large sample size
  • Nationally representative sample of the inpatient population
  • Includes all payers
Weaknesses:
  • No laboratory data available
  • Relies on diagnosis and procedure codes alone and is subject to misclassification
  • Inability to link hospitalizations to individual patients limits longitudinal follow-up after discharge

National Readmissions Database (NRD)
Country/Region: United States
Strengths:
  • Large sample size
  • Accurate assessment of patient readmission
Weaknesses:
  • No laboratory data available
  • Unable to account for events that may preclude readmission (e.g., death)
  • Relies on diagnosis and procedure codes alone and is subject to misclassification

Medical Expenditure Panel Survey (MEPS)
Country/Region: United States
Strengths:
  • Nationally representative of non-institutionalized individuals
  • Includes health care expenditures from all payers
Weaknesses:
  • Potential for recall bias
  • Subgroup analyses among certain groups (e.g., racial/ethnic minorities) may not be possible

Surveillance, Epidemiology, and End Results (SEER) program
Country/Region: United States
Strengths:
  • Includes clinical information, tumor stage at diagnosis, first treatment, and survival
  • Allows linkage to Medicare
Weaknesses:
  • Unable to determine etiology or severity of liver disease
  • May not be fully representative of the US, because it covers only a selected subset of the population
  • Generalizability of Medicare-linked data is limited by the age of enrollees (≥65 years)

US Cancer Statistics registry
Country/Region: United States
Strengths:
  • Nationally representative data on hepatocellular carcinoma (HCC) incidence from all 50 states (~97% of the population)
Weaknesses:
  • Unable to determine etiology of liver disease due to lack of laboratory data
  • Unable to determine HCC stage at diagnosis

Organ Procurement and Transplantation Network (OPTN)
Country/Region: United States
Strengths:
  • Granular data on waitlisted individuals, liver transplantation, and post-transplant outcomes
  • Linked by the United Network for Organ Sharing (UNOS) to the Social Security Death Index
Weaknesses:
  • Potential for selection bias by transplant centers

National patient registries
Country/Region: Denmark, Finland, Iceland, Norway, Sweden
Strengths:
  • Longitudinal, nationwide clinical data with individual-level linkage
  • Includes detailed information on clinical characteristics, laboratory data, imaging, procedures and outcomes
Weaknesses:
  • Resource-intensive

Clinical Practice Research Datalink (CPRD)
Country/Region: United Kingdom
Strengths:
  • Nationally representative data
  • Granular data on diagnoses and prescriptions for patients seen by participating providers
Weaknesses:
  • Covers only a subset of the population
  • Longitudinal follow-up depends on ongoing treatment by a participating practice
  • Limited liver-specific information

European Liver Transplant Registry (ELTR)
Country/Region: Europe
Strengths:
  • Large sample size (155 centers from 28 countries)
  • Uses a standardized questionnaire
  • Detailed information on liver transplant indications, transplant types, and complications
Weaknesses:
  • Potential for misclassification due to inaccurate completion of the questionnaire
  • No information on patient ethnicity or socioeconomic status

NORDCAN database
Country/Region: Denmark, Finland, Faroe Islands, Greenland, Iceland, Norway, Sweden
Strengths:
  • Population-based incidence, prevalence, and mortality data for HCC
  • Longitudinal data allow examination of HCC trends
Weaknesses:
  • Inclusion of only Nordic countries limits generalizability, particularly to racial/ethnic minorities

Global Burden of Disease (GBD) project
Country/Region: Worldwide
Strengths:
  • Captures data from a wide range of sources to estimate incidence, prevalence and mortality from liver disease and HCC
Weaknesses:
  • Data quality highly variable, particularly in resource-limited areas
  • In many areas, relies on verbal autopsy (post-mortem interview with relatives and/or witnesses of death)