Canadian Liver Journal. 2024 Feb 26;7(1):16–27. doi: 10.3138/canlivj-2023-0013

Identifying patients with diagnosed cirrhosis in administrative health databases: a validation study

Nabiha Faisal 1,2,3, Lisa M Lix 2,3, Randy Walld 3, Alexander Singer 4, Eberhard Renner 1, Harminder Singh 1,2, Leanne Kosowan 4, Alyson Mahar 2,3,5
PMCID: PMC10946181  PMID: 38505786

Abstract

Objectives:

Case ascertainment algorithms were developed and validated to identify people living with cirrhosis in administrative health data in Manitoba, Canada, using primary care electronic medical records (EMR) to define the reference standard.

Methods:

We linked provincial administrative health data to primary care EMR data. The validation cohort included 116,675 Manitobans aged ≥18 years with at least one primary care visit between April 1998 and March 2015. Hospital records, physician billing claims, vital statistics, and prescription drug data were used to develop and test 93 case-finding algorithms. A validated case definition for primary care EMR data was the reference standard. We estimated sensitivity, specificity, positive and negative predictive values (PPV, NPV), Youden's index, area under the receiver operating characteristic curve, and their 95% confidence intervals (CIs).

Results:

A total of 116,675 people were in the validation cohort. The prevalence of cirrhosis was 1.4% (n = 1,593). Algorithm sensitivity estimates ranged from 32.5% (95% CI 32.2–32.8) to 68.3% (95% CI 68.0–68.9) and PPV from 17.4% (95% CI 17.1–17.6) to 23.4% (95% CI 23.1–23.6). Specificity (95.5%–98.2%) and NPV (approximately 99%) were high for all algorithms. The algorithms had slightly higher sensitivity estimates among men compared with women, and among individuals aged ≥45 years compared with those aged 18–44 years.

Conclusion:

Cirrhosis algorithms applied to administrative health data had moderate validity when a validated case definition for primary care EMRs was the reference standard. This study provides algorithms for identifying diagnosed cirrhosis cases for population-based research and surveillance studies.

Keywords: algorithm, case definitions, electronic medical records, primary care

Introduction

Cirrhosis is a late stage of chronic liver disease, irrespective of etiology, that results from ongoing damage triggering wound healing processes and leading to scarring. Cirrhosis is a major public health concern affecting 1.5 billion individuals and is responsible for 1 million deaths per year worldwide (1). According to data from the Global Burden of Disease study, cirrhosis is among the top 20 causes of disability-adjusted life years and years of life lost (2). Cirrhosis is also an expensive medical condition; hospitalization alone costs more than $18 billion annually in the United States (3). The burden of cirrhosis continues to increase despite advances in therapeutics, such as wide-scale hepatitis B vaccination and improved access to effective hepatitis C treatment. Increasing high-risk alcohol consumption and obesity, key risk factors for liver disease in most parts of the world, are expected to further increase the global cirrhosis incidence (1). In Canada, reducing the impact of cirrhosis in the population will require strategies at the national, provincial, and territorial levels to scale up interventions tailored to local contexts and epidemiology.

The Public Health Agency of Canada and other public health organizations globally use a variety of sources to model the epidemiology of cirrhosis (4). While informative for healthcare planning and delivery, there are considerable challenges with such estimates including the reliability of underlying model assumptions, data integration, model-based inference, and communication between modelers and policy makers, which could yield flawed projections (5). Administrative health data, generated during the delivery of health services to residents covered by provincial health insurance, provide an opportunity to describe local cirrhosis morbidity and mortality more accurately. Administrative health data have been used successfully for surveillance of several other chronic diseases including diabetes, hypertension, arthritis, and multiple sclerosis (6,7). At present, provincial cirrhosis incidence and prevalence estimates are only available from one population-based study that used linked administrative health data in Ontario (8). This study demonstrated that age-specific cirrhosis incidence increased by 22% from 1997 to 2016; the authors suggested that this increase was likely related to rising incidence of metabolic syndrome and heavy alcohol use (8).

Accurate assessments of the population burden of cirrhosis depend on the performance of coding algorithms used to identify cases. Previous validation studies of administrative health data for cirrhosis case ascertainment have primarily sampled patients from specialist liver clinics or referral centres (9–13). This may limit the utility of study findings in the general population due to the high prevalence of cirrhosis in liver clinic patients, the lack of representativeness of these patients to the general population, and the high likelihood that these patients have more advanced stages of cirrhosis. Therefore, an ideal approach to identifying the full population of patients living with cirrhosis using administrative health data requires the prevalence in the validation cohort to closely approximate the prevalence of cirrhosis in the general population (14).

Our objective was to leverage multiple administrative data holdings available in Manitoba to develop and validate administrative data algorithms that identify individuals with cirrhosis, using a validated case definition for cirrhosis from population-based primary care electronic medical records (EMR) as the reference standard.

Methods

Study design and data sources

A retrospective population-based cohort study was performed in Manitoba, Canada, using primary care EMR data from the Manitoba Primary Care Research Network (MaPCReN) linked to administrative health data. EMR and administrative health data are housed in the Manitoba Population Research Data Repository at the Manitoba Centre for Health Policy, a research unit at the University of Manitoba. They were linked using an encrypted unique Personal Health Identification Number (PHIN) available in the Repository, allowing for an individual's interactions with the health system to be tracked over time. The Repository is a comprehensive collection of routinely collected administrative, registry, survey, and clinical data for nearly the entire population of 1.3 million Manitobans (15). Data from the Repository have been successfully used for chronic disease surveillance, including the development and validation of case ascertainment algorithms (6,7).

The MaPCReN is Manitoba's provincial practice-based research network within the Canadian Primary Care Sentinel Surveillance Network (CPCSSN). The CPCSSN and MaPCReN databases provide a rich data source for disease prevalence and disease case validation studies (7). As of June 2021, MaPCReN included de-identified EMRs from 265 consenting primary care providers (mainly family physicians) representing 21% of the Manitoba population (16).

The administrative health data sources used in this study included hospitalization records, physician billing claims, vital statistics records, Manitoba Health Insurance Registry (MHIR) files, and prescription drug dispensation records. We accessed hospitalization records containing diagnosis and procedure codes for patients discharged from Manitoba acute care facilities since 1970 (17). Diagnoses were recorded using International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) codes prior to 2004 and, after this date, using ICD, 10th Revision, Canadian version (ICD-10-CA) codes (17). Physician billing claims capture physician services and procedures performed in offices or hospitals, including emergency rooms and outpatient departments, since 1970, along with the date of service and a single ICD-9-CM coded diagnosis best indicating the reason for the visit (17). For billing data prior to fiscal year 2015/2016, the diagnosis is recorded using three-digit codes, while for billing data from fiscal year 2015/2016 onward the diagnosis is recorded using five-digit codes (17). Vital statistics registrations capture records of births and deaths for all Manitobans, along with the underlying cause of death using ICD-10-CA codes since 2000. Prescription drug records capture all community-dispensed prescriptions, including a Health Canada drug identification number (DIN) and dispensation date, since 1995; in-hospital medication dispensations are not included. DINs are linked to the World Health Organization's Anatomical Therapeutic Chemical (ATC) codes in the Drug Product Directory maintained by Health Canada. The MHIR contains the start and end dates of public health insurance coverage, including the reason for loss of coverage (ie, due to death or migration), and individual-level demographics since 1970 (18).

In addition, the Repository contains area-level Statistics Canada Census data collected every 5 years from 1971 to 2021. The Census data can be linked to individual-level data using postal code information in the MHIR.

This study was approved by the Research Ethics Board of the University of Manitoba; HREB: HS25263 (H2021:406). Approval for data access was granted by the Health Information Privacy Committee of Manitoba Health/ Provincial Health Research Privacy Committee (HIPC 2021/2022-2).

Validation cohort

The validation cohort was identified from EMRs within the MaPCReN database. The validation cohort included all Manitoba residents aged 18 years and older with at least one record in the MaPCReN data between April 1, 1998 and March 31, 2015, and who could be linked to administrative health data using the PHIN. Individuals who did not have at least 2 years of health insurance coverage were excluded. Patient characteristics were determined at the study index date, which was defined as April 1, 1998 or the start of insurance coverage (if coverage started after April 1, 1998).
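The eligibility rules above can be sketched in a few lines of Python. This is an illustration only, not the study's code (the analyses were run in SAS): the `Person` fields are hypothetical names, and two details are assumptions because the text does not specify them, namely that age is evaluated at the index date and that the 2-year coverage requirement is measured as total days between the start and end of coverage.

```python
from dataclasses import dataclass
from datetime import date

STUDY_START = date(1998, 4, 1)
STUDY_END = date(2015, 3, 31)

@dataclass
class Person:
    # Hypothetical field names, chosen for illustration.
    birth_date: date
    coverage_start: date
    coverage_end: date
    emr_record_dates: list

def index_date(p):
    # April 1, 1998, or the start of insurance coverage if that is later.
    return max(STUDY_START, p.coverage_start)

def age_on(birth, on):
    # Completed years of age on a given date.
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))

def is_eligible(p):
    has_record = any(STUDY_START <= d <= STUDY_END for d in p.emr_record_dates)
    adult = age_on(p.birth_date, index_date(p)) >= 18        # assumption: age at index date
    covered = (p.coverage_end - p.coverage_start).days >= 2 * 365  # assumption: total coverage
    return has_record and adult and covered
```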

Reference standard

The reference standard was a previously validated case definition for cirrhosis in the MaPCReN EMR data. The case definition requires one billing record for cirrhosis using ICD-9-CM codes (Table 1), or cirrhosis listed as a health condition, or ≥2 billing records or health conditions for cirrhosis-related complications. This case definition has a sensitivity of 85.1%, specificity of 99.3%, PPV of 86.3%, and NPV of 99.2% (19).
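Read as a decision rule, the case definition is a three-way OR. The sketch below restates it in Python under the assumption that the three inputs have already been counted from a patient's EMR; the function and argument names are hypothetical.

```python
def meets_emr_case_definition(cirrhosis_billings, cirrhosis_on_problem_list,
                              complication_records):
    """Return True if the EMR case definition described in the text is met.

    cirrhosis_billings: number of EMR billing records with a cirrhosis code
    cirrhosis_on_problem_list: True if cirrhosis is listed as a health condition
    complication_records: number of billing records or health conditions
        for cirrhosis-related complications
    """
    return (
        cirrhosis_billings >= 1
        or cirrhosis_on_problem_list
        or complication_records >= 2
    )
```

Under this rule, a single complication record without a cirrhosis billing or a problem-list entry would not qualify, whereas two would.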

Table 1:

International Classification of Diseases and Canadian Classification of Health Interventions codes for cirrhosis

Description | ICD-9-CM | ICD-10-CA | CCI
Physician visit code | 571 | - | -
Hospitalization codes
 Toxic liver disease with fibrosis and cirrhosis of liver | 573.3 | K71.7 | -
 Alcoholic cirrhosis of liver | 571.2* | K70.3 | -
 Alcoholic hepatic failure | - | K70.4 | -
 Alcoholic hepatitis | 456.1 | K70.1 | -
 Ascites | 789.5* | R18 | -
 Cirrhosis of liver without alcohol | 571.5* | K74.6 | -
 Portal hypertension | 572.3 | K76.6 | -
 Hepatic failure, unspecified | 572.8 | K72.9 | -
 Hepatic fibrosis | - | K74.0 | -
 Alcoholic fibrosis and sclerosis of liver | - | K70.2 | -
 Hepatorenal syndrome | 572.4 | K76.7 | -
 Hepatic coma | 572.2 | K72.91 | -
 Chronic hepatic failure | - | K72.1, K72.9 | -
 Esophageal varices with bleeding | 456.0*, 456.2* | I85.0, I98.20, I98.3 | -
 Gastric varices | - | I86.4 | -
 Hepatocellular carcinoma | 155* | C22.0, C22.9 | 81703
 Insertion of Sengstaken tube | - | - | 1.NA.13.BA-BD
 Transjugular intrahepatic portosystemic shunt | - | - | 1.KQ.76GP-NR
 Endoscopy (banding/sclerotherapy) | - | - | 1.NA.13.BA-FA, 1.NA.13.BA-X7
 Paracentesis | - | - | 1.OT.52.HA

* Codes used to identify cases in electronic medical record data

CCI = Canadian Classification of Health Interventions; ICD-9-CM = International Classification of Diseases, 9th Revision, Clinical Modification; ICD-10-CA = International Classification of Diseases, 10th Revision, Canadian version

Administrative health data algorithms

We developed 31 administrative data algorithms using combinations of diagnosis codes in physician billings, hospital discharge diagnosis or procedure codes, diagnosis codes in vital statistics, and medication codes from prescription drug records in lifetime, 2-year, and 3-year windows (Supplemental Table 1). The algorithms varied by the database sources and the number of patient encounters required for a positive identification of cirrhosis. The case definitions also varied as to whether diagnosis codes for case ascertainment were assigned by a family physician or a specialist (internist, gastroenterologist, or hepatologist). We included algorithms consistent with ones previously developed and validated in Ontario administrative health data (11). The years of administrative data searched for the relevant codes were anchored at the date of first mention of a cirrhosis diagnosis in the EMR data.
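To make the structure of these case definitions concrete, the following Python sketch applies rules of the form "N+ P" or "N+ P OR M+ H", counting a code once per distinct date as in the footnote to Table 4. It is a simplified illustration, not the study's implementation: the function names are hypothetical, the inputs are assumed to be dates already filtered to qualifying codes, and the observation window is passed in by the caller because the text does not fully specify how each window was positioned around the anchor date.

```python
from datetime import date

def count_distinct_dates(dates, window=None):
    """Count distinct dates that carry a qualifying code.

    `window` is an optional (start, end) pair of dates; None means lifetime data.
    """
    unique = set(dates)
    if window is not None:
        start, end = window
        unique = {d for d in unique if start <= d <= end}
    return len(unique)

def flag_cirrhosis(physician_dates, hospital_dates,
                   min_physician=1, min_hospital=None, window=None):
    """Apply an 'N+ P' rule, optionally combined with 'OR M+ H'."""
    meets_p = count_distinct_dates(physician_dates, window) >= min_physician
    meets_h = (min_hospital is not None
               and count_distinct_dates(hospital_dates, window) >= min_hospital)
    return meets_p or meets_h

# Hypothetical person: two claims on the same day and one hospitalization.
claims = [date(2005, 3, 1), date(2005, 3, 1)]
stays = [date(2009, 7, 15)]

print(flag_cirrhosis(claims, stays, min_physician=2))                  # 2+ P
print(flag_cirrhosis(claims, stays, min_physician=2, min_hospital=1))  # 2+ P OR 1+ H
```

For this hypothetical person, "2+ P" is not met because both claims fall on the same date, while "2+ P OR 1+ H" is met through the hospitalization.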

Tables 1 and 2 summarize the ICD-9-CM and ICD-10-CA diagnosis codes, procedure codes, and prescription drug codes used to ascertain cirrhosis in the administrative health data and in the primary care EMR data.

Table 2:

Cirrhosis-related prescription drug names and Anatomical Therapeutic Chemical (ATC) codes

Prescription drug name | ATC code
Ursodeoxycholic acid | A05AA02
Obeticholic acid | A05AA04
Lactulose | A06AD11
Rifaximin | A07AA11
Nadolol | C07AA12
Spironolactone | C03DA01
Ribavirin | J05AP01
Sofosbuvir and ledipasvir | J05AP51
Sofosbuvir and velpatasvir | J05AP55
Elbasvir and grazoprevir | J05AP54
Glecaprevir and pibrentasvir | J05AP57
Entecavir | J05AF10
Tenofovir | J05AF07
Lamivudine | J05AF05
Sofosbuvir | J05AP08

Covariates

Several covariates were used to describe the validation cohort. Sex, age group (18–44, 45–64, 65+ years), income quintile of the residence location, and geographic residence location (Winnipeg and all other areas of Manitoba) were defined at the study index date. Age at cirrhosis case identification was calculated at the index cirrhosis diagnosis in the study observation period for cirrhosis cases only. Income quintile is an area-level measure of socioeconomic status defined using Statistics Canada Census data and based on total household income for dissemination areas, the smallest geographic unit for which Census data are publicly released (20). Postal codes from the MHIR were used to assign individuals to income quintiles and to their geographic residence location.

Statistical analyses

Descriptive statistics, including means, standard deviations (SD), frequencies, and percentages, were used to characterize the validation cohort. We estimated sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), Youden's index, and area under the receiver operating characteristic curve (AUC-ROC) for each algorithm, along with the corresponding 95% confidence intervals (CI); the latter were based on the binomial distribution. We defined sensitivity as the percentage of individuals who met the validated case definition for cirrhosis in the EMR data who were correctly identified as having cirrhosis in the administrative health data, and specificity as the percentage of individuals who did not meet the validated case definition for cirrhosis in the EMR data who were correctly identified as not having cirrhosis in the administrative health data. PPV was the percentage of individuals identified as having cirrhosis in the administrative health data who met the validated case definition for cirrhosis in the EMR data. NPV was the percentage of individuals identified as not having cirrhosis in the administrative health data who did not meet the validated case definition for cirrhosis in the EMR data. Youden's index (21) is a summary measure of sensitivity and specificity, defined as sensitivity + specificity – 1, where sensitivity and specificity are calculated as proportions. The optimal value of Youden's index is 1.0. The AUC-ROC provided a global estimate of algorithm performance.
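As a minimal sketch, these measures can be computed from the four cells of a 2 × 2 table that cross-classifies the administrative data algorithm against the EMR reference standard, using the conventional definitions. The counts in the example are hypothetical and are not taken from the study; confidence intervals are omitted because the text does not state which binomial method was used.

```python
def validity_measures(tp, fp, fn, tn):
    """Validity measures from a 2 x 2 table (algorithm vs. reference standard).

    tp: algorithm positive, reference positive
    fp: algorithm positive, reference negative
    fn: algorithm negative, reference positive
    tn: algorithm negative, reference negative
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "youden": sensitivity + specificity - 1,
    }

# Hypothetical counts for illustration only.
measures = validity_measures(tp=80, fp=90, fn=20, tn=810)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```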

Sensitivity analyses

In sensitivity analyses, we re-defined all algorithms with additional diagnosis codes (sensitive set, Supplemental Table 2) and re-estimated all measures of algorithm performance. Specifically, the sensitive set included diagnosis codes for chronic liver disease in addition to cirrhosis and complications (decompensation or hepatocellular carcinoma). We selected the sensitive set because primary care patients may have less severe liver disease, resulting in them not being identified as having cirrhosis in the administrative health data using the initial list of specific codes. This approach is consistent with other validation studies for identifying cirrhosis that have adopted broader sets of codes intended to ascertain mild disease (9,22).

In addition, sensitivity analyses were conducted to estimate algorithm validity separately among males and females, among different age groups in the validation cohort (18–44, 45–64, and 65+ years), and by time period (1998–2010 and 2011–2020). These sensitivity analyses were conducted to assess potential variations in the accuracy of the algorithms. All analyses were performed using SAS version 9.4 (SAS Institute, Cary, NC, USA).

Results

Description of the validation cohort

Between 1998 and 2015, 116,675 individuals aged ≥18 years had at least one encounter with a primary care provider participating in MaPCReN (Figure 1). Over half of these individuals were female, and over half were Winnipeg residents. Of the 116,675 individuals, 1,593 (1.4%) were identified as having cirrhosis using the reference standard. The mean age of cirrhosis cases was 46.1 (SD 14.5) years at the index date and 52.5 (SD 14.4) years at the date of index cirrhosis diagnosis during the study observation period. Table 3 reports the characteristics of individuals defined as cirrhosis cases and non-cases.

Figure 1:


Flow diagram for cirrhosis cohort using MaPCReN electronic medical record and administrative health data, April 1, 1998 - March 31, 2015

EMR = electronic medical records; MaPCReN = Manitoba Primary Care Research Network

Table 3:

Characteristics of patients stratified by cirrhosis cases and non-cases in MaPCReN

Characteristic | Cirrhosis case (N = 1,593) | Cirrhosis non-case (N = 115,082)
Age at index date, years | 46.1 ± 14.5 | 46.4 ± 18.4
Male | 724 (45.4) | 50133 (43.6)
Female | 869 (54.6) | 64949 (56.4)
Income quintile
 Q1-Lowest | 406 (25.5) | 19115 (16.6)
 Q2 | 312 (19.6) | 22204 (19.3)
 Q3 | 272 (17.1) | 23202 (20.2)
 Q4 | 376 (23.6) | 28156 (24.5)
 Q5-Highest | 219 (13.7) | 21262 (18.5)
Health insurance coverage, years | 21.2 ± 3.77 | 20.6 ± 4.5
Winnipeg residence location | 1075 (67.5) | 70250 (61.0)
Non-Winnipeg residence location | 518 (32.5) | 44832 (39.0)
Age at cirrhosis case identification, years* | 52.54 ± 14.4 | -
 18–44 years | 470 (29.5) | -
 45–64 years | 819 (51.5) | -
 65+ years | 304 (19) | -

Data are mean ± standard deviation or n (%)

* Calculated at the date of index cirrhosis diagnosis in the study observation period, for cirrhosis cases only

Validation results

We report on a subset of seven algorithms (Table 4) using lifetime data to demonstrate our selection of the optimal algorithms (Supplemental Table 3 reports all 31 algorithms for each time frame). These algorithms had excellent specificity estimates (95.5%–98.2%), while sensitivity estimates varied markedly, ranging from 32.5% (95% CI 32.2–32.8) to 68.3% (95% CI 68.0–68.9). PPV estimates ranged from 17.4% (95% CI 17.1–17.6) to 23.4% (95% CI 23.1–23.6), whereas all NPV estimates were approximately 99%. Youden's index ranged from 0.31 (95% CI 0.28–0.33) to 0.64 (95% CI 0.61–0.66) and AUC-ROC ranged from 0.65 (95% CI 0.65–0.66) to 0.82 (95% CI 0.82–0.82). The algorithm that yielded the highest estimates of sensitivity (67.9%; 95% CI 67.6–68.1) and specificity (95.8%; 95% CI 95.7–95.9) with the fewest data sources relied on 1 or more hospitalizations or 1 or more physician claims using lifetime data.

Table 4:

Estimates of sensitivity, specificity, predictive values, Youden's index, and area under the receiver operating characteristic curve for a subset of cirrhosis algorithms

Algorithm | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV, % (95% CI) | NPV, % (95% CI) | Youden's index (95% CI) | AUC (95% CI)
1+ P | 67.4 (67.1–67.6) | 96.4 (96.3–96.5) | 20.4 (20.2–20.7) | 99.5 (99.5–99.6) | 0.64 (0.61–0.66) | 0.82 (0.82–0.82)
1+ P OR 1+ H | 67.9 (67.6–68.1) | 95.8 (95.7–95.9) | 18.4 (18.2–18.6) | 99.5 (99.5–99.6) | 0.64 (0.61–0.66) | 0.82 (0.82–0.82)
1+ P OR 1+ H OR M OR 1+ Procedure | 68.3 (68.0–68.6) | 95.5 (95.4–95.6) | 17.4 (17.1–17.6) | 99.5 (99.5–99.6) | 0.64 (0.62–0.66) | 0.82 (0.82–0.82)
2+ P | 41.3 (41.0–41.6) | 98.1 (98.0–98.2) | 23.4 (23.1–23.6) | 99.2 (99.1–99.2) | 0.39 (0.37–0.42) | 0.70 (0.69–0.70)
2+ P OR 1+ H | 42.8 (42.5–43.1) | 97.5 (97.4–97.6) | 19.3 (19.1–19.5) | 99.2 (99.1–99.2) | 0.40 (0.38–0.43) | 0.70 (0.70–0.70)
2+ P OR 1+ H OR M OR 1+ Procedure | 43.4 (43.1–43.7) | 97.2 (97.1–97.3) | 17.5 (17.3–17.7) | 99.2 (99.1–99.3) | 0.41 (0.38–0.43) | 0.70 (0.70–0.71)
1+ P by specialist | 32.5 (32.2–32.8) | 98.2 (98.2–98.3) | 20.3 (20.1–20.5) | 99.1 (99.0–99.1) | 0.31 (0.28–0.33) | 0.65 (0.65–0.66)

AUC = area under the receiver operating characteristic curve; CI = confidence interval; H = hospitalization; M = mortality; NPV = negative predictive value; P = physician claim; PPV = positive predictive value; 1+ = code occurs on at least one date; 2+ = code occurs on at least two separate dates.
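For an algorithm that yields a single yes/no classification, the empirical ROC curve has one operating point, and the area under the straight lines joining (0, 0), that point, and (1, 1) equals (sensitivity + specificity) / 2. The Python sketch below assumes that this single-point form applies here (the text does not state how AUC was computed) and recomputes Youden's index and AUC from the sensitivity and specificity reported for three rows of Table 4.

```python
# Sensitivity and specificity (%) as reported in Table 4.
table4 = {
    "1+ P": (67.4, 96.4),
    "2+ P": (41.3, 98.1),
    "1+ P by specialist": (32.5, 98.2),
}

def youden_and_auc(sens_pct, spec_pct):
    """Youden's index and single-point AUC from percentages."""
    sens, spec = sens_pct / 100, spec_pct / 100
    youden = sens + spec - 1
    auc = (sens + spec) / 2  # assumption: one operating point, linear interpolation
    return youden, auc

for name, (sens, spec) in table4.items():
    youden, auc = youden_and_auc(sens, spec)
    print(f"{name}: Youden = {youden:.2f}, AUC = {auc:.2f}")
```

Rounded to two decimals, the results for these three rows agree with the Youden's index and AUC columns of Table 4, which is consistent with the single-point interpretation.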

Sensitivity estimates for the algorithm using ‘1 or more specialist billing claim’ were lower (32.5%) in comparison to the algorithm using ‘1 or more physician billing claim’ (67.4%), with no difference in PPV estimates. As expected, increasing the number of physician billing claims required for a positive cirrhosis case identification resulted in lower sensitivity but higher PPV. Limited improvement in sensitivity was observed with the addition of diagnoses in hospital data, procedures in hospital data or diagnoses in vital statistics data to ascertain cases.

Table 5 provides algorithm validity estimates for the first sensitivity analysis, which increased the number of eligible diagnosis codes (ie, the sensitive set of diagnosis codes), for the same selected cirrhosis algorithms described in the primary analysis. Overall, sensitivity estimates increased substantially for algorithms using diagnoses in hospital data, but at the cost of additional false positives, resulting in significantly lower PPV estimates. The absolute difference in sensitivity estimates for algorithms using 1 or more hospitalizations or 2 or more physician claims between the sensitive set of diagnosis codes and the specific set of diagnosis codes (ie, primary analysis) was 12.5 percentage points. Specificity estimates dropped from 97.2% to 82.8%, but NPV estimates remained high at 99.3%. The algorithm that yielded the highest estimates of sensitivity (74.8%; 95% CI 74.5–75) and specificity (81.5%; 95% CI 81.2–81.7) using the minimum number of data sources was the same as in the primary analysis (1 or more hospitalizations or 1 or more physician claims using lifetime data), applied with the sensitive set of diagnosis codes. However, the PPV estimate for this algorithm was extremely low for the sensitive set (5.3%; 95% CI 5.1–5.4) compared with the primary analysis (18.4%; 95% CI 18.2–18.6).

Table 5:

Estimates of sensitivity, specificity, predictive values, Youden's index, and area under the receiver operating characteristic curve for a subset of cirrhosis algorithms with the sensitive set of diagnosis codes

Algorithm | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV, % (95% CI) | NPV, % (95% CI) | Youden's index (95% CI) | AUC (95% CI)
1+ P | 67.4 (67.1–67.6) | 96.4 (96.3–96.5) | 20.4 (20.2–20.7) | 99.5 (99.5–99.6) | 0.64 (0.61–0.66) | 0.82 (0.82–0.82)
1+ P OR 1+ H | 74.8 (74.6–75.1) | 81.5 (81.3–81.7) | 5.3 (5.2–5.4) | 99.6 (99.5–99.6) | 0.56 (0.54–0.58) | 0.78 (0.78–0.78)
1+ P OR 1+ H OR M OR 1+ Procedure | 75.1 (74.8–75.3) | 81.3 (81.1–81.5) | 5.3 (5.1–5.4) | 99.6 (99.5–99.6) | 0.56 (0.54–0.59) | 0.78 (0.78–0.78)
2+ P | 41.3 (41.0–41.6) | 98.1 (98.0–98.2) | 23.4 (23.1–23.6) | 99.2 (99.1–99.2) | 0.39 (0.37–0.42) | 0.70 (0.69–0.70)
2+ P OR 1+ H | 55.3 (55.0–55.6) | 82.8 (82.6–83.0) | 4.3 (4.1–4.4) | 99.3 (99.2–99.3) | 0.38 (0.36–0.41) | 0.69 (0.69–0.69)
2+ P OR 1+ H OR M OR 1+ Procedure | 55.7 (55.4–56.0) | 82.6 (82.4–82.8) | 4.2 (4.1–4.4) | 99.3 (99.2–99.3) | 0.38 (0.36–0.41) | 0.69 (0.69–0.69)
1+ P by specialist | 32.5 (32.2–32.8) | 98.2 (98.2–98.3) | 20.3 (20.1–20.5) | 99.1 (99.0–99.1) | 0.31 (0.28–0.33) | 0.65 (0.65–0.66)

AUC = area under the receiver operating characteristic curve; CI = confidence interval; H = hospitalization; M = mortality; NPV = negative predictive value; P = physician claim; PPV = positive predictive value; 1+ = code occurs on at least one date; 2+ = code occurs on at least two separate dates.

Supplemental Tables 4 through 9 provide validity estimates stratified by sex, age group, and time period. Using lifetime data, sensitivity estimates were on average 6% higher among men than among women and 5% higher among individuals aged 45+ years than among younger individuals. Algorithms using one or more physician billing claims, with or without hospital data or vital statistics diagnosis codes, had 20% higher sensitivity estimates during the period from 1998 to 2010 compared with the period from 2011 to 2020. At the same time, PPV estimates were an average of 10% lower.

Discussion

This study tested the validity of multiple algorithms to identify cases of cirrhosis in administrative health data using a validated case definition for primary care EMRs as a reference standard in Manitoba. The study demonstrated that administrative health data algorithms can identify cirrhosis patients who receive regular primary care with a modest degree of accuracy using their lifetime of administrative data.

We identified eight peer-reviewed studies evaluating the accuracy of administrative health data algorithms for identifying cirrhosis patients (9–13,23–25). The literature review showed that the presence of one or more inpatient or outpatient diagnosis codes for cirrhosis with or without additional codes for chronic liver disease had sensitivity estimates of 67–98% when the reference standard was a specialist diagnosis in medical charts. When the reference standard was primary care EMRs, sensitivity estimates were 68–75%.

Potential reasons for observing lower sensitivity than previous studies include that primary care physicians may record cirrhosis as a secondary diagnosis in the charts but bill only for the primary diagnosis which may be the active reason for the visit, such as management of diabetes or hypertension. Primary care providers may not record diagnoses for health conditions that are primarily managed by specialists (26). Further, a diagnosis of cirrhosis made in a specialist clinic is expected to have greater diagnostic accuracy than those made in primary care settings. Another possibility is that cirrhosis cases in our reference standard may have early/asymptomatic disease that did not require hospitalization and had not been referred to specialty care; as a result, they may not have corresponding encounters within the administrative health data. The use of specific diagnosis codes may have also influenced the sensitivity estimates in our validation cohort where a primary care patient may have less severe disease. A sensitivity analysis that used a broader set of diagnosis codes (ie, sensitive set of codes) resulted in increased detection of cirrhosis cases by approximately 20%. However, the PPV estimates of the algorithms based on this expanded diagnosis code set were considerably lower. A recent systematic review proposed a consensus set of nine codes for cirrhosis identification in diverse populations with high PPV; these codes were included in our main analysis (27). Whether researchers ascertain cases based on a sensitive set of diagnosis codes or based on a specific set of diagnosis codes will depend on whether the researcher wants a cirrhosis cohort that is inclusive but has potential false-positive cases or a cohort that is highly specific but potentially misses cases, particularly people who may have less severe forms of cirrhosis.

Many validation studies of administrative data algorithms for cirrhosis have only reported PPV (9,10,13,22,28). Predictive values are dependent on the prevalence of the disease in the population being tested. Given the low prevalence of cirrhosis in our primary care cohort (1.4%), administrative data algorithms demonstrated low PPV but very high NPV. Five studies that reported high PPV estimates (89%–93%) were conducted in Veterans Affairs populations in the United States (9,10,12,24,29), where cirrhosis prevalence is much higher than in the general population because Veterans in this health system are predominantly male and have a high burden of hepatitis C infection, alcohol use disorder, and metabolic syndrome (10). This suggests that studies reporting high PPV estimates may have been conducted in cohorts with a high prevalence of cirrhosis, leading to an overestimation of the accuracy of the algorithms when used in the general population.
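The dependence of PPV on prevalence follows directly from Bayes' theorem. As a rough illustration, the Python sketch below holds the sensitivity and specificity of the "1+ P" algorithm fixed at the values in Table 4 and varies the prevalence. The first prevalence is the cohort value (1,593 of 116,675); the 10% and 30% values are hypothetical, chosen only to mimic a higher-prevalence setting, and the calculation assumes sensitivity and specificity would stay the same across settings.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """PPV implied by sensitivity, specificity, and prevalence (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

SENS, SPEC = 0.674, 0.964  # "1+ P" algorithm, Table 4

# Cohort prevalence, then two hypothetical higher-prevalence settings.
for prevalence in (1593 / 116675, 0.10, 0.30):
    ppv = ppv_from_prevalence(SENS, SPEC, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}")
```

At the cohort prevalence the implied PPV is about 20.6%, close to the 20.4% reported in Table 4; at the hypothetical prevalences of 10% and 30% it rises to roughly 68% and 89%, even though the algorithm itself is unchanged.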

Our work contributes to the current literature about chronic liver disease in several ways. First, we used primary care EMR data to define the reference standard, where the disease prevalence is more likely to reflect the true prevalence in the general population compared to a specialist liver clinic population. As well, primary care data are expected to capture the full disease spectrum ranging from asymptomatic to severely symptomatic patients; in contrast, specialist centers are more likely to capture patient populations having severe disease only (11,12). Second, we had access to a large validation cohort; previous studies were limited to smaller cohorts with reduced precision of validity estimates (9,10,12). Finally, we have provided multiple measures of diagnostic accuracy whereas only a few studies have provided four or more measures of diagnostic accuracy with the most commonly reported being PPV and sensitivity.

We recognize the limitations of the study. A validated case definition for primary care data may not be the ideal reference standard for cirrhosis case identification because of the potential for misclassification (30). Primary care physicians usually do not manage the liver disease of patients with cirrhosis; therefore, they may be less likely to include a cirrhosis diagnosis code in their billing claims. However, to address this potential limitation of primary care physician billing claims, the reference standard also used the problem list in the EMR to ascertain cases, where more information on underlying chronic conditions may be captured. Patients who do not access the healthcare system, who have infrequent medical contacts, or who have not yet sought health care are missed in both administrative health data and EMR data.

Conclusion

Cirrhosis case ascertainment from routinely collected administrative health records is possible and potentially valuable for conducting population-based studies about disease burden, outcomes, and healthcare use. Our study results could be used to establish comprehensive population-based cohorts to examine disease burden for surveillance and planning for treatment, preventative strategies, and allocation of resources.

Acknowledgements:

The authors acknowledge the Manitoba Centre for Health Policy for use of data contained in the Manitoba Population Research Data Repository under HIPC project # 2021/2022-2. The results, conclusions, opinions, and statements expressed are solely those of the authors and no official endorsement by the Manitoba Centre for Health Policy, Manitoba Health, or Health Information Privacy Committee/Provincial Health Research Privacy Committee is intended or should be inferred.

Contributions:

Conceptualization and methodology: N Faisal, LM Lix, A Singer, E Renner, H Singh, and A Mahar. Data curation and interpretation: N Faisal, L Kosowan, and R Walld. Writing – Original Draft: N Faisal. Writing – Critical Review & Editing: LM Lix, A Singer, E Renner, H Singh, and A Mahar.

Ethics Approval:

Ethics approval was received from the University of Manitoba Health Research Ethics Board (HREB: HS25263 [H2021:406]). Approval for data access was granted by the Health Information Privacy Committee of Manitoba Health (HIPC 2021/2022-2).

Informed Consent:

N/A

Registry and the Registration No. of the Study/Trial:

N/A

Data Accessibility:

The data that support the findings of this study are not publicly available, in accordance with site-specific privacy restrictions. The data that support the findings of this study are from the Manitoba Population Research Data Repository housed at the Manitoba Centre for Health Policy, University of Manitoba, and were derived from data provided by Manitoba Health and the Winnipeg Regional Health Authority. Data are available, with submission of appropriate ethics and data access approvals, from the Manitoba Centre for Health Policy.

Funding:

No funding was received.

Disclosures:

The authors have no conflicts of interest to declare.

Disclaimer:

The results and conclusions are those of the authors and no official endorsement by the Manitoba Centre for Health Policy, Manitoba Health, or other data providers is intended or should be inferred.

Peer Review:

This article was peer reviewed.

Animal Studies:

N/A

Supplemental Material

canlivj-2023-0013_supplement1.pdf

References

  • 1. Cheemerla S, Balakrishnan M. Global epidemiology of chronic liver disease. Clin Liver Dis (Hoboken). 2021;17(5):365–70. doi: 10.1002/cld.1061.
  • 2. James SL, Abate D, Abate KH, et al. Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990-2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392(10159):1789–858. doi: 10.1016/S0140-6736(18)32279-7.
  • 3. Hirode G, Saab S, Wong RJ. Trends in the burden of chronic liver disease among hospitalized US adults. JAMA Network Open. 2020;3(4):e201997. doi: 10.1001/jamanetworkopen.2020.1997.
  • 4. Swain MG, Ramji A, Patel K, et al. Burden of nonalcoholic fatty liver disease in Canada, 2019-2030: a modelling study. CMAJ Open. 2020;8(2):E429–36. doi: 10.9778/cmajo.20190212.
  • 5. Metcalf CJE, Edmunds WJ, Lessler J. Six challenges in modelling for public health policy. Epidemics. 2015;10:93–6.
  • 6. Lix L, Yogendran M, Burchill C, et al. Defining and validating chronic diseases: an administrative data approach. Manitoba Centre for Health Policy; 2006.
  • 7. Marrie RA, Kosowan L, Taylor C, Singer A. Identifying people with multiple sclerosis in the Canadian Primary Care Sentinel Surveillance Network. Mult Scler J Exp Transl Clin. 2019;5(4):2055217319894360. doi: 10.1177/2055217319894360.
  • 8. Flemming JA, Dewit Y, Mah JM, Saperia J, Groome PA, Booth CM. Incidence of cirrhosis in young birth cohorts in Canada from 1997 to 2016: a retrospective population-based study. Lancet Gastroenterol Hepatol. 2019;4(3):217–26. doi: 10.1016/S2468-1253(18)30339-X.
  • 9. Goldberg D, Lewis J, Halpern S, Weiner M, Lo Re V, 3rd. Validation of three coding algorithms to identify patients with end-stage liver disease in an administrative database. Pharmacoepidemiol Drug Saf. 2012;21(7):765–9. doi: 10.1002/pds.3290.
  • 10. Kanwal F, Kramer JR, Buchanan P, et al. The quality of care provided to patients with cirrhosis and ascites in the department of Veterans Affairs. Gastroenterology. 2012;143(1):70–7. doi: 10.1053/j.gastro.2012.03.038.
  • 11. Lapointe-Shaw L, Georgie F, Carlone D, et al. Identifying cirrhosis, decompensated cirrhosis and hepatocellular carcinoma in health administrative data: a validation study. PLoS One. 2018;13(8):e0201120. doi: 10.1371/journal.pone.0201120.
  • 12. Lo Re V, 3rd, Lim JK, Goetz MB, et al. Validity of diagnostic codes and liver-related laboratory abnormalities to identify hepatic decompensation events in the Veterans Aging Cohort Study. Pharmacoepidemiol Drug Saf. 2011;20(7):689–99. doi: 10.1002/pds.2148.
  • 13. Nehra MS, Ma Y, Clark CA, Amarasingham R, Rockey D, Singal A. Use of administrative claims data for identifying patients with cirrhosis. J Clin Gastroenterol. 2013;47(5):e50–4. doi: 10.1097/MCG.0b013e3182688d2f.
  • 14. Benchimol EI, Manuel DG, To T, Griffiths AM, Rabeneck L, Guttmann A. Development and use of reporting guidelines for assessing the quality of validation studies of health administrative data. J Clin Epidemiol. 2011;64(8):821–9. doi: 10.1016/j.jclinepi.2010.10.006.
  • 15. Katz A, Enns J, Smith M, Burchill C, Turner K, Towns D. Population data centre profile: the Manitoba Centre for Health Policy. Int J Popul Data Sci. 2020;4(2):1131. doi: 10.23889/ijpds.v5i1.1131.
  • 16. Singer A, Kosowan L, LaBine L, et al. Characterizing the use of virtual care in primary care settings during the COVID-19 pandemic: a retrospective cohort study. BMC Prim Care. 2022;23(1):320. doi: 10.1186/s12875-022-01890-w.
  • 17. Manitoba Centre for Health Policy. Manitoba Population Research Data Repository data descriptions: hospital abstracts. Winnipeg, MB: University of Manitoba; 2020. Available from: http://umanitoba.ca/faculties/health_sciences/medicine/units/chs/departmental_units/mchp/resources/repository/descriptions.html?ds=Hospital.
  • 18. Manitoba Centre for Health Policy. Manitoba Population Research Data Repository data descriptions: Manitoba Health Insurance Registry. Winnipeg, MB: University of Manitoba; 2020. Available from: http://umanitoba.ca/faculties/health_sciences/medicine/units/chs/departmental_units/mchp/resources/repository/descriptions.html?ds=Insurance.
  • 19. Faisal N, Kosowan L, Zafari H, et al. Development and validation of a case definition to estimate the prevalence and incidence of cirrhosis in pan-Canadian primary care databases. Can Liver J. 2023;6(4):375–87. doi: 10.3138/canlivj-2023-0002.
  • 20. Roos NP, Mustard CA. Variation in health and health care use by socioeconomic status in Winnipeg, Canada: does the system work well? Yes and no. Milbank Q. 1997;75(1):89–111. doi: 10.1111/1468-0009.00045.
  • 21. Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3(1):32–5. doi: 10.1002/1097-0142(1950)3:1<32::aid-cncr2820030106>3.0.co;2-3.
  • 22. Thygesen SK, Christiansen CF, Christensen S, Lash TL, Sørensen HT. The predictive value of ICD-10 diagnostic coding used to assess Charlson comorbidity index conditions in the population-based Danish National Registry of Patients. BMC Med Res Methodol. 2011;11:83.
  • 23. Niu B, Forde KA, Goldberg DS. Coding algorithms for identifying patients with cirrhosis and hepatitis B or C virus using administrative data. Pharmacoepidemiol Drug Saf. 2015;24(1):107–11. doi: 10.1002/pds.3721.
  • 24. Kramer J, Davila J, Miller E, Richardson P, Giordano T, El-Serag HB. The validity of viral hepatitis and chronic liver disease diagnoses in Veterans Affairs administrative databases. Aliment Pharmacol Ther. 2008;27(3):274–82. doi: 10.1111/j.1365-2036.2007.03572.x.
  • 25. Rakoski MO, McCammon RJ, Piette JD, et al. Burden of cirrhosis on older Americans and their families: analysis of the health and retirement study. Hepatology. 2012;55(1):184–91. doi: 10.1002/hep.24616.
  • 26. Singer A, Kroeker AL, Yakubovich S, Duarte R, Dufault B, Katz A. Data quality in electronic medical records in Manitoba: do problem lists reflect chronic disease as defined by prescriptions? Can Fam Physician. 2017;63(5):382–9.
  • 27. Shearer JE, Gonzalez JJ, Min T, et al. Systematic review: development of a consensus code set to identify cirrhosis in electronic health records. Aliment Pharmacol Ther. 2022;55(6):645–57. doi: 10.1111/apt.16806.
  • 28. Dam Fialla A, Schaffalitzky de Muckadell OB, Touborg Lassen A. Incidence, etiology and mortality of cirrhosis: a population-based cohort study. Scand J Gastroenterol. 2012;47(6):702–9. doi: 10.3109/00365521.2012.661759.
  • 29. Re VL, Lim JK, Goetz MB, et al. Validity of diagnostic codes and liver-related laboratory abnormalities to identify hepatic decompensation events in the Veterans Aging Cohort Study. Pharmacoepidemiol Drug Saf. 2011;20(7):689–99. doi: 10.1002/pds.2148.
  • 30. Coleman N, Halas G, Peeler W, Casaclang N, Williamson T, Katz A. From patient care to research: a validation study examining the factors contributing to data quality in a primary care electronic medical record database. BMC Fam Pract. 2015;16(1):11. doi: 10.1186/s12875-015-0223-z.

Articles from Canadian Liver Journal are provided here courtesy of University of Toronto Press
