PLOS ONE. 2020 Feb 18;15(2):e0229218. doi: 10.1371/journal.pone.0229218

Validation of a hierarchical algorithm to define chronic liver disease and cirrhosis etiology in administrative healthcare data

George Philip 1, Maya Djerboua 2, David Carlone 3, Jennifer A Flemming 1,2,3,4,*
Editor: Wenyu Lin 5
PMCID: PMC7028265  PMID: 32069337

Abstract

Background and aims

Chronic liver disease (CLD) and cirrhosis are leading causes of death globally with the burden of disease rising significantly over the past several decades. Defining the etiology of liver disease is important for understanding liver disease epidemiology, healthcare planning, and outcomes. The aim of this study was to validate a hierarchical algorithm for CLD and cirrhosis etiology in administrative healthcare data.

Methods

Consecutive patients with CLD or cirrhosis attending an outpatient hepatology clinic in Ontario, Canada from 05/01/2013–08/31/2013 underwent detailed chart abstraction. Gold standard liver disease etiology was determined by an attending hepatologist as hepatitis C (HCV), hepatitis B (HBV), alcohol-related, non-alcoholic fatty liver disease (NAFLD)/cryptogenic, autoimmune or hemochromatosis. Individual-level data were linked to routinely collected administrative healthcare data at ICES. Diagnostic accuracy of a hierarchical algorithm incorporating both laboratory and administrative codes to define etiology was evaluated by calculating sensitivity, specificity, positive (PPV) and negative predictive values (NPV), and Cohen’s kappa.

Results

442 individuals underwent chart abstraction (median age 53 years, 53% cirrhosis, 45% HCV, 26% NAFLD, 10% alcohol-related). In patients with cirrhosis, the algorithm had adequate sensitivity/PPV (>75%) and excellent specificity/NPV (>90%) for all etiologies. In those without cirrhosis, the algorithm was excellent for all etiologies except for hemochromatosis and autoimmune diseases.

Conclusions

A hierarchical algorithm incorporating laboratory and administrative coding can accurately define cirrhosis etiology in routinely collected healthcare data. These results should facilitate health services research in this growing patient population.

Introduction

Chronic liver disease (CLD) and cirrhosis are the 12th leading cause of death globally [1], and over the past two decades both the incidence of and mortality from cirrhosis have steadily increased in North America.[2–4] Most causes of CLD and cirrhosis have their own distinct epidemiology and natural history, with treatment recommendations and outcomes being influenced largely by the underlying etiology. Therefore, the ability to define the cause of CLD and cirrhosis is important both in clinical practice and for clinical research.

The most common causes of CLD and cirrhosis in North America are chronic viral hepatitis B (HBV) and C (HCV), alcohol-related disease, and non-alcoholic fatty liver disease (NAFLD). Together, these conditions are present in approximately 80% of individuals with CLD.[5] Rarer causes include autoimmune liver diseases such as autoimmune hepatitis (AIH), primary biliary cholangitis (PBC), and primary sclerosing cholangitis (PSC), and genetic conditions such as hereditary hemochromatosis, Wilson disease, and alpha-1 antitrypsin deficiency. In clinical practice, defining the etiology of CLD and cirrhosis requires careful incorporation of information obtained from the clinical history and physical examination in addition to results from laboratory, imaging and histologic investigations.[5]

The use of population-level administrative healthcare data has evolved as a powerful tool for health services and outcomes research. One of the fundamentals in the use of administrative data is to understand how accurately specific disease conditions, interventions and outcomes can be defined within the databases being used. Given that administrative data do not routinely include details of patients’ clinical history, physical exam findings or results of clinical tests, investigators often use surrogates such as physician diagnostic billing codes to identify conditions of interest. However, it is essential to understand whether such methods have adequate diagnostic accuracy to identify the specific condition or outcome under investigation. To date, a small number of studies have validated several CLD and cirrhosis etiologies within administrative healthcare data.[6–10] However, to our knowledge no previous work has evaluated a hierarchical algorithm to define etiology in a defined population with CLD or cirrhosis similar to how etiology is defined in clinical practice.

The aim of this study was to validate a hierarchical algorithm for CLD and cirrhosis etiology incorporating both laboratory and administrative coding in routinely collected healthcare data.

Methods

Primary chart abstraction

All consecutive patients with chronic liver disease (elevated AST/ALT > 6 months) or cirrhosis who attended the Kingston Health Sciences Centre (KHSC) Liver Clinic between May 1 and August 31, 2013 underwent detailed chart abstraction by a single abstractor (DC). The KHSC Liver Clinic is attended by two subspecialty-trained hepatologists who see a wide variety of chronic liver conditions referred from the surrounding region of Kingston, Ontario, Canada, with a catchment area of approximately 1 million. The majority of patients are referred by primary care practitioners, as there are no local community practicing gastroenterologists or hepatologists and KHSC does not perform liver transplantation.

Patient data extracted from the clinic’s electronic medical records included patient demographics, laboratory data (including Model for End-stage Liver Disease [MELD], liver enzymes, platelet count), imaging data, endoscopic reports, pathology data, non-invasive fibrosis assessment test results, and any hepatic decompensation events. Cirrhosis was identified based on the presence of any decompensation event (ascites, bleeding varices, encephalopathy, or explicit mention of decompensated cirrhosis), explicit mention of cirrhosis, or non-bleeding varices. In addition, a liver biopsy result of F4 fibrosis, a non-invasive test result consistent with F4 fibrosis (either serum tests or transient elastography), or imaging consistent with portal hypertension in an individual with known chronic liver disease was also considered diagnostic of cirrhosis. A 5% random sample of charts was re-abstracted by a hepatologist (JAF). Agreement beyond chance on the outcome ascertainment by both abstractors was measured using Cohen’s kappa.

Gold standard: Liver disease etiology

A most responsible cause of liver disease was assigned to each patient based on the overall assessment by the attending hepatologist evaluating the patient as either HBV, HCV, alcohol-related disease, autoimmune disease (composite of AIH, PBC, PSC), hereditary hemochromatosis, or NAFLD/cryptogenic. NAFLD and cryptogenic were grouped together as the natural history of these two conditions is similar.[11] In cases of viral hepatitis where alcohol was also a contributing factor, the cause of liver disease was assigned as viral hepatitis if the patient remained viremic. In those patients where several causes of liver disease were identified, the one assessed by the hepatologist as the most likely contributing diagnosis was assigned.

Administrative databases used for liver disease etiology validation

The validation of liver disease etiology was performed by individual linkage of all abstracted patient data to the routinely collected administrative health care data from the province of Ontario, Canada, housed at ICES. ICES is an independent, non-profit research institute funded by an annual grant from the Ontario Ministry of Health and Long-Term Care (MOHLTC). As a prescribed entity under Ontario’s privacy legislation, ICES is authorized to collect and use routinely collected health care data for the purposes of health system analysis, evaluation and decision support. Secure access to these data is governed by policies and procedures that are approved by the Information and Privacy Commissioner of Ontario. Ontario provides universal health care coverage for its population of approximately 14 million through the Ontario Health Insurance Plan (OHIP). The primary databases used in this analysis were: 1) the Registered Persons Database (RPDB), which includes demographic and vital status information for individuals covered under OHIP; 2) the Canadian Institute for Health Information Discharge Abstract Database (CIHI DAD), which captures diagnostic and procedural information from inpatient hospital admissions; 3) the National Ambulatory Care Reporting System (NACRS), which captures diagnostic and procedural information from ambulatory care and emergency room visits; 4) the OHIP Physician Claims Database, which includes all claims made by physicians for universally insured services; 5) the Ontario Laboratory Information System (OLIS), which includes over 90% of all bloodwork results performed by hospitals and clinical laboratories in Ontario from 2007–2015; and 6) Public Health Ontario (PHO) HBV and HCV test results from 1997–2015 (PHO processes over 95% of all viral hepatitis testing in the province). These databases were linked using anonymized unique encoded identifiers at the individual level and analyzed at ICES. Patient income quintile and rurality were derived from RPDB and based on area-level demographics of the patient’s postal code.[12] Previous hospitalizations and emergency room (ER) visits were determined from CIHI DAD and NACRS. Under an ICES privacy agreement, data containing small cells (n≤5) are not reportable because of re-identification risk. This study was approved by the Health Sciences Research Ethics Board at Queen’s University (DMED 1651–13).
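The small-cell rule described above (counts of n≤5 are not reportable) can be illustrated with a minimal sketch. The function name `format_cell` and the choice to print a masked cell as "≤5" are illustrative assumptions, not part of any ICES tooling. Real suppression also needs to mask complementary cells so that hidden counts cannot be recovered by subtraction, which this sketch does not attempt.

```python
def format_cell(count: int, threshold: int = 5) -> str:
    """Return a table cell as text, masking counts at or below the threshold.

    Hypothetical helper: the threshold of 5 follows the n<=5 rule stated in
    the text; the masked label format is an assumption.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    if count <= threshold:
        # Small cell: report only the upper bound, never the exact count.
        return f"≤{threshold}"
    return str(count)


# Illustrative counts only (not study data).
cells = [format_cell(n) for n in (3, 5, 6, 40)]
```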

Administrative algorithm to identify liver disease etiology

To identify causes of chronic liver disease in administrative data from ICES, a hierarchical algorithm was developed, adapted from an algorithm previously used in Veterans Affairs (VA) administrative data.[13] The algorithm is based on hierarchical criteria similar to how liver disease etiology is assigned in clinical practice: patients are categorized into a specific underlying cause of liver disease only once more plausible causes have been excluded (Fig 1). First, patients were assessed for the presence of chronic viral hepatitis through the use of laboratory test results from the PHO Laboratory Information System or OLIS. The presence of a positive HCV RNA or HCV genotype classified an individual as HCV. A positive HCV antibody in isolation was not considered diagnostic for HCV. If negative, a positive HBV DNA or HBV surface antigen was required to define the etiology as HBV. If all viral testing was negative, ICD and OHIP coding from CIHI DAD and NACRS were evaluated for the presence of autoimmune conditions or hereditary hemochromatosis (Table 1) prior to the clinic visit date. Diagnoses in both databases are based on codes from the International Classification of Diseases, 9th (ICD-9, 1988–2001) and 10th revisions (ICD-10, 2002 onwards). If such codes were present in the administrative data, the patient was assigned as either autoimmune or hereditary hemochromatosis. If negative, databases were searched for codes associated with alcohol-related conditions previously used in ICES data holdings (Table 1).[14,15] If all of the above were negative, the patient was assigned as having NAFLD/cryptogenic liver disease etiology. The gold standard liver disease etiology from chart-abstracted data was then compared to the liver disease etiology diagnosis based on the algorithm.
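The hierarchy described above can be sketched as a short decision function. This is an illustrative sketch only, not the authors' SAS implementation: the `PatientRecord` fields and function names are hypothetical, the code sets are a subset of Table 1, codes are matched exactly after removing the decimal point, and the record is assumed to contain only codes dated before the clinic visit. The text does not state the order in which autoimmune and hemochromatosis codes are checked, so checking autoimmune first is an assumption.

```python
from dataclasses import dataclass, field

# Subsets of the Table 1 codes, written without decimal points.
AUTOIMMUNE_CODES = {"5716", "5761", "57142", "K743", "K830", "K754"}
HEMOCHROMATOSIS_CODES = {"2750", "E8310"}
ALCOHOL_CODES = {"5710", "5711", "5712", "5713",
                 "K700", "K701", "K702", "K703", "K704", "K709"}


@dataclass
class PatientRecord:
    """Hypothetical linked record; field names are illustrative."""
    hcv_rna_positive: bool = False
    hcv_genotype_present: bool = False
    hcv_antibody_positive: bool = False  # not diagnostic on its own
    hbv_dna_positive: bool = False
    hbv_surface_antigen_positive: bool = False
    diagnosis_codes: set = field(default_factory=set)


def normalize(code: str) -> str:
    """Strip the decimal point and upper-case, e.g. 'K70.3' -> 'K703'."""
    return code.replace(".", "").upper()


def assign_etiology(patient: PatientRecord) -> str:
    """Walk the hierarchy and return the first etiology that matches."""
    # Step 1: viral hepatitis from laboratory results (HCV before HBV).
    if patient.hcv_rna_positive or patient.hcv_genotype_present:
        return "HCV"
    if patient.hbv_dna_positive or patient.hbv_surface_antigen_positive:
        return "HBV"
    # Step 2: diagnostic codes, checked only once viral testing is negative.
    codes = {normalize(c) for c in patient.diagnosis_codes}
    if codes & AUTOIMMUNE_CODES:  # order vs. hemochromatosis is assumed
        return "Autoimmune"
    if codes & HEMOCHROMATOSIS_CODES:
        return "Hemochromatosis"
    # Step 3: alcohol-related codes.
    if codes & ALCOHOL_CODES:
        return "Alcohol-related"
    # Step 4: everything else falls into the residual category.
    return "NAFLD/cryptogenic"
```

Because each step returns as soon as it matches, a patient with a positive HCV RNA and an alcohol-related code is assigned HCV, which mirrors the rule that a cause lower in the hierarchy is considered only after those above it are excluded.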

Fig 1. Hierarchical algorithm to define CLD and cirrhosis etiology in administrative healthcare data.

HCV: hepatitis C; HBV: hepatitis B; sAg: surface antigen; AIH: autoimmune hepatitis; PBC: primary biliary cholangitis; PSC: primary sclerosing cholangitis.

Table 1. ICD-9 and 10 codes used to define CLD and cirrhosis etiology in ICES data.

| Condition | ICD-9 codes | ICD-10 codes |
|---|---|---|
| Autoimmune disease | | |
| Primary biliary cholangitis or biliary cirrhosis | 571.6 | K74.3 |
| Primary sclerosing cholangitis (PSC) | 576.1 | K83.0 |
| Autoimmune hepatitis (AIH) | 571.42 | K75.4 |
| Metabolic disorder | | |
| Hereditary hemochromatosis (HH) | 275.0 | E83.10 |
| Alcohol-related codes | | |
| Acute intoxication | 305.0 | F100 |
| Harmful alcohol use | 305.0 | F101 |
| Alcohol dependence | 303 | F102 |
| Alcohol withdrawal | 291.0 | F103, F104 |
| Other alcohol-related psychoses | 291 | F105-F109 |
| Accidental or intentional poisoning by alcohol | | X45, Y15, X65 |
| Alcoholic fatty liver | 571.0 | K700 |
| Alcoholic hepatitis | 571.1 | K701 |
| Alcoholic fibrosis and sclerosis of liver | 571.2 | K702 |
| Alcoholic cirrhosis | 571.2 | K703 |
| Alcoholic hepatic failure | | K704 |
| Alcoholic liver disease, unspecified | 571.3 | K709 |
| Alcoholic gastritis | 535.3 | K292 |
| Degeneration of nervous system due to alcohol | | G312 |
| Alcoholic polyneuropathy | 357.5 | G621 |
| Alcoholic myopathy | | G721 |
| Alcoholic cardiomyopathy | 425.5 | I426 |
| Alcohol-induced pancreatitis | | K852, K860 |
| Alcohol-induced pseudo Cushing’s syndrome | | E244 |
| Toxic effect of alcohol | 980.0, 980.9 | T510, T519 |
| Finding of alcohol in blood | 790.3 | R780 |
| Maternal care for (suspected) damage to fetus from alcohol | 760.71 | O354, Q860, P043 |

Statistical analysis

Descriptive characteristics of the liver disease cohort from KHSC were summarized overall and stratified by cirrhosis status. Frequencies and proportions were calculated for categorical variables (sex, income quintile, rurality, and previous hospitalization or ER visits), while medians and interquartile ranges were calculated for the numeric variables (age, MELD, laboratory values). Differences based on cirrhosis status were compared using t-tests and chi-squared tests.

The algorithm’s ability to accurately identify the most plausible cause of CLD and cirrhosis when applied to administrative data was evaluated against the clinical data, stratified by cirrhosis status. Sensitivity with 95% confidence intervals was calculated as the proportion of patients with a specified cause of CLD or cirrhosis identified through administrative data over the number of patients assigned the same cause based on gold standard clinical diagnoses. Specificity with 95% confidence intervals was calculated as the proportion of patients without the specified cause of CLD or cirrhosis in the administrative data over the number of patients without the same cause based on gold standard clinical diagnosis. Positive predictive values (PPV), negative predictive values (NPV), and Cohen’s kappa with 95% confidence intervals were also calculated, where a kappa >0.60 indicates substantial agreement and a kappa >0.80 indicates almost perfect agreement.[16] All analyses were performed using SAS version 9.4 (SAS Institute, Cary, NC, USA).
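The point estimates defined above can be written out for a single etiology as a 2×2 comparison of the algorithm against the gold standard. The sketch below is illustrative rather than the study's SAS code: the class and function names are hypothetical, the counts are made up and are not study data, and confidence intervals are omitted because the text does not state which interval method was used. It assumes every row and column margin of the table is non-zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one etiology: algorithm result vs. gold standard."""
    tp: int  # algorithm positive, gold standard positive
    fp: int  # algorithm positive, gold standard negative
    fn: int  # algorithm negative, gold standard positive
    tn: int  # algorithm negative, gold standard negative


def diagnostic_accuracy(t: TwoByTwo) -> dict:
    """Return sensitivity, specificity, PPV, NPV and Cohen's kappa."""
    n = t.tp + t.fp + t.fn + t.tn
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    ppv = t.tp / (t.tp + t.fp)
    npv = t.tn / (t.tn + t.fn)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (t.tp + t.tn) / n
    expected = ((t.tp + t.fp) * (t.tp + t.fn)
                + (t.tn + t.fn) * (t.tn + t.fp)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "kappa": kappa,
    }


# Made-up counts for illustration only.
example = diagnostic_accuracy(TwoByTwo(tp=90, fp=5, fn=10, tn=95))
```

With these made-up counts the sensitivity is 0.90, the specificity is 0.95 and kappa is 0.85, which would fall in the "almost perfect agreement" band of the scale cited above.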

Results

Description of the KHSC cohort

A total of 442 unique patients underwent detailed chart abstraction (Table 2). The median age at the time of the clinic visit was 57 years (IQR 49–62), 261 (59%) were male and 233 (53%) had cirrhosis. Fibrosis assessment was based on non-invasive testing or clinical decompensation in 336/442 (76%), with the remaining 106/442 (24%) based on liver biopsy. Of those with cirrhosis, 93 (40%) had a history of decompensation. The most common gold standard chronic liver disease etiologies were HCV (199, 45%), followed by NAFLD/cryptogenic (115, 26%), and alcohol-related disease (45, 10%). As expected, those with cirrhosis were older (median age 59 vs. 52 years) and had higher MELD scores and lower platelet counts compared to those without cirrhosis. The majority of patients had at least one hospitalization (311, 70%) or ER visit (415, 94%) identified prior to their clinic visit. The Cohen’s kappa for the re-abstracted charts (n = 20) showed complete agreement (kappa = 1).

Table 2. Demographics of patients evaluated in the liver clinic at Kingston Health Sciences Centre May 2013–August 2013.

| Characteristic | Overall (N = 442) | Cirrhosis (N = 233) | No cirrhosis (N = 209) | P-value^ |
|---|---|---|---|---|
| Age, median (IQR) | 57 (49–62) | 59 (54–64) | 52 (41–60) | < .001 |
| Sex, n (%) | | | | |
| • Male | 261 (59) | 150 (64) | 111 (53) | .039 |
| • Female | 181 (41) | 83 (36) | 98 (47) | |
| Income quintile*, n (%) | | | | |
| • 1 | 120 (27) | 64 (28) | 56 (27) | .516 |
| • 2 | 101 (23) | 58 (25) | 43 (21) | |
| • 3 | 82 (19) | 46 (20) | 36 (17) | |
| • 4 | 74 (17) | 33 (14) | 41 (20) | |
| • 5/Missing | 65 (15) | 32 (14) | 33 (16) | |
| Rural, n (%) | 115 (26) | 57 (25) | 58 (28) | .478 |
| Gold standard etiology, n (%) | | | | |
| • Hepatitis C | 199 (45) | 115 (49) | 84 (40) | < .001 |
| • Hepatitis B | 37 (9) | 13 (6) | 24 (11) | |
| • Alcohol-related | 45 (10) | ≤40 (<20) | ≤10 (<5) | |
| • NAFLD/Cryptogenic | 115 (26) | 40 (17) | 75 (36) | |
| • Autoimmune | 40 (9) | 24 (10) | 16 (8) | |
| • Hemochromatosis | 6 (1) | ≤5 (<3) | ≤10 (<5) | |
| Most recent laboratory values | | | | |
| Bilirubin (umol/L), median (IQR) | 15 (11–22) | 18 (13–30) | 12 (10–16) | < .001 |
| AST (U/L), median (IQR) | 40 (28–65) | 50 (34–81) | 32 (24–46) | < .001 |
| ALT (U/L), median (IQR) | 36 (23–65) | 36 (23–70) | 36 (22–59) | .245 |
| Alk-P (U/L), median (IQR) | 82 (65–117) | 98 (76–136) | 72 (58–89) | < .001 |
| Albumin (g/L), median (IQR) | 38 (33–41) | 35 (31–39) | 40 (37–42) | < .001 |
| INR, median (IQR) | 1.10 (1.00–1.20) | 1.20 (1.10–1.30) | 1.00 (1.00–1.10) | < .001 |
| MELD, median (IQR) | 9 (7–12) | 10 (8–12) | 7 (6–8) | < .001 |
| Platelets (×10^9/L), median (IQR) | 161 (106–220) | 119 (86–166) | 212 (175–261) | < .001 |
| History of decompensation, n (%) | 93 (40) | 93 (40) | n/a | n/a |
| At least 1 hospitalization before clinic visit, n (%) | 311 (70) | 176 (76) | 135 (65) | .012 |
| At least 1 hospitalization within 1 year before clinic visit, n (%) | 85 (19) | 62 (27) | 23 (11) | < .001 |
| At least 1 ER visit before clinic visit, n (%) | 415 (94) | 223 (96) | 192 (92) | .092 |
| At least 1 ER visit within 1 year before clinic visit, n (%) | 189 (43) | 120 (52) | 69 (33) | < .001 |

^ Comparing those with and without cirrhosis.

* 1 = lowest, 5 = highest.

Autoimmune includes autoimmune hepatitis, primary biliary cholangitis, and primary sclerosing cholangitis.

IQR: interquartile range; NAFLD: non-alcoholic fatty liver disease; MELD: model for end-stage liver disease; ER: emergency room.

Validation of chronic liver disease etiology algorithm

The results of the validation for those with cirrhosis are shown in Table 3. Due to low numbers of patients with both hemochromatosis and cirrhosis, this etiology was not validated. In patients with cirrhosis, the coding algorithm showed excellent specificity and NPV, with values ≥ 95% for all gold standard diagnoses. Sensitivity and PPV were > 80% for all etiologies with the exception of NAFLD/cryptogenic, where the sensitivity was slightly lower at 75% and the PPV was 77%. Further, the kappa value was >0.7 for all etiologies, showing excellent agreement. Results for those without cirrhosis are shown in Table 4. Again, excellent sensitivity and specificity were demonstrated for viral hepatitis, alcohol-related disease, and NAFLD/cryptogenic etiologies. Although the specificity and NPV for autoimmune liver disease and hemochromatosis were high, the sensitivities were low at 56% and 40%, with kappa values of 0.67 and 0.49, respectively.

Table 3. Validation of etiology in patients with cirrhosis.

| Etiology | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | Kappa (95% CI) |
|---|---|---|---|---|---|
| Hepatitis C | 0.97 (0.93–0.99) | 1.00 (0.97–1.00) | 1.00 (0.97–1.00) | 0.98 (0.93–0.99) | 0.98 (0.95–1.00) |
| Hepatitis B | 0.92 (0.64–1.00) | 0.99 (0.97–1.00) | 0.86 (0.57–0.98) | 1.00 (0.97–1.00) | 0.88 (0.75–1.00) |
| Alcohol-related | 0.90 (0.76–0.97) | 0.96 (0.92–0.98) | 0.82 (0.67–0.92) | 0.98 (0.95–0.99) | 0.83 (0.73–0.92) |
| NAFLD/Cryptogenic | 0.75 (0.59–0.87) | 0.95 (0.91–0.98) | 0.77 (0.61–0.89) | 0.95 (0.91–0.97) | 0.71 (0.59–0.83) |
| Autoimmune | 0.83 (0.63–0.95) | 0.99 (0.97–1.00) | 0.91 (0.71–0.99) | 0.98 (0.95–0.99) | 0.86 (0.73–0.92) |

NAFLD: non-alcoholic fatty liver disease; PPV: positive predictive value; NPV: negative predictive value

Table 4. Validation of etiology in patients without cirrhosis.

| Etiology | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | Kappa (95% CI) |
|---|---|---|---|---|---|
| Hepatitis C | 0.90 (0.82–0.96) | 1.00 (0.97–1.00) | 1.00 (0.95–1.00) | 0.94 (0.88–0.97) | 0.92 (0.86–0.97) |
| Hepatitis B | 0.96 (0.79–1.00) | 0.99 (0.97–1.00) | 0.96 (0.79–1.00) | 0.99 (0.97–1.00) | 0.95 (0.89–1.00) |
| Alcohol-related | 0.80 (0.28–0.99) | 0.97 (0.94–0.99) | 0.40 (0.12–0.74) | 1.00 (0.97–1.00) | 0.52 (0.21–0.83) |
| NAFLD/Cryptogenic | 0.92 (0.83–0.97) | 0.87 (0.80–0.92) | 0.80 (0.70–0.88) | 0.95 (0.90–0.98) | 0.77 (0.68–0.86) |
| Autoimmune | 0.56 (0.30–0.80) | 0.99 (0.97–1.00) | 0.90 (0.56–1.00) | 0.96 (0.93–0.99) | 0.67 (0.46–0.88) |
| Hemochromatosis | 0.40 (0.05–0.85) | 1.00 (0.97–0.99) | 0.67 (0.09–0.99) | 0.99 (0.96–1.00) | 0.49 (0.06–0.92) |

NAFLD: non-alcoholic fatty liver disease; PPV: positive predictive value; NPV: negative predictive value

Discussion

In this study, we validated a sensitive and highly specific hierarchical algorithm within administrative data to assign a chronic liver disease etiology in patients with cirrhosis. Additionally, to our knowledge, this is the first validation of a hierarchical algorithm to define liver disease etiology within administrative data using a combination of viral hepatitis serology and ICD-9 and ICD-10 codes. The results of this study will facilitate investigators’ ability to perform both epidemiologic and health services research in patients with cirrhosis using routinely collected administrative healthcare data.

The natural history of CLD and cirrhosis is closely linked to the underlying disease etiology. For example, effective and well-tolerated treatments are available for both HCV and HBV that have been shown to alter their natural history. In patients with alcohol-related liver disease, alcohol abstinence is the mainstay of therapy. This is in contrast to NAFLD where, other than lifestyle interventions, no medical therapy is currently approved that has been shown to alter the disease trajectory. Further, from a public health perspective, strategies to identify and manage certain subtypes of patients with liver disease are important. Therefore, it is essential to be able to categorize patients with liver disease by disease etiology to better understand trends in disease epidemiology, healthcare utilization and clinical outcomes.

Our validation cohort is reflective of a population of patients with chronic liver disease that would be evaluated in a general outpatient Internal Medicine, Gastroenterology, or Hepatology practice. This cohort included men and women both with and without cirrhosis, and also with a history of decompensated liver disease residing both in urban and rural settings. The most common causes of liver disease in our cohort are reflective of the causes of liver disease in the general population of North America with the majority having either HCV, NAFLD or alcohol-related disease. Therefore, this study has external validity.

The diagnostic accuracy of the algorithm to define liver disease etiology was superior in patients who had cirrhosis. This may be due to a more thorough evaluation in someone with cirrhosis compared to someone without. Alternatively, it may be explained by patients with cirrhosis having more contact with the healthcare system and therefore more diagnostic codes recorded in the medical record. This is reflected by the fact that more patients with cirrhosis were hospitalized or had an ER visit prior to the date of their KHSC clinic visit compared to those without.

Based on previous work which has evaluated study power in diagnostic tests, our cohort of patients from KHSC had adequate power (>80%) to determine the sensitivities and specificities for HCV, HBV, alcohol-related, NAFLD/cryptogenic, and autoimmune etiologies; however, power was lacking to define hereditary hemochromatosis.[17] The highest diagnostic accuracy was seen in patients with viral hepatitis, both with and without cirrhosis. This is likely because these diagnoses were defined using PHO viral serology data, whereas the others relied on ICD or OHIP coding. Additionally, we were able to define alcohol-related etiology with excellent accuracy, especially in those with cirrhosis. These results are very comparable to the validation of both viral hepatitis and alcohol-related disease done in patients with cirrhosis in the VA system in the United States.[10] Overall, the sensitivity and PPV to define patients with NAFLD/cryptogenic cirrhosis were slightly lower than those for viral hepatitis or alcohol-related disease; however, the accuracy remained acceptable for research purposes. In general, using an algorithm with a high specificity is preferable to identify a cohort of patients as it maximizes the likelihood that individuals have the condition of interest. Further, our results are comparable to the only other validation study of NAFLD in administrative data, which was also performed in the VA.[9] Approximately 10% of our cohort had autoimmune liver diseases or hereditary hemochromatosis and therefore, especially in patients without cirrhosis, the accuracy was lower compared to other etiologies. This is further explained by the fact that outpatient OHIP billing codes do not have specific diagnostic codes for the autoimmune liver conditions or hereditary hemochromatosis. Therefore, to receive a specific ICD code for these conditions, the individual would need to have had it recorded during a hospital admission or emergency room visit, which is less frequent in those without cirrhosis (Table 1). Previous work has also shown that the ability to define PSC in administrative data is suboptimal.[7] Therefore, in patients without cirrhosis, the limited ability of administrative data to define these etiologies should be taken into consideration.

The results of this study should be considered in light of methodologic limitations. First, the cohort used for the validation was derived from a single-center outpatient hepatology practice and therefore may not be reflective of the entire population of patients with CLD and cirrhosis. However, we believe this algorithm would be applicable to any individual assessed for CLD etiology who has received a general evaluation for causes of CLD as recommended by guidelines, given that our cohort was derived largely from referrals by primary care practitioners. Additionally, the catchment of KHSC is approximately 1 million individuals in Ontario with 25% rural residence; however, the ethnic diversity of our catchment area would be less than that of a major urban center. Second, we did not evaluate the ability of the administrative data to identify two or more causes of chronic liver disease. Third, due to low numbers of patients with autoimmune liver conditions, we were unable to evaluate each specific condition (AIH, PBC, PSC) separately. Further, there were no patients in the cohort who had diagnoses of alpha-1 antitrypsin deficiency or Wilson disease, and therefore the ability of administrative data to define these chronic liver diseases is unknown. In large administrative data, patients with these diagnoses would be grouped by this algorithm into the NAFLD/cryptogenic category. However, given the rarity of these conditions, they would contribute < 1% of the CLD/cirrhosis population. Finally, this validation was done in administrative data from a universally insured healthcare system and may not be generalizable to other types of administrative health data.

In conclusion, a hierarchical coding algorithm in administrative healthcare data is able to define CLD and cirrhosis etiology with excellent diagnostic accuracy using a combination of viral hepatitis serology and administrative diagnostic coding, especially in individuals with cirrhosis. These results should facilitate future health services research in this growing patient population.

Acknowledgments

This study was supported by ICES, which is funded by an annual grant from the Ontario Ministry of Health and Long-Term Care (MOHLTC). The opinions, results, and conclusions reported in this paper are those of the authors and are independent from the funding sources. No endorsement by ICES or the MOHLTC is intended or should be inferred. Parts of this material are based on data and information compiled and provided by CIHI, MOHLTC, and PHO. However, the analyses, conclusions, opinions, and statements expressed herein are those of the author, and not necessarily those of CIHI, MOHLTC, and PHO.

Data Availability

The authors are affiliated with ICES and conducted the study in fulfillment of ICES’ mandate as a prescribed entity under Ontario’s Personal Health Information Protection Act. As a result, the authors were authorized, both legally and contractually, to access the data set in a more granular form than approved third party researchers would be permitted to access. The data set that approved third party researchers would be permitted to access would be adjusted to ensure the risk of re-identification of any underlying individuals is low. While data sharing agreements and privacy legislation for the province of Ontario prohibit ICES from making the data set publicly available, access may be granted to those who meet pre-specified criteria for confidential access, available at www.ices.on.ca/DAS. Please see the Methods Section for an outline of all datasets used.

Funding Statement

JAF: American Association for the Study of Liver Disease Foundation Clinical, Translational and Outcomes Research Award in Liver Disease (http://www.aasldfoundation.org/); Southeastern Ontario Academic Medical Association New Clinician Scientist Award (https://www.seamo.ca/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  1. Lozano R, Naghavi M, Foreman K, Lim S, Shibuya K, Aboyans V, et al. Global and regional mortality from 235 causes of death for 20 age groups in 1990 and 2010: a systematic analysis for the Global Burden of Disease Study 2010. The Lancet. 2012. December 15;380(9859):2095–128.
  2. Heron M. National Vital Statistics Reports Volume 65, Number 5 June 30, 2016 Deaths: Leading Causes for 2014. 2016;65(5).
  3. Flemming JA, Dewit Y, Mah JM, Saperia J, Groome P, Booth CM. Increased Cirrhosis Incidence in Young Birth Cohorts from 1997–2016: A population-based study. 2019. Lancet Gastroenterol Hepatol. 4(3): 217–226. 10.1016/S2468-1253(18)30339-X
  4. Tapper EB and Parikh ND. Mortality due to cirrhosis and liver cancer in the United States 1999–2016: observational study. BMJ. 2018; 18(362), epub ahead of print.
  5. Pratt DS, Kaplan MM. Evaluation of abnormal liver-enzyme results in asymptomatic patients. N Engl J Med. 2000;342(17):1266 10.1056/NEJM200004273421707
  6. Myers RP, Shaheen AA, Fong A, Wan AF, Swain MG, Hilsden RJ et al. Validation of coding algorithms for the identification of patients with primary biliary cirrhosis using administrative data. Can J Gastroenterol. 2010. March; 24(3): 175–182. 10.1155/2010/237860
  7. Molodecky NA, Myers RP, Barkema HW, Quan H, Kaplan GG. Validity of administrative data for the diagnosis of primary sclerosing cholangitis: a population‐based study. 2011. Liver International; 31(5): 712–720. 10.1111/j.1478-3231.2011.02484.x
  8. Nui B, Forde KA, Goldberg DS. Coding algorithms for identifying patients with cirrhosis and hepatitis B or C virus using administrative data. Pharmacoepidemiol Drug Saf. 2015. January;24(1):107–11. 10.1002/pds.3721
  9. Husain N, Blais P, Kramer J, Kowalkowski M, Richardson P, El-Serag HB, et al. Nonalcoholic fatty liver disease (NAFLD) in the Veterans Administration population: development and validation of an algorithm for NAFLD using automated data. 2014. AP&T; 40(8): 949–954.
  10. Kramer JR, Davila JA, Miller ED, Richardson P, Giordano TP, El-Serag HB. The validity of viral hepatitis and chronic liver disease diagnoses in Veterans Affairs administrative databases. 2008. AP&T; 27(3):274–282.
  11. Caldwell SH, Oelsner DH, Iezzoni JC, Hespenheide EE, Battle EH, Driscoll CJ. Cryptogenic cirrhosis: clinical characterization and risk factors for underlying disease. Hepatology. 1999. March;29(3):664–9. 10.1002/hep.510290347
  12. Statistics Canada. Health system indicators (Canadian Institute for Health Information–CIHI). http://www.statcan.gc.ca/pub/82-221-x/2013001/quality-qualite/qua8-eng.htm. Oct 1, 2019.
  13. Beste LA, Leipertz SL, Green PK, Dominitz J, Ross D, Ioannou G. Trends in Burden of Cirrhosis and Hepatocellular Carcinoma by Underlying Liver Disease in US Veterans, 2001–2013. Gastroenterology. 2015. 149:1471–1482. 10.1053/j.gastro.2015.07.056
  14. Hwang SW, Agha MM, Creatore MI, Glazier RH. Age- and sex-specific income gradients in alcohol-related hospitalization rates in an urban area. Ann Epidemiol. 2005;15:56–63. 10.1016/j.annepidem.2004.04.003
  15. Myran DT, Hsu AT, Smith G, Tanuseputro P. Rates of emergency department visits attributable to alcohol use in Ontario from 2003 to 2016: a retrospective population-level study. CMAJ. 2019. July 22;191:E804–10. 10.1503/cmaj.181575
  16. Landis J, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977; 33(1): 159–174.
  17. Bujang MA, Adnan TH. Requirements for minimal sample size for sensitivity and specificity analysis. J Clin Diagn Res. 2016; 10(10): YE01–YE06. 10.7860/JCDR/2016/18129.8744

Decision Letter 0

Wenyu Lin

14 Nov 2019

PONE-D-19-27866

Validation of a hierarchical algorithm to define chronic liver disease and cirrhosis etiology in administrative healthcare data

PLOS ONE

Dear Dr. Flemming,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised by two reviewers during the review process.

We would appreciate receiving your revised manuscript by Dec 29 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Wenyu Lin, PhD

Academic Editor

PLOS ONE

Journal Requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In the ethics statement in the manuscript and in the online submission form, please provide additional information about the patient records used in your retrospective study, including: a) whether all data were fully anonymized before you accessed them and b) the date range (month and year) during which patients' medical records were accessed.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In this study, George Philip et al. validated a hierarchical algorithm for CLD and cirrhosis etiology in administrative healthcare data. Diagnostic accuracy of a hierarchical algorithm incorporating both laboratory and administrative codes to define etiology was evaluated by calculating sensitivity, specificity, positive (PPV) and negative predictive values (NPV), and kappa’s agreement. The use of a hierarchical coding algorithm is able to define CLD and cirrhosis etiology with excellent diagnostic accuracy using a combination of viral hepatitis serology and administrate diagnostic coding, especially in individuals with cirrhosis.

The findings are overall interesting, even the cohort is rather small. I have a number of major issues with this manuscript.

Specific comments to improve the study are listed below:

1. Did the authors perform power calculation? There were 442 patients, is that enough?

2. Cirrhosis was identified based on the presence of any decompensation event. In addition, a liver biopsy result of F4 fibrosis. This is not the golden diagnose. How many patients were diagnosed by these two methods respectively?

3. In the table 3 and 4, the authors should describe specific data, not just Sensitivity, Specificity, PPV etal.

4. In the table 4, “Autoimmune”, the Specificity, PPV and NPV were all high, but only Sensitivity was low? Why?

5. In the table 4, “Hemochromatosis”, the Specificity and NPV were high, but Sensitivity and PPV were low? Please explain the results in discussion. Is it due to the small number of patients? Please offer the clear numbers.

6. there were some limitations in the manuscript, such as all patients from one city, but the authors did not refer some shortages.

Reviewer #2: In the manuscripts, a hierarchical algorithm is applied to find etiology of CLD/cirrhosis by using administrative healthcare data, and the resulting etiology is compared to golden standard identified ones. The data is comprehensive and statistics are valid. There’s some concerns:

• In the abstracts, the term “sensitivity/NPV” and “specificity/PPV” is mismatched.

• In the introduction, author states “The most common causes of CLD and cirrhosis in North America are secondary to chronic viral hepatitis B (HBV)….... ”, so what’s primary causes?

• In Table 2, Golden standard etiology categorical data showed less accurate number and percentage of Alcoholic related and hemochromatosis, could author supply addition explanation?

• “93 (40%) had a history of decompensation” did not appeared in Table 2

• The data did not fully support the conclusion “A hierarchical algorithm incorporating laboratory and administrative coding can accurately define CLD and cirrhosis etiology in routinely collected healthcare data”. For in non-cirrhosis patients, the sensitivities were low at 56% and 40% respectively with kappa values of 0.67 and 0.49 respectively. Author should be more cautious to make the conclusion.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Feb 18;15(2):e0229218. doi: 10.1371/journal.pone.0229218.r002

Author response to Decision Letter 0


7 Jan 2020

POINT-BY-POINT RESPONSE TO REVIEWERS

PONE-D-19-27866: Validation of a hierarchical algorithm to define chronic liver disease and cirrhosis etiology in administrative healthcare data

We would like to sincerely thank the editorial board and the peer reviewers for the review and critique of our manuscript with the opportunity to re-submit a revised version. We believe that we have been able to respond to all comments and concerns and feel that this review has greatly improved the manuscript. Please see the below responses and attached revised manuscript.

Reviewer #1: In this study, George Philip et al. validated a hierarchical algorithm for CLD and cirrhosis etiology in administrative healthcare data. Diagnostic accuracy of a hierarchical algorithm incorporating both laboratory and administrative codes to define etiology was evaluated by calculating sensitivity, specificity, positive (PPV) and negative predictive values (NPV), and kappa’s agreement. The use of a hierarchical coding algorithm is able to define CLD and cirrhosis etiology with excellent diagnostic accuracy using a combination of viral hepatitis serology and administrate diagnostic coding, especially in individuals with cirrhosis.

The findings are overall interesting, even the cohort is rather small. I have a number of major issues with this manuscript.

Specific comments to improve the study are listed below:

1. Did the authors perform power calculation? There were 442 patients, is that enough?

Response: Thank you for bringing this to our attention; changes made. You are right that the original manuscript did not comment on study power. Because we use a hierarchical algorithm to define several different causes of cirrhosis and CLD, our study differs from a standard diagnostic accuracy study: the power to determine the presence or absence of each disease state depends on the prevalence of that condition in the group under study. We have now calculated our power by etiology based on the following manuscript (J Clin Diagn Res. 2016 Oct; 10(10): YE01–YE06). We believe we have adequate power to define the sensitivity and specificity for: 1) HCV (based on an anticipated prevalence of 20%-30%, we required between 50-200 patients for sensitivity/specificity >90% [total cases 199]); 2) alcohol-related disease (based on anticipated prevalence of 20%, we required ~25-50 patients for sensitivity/specificity >80% [total cases 45]); 3) NAFLD/cryptogenic (based on anticipated prevalence of 20%, we required ~107 cases for sensitivity/specificity >90% [total cases 115]); 4) HBV (based on anticipated prevalence of 5%, we required ~5-20 patients for sensitivity/specificity >80% [total cases 37]); and 5) autoimmune (based on anticipated prevalence of 5%, we required ~5-20 patients for sensitivity/specificity >80% [total cases 40]). Our power to define hemochromatosis is not ideal given the low number of cases (6 in total), which we had commented on previously. We have clarified this in the Discussion with reference to the above-mentioned paper (see Discussion paragraph 5).
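
As an illustration of why the small number of hemochromatosis cases limits precision, the sketch below computes a Wilson score interval for sensitivity at two case totals. This is not the table-based sample size method of the cited paper, and it is not part of our analysis; the Wilson interval is swapped in here only as a convenient way to show the effect of sample size. The case totals (6 and 199) are taken from the response above, while the true-positive counts and the function name `wilson_interval` are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% when z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical true-positive counts; only the case totals come from the text.
examples = [("hemochromatosis", 3, 6), ("HCV", 179, 199)]

for label, true_positives, cases in examples:
    lower, upper = wilson_interval(true_positives, cases)
    print(
        f"{label}: sensitivity {true_positives / cases:.2f}, "
        f"95% CI {lower:.2f} to {upper:.2f} (width {upper - lower:.2f})"
    )
```

With only 6 cases the interval spans most of the plausible range, whereas with 199 cases it is narrow, which is consistent with the caution expressed above about hemochromatosis.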

2. Cirrhosis was identified based on the presence of any decompensation event. In addition, a liver biopsy result of F4 fibrosis. This is not the golden diagnose. How many patients were diagnosed by these two methods respectively?

Response: Thank you for this comment; changes made. F4 fibrosis on liver biopsy, based on the METAVIR staging system, defines cirrhosis and is the gold standard for its diagnosis (J Hepatol. 2007: 47:598-607). However, in clinical practice, an individual with a known chronic liver disease who has a liver decompensation event is also considered to have cirrhosis, and biopsy is not required. We have clarified in the Results section that, in the cohort, 336/442 (76%) had fibrosis staging based on non-invasive or clinical data and 106/442 (24%) were staged based on biopsy data. See Results section, paragraph 1.

3. In the table 3 and 4, the authors should describe specific data, not just Sensitivity, Specificity, PPV etal.

Response: We appreciate this comment; however, as written, it is unclear exactly which specific data the reviewer is referring to. In studies of diagnostic accuracy, it is important to report sensitivity, specificity, positive predictive value, negative predictive value and kappa. If there are other parameters that we have omitted, we would be grateful if the reviewer could clarify this statement or make a specific suggestion, and we will be happy to address the omission.
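
For readers less familiar with these measures, the following minimal sketch shows how all five are derived from a single 2x2 table comparing the algorithm against the gold standard for one etiology. The counts are hypothetical and do not correspond to any cell in Tables 3 or 4; the names `TwoByTwo` and `diagnostic_accuracy` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # algorithm positive, gold standard positive
    fp: int  # algorithm positive, gold standard negative
    fn: int  # algorithm negative, gold standard positive
    tn: int  # algorithm negative, gold standard negative


def diagnostic_accuracy(t: TwoByTwo) -> dict[str, float]:
    """Sensitivity, specificity, PPV, NPV and Cohen's kappa from a 2x2 table.

    Raises ZeroDivisionError if any row or column margin is empty.
    """
    n = t.tp + t.fp + t.fn + t.tn
    observed = (t.tp + t.tn) / n
    # Agreement expected by chance, from the row and column margins.
    expected = (
        (t.tp + t.fp) * (t.tp + t.fn) + (t.fn + t.tn) * (t.fp + t.tn)
    ) / n**2
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical counts for a single etiology.
table = TwoByTwo(tp=40, fp=5, fn=10, tn=145)
for name, value in diagnostic_accuracy(table).items():
    print(f"{name}: {value:.3f}")
```

Because PPV and NPV depend on how common the etiology is in the cohort while sensitivity and specificity do not, the same algorithm can show quite different predictive values in cohorts with different case mixes.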

4. In the table 4, “Autoimmune”, the Specificity, PPV and NPV were all high, but only Sensitivity was low? Why?

Response: Thank you for this observation. Unlike many of the other chronic liver diseases, the diagnosis of autoimmune hepatitis is based on a scoring system of clinical criteria that includes autoimmune markers, liver biopsy findings and the lack of other clinical diagnoses. Given that the sensitivity was low only in those without cirrhosis, we have two potential explanations. First, liver biopsies were likely performed less often to secure the diagnosis of autoimmune liver disease in individuals without cirrhosis, as is common in clinical practice. Second, unlike ICD coding, outpatient OHIP billing codes do not include codes specific to the autoimmune liver diseases (including PBC and PSC). Therefore, in individuals without cirrhosis, who are less likely to be admitted to hospital or visit the emergency room, the sensitivity is lower than in those with cirrhosis (sensitivity 0.83).

5. In the table 4, “Hemochromatosis”, the Specificity and NPV were high, but Sensitivity and PPV were low? Please explain the results in discussion. Is it due to the small number of patients? Please offer the clear numbers.

Response: Thanks; changes made. In diagnostic accuracy studies, there is typically a trade-off between sensitivity and specificity, with one going down as the other goes up. As in our response to comment #4 above, outpatient OHIP billing codes are not specific for hemochromatosis; therefore, an ICD code for hemochromatosis would only be captured if the individual was admitted to hospital or had an emergency room visit. Table 2 shows that hospitalization/ER visits were less frequent in those without cirrhosis. We have added a comment regarding this in the Discussion section, paragraph 5. Unfortunately, due to privacy agreements, we are unable to describe cells with N <=5, as outlined in the Methods section.
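
The small-cell restriction mentioned above can be sketched as a simple masking step applied before any table is reported. This is only an illustration under stated assumptions: the threshold of 5 is taken from the text, but the function name `suppress_small_cells`, the masked label, the choice to mask every count at or below the threshold (including zero), and the example counts are all hypothetical and do not describe the actual ICES procedure.

```python
def suppress_small_cells(counts: dict[str, int], threshold: int = 5) -> dict[str, str]:
    """Return reportable cell values, masking any count at or below the threshold."""
    masked: dict[str, str] = {}
    for name, count in counts.items():
        if count < 0:
            raise ValueError(f"negative count for {name!r}")
        # Assumption: every count <= threshold is masked, including zero.
        masked[name] = f"<={threshold}" if count <= threshold else str(count)
    return masked


# Hypothetical cell counts, not taken from the manuscript tables.
cells = {"true positives": 3, "false negatives": 3, "true negatives": 230}
print(suppress_small_cells(cells))
```

Note that masking alone may not be enough when row or column totals are also reported, since a masked cell can sometimes be recovered by subtraction; handling that case is outside this sketch.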

6. there were some limitations in the manuscript, such as all patients from one city, but the authors did not refer some shortages.

Response: Thanks – change made. We have an entire paragraph dedicated to the limitations of the study. Please see the limitations section of the Discussion, paragraph 6 where four major limitations are discussed (1: single center study; 2: inability to define two causes of liver disease; 3: low numbers of the rarer liver conditions; 4: databases from universal healthcare system). We have expanded upon the fact that these patients were all from one center as suggested.

Reviewer #2: In the manuscripts, a hierarchical algorithm is applied to find etiology of CLD/cirrhosis by using administrative healthcare data, and the resulting etiology is compared to golden standard identified ones. The data is comprehensive and statistics are valid. There’s some concerns:

• In the abstracts, the term “sensitivity/NPV” and “specificity/PPV” is mismatched.

Response: Thanks for pointing this out! Change made; see Abstract.

• In the introduction, author states “The most common causes of CLD and cirrhosis in North America are secondary to chronic viral hepatitis B (HBV)….... ”, so what’s primary causes?

Response: Thanks for pointing out that this sentence was not clear to readers. Please see change to the sentence in the Introduction, paragraph 2.

• In Table 2, Golden standard etiology categorical data showed less accurate number and percentage of Alcoholic related and hemochromatosis, could author supply addition explanation?

Response: Thank you for pointing this out. We believe you are referring to the total number of patients with alcohol-related liver disease and hemochromatosis. Given that this cohort was drawn from patients seen at a referral center, the numbers likely reflect that patients with alcohol-related liver disease may be: 1) less likely to be referred or 2) less likely to attend outpatient appointments compared to patients with other causes of liver disease. With respect to hemochromatosis, patients without liver disease are typically managed with phlebotomy by colleagues in Hematology and may not present to Hepatology clinics.

• “93 (40%) had a history of decompensation” did not appeared in Table 2

Response: Thanks - change made. Please see updated Table 2.

• The data did not fully support the conclusion “A hierarchical algorithm incorporating laboratory and administrative coding can accurately define CLD and cirrhosis etiology in routinely collected healthcare data”. For in non-cirrhosis patients, the sensitivities were low at 56% and 40% respectively with kappa values of 0.67 and 0.49 respectively. Author should be more cautious to make the conclusion.

Response: Thank you – change made. Please see revised Abstract and Discussion paragraph 1.

Attachment

Submitted filename: Flemming_Validation_further info Dec 13.docx

Decision Letter 1

Wenyu Lin

3 Feb 2020

Validation of a hierarchical algorithm to define chronic liver disease and cirrhosis etiology in administrative healthcare data

PONE-D-19-27866R1

Dear Dr. Flemming,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Wenyu Lin, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Acceptance letter

Wenyu Lin

6 Feb 2020

PONE-D-19-27866R1

Validation of a hierarchical algorithm to define chronic liver disease and cirrhosis etiology in administrative healthcare data

Dear Dr. Flemming:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Wenyu Lin

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.


    Data Availability Statement

    The authors are affiliated with ICES and conducted the study in fulfillment of ICES’ mandate as a prescribed entity under Ontario’s Personal Health Information Protection Act. As a result, the authors were authorized, both legally and contractually, to access the data set in a more granular form than approved third party researchers would be permitted to access. The data set that approved third party researchers would be permitted to access would be adjusted to ensure the risk of re-identification of any underlying individuals is low. While data sharing agreements and privacy legislation for the province of Ontario prohibit ICES from making the data set publicly available, access may be granted to those who meet pre-specified criteria for confidential access, available at www.ices.on.ca/DAS. Please see the Methods Section for an outline of all datasets used.


    Articles from PLoS ONE are provided here courtesy of PLOS
