Abstract
The biochemical response to ursodeoxycholic acid (UDCA)—so-called “treatment response”—strongly predicts long-term outcome in primary biliary cholangitis (PBC). Several long-term prognostic models based solely on the treatment response have been developed that are widely used to risk stratify PBC patients and guide their management. However, they do not take other prognostic variables into account, such as the stage of the liver disease. We sought to improve existing long-term prognostic models of PBC using data from the UK-PBC Research Cohort. We performed Cox’s proportional hazards regression analysis of diverse explanatory variables in a derivation cohort of 1,916 UDCA-treated participants. We used nonautomatic backward selection to derive the best-fitting Cox model, from which we derived a multivariable fractional polynomial model. We combined linear predictors and baseline survivor functions in equations to score the risk of a liver transplant or liver-related death occurring within 5, 10, or 15 years. We validated these risk scores in an independent cohort of 1,249 UDCA-treated participants. The best-fitting model consisted of the baseline albumin and platelet count, as well as the bilirubin, transaminases, and alkaline phosphatase, after 12 months of UDCA. In the validation cohort, the 5-, 10-, and 15-year risk scores were highly accurate (areas under the curve: >0.90).
Conclusions
The prognosis of PBC patients can be accurately evaluated using the UK-PBC risk scores. They may be used to identify high-risk patients for closer monitoring and second-line therapies, as well as low-risk patients who could potentially be followed up in primary care. (HEPATOLOGY 2016;63:930-950)
Primary biliary cholangitis (PBC) is a chronic liver disease in which autoimmune destruction of the intrahepatic bile ducts results in cholestasis and progressive fibrosis.(1) Biliary injury may eventually lead to cirrhosis and liver failure—but the rate of disease progression is variable.(2) Across the spectrum, some patients with PBC progress to end-stage liver disease (ESLD) within a few years of diagnosis; some develop cirrhosis that remains well compensated; others (perhaps the majority) do not even develop cirrhosis. In PBC, as in other conditions, accurate prognostication enables management of the disease to be tailored to the patient. This is the basis of precision medicine—and it has clear benefits: patients at higher risk of adverse outcomes may be prioritized for closer monitoring and second-line therapy; those at low risk may be reassured and followed up less frequently, even in primary care. This enables better distribution of health care resources, reducing costs and improving delivery.(3)
The only licensed pharmacotherapy for PBC is ursodeoxycholic acid (UDCA). Treatment with UDCA has been shown to improve survival in PBC, and for this reason, it is recommended that all patients with PBC take UDCA at a dose of 13-15 mg/kg/day.(1,4,5) In 2006, it was shown that the biochemical response to treatment with UDCA—so-called “treatment response”—strongly predicts long-term outcome in PBC.(6) This was a major advance that prompted the development of several prognostic models based solely on treatment response, including the Barcelona, Paris I, Rotterdam, Toronto, and Paris II criteria.(6–10) These models are highly accurate—and used increasingly to risk stratify PBC patients and guide their management.(2) However, it was shown more recently that the aspartate transaminase (AST) to platelet ratio index (APRI) also predicts outcomes in PBC, independent of UDCA response.(11) This suggests that existing prognostic models of PBC might be improved by taking other variables into account.
In the current study, we aimed to incorporate measures of treatment response with other prognostic variables in a new, long-term prognostic model of PBC that could be used to estimate the absolute risk of developing ESLD within specific time points in the future. To do so, we analyzed data from a derivation cohort consisting of 1,916 UDCA-treated participants, selected at random from the UK-PBC Research Cohort. We derived a scoring system based on treatment response and markers of disease stage. We then validated the scoring system in an independent, validation cohort consisting of 1,249 UDCA-treated participants, also selected at random from the UK-PBC Research Cohort.
Materials and Methods
Study Design
We used data from PBC patients enrolled in the UK-PBC Research Cohort. The cohort has been described in detail elsewhere (in particular, see http://www.uk-pbc.com/about/aboutuk-pbc/ws1/researchcohort/ and Carbone et al. 2013).(2) Briefly, PBC was defined according to the guidelines of the European Association for the Study of the Liver (EASL).(1) Participants included in the current study were (1) patients with PBC incident or prevalent between January 1, 2008 and July 31, 2014 or (2) liver transplant (LT) recipients who had undergone LT for PBC at any point before July 31, 2014.
Participants were recruited throughout the UK by the UK-PBC Consortium, a research network of 155 National Health Service (NHS) Trusts or Health Boards collaborating in the UK-PBC project (http://www.uk-pbc.com/). Of note, the UK-PBC Consortium includes every hospital providing general or specialist hepatology services in Great Britain, as well as the only major liver treatment center in Northern Ireland. In collaborating centers, PBC patients were identified (1) by searching outpatient clinic records for patients registered with a diagnosis of PBC or LT for PBC and (2) by searching immunology laboratory databases for samples with a positive test for anti-mitochondrial antibody (AMA). Patients with a confirmed diagnosis of PBC or LT for PBC were invited to join the UK-PBC Research Cohort.
We retrospectively reviewed the medical records of all participants to obtain baseline clinical data and ascertain events occurring before the date of recruitment. Participants who had not suffered an event preceding the date of recruitment were prospectively followed up until July 31, 2014.
The study was conducted in accord with the guidelines of the Declaration of Helsinki and the principles of good clinical practice. All participants provided written informed consent. The study was approved by the Oxford C research ethics committee (REC reference: 07/H0606/96) and by the research and development department of each collaborating hospital.
Data Source
Data were captured using baseline and follow-up case record forms (CRFs) that were completed by suitably trained research nurses in collaborating centers. The baseline CRF captured information on the date of diagnosis and the explanatory variables listed below. Follow-up CRFs captured information on survival status and the date and cause of death (if applicable); LT status and date of LT (if applicable); contemporaneous laboratory investigations; and ongoing treatment with UDCA. The most recent follow-up CRF was sent to collaborating centers in July 2014.
For each participant, the baseline CRF was sent to the hospital where the participant first received a diagnosis of PBC, which might be different from the recruiting center. Follow-up CRFs were sent to the participant’s current treatment center, which might also be different to the recruiting center. This was possible because all centers providing general or specialist liver services in Great Britain are collaborating in the study. This ensured that follow-up was complete for all participants.
Completed CRFs underwent quality control (QC) for completeness and accuracy at the University of Cambridge. Missing or inaccurate data were systematically queried with the participant or research nurse who completed the form. Data that passed QC were uploaded into a bespoke database.
Study Entry and Outcome
We calculated the time from the diagnosis of PBC to an event. The date of diagnosis of PBC was defined as the date of the first positive test for AMA or, for seronegative patients, the date of the diagnostic liver biopsy.
Events were defined to reflect ESLD requiring LT, as follows: (1) death from a liver-related cause, meaning liver failure, variceal hemorrhage, or hepatocellular carcinoma (HCC); (2) LT for PBC; or (3) for participants who were still alive and had never undergone LT, serum bilirubin measuring ≥100 μmol/L for the first time. We considered LT for PBC to be an acceptable surrogate for liver-related death, having confirmed that >90% of PBC LT recipients in the UK have biochemical evidence of liver failure at the time of transplantation, reflected by a United Kingdom model for End-stage Liver Disease score >49 (personal communication, NHS Blood and Transplantation; Supporting Fig. 1).(12) Furthermore, we selected the threshold, bilirubin ≥100 μmol/L, because bilirubin at this level is widely accepted to be an indication for LT, as reported in the EASL guidelines on the management of cholestatic liver diseases, 2009.(1)
Participants who did not reach an event were censored at the date of their most recent blood tests or the date of non-liver-related death, if applicable.
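The event and censoring rules above can be sketched in code. This is a minimal illustration rather than the procedure used in the study: the `Participant` record, its field names, and the choice to take the earliest qualifying date when several are present are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

# Liver-related causes of death named in the event definition.
LIVER_RELATED_CAUSES = {"liver failure", "variceal hemorrhage", "hepatocellular carcinoma"}


@dataclass
class Participant:
    # Hypothetical record layout; field names are illustrative only.
    diagnosis: date                              # first positive AMA or diagnostic biopsy
    last_bloods: date                            # most recent blood tests
    death: Optional[date] = None
    cause_of_death: Optional[str] = None
    transplant: Optional[date] = None            # LT for PBC
    first_bilirubin_100: Optional[date] = None   # first bilirubin >= 100 umol/L


def time_to_event(p: Participant) -> Tuple[float, bool]:
    """Return (years from diagnosis, event flag) for one participant."""
    candidates = []
    if p.death and p.cause_of_death in LIVER_RELATED_CAUSES:
        candidates.append(p.death)
    if p.transplant:
        candidates.append(p.transplant)
    if p.first_bilirubin_100:
        candidates.append(p.first_bilirubin_100)

    if candidates:
        # Assumption: the earliest qualifying date counts as the event date.
        end, event = min(candidates), True
    elif p.death:
        # Non-liver-related death: censored at the date of death.
        end, event = p.death, False
    else:
        # No event: censored at the most recent blood tests.
        end, event = p.last_bloods, False
    return (end - p.diagnosis).days / 365.25, event
```

For a participant diagnosed on January 1, 2000 and transplanted on January 1, 2005, `time_to_event` returns roughly 5 years with the event flag set.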
Explanatory Variables
We considered variables for inclusion in the risk score that were clinically relevant or had been shown in at least one previous study to predict survival in PBC. These variables were as follows:
Age at diagnosis
Sex
Year of diagnosis
Blood tests at the time of diagnosis, that is, serum sodium, creatinine, albumin, bilirubin (BIL), alanine transaminase (ALT), AST, alkaline phosphatase (ALP), platelet count, prothrombin time (PT) and international normalized ratio (INR), immunoglobulin (Ig)G, IgA, IgM, antinuclear antibodies (ANA), AMA, and anti-smooth-muscle antibodies (SMA) (presence/absence)
Spleen size and the presence of ascites by ultrasound scan at the time of diagnosis
Treatment with UDCA (yes or no)
Liver biochemistry after 12 months of treatment with UDCA (i.e., bilirubin after 12 months of UDCA [BIL12], ALT after 12 months of UDCA [ALT12], AST after 12 months of UDCA [AST12], and alkaline phosphatase after 12 months of UDCA [ALP12])
To account for interoperator variability in the measurement of laboratory investigations, research nurses were asked to provide the reference range reported for each laboratory investigation, as well as the result and date of the test. In our analysis, creatinine, BIL, ALT, AST, ALP, and immunoglobulins were treated as multiples of their respective upper reference levels. Sodium, albumin, and platelet count were treated as multiples of their respective lower reference levels.
Measurements for both AST and ALT were available for comparatively few subjects (n = 586; 14.6%) reflecting variation in biochemistry laboratory practice across the UK. Therefore, we defined a variable, transaminases (TA) that was the ALT where this was available, otherwise the AST. Likewise, measurements for both PT and INR were available for comparatively few patients (n = 897; 21.4%). Where the INR was missing, we estimated the INR to be the ratio of the PT to the mean normal PT, calculated as the mean of the upper and lower reference level in that hospital.
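The variable handling described above (reference-limit ratios, the composite TA variable, and the INR estimate) can be sketched as follows. The function names and the example reference ranges are assumptions made for illustration; they do not come from the study's Stata code.

```python
from typing import Optional


def ratio_to_limit(value: float, reference_limit: float) -> float:
    """Express a laboratory result as a multiple of a reference limit.

    Use the upper reference level for creatinine, BIL, ALT, AST, ALP, and
    immunoglobulins, and the lower reference level for sodium, albumin,
    and platelet count.
    """
    return value / reference_limit


def transaminase_ratio(alt_ratio: Optional[float], ast_ratio: Optional[float]) -> Optional[float]:
    """TA: the ALT where it is available, otherwise the AST."""
    return alt_ratio if alt_ratio is not None else ast_ratio


def estimated_inr(inr: Optional[float], pt: Optional[float],
                  pt_lower: float, pt_upper: float) -> Optional[float]:
    """Return the reported INR, or estimate it from the PT when missing.

    The estimate is the PT divided by the mean normal PT, taken as the
    midpoint of the hospital's PT reference range.
    """
    if inr is not None:
        return inr
    if pt is None:
        return None
    mean_normal_pt = (pt_lower + pt_upper) / 2
    return pt / mean_normal_pt
```

For example, an ALP of 260 U/L against an upper reference level of 130 U/L gives a ratio of 2.0, and a PT of 13.2 seconds against a hypothetical reference range of 10 to 14 seconds gives an estimated INR of 1.1.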
Treatment with UDCA was included as a dichotomous explanatory variable (i.e., any treatment or no treatment). We did not account for the baseline, weight-adjusted dose of UDCA because these data were not available. However, we identified a subgroup of participants for whom the current weight-adjusted dose of UDCA was available (n = 1,253). In this subgroup, the median dose of UDCA was 12 mg/kg/day (interquartile range [IQR]: 9-14). This is lower than the recommended dose of UDCA (13-15 mg/kg/day), albeit comparable to the median dose reported by Lammers et al.(13) in their study of 4,845 PBC patients from leading academic centers across the globe. Notably, we found that the vast majority of participants taking UDCA <13 mg/kg/day fulfilled the Paris I definition of treatment response, suggesting they were receiving an individually effective dose (Supporting Fig. 2). For this reason, we did not consider that failing to account for weight-adjusted dose of UDCA would substantially bias our analysis.
Derivation of PBC Risk Scores
For the derivation and the validation of the risk scores, we excluded participants confirmed to have another chronic liver disease in addition to PBC. We also excluded participants with PBC/autoimmune hepatitis (AIH) overlap syndrome, defined as interface hepatitis on liver histology combined with TA ≥5× upper limit of normal (ULN) or IgG ≥2× ULN, over-and-above features of PBC.(14) Finally, we excluded participants who had never received UDCA, had received <12 months of treatment with UDCA, or had discontinued UDCA prematurely for any reason other than death or LT. This left a cohort of participants with pure PBC who had received ongoing treatment with UDCA for at least 12 months. Following convention,(15,16) we randomly allocated 60% of these UDCA-treated participants to a derivation cohort and the remaining 40% to a validation cohort.
Within the derivation cohort, we undertook multiple imputation using chained equations (20 imputations) to account for missing values; as well as the predictor variables, the imputation model also included the binary event/censoring variable and Nelson-Aalen estimate of cumulative hazard.(17) We performed univariate analysis of 20 variables (listed in Table 1) using Cox’s proportional hazards regression. Variables that were statistically significant at P = 0.05 in univariate analysis were included in a multivariable Cox model. Nonautomatic backward selection was employed to identify the best-fitting model, adjusting for age and calendar year at diagnosis in each iteration of model reduction. The multivariable fractional polynomial procedure in Stata was used to identify the most appropriate functional form for each of the variables included in the best-fitting model. The coefficients were combined with the baseline survivor functions estimated from the model to derive three separate equations predicting the risk of an event occurring within 5, 10, or 15 years of baseline, respectively. Hereafter, we refer to these equations as the 5-, 10-, and 15-year risk scores.
Table 1. Characteristics of Patients at Baseline in the Derivation and Validation Cohorts.
Variables*,† | Derivation Cohort (N = 1,916) | Validation Cohort (N = 1,249) | Untreated Cohort (N = 754)‡ |
---|---|---|---|
Age, years | 55.5 (48.5-62.7) | 55.2 (47.9-62.8) | 54.0 (46.0-62.1) |
Female, n (%) | 1,707 (89.1) | 1,140 (91.3) | 680 (90.2) |
LT, n (%) | 155 (8.1) | 105 (8.4) | 210 (27.8) |
ANA+, n (%) | 392 (20.5) | 250 (20.1) | 202 (26.8) |
AMA+, n (%) | 1,667 (87.0) | 1,070 (85.7) | 679 (90) |
SMA+, n (%) | 111 (5.8) | 91 (7.3) | 57 (7.5) |
Splenomegaly (>12 cm), n (%) | 198 (10.3) | 97 (7.8) | 114 (15.1) |
Ascites, n (%) | 22 (1.1) | 13 (1.0) | 20 (2.7) |
Na ratio | 1.0 (1.0-1.1) | 1.0 (1.0-1.0) | 1.0 (1.0-1.1) |
Creatinine ratio | 0.7 (0.6-0.8) | 0.7 (0.6-0.8) | 0.7 (0.6-0.8) |
BIL ratio | 0.5 (0.4-0.8) | 0.5 (0.4-0.8) | 0.5 (0.4-0.9) |
Albumin ratio | 1.2 (1.1-1.3) | 1.2 (1.1-1.3) | 1.2 (1.1-1.3) |
ALP ratio | 1.9 (1.2-3.5) | 2.1 (1.3-3.6) | 1.5 (0.9-2.9) |
TA ratio | 1.4 (0.9-2.3) | 1.4 (0.9-2.4) | 1.2 (0.7-2.1) |
Platelets ratio | 1.8 (1.5-2.2) | 1.8 (1.5-2.2) | 1.8 (1.4-2.2) |
INR | 1.0 (0.9-1.0) | 1.0 (0.9-1.0) | 1.0 (0.9-1.1) |
IgG ratio | 0.9 (0.7-1.1) | 0.9 (0.7-1.1) | 0.9 (0.7-1.1) |
BIL12 ratio | 0.5 (0.4-0.7) | 0.5 (0.4-0.7) | — |
ALP12 ratio | 1.2 (0.9-2.1) | 1.3 (0.9-2.1) | — |
TA12 ratio | 0.8 (0.6-1.3) | 0.8 (0.6-1.3) | — |
Events, n (%) | 177 (9.2) | 114 (9.1) | 201 (26.7) |
*To allow for interoperator variability, the bilirubin, transaminases, alkaline phosphatase at baseline and after 12 months of UDCA, creatinine, INR, and IgG were analyzed as multiples of the upper reference level in the laboratories that measured them. The Na, albumin, and platelet count were analyzed as multiples of the lower reference level in the laboratories that measured them.
†Values for all continuous variables are expressed as medians and IQRs.
‡This subgroup includes only participants who were not treated with UDCA and had been followed up for at least 12 months, in order to allow for a fair comparison with the other subgroups.
Abbreviations: AMA, anti-mitochondrial antibodies; ANA, anti-nuclear antibodies; ALP, alkaline phosphatase; ALP12, alkaline phosphatase after 12 months of UDCA; BIL12, bilirubin after 12 months of UDCA; IgG, immunoglobulin G; INR, international normalized ratio; LT, liver transplantation; n, number; SMA, anti-smooth muscle antibodies; TA12, transaminases after 12 months of UDCA.
Validation of the PBC Risk Score
We applied the 5-, 10-, and 15-year risk scores to participants in the validation cohort. To assess discrimination, we calculated the area under the receiver operating characteristic curve (AUC) for each risk score. To assess calibration, we compared the observed versus predicted risk of an event occurring within 5, 10, or 15 years across each decile of the 5-, 10-, and 15-year risk scores, respectively. For comparison, we also assessed the discrimination of the Paris I, Barcelona, Paris II, and Toronto models at 5, 10, and 15 years using the AUC.
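The two validation measures can be illustrated with a short sketch. It assumes a simplified setting in which each participant has a predicted risk and a binary indicator of whether an event occurred within the time horizon; how censored follow-up was handled in the study is not described here, so this is not a reproduction of the authors' analysis. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], events: Sequence[bool]) -> float:
    """Rank-based AUC: the probability that a participant with an event has a
    higher score than a participant without one (ties count as one half)."""
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    if not pos or not neg:
        raise ValueError("need at least one participant with and one without an event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_by_group(scores: Sequence[float], events: Sequence[bool],
                         groups: int = 10) -> List[Tuple[float, float]]:
    """Mean predicted risk versus observed event proportion in equal-sized
    groups ordered by predicted risk (deciles when groups=10)."""
    ranked = sorted(zip(scores, events))
    n = len(ranked)
    result = []
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # fewer participants than groups
        predicted = sum(s for s, _ in chunk) / len(chunk)
        observed = sum(1 for _, e in chunk if e) / len(chunk)
        result.append((predicted, observed))
    return result
```

A well-calibrated score yields pairs in which the predicted and observed values are close in every group; a highly discriminating score yields an AUC near 1.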
To assess the accuracy of the risk scores for measurement of risk preceding treatment, we calculated the 5-, 10-, and 15-year risk scores in a group of participants who had never been established on UDCA and had been followed up for at least 12 months, using the baseline BIL, TA, and ALP instead of the equivalent measurements on treatment. We then calculated the respective AUCs. To assess the accuracy of the risk scores using the ALT12 rather than the transaminases after 12 months of UDCA (TA12), we calculated the 5-, 10-, and 15-year risk scores using the ALT12 for all participants in the validation cohort for whom this measurement was available. We then calculated the respective AUCs. Likewise, to assess the accuracy of the risk scores using the AST12 rather than TA12, we calculated each risk score using the AST12 for all participants in the validation cohort for whom this measurement was available, then calculated the respective AUCs.
All analyses were performed using Stata software (version 13.0; StataCorp LP, College Station, TX).
Results
Cohort Characteristics
A total of 4,099 patients with PBC were recruited to the cohort up to July 31, 2014. Of these, 77 were confirmed to have PBC-AIH overlap syndrome or another liver disease in addition to PBC; these participants were excluded from further analysis. Of those remaining, we excluded 857 participants who had never received UDCA, had received <12 months of treatment with UDCA, or had discontinued UDCA prematurely. This left 3,165 UDCA-treated participants, whom we included in the analysis.
In these UDCA-treated participants, the year of diagnosis of PBC ranged from 1974 to 2014 (Supporting Fig. 3A). The year of diagnosis in those who had undergone LT also ranged from 1974 to 2014 (Supporting Fig. 3B). The median duration of follow-up was 6.3 years (IQR, 3.2-10.7) and the total follow-up was 23,673 patient-years. During follow-up, 291 patients (9.2%) suffered an event: 260 patients (8.2%) underwent LT and 31 patients (1%) died from liver-related causes. The overall event-free survival rate was 96% at 5 years, 89% at 10 years, and 86% at 15 years, comparable to other, recent series.(7)
These UDCA-treated participants were randomly allocated to a derivation cohort consisting of 1,916 participants or validation cohort consisting of 1,249 participants. The baseline characteristics of participants in the derivation and validation cohorts are shown in Table 1; the cohorts were similar, as expected from random allocation. Consistent with other recent series,(7,18,19) approximately 10% of participants had advanced disease at diagnosis (exemplified here by splenomegaly or ascites) and approximately 20% of participants were ANA positive. Complete information about explanatory variables was available for 1,460 participants (76%) in the derivation cohort and for 959 participants (77%) in the validation cohort. Information on outcome was available for all participants. The rate of missing information for each variable is shown in Supporting Table 1.
Derivation of a PBC Risk Score
In univariate analysis, age at diagnosis, calendar year at diagnosis, Na, BIL, TA, ALP, albumin, platelets, IgG, ANA, splenomegaly, ascites, BIL12, ALP12, and TA12 were associated with outcome and were taken forward for multivariable modeling. After nonautomatic backward selection, the best-fitting Cox model included five variables: albumin, platelet count, BIL12, TA12, and ALP12, with a Harrell’s c statistic of 0.92 (Table 2). Each iteration of the multivariable model was adjusted for age and calendar year at diagnosis, but these variables did not significantly improve the fit and were excluded from the final model (data not shown).
Table 2. Cox Regression Analysis for Liver Event in the Derivation Cohort.
Variable | Univariate HR | Univariate 95% CI | Univariate P Value | Multivariate HR | Multivariate 95% CI | Multivariate P Value |
---|---|---|---|---|---|---|
Albumin ratio | 0.007 | 0.002-0.020 | <0.001 | 0.052 | 0.013-0.211 | <0.001 |
Platelet ratio | 0.336 | 0.247-0.457 | <0.001 | 0.362 | 0.255-0.514 | <0.001 |
BIL12 ratio | 1.476 | 1.394-1.563 | <0.001 | 1.427 | 1.317-1.210 | <0.001 |
TA12 ratio | 1.225 | 1.180-1.271 | <0.001 | 1.150 | 1.093-1.210 | <0.001 |
ALP12 ratio | 1.275 | 1.216-1.337 | <0.001 | 1.103 | 1.030-1.183 | 0.005 |
Na ratio | 0.001 | 0.001-0.002 | <0.001 | — | — | — |
Creatinine ratio | 0.385 | 0.131-1.129 | 0.082 | — | — | — |
BIL ratio | 1.178 | 1.148-1.208 | <0.001 | — | — | — |
ALP ratio | 1.044 | 1.027-1.061 | <0.001 | — | — | — |
TA ratio | 1.019 | 0.999-1.039 | 0.050 | — | — | — |
INR | 1.420 | 0.839-2.401 | 0.191 | — | — | — |
IgG ratio | 2.430 | 1.676-3.523 | <0.001 | — | — | — |
Age, years | 0.970 | 0.955-0.984 | <0.001 | — | — | — |
Year of diagnosis | 0.941 | 0.919-0.964 | <0.001 | — | — | — |
Female | 0.816 | 0.443-1.503 | 0.514 | — | — | — |
ANA+ | 1.423 | 0.937-2.1 | 0.048 | — | — | — |
AMA+ | 1.090 | 0.589-2.020 | 0.782 | — | — | — |
SMA+ | 1.114 | 0.563-2.020 | 0.756 | — | — | — |
Splenomegaly | 8.453 | 5.969-11.971 | <0.001 | — | — | — |
Ascites | 11.732 | 6.283-21.905 | <0.001 | — | — | — |
Splenomegaly refers to a spleen length >12 cm.
Abbreviations: AMA, anti-mitochondrial antibodies; ANA, anti-nuclear antibodies; ALP, alkaline phosphatase; ALP12, alkaline phosphatase after 12 months of UDCA; BIL12, bilirubin after 12 months of UDCA; CI, confidence interval; HR, hazard ratio; IgG, immunoglobulin G; INR, international normalized ratio; n, number; SMA, anti-smooth muscle antibodies; TA12, transaminases after 12 months of UDCA.
Figure 1 shows the relationship between the hazard ratio for an event and each variable within the final model, with the best-fitting polynomial lines that describe this relationship. Fractional polynomial terms, baseline survivor function at 5, 10, and 15 years, and regression coefficients for the best-fitting fractional polynomial model were included in the scoring system as follows:
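UK-PBC Risk Scores:

$$
\text{Risk}(t) = 1 - S_0(t)^{\exp(\mathrm{LP})}
$$

$$
\begin{aligned}
\mathrm{LP} ={} & 0.0287854\,(\mathrm{alp12xuln} - 1.722136304) \\
& - 0.0422873\,\bigl((\mathrm{altast12xuln}/10)^{-1} - 8.675729006\bigr) \\
& + 1.4199\,\bigl(\ln(\mathrm{bil12xuln}/10) + 2.709607778\bigr) \\
& - 1.960303\,(\mathrm{albxlln} - 1.17673001) \\
& - 0.4161954\,(\mathrm{pltxlln} - 1.873564875)
\end{aligned}
$$

Note: $S_0(t)$ is the baseline survivor function = 0.982 (at 5 years); 0.941 (at 10 years); 0.893 (at 15 years). The terms alp12xuln, altast12xuln, and bil12xuln are the ALP12, TA12, and BIL12 expressed as multiples of the upper reference level; albxlln and pltxlln are the baseline albumin and platelet count expressed as multiples of the lower reference level.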
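As an illustration of how the scoring equation might be evaluated, the following sketch implements it directly. It reads the bilirubin coefficient as +1.4199 and its centering constant as +2.709607778, which is an interpretation of the printed equation; the function and argument names are hypothetical, and the sketch is not the calculator supplied in the Supporting Document.

```python
import math

# Baseline survivor function at each time horizon (years), as given in the Note.
BASELINE_SURVIVAL = {5: 0.982, 10: 0.941, 15: 0.893}


def linear_predictor(alp12: float, ta12: float, bil12: float,
                     albumin: float, platelets: float) -> float:
    """Centered linear predictor of the fractional polynomial model.

    alp12, ta12, bil12: values after 12 months of UDCA, as multiples of the
    upper reference level. albumin, platelets: baseline values, as multiples
    of the lower reference level.
    """
    return (0.0287854 * (alp12 - 1.722136304)
            - 0.0422873 * ((ta12 / 10) ** -1 - 8.675729006)
            + 1.4199 * (math.log(bil12 / 10) + 2.709607778)  # sign read as '+', an assumption
            - 1.960303 * (albumin - 1.17673001)
            - 0.4161954 * (platelets - 1.873564875))


def uk_pbc_risk(years: int, alp12: float, ta12: float, bil12: float,
                albumin: float, platelets: float) -> float:
    """Estimated risk of an event within `years` (5, 10, or 15)."""
    s0 = BASELINE_SURVIVAL[years]
    lp = linear_predictor(alp12, ta12, bil12, albumin, platelets)
    return 1 - s0 ** math.exp(lp)


if __name__ == "__main__":
    # Median values from the derivation cohort in Table 1.
    for horizon in (5, 10, 15):
        risk = uk_pbc_risk(horizon, alp12=1.2, ta12=0.8, bil12=0.5,
                           albumin=1.2, platelets=1.8)
        print(f"{horizon}-year risk: {risk:.1%}")
```

When every input equals its centering constant, the linear predictor is zero and the risk reduces to one minus the baseline survivor function, which is a convenient check on any implementation.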
Validation of the PBC Risk Score
A total of 1,109 participants (89%) in the validation cohort had values for BIL12, TA12, ALP12, albumin, and platelets and were included in the validation analysis. One hundred and fourteen patients (9.1%) suffered an event during the follow-up.
In the validation cohort, the AUC was 0.96 (95% confidence interval [CI]: 0.93-0.99) for the 5-year risk score, 0.95 (0.93-0.98) for the 10-year risk score, and 0.94 (0.91-0.97) for the 15-year risk score (Fig. 2). In comparison, the AUCs of previous models for events within 5, 10, or 15 years were as follows: Barcelona = 0.56, 0.61, 0.61; Paris I = 0.81, 0.81, 0.80; Toronto = 0.65, 0.70, 0.70; and Paris II = 0.75, 0.75, 0.74, respectively (Fig. 3). The predicted versus observed risk of an event across each decile of the 5-, 10-, and 15-year risk scores is shown in Fig. 4. There is close correspondence between the predicted and observed risks, suggesting that the risk scores are well calibrated.
The ALT12 was available for 944 subjects in the validation cohort, of whom 53 (5.6%) suffered an event during follow-up. The risk score using the ALT12 instead of TA12 had high discrimination in this subgroup, the AUC being 0.91 (0.86-0.95), 0.93 (0.90-0.97), and 0.91 (0.85-0.97) for the 5-, 10-, and 15-year risk scores, respectively. The AST12 was available for 376 subjects in the validation cohort, of whom 42 (11.2%) suffered an event during follow-up. The risk score using the AST12 instead of TA12 had high discrimination in this subgroup, the AUC being 0.86 (0.76-0.96), 0.90 (0.85-0.96), and 0.87 (0.80-0.93) for the 5-, 10-, and 15-year risk scores, respectively.
A total of 754 participants had never been established on UDCA and had been followed up for at least 12 months. In this subgroup of untreated participants, the median follow-up was 6.65 years (IQR, 3.5-10.6); total follow-up was 5,646 patient-years, and 201 (26.7%) suffered an event. The risk scores applied to this subgroup using the baseline BIL, TA, and ALP (instead of the equivalent measurements after 12 months of treatment) had high discrimination, the AUC being 0.96 (0.94-0.98), 0.94 (0.91-0.96), and 0.91 (0.88-0.94) for the 5-, 10-, and 15-year risk scores, respectively (Fig. 5).
Discussion
We analyzed data from more than 3,000 participants in the UK-PBC Research Cohort to develop and validate a scoring system for long-term prediction of ESLD. The scoring system incorporates readily available and objective laboratory measures, that is, the baseline platelet count and serum albumin, and the serum bilirubin, transaminases, and ALP measured after 12 months of treatment with UDCA. The scoring system is proposed to facilitate management of PBC in clinical practice.
In the current study, we confirmed that existing long-term prognostic models of PBC are accurate, with AUCs up to 0.81 for the Paris I model. However, the UK-PBC scoring system was superior to existing models, with AUCs of 0.96, 0.95, and 0.94 for the 5-, 10-, and 15-year risk scores, respectively. There are several reasons for its strong performance. The derivation cohort was sizeable, with 1,916 subjects and 177 events. The underlying model incorporated not only variables that define the treatment response (ALP12, TA12, and BIL12), but also crude measures of hepatic fibrosis (platelet count) and hepatocellular synthetic function (serum albumin). Continuous variables were treated as such; variables were transformed using multivariable fractional polynomials, and the contribution of each variable to the prediction model was weighted according to its prognostic value.
A major advantage of our scoring system is that it provides accurate, individualized estimates of the risk of developing ESLD within defined time points in the future. This contrasts with existing long-term prognostic models that dichotomize patients into treatment responders or nonresponders, at low or high risk of developing ESLD at an unknown point in the future (Supporting Fig. 4). In clinical practice, the scoring system should be most useful to identify patients who would obtain greatest benefit from further risk reduction using second-line therapy. This is especially pertinent in PBC, with second-line agents currently in development.(20) However, it should also be useful to identify patients at low risk of developing ESLD within a relevant time frame, who could potentially be monitored in primary care.
Although the scoring system was derived primarily to evaluate long-term risk in PBC patients on treatment, we found that the risk scores achieved AUCs >0.90 in untreated participants. The scoring system should therefore provide accurate estimates of long-term risk prior to treatment—and then provide accurate reevaluation of the long-term risk once treatment has been established. As such, the scoring system may be used to quantify risk reduction and the treatment benefit derived from first-line therapy. However, our untreated validation cohort was comparatively small and this observation should be interpreted with care. To show readers how the scoring system might be applied in clinical practice, a calculator for the 5-, 10-, and 15-year risk scores is provided in the Supporting Document. Furthermore, Supporting Textbox 1 provides three examples of the scoring system used to guide the clinical management of hypothetical patients with PBC.
We anticipate that some clinicians may call for specific risk thresholds to simplify clinical decision making. This is beyond the scope of the current study. There is no consensus in the literature on (1) how many risk groups should be created and (2) where (and why) to position the cutpoints. Developing sensible guidance for choosing risk groups remains a topic for further research.(21) Furthermore, we emphasize that risk must be contextualized. Consider a patient in whom the 15-year risk score is 20%. This level of risk would be unacceptable for a 35-year-old with no comorbidities, but it might be acceptable for a 70-year-old with another life-shortening disease. Treatment targets should therefore be determined by the cost-effectiveness of the treatment, its side-effect profile, and the extent to which the individual patient would benefit from the risk reduction.
The UK-PBC Research Cohort consists of thousands of patients recruited from general as well as specialist centers across the entire UK. For this reason, we believe that the cohort is highly representative. The scoring system should therefore be widely applicable.
However, we acknowledge certain limitations. The model includes measurements at baseline and after 12 months of treatment. We do not anticipate a substantial change in the platelet count or serum albumin after 12 months of treatment with UDCA, and for this reason, we consider all the measurements in the model to represent a single point in the course of the patient’s disease. The strong fit of the final model in treated and untreated participants supports this assumption, although we did not specifically test the assumption in the current study. We are in the process of capturing additional data that will enable us to model liver-related outcomes using sets of variables measured at different time points before and after starting treatment. These data will also enable us to develop models incorporating repeated measurements. Participants in the UK-PBC Research Cohort may be taking a suboptimal dose of UDCA. This could potentially bias the study, if UDCA has dose-dependent, beneficial effects over and above those measured by the liver biochemistry on treatment. However, survival rates in the UK-PBC Research Cohort were comparable to those of cohorts in which patients received the optimal dose of UDCA. For this reason, if there is bias related to the dose of UDCA, it is likely to be minimal. In the current data set, HCC and variceal hemorrhage have not been ascertained, except as a cause of death or indication for LT. Therefore, it is uncertain whether the scoring system accurately predicts HCC or variceal hemorrhage, per se. However, with additional data on these outcomes, we will be able to specifically address these questions. The risk scores were derived using the variable TA instead of ALT or AST. However, we have shown that they perform equally well when just the ALT is used for TA, or just the AST. The underlying model uses the platelet count as a crude measure of disease stage. This is advantageous because the platelet count is readily available. However, more-accurate and dynamic measures of liver fibrosis, such as transient elastography, may be preferable. This would be especially true if antifibrotic therapies were available, when it would be important to quantify reduction in fibrosis.
In conclusion, we developed and validated the UK-PBC risk scores to assess the prognosis of patients with PBC using readily available and objective clinical measures. The scoring system has some advantages compared with previous prognostic models. Application of the scoring system in clinical practice may guide management and improve the distribution of health care resources related to PBC. However, external validation of the scoring system in cohorts of treated and untreated patients is a prerequisite to its application in clinical practice, and the scoring system should be updated as the size and characterization of the UK-PBC Research Cohort increase with time.
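For readers who wish to see how a score of this kind is assembled, the abstract describes the risk scores as combining a linear predictor with a baseline survivor function. Under a Cox model, that combination generally takes the form risk(t) = 1 − S0(t)^exp(LP). The following is a minimal sketch of that general relationship only. The function names, variable names, and all numeric values are hypothetical placeholders and are not the published UK-PBC coefficients or baseline survival estimates. The covariates are assumed to be already transformed (e.g., by any fractional polynomial terms) and centered.

```python
import math
from typing import Mapping


def linear_predictor(coefficients: Mapping[str, float],
                     covariates: Mapping[str, float]) -> float:
    """Sum of coefficient * covariate over every term in the model.

    Covariate values are assumed to be already transformed and centered.
    """
    missing = set(coefficients) - set(covariates)
    if missing:
        raise KeyError(f"missing covariates: {sorted(missing)}")
    return sum(beta * covariates[name] for name, beta in coefficients.items())


def absolute_risk(baseline_survival: float, lp: float) -> float:
    """Cox-model risk of the event by time t: 1 - S0(t) ** exp(LP)."""
    if not 0.0 < baseline_survival <= 1.0:
        raise ValueError("baseline survival must lie in (0, 1]")
    return 1.0 - baseline_survival ** math.exp(lp)


# PLACEHOLDER numbers for illustration only; NOT the published UK-PBC values.
example_coefficients = {"x1": 0.5, "x2": -0.25}
example_covariates = {"x1": 1.0, "x2": 2.0}
example_baseline_survival = 0.95  # hypothetical S0(t) at one time horizon

lp = linear_predictor(example_coefficients, example_covariates)
risk = absolute_risk(example_baseline_survival, lp)
print(f"LP = {lp:.3f}, risk = {risk:.3f}")
```

In a sketch of this kind, each time horizon (5, 10, or 15 years) would use its own baseline survival value with the same linear predictor. Any clinical use would require the coefficients and baseline survivor functions reported in the article itself.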
Supplementary Material
Additional Supporting Information may be found at onlinelibrary.wiley.com/doi/10.1002/hep.28017/suppinfo.
Acknowledgment
The authors gratefully acknowledge the work done by members of the UK-PBC Consortium (see the Supporting Information). The authors acknowledge Ms. Lynda Smith for her major role in helping us to administer this study and many others. The authors acknowledge Ms. Elisa Allen (NHS Blood and Transplant) for providing data related to liver transplant PBC recipients in the UK. Finally (and most important), the authors thank all of the participants who granted us access to their medical records, enabling us to conduct this study. The UK-PBC project is a portfolio study of the NIHR Comprehensive Research Network. The views expressed are those of the authors and not necessarily those of the NHS, the National Institute for Health Research, or the Department of Health.
This study was funded by the Isaac Newton Trust, University of Cambridge; Addenbrooke’s Charitable Trust (ACT), Cambridge University Hospitals NHS Foundation Trust; the PBC Foundation; Intercept Pharmaceuticals; the Wellcome Trust (grant reference: 085925); and the Medical Research Council (MRC; grant reference: MR/L001489/1). M.C. is a Sheila Sherlock Fellow of the European Association for the Study of the Liver. G.F.M. was an MRC clinical research training fellow and received salary support from the Sackler Trust at the University of Cambridge; he is now a postdoctoral fellow of the National Institute for Health Research Rare Diseases (NIHR-RD) initiative. G.M.H., D.E.J., and R.N.S. receive salary support from an MRC stratified medicine award (UK-PBC, MR/L001489/1).
Abbreviations
- AIH
autoimmune hepatitis
- ALP
alkaline phosphatase
- ALP12
alkaline phosphatase after 12 months of UDCA
- AMA
anti-mitochondrial antibody
- ANA
anti-nuclear antibodies
- ALT
alanine aminotransferase
- ALT12
ALT after 12 months of UDCA
- AST
aspartate transaminase
- AST12
AST after 12 months of UDCA
- AUC
area under receiver operating characteristic curve
- BIL
bilirubin
- BIL12
bilirubin after 12 months of UDCA
- CI
confidence interval
- CRFs
case record forms
- EASL
European Association for the Study of the Liver
- ESLD
end-stage liver disease
- HCC
hepatocellular carcinoma
- IgG
immunoglobulin G
- INR
international normalized ratio
- IQR
interquartile range
- LT
liver transplantation
- NHS
National Health Service
- PBC
primary biliary cholangitis
- PT
prothrombin time
- QC
quality control
- SMA
anti-smooth-muscle antibodies
- TA
transaminases
- TA12
transaminases after 12 months of UDCA
- UDCA
ursodeoxycholic acid
- ULN
upper limit of normal
Footnotes
URLs: UK-PBC: http://www.uk-pbc.com/; Academic Department of Medical Genetics: http://medgen.medschl.cam.ac.uk/.
Potential conflict of interest: Dr. Hirschfield advises Intercept and is on the speakers’ bureau for Falk. Dr. Williamson consults for Intercept. Dr. Sandford consults for Otsuka and received grants from Intercept. Dr. Heneghan received grants from Astellas.
Author names in bold designate shared co-first authorship.
References
- 1) European Association for the Study of the Liver. EASL Clinical Practice Guidelines: management of cholestatic liver diseases. J Hepatol. 2009;51:237–267. doi: 10.1016/j.jhep.2009.04.009.
- 2) Carbone M, Mells GF, Pells G, Dawwas MF, Newton JL, Heneghan MA, et al. Sex and age are determinants of the clinical phenotype of primary biliary cirrhosis and response to ursodeoxycholic acid. Gastroenterology. 2013;144:560–569.e7. doi: 10.1053/j.gastro.2012.12.005. quiz, e513-e564.
- 3) Hayes DF, Markus HS, Leslie RD, Topol EJ. Personalized medicine: risk prediction, targeted therapies and mobile health technology. BMC Med. 2014;12:37. doi: 10.1186/1741-7015-12-37.
- 4) Poupon RE, Poupon R, Balkau B. Ursodiol for the long-term treatment of primary biliary cirrhosis. The UDCA-PBC Study Group. N Engl J Med. 1994;330:1342–1347. doi: 10.1056/NEJM199405123301903.
- 5) Lindor KD, Gershwin ME, Poupon R, Kaplan M, Bergasa NV, Heathcote EJ; American Association for Study of Liver Diseases. Primary biliary cirrhosis. Hepatology. 2009;50:291–308. doi: 10.1002/hep.22906.
- 6) Pares A, Caballeria L, Rodes J. Excellent long-term survival in patients with primary biliary cirrhosis and biochemical response to ursodeoxycholic acid. Gastroenterology. 2006;130:715–720. doi: 10.1053/j.gastro.2005.12.029.
- 7) Corpechot C, Abenavoli L, Rabahi N, Chretien Y, Andreani T, Johanet C, et al. Biochemical response to ursodeoxycholic acid and long-term prognosis in primary biliary cirrhosis. Hepatology. 2008;48:871–877. doi: 10.1002/hep.22428.
- 8) Kuiper EM, Hansen BE, de Vries RA, den Ouden-Muller JW, van Ditzhuijsen TJ, Haagsma EB, et al. Improved prognosis of patients with primary biliary cirrhosis that have a biochemical response to ursodeoxycholic acid. Gastroenterology. 2009;136:1281–1287. doi: 10.1053/j.gastro.2009.01.003.
- 9) Kumagi T, Guindi M, Fischer SE, Arenovich T, Abdalian R, Coltescu C, et al. Baseline ductopenia and treatment response predict long-term histological progression in primary biliary cirrhosis. Am J Gastroenterol. 2010;105:2186–2194. doi: 10.1038/ajg.2010.216.
- 10) Corpechot C, Chazouilleres O, Poupon R. Early primary biliary cirrhosis: biochemical response to treatment and prediction of long-term outcome. J Hepatol. 2011;55:1361–1367. doi: 10.1016/j.jhep.2011.02.031.
- 11) Trivedi PJ, Bruns T, Cheung A, Li KK, Kittler C, Kumagi T, et al. Optimising risk stratification in primary biliary cirrhosis: AST/platelet ratio index predicts outcome independent of ursodeoxycholic acid response. J Hepatol. 2014;60:1249–1258. doi: 10.1016/j.jhep.2014.01.029.
- 12) Barber K, Madden S, Allen J, Collett D, Neuberger J, Gimson A, et al. Elective liver transplant list mortality: development of a United Kingdom end-stage liver disease score. Transplantation. 2011;92:469–476. doi: 10.1097/TP.0b013e318225db4d.
- 13) Lammers WJ, van Buuren HR, Hirschfield GM, Janssen HL, Invernizzi P, Mason AL, et al.; Global PBC Study Group. Levels of alkaline phosphatase and bilirubin are surrogate end points of outcomes of patients with primary biliary cirrhosis: an international follow-up study. Gastroenterology. 2014;147:1338–1349.e5. doi: 10.1053/j.gastro.2014.08.029. quiz, e1315.
- 14) Chazouilleres O, Wendum D, Serfaty L, Montembault S, Rosmorduc O, Poupon R. Primary biliary cirrhosis-autoimmune hepatitis overlap syndrome: clinical features and response to therapy. Hepatology. 1998;28:296–301. doi: 10.1002/hep.510280203.
- 15) Hippisley-Cox J, Coupland C, Vinogradova Y, Robson J, Minhas R, Sheikh A, Brindle P. Predicting cardiovascular risk in England and Wales: prospective derivation and validation of QRISK2. BMJ. 2008;336:1475–1482. doi: 10.1136/bmj.39609.449676.25.
- 16) Hippisley-Cox J, Coupland C. Derivation and validation of updated QFracture algorithm to predict risk of osteoporotic fracture in primary care in the United Kingdom: prospective open cohort study. BMJ. 2012;344:e3427. doi: 10.1136/bmj.e3427.
- 17) White IR, Royston P. Imputing missing covariate values for the Cox model. Stat Med. 2009;28:1982–1998. doi: 10.1002/sim.3618.
- 18) Invernizzi P, Lleo A, Podda M. Interpreting serological tests in diagnosing autoimmune liver diseases. Semin Liver Dis. 2007;27:161–172. doi: 10.1055/s-2007-979469.
- 19) Vergani D, Alvarez F, Bianchi FB, Cancado EL, Mackay IR, Manns MP, et al. Liver autoimmune serology: a consensus statement from the committee for autoimmune serology of the International Autoimmune Hepatitis Group. J Hepatol. 2004;41:677–683. doi: 10.1016/j.jhep.2004.08.002.
- 20) Hirschfield GM, Mason A, Luketic V, Lindor K, Gordon SC, Mayo M, et al. Efficacy of obeticholic acid in patients with primary biliary cirrhosis and inadequate response to ursodeoxycholic acid. Gastroenterology. 2015;148:751–761.e8. doi: 10.1053/j.gastro.2014.12.005.
- 21) Royston P, Altman DG. External validation of a Cox prognostic model: principles and methods. BMC Med Res Methodol. 2013;13:33. doi: 10.1186/1471-2288-13-33.