Author manuscript; available in PMC: 2020 Jan 27.
Published in final edited form as: Hepatology. 2015 Oct 20;63(3):930–950. doi: 10.1002/hep.28017

The UK-PBC Risk Scores: Derivation and Validation of a Scoring System for Long-Term Prediction of End-Stage Liver Disease in Primary Biliary Cholangitis

Marco Carbone 1,2, Stephen J Sharp 3, Steve Flack 1, Dimitrios Paximadas 1, Kelly Spiess 1, Carolyn Adgey 4, Laura Griffiths 5, Reyna Lim 6, Paul Trembling 7, Kate Williamson 8, Nick J Wareham 3, Mark Aldersley 9, Andrew Bathgate 10, Andrew K Burroughs 11, Michael A Heneghan 12, James M Neuberger 6, Douglas Thorburn 11, Gideon M Hirschfield 13, Heather J Cordell 14, Graeme J Alexander 2, David EJ Jones 5, Richard N Sandford 1, George F Mells 1,2,*, the members of the UK-PBC Consortium
PMCID: PMC6984963  EMSID: EMS85391  PMID: 26223498

Abstract

The biochemical response to ursodeoxycholic acid (UDCA)—so-called “treatment response”—strongly predicts long-term outcome in primary biliary cholangitis (PBC). Several long-term prognostic models based solely on the treatment response have been developed that are widely used to risk stratify PBC patients and guide their management. However, they do not take other prognostic variables into account, such as the stage of the liver disease. We sought to improve existing long-term prognostic models of PBC using data from the UK-PBC Research Cohort. We performed Cox’s proportional hazards regression analysis of diverse explanatory variables in a derivation cohort of 1,916 UDCA-treated participants. We used nonautomatic backward selection to derive the best-fitting Cox model, from which we derived a multivariable fractional polynomial model. We combined linear predictors and baseline survivor functions in equations to score the risk of a liver transplant or liver-related death occurring within 5, 10, or 15 years. We validated these risk scores in an independent cohort of 1,249 UDCA-treated participants. The best-fitting model consisted of the baseline albumin and platelet count, as well as the bilirubin, transaminases, and alkaline phosphatase, after 12 months of UDCA. In the validation cohort, the 5-, 10-, and 15-year risk scores were highly accurate (areas under the curve: >0.90).

Conclusions

The prognosis of PBC patients can be accurately evaluated using the UK-PBC risk scores. They may be used to identify high-risk patients for closer monitoring and second-line therapies, as well as low-risk patients who could potentially be followed up in primary care. (HEPATOLOGY 2016;63:930-950)


Primary biliary cholangitis (PBC) is a chronic liver disease in which autoimmune destruction of the intrahepatic bile ducts results in cholestasis and progressive fibrosis.(1) Biliary injury may eventually lead to cirrhosis and liver failure—but the rate of disease progression is variable.(2) Across the spectrum, some patients with PBC progress to end-stage liver disease (ESLD) within a few years of diagnosis; some develop cirrhosis that remains well compensated; others (perhaps the majority) do not even develop cirrhosis. In PBC, as in other conditions, accurate prognostication enables management of the disease to be tailored to the patient. This is the basis of precision medicine—and it has clear benefits: patients at higher risk of adverse outcomes may be prioritized for closer monitoring and second-line therapy; those at low risk may be reassured and followed up less frequently, even in primary care. This enables better distribution of health care resources, reducing costs and improving delivery.(3)

The only licensed pharmacotherapy for PBC is ursodeoxycholic acid (UDCA). Treatment with UDCA has been shown to improve survival in PBC, and for this reason, it is recommended that all patients with PBC take UDCA at a dose of 13-15 mg/kg/day.(1,4,5) In 2006, it was shown that the biochemical response to treatment with UDCA—so-called “treatment response”—strongly predicts long-term outcome in PBC.(6) This was a major advance that prompted the development of several prognostic models based solely on treatment response, including the Barcelona, Paris I, Rotterdam, Toronto, and Paris II criteria.(6-10) These models are highly accurate and are increasingly used to risk stratify PBC patients and guide their management.(2) However, it was shown more recently that the aspartate transaminase (AST) to platelet ratio index (APRI) also predicts outcomes in PBC, independent of UDCA response.(11) This suggests that existing prognostic models of PBC might be improved by taking other variables into account.

In the current study, we aimed to incorporate measures of treatment response with other prognostic variables in a new, long-term prognostic model of PBC that could be used to estimate the absolute risk of developing ESLD within specific time points in the future. To do so, we analyzed data from a derivation cohort consisting of 1,916 UDCA-treated participants, selected at random from the UK-PBC Research Cohort. We derived a scoring system based on treatment response and markers of disease stage. We then validated the scoring system in an independent, validation cohort consisting of 1,249 UDCA-treated participants, also selected at random from the UK-PBC Research Cohort.

Materials and Methods

Study Design

We used data from PBC patients enrolled in the UK-PBC Research Cohort. The cohort has been described in detail elsewhere (in particular, see http://www.uk-pbc.com/about/aboutuk-pbc/ws1/researchcohort/ and Carbone et al. 2013).(2) Briefly, PBC was defined according to the guidelines of the European Association for the Study of the Liver (EASL).(1) Participants included in the current study were (1) patients with PBC incident or prevalent between January 1, 2008 and July 31, 2014 or (2) liver transplant (LT) recipients who had undergone LT for PBC at any point before July 31, 2014.

Participants were recruited throughout the UK by the UK-PBC Consortium, a research network of 155 National Health Service (NHS) Trusts or Health Boards collaborating in the UK-PBC project (http://www.uk-pbc.com/). Of note, the UK-PBC Consortium includes every hospital providing general or specialist hepatology services in Great Britain, as well as the only major liver treatment center in Northern Ireland. In collaborating centers, PBC patients were identified (1) by searching outpatient clinic records for patients registered with a diagnosis of PBC or LT for PBC and (2) by searching immunology laboratory databases for samples with a positive test for anti-mitochondrial antibody (AMA). Patients with a confirmed diagnosis of PBC or LT for PBC were invited to join the UK-PBC Research Cohort.

We retrospectively reviewed the medical records of all participants to obtain baseline clinical data and ascertain events occurring before the date of recruitment. Participants who had not suffered an event preceding the date of recruitment were prospectively followed up until July 31, 2014.

The study was conducted in accordance with the guidelines of the Declaration of Helsinki and the principles of good clinical practice. All participants provided written informed consent. The study was approved by the Oxford C research ethics committee (REC reference: 07/H0606/96) and by the research and development department of each collaborating hospital.

Data Source

Data were captured using baseline and follow-up case record forms (CRFs) that were completed by suitably trained research nurses in collaborating centers. The baseline CRF captured information on the date of diagnosis and the explanatory variables listed below. Follow-up CRFs captured information on survival status and the date and cause of death (if applicable); LT status and date of LT (if applicable); contemporaneous laboratory investigations; and ongoing treatment with UDCA. The most recent follow-up CRF was sent to collaborating centers in July 2014.

For each participant, the baseline CRF was sent to the hospital where the participant first received a diagnosis of PBC, which might be different from the recruiting center. Follow-up CRFs were sent to the participant’s current treatment center, which might also be different to the recruiting center. This was possible because all centers providing general or specialist liver services in Great Britain are collaborating in the study. This ensured that follow-up was complete for all participants.

Completed CRFs underwent quality control (QC) for completeness and accuracy at the University of Cambridge. Missing or inaccurate data were systematically queried with the participant or research nurse who completed the form. Data that passed QC were uploaded into a bespoke database.

Study Entry and Outcome

We calculated the time from the diagnosis of PBC to an event. The date of diagnosis of PBC was defined as the date of the first positive test for AMA or, for seronegative patients, the date of the diagnostic liver biopsy.

Events were defined to reflect ESLD requiring LT, as follows: (1) death from a liver-related cause, meaning liver failure, variceal hemorrhage, or hepatocellular carcinoma (HCC); (2) LT for PBC; or (3) for participants who were still alive and had never undergone LT, serum bilirubin measuring ≥100 μmol/L for the first time. We considered LT for PBC to be an acceptable surrogate for liver-related death, having confirmed that >90% of PBC LT recipients in the UK have biochemical evidence of liver failure at the time of transplantation, reflected by a United Kingdom Model for End-Stage Liver Disease (UKELD) score >49 (personal communication, NHS Blood and Transplantation; Supporting Fig. 1).(12) Furthermore, we selected the threshold of bilirubin ≥100 μmol/L because bilirubin at this level is widely accepted to be an indication for LT, as reported in the 2009 EASL guidelines on the management of cholestatic liver diseases.(1)

Participants who did not reach an event were censored at the date of their most recent blood tests or the date of non-liver-related death, if applicable.
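The outcome definition above can be expressed as a small function that returns the follow-up time and event indicator for one participant. This is a minimal sketch, not the study's code: the `Record` fields are hypothetical, time is measured in years as days/365.25 (an assumption), and criterion (3) is read literally, so the bilirubin criterion is applied only to participants who neither died nor underwent LT.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

# Event threshold for criterion (3), in umol/L.
BILIRUBIN_EVENT_UMOL_L = 100.0


@dataclass
class Record:
    """Hypothetical per-participant record (illustrative fields only)."""
    first_positive_ama: Optional[date]   # None for AMA-seronegative patients
    diagnostic_biopsy: Optional[date]
    last_blood_test: date
    liver_death: Optional[date] = None   # liver failure, variceal bleed or HCC
    lt_for_pbc: Optional[date] = None
    non_liver_death: Optional[date] = None
    bilirubin: Tuple[Tuple[date, float], ...] = ()  # (date, umol/L) pairs


def diagnosis_date(r: Record) -> date:
    # First positive AMA; for seronegative patients, the diagnostic biopsy.
    if r.first_positive_ama is not None:
        return r.first_positive_ama
    return r.diagnostic_biopsy


def time_and_event(r: Record) -> Tuple[float, int]:
    """Return (years from diagnosis, 1 for an event or 0 for censored)."""
    start = diagnosis_date(r)
    hard_events = [d for d in (r.liver_death, r.lt_for_pbc) if d is not None]
    high_bil = sorted(d for d, v in r.bilirubin if v >= BILIRUBIN_EVENT_UMOL_L)
    if hard_events:
        # Criteria (1) and (2): liver-related death or LT for PBC.
        end, event = min(hard_events), 1
    elif r.non_liver_death is not None:
        # Non-liver death: censored at the date of death.
        end, event = r.non_liver_death, 0
    elif high_bil:
        # Criterion (3): alive, never transplanted, first bilirubin >= 100.
        end, event = high_bil[0], 1
    else:
        # No event: censored at the most recent blood test.
        end, event = r.last_blood_test, 0
    return (end - start).days / 365.25, event
```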

Explanatory Variables

We considered variables for inclusion in the risk score that were clinically relevant or had been shown in at least one previous study to predict survival in PBC. These variables were as follows:

  • Age at diagnosis

  • Sex

  • Year of diagnosis

  • Blood tests at the time of diagnosis, that is, serum sodium, creatinine, bilirubin (BIL), albumin, alanine transaminase (ALT), AST, alkaline phosphatase (ALP), platelet count, prothrombin time (PT) and international normalized ratio (INR), immunoglobulin (Ig)G, IgA, IgM, antinuclear antibodies (ANA), AMA, and anti-smooth-muscle antibodies (SMA) (presence/absence)

  • Spleen size and the presence of ascites by ultrasound scan at the time of diagnosis

  • Treatment with UDCA (yes or no)

  • Liver biochemistry after 12 months of treatment with UDCA (i.e., bilirubin after 12 months of UDCA [BIL12], ALT after 12 months of UDCA [ALT12], AST after 12 months of UDCA [AST12], and alkaline phosphatase after 12 months of UDCA [ALP12])

To account for interoperator variability in the measurement of laboratory investigations, research nurses were asked to provide the reference range reported for each laboratory investigation, as well as the result and date of the test. In our analysis, creatinine, BIL, ALT, AST, ALP, and immunoglobulins were treated as multiples of their respective upper reference levels. Sodium, albumin, and platelet count were treated as multiples of their respective lower reference levels.
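The normalisation described above can be sketched as follows, assuming each result arrives together with the lower and upper reference limits reported by the laboratory that measured it. The analyte keys are hypothetical labels chosen for illustration.

```python
# Analytes expressed as multiples of the upper reference level (ULN).
ULN_ANALYTES = {"creatinine", "bilirubin", "alt", "ast", "alp", "igg", "iga", "igm"}
# Analytes expressed as multiples of the lower reference level (LLN).
LLN_ANALYTES = {"sodium", "albumin", "platelets"}


def normalise(analyte: str, result: float, lower_ref: float, upper_ref: float) -> float:
    """Express a result relative to the measuring laboratory's own reference range."""
    key = analyte.lower()  # hypothetical analyte labels, case-insensitive
    if key in ULN_ANALYTES:
        return result / upper_ref
    if key in LLN_ANALYTES:
        return result / lower_ref
    raise ValueError(f"no normalisation rule for {analyte!r}")
```

For example, an ALP of 260 U/L from a laboratory with a reference range of 30-130 U/L gives a ratio of 2.0 (hypothetical values). Dividing by the local limit is what makes results from laboratories with different assays and reference ranges comparable.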

Measurements for both AST and ALT were available for comparatively few subjects (n = 586; 14.6%), reflecting variation in biochemistry laboratory practice across the UK. Therefore, we defined a variable, transaminases (TA), that was the ALT where this was available, otherwise the AST. Likewise, measurements for both PT and INR were available for comparatively few patients (n = 897; 21.4%). Where the INR was missing, we estimated the INR to be the ratio of the PT to the mean normal PT, calculated as the mean of the upper and lower reference levels in that hospital.
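The two derived variables above can be sketched as small helper functions. The function names are hypothetical. Note that the conventional INR raises the PT ratio to the power of the thromboplastin's International Sensitivity Index (ISI); the estimate described above uses the plain ratio, which is equivalent to assuming an ISI of 1.

```python
from typing import Optional


def transaminases(alt_xuln: Optional[float], ast_xuln: Optional[float]) -> Optional[float]:
    """TA: the ALT (as a multiple of ULN) where available, otherwise the AST."""
    return alt_xuln if alt_xuln is not None else ast_xuln


def inr_or_estimate(inr: Optional[float], pt_s: Optional[float],
                    pt_lower_ref: float, pt_upper_ref: float) -> Optional[float]:
    """Use the measured INR; if missing, estimate it as PT / mean normal PT.

    The mean normal PT is the midpoint of the local PT reference range.
    No ISI exponent is applied (plain ratio, as described in the text).
    """
    if inr is not None:
        return inr
    if pt_s is None:
        return None
    mean_normal_pt = (pt_lower_ref + pt_upper_ref) / 2
    return pt_s / mean_normal_pt
```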

Treatment with UDCA was included as a dichotomous explanatory variable (i.e., any treatment or no treatment). We did not account for the baseline, weight-adjusted dose of UDCA because these data were not available. However, we identified a subgroup of participants for whom the current weight-adjusted dose of UDCA was available (n = 1,253). In this subgroup, the median dose of UDCA was 12 mg/kg/day (interquartile range [IQR]: 9-14). This is lower than the recommended dose of UDCA (13-15 mg/kg/day), albeit comparable to the median dose reported by Lammers et al.(13) in their study of 4,845 PBC patients from leading academic centers across the globe. Notably, we found that the vast majority of participants taking UDCA <13 mg/kg/day fulfilled the Paris I definition of treatment response, suggesting they were receiving an individually effective dose (Supporting Fig. 2). For this reason, we did not consider that failing to account for weight-adjusted dose of UDCA would substantially bias our analysis.

Derivation of PBC Risk Scores

For the derivation and the validation of the risk scores, we excluded participants confirmed to have another chronic liver disease in addition to PBC. We also excluded participants with PBC/autoimmune hepatitis (AIH) overlap syndrome, defined as interface hepatitis on liver histology combined with TA ≥5× upper limit of normal (ULN) or IgG ≥2× ULN, over-and-above features of PBC.(14) Finally, we excluded participants who had never received UDCA, had received <12 months of treatment with UDCA, or had discontinued UDCA prematurely for any reason other than death or LT. This left a cohort of participants with pure PBC who had received ongoing treatment with UDCA for at least 12 months. Following convention,(15,16) we randomly allocated 60% of these UDCA-treated participants to a derivation cohort and the remaining 40% to a validation cohort.
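The exclusion criteria and random allocation above can be sketched as follows. The `Candidate` fields and the seed are hypothetical. The source does not say how randomisation was performed; the reported cohort sizes (1,916 of 3,165, about 60.5%) are consistent with allocating each eligible participant independently with probability 0.6, which is the approach assumed here.

```python
import random
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Candidate:
    """Hypothetical screening record; field names are illustrative."""
    pid: str
    other_liver_disease: bool
    interface_hepatitis: bool
    ta_xuln: float              # TA as a multiple of ULN
    igg_xuln: float             # IgG as a multiple of ULN
    udca_months: float          # 0 if never treated
    stopped_udca_early: bool    # stopped for a reason other than death or LT


def pbc_aih_overlap(c: Candidate) -> bool:
    # Interface hepatitis plus TA >= 5x ULN or IgG >= 2x ULN.
    return c.interface_hepatitis and (c.ta_xuln >= 5 or c.igg_xuln >= 2)


def eligible(c: Candidate) -> bool:
    return (not c.other_liver_disease
            and not pbc_aih_overlap(c)
            and c.udca_months >= 12
            and not c.stopped_udca_early)


def allocate(cands: Iterable[Candidate], p_derivation: float = 0.6,
             seed: int = 2015) -> Tuple[List[str], List[str]]:
    """Allocate each eligible participant independently (seed is arbitrary)."""
    rng = random.Random(seed)
    derivation: List[str] = []
    validation: List[str] = []
    for c in cands:
        if eligible(c):
            (derivation if rng.random() < p_derivation else validation).append(c.pid)
    return derivation, validation
```

Independent allocation gives cohort sizes that fluctuate around 60/40; shuffling and cutting at exactly 60% would be an equally valid alternative.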

Within the derivation cohort, we undertook multiple imputation using chained equations (20 imputations) to account for missing values; in addition to the predictor variables, the imputation model included the binary event/censoring indicator and the Nelson-Aalen estimate of the cumulative hazard.(17) We performed univariate analysis of 20 variables (listed in Table 1) using Cox’s proportional hazards regression. Variables that were statistically significant at the 5% level (P ≤ 0.05) in univariate analysis were included in a multivariable Cox model. Nonautomatic backward selection was employed to identify the best-fitting model, adjusting for age and calendar year at diagnosis in each iteration of model reduction. The multivariable fractional polynomial procedure in Stata was used to identify the most appropriate functional form for each of the variables included in the best-fitting model. The coefficients were combined with the baseline survivor functions estimated from the model to derive three separate equations predicting the risk of an event occurring within 5, 10, or 15 years of baseline, respectively. Hereafter, we refer to these equations as the 5-, 10-, and 15-year risk scores.
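The imputation model described above uses the Nelson-Aalen cumulative hazard, evaluated at each participant's own follow-up time, as a covariate. A minimal stdlib sketch of that quantity follows; the chained-equations imputation itself is not shown, and the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter
from typing import List, Sequence


def nelson_aalen_at_own_time(times: Sequence[float],
                             events: Sequence[int]) -> List[float]:
    """Nelson-Aalen cumulative hazard H(t), evaluated at each subject's own time.

    H(t) sums d_j / n_j over distinct event times t_j <= t, where d_j is the
    number of events at t_j and n_j the number still at risk (time >= t_j).
    """
    events_at = Counter(t for t, e in zip(times, events) if e)
    event_times = sorted(events_at)
    all_times = sorted(times)
    cumulative: List[float] = []
    h = 0.0
    for t in event_times:
        at_risk = len(all_times) - bisect_left(all_times, t)
        h += events_at[t] / at_risk
        cumulative.append(h)
    result = []
    for t in times:
        k = bisect_right(event_times, t)  # number of event times <= t
        result.append(cumulative[k - 1] if k else 0.0)
    return result
```

For follow-up times 1, 2, 2, 3, 4 with events at 1, 2 and 3, the increments are 1/5, 1/4 and 1/2, so the five subjects receive H = 0.2, 0.45, 0.45, 0.95 and 0.95.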

Table 1. Characteristics of Patients at Baseline in the Derivation and Validation Cohorts.

Variable*                       Derivation Cohort    Validation Cohort    Untreated Cohort†
                                (N = 1,916)          (N = 1,249)          (N = 754)

Age, years                      55.5 (48.5-62.7)     55.2 (47.9-62.8)     54.0 (46.0-62.1)
Female, n (%)                   1,707 (89.1)         1,140 (91.3)         680 (90.2)
LT, n (%)                       155 (8.1)            105 (8.4)            210 (27.8)
ANA+, n (%)                     392 (20.5)           250 (20.1)           202 (26.8)
AMA+, n (%)                     1,667 (87.0)         1,070 (85.7)         679 (90)
SMA+, n (%)                     111 (5.8)            91 (7.3)             57 (7.5)
Splenomegaly (>12 cm), n (%)    198 (10.3)           97 (7.8)             114 (15.1)
Ascites, n (%)                  22 (1.1)             13 (1.0)             20 (2.7)
Na ratio                        1.0 (1.0-1.1)        1.0 (1.0-1.0)        1.0 (1.0-1.1)
Creatinine ratio                0.7 (0.6-0.8)        0.7 (0.6-0.8)        0.7 (0.6-0.8)
BIL ratio                       0.5 (0.4-0.8)        0.5 (0.4-0.8)        0.5 (0.4-0.9)
Albumin ratio                   1.2 (1.1-1.3)        1.2 (1.1-1.3)        1.2 (1.1-1.3)
ALP ratio                       1.9 (1.2-3.5)        2.1 (1.3-3.6)        1.5 (0.9-2.9)
TA ratio                        1.4 (0.9-2.3)        1.4 (0.9-2.4)        1.2 (0.7-2.1)
Platelets ratio                 1.8 (1.5-2.2)        1.8 (1.5-2.2)        1.8 (1.4-2.2)
INR                             1.0 (0.9-1.0)        1.0 (0.9-1.0)        1.0 (0.9-1.1)
IgG ratio                       0.9 (0.7-1.1)        0.9 (0.7-1.1)        0.9 (0.7-1.1)
BIL12 ratio                     0.5 (0.4-0.7)        0.5 (0.4-0.7)        —
ALP12 ratio                     1.2 (0.9-2.1)        1.3 (0.9-2.1)        —
TA12 ratio                      0.8 (0.6-1.3)        0.8 (0.6-1.3)        —
Events, n (%)                   177 (9.2)            114 (9.1)            201 (26.7)
* To allow for interoperator variability, the bilirubin, transaminases, and alkaline phosphatase at baseline and after 12 months of UDCA, creatinine, INR, and IgG were analyzed as multiples of the upper reference level in the laboratories that measured them. The Na, albumin, and platelet count were analyzed as multiples of the lower reference level in the laboratories that measured them.

Values for all continuous variables are expressed as medians (IQRs).

† This subgroup includes only participants who were not treated with UDCA and had been followed up for at least 12 months, in order to allow for a fair comparison with the other subgroups.

Abbreviations: AMA, anti-mitochondrial antibodies; ANA, anti-nuclear antibodies; ALP, alkaline phosphatase; ALP12, alkaline phosphatase after 12 months of UDCA; BIL12, bilirubin after 12 months of UDCA; IgG, immunoglobulin G; INR, international normalized ratio; LT, liver transplantation; n, number; SMA, anti-smooth muscle antibodies; TA12, transaminases after 12 months of UDCA.

Validation of the PBC Risk Score

We applied the 5-, 10-, and 15-year risk scores to participants in the validation cohort. To assess discrimination, we calculated the area under the receiver operating characteristic curve (AUC) for each risk score. To assess calibration, we compared the observed versus predicted risk of an event occurring within 5, 10, or 15 years across each decile of the 5-, 10-, and 15-year risk scores, respectively. For comparison, we also assessed the discrimination of the Paris I, Barcelona, Paris II, and Toronto models at 5, 10, and 15 years using the AUC.
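The two validation measures above can be sketched in a few lines. This assumes the outcome at each horizon has already been reduced to a binary label (event within t years or not); how participants censored before t were handled is not stated in the source, and a fuller analysis would use time-dependent ROC methods and Kaplan-Meier observed risks. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """AUC as P(random event outranks random non-event); ties count 1/2."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_by_decile(predicted: Sequence[float], observed: Sequence[int],
                          groups: int = 10) -> List[Tuple[int, float, float]]:
    """(group, mean predicted risk, observed event proportion) per risk group."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    rows = []
    for g in range(groups):
        idx = order[g * len(order) // groups:(g + 1) * len(order) // groups]
        if idx:
            mean_pred = sum(predicted[i] for i in idx) / len(idx)
            obs = sum(observed[i] for i in idx) / len(idx)
            rows.append((g + 1, mean_pred, obs))
    return rows
```

A well-calibrated score gives rows in which the mean predicted risk and the observed proportion are close in every decile; the AUC is insensitive to calibration because it depends only on the ranking of scores.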

To assess the accuracy of the risk scores for measurement of risk before treatment, we calculated the 5-, 10-, and 15-year risk scores in a group of participants who had never been established on UDCA and had been followed up for at least 12 months, using the baseline BIL, TA, and ALP instead of the equivalent measurements on treatment. We then calculated the respective AUCs. To assess the accuracy of the risk scores using the ALT after 12 months of UDCA (ALT12) rather than the transaminases after 12 months of UDCA (TA12), we calculated the 5-, 10-, and 15-year risk scores using the ALT12 for all participants in the validation cohort for whom this measurement was available, and then calculated the respective AUCs. Likewise, to assess the accuracy of the risk scores using the AST12 rather than TA12, we calculated each risk score using the AST12 for all participants in the validation cohort for whom this measurement was available, then calculated the respective AUCs.

All analyses were performed using Stata software (version 13.0; StataCorp LP, College Station, TX).

Results

Cohort Characteristics

A total of 4,099 patients with PBC were recruited to the cohort up to July 31, 2014. Of these, 77 were confirmed to have PBC-AIH overlap syndrome or another liver disease in addition to PBC; these participants were excluded from further analysis. Of those remaining, we excluded 857 participants who had never received UDCA, had received <12 months of treatment with UDCA, or had discontinued UDCA prematurely. This left 3,165 UDCA-treated participants, whom we included in the analysis.

In these UDCA-treated participants, the year of diagnosis of PBC ranged from 1974 to 2014 (Supporting Fig. 3A). The year of diagnosis in those who had undergone LT also ranged from 1974 to 2014 (Supporting Fig. 3B). The median duration of follow-up was 6.3 years (IQR, 3.2-10.7) and the total follow-up was 23,673 patient-years. During follow-up, 291 patients (9.2%) suffered an event: 260 patients (8.2%) underwent LT and 31 patients (1%) died from liver-related causes. The overall event-free survival rate was 96% at 5 years, 89% at 10 years, and 86% at 15 years, comparable to other recent series.(7)

These UDCA-treated participants were randomly allocated to a derivation cohort consisting of 1,916 participants or validation cohort consisting of 1,249 participants. The baseline characteristics of participants in the derivation and validation cohorts are shown in Table 1; the cohorts were similar, as expected from random allocation. Consistent with other recent series,(7,18,19) approximately 10% of participants had advanced disease at diagnosis (exemplified here by splenomegaly or ascites) and approximately 20% of participants were ANA positive. Complete information about explanatory variables was available for 1,460 participants (76%) in the derivation cohort and for 959 participants (77%) in the validation cohort. Information on outcome was available for all participants. The rate of missing information for each variable is shown in Supporting Table 1.

Derivation of a PBC Risk Score

In univariate analysis, age at diagnosis, calendar year at diagnosis, Na, BIL, TA, ALP, albumin, platelets, IgG, ANA, splenomegaly, ascites, BIL12, ALP12, and TA12 were associated with outcome and were taken forward for multivariable modeling. After nonautomatic backward selection, the best-fitting Cox model included five variables: albumin, platelet count, BIL12, TA12, and ALP12, with a Harrell’s c statistic of 0.92 (Table 2). Each iteration of the multivariable model was adjusted for age and calendar year at diagnosis, but these variables did not significantly improve the fit and were excluded from the final model (data not shown).
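Harrell's c statistic, reported above for the final Cox model, measures how often the model ranks pairs of participants correctly when follow-up is right-censored. The study used Stata; the stdlib sketch below is an independent illustration with a simplified tie rule (pairs with tied times are skipped, tied risks count one half), so it may differ slightly from Stata's handling of ties. The function name is hypothetical.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[int],
              risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair is usable when the subject with the shorter follow-up had an event;
    it is concordant when that subject also has the higher predicted risk.
    """
    concordant, usable = 0.0, 0
    for ti, ei, ri in zip(times, events, risk):
        if not ei:
            continue  # censored subjects cannot anchor a usable pair
        for tj, rj in zip(times, risk):
            if tj > ti:  # j was still event-free when i had the event
                usable += 1
                concordant += 1.0 if ri > rj else 0.5 if ri == rj else 0.0
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, the same scale on which the AUC is read.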

Table 2. Cox Regression Analysis for Liver Event in the Derivation Cohort.

                      Univariate Analysis                Multivariate Analysis
Variable              HR       95% CI          P Value   HR      95% CI         P Value

Albumin ratio         0.007    0.002-0.020     <0.001    0.052   0.013-0.211    <0.001
Platelet ratio        0.336    0.247-0.457     <0.001    0.362   0.255-0.514    <0.001
BIL12 ratio           1.476    1.394-1.563     <0.001    1.427   1.317-1.210    <0.001
TA12 ratio            1.225    1.180-1.271     <0.001    1.150   1.093-1.210    <0.001
ALP12 ratio           1.275    1.216-1.337     <0.001    1.103   1.030-1.183    0.005
Na ratio              0.001    0.001-0.002     <0.001
Creatinine ratio      0.385    0.131-1.129     0.082
BIL ratio             1.178    1.148-1.208     <0.001
ALP ratio             1.044    1.027-1.061     <0.001
TA ratio              1.019    0.999-1.039     0.050
INR                   1.420    0.839-2.401     0.191
IgG ratio             2.430    1.676-3.523     <0.001
Age, years            0.970    0.955-0.984     <0.001
Year of diagnosis     0.941    0.919-0.964     <0.001
Female                0.816    0.443-1.503     0.514
ANA+                  1.423    0.937-2.1       0.048
AMA+                  1.090    0.589-2.020     0.782
SMA+                  1.114    0.563-2.020     0.756
Splenomegaly          8.453    5.969-11.971    <0.001
Ascites               11.732   6.283-21.905    <0.001

Splenomegaly refers to a spleen length >12 cm.

Abbreviations: AMA, anti-mitochondrial antibodies; ANA, anti-nuclear antibodies; ALP, alkaline phosphatase; ALP12, alkaline phosphatase after 12 months of UDCA; BIL12, bilirubin after 12 months of UDCA; CI, confidence interval; HR, hazard ratio; IgG, immunoglobulin G; INR, international normalized ratio; n, number; SMA, anti-smooth muscle antibodies; TA12, transaminases after 12 months of UDCA.

Figure 1 shows the relationship between the hazard ratio for an event and each variable within the final model, with the best-fitting polynomial lines that describe this relationship. Fractional polynomial terms, baseline survivor function at 5, 10, and 15 years, and regression coefficients for the best-fitting fractional polynomial model were included in the scoring system as follows:

Fig. 1.

Relationship between the hazard ratio for a liver event (liver death or liver transplantation) and each variable within the final model together with the best-fitting polynomial line in the UK-PBC Research Cohort (A. ALP12; B. BIL12; C. TA12; D. Albumin; E. Platelets). To allow for inter-operator variability, BIL12, TA12 and ALP12 were analysed as multiples of their upper reference levels whereas Albumin and Platelets were analysed as multiples of their lower reference levels. The best-fitting polynomial line to describe the relationship between risk and each variable is shown. Note in particular that for each variable, risk increases or decreases as a continuum and there are no points at which the trajectory of risk suddenly changes, which suggests they are best modelled as continuous variables. Note also that some variables do not have a linear relationship with risk and for this reason, they were transformed using multivariable fractional polynomials (see text). Abbreviations: ALP12, alkaline phosphatase after 12 months of UDCA; BIL12, bilirubin after 12 months of UDCA; LLN, lower limit of normal; ULN, upper limit of normal; TA12, transaminases after 12 months of UDCA; UDCA, ursodeoxycholic acid.
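Fractional polynomials (FPs) replace a linear term x with powers drawn from a small conventional set, {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, where power 0 denotes ln x and a repeated power p contributes x^p and x^p ln x. The sketch below shows these transformations; the final two helpers are an illustrative reading of the published equation (TA12 enters through (TA12/10)^-1 and BIL12 through ln(BIL12/10), while ALP12, albumin, and platelets enter linearly), not output from Stata's `mfp` command. All function names are hypothetical.

```python
import math
from typing import Tuple

# Conventional set of candidate FP powers (0 denotes ln x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x: float, power: float) -> float:
    if x <= 0:
        raise ValueError("fractional polynomials need x > 0")
    return math.log(x) if power == 0 else x ** power


def fp2_terms(x: float, p1: float, p2: float) -> Tuple[float, float]:
    """Second-degree FP; a repeated power contributes x**p and x**p * ln x."""
    first = fp_term(x, p1)
    second = first * math.log(x) if p1 == p2 else fp_term(x, p2)
    return first, second


# Illustrative terms read off the UK-PBC equation (covariates scaled by 1/10).
def ta12_term(ta12_xuln: float) -> float:
    return fp_term(ta12_xuln / 10, -1)   # FP1, power -1


def bil12_term(bil12_xuln: float) -> float:
    return fp_term(bil12_xuln / 10, 0)   # FP1, power 0 (natural log)
```

The inverse term for TA12 flattens the effect of very high transaminases, and the log term for BIL12 lets each doubling of bilirubin add a constant amount to the linear predictor, matching the curved relationships in the figure.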

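UK-PBC Risk Score at t years:

$$
\text{Risk}(t) = 1 - S_0(t)^{\exp(\mathrm{LP})}
$$

$$
\begin{aligned}
\mathrm{LP} = {} & 0.0287854\,(\mathrm{ALP12} - 1.722136304) \\
& - 0.0422873\,\left[\left(\mathrm{TA12}/10\right)^{-1} - 8.675729006\right] \\
& + 1.4199\,\left[\ln\left(\mathrm{BIL12}/10\right) + 2.709607778\right] \\
& - 1.960303\,(\mathrm{ALB} - 1.17673001) \\
& - 0.4161954\,(\mathrm{PLT} - 1.873564875)
\end{aligned}
$$

where ALP12, TA12, and BIL12 are expressed as multiples of the ULN, ALB (albumin) and PLT (platelet count) are expressed as multiples of the LLN, and S_0(t) is the baseline survivor function.

Note: Baseline survivor function S_0(t) = 0.982 (at 5 years); 0.941 (at 10 years); 0.893 (at 15 years).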
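The risk-score equation can be transcribed directly into a function. This is a sketch for illustration, not a validated clinical calculator, and the function and argument names are hypothetical. `ta12_xuln` is the TA12 as defined in the Methods (ALT12 where available, otherwise AST12). Because every term is centred, a patient whose values all equal the centring constants has LP = 0, and the risk then reduces to 1 - S_0(t).

```python
import math

# Baseline survivor function S0(t) at 5, 10 and 15 years.
BASELINE_SURVIVAL = {5: 0.982, 10: 0.941, 15: 0.893}


def linear_predictor(alp12_xuln: float, ta12_xuln: float, bil12_xuln: float,
                     alb_xlln: float, plt_xlln: float) -> float:
    """Centred linear predictor of the UK-PBC model.

    ALP12, TA12 and BIL12 are multiples of the ULN after 12 months of UDCA;
    baseline albumin and platelet count are multiples of the LLN.
    """
    return (0.0287854 * (alp12_xuln - 1.722136304)
            - 0.0422873 * ((ta12_xuln / 10) ** -1 - 8.675729006)
            + 1.4199 * (math.log(bil12_xuln / 10) + 2.709607778)
            - 1.960303 * (alb_xlln - 1.17673001)
            - 0.4161954 * (plt_xlln - 1.873564875))


def uk_pbc_risk(years: int, **values: float) -> float:
    """Risk of LT or liver-related death within `years` (5, 10 or 15)."""
    s0 = BASELINE_SURVIVAL[years]
    return 1 - s0 ** math.exp(linear_predictor(**values))
```

For a hypothetical patient, `uk_pbc_risk(10, alp12_xuln=1.2, ta12_xuln=0.8, bil12_xuln=0.5, alb_xlln=1.2, plt_xlln=1.8)` returns the 10-year risk as a fraction between 0 and 1; the same inputs always give a higher risk at 15 years than at 5 years because S_0(t) falls with t.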

Validation of the PBC Risk Score

A total of 1,109 participants (89%) in the validation cohort had values for BIL12, TA12, ALP12, albumin, and platelets and were included in the validation analysis. One hundred and fourteen patients (9.1%) suffered an event during the follow-up.

In the validation cohort, the AUC was 0.96 (95% confidence interval [CI]: 0.93-0.99) for the 5-year risk score, 0.95 (0.93-0.98) for the 10-year risk score, and 0.94 (0.91-0.97) for the 15-year risk score (Fig. 2). In comparison, the AUCs of previous models for events within 5, 10, or 15 years were as follows: Barcelona = 0.56, 0.61, 0.61; Paris I = 0.81, 0.81, 0.80; Toronto = 0.65, 0.70, 0.70; and Paris II = 0.75, 0.75, 0.74, respectively (Fig. 3). The predicted versus observed risk of an event across each decile of the 5-, 10-, and 15-year risk scores is shown in Fig. 4. There is close correspondence between the predicted and observed risks, suggesting that the risk scores are well calibrated.

Fig. 2.

Receiver operating characteristic (ROC) curves for the prediction of death or liver transplantation (LT) according to the UK-PBC risk scores at 5 (A), 10 (B) and 15 years (C). Note: the area under the ROC curve (AUC) can be interpreted as a summary index of classification performance. An AUC of 0.5 indicates a random call, whereas an AUC of 1.0 indicates perfect separation of events and nonevents.

Fig. 3.

Receiver operating characteristic (ROC) curves for the prediction of death or liver transplantation (LT) using the Barcelona, Paris I, Toronto and Paris II definitions of treatment response, in the UK-PBC cohort at 5 (A), 10 (B) and 15 years (C). Each plot in Figure 3 shows only one point because the Barcelona, Paris I, Toronto and Paris II definitions of treatment response are dichotomous, having a predefined threshold and only two levels: responder and nonresponder. The ROC curve is therefore plotted using the single, predefined threshold. In contrast, the UK-PBC Risk Scores are continuous; their ROC curves are plotted by incrementally varying the threshold and measuring the sensitivity and specificity at each threshold. Comparison with the Rotterdam criteria was not possible because serum albumin after 12 months of treatment was not available for all participants in the cohort.
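For a dichotomous rule with sensitivity Se and specificity Sp, the ROC "curve" joins (0, 0), the single operating point (1 − Sp, Se), and (1, 1). Assuming the AUC is computed by the trapezoidal rule (the source does not state the method), the area reduces to the mean of sensitivity and specificity, which is why a binary responder/nonresponder definition cannot reach the AUCs of a continuous score unless both its sensitivity and specificity are high:

```latex
\mathrm{AUC}
  = \underbrace{\tfrac{1}{2}\,(1-\mathrm{Sp})\,\mathrm{Se}}_{\text{triangle}}
  + \underbrace{\tfrac{1}{2}\,\mathrm{Sp}\,(\mathrm{Se}+1)}_{\text{trapezoid}}
  = \frac{\mathrm{Se}+\mathrm{Sp}}{2}
```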

Fig. 4.

Predicted versus observed risk of an event across each decile of the 5-year (A), 10-year (B), and 15-year (C) UK-PBC risk scores. There is close correspondence between the predicted and observed risks, suggesting that the risk scores are well calibrated.

The ALT12 was available for 944 subjects in the validation cohort, of whom 53 (5.6%) suffered an event during follow-up. The risk score using the ALT12 instead of TA12 had high discrimination in this subgroup, the AUC being 0.91 (0.86-0.95), 0.93 (0.90-0.97), and 0.91 (0.85-0.97) for the 5-, 10-, and 15-year risk scores, respectively. The AST12 was available for 376 subjects in the validation cohort, of whom 42 (11.2%) suffered an event during follow-up. The risk score using the AST12 instead of TA12 also had high discrimination in this subgroup, the AUC being 0.86 (0.76-0.96), 0.90 (0.85-0.96), and 0.87 (0.80-0.93) for the 5-, 10-, and 15-year risk scores, respectively.

A total of 754 participants had never been established on UDCA and had been followed up for at least 12 months. In this subgroup of untreated participants, the median follow-up was 6.65 years (IQR, 3.5-10.6); total follow-up was 5,646 patient-years, and 201 (26.7%) suffered an event. The risk scores applied to this subgroup using the baseline BIL, TA, and ALP (instead of the equivalent measurements after 12 months of treatment) had high discrimination, the AUC being 0.96 (0.94-0.98), 0.94 (0.91-0.96), and 0.91 (0.88-0.94) for the 5-, 10-, and 15-year risk scores, respectively (Fig. 5).

Fig. 5.

Receiver operating characteristic curves for the 5-year (A), 10-year (B), and 15-year (C) UK-PBC risk scores in the subgroup of untreated participants (n = 754)* of the UK-PBC cohort. *This subgroup includes only participants who were not treated with UDCA and had been followed up for at least 12 months.

Discussion

We analyzed data from more than 3,000 participants in the UK-PBC Research Cohort to develop and validate a scoring system for long-term prediction of ESLD. The scoring system incorporates readily available and objective laboratory measures, that is, the baseline platelet count and serum albumin, and the serum bilirubin, transaminases, and ALP measured after 12 months of treatment with UDCA. The scoring system is proposed to facilitate management of PBC in clinical practice.

In the current study, we confirmed that existing long-term prognostic models of PBC are accurate, with AUCs up to 0.81 for the Paris I model. However, the UK-PBC scoring system was superior to existing models, with AUCs of 0.96, 0.95, and 0.94 for the 5-, 10-, and 15-year risk scores, respectively. There are several reasons for its strong performance. The derivation cohort was sizeable, with 1,916 subjects and 177 events. The underlying model incorporated not only variables that define the treatment response (ALP12, TA12, and BIL12), but also crude measures of hepatic fibrosis (platelet count) and hepatocellular synthetic function (serum albumin). Continuous variables were kept continuous rather than dichotomized; variables were transformed using multivariable fractional polynomials, and the contribution of each variable to the prediction model was weighted according to its prognostic value.

A major advantage of our scoring system is that it provides accurate, individualized estimates of the risk of developing ESLD within defined time points in the future. This contrasts with existing long-term prognostic models that dichotomize patients into treatment responders or nonresponders, at low or high risk of developing ESLD at an unknown point in the future (Supporting Fig. 4). In clinical practice, the scoring system should be most useful to identify patients who would obtain greatest benefit from further risk reduction using second-line therapy. This is especially pertinent in PBC, with second-line agents currently in development.(20) However, it should also be useful to identify patients at low risk of developing ESLD within a relevant time frame, who could potentially be monitored in primary care.

Although the scoring system was derived primarily to evaluate long-term risk in PBC patients on treatment, the risk scores also achieved AUCs >0.90 in untreated participants. The scoring system should therefore provide accurate estimates of long-term risk before treatment, and accurate reevaluation of that risk once treatment has been established. As such, it may be used to quantify the risk reduction, and hence the treatment benefit, derived from first-line therapy. However, our untreated validation cohort was comparatively small, so this observation should be interpreted with caution. To show readers how the scoring system might be applied in clinical practice, a calculator for the 5-, 10-, and 15-year risk scores is provided in the Supporting Document. Furthermore, Supporting Textbox 1 provides three examples of the scoring system being used to guide the clinical management of hypothetical patients with PBC.
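Quantifying risk reduction amounts to comparing two risk estimates for the same horizon: one obtained before treatment and one recomputed once treatment has been established. The minimal sketch below assumes both estimates are already available as probabilities and reports the absolute and relative reduction; the function name and the example values are hypothetical.

```python
import math


def risk_reduction(risk_before: float, risk_on_treatment: float) -> dict[str, float]:
    """Compare two risk estimates for the same horizon (e.g. 10-year risk).

    Returns the absolute risk reduction (ARR) and the relative risk
    reduction (RRR). RRR is undefined (NaN) when the pretreatment risk is 0.
    A negative ARR means the estimated risk rose on treatment.
    """
    for r in (risk_before, risk_on_treatment):
        if not 0.0 <= r <= 1.0:
            raise ValueError("risks must be probabilities in [0, 1]")
    arr = risk_before - risk_on_treatment
    rrr = arr / risk_before if risk_before > 0 else math.nan
    return {"absolute": arr, "relative": rrr}


# Hypothetical patient: 10-year risk of 30% before UDCA, 12% on treatment.
print(risk_reduction(0.30, 0.12))
```

Reporting both quantities is deliberate: a large relative reduction can correspond to a small absolute benefit when the pretreatment risk is already low.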

We anticipate that some clinicians may call for specific risk thresholds to simplify clinical decision making. This is beyond the scope of the current study: there is no consensus in the literature on (1) how many risk groups should be created and (2) where (and why) to position the cutpoints. Developing sensible guidance for choosing risk groups remains a topic for further research.(21) Furthermore, we emphasize that risk must be contextualized. Consider a patient whose 15-year risk score is 20%. This level of risk would be unacceptable for a 35-year-old with no comorbidities, but it might be acceptable for a 70-year-old with another life-shortening disease. Treatment targets should therefore be determined by the cost-effectiveness of the treatment, its side-effect profile, and the extent to which the individual patient would benefit from the risk reduction.

The UK-PBC Research Cohort consists of thousands of patients recruited from general as well as specialist centers across the entire UK. For this reason, we believe that the cohort is highly representative. The scoring system should therefore be widely applicable.

However, we acknowledge certain limitations. First, the model includes measurements taken at baseline and after 12 months of treatment. We do not anticipate a substantial change in the platelet count or serum albumin after 12 months of treatment with UDCA; for this reason, we consider all the measurements in the model to represent a single point in the course of the patient's disease. The strong fit of the final model in treated and untreated participants supports this assumption, although we did not specifically test it in the current study. We are in the process of capturing additional data that will enable us to model liver-related outcomes using sets of variables measured at different time points before and after starting treatment, and to develop models incorporating repeated measurements. Second, participants in the UK-PBC Research Cohort may be taking a suboptimal dose of UDCA. This could bias the study if UDCA has dose-dependent beneficial effects over and above those reflected in the liver biochemistry on treatment. However, survival rates in the UK-PBC Research Cohort were comparable to those of cohorts in which patients received the optimal dose of UDCA, so any bias related to the dose of UDCA is likely to be minimal. Third, in the current data set, HCC and variceal hemorrhage were not ascertained, except as a cause of death or an indication for LT. It is therefore uncertain whether the scoring system accurately predicts HCC or variceal hemorrhage per se; additional data on these outcomes will enable us to address these questions specifically. Fourth, the risk scores were derived using the variable TA rather than ALT or AST specifically; however, we have shown that they perform equally well when only the ALT, or only the AST, is used for TA. Finally, the underlying model uses the platelet count as a crude measure of disease stage. This is advantageous because the platelet count is readily available. However, more accurate and dynamic measures of liver fibrosis, such as transient elastography, may be preferable. This would be especially true if antifibrotic therapies became available, because it would then be important to quantify any reduction in fibrosis.

In conclusion, we developed and validated the UK-PBC risk scores to assess the prognosis of patients with PBC using readily available and objective clinical measures. The scoring system has some advantages compared with previous prognostic models. Application of the scoring system in clinical practice may guide management and improve the distribution of health care resources related to PBC. However, external validation of the scoring system in cohorts of treated and untreated patients is a prerequisite to its application in clinical practice, and the scoring system should be updated as the size and characterization of the UK-PBC Research Cohort increases with time.

Supplementary Material

Additional Supporting Information may be found at onlinelibrary.wiley.com/doi/10.1002/hep.28017/suppinfo.


Acknowledgment

The authors gratefully acknowledge the work done by members of the UK-PBC Consortium (see the Supporting Information). The authors acknowledge Ms. Lynda Smith for her major role in helping us to administer this study and many others. The authors acknowledge Ms. Elisa Allen (NHS Blood and Transplant) for providing data related to liver transplant PBC recipients in the UK. Finally (and most important), the authors thank all of the participants who granted us access to their medical records, enabling us to conduct this study. The UK-PBC project is a portfolio study of the NIHR Comprehensive Research Network. The views expressed are those of the authors and not necessarily those of the NHS, the National Institute for Health Research, or the Department of Health.

This study was funded by the Isaac Newton Trust, University of Cambridge; Addenbrooke's Charitable Trust (ACT), Cambridge University Hospitals NHS Foundation Trust; the PBC Foundation; Intercept Pharmaceuticals; the Wellcome Trust (grant reference: 085925); and the Medical Research Council (MRC; grant reference: MR/L001489/1). M.C. is a Sheila Sherlock Fellow of the European Association for the Study of the Liver. G.F.M. was an MRC clinical research training fellow and received salary support from the Sackler Trust at the University of Cambridge; he is now a postdoctoral fellow of the National Institute for Health Research Rare Diseases (NIHR-RD) initiative. G.M.H., D.E.J., and R.N.S. receive salary support from an MRC stratified medicine award (UK-PBC, MR/L001489/1).

Abbreviations

  • AIH: autoimmune hepatitis
  • ALP: alkaline phosphatase
  • ALP12: alkaline phosphatase after 12 months of UDCA
  • AMA: anti-mitochondrial antibody
  • ANA: anti-nuclear antibodies
  • ALT: alanine aminotransferase
  • ALT12: ALT after 12 months of UDCA
  • AST: aspartate aminotransferase
  • AST12: AST after 12 months of UDCA
  • AUC: area under the receiver operating characteristic curve
  • BIL: bilirubin
  • BIL12: bilirubin after 12 months of UDCA
  • CI: confidence interval
  • CRFs: case record forms
  • EASL: European Association for the Study of the Liver
  • ESLD: end-stage liver disease
  • HCC: hepatocellular carcinoma
  • IgG: immunoglobulin G
  • INR: international normalized ratio
  • IQR: interquartile range
  • LT: liver transplantation
  • NHS: National Health Service
  • PBC: primary biliary cholangitis
  • PT: prothrombin time
  • QC: quality control
  • SMA: anti-smooth muscle antibodies
  • TA: transaminases
  • TA12: transaminases after 12 months of UDCA
  • UDCA: ursodeoxycholic acid
  • ULN: upper limit of normal

Footnotes

URLs: UK-PBC: http://www.uk-pbc.com/; Academic Department of Medical Genetics: http://medgen.medschl.cam.ac.uk/.

Potential conflict of interest: Dr. Hirschfield advises Intercept and is on the speakers’ bureau for Falk. Dr. Williamson consults for Intercept. Dr. Sandford consults for Otsuka and received grants from Intercept. Dr. Heneghan received grants from Astellas.

Author names in bold designate shared co-first authorship.

References

  • 1. European Association for the Study of the Liver. EASL Clinical Practice Guidelines: management of cholestatic liver diseases. J Hepatol. 2009;51:237–267. doi: 10.1016/j.jhep.2009.04.009.
  • 2. Carbone M, Mells GF, Pells G, Dawwas MF, Newton JL, Heneghan MA, et al. Sex and age are determinants of the clinical phenotype of primary biliary cirrhosis and response to ursodeoxycholic acid. Gastroenterology. 2013;144:560–569.e7; quiz e513–e564. doi: 10.1053/j.gastro.2012.12.005.
  • 3. Hayes DF, Markus HS, Leslie RD, Topol EJ. Personalized medicine: risk prediction, targeted therapies and mobile health technology. BMC Med. 2014;12:37. doi: 10.1186/1741-7015-12-37.
  • 4. Poupon RE, Poupon R, Balkau B. Ursodiol for the long-term treatment of primary biliary cirrhosis. The UDCA-PBC Study Group. N Engl J Med. 1994;330:1342–1347. doi: 10.1056/NEJM199405123301903.
  • 5. Lindor KD, Gershwin ME, Poupon R, Kaplan M, Bergasa NV, Heathcote EJ; American Association for Study of Liver Diseases. Primary biliary cirrhosis. Hepatology. 2009;50:291–308. doi: 10.1002/hep.22906.
  • 6. Pares A, Caballeria L, Rodes J. Excellent long-term survival in patients with primary biliary cirrhosis and biochemical response to ursodeoxycholic acid. Gastroenterology. 2006;130:715–720. doi: 10.1053/j.gastro.2005.12.029.
  • 7. Corpechot C, Abenavoli L, Rabahi N, Chretien Y, Andreani T, Johanet C, et al. Biochemical response to ursodeoxycholic acid and long-term prognosis in primary biliary cirrhosis. Hepatology. 2008;48:871–877. doi: 10.1002/hep.22428.
  • 8. Kuiper EM, Hansen BE, de Vries RA, den Ouden-Muller JW, van Ditzhuijsen TJ, Haagsma EB, et al. Improved prognosis of patients with primary biliary cirrhosis that have a biochemical response to ursodeoxycholic acid. Gastroenterology. 2009;136:1281–1287. doi: 10.1053/j.gastro.2009.01.003.
  • 9. Kumagi T, Guindi M, Fischer SE, Arenovich T, Abdalian R, Coltescu C, et al. Baseline ductopenia and treatment response predict long-term histological progression in primary biliary cirrhosis. Am J Gastroenterol. 2010;105:2186–2194. doi: 10.1038/ajg.2010.216.
  • 10. Corpechot C, Chazouilleres O, Poupon R. Early primary biliary cirrhosis: biochemical response to treatment and prediction of long-term outcome. J Hepatol. 2011;55:1361–1367. doi: 10.1016/j.jhep.2011.02.031.
  • 11. Trivedi PJ, Bruns T, Cheung A, Li KK, Kittler C, Kumagi T, et al. Optimising risk stratification in primary biliary cirrhosis: AST/platelet ratio index predicts outcome independent of ursodeoxycholic acid response. J Hepatol. 2014;60:1249–1258. doi: 10.1016/j.jhep.2014.01.029.
  • 12. Barber K, Madden S, Allen J, Collett D, Neuberger J, Gimson A, et al. Elective liver transplant list mortality: development of a United Kingdom end-stage liver disease score. Transplantation. 2011;92:469–476. doi: 10.1097/TP.0b013e318225db4d.
  • 13. Lammers WJ, van Buuren HR, Hirschfield GM, Janssen HL, Invernizzi P, Mason AL, et al.; Global PBC Study Group. Levels of alkaline phosphatase and bilirubin are surrogate end points of outcomes of patients with primary biliary cirrhosis: an international follow-up study. Gastroenterology. 2014;147:1338–1349.e5; quiz e1315. doi: 10.1053/j.gastro.2014.08.029.
  • 14. Chazouilleres O, Wendum D, Serfaty L, Montembault S, Rosmorduc O, Poupon R. Primary biliary cirrhosis-autoimmune hepatitis overlap syndrome: clinical features and response to therapy. Hepatology. 1998;28:296–301. doi: 10.1002/hep.510280203.
  • 15. Hippisley-Cox J, Coupland C, Vinogradova Y, Robson J, Minhas R, Sheikh A, Brindle P. Predicting cardiovascular risk in England and Wales: prospective derivation and validation of QRISK2. BMJ. 2008;336:1475–1482. doi: 10.1136/bmj.39609.449676.25.
  • 16. Hippisley-Cox J, Coupland C. Derivation and validation of updated QFracture algorithm to predict risk of osteoporotic fracture in primary care in the United Kingdom: prospective open cohort study. BMJ. 2012;344:e3427. doi: 10.1136/bmj.e3427.
  • 17. White IR, Royston P. Imputing missing covariate values for the Cox model. Stat Med. 2009;28:1982–1998. doi: 10.1002/sim.3618.
  • 18. Invernizzi P, Lleo A, Podda M. Interpreting serological tests in diagnosing autoimmune liver diseases. Semin Liver Dis. 2007;27:161–172. doi: 10.1055/s-2007-979469.
  • 19. Vergani D, Alvarez F, Bianchi FB, Cancado EL, Mackay IR, Manns MP, et al. Liver autoimmune serology: a consensus statement from the committee for autoimmune serology of the International Autoimmune Hepatitis Group. J Hepatol. 2004;41:677–683. doi: 10.1016/j.jhep.2004.08.002.
  • 20. Hirschfield GM, Mason A, Luketic V, Lindor K, Gordon SC, Mayo M, et al. Efficacy of obeticholic acid in patients with primary biliary cirrhosis and inadequate response to ursodeoxycholic acid. Gastroenterology. 2015;148:751–761.e8. doi: 10.1053/j.gastro.2014.12.005.
  • 21. Royston P, Altman DG. External validation of a Cox prognostic model: principles and methods. BMC Med Res Methodol. 2013;13:33. doi: 10.1186/1471-2288-13-33.
