EBioMedicine. 2018 Aug 23;35:10–11. doi: 10.1016/j.ebiom.2018.08.032

Predicting liver disease post hepatitis virus infection: In silico pathology and pattern recognition

Brett A Lidbury
PMCID: PMC6161479  PMID: 30146340

Liver biopsy, the current gold standard for the assessment of hepatic pathology, is an invasive procedure that, as well as being unpleasant for the patient, can have post-procedural complications. Less-invasive alternatives are therefore attractive, and the FIB-4 (Fibrosis-4) index is particularly appealing because it uses routine liver function test (LFT) transaminase markers and platelet count data, together with age, to estimate liver damage without biopsy. While logistic regression methods are useful for class predictions (e.g. stages of liver fibrosis, S0–S4), Wei et al. [1] report improvements in the prediction of fibrosis stage associated with hepatitis B virus (HBV) or hepatitis C virus (HCV) infection, achieved by comparing machine learning algorithms.
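To make the FIB-4 index concrete, the following is a minimal Python sketch of the commonly published formula, FIB-4 = (age × AST) / (platelet count × √ALT), with age in years, AST and ALT in U/L, and platelets in 10^9/L. The formula is not stated in this commentary and is included here as background; the function name, parameter names, and patient values are hypothetical and purely illustrative.

```python
import math


def fib4(age_years: float, ast_u_per_l: float, alt_u_per_l: float,
         platelets_10e9_per_l: float) -> float:
    """Compute FIB-4 = (age x AST) / (platelets x sqrt(ALT)).

    Units assumed: age in years, AST and ALT in U/L, platelets in 10^9/L.
    """
    if alt_u_per_l <= 0 or platelets_10e9_per_l <= 0:
        raise ValueError("ALT and platelet count must be positive")
    return (age_years * ast_u_per_l) / (
        platelets_10e9_per_l * math.sqrt(alt_u_per_l)
    )


# Hypothetical patient values, for illustration only (not from the study).
score = fib4(age_years=50, ast_u_per_l=40, alt_u_per_l=36,
             platelets_10e9_per_l=200)
print(f"FIB-4 = {score:.2f}")
```

The index combines only four routinely collected values, which is why it serves as a natural baseline against which tree-based models using the same inputs can be compared.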

Machine learning takes data analytics beyond traditional test statistics and, as the name suggests, “learns” patterns in data that can subsequently be tested for accuracy in data sub-sets, or validated in additional data sets with the same variables. Furthermore, to enhance predictive estimates for the outcome of interest (in this case, stage of liver fibrosis), algorithms can be run as ensembles, in which iterations are performed thousands of times to counter wide variation in the target data-set. These methods are not restricted by data distribution considerations, and as such are robust in the face of diverse data structures. Alongside these advantages, awareness of the dangers of overfitting the models, and of other challenges, is necessary to reach appropriate real-world conclusions [2].

From among the machine learning options, the various recursive partitioning algorithms (decision trees, random forests) are often chosen [3], and work well on medical problems since the output reflects a decision process common in medical/health environments. The concept of decision trees is not new in relation to liver disease; earlier examples include their use in assessing treatment efficacy and associated economic outcomes [4]. More recently, Shang et al. [5] applied classification decision trees, in tandem with logistic regression, to routine pathology blood test markers, resulting in a predictive model of HBV infection in Chinese patients that did not require specialised immunoassay testing. Various data pre-processing methods, applied prior to running single or ensemble decision trees, have also been investigated as a means of enhancing the prediction of HBV and HCV infection from routine pathology data, using imbalanced HBsAg negative versus HBsAg positive data [6].
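The core step of recursive partitioning can be illustrated with a short Python sketch: for a single marker, find the threshold that best separates two classes, as measured by Gini impurity. A full decision tree repeats this step on each resulting subset. This is a generic illustration and not the procedure used in any of the cited studies; the function names, marker values, and class labels are made up.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split.

    Returns (None, impurity of the unsplit node) if no split improves on it.
    """
    n = len(values)
    best = (None, gini(labels))
    candidates = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= thr]
        right = [y for x, y in zip(values, labels) if x > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (thr, score)
    return best


# Made-up marker values and labels, for illustration only.
marker = [0.8, 1.0, 1.2, 1.4, 2.6, 3.0, 3.4, 4.1]
stage = ["mild"] * 4 + ["advanced"] * 4
threshold, impurity = best_split(marker, stage)
print("split at", threshold, "with weighted impurity", impurity)
```

The resulting rule ("marker at or below the threshold, else above it") has the same form as a clinical decision step, which is the property the paragraph above attributes to tree-based output.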

With a focus on recursive partitioning methods, Wei et al. have sought to improve the predictive power of the established FIB-4 index by comparing single decision trees, random forests, and gradient boosting machine learning methods [7–10]. The impact of the additional LFT markers, serum albumin and gamma glutamyl transferase (GGT), on the tree models was also investigated. Random forests (RF) are an ensemble decision tree method that uses bootstrap resampling to build models, while gradient boosting (GB) takes another approach to multiple trees, namely, adding trees in sequence, with each new tree improving on the errors of the earlier trees in the model. Because of the previously mentioned challenge of overfitting, RF and GB are seen as superior to single trees in providing accurate predictions of the outcome of interest.
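The difference between the two ensemble strategies can be sketched in a few lines of Python using one-split trees ("stumps") on a single feature. This is a toy illustration under simplifying assumptions, not the models of Wei et al.: a real random forest also samples a random subset of features at each split, and the data, function names, tree counts, and learning rate below are all hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree; returns (threshold, left_mean, right_mean)."""
    best = None
    uniq = sorted(set(xs))
    for lo, hi in zip(uniq, uniq[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:  # all x values identical: predict a constant
        m = sum(ys) / len(ys)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    thr, ml, mr = stump
    return ml if x <= thr else mr


def bagged_stumps(xs, ys, n_trees, rng):
    """Bagging: each stump is fitted to a bootstrap resample of the data."""
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagged(stumps, x):
    """Average the predictions of independently built stumps."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)


def boosted_stumps(xs, ys, n_trees, learning_rate=0.5):
    """Boosting (squared error): each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: one marker (xs) against a numeric severity score (ys).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 1, 2, 3, 4, 4]

forest = bagged_stumps(xs, ys, n_trees=200, rng=random.Random(0))
boost_1 = boosted_stumps(xs, ys, n_trees=1)
boost_50 = boosted_stumps(xs, ys, n_trees=50)

print("bagged stumps, training MSE:", mse(lambda x: predict_bagged(forest, x), xs, ys))
print("boosting, 1 tree, training MSE:", mse(lambda x: predict_boosted(boost_1, x), xs, ys))
print("boosting, 50 trees, training MSE:", mse(lambda x: predict_boosted(boost_50, x), xs, ys))
```

In the bagged ensemble the trees are built independently and averaged, whereas in the boosted ensemble each tree depends on those before it. The errors printed here are training errors only; judging overfitting would require held-out data, as discussed earlier.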

Single decision trees showed similar to slightly improved prediction of fibrosis stage (S0, non-fibrosis, to S4, cirrhosis) in comparison to traditional logistic regression models, using data from liver biopsy and blood samples provided by the study participants. The stage of fibrosis/cirrhosis was assessed by three independent histopathologists for this study. Both RF and GB displayed significantly improved prediction performance across stages S0–S4, with Wei et al. ultimately demonstrating superior predictive accuracy for gradient boosting over random forest. Furthermore, this study presented a new online tool, LiveBoost, which packages these research findings in a user-friendly platform.

The study highlighted here provides an excellent example of how biomedical problems can benefit from collaborations with computer science and advanced statistics. Biology, and particularly the health and medical sciences, now has access to massive data collections, not only of genetic data but also of pathology results and clinical and associated meta-data, from which to develop both research resources and clinical support tools. Other machine learning algorithms are available that may enhance these findings further (e.g. support vector machines), and more data are always required for further validation in other populations. For direct translation to the clinical domain, decision tree modelling of the various types fits well with the nature of the information available, and provides a suitable format to support clinical judgements.

Disclosure

The author declares no conflicts of interest.

References

1. Wei Clinical prediction of HBV and HCV related hepatic fibrosis using machine learning. EBioMedicine. 2018 (in press). doi: 10.1016/j.ebiom.2018.07.041.
2. Foster K.R., Koprowski R., Skufca J.D. Machine learning, medical diagnosis, and biomedical engineering research - commentary. Biomed Eng Online. 2014;13(94). doi: 10.1186/1475-925X-13-94.
3. Kingsford C., Salzberg S.L. What are decision trees? Nat Biotechnol. 2008;26(9):1011. doi: 10.1038/nbt0908-1011.
4. Crowley S., Tognarini D., Desmond P., Lees M., Saal G. Introduction of lamivudine for the treatment of chronic hepatitis B: expected clinical and economic outcomes based on 4-year clinical trial data. J Gastroenterol Hepatol. 2002;17(2):153–164. doi: 10.1046/j.1440-1746.2002.02673.x.
5. Shang G., Richardson A., Gahan M.E., Easteal S., Ohms S., Lidbury B.A. Predicting the presence of hepatitis B virus surface antigen in Chinese patients by pathology data mining. J Med Virol. 2013;85(8):1334–1339. doi: 10.1002/jmv.23609.
6. Richardson A.M., Lidbury B.A. Infection status outcome, machine learning method and virus type interact to affect the optimised prediction of hepatitis virus immunoassay results from routine pathology laboratory assays in unbalanced data. BMC Bioinform. 2013;14:206. doi: 10.1186/1471-2105-14-206.
7. Breiman L., Friedman J.H., Olshen R.A., Stone C.J. Classification and regression trees. Vol. 538. 1984; p. 19.
8. Liaw A., Wiener M. Classification and regression by random forest. R news. 2002;2(3):18–22.
9. Greg R. gbm: Generalized boosted regression models. 2010; pp. 16–31. R package version. http://CRAN.R-project.org/package=gbm
10. Therneau T., Atkinson B. rpart: Recursive partitioning and regression trees. 2018. R package version 4.1-13. https://CRAN.R-project.org/package=rpart

Articles from EBioMedicine are provided here courtesy of Elsevier
