Abstract
Diffuse liver disease is common, driven primarily by the high prevalence of non-alcoholic fatty liver disease (NAFLD). It is currently assessed by liver biopsy to determine fibrosis, often staged from F0 (normal) to F4 (cirrhosis). A noninvasive assessment method would allow a broader population to be monitored longitudinally, facilitating risk stratification and assessment of treatment efficacy. Ultrasound shear wave elastography (SWE) is a promising noninvasive technique for measuring tissue stiffness, which has been shown to correlate with fibrosis stage. However, this approach has been limited by variability in stiffness measurements. In this work, we developed and evaluated an automated framework, called SWE-Assist, that checks SWE image quality, selects a region of interest (ROI), and classifies the ROI to determine whether the fibrosis stage is at or above F2, a threshold that is important for clinical decision-making. Our database consists of 3,392 images from 328 cases. Several classifiers, including random forest, support vector machine (SVM), and convolutional neural network (CNN), were evaluated. The best approach used a CNN and yielded an area under the receiver operating characteristic curve (AUROC) of 0.89, compared to 0.74 for the conventional stiffness-only approach. Moreover, the new method bases each decision on a single image, versus ten images per decision for the baseline. A larger dataset is needed to further validate this approach, which has the potential to improve the accuracy and efficiency of noninvasive liver fibrosis staging.
I. Introduction
The high prevalence of non-alcoholic fatty liver disease (NAFLD) is a main cause of diffuse liver disease. Despite recent advances, including the development of numerous therapeutic agents presently in phase 2 and 3 trials, NAFLD remains a silent disease in which the vast majority of patients accumulate progressive liver damage without signs or symptoms and, undiagnosed, receive no medical care. NAFLD is exceptionally common, with an estimated one hundred million people affected in the United States alone [1], [2]. NAFLD prevalence is underestimated and its progression under-recognized by primary care physicians [3]. It is estimated that the vast majority of people with NAFLD remain undiagnosed and untreated [4]. This diagnostic gap has major consequences: by 2030, NAFLD is projected to be the leading cause of end-stage liver disease and the dominant reason for liver transplantation in the United States [5]. NAFLD has also been shown to be a strong independent risk factor for cardiometabolic disease, with a markedly increased risk of cardiovascular morbidity and death [6], [7]. The economic burden of NAFLD has been estimated at up to $292 billion/year in the United States alone [8]. Even though there are currently no approved therapeutics for NAFLD, numerous therapeutic agents are in clinical trials. Subject selection for these trials, and for subsequent therapies, will most likely require identification of the patients at highest risk of adverse NAFLD outcomes: those who have fibrosis stage ≥ F2 at the time of diagnosis, a category recently defined as high-risk nonalcoholic steatohepatitis (NASH) by the Non-Invasive Biomarkers of Metabolic Liver Disease (NIMBLE) NAFLD biomarkers consortium.
While not the standard system for NAFLD, METAVIR is one of the most commonly used liver fibrosis scoring systems and provides useful data on fibrosis progression in NAFLD patients. It divides fibrosis into five stages, ranging from F0 (normal) to F4 (cirrhosis) [9]–[11]. The ability to diagnose ≥ F2 disease is important, as this stage is often considered critical for clinical decision-making.
Liver biopsy is the current reference standard for liver fibrosis staging, but it is limited by sampling error and high intra- and inter-observer variability [12]. Moreover, it is invasive, painful, expensive, and associated with morbidity and even mortality [13]. As a result, only a small minority of NAFLD patients typically undergo liver biopsy. The vast majority are left undiagnosed and at risk of progression, even though weight reduction is of benefit [14] and numerous treatments are in phase 2 and 3 clinical trials.
Shear wave elastography (SWE), deployed on conventional ultrasound devices, employs acoustically induced shear waves [15] to measure tissue stiffness non-invasively [16], and is considered a promising alternative to biopsy. An example SWE image is shown in Figure 1. The color code displays tissue stiffness, measured as Young's modulus (eYM), on top of a conventional ultrasound B-mode image. Blue indicates relatively soft tissue; red indicates relatively stiff tissue. Note the heterogeneity, a typical finding in advanced liver fibrosis. A circular region of interest (ROI) is drawn within the SWE image box to compute the mean eYM in a region intended to be free of blood vessels and artifacts.
Figure 1.

Example of a color-coded SWE image from the right lobe of a liver. The SWE image overlays the conventional B-mode image, also shown at the bottom.
In addition to assessing liver fibrosis, SWE has been applied to diagnosing malignancy of breast [17], [18] and thyroid lesions [19].
II. Previous Work
In 136 subjects undergoing liver biopsy, we showed an AUROC (area under the receiver operating characteristic curve) of 0.77 for SWE, with 91.4% sensitivity and 52.5% specificity for diagnosis of ≥ F2 fibrosis at an eYM cutoff of 7.29 kPa [20]. We validated these results in another 277 subjects, finding 95.4% sensitivity and 50.5% specificity for ≥ F2 fibrosis at the same cutoff [21]. Others have shown similar shear wave speed increases with hepatic fibrosis [20], [22]. We noted distinct stiffness pattern changes in advanced liver fibrosis, indicating that image analysis may improve SWE accuracy and reduce variability. In this paper, we confirm this by developing and validating algorithms that reduce SWE measurement variability and improve ≥ F2 classification accuracy. Recently, results on automated liver fibrosis scoring were published from a small study of 85 SWE images from 85 cases, using texture features and a support vector machine (SVM) classifier [23]. Image quality checking and ROI selection were not addressed. Our work addresses these important steps on a much larger database (3,392 SWE images) and compares a convolutional neural network (CNN) to other classifiers, including an SVM and a random forest, with the CNN yielding the best performance.
III. Methods
A. Materials
A data set of 328 SWE cases was compiled as a retrospective study. The study was approved by the MGH IRB and was compliant with the Health Insurance Portability and Accountability Act. Patients known to have or suspected of having diffuse liver disease who were scheduled for ultrasound (US)-guided non-focal liver biopsy in the MGH Interventional Radiology Department were eligible for the study. Subjects who underwent SWE examination immediately prior to their liver biopsy for diffuse liver disease evaluation during the period January 2014 – March 2015 were included. About 10 SWE images per case were obtained at the right upper lobe prior to liver biopsy, resulting in a total of 3,392 images. SWE images were generated with a SuperSonic Imagine (Aix-en-Provence, France) Aixplorer system, which was loaned to the investigators. The authors controlled the data and the information submitted for publication. Among the 328 patients, there were 148 men and 180 women, with a mean age of 49.3 years. A total of 116 patients had a fibrosis stage of ≥ F2. Table 1 lists the detailed patient demographics.
Table 1.
Patient demographics
| Total # of Subjects (N) | 328 |
|---|---|
| Mean Age | 49.3±13.7 |
| Gender | 148 men; 180 women |
| Fibrosis Stage (METAVIR System) | N (Percentage) |
| F0 - Absent | 130 (39.6%) |
| F1 - Enlarged fibrotic portal tract | 119 (36.3%) |
| F2 - Few portal-portal septa but intact architecture | 38 (11.6%) |
| F3 - Many septa with architectural distortion but no obvious cirrhosis | 28 (8.5%) |
| F4 - Cirrhosis | 13 (4.0%) |
To prepare the SWE data for analysis, both the SWE images and stiffness measurements were extracted from the recorded SuperSonic Imagine DICOM images using MATLAB (MathWorks, Natick, MA). The SWE images were matched with biopsy results based on patient records. All biopsy results were analyzed by a blinded sub-specialist pathologist using the METAVIR criteria.
B. Algorithmic Approach
In a typical clinical workflow, the operator manually positions the SWE image box based on the US image, then selects an ROI to measure the tissue stiffness. Both of these steps introduce variability in the SWE measurements. The operator-placed ROI is small compared with the SWE image box (Figure 3) and therefore makes use of only a small fraction of the available information. Moreover, ROI placement is highly operator-dependent, leading to inconsistent quality, particularly if an ROI is near a vessel or bile duct.
Figure 3.

(A) SWE image with high percentage color fill-in (PCFI). (B) SWE image with low PCFI, resulting in erroneous stiffness measurement.
To explore an objective approach to reducing variability, a processing pipeline, SWE-Assist, was implemented as shown in Figure 2. The pipeline includes (1) an image quality check, (2) automated ROI selection, and (3) a machine learning classifier intended to improve upon standard SWE measurements from the ROI. The classifier was primarily evaluated for detecting ≥ F2 fibrosis.
Figure 2.

SWE-Assist processing pipeline.
In current clinical practice, low quality measurements are common. This is addressed clinically by acquiring ten images in succession [24], computing the mean stiffness within each of the ten ROIs, and then taking the median of these ten values. Figure 3A shows a typical "high quality" SWE image with elastography-histopathology concordance: the SWE image box is nearly completely filled with color-coded pixels. Conversely, Figure 3B shows a typical "low quality" SWE image, in which the SWE image box contains relatively few valid pixels and therefore less data. To define an image quality measure, we identified all cases in which elastography-histopathology discordance was observed and performed iterative analyses to identify where image removal would improve elastographic liver fibrosis staging. These analyses permitted empiric identification of specific image features associated with elastography-histopathology discordance. Based on this analysis, we developed a Percentage of Color Fill-In (PCFI) metric, computed as the ratio of the number of pixels with a valid stiffness measurement to the total number of pixels in the SWE box.
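As a minimal sketch, the PCFI computation can be expressed as follows, assuming the SWE box has been extracted as a 2D array in which pixels without a valid stiffness estimate are marked NaN (the array layout and NaN convention are assumptions of this sketch, not details given in the paper):

```python
import numpy as np

def percentage_color_fill_in(swe_box: np.ndarray) -> float:
    """PCFI: fraction of pixels inside the SWE image box that carry a
    valid stiffness (eYM) estimate. `swe_box` is a 2D array of Young's
    modulus values (kPa) cropped to the SWE box, with NaN where no
    measurement exists (an assumption of this sketch)."""
    valid = np.isfinite(swe_box)        # pixels with a stiffness value
    return valid.sum() / swe_box.size   # ratio in [0, 1]

def passes_quality_check(swe_box: np.ndarray, threshold: float = 0.70) -> bool:
    """Accept an image only if at least 70% of the box is filled in,
    matching the PCFI >= 70% acceptance criterion used in Results."""
    return percentage_color_fill_in(swe_box) >= threshold
```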
To automatically select an ROI, a rectangular window (e.g., 16 × 16 pixels) is raster-scanned over the SWE image box, and the inter-pixel variability of each candidate ROI is measured as its standard deviation. The rectangular ROI with the lowest variability is selected. We used low variability as a quality measure based on the empiric finding that lower inter-pixel variability is associated with better elastography-histopathology concordance. However, since liver heterogeneity increases with fibrosis stage and is particularly marked in cirrhosis, the variability-minimizing ROI selection algorithm is applied only when the mean eYM is below 8 kPa, which is below the eYM cutoff value for cirrhosis [20]. A sketch of this selection procedure is given below.
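The following is a minimal sketch of the raster-scan selection under the same assumptions as above (NaN marks invalid pixels; the stride, the hole-rejection rule, and the fallback behavior above 8 kPa are illustrative choices not specified in the text):

```python
import numpy as np

def select_min_sd_roi(swe_box: np.ndarray, win: int = 16, stride: int = 1):
    """Raster-scan a win x win window over the SWE box and return the
    top-left (row, col) of the window with the lowest inter-pixel
    standard deviation. Windows containing invalid (NaN) pixels are
    skipped; returns None if no fully valid window exists."""
    best_sd, best_pos = np.inf, None
    rows, cols = swe_box.shape
    for r in range(0, rows - win + 1, stride):
        for c in range(0, cols - win + 1, stride):
            patch = swe_box[r:r + win, c:c + win]
            if not np.isfinite(patch).all():
                continue                      # reject windows with holes
            sd = patch.std()
            if sd < best_sd:
                best_sd, best_pos = sd, (r, c)
    return best_pos

def choose_roi(swe_box: np.ndarray, win: int = 16):
    """Apply variability minimization only below the 8 kPa gate; in
    stiffer (likely cirrhotic) livers heterogeneity is expected, and
    the fallback strategy is left unspecified in this sketch."""
    if np.nanmean(swe_box) < 8.0:             # kPa, below cirrhosis cutoff [20]
        return select_min_sd_roi(swe_box, win)
    return None
```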
We explored two basic approaches to scoring fibrosis in the ROI: (1) training a classifier on handcrafted features, and (2) deep learning, which operates directly on the input images, essentially learning features automatically as part of the training process [25]. For the first approach, the feature set includes the basic stiffness measures (mean, minimum, maximum, and standard deviation) as well as additional statistical features (skewness, kurtosis, and entropy). These features were evaluated with a random forest classifier [26] and a support vector machine (SVM) [27] in conjunction with principal component analysis (PCA) [28]. For deep learning, a convolutional neural network (CNN) was used, which has been shown to be well suited to identifying underlying features in images [29]. Table 2 describes the CNN configuration used in this paper: two sets of 2D convolution and max-pooling layers, followed by a fully connected layer, a dropout layer, and a fully connected output layer. The network is trained to output a binary classification (0 for < F2 and 1 for ≥ F2 fibrosis). A sketch of the handcrafted-feature pipeline appears directly below; a corresponding CNN sketch follows Table 2.
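A minimal sketch of the handcrafted-feature pipeline, assuming NumPy/SciPy/scikit-learn; the histogram binning used for entropy and all classifier hyperparameters are illustrative assumptions, not values from the paper:

```python
import numpy as np
from scipy.stats import skew, kurtosis, entropy
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def roi_features(roi: np.ndarray) -> np.ndarray:
    """Statistical stiffness features from one ROI of eYM values (kPa)."""
    vals = roi.ravel()
    hist, _ = np.histogram(vals, bins=32)                 # bin count assumed
    return np.array([
        vals.mean(), vals.min(), vals.max(), vals.std(),  # basic measures
        skew(vals), kurtosis(vals),                       # distribution shape
        entropy(hist + 1e-12),                            # histogram entropy
    ])

# Illustrative classifier configurations (hyperparameters assumed):
rf = RandomForestClassifier(n_estimators=200, random_state=0)
svm = make_pipeline(PCA(n_components=5), SVC(kernel="rbf", probability=True))
```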
Table 2.
CNN Configuration
| Layers | # of Filters |
|---|---|
| Input (Size=16×16) | -- |
| Conv2D (Size=3×3, Stride=1, Non-Linearity=ReLU) | 16 |
| Max pooling (Size=2×2, Stride=2) | 16 |
| Conv2D (Size=3×3, Stride=1, Non-Linearity=ReLU) | 32 |
| Max pooling (Size=2×2, Stride=2) | 32 |
| Fully connected (N=1024) | -- |
| Dropout (Thresh=0.5) | -- |
| Fully connected (N = 2) | -- |
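The configuration in Table 2 can be sketched in Keras as follows (the framework choice, padding mode, optimizer, and loss are assumptions of this sketch; the paper specifies only the layer structure):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_swe_cnn() -> tf.keras.Model:
    """CNN per Table 2: two Conv2D + max-pooling stages, a 1024-unit
    fully connected layer, dropout (0.5), and a 2-way output."""
    model = models.Sequential([
        layers.Input(shape=(16, 16, 1)),   # single-channel 16x16 stiffness ROI
        layers.Conv2D(16, 3, strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Conv2D(32, 3, strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Flatten(),
        layers.Dense(1024, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(2, activation="softmax"),  # 0: <F2, 1: >=F2
    ])
    model.compile(optimizer="adam",                        # assumed
                  loss="sparse_categorical_crossentropy",  # assumed
                  metrics=["accuracy"])
    return model
```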
IV. Results
The image quality check, the ROI selection, and several candidate classification approaches were each evaluated. All were compared to the baseline approach, which computes the mean stiffness within the sonographer-selected ROI for each of ten images and then takes the median of the ten mean stiffness values [19]. A threshold is applied to the median value to detect ≥ F2 fibrosis. Applying the baseline approach to this database yields an AUROC of 0.74. A sketch of the baseline scoring is shown below.
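For concreteness, a minimal sketch of the baseline per-case score, assuming the ten per-image ROI means are already available (variable names are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def baseline_case_score(roi_means):
    """Median of the mean ROI stiffness values (kPa) from ~10 SWE
    images of one case; thresholding this score flags >= F2 fibrosis."""
    return float(np.median(roi_means))

# AUROC over all cases, sweeping every possible threshold:
# y_true[i] = 1 if biopsy stage >= F2 for case i, else 0
# auroc = roc_auc_score(y_true, [baseline_case_score(m) for m in case_roi_means])
```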
Using our automated image selection algorithm with a SWE image acceptance criterion of PCFI ≥ 70%, which resulted in a subset of 2,287 images from 306 cases, single-image analysis achieved the same AUROC of 0.74 for ≥ F2 diagnosis that conventional clinical practice achieves using ten images. This indicates the potential of automated image quality assessment via the PCFI threshold criterion to improve measurement quality and/or reduce the number of required measurements.
An example of the automated ROI selection, compared to a sonographer-selected ROI, is shown in Figure 4. In Figure 4A, the manually selected ROI is positioned at a desirable location, and the two ROIs match well. In Figure 4B, the manually selected ROI is erroneously positioned close to a vessel, as indicated by the red arrow in the US image at the bottom of the panel and confirmed by the extremely high tissue stiffness measurement. In comparison, the algorithm-selected ROI is placed at a more desirable location, away from the vessel, with a low standard deviation. The selection was verified by a radiologist.
Figure 4.

(A-B) Manual and (C-D) SWE-Assist ROI placement. The ROI in (D) is desirably farther from the vessel artifact than the ROI in (B).
The learning-based approaches were evaluated with 3-fold cross-validation on the algorithm-selected ROIs. Folds were grouped by case number, so that images from the same patient never appeared in both the training and test sets, preventing data leakage. A sketch of this grouped cross-validation is given below.
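A minimal sketch of case-grouped cross-validation using scikit-learn's GroupKFold (array names and the per-fold averaging are illustrative assumptions):

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold

def grouped_cv_auroc(clf, X, y, groups, n_splits=3):
    """X: per-image feature vectors; y: binary >=F2 labels; groups: the
    case ID of each image, keeping all images from one patient in a
    single fold to prevent leakage between training and test sets."""
    aurocs = []
    for train_idx, test_idx in GroupKFold(n_splits=n_splits).split(X, y, groups):
        clf.fit(X[train_idx], y[train_idx])
        scores = clf.predict_proba(X[test_idx])[:, 1]
        aurocs.append(roc_auc_score(y[test_idx], scores))
    return float(np.mean(aurocs))
```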
Figure 5 plots ROC curves for the full processing pipeline, consisting of the PCFI quality check, automated ROI selection, and single-image classification, versus the standard clinical approach (Figure 5, black). All classifiers yield a larger AUROC than the baseline (AUROC = 0.74), with the CNN showing the largest improvement: AUROC = 0.89, with a 95% confidence interval of [0.83, 0.94].
Figure 5.

ROC curves for liver SWE ≥ F2 fibrosis classification. SWE-Assist (red) improves the classification with one image vs. the baseline approach (black), which is based on ten images.
V. Discussion
An automated approach consisting of an image quality check, ROI selection, and CNN classification yielded more accurate detection of ≥ F2 fibrosis than a previously published baseline approach, with an AUROC of 0.89 vs. 0.74. Moreover, the new result was based on a single image, whereas the baseline approach requires ten images per decision. The new approach is thus promising for faster, more accurate liver fibrosis assessment, which is needed to replace invasive liver biopsy. Future work will expand the SWE image database and evaluate several approaches to further improve accuracy: generalizing the ROI to "pixels of interest" to make fuller use of all the information in the SWE image; extracting features from the ultrasound image to assess fat content and to identify image locations that may produce SWE artifacts (e.g., blood vessels, lesions); and investigating the utility of adding demographic and clinical features. Ultimately, we envision that these algorithms will be incorporated into a real-time computer-aided imaging and diagnosis system that will aid the sonographer in generating high quality elastographic images as well as in assessing fibrosis.
Acknowledgments
DISTRIBUTION STATEMENT A. Approved for public release: distribution unlimited. This material is based upon work supported by the Assistant Secretary of Defense for Research and Engineering under Air Force Contract No. FA8721-05-C-0002 and/or FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Assistant Secretary of Defense for Research and Engineering.
Contributor Information
Manish Dhyani, Massachusetts General Hospital, Boston MA 02114 USA.
Joseph R. Grajo, University of Florida College of Medicine
Anthony E. Samir, Massachusetts General Hospital, Boston MA 02114 USA
References
- [1]. Rinella ME et al., "Controversies in the diagnosis and management of NAFLD and NASH," Gastroenterol. Hepatol., vol. 10, no. 4, pp. 219–227, 2014.
- [2]. Spengler EK and Loomba R, "Recommendations for diagnosis, referral for liver biopsy, and treatment of NAFLD and NASH," Mayo Clin. Proc., vol. 90, no. 9, pp. 1233–1246, 2015.
- [3]. Patel PJ et al., "Underappreciation of non-alcoholic fatty liver disease by primary care clinicians: limited awareness of surrogate markers of fibrosis," Intern. Med. J., 2017.
- [4]. Puri P and Sanyal AJ, Nonalcoholic Fatty Liver Disease, 6th ed. Elsevier Inc., 2012.
- [5]. Wong VWS et al., "Validity criteria for the diagnosis of fatty liver by M probe-based controlled attenuation parameter," J. Hepatol., vol. 67, no. 3, pp. 577–584, 2017.
- [6]. Chalasani N et al., "The diagnosis and management of nonalcoholic fatty liver disease: practice guidance from the American Association for the Study of Liver Diseases," Hepatology, vol. 67, no. 1, pp. 328–357, 2018.
- [7]. Athyros VG, Tziomalos K, Katsiki N, Doumas M, Karagiannis A, and Mikhailidis DP, "Cardiovascular risk across the histological spectrum and the clinical manifestations of nonalcoholic fatty liver disease: an update," World J. Gastroenterol., vol. 21, no. 22, pp. 6820–6834, Jun. 2015.
- [8]. Younossi ZM et al., "The economic and clinical burden of nonalcoholic fatty liver disease in the United States and Europe," Hepatology, vol. 64, no. 5, pp. 1577–1586, Nov. 2016.
- [9]. Gerstenmaier JF and Gibson RN, "Ultrasound in chronic liver disease," Insights Imaging, vol. 5, no. 4, pp. 441–455, May 2014.
- [10]. Theise ND, "Liver biopsy assessment in chronic viral hepatitis: a personal, practical approach," Mod. Pathol., vol. 20, no. 1s, pp. S3–S14, Feb. 2007.
- [11]. "Intraobserver and interobserver variations in liver biopsy interpretation in patients with chronic hepatitis C. The French METAVIR Cooperative Study Group," Hepatology, vol. 20, no. 1 Pt 1, pp. 15–20, Jul. 1994.
- [12]. Ratziu V et al., "Sampling variability of liver biopsy in nonalcoholic fatty liver disease," Gastroenterology, vol. 128, no. 7, pp. 1898–1906, 2005.
- [13]. Noureddin M and Loomba R, "Nonalcoholic fatty liver disease: indications for liver biopsy and noninvasive biomarkers," Clin. Liver Dis., vol. 1, no. 4, pp. 103–106, 2012.
- [14]. Vilar-Gomez E and Chalasani N, "Non-invasive assessment of non-alcoholic fatty liver disease: clinical prediction rules and blood-based biomarkers," J. Hepatol., vol. 68, no. 2, pp. 305–315, 2017.
- [15]. Bercoff J, Tanter M, and Fink M, "Supersonic shear imaging: a new technique for soft tissue elasticity mapping," IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 51, no. 4, pp. 396–409, 2004.
- [16]. Nowicki A and Dobruch-Sobczak K, "Introduction to ultrasound elastography," J. Ultrason., vol. 16, no. 65, pp. 113–124, 2016.
- [17]. Zhang Q et al., "Deep learning based classification of breast tumors with shear-wave elastography," Ultrasonics, vol. 72, pp. 150–157, 2016.
- [18]. Moon WK et al., "Computer-aided tumor diagnosis using shear wave breast elastography," Ultrasonics, vol. 78, pp. 125–133, 2017.
- [19]. Samir AE et al., "Shear-wave elastography for the preoperative risk stratification of follicular-patterned lesions of the thyroid: diagnostic accuracy and optimal measurement plane," Radiology, vol. 277, no. 2, pp. 565–573, 2015.
- [20]. Samir AE et al., "Shear-wave elastography for the estimation of liver fibrosis in chronic liver disease: determining accuracy and ideal site for measurement," Radiology, vol. 274, no. 3, pp. 888–896, 2014.
- [21]. Dhyani M, Grajo JR, Bhan AK, Corey K, Chung R, and Samir AE, "Validation of shear wave elastography cutoff values on the Supersonic Aixplorer for practical clinical use in liver fibrosis staging," Ultrasound Med. Biol., vol. 43, no. 6, pp. 1125–1133, 2017.
- [22]. Trifanov DS et al., "Amyloidosis of the liver on shear wave elastography: case report and review of literature," Abdom. Imaging, vol. 40, no. 8, pp. 3078–3083, 2015.
- [23]. Gatos I et al., "A machine-learning algorithm toward color analysis for chronic liver disease classification, employing ultrasound shear wave elastography," Ultrasound Med. Biol., vol. 43, no. 9, pp. 1797–1810, 2017.
- [24]. Barr RG et al., "Elastography assessment of liver fibrosis: Society of Radiologists in Ultrasound consensus conference statement," Radiology, vol. 276, no. 3, pp. 845–861, 2015.
- [25]. LeCun Y, Bengio Y, and Hinton G, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
- [26]. Liaw A and Wiener M, "Classification and regression by randomForest," R News, vol. 2, no. 3, pp. 18–22, Dec. 2002.
- [27]. Cortes C and Vapnik V, "Support-vector networks," Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.
- [28]. Wold S, Esbensen K, and Geladi P, "Principal component analysis," Chemom. Intell. Lab. Syst., vol. 2, pp. 37–52, 1987.
- [29]. Krizhevsky A, Sutskever I, and Hinton GE, "ImageNet classification with deep convolutional neural networks," Adv. Neural Inf. Process. Syst. (NIPS), vol. 60, no. 6, pp. 84–90, 2012.
