Medical Physics. 2012 May 14;39(6):3112–3123. doi: 10.1118/1.4711815

Automated measurement of uptake in cerebellum, liver, and aortic arch in full-body FDG PET/CT scans

Christian Bauer 1, Shanhui Sun 1, Wenqing Sun 2, Justin Otis 2, Audrey Wallace 2, Brian J Smith 3, John J Sunderland 4, Michael M Graham 4, Milan Sonka 5, John M Buatti 6, Reinhard R Beichel 7,a)
PMCID: PMC3365916  PMID: 22755696

Abstract

Purpose: The purpose of this work was to develop and validate fully automated methods for uptake measurement of cerebellum, liver, and aortic arch in full-body PET/CT scans. Such measurements are of interest in the context of uptake normalization for quantitative assessment of metabolic activity and/or automated image quality control.

Methods: Cerebellum, liver, and aortic arch regions were segmented with different automated approaches. Cerebella were segmented in PET volumes by means of a robust active shape model (ASM) based method. For liver segmentation, a largest possible hyperellipsoid was fitted to the liver in PET scans. The aortic arch was first segmented in CT images of a PET/CT scan by a tubular structure analysis approach, and the segmented result was then mapped to the corresponding PET scan. For each of the segmented structures, the average standardized uptake value (SUV) was calculated. To generate an independent reference standard for method validation, expert image analysts were asked to segment several cross sections of each of the three structures in 134 F-18 fluorodeoxyglucose (FDG) PET/CT scans. For each case, the true average SUV was estimated by utilizing statistical models and served as the independent reference standard.

Results: For automated aorta and liver SUV measurements, no statistically significant scale or shift differences were observed between automated results and the independent standard. In the case of the cerebellum, the scale and shift were not significantly different, if measured in the same cross sections that were utilized for generating the reference. In contrast, automated results were scaled 5% lower on average although not shifted, if FDG uptake was calculated from the whole segmented cerebellum volume. The estimated reduction in total SUV measurement error ranged between 54.7% and 99.2%, and the reduction was found to be statistically significant for cerebellum and aortic arch.

Conclusions: With the proposed methods, the authors have demonstrated that automated SUV uptake measurements in cerebellum, liver, and aortic arch agree with expert-defined independent standards. The proposed methods were found to be accurate and showed less intra- and interobserver variability, compared to manual analysis. The approach provides an alternative to manual uptake quantification, which is time-consuming. Such an approach will be important for application of quantitative PET imaging to large scale clinical trials.

Keywords: automated uptake measurement, cerebellum, liver, aortic arch

INTRODUCTION

There are several underlying assumptions in using F-18 fluorodeoxyglucose (FDG) standardized uptake values (SUVs) for characterizing and following metabolic activity in tumors.1 The failure of some of these assumptions can introduce significant variability in the calculated SUVs. These assumptions include accurate measurement of the injected dose; accurate decay correction of all measurements; complete injection of the dose, i.e., no infiltration; correct entry of all relevant values into the reconstruction program; and similar blood time–activity curves of all subjects. Problems have been seen with all of these assumptions. One approach that has been suggested to avoid these problems and to improve the reproducibility of SUV measurements is to use lesion to normal tissue SUV ratios as the index of metabolic rate rather than the SUV by itself.

Other assumptions are introduced when using the lesion to normal tissue ratio. The fundamental underlying assumption is that a normal tissue can be identified where the uptake is an indication of the availability of FDG to the tissue during the entire uptake period. A second assumption is that the normal tissue activity can be identified in a reproducible fashion. Several authors have examined the utility of such ratios for discriminating benign from malignant nodules. Obrzut et al.2 found that the lung nodule to cerebellum ratio was more accurate than SUV alone. Blake et al.3 also found that tissue ratios performed better than absolute SUV measures in diagnosing adrenal nodules, with the adrenal to liver ratio being a better discriminator than nodule SUV.

Since the main goal of this approach requires accurate and reproducible measures of average uptake in normal tissue, appropriate methodology that reliably and automatically yields the average FDG uptake in normal tissue regions is required. This will allow the empirical determination of which normal tissue is best under different circumstances, should simplify using the lesion to normal tissue method, and should produce highly reproducible results.

The automated segmentation and/or uptake measurement of structures like the cerebellum, liver, and aortic arch reference regions in full-body FDG PET/CT scans is largely unaddressed in the literature. Li et al.4 presented a method to segment the liver in low-contrast CT based on a segmentation of the liver in PET datasets, while a variety of methods have been presented in the literature for contrast-enhanced CT scans.5, 6 For segmentation of brain structures, atlas-based registration approaches are typically utilized (e.g., van der Lijn et al.7). Xia et al.8 presented a method which utilizes information from PET and CT images to segment brain structures. An approach for aortic arch segmentation in CT scans without contrast enhancement was presented by Feuerstein et al.9 Note that all these methods were developed for imaging modalities or protocols that are different from the ones utilized in this work. For example, the resolution of full-body PET/CT images is typically lower than a dedicated brain PET/CT scan.

In this paper, we introduce and validate a fully automated approach for uptake measurement of cerebellum, liver, and aortic arch in full-body FDG PET/CT scans to enable the reproducible calculation of tracer uptake in these structures. All acquisitions in this study used an imaging protocol specifically developed for head and neck cancer, but our proposed methodology can be easily adapted to other acquisition protocols. The presented method represents a powerful enabling tool for tumor-to-reference tissue ratio measurement in single- and multicenter studies that could be adopted for clinical application. These methods will be particularly useful in clinical and clinical research applications where physiologic variability associated with direct SUV measurements is unacceptably high. It is also possible that the tumor-to-reference tissue ratio measurements enabled by this work may prove to be a generally more robust response-to-therapy assessment approach than the more common SUV method in both clinical and clinical research applications.

METHODS

Methods for automated segmentations of the cerebellum, liver, and aortic arch were developed to automatically quantify FDG uptake in these whole-body FDG PET/CT reference regions. In the case of the cerebellum, a complete 3D segmentation of the structure is generated, whereas a sphere and tube are identified inside the liver and aortic arch, respectively, to deal with uncertainties due to respiratory motion and partial volume effects.

All three reference regions are automatically identified in subparts of the image relative to the brain, which increases robustness of the segmentation and reduces computation time. Note that the approach was specifically tailored to the imaging data described in Sec. 3A, in which the whole cerebellum was generally visible but, in the majority of cases, the brain was only partially imaged. To segment the brain, each PET scan is thresholded at 2.5 SUVbw, and morphological closing and opening with a spherical structuring element with a radius of 5 mm is performed to avoid holes inside the brain or leakage in case of adjacent tumors. In the resulting binary image, the brain is identified as the connected component with a volume larger than 1000 mm³ closest to the top of the scan. Relative to the brain segmentation, ROIs (see Fig. 1) are defined as reference for the different reference region identification methods. ROIs for the cerebellar hemisphere center locations and the aortic arch are aligned with the lower left and right outer bounds of the brain. The ROI for the liver spans the whole lower right area of the image with a minimum distance from the brain center location. There are no restrictions on the ROIs in the sagittal direction.
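The brain localization steps above can be sketched with standard image-processing primitives. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: it assumes the PET volume is already a NumPy array in SUVbw units, that axis 0 runs from the top of the scan downward, and that voxel spacing is given in mm. The helper names `ball` and `segment_brain` are hypothetical.

```python
import numpy as np
from scipy import ndimage


def ball(radius_mm, spacing_mm):
    """Spherical structuring element with a physical radius given in mm."""
    half = [int(np.ceil(radius_mm / s)) for s in spacing_mm]
    grids = np.ogrid[tuple(slice(-h, h + 1) for h in half)]
    dist2 = sum((g * s) ** 2 for g, s in zip(grids, spacing_mm))
    return dist2 <= radius_mm ** 2


def segment_brain(suv, spacing_mm, threshold=2.5, radius_mm=5.0,
                  min_volume_mm3=1000.0):
    """Return a boolean brain mask, or None if no component qualifies.

    Assumption: axis 0 is the cranio-caudal axis with index 0 at the top.
    """
    mask = suv >= threshold
    selem = ball(radius_mm, spacing_mm)
    # Closing fills small holes, opening removes thin bridges and specks.
    mask = ndimage.binary_closing(mask, structure=selem)
    mask = ndimage.binary_opening(mask, structure=selem)

    labels, _ = ndimage.label(mask)
    voxel_volume = float(np.prod(spacing_mm))
    best, best_top = None, None
    for lab, slc in enumerate(ndimage.find_objects(labels), start=1):
        volume = np.count_nonzero(labels[slc] == lab) * voxel_volume
        top = slc[0].start  # first index along axis 0 occupied by the component
        if volume > min_volume_mm3 and (best_top is None or top < best_top):
            best, best_top = lab, top
    return None if best is None else labels == best
```

The bounding box of the returned mask (for example via `ndimage.find_objects`) could then serve as the reference for placing the ROIs shown in Fig. 1.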

Figure 1.


Segmented brain (yellow) and regions of interest for identification of (a) liver center (green), (b) cerebellum hemisphere centers (red), and (c) aortic arch (blue). Distance measures for the ROIs are in centimeters. All PET images are shown with inverted gray-scales.

In this context, segmentations of cerebellum and liver are directly performed in the SUVbw normalized PET scans, while the segmentation of the aortic arch is performed in the CT volume of the PET/CT to increase robustness. The above described ROI generation was designed for adult subjects and may need adaptation when applied to pediatric PET/CT image data.

Cerebellum reference region segmentation

Our method for cerebellum segmentation is based on an active shape model (ASM) (Ref. 10) segmentation approach. First, in a training stage, a model of the cerebellum is generated from learning data. Second, the model is automatically initialized at the approximate location of the cerebellum. Finally, segmentation is performed by means of a robust model matching approach.

Cerebellum model generation

To model the shape variability of cerebella as well as the gray-value appearance around its surface along normal profiles, learning data are required. For this purpose, nine normal cerebella were segmented in training PET scans and utilized for model generation, as described below.

Statistical shape model.

From the binary segmentations, triangle mesh representations were generated with a marching cubes algorithm.11 A set of 642 corresponding vertices (landmark points) were automatically identified on all meshes by means of a minimum description length approach12 based on shape index and curvedness.13 The selected number of landmarks was found to be a good trade-off between processing speed and representation of shape details. Using the landmark correspondences, the learning shape models were aligned in a common coordinate frame using Procrustes analysis.14 Based on the mean point locations and a principal component analysis of the covariance matrix, a point distribution model10 was derived, which allows describing a cerebellum shape in terms of a mean shape and variation about the mean in a linear model: $x = \bar{x} + Pb$, where $\bar{x}$ denotes the mean shape, $P$ the shape eigenvector matrix, and $b$ the shape coefficients. Figure 2 depicts the mean shape and the variation associated with the first eigenvector, which describes the most dominant shape variation of cerebella.
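The linear point distribution model can be illustrated with a few lines of NumPy. This is a minimal sketch that assumes the landmark sets have already been put into correspondence and Procrustes-aligned, and that each shape is flattened into one row vector. The function names and the `variance_kept` cutoff are illustrative assumptions; the source does not state how many modes were retained.

```python
import numpy as np


def build_pdm(shapes, variance_kept=0.98):
    """Build a point distribution model from aligned training shapes.

    shapes: array of shape (n_shapes, 3 * n_landmarks).
    Returns (mean shape, eigenvector matrix P, eigenvalues).
    variance_kept is an assumed cutoff, not a value from the paper.
    """
    shapes = np.asarray(shapes, float)
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # The right singular vectors of the centered data are the eigenvectors
    # of the sample covariance matrix.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = s ** 2 / (shapes.shape[0] - 1)
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, variance_kept) + 1)
    return mean, vt[:k].T, eigvals[:k]


def synthesize(mean, P, b):
    """Shape generated by the linear model: mean + P b."""
    return mean + P @ b


def project(mean, P, x):
    """Shape coefficients b of a shape x (least-squares projection)."""
    return P.T @ (x - mean)
```

Setting `b` to zero reproduces the mean shape, and varying a single entry of `b` within a few standard deviations (the square root of the matching eigenvalue) gives mode visualizations of the kind shown in Fig. 2.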

Figure 2.


Visualization of the shape variation of the cerebellum learned from training datasets. The first mode of variation (σ = −2 to σ = +2) is shown with the mean shape (σ = 0) located in the center.

Gray-value surface profile model.

The gray-value patterns around different cerebellum surface locations are quite characteristic (Fig. 3). This information is incorporated in our model by means of gray-value profiles as follows. First, landmark points of our statistical shape model are transformed to their related PET training dataset. Second, for each landmark point i, a surface profile model p_i reflecting the typical SUV change along the surface normal direction is generated. For this purpose, PET values along landmark surface normals are sampled in a range of ±1 cm from the surface points with a sampling interval of 0.5 mm. For all profiles, the first derivatives are computed and averaged for each landmark, which results in p_i.
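The profile sampling and averaging described above might look as follows. This is a sketch under assumptions: the volume axes and the point/normal components are in the same axis order, positions are in mm with the first voxel center at the origin, and trilinear interpolation is used (the source does not name the interpolation scheme). The function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def sample_profile(volume, spacing_mm, point_mm, normal,
                   half_range_mm=10.0, step_mm=0.5):
    """Sample intensities along a surface normal through point_mm."""
    normal = np.asarray(normal, float)
    normal = normal / np.linalg.norm(normal)
    offsets = np.arange(-half_range_mm, half_range_mm + step_mm / 2, step_mm)
    pts_mm = np.asarray(point_mm, float) + offsets[:, None] * normal
    coords = (pts_mm / np.asarray(spacing_mm, float)).T  # voxel coordinates
    # order=1 is trilinear interpolation (an assumption of this sketch).
    return ndimage.map_coordinates(volume, coords, order=1, mode="nearest")


def profile_model(profiles, step_mm=0.5):
    """Average first derivative over the training profiles of one landmark."""
    derivatives = np.gradient(np.asarray(profiles, float), step_mm, axis=1)
    return derivatives.mean(axis=0)
```

With the ±1 cm range and 0.5 mm step quoted in the text, each profile contains 41 samples.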

Figure 3.


Axial (a), coronal (b), and sagittal (c) image slices of an FDG PET scan showing the brain with the segmented cerebellum. All PET images are shown with inverted gray-scales.

Automated model initialization

Before the model can be matched to PET volume data, it needs to be initialized and placed in rough proximity of the target structure. We utilize the mean cerebellum shape parameters (b = 0) and assume no rotation and scaling. The approximate location of the cerebellum is additionally calculated in an automated fashion. For this purpose, the bounding box of the brain segmentation serves as a reference to specify subvolumes containing the right/left hemispheres of the cerebellum (see Fig. 1). In these subregions, the cerebellum hemisphere centers are identified based on the fact that metabolism is lower at the centers compared to the surrounding tissue. Thus, all gradient vectors point away from this location. To identify this location, the corresponding Gaussian gradient vector field is computed at a scale of σ = 3 mm, individual vectors normalized to unit-length, and the average outward flux15 (AOF) is determined. The AOF measures the divergence of a vector field at a certain location. The centers of the cerebellum hemispheres are identified as the locations with the maximum AOF in the search regions and a minimum distance of 3 mm to the surface of the brain segmentation. The average of the left and right cerebellum hemisphere centers is used as the initial location for our model.
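To make the center detection more concrete, the sketch below locates the point of maximum divergence of the unit-normalized Gaussian gradient field. Note the substitution: it uses a finite-difference divergence as a stand-in for the average outward flux of Ref. 15, which the text describes as a divergence measure. The function name and the `search_mask` parameter are assumptions; the mask is meant to encode both the hemisphere ROI and the minimum distance to the brain surface.

```python
import numpy as np
from scipy import ndimage


def center_by_divergence(suv, spacing_mm, sigma_mm=3.0, search_mask=None):
    """Return (index of maximum divergence, divergence map)."""
    suv = np.asarray(suv, float)
    sigma_vox = [sigma_mm / s for s in spacing_mm]

    # Gaussian derivative along each axis, converted to per-mm units.
    grad = []
    for ax in range(suv.ndim):
        order = [1 if a == ax else 0 for a in range(suv.ndim)]
        g = ndimage.gaussian_filter(suv, sigma_vox, order=order)
        grad.append(g / spacing_mm[ax])

    # Normalize every gradient vector to unit length.
    norm = np.sqrt(sum(g ** 2 for g in grad))
    norm[norm == 0] = 1.0

    # Divergence of the unit vector field (stand-in for the AOF).
    div = sum(np.gradient(g / norm, spacing_mm[ax], axis=ax)
              for ax, g in enumerate(grad))
    if search_mask is not None:
        div = np.where(search_mask, div, -np.inf)
    return np.unravel_index(int(np.argmax(div)), div.shape), div
```

Running this once per hemisphere ROI and averaging the two returned locations would give an initial model position of the kind described above.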

Active shape model matching

Matching of an active shape model to the target structure in the PET volume is an iterative process. In each iteration, two processing steps are performed. First, by utilizing the present model shape parameters b and transformation parameters T (translation, rotation, and scaling), shape points are updated by utilizing the gray-value profile models (Sec. 2A1). For each landmark point, the gray-value information along a line normal to the model surface is extracted from the image and the first derivative is computed. The update point along this line is found by matching against the template p_i of the corresponding landmark i. For the matching function, the cross correlation is used. The utilized range to search for update points was set to ±10 mm, and gray-values in the PET scan <1 SUV were set to 1 to avoid mismatches at the transitions between head and air. Second, the update shape points are utilized to recalculate model (b) and transformation (T) parameters. Usually, this is achieved by utilizing Procrustes alignment and least squares optimization.10 However, this approach does not perform robustly if some of the identified update points are wrong (outliers). In addition, we potentially have to deal with truncated PET data, where parts of the brain (or even of the cerebellum) were not imaged, and with nontypical adjacent image structures (e.g., tumors). To avoid such issues, we utilize the robust ASM matching approach that was recently presented by Sun et al.16 The basic idea behind this approach is to identify outliers and to exclude them from the model parameter recalculation. For segmentation, we perform 100 model matching iterations, which were found to be sufficient in combination with the proposed model initialization to achieve convergence of the model.
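The landmark update step can be illustrated as a one-dimensional template search. The sketch below clamps low SUVs, differentiates the sampled profile, and slides the landmark template along it using a normalized cross correlation. The normalization, the function names, and the convention of reporting the shift in samples are assumptions of this example. The robust outlier handling of Sun et al. is not reproduced here.

```python
import numpy as np


def clamp_and_differentiate(profile, step_mm, floor_suv=1.0):
    """Set values below floor_suv to floor_suv, then take the derivative."""
    clamped = np.maximum(np.asarray(profile, float), floor_suv)
    return np.gradient(clamped, step_mm)


def best_offset(derivative_profile, template):
    """Find the template position with the highest cross correlation.

    Returns (shift, score), where shift is the displacement in samples of
    the best window relative to the centered window position.
    """
    d = np.asarray(derivative_profile, float)
    t = np.asarray(template, float)
    t0 = t - t.mean()
    n_pos = len(d) - len(t) + 1
    scores = np.full(n_pos, -np.inf)
    for k in range(n_pos):
        w0 = d[k:k + len(t)] - d[k:k + len(t)].mean()
        denom = np.linalg.norm(w0) * np.linalg.norm(t0)
        if denom > 0:  # flat windows carry no edge information
            scores[k] = float(w0 @ t0) / denom
    k_best = int(np.argmax(scores))
    return k_best - (n_pos - 1) // 2, float(scores[k_best])
```

Multiplying the returned shift by the sampling interval and moving the landmark by that distance along its normal yields the update point used in the subsequent parameter recalculation.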

Liver reference region segmentation

The liver is located in the lower right part of the image relative to the brain (see ROI in Fig. 1). In this area, the liver can be identified as the largest organ above a certain uptake value, because it is mostly surrounded by large structures with low uptake (lungs, muscles, fat tissue, etc.) and smaller organs with similar or higher uptake (heart, kidneys, etc.). The processing steps of our 3D liver reference region identification method are illustrated in Fig. 4 on an axial slice of the 3D PET scan. After thresholding the PET scan with a threshold of 1.0 SUVbw in the liver ROI [Fig. 4b], holes in the binary image are closed using morphological operations [Fig. 4c], and a distance transformation17 is calculated [Fig. 4d]. The distance transformation assigns to each location of the binary segmentation its distance to the closest surface point. The maximal hyperellipsoid inside the liver is determined in two steps. First, the center is found by locating the maximum of this distance-transformed image [Fig. 4d]. Second, the axis lengths of the hyperellipsoid in x-, y-, and z-direction are calculated. All axis lengths are reduced by 1 cm to avoid the influence of partial volume effects and respiratory motion close to the boundary of the liver [Fig. 4e]. In this context, note that a direct utilization of the thresholding result [Fig. 4b or 4c] for uptake measurement is not suitable, because kidneys or other structures would frequently be included in the segmentation. In comparison, the proposed method avoids such problems, since only small portions of the liver surface are typically affected by adjoining structures.
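A compact sketch of this pipeline is given below. Several details are assumptions because the text does not specify them: holes are closed with `scipy.ndimage.binary_fill_holes`, each semi-axis is taken as the shorter of the two foreground runs from the center along that axis, and the 1 cm reduction is applied per semi-axis (consistent with the 1 cm boundary distance shown in Fig. 4e). The function name is hypothetical.

```python
import numpy as np
from scipy import ndimage


def liver_reference_region(suv, spacing_mm, roi_mask,
                           threshold=1.0, margin_mm=10.0):
    """Return (center index, semi-axes in mm, boolean ellipsoid mask)."""
    binary = ndimage.binary_fill_holes((suv >= threshold) & roi_mask)
    dist = ndimage.distance_transform_edt(binary, sampling=spacing_mm)
    center = np.unravel_index(int(np.argmax(dist)), dist.shape)

    semi_axes = []
    for ax in range(suv.ndim):
        idx = list(center)
        idx[ax] = slice(None)
        line = binary[tuple(idx)]  # 1D line through the center along ax
        c = center[ax]
        fwd = 0
        while c + fwd + 1 < line.size and line[c + fwd + 1]:
            fwd += 1
        bwd = 0
        while c - bwd - 1 >= 0 and line[c - bwd - 1]:
            bwd += 1
        # Assumption: shrink each semi-axis by the safety margin.
        semi_axes.append(min(fwd, bwd) * spacing_mm[ax] - margin_mm)

    region = np.zeros(suv.shape, dtype=bool)
    if min(semi_axes) > 0:
        grids = np.ogrid[tuple(slice(0, n) for n in suv.shape)]
        q = sum(((g - c) * s / a) ** 2
                for g, c, s, a in zip(grids, center, spacing_mm, semi_axes))
        region = (q <= 1.0) & binary
    return center, semi_axes, region
```

The average liver SUV would then be `suv[region].mean()`, provided the returned region is not empty.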

Figure 4.


Processing steps of the 3D liver reference region identification approach depicted on a single axial PET image. (a) PET image slice showing the liver. (b) Thresholding. (c) Hole closing. (d) Distance transformation with maximum distance point marked. (e) 2D contour of the maximal hyperellipsoid with 1 cm distance to liver boundary. Note that all PET images are shown with inverted gray-scales.

Aortic arch reference region segmentation

Our approach for segmentation of a reference region in the aortic arch is based on a method for the extraction of tubular structures in CT data.18, 19 First, tubular structures are identified in the CT scan of the PET/CT dataset. Second, the carina is identified, which then serves as a reference point for the aorta identification, similar to the ideas of Feuerstein et al.9 The individual processing steps are outlined in Fig. 5 and will be described in detail in the following paragraphs.

Figure 5.


Processing steps for the aortic arch reference region segmentation. (a) Volume rendering of the subregion in the CT dataset. (b) Identified tubular structures with trachea and main bronchi candidates (blue) and aortic arch candidates (red). (c) Identified trachea and main bronchi with carina and detected aortic arch. (d) Sagittal image slice with identified aortic arch.

Extraction of tubular structures

Inside the VOI of the aortic arch (Fig. 1) of the CT image [Fig. 5a], edge information is obtained by utilizing a Gaussian gradient magnitude filter at scale σ = 1 mm. Subsequently, a tube detection filter (TDF) (Refs. 18, 19) is applied. This filter generates a tube-likeliness measure together with a radius estimate at the centers of elongated objects with approximately circular cross sections. To avoid artifacts from calcifications in the aorta, the gray-value range in the CT dataset is truncated to a maximum value of 200 HU. After applying the TDF, centerline-based representations are extracted for tubular structures by utilizing a height ridge traversal with hysteresis thresholding (upper threshold of 0.5 and a lower threshold of 0.1).18 The resulting extracted tubular structures are depicted in Fig. 5b.

Aortic arch detection

From the detected tubular structures, candidate centerlines for the trachea, main bronchi, and the aorta are selected. Candidate centerlines for the trachea and main bronchi must have a maximum average radiodensity of −800 HU and a minimum average radius of 2 mm [blue centerlines in Fig. 5b]. The candidates for the aorta must have a minimum average radiodensity of 100 HU and a minimum average radius of 7 mm [red centerlines in Fig. 5b]. Among the trachea and main bronchi candidate centerlines, the trachea is identified as the tubular structure with the largest extent in z direction, ending at a bifurcation. This bifurcation point provides the location of the carina which we use as a landmark for the identification of the aortic arch. The spatial relation between aortic arch and carina is biologically very stable. The ascending aorta passes through an axial plane anterior to the carina and the descending aorta passes through an axial plane posterior to the carina, with the aortic arch in between connecting these parts. Thus, a predefined part of the aortic arch is selected as the centerline part passing through two planes with a predefined location relative to the carina. An example of the identified trachea/main bronchi and the resulting aortic arch are shown in Fig. 5c. Figure 5d depicts the aorta with the estimated radius together with a sagittal image slice of the CT dataset.

In a few datasets, the above described rule-based approach for aortic arch detection is not able to identify a tubular object following all those rules (e.g., due to too low image contrast or resolution). In these cases, no segmentation is performed and an error is reported instead.
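The candidate selection rules lend themselves to a small rule-based filter. The sketch below is illustrative only: the `Centerline` record and its fields are hypothetical summaries of what a tube detection step might provide, and the thresholds simply restate the values quoted in the text. The plane-based selection of the arch segment relative to the carina is not shown.

```python
from dataclasses import dataclass


@dataclass
class Centerline:
    """Hypothetical per-centerline summary from the tube extraction step."""
    mean_hu: float
    mean_radius_mm: float
    z_extent_mm: float
    ends_at_bifurcation: bool


def is_airway_candidate(c, max_hu=-800.0, min_radius_mm=2.0):
    return c.mean_hu <= max_hu and c.mean_radius_mm >= min_radius_mm


def is_aorta_candidate(c, min_hu=100.0, min_radius_mm=7.0):
    return c.mean_hu >= min_hu and c.mean_radius_mm >= min_radius_mm


def pick_trachea(centerlines):
    """Airway candidate with the largest z extent that ends at a bifurcation.

    Raises LookupError when no centerline satisfies the rules, mirroring
    the reported-error behavior described in the text.
    """
    candidates = [c for c in centerlines
                  if is_airway_candidate(c) and c.ends_at_bifurcation]
    if not candidates:
        raise LookupError("no trachea candidate found; segmentation aborted")
    return max(candidates, key=lambda c: c.z_extent_mm)
```

A caller would catch the `LookupError` and flag the scan for manual review rather than emitting an unreliable aortic arch measurement.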

VALIDATION METHODOLOGY

Image data

One hundred and thirty-four PET/CT scans were acquired from 49 subjects with head and neck cancer, each of whom was imaged between one and six times, on either a Siemens Biograph Duo or a Siemens Biograph 40 PET/CT scanner (Siemens Healthcare, Hoffman Estates, IL) between 2003 and 2010. All subjects were injected with 370 MBq ± 10% of [F-18] FDG with an uptake time of 90 min ± 10%. In all cases, subjects were fasted for >4 h and had blood glucose <200 mg/dl. Patients were imaged with arms down, because of the interest in the head and neck region in this particular image dataset. The CT-based attenuation correction works equally well with arms up/arms down orientations, thus arm positioning should neither impact nor bias the methodology. All reconstructions were performed with 2D OSEM iterative algorithms. Reconstructions on the Biograph 40 were performed with 4 iterations, 8 subsets, and a 7 mm Gaussian filter and on the Biograph Duo with 2 iterations, 8 subsets, and a 5 mm Gaussian filter. Biograph Duo PET images were reconstructed onto a 128 × 128 pixel image matrix (3.5 mm × 3.5 mm × 3.4 mm), while Biograph 40 PET images were reconstructed onto a 168 × 168 pixel image matrix (3.4 mm × 3.4 mm × 2.0 mm). The CT images from the Emotion Duo/Siemens Sensation 40 CT units associated with the Biograph Duo/Biograph 40 were reconstructed onto 512 × 512 pixel CT image matrices (0.98 mm × 0.98 mm × 3.4/3.0 mm).

Independent reference standard and statistical analysis

Manual SUV measurement

To allow for a quantitative evaluation of the uptake measurement methods, human reviewers traced reference regions for cerebellum, liver, and aortic arch in axial, coronal, and sagittal slices of the datasets. For tracing, the human reviewers used a drawing tool, which allows users to specify polygonal lines to indicate the outline of the measurement region. The reviewers were able to navigate through the three-dimensional PET and CT datasets in all directions and to see the contours they drew in the other image slices. The default window/level used for visualization of the PET scan was 0–10 SUV for the cerebellum and 0–6 SUV for liver and aortic arch, but the users were able to adjust window/level as desired. The default window/level used for visualization of the CT scan was −100–900 HU.

In case of the cerebellum where our method obtains full 3D segmentations, the human reviewers were instructed to trace the exact outline of the cerebellum in four predefined slices per dataset: one axial, one coronal, and two sagittal slices (one in each hemisphere of the cerebellum). These slices were selected randomly inside the bounding box of the segmented cerebellum with a margin of 1 cm to the boundary to avoid problems due to partial volume effects. In case of the aortic arch and the liver, the human reviewers traced a reference region in one axial, one coronal, and one sagittal slice they selected for each of the measurement regions. The human reviewers were instructed to choose appropriate slices showing larger areas of the measurement region of interest and to trace the regions in such a way that boundary artifacts are avoided.

Two expert reviewers manually traced reference regions for the cerebellum, liver, and aortic arch in all 134 datasets. In the case of the cerebellum, two additional reviewers traced 44 and 20 datasets, respectively. The fewer cerebellum regions traced by the latter two reviewers were a random selection from the larger set and representative of those traced by the other reviewers. Statistical analysis methods were employed to accommodate differences in the numbers of regions traced.

On average, the segmentation of all three structures took 16.9 min (median: 10.35 min) per PET/CT scan. Average SUVs were computed for each manually segmented structure (i.e., over all traced slices) and utilized for comparison with averages calculated on automatically segmented volumetric regions. Note that for cerebellum segmentations, the average SUV was calculated over the four predefined slices (four-slice) described above, in addition to the average over the whole 3D structure (volume).

Once two or more manual SUV measurements were established by the reviewers for all three structures and datasets, two-stage regression models were used to compare results from the human reviewers to those from the automated methods, as described in Secs. 3B2, 3B3, 3B4 below.

Consensus-true SUV model

Reviewer-based measurements were used to estimate (1) a consensus-true SUV for each image, (2) systematic differences between reviewers, and (3) within- and between-reviewer variability. Letting $y_{ij}$ denote the measured value from image $i$ and reviewer $j$, the first stage of the regression model can be expressed mathematically as

$$
y_{ij} = \tau_i + \gamma_j + \varepsilon'_{ij}, \qquad \tau_i \sim N(\mu, \sigma^2), \qquad \gamma_j \sim N(0, \sigma^2_{\mathrm{reviewer}}), \qquad \varepsilon'_{ij} \sim N(0, \sigma^2_{\mathrm{manual}}),
$$

where $\tau_i$ are (normally distributed) image-specific true SUVs; $\mu$ and $\sigma^2$ are the SUV mean and variance in the patient population; $\gamma_j$ are reviewer-specific mean effects; $\sigma^2_{\mathrm{reviewer}}$ is the between-reviewer variance; and the $\varepsilon'_{ij}$ are within-reviewer errors. Variability in manual SUV measurements is due to both between- and within-reviewer variability. Thus, the sum of the two associated variances ($\sigma^2_{\mathrm{reviewer}} + \sigma^2_{\mathrm{manual}}$) represents the total variability for the reviewer-based method.

Automated SUV model

In the second stage, agreement between automated SUV measurements and consensus-true SUV was assessed, and variability due to the automated method was estimated. Both were accomplished with the regression model

$$
z_i = \beta_0 + \beta_1 \tau_i + \varepsilon''_i, \qquad \varepsilon''_i \sim N(0, \sigma^2_{\mathrm{auto}}),
$$

where $z_i$ is the automated measurement from image $i$; $\beta_0$ and $\beta_1$ are intercept (shift) and slope (scale) parameters that define a linear relationship between automated and true SUV; and the $\varepsilon''_i$ are residual errors associated with the automated calculation of SUV. Since automated results are deterministic, the residual error variance ($\sigma^2_{\mathrm{auto}}$) represents the total variability for the method. The relative total variances in the automated and manual procedures can thus be estimated as

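$$
\lambda = \left(1 - \frac{\sigma^2_{\mathrm{auto}}}{\sigma^2_{\mathrm{reviewer}} + \sigma^2_{\mathrm{manual}}}\right) \times 100\%
$$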

and are interpreted as the percent by which the total variability in measured SUVs would be decreased if the automated method was used instead of the manual method.
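As a simplified illustration of the second stage, the sketch below fits the shift and scale by ordinary least squares, treating the consensus-true SUVs as known, and then evaluates the variance reduction. It is a plug-in approximation, not the joint Bayesian estimation used in the paper. The total manual variability is taken as the sum of between- and within-reviewer variances, following the definition given with the consensus-true SUV model; the function names are hypothetical.

```python
import numpy as np


def fit_shift_scale(tau, z):
    """OLS fit of z = b0 + b1 * tau; returns (b0, b1, residual variance)."""
    tau = np.asarray(tau, float)
    z = np.asarray(z, float)
    b1, b0 = np.polyfit(tau, z, 1)
    residuals = z - (b0 + b1 * tau)
    var_auto = residuals.var(ddof=2)  # two fitted parameters
    return float(b0), float(b1), float(var_auto)


def percent_variance_reduction(var_auto, var_reviewer, var_manual):
    """Percent decrease in total variability, automated vs manual."""
    return (1.0 - var_auto / (var_reviewer + var_manual)) * 100.0


# Plug-in example with the posterior means reported for the cerebellum
# four-slice analysis in Table III.
example = percent_variance_reduction(0.0005, 0.0828, 0.0244)
```

Because the reported λ values are posterior means of the ratio, plugging posterior means of the individual variances into the formula, as done for `example`, gives a similar but not identical number.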

Bayesian analysis

In the analysis, the stage-one and stage-two models were jointly specified as a Bayesian hierarchical model, and their parameters were estimated simultaneously. Vague prior distributions of N(0, 1000) for mean parameters and Gamma(0.001, 0.001) for variance parameters were specified and confirmed, via sensitivity analysis, to have negligible impact on posterior inference. OpenBUGS (Ref. 20) was used to generate 9000 Markov chain Monte Carlo simulated samples from the posterior distributions of model parameters. Three parallel chains of the sampler were generated by varying sampler starting values and were checked for convergence with the diagnostic of Gelman and Rubin.21 For inference, posterior means are reported along with 95% credible intervals (CrI) computed as the highest posterior density intervals.22 Credible intervals may be interpreted as containing the associated model parameters with 95% probability.
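For readers unfamiliar with highest posterior density intervals, the sketch below shows one common sample-based estimator: among all windows that contain the requested fraction of the sorted MCMC samples, take the narrowest. This is a generic illustration that assumes a unimodal posterior; it is not a reproduction of the OpenBUGS computation, and the function name is hypothetical.

```python
import numpy as np


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples."""
    x = np.sort(np.asarray(samples, float))
    n = x.size
    m = int(np.ceil(mass * n))  # number of samples inside the interval
    if m < 1 or m > n:
        raise ValueError("mass must be in (0, 1] and samples non-empty")
    widths = x[m - 1:] - x[:n - m + 1]  # width of every candidate window
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + m - 1])
```

For a skewed posterior, such as those of the variance parameters, this interval can differ noticeably from the equal-tailed interval built from the 2.5% and 97.5% quantiles.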

RESULTS

Examples of automated segmentation results are depicted in Fig. 6. The segmentation of all three reference regions in a PET/CT scan took about 7 min, on average. The automated aortic arch segmentation failed in ten cases and delivered wrong results (outliers) in three cases of patients with breathing tubes, which were excluded from further statistical analysis. Table I shows the average measured FDG uptake and the measurement volume in the three reference regions over all datasets. Figures 7, 8, 9, and 10 show the measured SUVs plotted against the consensus-true SUVs estimated from the statistical analysis. In the case of the cerebellum, Figs. 7 and 8 depict the plots for average SUVs calculated over the four slices that were also traced manually by reviewers (Sec. 3B) and over the whole volume, respectively. Different plotting symbols are used to identify measurements by reviewer and method (manual or automated). The solid lines represent the estimated relationships between automated measurements and the consensus truth, and the dashed lines the relationships that would exist between the two if there were no shift (intercept β0 = 0) or scale (slope β1 = 1) differences. Noteworthy features of the plots include the relationships between the two lines and the amount of scatter about each. Differences between the lines would be indicative of systematic differences between manual and automated measurements. The relative scattering of each type of measurement about its line can help inform about the relative amounts of variability in the methods. More scattering in the manual measurements than in the automated measurements would indicate greater variability, and vice versa. The estimated percent changes in the slopes and total variability for the automatic methods, relative to the manual method, are presented in Tables II and III and correspond directly to the data presented in the figures.

Figure 6.


Results of automatically identified reference measurement regions. Coronal (top row) and sagittal (bottom row) maximum intensity projections of PET scans with overlaid outlines of projected cerebellum, aortic arch, and liver regions. All PET scans are depicted as inverted maximum intensity projections using a gray-value range of 0–6 SUV. Note that in the majority of cases, only parts of the brain are imaged, which does not affect the ability of our method to locate all three structures. In the example shown in (d), even parts of the cerebellum are missing.

TABLE I.

Average FDG uptake and measurement volume with proposed methods.

Site | FDG uptake (SUV) | Measurement volume (ml)
Cerebellum | 4.96 ± 1.01 | 160.27 ± 21.90
Aorta | 1.70 ± 0.26 | 4.17 ± 1.02
Liver | 2.01 ± 0.31 | 314.07 ± 123.40

Figure 7.


Scatterplot of measured vs reviewer consensus-true SUV in the four measurement slices for the cerebellum.

Figure 8.


Scatterplot of measured vs reviewer consensus-true SUV for the cerebellum.

Figure 9.


Scatterplot of measured vs reviewer consensus-true SUV for the aorta.

Figure 10.


Scatterplot of measured vs reviewer consensus-true SUV for the liver.

TABLE II.

Estimated intercept and slope for the linear relationship between automated SUV measurements and reviewer consensus-true SUV. Estimates are posterior means with 95% credible intervals given in parentheses.

Site | Auto | Intercept β0 (SUV) | Slope β1 (−)
Cerebellum | 4-Slice | −0.03 (−0.34, 0.20) | 0.99 (0.98, 1.01)
Cerebellum | Volume | 0.04 (−0.24, 0.28) | 0.95 (0.93, 0.96)
Aorta | Volume | −0.06 (−0.29, 0.14) | 1.03 (0.97, 1.09)
Liver | Volume | 0.02 (−0.23, 0.15) | 0.97 (0.94, 1.00)

TABLE III.

Estimated percent difference in total SUV measurement error variance (λ), as well as automated measurement error variance (σ²_auto), variability between reviewers (σ²_reviewer), and variability within reviewers (σ²_manual). Estimates are posterior means with 95% credible intervals given in parentheses.

Site         Automated analysis   λ (%)               σ²_auto (SUV²)            σ²_reviewer (SUV²)        σ²_manual (SUV²)
Cerebellum   4-Slice              99.2 (98.0, 100)    0.0005 (0.0004, 0.0011)   0.0828 (0.0043, 0.2597)   0.0244 (0.0206, 0.0282)
Cerebellum   Volume               89.8 (80.6, 99.3)   0.0067 (0.0050, 0.0086)   0.0828 (0.0043, 0.2597)   0.0244 (0.0206, 0.0282)
Aorta        Volume               76.7 (29.5, 100)    0.0051 (0.0036, 0.0068)   0.1721 (0.0013, 0.4536)   0.0025 (0.0019, 0.0032)
Liver        Volume               54.7 (−9.1, 100)    0.0026 (0.0018, 0.0035)   0.0436 (0.0002, 0.1090)   0.0020 (0.0016, 0.0025)

In Figs. 7, 9, and 10, there do not appear to be substantial systematic differences between the methods. The solid and dashed lines are similar, and all associated 95% credible intervals include 0 for the intercepts and 1 for the slopes. The greatest systematic difference is seen in the comparison of manual SUV measurements to automated volumetric SUV measurements in the cerebellum, where automated measurements are attenuated at the high end (Fig. 8). Estimates indicate that the slope of the automated measurements is 0.95 (95% CrI: 0.93–0.96) and is significantly different from 1 (the slope of the dashed line).

With respect to variability, the automated method produces the least amount of scatter about the lines in the cerebellum analyses. This is confirmed by the estimates in Table III, which show total variability to be decreased by 99.2% (95% CrI: 98.0%–100%) and 89.8% (95% CrI: 80.6%–99.3%) in the two respective analyses. Variability is also decreased, although less so, in the aorta analysis by 76.7% (95% CrI: 29.5%–100%). In the liver analysis, there is an estimated mean decrease of 54.7%, but the decrease is not significantly different from 0 (95% CrI: −9.1%–100%). Finally, estimates of the variances due to the automated method, between-reviewer mean differences, and within-reviewer measurement differences are summarized in the last three columns of the table.
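The significance statements in this section follow one rule: an effect is called significant when the 95% credible interval excludes the no-effect value (0 for the intercept, 1 for the slope, 0 for the percent variance reduction). The following minimal Python sketch applies that rule to the intervals copied from Tables II and III. The function and variable names are illustrative and are not part of the original analysis, which was a Bayesian model fit.

```python
# 95% credible intervals copied from Tables II and III, keyed by (site, analysis).
INTERCEPT_CRI = {
    ("Cerebellum", "4-Slice"): (-0.34, 0.20),
    ("Cerebellum", "Volume"): (-0.24, 0.28),
    ("Aorta", "Volume"): (-0.29, 0.14),
    ("Liver", "Volume"): (-0.23, 0.15),
}
SLOPE_CRI = {
    ("Cerebellum", "4-Slice"): (0.98, 1.01),
    ("Cerebellum", "Volume"): (0.93, 0.96),
    ("Aorta", "Volume"): (0.97, 1.09),
    ("Liver", "Volume"): (0.94, 1.00),
}
LAMBDA_CRI = {
    ("Cerebellum", "4-Slice"): (98.0, 100.0),
    ("Cerebellum", "Volume"): (80.6, 99.3),
    ("Aorta", "Volume"): (29.5, 100.0),
    ("Liver", "Volume"): (-9.1, 100.0),
}


def excludes(interval, null_value):
    """True if the no-effect value lies strictly outside the interval."""
    lower, upper = interval
    return null_value < lower or null_value > upper


def summarize():
    """Flag, per analysis, which effects have intervals excluding the null value."""
    return {
        key: {
            "shift": excludes(INTERCEPT_CRI[key], 0.0),
            "scale": excludes(SLOPE_CRI[key], 1.0),
            "variance_reduction": excludes(LAMBDA_CRI[key], 0.0),
        }
        for key in SLOPE_CRI
    }


if __name__ == "__main__":
    for (site, analysis), flags in summarize().items():
        print(site, analysis, flags)
```

Applied to the tabulated intervals, the rule flags a scale difference only for the volumetric cerebellum analysis and leaves the liver variance reduction unflagged, which matches the statements in the text.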

DISCUSSION

The developed reference region segmentation methods are fully automated and showed good agreement with the independent standard generated from manual segmentations. There was no significant systematic shift or scaling difference, with the exception of the slope difference that occurred when the average cerebellum SUV was calculated for the whole volume instead of the four specified cross sections. This can be explained as follows. The cerebellum is an inhomogeneous structure with different normal FDG uptake in its substructures (Fig. 3). These substructures have a characteristic, nonuniform distribution of FDG uptake that is volume averaged in the whole 3D cerebellum region, but they are not proportionally sampled by the four cross sections. Because the independent reference was traced in the same four cross sections, the automatic volumetric measurements appear slightly off in scale, even though the average SUV calculated on the volume is more representative of the whole cerebellum. The variability of measured FDG uptake is reported in Table I. In comparison to the cerebellum, liver and aortic arch show less intrinsic variation (lower standard deviation) in the scans utilized in this study.

An important factor in measuring uptake of an organ or structure is the size of the utilized region. This is demonstrated by an experiment on one PET/CT scan reported in Figs. 11 and 12. Eleven different regions (Fig. 11) were generated and utilized to measure/estimate uptake of the liver (Fig. 12). Note that regions A, E1, and E2 directly relate to the proposed fully automated method, manual tracings of reviewer 1, and manual tracings of reviewer 2, respectively. All the other regions were manually generated for comparison. In general, larger reference regions allow a more robust estimate of the true mean. However, in practice (e.g., clinical trials), small circular or elliptic regions (e.g., regions C1–C4 in Fig. 12) are commonly utilized, because they are less time-consuming to generate manually. Small regions may lead to differing estimates, depending on where the region was placed. In contrast, the proposed method for liver FDG uptake measurement is fully automated and utilizes a large volume (Table I) to estimate the FDG uptake in the liver. Also, when large regions are utilized, partial volume effects at the object boundary need to be considered to avoid estimation errors (region L in Fig. 12).
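The dependence of the estimate on region size and placement can be illustrated with a small simulation. The sketch below uses purely synthetic data, not the PET data of this study: a 2D "slice" with a mild uptake gradient plus voxel noise, in which circular regions of two sizes are placed at random. The image size, gradient, noise level, and radii are hypothetical values chosen only for illustration.

```python
import random
import statistics


def make_slice(size, rng):
    """Synthetic 2D slice: mild left-to-right uptake gradient plus voxel noise."""
    return [
        [2.0 + 0.3 * (x / (size - 1) - 0.5) + rng.gauss(0.0, 0.3) for x in range(size)]
        for _ in range(size)
    ]


def roi_mean(image, cx, cy, radius):
    """Mean value of all voxels within `radius` of the center (cx, cy)."""
    values = [
        image[y][x]
        for y in range(cy - radius, cy + radius + 1)
        for x in range(cx - radius, cx + radius + 1)
        if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
    ]
    return statistics.fmean(values)


def placement_spread(image, radius, n_placements, rng):
    """Standard deviation of the ROI mean over random placements inside the image."""
    size = len(image)
    means = []
    for _ in range(n_placements):
        # Centers are restricted so that the whole ROI stays inside the image.
        cx = rng.randint(radius, size - 1 - radius)
        cy = rng.randint(radius, size - 1 - radius)
        means.append(roi_mean(image, cx, cy, radius))
    return statistics.stdev(means)


rng = random.Random(0)  # fixed seed so the run is repeatable
image = make_slice(64, rng)
small = placement_spread(image, radius=3, n_placements=200, rng=rng)
large = placement_spread(image, radius=25, n_placements=200, rng=rng)
print(f"spread of ROI mean, small ROI: {small:.4f}")
print(f"spread of ROI mean, large ROI: {large:.4f}")
```

In this toy setting the small region averages few voxels and can land anywhere along the gradient, so its mean varies more between placements than that of the large region. The sketch does not model partial volume effects at the organ boundary.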

Figure 11.

3D rendering of reference regions utilized for liver FDG PET uptake measurement comparison in Fig. 12. (PET) Coronal and sagittal images representing a volumetric PET scan in the liver region. (L) Volumetric liver segmentation. (L*) Same as L, but the volume was eroded by a 1 cm margin. (A) Proposed automated method. (E1) Combination of axial, coronal, and sagittal liver slices segmented by reviewer 1. (E2) Combination of axial, coronal, and sagittal liver slices segmented by reviewer 2. (E1ax) Single axial liver slice segmented by reviewer 1. (E2ax) Single axial liver slice segmented by reviewer 2. (C1) Cylindrical liver region over five slices in axial direction. (C2) Circular region located in the middle of C1. (C3) Circular region located at the bottom of C1. (C4) Circular region located at the top of C1.

Figure 12.

Example of SUV variation in dependence of the reference regions depicted in Fig. 11. Bars and lines represent the mean and standard deviation of SUVs in the segmented region, respectively.

All three segmentation methods are fully automated and generate reproducible results (i.e., zero intraobserver variability). Compared to the manual approach, the estimated average SUV measurement error variance was reduced by between 54.7% and 99.2%. While this reduction was statistically significant for cerebellum and aortic arch, it was not found to be statistically significant in the case of the liver. In this context, it is interesting to note that the variability between reviewers (σ²_reviewer) and the variability within reviewers (σ²_manual) are the lowest for the liver. Both observations might be explained by the fact that large regions inside the liver can be defined quite easily and by the low variability of SUVs in the liver.

The human reviewers had detailed instruction on how to generate the manual segmentations and were required to adhere to certain constraints (e.g., preselected cross sections to outline the cerebellum). Thus, it seems very likely that inter- and intraobserver variability will be even higher without such instructions and constraints.

Manual segmentation of all three reference regions took on average 17 min of reviewer tracing time per dataset. In contrast, the proposed fully automated approach completes on average in less than 7 min, and its run time can be further reduced by means of code optimization and parallel computing, if needed.

The proposed method for segmentation of the aortic arch failed to produce correct results in 3 of 134 cases (2%). An investigation showed that in all these cases, patients had breathing tubes, which led to an incorrect identification of the bifurcation of the trachea into the main bronchi that is required by the algorithm to correctly locate the aortic arch. We plan to address this issue by incorporating a whole-body segmentation method into the algorithm, which will allow identifying the neck region and, therefore, excluding this area as a possible location for airway bifurcations. Also, the algorithm did not succeed in detecting the aortic arch in ten cases and terminated due to low contrast of the aorta to surrounding tissue and low image resolution of the CT data of the PET/CT scan.

A potential application of the proposed methods is the normalization of tumor uptake with respect to a reference region in the context of treatment response assessment. This is an important step for practical application of quantitative imaging for response assessment in oncology clinical trials. In the presented study, we have demonstrated the feasibility of automated and accurate uptake measurement in cerebellum, liver, and aortic arch, all of which are potentially promising candidates. However, at this stage it is not clear which of the three regions, if any, is best suited for this task. The pairwise correlation analysis shown in Fig. 13 suggests that measured average SUVs for aortic arch and liver correlate more strongly (r = 0.7) than those for cerebellum and aortic arch (r = 0.52) or cerebellum and liver (r = 0.5). Thus, future research will focus on investigating this issue.
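A pairwise correlation analysis of this kind can be sketched in a few lines of Python. The sketch assumes that r denotes the Pearson correlation coefficient, which the text does not state explicitly, and the SUV values in the usage example are hypothetical numbers, not measurements from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(measurements):
    """Correlation for every unordered pair of regions, keyed by (name_a, name_b)."""
    names = sorted(measurements)
    return {
        (a, b): pearson_r(measurements[a], measurements[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


if __name__ == "__main__":
    # Hypothetical average SUVs per scan (one list entry per scan).
    example = {
        "cerebellum": [4.1, 5.3, 4.8, 6.0, 5.1],
        "aorta": [1.5, 1.8, 1.6, 1.9, 1.7],
        "liver": [1.9, 2.2, 1.8, 2.4, 2.0],
    }
    for pair, r in pairwise_correlations(example).items():
        print(pair, round(r, 2))
```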

Figure 13.

Pairwise correlation between automated volumetric SUV measurements.

In the current implementation, fixed (generous) SUVbw thresholds for liver and brain (Sec. 2) segmentation are utilized, because the approach is geared toward automated reference region segmentation for (tumor) uptake normalization under the assumption of limited image acquisition and SUVbw normalization errors. Additional processing will be required to handle major errors in SUVbw calculation or in quality control applications, where PET image values can be off by a large margin. For example, a histogram analysis could be utilized to adapt all threshold values utilized by our method to a particular scan. Also, plausibility checks could be added to automatically identify completely failed PET/CT image acquisitions.
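One simple form such a plausibility check could take is sketched below. It is an illustration and not part of the presented method: the check compares the measured average SUV of each reference region with the cohort values reported in Table I and flags regions that deviate by more than k standard deviations. The function name and the tolerance k = 3 are hypothetical choices.

```python
# Mean and standard deviation of the average SUV per region, copied from Table I.
REFERENCE_SUV = {
    "cerebellum": (4.96, 1.01),
    "aorta": (1.70, 0.26),
    "liver": (2.01, 0.31),
}


def implausible_regions(measured, k=3.0):
    """Return the regions whose measured average SUV lies more than k standard
    deviations away from the Table I mean (k is a hypothetical tolerance)."""
    flagged = []
    for region, value in measured.items():
        mean, sd = REFERENCE_SUV[region]
        if abs(value - mean) > k * sd:
            flagged.append(region)
    return flagged


if __name__ == "__main__":
    # Hypothetical measurements for two scans.
    print(implausible_regions({"cerebellum": 5.1, "aorta": 1.6, "liver": 2.2}))
    print(implausible_regions({"cerebellum": 0.4, "aorta": 1.7, "liver": 9.0}))
```

A scan in which several regions are flagged at once would be a candidate for manual review. The cohort values stem from a head and neck cancer protocol, so they may not transfer to other patient populations or scanners.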

The presented approach was developed for a head and neck cancer specific imaging protocol where patients were imaged with arms down. We have also successfully utilized the presented algorithm to segment reference regions in PET/CT scans where patients had their arms up during scanning (Fig. 14), even though the algorithm was not developed for such an imaging protocol. Also, many existing clinical oncology related PET/CT imaging protocols image the cerebellum only partially, if at all. The selected robust ASM matching approach (Sec. 2A3) is well suited to segment partially imaged target objects, as demonstrated by Sun et al. (Ref. 16) and by the example depicted in Fig. 6(d). In addition, the developed method could be adapted to PET imaging studies of the brain where the cerebellum is often used as a reference region. If the brain/cerebellum is not imaged at all, the ROI generation described in Sec. 2 can be easily adapted accordingly, because the ROIs are quite large and only provide rough information regarding the location of an organ/structure of interest.

Figure 14.

Examples of segmentation results produced with our algorithm on PET/CT scans acquired with an imaging protocol requiring arms up.

CONCLUSION

We have presented a fully automated approach for FDG uptake measurement of cerebellum, liver, and aortic arch in full-body FDG PET/CT scans. Average SUVs were derived from volumetric regions inside all three structures, which were identified by means of automatic segmentation methods. The validation of our approach on 134 FDG PET/CT scans of head and neck cancer patients showed good agreement with the manually generated independent standard. In addition, the proposed method was found to have lower estimated variability compared to manual segmentation. Our fully automated approach requires no user interaction, completes on average in less than 7 min of unattended computing time, and offers reproducible definition of reference regions for PET image normalization. In contrast, manual segmentation of all three reference regions took approximately 17 min of reviewer tracing time per dataset and had greater variability. In the near future, we plan to evaluate several lesion-to-background ratios based on these reference regions to see if they provide a meaningful alternative for uptake normalization in the context of assessing response to therapy in conjunction with an established clinical outcomes database.

ACKNOWLEDGMENTS

This work was supported in part by NIH/NCI Grant No. U01CA140206, NIH/NIBIB Grant No. R01EB004640, and NIH/NHLBI Grant No. R01HL111453.

References

  1. Graham M. M., Peterson L. M., and Hayward R. M., “Comparison of simplified quantitative analyses of FDG uptake,” Nucl. Med. Biol. 27(7), 647–655 (2000). doi:10.1016/S0969-8051(00)00143-8
  2. Obrzut S., Pham R. H., Vera D. R., and Hoha C. K., “Comparison of lesion-to-cerebellum uptake ratios and standardized uptake values in the evaluation of lung nodules with 18F-FDG PET,” Nucl. Med. Commun. 28(1), 7–13 (2007). doi:10.1097/MNM.0b013e328013dce7
  3. Blake M. A., Slattery J. M., Halpern E. F., Fischman A. J., Mueller P. R., and Boland G. W., “Adrenal lesions: Characterization with fused PET/CT image in patients with proved or suspected malignancy-initial experience,” Radiology 238(3), 970–977 (2006). doi:10.1148/radiol.2383042164
  4. Li C., Wang X., Xia Y., Eberl S., Yin Y., and Feng D. D., “Automated PET-guided liver segmentation from low-contrast CT volumes using probabilistic atlas,” Comput. Methods Programs Biomed. 3565–3568 (2011).
  5. Heimann T., van Ginneken B., Styner M., Arzhaeva Y., Aurich V., Bauer C., Beck A., Becker C., Beichel R., Bekes G., Bello F., Binnig G., Bischof H., Bornik A., Cashman P., Chi Y., Cordova A., Dawant B., Fidrich M., Furst J., Furukawa D., Grenacher L., Hornegger J., Kainmuller D., Kitney R., Kobatake H., Lamecker H., Lange T., Lee J., Lennon B., Li R., Li S., Meinzer H. P., Nemeth G., Raicu D., Rau A. M., van Rikxoort E., Rousson M., Rusko L., Saddi K., Schmidt G., Seghers D., Shimizu A., Slagmolen P., Sorantin E., Soza G., Susomboon R., Waite J., Wimmer A., and Wolf I., “Comparison and evaluation of methods for liver segmentation from CT datasets,” IEEE Trans. Med. Imaging 28(8), 1251–1265 (2009). doi:10.1109/TMI.2009.2013851
  6. Ling H., Zhou S., Zheng Y., Georgescu B., Suehling M., and Comaniciu D., “Hierarchical, learning-based automatic liver segmentation,” in Computer Vision and Pattern Recognition (CVPR) (IEEE Computer Society, Anchorage, Alaska, 2008), pp. 1–8.
  7. van der Lijn F., de Bruijne M., Klein S., den Heijer T., Hoogendam Y., van der Lugt A., Breteler M., and Niessen W., “Automated brain structure segmentation based on atlas registration and appearance models,” IEEE Trans. Med. Imaging 31, 276–286 (2011). doi:10.1109/TMI.2011.2168420
  8. Xia Y., Eberl S., Fulham M., and Feng D. D., “Dual-modality brain PET-CT image segmentation based on adaptive use of functional and anatomical information,” Comput. Med. Imaging Graph. 36(1), 47–53 (2012). doi:10.1016/j.compmedimag.2011.06.004
  9. Feuerstein M., Kitasaka T., and Mori K., “Automated anatomical likelihood driven extraction and branching detection of aortic arch in 3-D chest CT,” in International Workshop on Pulmonary Image Analysis. Medical Image Computing and Computer Assisted Intervention (London, UK, 2009), pp. 49–60.
  10. Cootes T. F., Cooper D., Taylor C. J., and Graham J., “Active shape models—Their training and application,” Comput. Vis. Image Underst. 61(1), 38–59 (1995). doi:10.1006/cviu.1995.1004
  11. Lorensen W. E. and Cline H. E., “Marching cubes: A high resolution 3D surface construction algorithm,” ACM SIGGRAPH Comput. Graph. 21(4), 163–169 (1987). doi:10.1145/37402.37422
  12. Heimann T., Wolf I., Williams T., and Meinzer H. P., “3D active shape models using gradient descent optimization of description length,” in Proceedings of IPMI, Springer, Heidelberg (Springer, Glenwood Springs, CO, 2005), Vol. 3565, pp. 566–577.
  13. Koenderink J. J. and van Doorn A. J., “Surface shape and curvature scales,” Image Vis. Comput. 10, 557–565 (1992). doi:10.1016/0262-8856(92)90076-F
  14. Statistical Shape Analysis, edited by Dryden I. L. and Mardia K. V. (Wiley, New York, 1998).
  15. Bouix S., Siddiqi K., and Tannenbaum A., “Flux driven automatic centerline extraction,” Med. Image Anal. 9(3), 209–221 (2005). doi:10.1016/j.media.2004.06.026
  16. Sun S., Bauer C., and Beichel R., “Automated 3-D segmentation of lungs with lung cancer in CT data using a novel robust active shape model approach,” IEEE Trans. Med. Imaging 31(2), 449–460 (2012). doi:10.1109/TMI.2011.2171357
  17. Maurer C. R., Qi R., and Raghavan V., “A linear time algorithm for computing exact Euclidean distance transforms of binary images in arbitrary dimensions,” IEEE Trans. Pattern Anal. Mach. Intell. 25(2), 265–270 (2003). doi:10.1109/TPAMI.2003.1177156
  18. Bauer C. and Bischof H., “Extracting curve skeletons from gray value images for virtual endoscopy,” in Workshop on Medical Imaging and Augmented Reality (Springer, Tokyo, Japan, 2008), pp. 393–402.
  19. Bauer C., Bischof H., and Beichel R., “Segmentation of airways based on gradient vector flow,” in International Workshop on Pulmonary Image Analysis. Medical Image Computing and Computer Assisted Intervention (London, UK, 2009), pp. 191–201.
  20. Lunn D. J., Thomas A., Best N., and Spiegelhalter D., “WinBUGS—A Bayesian modelling framework: Concepts, structure, and extensibility,” Stat. Comput. 10, 325–337 (2000). doi:10.1023/A:1008929526011
  21. Gelman A. and Rubin D. B., “Inference from iterative simulation using multiple sequences,” Stat. Sci. 7, 457–511 (1992). doi:10.1214/ss/1177011136
  22. Chen M. H. and Shao Q. M., “Monte Carlo estimation of Bayesian credible and HPD intervals,” J. Comput. Graph. Stat. 8(1), 69–92 (1999).

Articles from Medical Physics are provided here courtesy of American Association of Physicists in Medicine
