Medical Physics. 2010 Nov 23;37(12):6390–6401. doi: 10.1118/1.3515751

Automatic anatomy recognition via multiobject oriented active shape models

Xinjian Chen 1, Jayaram K Udupa 2,a), Abass Alavi 3, Drew A Torigian 3
PMCID: PMC3003721  PMID: 21302796

Abstract

Purpose: This paper studies the feasibility of developing an automatic anatomy recognition (AAR) system in clinical radiology and demonstrates its operation on clinical 2D images.

Methods: The anatomy recognition method described here consists of two main components: (a) multiobject generalization of OASM and (b) object recognition strategies. The OASM algorithm is generalized to multiple objects by including a model for each object and assigning a cost structure specific to each object in the spirit of live wire. The delineation of multiobject boundaries is done in MOASM via a three level dynamic programming algorithm, wherein the first level is at pixel level which aims to find optimal oriented boundary segments between successive landmarks, the second level is at landmark level which aims to find optimal location for the landmarks, and the third level is at the object level which aims to find optimal arrangement of object boundaries over all objects. The object recognition strategy attempts to find that pose vector (consisting of translation, rotation, and scale component) for the multiobject model that yields the smallest total boundary cost for all objects. The delineation and recognition accuracies were evaluated separately utilizing routine clinical chest CT, abdominal CT, and foot MRI data sets. The delineation accuracy was evaluated in terms of true and false positive volume fractions (TPVF and FPVF). The recognition accuracy was assessed (1) in terms of the size of the space of the pose vectors for the model assembly that yielded high delineation accuracy, (2) as a function of the number of objects and objects’ distribution and size in the model, (3) in terms of the interdependence between delineation and recognition, and (4) in terms of the closeness of the optimum recognition result to the global optimum.

Results: When multiple objects are included in the model, the delineation accuracy in terms of TPVF can be improved to 97%–98% with a low FPVF of 0.1%–0.2%. Typically, a recognition accuracy of ≥90% yielded a TPVF ≥95% and FPVF ≤0.5%. Over the three data sets and over all tested objects, in 97% of the cases, the optimal solutions found by the proposed method constituted the true global optimum.

Conclusions: The experimental results showed the feasibility and efficacy of the proposed automatic anatomy recognition system. Increasing the number of objects in the model can significantly improve both recognition and delineation accuracy. More spread out arrangement of objects in the model can lead to improved recognition and delineation accuracy. Including larger objects in the model also improved recognition and delineation. The proposed method almost always finds globally optimum solutions.

Keywords: object recognition, image segmentation, active shape models, live wire

INTRODUCTION

With medical imaging becoming increasingly functional and quantitative, clinical radiology will likely lay increasingly higher emphasis on quantification in routine practice. To facilitate quantitative radiology, computerized recognition, labeling, and delineation of anatomic organs, tissue regions, and suborgans, and guided by these, the delineation and quantification of abnormalities will become important in clinical radiology. Once organs have been recognized and delineated, an important component of this task will also be to automatically report certain fundamental morphological, physiological, architectural, and functional information, pertaining to the organs and tissues, derived from single∕multiple modality images. For the purpose of this paper, such an assistive process of recognizing, delineating, and quantifying organs and tissue regions, occurring automatically during clinical image interpretation in the Radiology reading room, will be called automatic anatomy recognition (AAR). As a first step toward this larger goal, the purpose of this paper is to demonstrate in 2D images the feasibility of developing such a system in clinical radiology.

AAR is the process of identifying and delineating objects in medical images, in other words, the complete process of image segmentation. The whole segmentation operation can be thought of as consisting of two related processes: recognition and delineation. Recognition is the high-level process of determining roughly the whereabouts of an object of interest and distinguishing it from other objects in the image. Delineation is the low-level process of determining the precise spatial extent of the object in the image. Efficiently combining high-level recognition with accurate low-level delineation has remained a challenge in image segmentation.

Historically, purely image-based segmentation methods, such as those employing thresholding, region growing, clustering, active contours, level sets, live wire, watershed, fuzzy connectedness, graph cut, and Markov random fields, predate most methods employing object population shape and appearance prior models such as atlases, statistical shape and appearance models, and fuzzy geographic models. The relative merits of, and the synergy between, these two approaches (purely image-based and model-based strategies) are clearly emerging in the segmentation field. As such, hybrid methods that combine the two approaches are emerging as powerful segmentation tools (Refs. 1–10), and their superior performance and robustness over each of the component methods have been well demonstrated. Many model-based (Refs. 11–18) as well as some purely image-based techniques (Refs. 19–26), specially tailored for specific body regions and image modalities, have been developed. However, in the context of AAR, it is better to have a general approach that is applicable to any (or most) body regions, image modalities, and protocols, and that does not depend heavily on the characteristics of fixed shape families and image modalities. While some of the above techniques can perhaps be generalized in this spirit of AAR, few methods have actually been demonstrated to work in this general setting.

In this paper, we propose a general automatic anatomy recognition method which can be applied to different body regions and different image protocols. We propose an AAR approach by extending the 2D oriented active shape model (OASM) method (Ref. 6) to multiobject OASM (MOASM) and demonstrate the generality of the method on three different body regions and two image protocols. The AAR methodology described here consists of two main novel components: (a) multiobject generalization of OASM and (b) object recognition strategies. The OASM algorithm is generalized to multiple objects by including a model for each object and assigning a cost structure specific to each object in the spirit of live wire (Ref. 27). The delineation of multiobject boundaries is done in MOASM via a three level dynamic programming algorithm, wherein the first level is at pixel level which aims to find optimal oriented boundary segments between successive landmarks, the second level is at landmark level which aims to find optimal location for the landmarks, and the third level is at the object level which aims to find optimal arrangement of object boundaries over all objects. The object recognition strategy attempts to find that pose vector (consisting of translation, rotation, and scale component) for the multiobject model that yields the smallest total boundary cost over all objects. The proposed method was evaluated on 2D images drawn from three routine clinical data sets from three different body regions: chest CT, abdominal CT, and foot MRI data sets. An overall delineation accuracy of true positive volume fraction (TPVF; Ref. 30) >97% and false positive volume fraction (FPVF; Ref. 30) <0.2% was achieved, suggesting the feasibility of AAR and high accuracy. The results indicate that increasing the number of objects in the model improves both recognition and delineation accuracy in clinical images. More spread out arrangement of objects in the model can also lead to improved recognition and delineation accuracy. Including larger objects in the model also improved recognition and delineation.

This paper is organized as follows. In Sec. 2, the complete methodology of our approach and the object recognition strategies are described. In Sec. 3, we describe a detailed evaluation of this method in terms of its recognition and delineation accuracy and efficiency on three different data sets. In Sec. 4, we summarize our conclusions. A preliminary version of this paper has appeared in the Conference Proceedings of the SPIE 2009 Medical Imaging Symposium (Ref. 28).

MULTIOBJECT ORIENTED ACTIVE SHAPE MODELS

Overview of the approach

The proposed MOASM is a multiobject generalization of the OASM method (Ref. 6). Compared to one object in OASM, there are multiple objects in the model in MOASM. Let $O_1, O_2, \ldots, O_m$ be the physical objects of interest in a given body region $B$, such as the liver, lungs, heart, etc., in the thoracic region. For segmenting the boundaries of these objects in an image of a particular manifestation of them, ASM captures the statistical variations in the boundaries of these objects within these objects' family via a statistical shape model $M$. MOASM, like OASM, considers each object boundary to be oriented and determines a cost structure $K$ associated with $M$ via the principles underlying the live wire method. As per this cost structure, every shape instance $\mathbf{x}^{O}$ of every object $O$ in $B$ generated by $M$ is assigned a total cost $K_O(\mathbf{x}^{O})$ in a given image $I$. This cost is determined from the live wire segments generated in $I$ between all pairs of successive landmarks of the shape instance $\mathbf{x}^{O}$. MOASM seeks that set of oriented boundaries in $I$ corresponding to $O_1, \ldots, O_m$, each of which is a sequence of live wire segments between successive pairs of landmarks of a shape instance $\mathbf{x}^{O}$, such that $\mathbf{x}^{O}$ satisfies the constraints of $M$, and the total cost $\sum_{O} K_O(\mathbf{x}^{O})$ over all objects in $I$ considered in the model is the smallest possible. The main steps involved in MOASM are listed below. Each step is described in detail in its own subsection of Sec. 2. A comparison is also given between the proposed MOASM and the OASM method in each subsection.

Algorithm MOASM

Model building∕training phase
  • (T1) Specify landmarks on boundaries of objects $O_1, \ldots, O_m$ in the training images provided for body region $B$.

  • (T2) Construct a shape model $M$ for the objects in $B$ from the landmarks and training images.

  • (T3) Create boundary cost function $K$.

Segmentation phase
  • (S1) Initialization∕recognition: Determine, in the given image $I$ of $B$ of a patient, the pose at which $M$ should be set in $I$ so that the model boundaries are close to the real object boundaries in $I$. Let the shape instance of the multiple object assembly corresponding to the recognized site be $\mathbf{x} = (\mathbf{x}^{O_1}, \ldots, \mathbf{x}^{O_m})$.

  • (S2) Delineation: For the shape instances $\mathbf{x}$ of the multiple object assembly, determine the best oriented boundaries in $I$ as per the MOASM method.

  • (S3) If the convergence criterion is satisfied, output the best oriented boundaries found in S2 and stop. Else subject $\mathbf{x}$ to constraints of model $M$ and go to step S2.

In this procedure, T1–T3 constitute training or model creation steps, and S1–S3 represent initialization∕recognition and delineation steps. These steps are described in the following sections in detail.

T1: Specifying landmarks

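Similar to the OASM method, MOASM specifies landmarks for multiple objects. Suppose each object $O_i$ considered for inclusion in the model has $n_i$ landmarks, $1 \le i \le m$. Then the vector representation for planar shapes would be

$$\mathbf{x} = (\mathbf{x}^{O_1}, \mathbf{x}^{O_2}, \ldots, \mathbf{x}^{O_m}) = (\mathbf{x}_1^{O_1}, \mathbf{x}_2^{O_1}, \ldots, \mathbf{x}_{n_1}^{O_1}, \mathbf{x}_1^{O_2}, \ldots, \mathbf{x}_{n_2}^{O_2}, \ldots, \mathbf{x}_{n_m}^{O_m}) = (x_1^{O_1}, y_1^{O_1}, x_2^{O_1}, y_2^{O_1}, \ldots, x_{n_1}^{O_1}, y_{n_1}^{O_1}, x_1^{O_2}, y_1^{O_2}, \ldots, x_{n_2}^{O_2}, y_{n_2}^{O_2}, \ldots, x_{n_m}^{O_m}, y_{n_m}^{O_m}). \tag{1}$$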

In many ASM studies, a manual procedure is used to label the landmarks in a training set, although automatic methods are also available for this purpose. For our approach, any such method will work, although we have used the manual method in producing all presented results.

T2: Building model M

To obtain a true shape representation of the family of an object $O_i$, location, scale, and rotation effects within the family need to be filtered out. This is done by aligning shapes within the family of $O_i$ (in the training set) to each other by changing the pose parameters (scale, rotation, and translation) (Ref. 29). For multiple objects, the object assemblies are aligned. Principal component analysis (PCA) is then applied to the $N$ aligned training shape vectors $\mathbf{x}_i$, $i = 1, \ldots, N$. The model $M$ is then constructed following the ASM procedure (Ref. 29).
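As a rough illustration of the PCA step, the following sketch builds a point distribution model from already aligned shape vectors and projects a shape back onto the model's plausible range. The function names, the 98% retained-variance threshold, and the ±3 standard deviation limit on the mode coefficients are assumptions borrowed from common ASM practice; the text above only states that the ASM procedure of Ref. 29 is followed.

```python
import numpy as np

def build_shape_model(shapes, var_kept=0.98):
    """Build a PCA shape model from aligned shape vectors.

    shapes: (N, 2n) array, one aligned shape assembly per row, laid out as in Eq. 1.
    var_kept: fraction of total variance to retain (assumed value, not from the paper).
    Returns (mean shape, matrix of retained modes, their eigenvalues).
    """
    X = np.asarray(shapes, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centered data yields the principal directions directly.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    eigvals = s ** 2 / (X.shape[0] - 1)
    cum = np.cumsum(eigvals) / eigvals.sum()
    t = min(int(np.searchsorted(cum, var_kept)) + 1, len(eigvals))
    return mean, vt[:t].T, eigvals[:t]

def constrain(shape, mean, modes, eigvals, k=3.0):
    """Project a shape onto the model and clip each coefficient to +/- k std devs."""
    b = modes.T @ (np.asarray(shape, dtype=float) - mean)
    limit = k * np.sqrt(eigvals)
    return mean + modes @ np.clip(b, -limit, limit)
```

In this sketch, `constrain` plays the role of "subjecting the shape to the constraints of model M" in step S3; the actual limits used by the authors are not given in this section.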

T3: Creating boundary cost function K

Similar to OASM, the live wire method of Ref. 25 is used in MOASM because of its boundary orientedness properties and the distinct graph model it uses. Unlike OASM, MOASM trains the boundary cost functions $K_1, K_2, \ldots, K_m$ for the objects $O_1, O_2, \ldots, O_m$ separately, where $K_i$ is the boundary cost structure for $O_i$. Each cost structure $K_i$ is specifically tailored to its object $O_i$, as briefly explained below (see Ref. 25 for details).

A boundary element, bel for short, of $I$ is an ordered pair $(d, e)$ of four-adjacent pixels $d$ and $e$. It represents the oriented edge between pixels $d$ and $e$, with $(d, e)$ and $(e, d)$ representing its two possible orientations. To every bel of $I$, we assign a set of features. The features are intended to express the likelihood of the bel belonging to the boundary (of a particular object $O_i$) that we are seeking in $I$. The cost $c(b)$ associated with each bel $b$ of $I$ is a linear combination of the costs assigned to its features,

$$c(b) = \frac{\sum_i w_i\, c_f(f_i(g))}{\sum_i w_i}, \tag{2}$$

where $w_i$ is a positive constant indicating the emphasis given to feature function $f_i$ and $c_f$ is the function to convert feature values $f_i(g)$ to cost values $c_f(f_i(g))$. In live wire (Ref. 25), the $f_i$ represent features such as intensity on the immediate interior of the boundary, intensity on the immediate exterior of the boundary, and gradient magnitude at the center of the bel. $c_f$ is an inverted Gaussian function, and here, identical weights $w_i$ are used for all selected features.
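The weighted combination in Eq. 2 can be sketched as follows. This is only an illustration: the exact form of the inverted Gaussian and the per-feature mean and standard deviation parameters (which would come from training) are assumptions, and the names `inverted_gaussian` and `bel_cost` are hypothetical.

```python
import numpy as np

def inverted_gaussian(value, mean, std):
    """Cost in [0, 1]: zero when the feature equals its trained mean (assumed form)."""
    return 1.0 - np.exp(-0.5 * ((value - mean) / std) ** 2)

def bel_cost(features, means, stds, weights=None):
    """Normalized weighted sum of per-feature costs for one bel, as in Eq. 2.

    features, means, stds: one entry per feature (e.g., interior intensity,
    exterior intensity, gradient magnitude). Identical weights by default.
    """
    features = np.asarray(features, dtype=float)
    if weights is None:
        weights = np.ones_like(features)
    weights = np.asarray(weights, dtype=float)
    costs = inverted_gaussian(features,
                              np.asarray(means, dtype=float),
                              np.asarray(stds, dtype=float))
    return float(np.sum(weights * costs) / np.sum(weights))
```

A bel whose features all match the trained means gets cost 0, so minimum-cost paths are drawn toward boundaries that look like the training boundaries of that object.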

For the purpose of MOASM, we shall utilize the feature of live wire to define the best oriented path between any two points as a sequence of bels with minimum total cost. The only deviation in this case is that the two points will be taken to be any two successive landmarks employed in the model, and the landmarks are assumed to correspond to pixel vertices. With this facility, we assign a cost to every pair of successive landmarks of any shape instance $\mathbf{x}^{O_i}$, which represents the total cost of the bels in the best oriented path $\langle b_1, b_2, \ldots, b_h \rangle$ from landmark $\mathbf{x}_k^{O_i}$ to landmark $\mathbf{x}_{k+1}^{O_i}$. That is,

$$\kappa(\mathbf{x}_k^{O_i}, \mathbf{x}_{k+1}^{O_i}) = \sum_{j=1}^{h} c(b_j). \tag{3}$$

For any shape instance $\mathbf{x}^{O_i} = (\mathbf{x}_1^{O_i}, \mathbf{x}_2^{O_i}, \ldots, \mathbf{x}_{n_i}^{O_i})$ of $O_i$, the cost structure $K_i = \kappa(\mathbf{x}^{O_i})$ may now be defined as

$$K_i(\mathbf{x}^{O_i}) = \sum_{k=1}^{n_i} D(\mathbf{x}_k^{O_i})\, \kappa(\mathbf{x}_k^{O_i}, \mathbf{x}_{k+1}^{O_i})\, D(\mathbf{x}_{k+1}^{O_i}), \tag{4}$$

where we assume that $\mathbf{x}_{n_i+1}^{O_i} = \mathbf{x}_1^{O_i}$ and $D(\mathbf{x}_k^{O_i})$ is the Mahalanobis distance for the intensity profile at $\mathbf{x}_k^{O_i}$ for object $O_i$ as defined below,

$$D(\mathbf{x}_k^{O_i}) = (\mathbf{g}_k - \bar{\mathbf{g}}_k)^{T} (S_k^{g})^{-1} (\mathbf{g}_k - \bar{\mathbf{g}}_k), \tag{5}$$

where $\mathbf{g}_k$ is the intensity profile at landmark $\mathbf{x}_k^{O_i}$ (we sample $2l+1$ intensities along the direction perpendicular to the boundary at this landmark) and $\bar{\mathbf{g}}_k$ and $S_k^{g}$ are the mean and covariance matrix of the intensity profile at this landmark, respectively.

That is, $K_i(\mathbf{x}^{O_i})$ is the weighted sum of the costs associated with the best oriented paths between all $n_i$ pairs of successive landmarks of shape instance $\mathbf{x}^{O_i}$. Thus, once the bel cost function is determined via training (as described in Ref. 25), $K_i$ is also determined automatically by Eq. 4. Then, for MOASM, the total boundary cost function can be created as $K(\mathbf{x}) = \sum_{i=1}^{m} K_i(\mathbf{x}^{O_i})$. The multiobject model $M$ together with the multiobject cost structure $K$ constitutes our MOASM: $M_{\mathrm{MOASM}} = (M, K)$.
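To make the cost structure concrete, the sketch below evaluates the profile distance of Eq. 5 and then combines per-landmark distances with live wire segment costs, reading Eq. 4 as a product of the two endpoint distances and the segment cost summed over the closed contour. That reading, the small ridge term added to the covariance (to keep it invertible when training samples are few), and all function names are assumptions made for illustration.

```python
import numpy as np

def train_profile_model(profiles, ridge=1e-6):
    """Mean and covariance of training intensity profiles at one landmark.

    profiles: (N, 2l+1) array. The ridge regularizer is an assumption.
    """
    P = np.asarray(profiles, dtype=float)
    mean = P.mean(axis=0)
    cov = np.cov(P, rowvar=False) + ridge * np.eye(P.shape[1])
    return mean, cov

def profile_distance(g, g_mean, S):
    """Quadratic form of Eq. 5: (g - mean)^T S^{-1} (g - mean)."""
    d = np.asarray(g, dtype=float) - np.asarray(g_mean, dtype=float)
    # Solve S z = d instead of forming the inverse explicitly.
    return float(d @ np.linalg.solve(S, d))

def object_cost(D, kappa):
    """Eq. 4 as read here: sum_k D[k] * kappa[k] * D[k+1], indices cyclic.

    D[k]: profile distance at landmark k; kappa[k]: live wire cost from
    landmark k to landmark k+1 (the last entry closes the contour).
    """
    D = np.asarray(D, dtype=float)
    kappa = np.asarray(kappa, dtype=float)
    return float(np.sum(D * kappa * np.roll(D, -1)))
```

For example, with `D = [1, 2, 3]` and unit segment costs, `object_cost` returns 1·2 + 2·3 + 3·1 = 11.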

S1: Automatic initialization∕recognition

The goal of automatic initialization, or object recognition, is to find an initial pose for M in the given image I. Compared to OASM, automatic initialization (recognition) via MOASM is more robust and efficient. As the number of objects in the model increases and if the objects are selected strategically, the ways in which objects can be fitted in the given image to be segmented become severely constrained, and therefore, the search space becomes substantially smaller. So, even when there is only one object of interest, MOASM uses multiple objects in the model just to make the recognition of the object of interest, and hence its delineation, more accurate, robust, and effective.

Suppose $p$ denotes the pose vector for the object assembly, which includes a location component $(t_x, t_y)$, scale component $(s)$, and orientation component $(\theta)$. The goal of recognition is to find the best initial pose in $I$ for $M_{\mathrm{MOASM}}$. Our experiments indicate that, because of the globally optimal delineations and the orientedness nature of $M_{\mathrm{MOASM}}$, the small variations in orientation observed in strict protocol guided clinical image acquisitions can be automatically handled in the delineation step, and thus, $\theta$ can be ignored for the purpose of recognition. Thus, $p$ becomes three-dimensional, $(t_x, t_y, s)$. The recognition task is then to find

$$p^{*} = \arg\min_{p} BC(p, I, K_1, \ldots, K_m), \tag{6}$$

where $BC(p, I, K_1, \ldots, K_m) = \sum_{i=1}^{m} K_i(\mathbf{x}^{O_i})$, which denotes the sum of the cost (as per cost structures $K_1, \ldots, K_m$) of the boundaries of all $m$ objects delineated starting with the model at $p$. As $m$ increases, the constraints imposed become quite severe and the search for $p^{*}$ becomes remarkably easier, and accurate recognition can be achieved by discretizing the space of $p$ and by performing exhaustive search within a restricted space of the $p$ vectors. The automatic initialization method proposed here is an essential underpinning of the MOASM method. It relies on the fact that, at a position of a shape instance of MOASM that is close to the correct boundaries of $O_1, \ldots, O_m$ in $I$, the total cost $BC$ of the oriented boundaries of all objects included in MOASM is likely to be sharply smaller than the cost of the oriented boundary assembly found at other locations in $I$. In our implementation, the recognition process consists of the following steps. Let the given image to be segmented be $I$ and the mean pose vector $\bar{p} = (\bar{t}_x, \bar{t}_y, \bar{s})^{t}$ represent the mean location $(\bar{t}_x, \bar{t}_y)$ and scale $(\bar{s})$ observed over all training images for the objects included in the model. Let $SS$ be a search space (a set of pose vectors) centered around $\bar{p}$.

Object recognition algorithm

 
begin
1. For each $p \in SS$, perform steps 2 and 3.
2. Place $M$ at $p$. Deform each object as per the standard ASM searching method.
3. Evaluate $BC(p, I, K_1, \ldots, K_m)$.
4. Find the pose $p^{*}$ corresponding to the minimum $BC$ over all tested $p \in SS$.
5. Output the delineated object shapes $\mathbf{x}_p = (\mathbf{x}_p^{O_1}, \ldots, \mathbf{x}_p^{O_m})$ corresponding to $p^{*}$.
end

It is important to note that $SS$ is a small set of pose vectors uniformly sampled in a region around $\bar{p}$ in the implementation. Typically, we have used a search region of 100×100×0.5 (in $x$, $y$, $s$ units, where $x$ and $y$ units are in terms of the number of pixels) which is sampled at every 20–25 pixels along $x$ and $y$ and 0.1–0.15 in $s$. Thus, $SS$ consists of roughly 50–125 vectors.
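The exhaustive search over the discretized pose space can be sketched as below. The grid construction (endpoints included, region centered on the mean pose) is one plausible reading of the sampling described above, and `boundary_cost` is a hypothetical callable standing in for steps 2 and 3 of the recognition algorithm (placing the model, deforming it, and evaluating BC); neither is taken from the authors' implementation.

```python
import itertools
import numpy as np

def build_search_space(mean_pose, extent=(100.0, 100.0, 0.5), step=(25.0, 25.0, 0.125)):
    """Uniform grid of (tx, ty, s) poses centered on the mean pose.

    extent: full width of the search region along each axis.
    step: sampling interval along each axis (assumed values within the quoted ranges).
    """
    axes = []
    for center, width, delta in zip(mean_pose, extent, step):
        n = int(round(width / delta))  # number of intervals along this axis
        axes.append(center - width / 2 + delta * np.arange(n + 1))
    return [tuple(float(v) for v in p) for p in itertools.product(*axes)]

def recognize(search_space, boundary_cost):
    """Return the pose with the smallest total boundary cost and that cost."""
    costs = [boundary_cost(p) for p in search_space]
    best = int(np.argmin(costs))
    return search_space[best], costs[best]
```

With a 100×100×0.5 region sampled every 25 pixels and every 0.125 in scale, this grid has 5×5×5 = 125 poses, at the upper end of the range quoted above.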

S2: Delineation

This step assumes that the recognized (initialized) shape assembly instance $\mathbf{x}_p$ of the objects in $M$ derived from step S1 is sufficiently close to the actual boundaries of $O_i$ (for all $1 \le i \le m$) in $I$. It then determines what the new position of the landmarks of the objects represented in $\mathbf{x}_p$ should be such that the sum of the costs of the oriented paths between all pairs of successive landmarks is the smallest, where the cost takes into account not only the boundary cost $K_i$ of each object $O_i$ [Eq. 4] but also the relationship among objects. This is accomplished through a three-level dynamic programming (3LDP) algorithm. Compared to 2LDP in the OASM method, the 3LDP algorithm in MOASM not only attempts to find an optimal path for each object but also aims at delineating all objects in a consistent and globally optimal way.

Let the shape for object $O_i$ to be modified be $\mathbf{x}^{O_i} = (\mathbf{x}_1^{O_i}, \mathbf{x}_2^{O_i}, \ldots, \mathbf{x}_{n_i}^{O_i})$. At each landmark $\mathbf{x}_k^{O_i}$, $L = 2q + 1$ points are selected ($q > l$, where $l$ is the number of points selected on each side of $\mathbf{x}_k^{O_i}$ in the appearance aspect of the model during training), including $\mathbf{x}_k^{O_i}$ itself, with $q$ points on either side of $\mathbf{x}_k^{O_i}$ along a line perpendicular to the shape boundary at $\mathbf{x}_k^{O_i}$. Let the set of these points be denoted by $P_k^{O_i}$.

From each point in $P_k^{O_i}$, there exists a minimum cost oriented path to each point in $P_{k+1}^{O_i}$ in $I$, which can be determined via a first level DP as in the live wire method. We are interested in that set of minimum cost paths, one selected between each pair $(P_k^{O_i}, P_{k+1}^{O_i})$, such that the resulting closed boundary is continuous and its total cost is the smallest possible. This problem can be solved via a second level of DP as illustrated in the graph of Fig. 1. In this graph, the set of nodes is $P_1^{O_i} \cup P_2^{O_i} \cup \cdots \cup P_{n_i}^{O_i} \cup P_1^{O_i}$, and the set of directed arcs is $(P_1^{O_i} \times P_2^{O_i}) \cup (P_2^{O_i} \times P_3^{O_i}) \cup \cdots \cup (P_{n_i-1}^{O_i} \times P_{n_i}^{O_i}) \cup (P_{n_i}^{O_i} \times P_1^{O_i})$. Each arc $(u, v)$ in this graph has a cost which is simply the cost $\kappa(u, v)$ [Eq. 3] of the minimum cost oriented (live wire) path between $u$ and $v$ in $I$. Thus, each directed arc such as $(u, v)$ also represents an oriented boundary segment from $u$ to $v$ as a sequence of bels. Note that a directed path, such as the one shown in the middle of Fig. 1, starting from some node $u$ in $P_1^{O_i}$ in the first column and ending in the same node in the last column constitutes a closed, oriented boundary as a sequence of bels. Our aim is to find the best of all such directed paths (closed, oriented boundaries), each starting from some node in the first column and ending in the same node in the last column. Since there are $L$ nodes in the first (and every) column, in OASM, the second level DP is applied $L$ times, and the directed closed path among the $L$ optimal paths which yields the lowest total cost is considered to be the best and the desired delineation of $O_i$.
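The second level DP over the column graph of Fig. 1 can be sketched as follows. The sketch assumes the first level costs $\kappa(u, v)$ are already available as one $L \times L$ matrix per pair of successive landmarks (with the last matrix closing the contour back to the first landmark); the function name and data layout are illustrative choices, not the authors' code.

```python
import numpy as np

def closed_paths_per_start(arc_costs):
    """For every start candidate, find the cheapest closed path through all columns.

    arc_costs[k][u, v]: live wire cost from candidate u at landmark k to
    candidate v at landmark (k + 1) mod n. Requires n >= 2 landmarks.
    Returns a list of (total_cost, candidate index per landmark), one per start.
    """
    n = len(arc_costs)
    L = arc_costs[0].shape[0]
    results = []
    for start in range(L):
        # cost[v]: cheapest way to reach candidate v in the current column
        # when the path is forced to begin at `start` in the first column.
        cost = np.full(L, np.inf)
        cost[start] = 0.0
        back = []
        for k in range(n - 1):
            total = cost[:, None] + arc_costs[k]
            back.append(np.argmin(total, axis=0))
            cost = np.min(total, axis=0)
        # Close the contour by returning to the same start candidate.
        closing = cost + arc_costs[n - 1][:, start]
        last = int(np.argmin(closing))
        path = [last]
        for pointers in reversed(back):
            path.append(int(pointers[path[-1]]))
        results.append((float(closing[last]), path[::-1]))
    return results
```

Taking `min(results)` reproduces the OASM choice of a single best boundary, whereas keeping all $L$ entries gives the per-object candidate boundaries that the object level optimization of MOASM works with.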

Figure 1.

The weighted graph used in the second level dynamic programming. The nodes in each column represent the set of points $P_k^{O_i}$ that are sampled along a line orthogonal to the boundary at each landmark $\mathbf{x}_k^{O_i}$.

In the MOASM method, instead of choosing the best among the $L$ optimal paths for each object (as done in OASM), one more level of optimization (the object level) is carried out. This is done by combining the cost of the optimal boundaries with the cost associated with object relationships, to ensure propriety of relationships, as described below. Suppose we determine a linear order for the objects, denoted by $O_1, \ldots, O_m$, and fix this order once and for all (how to choose the order is discussed later). Let the $L$ optimal boundaries of $O_i$ found above be $B_\ell^{O_i}$, $1 \le \ell \le L$. We quantify the relationship between successive objects $(O_i, O_{i+1})$, $1 \le i < m$, in the sequence via three factors, for $1 \le \ell, k \le L$: relative position $RP_{\ell,k}^{i} = \bar{\mathbf{x}}_\ell^{O_i} - \bar{\mathbf{x}}_k^{O_{i+1}}$, where $\bar{\mathbf{x}}_\ell^{O_i}$ is the geometric center of $B_\ell^{O_i}$; relative size $RS_{\ell,k}^{i} = A_\ell^{O_i} / A_k^{O_{i+1}}$, where $A_\ell^{O_i}$ is the area enclosed by $B_\ell^{O_i}$; and boundary distance $BD_{\ell,k}^{i}$, which is the Hausdorff distance between $B_\ell^{O_i}$ and $B_k^{O_{i+1}}$. The relationship $R_{\ell,k}^{i}$ between $O_i$ and $O_{i+1}$ is quantitatively expressed by a weighted combination of the three factors as

$$R_{\ell,k}^{i} = \sum_{x \in \{RP, RS, BD\}} w_x\, G_x(\mu_x, \sigma_x), \tag{7}$$

where $G_x(\mu_x, \sigma_x)$ denotes a Gaussian function with mean $\mu_x$ and standard deviation $\sigma_x$. $G_{RP}$ is a Gaussian function of $RP_{\ell,k}^{i}$. Its parameters $\mu_{RP}$ and $\sigma_{RP}$ are estimated from the training data sets. $G_{RS}$ and $G_{BD}$ and their parameters are defined in an analogous manner. The $w_x$ are weights chosen such that $\sum_x w_x = 1$. (In all experiments, we used $w_{RP} = w_{RS} = 0.3$.)
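A small numerical sketch of the relationship score in Eq. 7 follows. Several details are assumptions made only for illustration: each factor is treated as a scalar (for relative position, for example, a distance between centers), the Gaussian is unnormalized so that it peaks at 1, and the third weight is set to 0.4 so that the three weights sum to 1 as required.

```python
import math

def gaussian(value, mu, sigma):
    """Unnormalized Gaussian, equal to 1 at value == mu (assumed form)."""
    return math.exp(-0.5 * ((value - mu) / sigma) ** 2)

def relationship_score(factors, stats, weights):
    """Weighted sum of Gaussian scores over the factors RP, RS, and BD.

    factors: measured value per factor for one pair of candidate boundaries.
    stats: (mu, sigma) per factor, estimated from training data.
    weights: weight per factor; expected to sum to 1.
    """
    return sum(weights[name] * gaussian(factors[name], *stats[name])
               for name in weights)

# Hypothetical training statistics and weights (w_BD = 0.4 follows from the sum-to-1 rule).
weights = {"RP": 0.3, "RS": 0.3, "BD": 0.4}
stats = {"RP": (120.0, 10.0), "RS": (1.5, 0.2), "BD": (80.0, 8.0)}
typical = relationship_score({"RP": 120.0, "RS": 1.5, "BD": 80.0}, stats, weights)
atypical = relationship_score({"RP": 160.0, "RS": 2.5, "BD": 120.0}, stats, weights)
```

Under these assumptions a pair of boundaries whose relationship matches the training statistics scores close to 1, and an implausible pair scores close to 0.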

Given $L$ optimal shapes for each $O_i$, $1 \le i \le m$, the best shape for each $O_i$ and the arrangement of objects in $I$ is determined via a third level DP (3LDP) carried out on the graph shown in Fig. 2. Each of the $L$ optimal shapes $B_\ell^{O_i}$ for each $O_i$ constitutes a node in this graph and a pair such as $(B_\ell^{O_i}, B_k^{O_{i+1}})$, $1 \le i < m$, $1 \le \ell, k \le L$, constitutes a directed arc. The integrated cost assigned to this arc is

$$IC(B_\ell^{O_i}, B_k^{O_{i+1}}) = \frac{K_i(B_\ell^{O_i})\, K_{i+1}(B_k^{O_{i+1}})}{R_{\ell,k}^{i}}. \tag{8}$$

As in Fig. 1, we observe that there is an optimum path in the graph of Fig. 2, starting from each node in the first column and ending on some node in the last column. The MOASM method finds each such optimum path (which represents one arrangement of the object assembly) and the best among L such arrangements via 3LDP.

Figure 2.

The graph used in the third level of optimization. There are m objects to be delineated and L possible optimal shapes for each object. The horizontal direction represents different objects. The vertical direction represents the different object shapes.

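The 3LDP algorithm may be summarized as follows. Given a shape assembly $\mathbf{x}_p = (\mathbf{x}_p^{O_1}, \mathbf{x}_p^{O_2}, \ldots, \mathbf{x}_p^{O_m}) = (\mathbf{x}_1^{O_1}, \mathbf{x}_2^{O_1}, \ldots, \mathbf{x}_{n_1}^{O_1}, \mathbf{x}_1^{O_2}, \ldots, \mathbf{x}_{n_2}^{O_2}, \ldots, \mathbf{x}_{n_m}^{O_m})$ and $I$, it outputs a new shape assembly $\mathbf{x}_o$ and the associated optimum oriented boundaries as a sequence of bels.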

Algorithm 3LDP

 
Input: Initialized shape assembly $\mathbf{x}_p$.
Output: Resulting shape assembly $\mathbf{x}_o$ and the associated oriented boundaries.
begin
1. For each object $O_i$ ($1 \le i \le m$), do steps 2 to 4.
2. Determine the sets of points $P_1^{O_i}, P_2^{O_i}, \ldots, P_{n_i}^{O_i}$ for $O_i$ in $I$.
3. Determine costs $\kappa(u, v)$ via first level DP for all directed arcs in Fig. 1.
4. For each point $u$ in $P_1^{O_i}$, determine the best directed path from $u$ back to $u$ via the second level DP, the corresponding shape $\mathbf{x}^{O_i}$, and its total cost $K_i(\mathbf{x}^{O_i})$.
5. Compute arc costs $IC(B_\ell^{O_i}, B_k^{O_{i+1}})$ in Fig. 2. Find the best path $\mathbf{x}_o$ which has minimal total cost in the graph of Fig. 2 via third level DP.
6. Output the optimum shape assembly $\mathbf{x}_o$ and the corresponding oriented boundaries.
end
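Step 5 of the algorithm, the object level optimization on the graph of Fig. 2, can be sketched as a standard shortest path DP over columns. The sketch takes the arc costs as given matrices, so it does not depend on how the integrated cost is computed; the function name and the choice to minimize over all start nodes in a single pass (rather than running one DP per start node and then comparing) are illustrative assumptions that yield the same minimum.

```python
import numpy as np

def third_level_dp(arc_costs):
    """Pick one candidate shape per object so that the summed arc cost is minimal.

    arc_costs[i][l, k]: integrated cost of the arc from candidate shape l of
    object i to candidate shape k of object i + 1 (m - 1 matrices of size L x L).
    Returns (minimum total cost, chosen candidate index for each of the m objects).
    """
    L = arc_costs[0].shape[0]
    cost = np.zeros(L)  # any candidate of the first object may start a path
    back = []
    for ic in arc_costs:
        total = cost[:, None] + ic
        back.append(np.argmin(total, axis=0))
        cost = np.min(total, axis=0)
    choice = [int(np.argmin(cost))]
    for pointers in reversed(back):
        choice.append(int(pointers[choice[-1]]))
    return float(cost.min()), choice[::-1]
```

For $m$ objects and $L$ candidates each, this costs on the order of $(m-1)L^2$ additions, which is small compared to the first and second level DPs.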

S3: Applying model constraints

The convergence criterion used here is a measure of the distance between two shapes encountered in two consecutive executions of step S2. This measure is simply the maximum of the distance between corresponding landmarks in the two shape assemblies of all objects. If this distance is greater than 0.5 pixel unit, the optimum shape assembly found in step S2 is subjected to the constraints of model M. Then the iterative process is continued by going back to step S2. Otherwise, the MOASM process is considered to have converged and it stops with an output of the optimum shape assembly and the optimum oriented boundaries found in the process.
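The iteration between steps S2 and S3 can be sketched as a simple loop. Here `delineate` and `constrain` are hypothetical callables standing in for the 3LDP delineation and the model constraint step, the landmark distance is taken to be Euclidean, and the iteration cap is an added safeguard; only the 0.5 pixel threshold and the "maximum over corresponding landmarks" rule come from the text above.

```python
import numpy as np

def max_landmark_shift(shape_a, shape_b):
    """Largest distance between corresponding landmarks of two shape assemblies.

    Shapes are flat vectors (x1, y1, x2, y2, ...) as in Eq. 1.
    """
    a = np.asarray(shape_a, dtype=float).reshape(-1, 2)
    b = np.asarray(shape_b, dtype=float).reshape(-1, 2)
    return float(np.max(np.linalg.norm(a - b, axis=1)))

def moasm_iterate(x0, delineate, constrain, tol=0.5, max_iter=50):
    """Alternate delineation (S2) and model constraints (S3) until convergence."""
    previous = np.asarray(x0, dtype=float)
    x = previous
    for _ in range(max_iter):
        current = np.asarray(delineate(x), dtype=float)    # step S2
        if max_landmark_shift(previous, current) <= tol:   # step S3 test
            return current
        previous = current
        x = np.asarray(constrain(current), dtype=float)    # apply model constraints
    return previous
```

The loop stops as soon as no landmark has moved by more than `tol` pixels between two consecutive delineations.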

EXPERIMENTAL RESULTS

In this section, we demonstrate both qualitatively, through image display, and quantitatively, through evaluation experiments, the extent of effectiveness of the MOASM strategy. Three clinical image data sets, which include abdominal CT, chest CT, and foot MRI, have been considered. Our method of evaluation, based on the framework of Ref. 30, will focus on the analysis of accuracy and efficiency of MOASM. We will also present an evaluation of the proposed object recognition strategies. We will consider manual segmentation performed in these different data sets to constitute a surrogate of true segmentation for assessing the delineation and recognition accuracy of the methods.

Image data sets

The image data sets and objects employed in the experiments are described in Table 1 and Fig. 3. Both CT data sets constitute slices selected from 3D CT studies on a Siemens Sensation 16 CT scanner with a slice spacing of 5 mm, image size of 512×512, and pixel size of 0.78×0.78 mm². The foot MRI data set constitutes slices selected from 3D studies on a GE 1.5 T MRI scanner with a slice spacing of 1.3 mm, image size of 256×256, and pixel size of 0.55×0.55 mm². In each data set, 40 slices, selected from full 3D images acquired from 15 different subjects, are used. These slices are approximately at the same anatomic location in the body, so that, for each object, the 40 2D images in each set can be considered to represent images of a family of objects of the same shape. Two to three slices are taken on average from the same subject's data, either from the same 3D image or from different 3D images. Among the 40 images, 25 are randomly selected for training, and the rest are used as test images. MOASM models were built by manually selecting landmarks on the segmented boundaries.

Table 1.

Description of the image data sets used in the three AAR experiments.

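| Data set | Objects | Number of landmarks used | Image domain | No. of images |
|---|---|---|---|---|
| Abdominal CT | (1) Right pelvic bone | 10 | 512×512 | 40 |
| | (2) Vertebra | 12 | | |
| | (3) Left pelvic bone | 10 | | |
| | (4) Skin boundary | 12 | | |
| Chest CT | (1) Right lung | 10 | 512×512 | 40 |
| | (2) Heart | 9 | | |
| | (3) Left lung | 10 | | |
| | (4) Skin boundary | 12 | | |
| Foot MRI | (1) Tibia | 7 | 256×256 | 40 |
| | (2) Talus | 14 | | |
| | (3) Calcaneus | 15 | | |
| | (4) Skin boundary | 11 | | |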

Figure 3.

The four objects selected in the abdominal CT, chest CT, and foot MRI data sets.

Qualitative analysis

A subjective inspection revealed that, in all experiments, the MOASM results matched the perceived boundary very well. Some examples are displayed for the three data sets in Fig. 4. Automatic initialization based on location and scale search worked well in all cases in the sense that initialized shapes were found close to the true boundary. In the figure, the original image with the mean shape assembly in its default pose is displayed in Fig. 4b. In Fig. 4c, the model shapes resulting from the recognition process, which form an input to the MOASM algorithm, are displayed. The final delineation results are shown in Fig. 4d.

Figure 4.

One example from each of the three data sets: (a) original image, (b) default initial model pose, (c) automatic recognition result, and (d) delineation result by MOASM.

Quantitative analysis of segmentation

Here, we will focus on the analysis of accuracy and efficiency of MOASM from the perspective of the final delineation results and compare them with those of the classical ASM method. Accuracy here relates to how well the segmentation results agree with the true delineation of the objects. Efficiency indicates the practical viability of the method, which is determined by the amount of time required for performing computations and for providing any user help needed in segmentation. The measures that are used under each of these groups and their definitions are given below. In these analyses, all test data sets from all three groups of data were utilized.

Accuracy. The measures TPVF and FPVF (Ref. 30) are used to assess the accuracy of the methods. The ground truth for delineation (and recognition) is provided by manually delineating the shapes. In all applications, all data sets have been previously manually segmented in the domain. TPVF∕FPVF are calculated as per Ref. 30 for each object separately, and the average TPVF∕FPVF over all objects is then used as the performance measure. Table 2 lists the mean and standard deviation values of TPVF and FPVF achieved by using the ASM and MOASM methods. It shows that MOASM produces considerably more accurate final segmentations than the ASM method. It is important to note that only a small set of landmarks is needed for each object in MOASM, as shown in Table 1, due to the efficiency of live wire for boundary delineation. With more landmarks, the method can be expected to perform even better.
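For readers unfamiliar with these measures, the sketch below computes TPVF and FPVF from binary masks under one common reading of the framework of Ref. 30: TPVF is the fraction of the true object that the segmentation covers, and FPVF is the falsely labeled area expressed as a fraction of the non-object part of a reference region. Taking the whole image domain as that reference region, and the function name itself, are assumptions.

```python
import numpy as np

def tpvf_fpvf(segmentation, truth):
    """Return (TPVF, FPVF) for two binary masks of the same shape.

    TPVF = |seg AND truth| / |truth|
    FPVF = |seg AND NOT truth| / |NOT truth|   (image domain as reference; assumed)
    """
    seg = np.asarray(segmentation, dtype=bool)
    tru = np.asarray(truth, dtype=bool)
    true_pos = np.count_nonzero(seg & tru)
    false_pos = np.count_nonzero(seg & ~tru)
    return true_pos / np.count_nonzero(tru), false_pos / np.count_nonzero(~tru)

# Toy 4x4 example: the object is a 2x2 block; the segmentation finds three of
# its four pixels and adds one pixel outside it.
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True
seg = truth.copy()
seg[1, 1] = False
seg[0, 0] = True
tp, fp = tpvf_fpvf(seg, truth)
```

In the toy example, TPVF is 3/4 and FPVF is 1/12. Per-object values computed this way would then be averaged over objects as described above.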

Table 2.

Mean and standard deviation of TPVF and FPVF for ASM and MOASM.

| Data set | TPVF (ASM) | TPVF (MOASM) | FPVF (ASM) | FPVF (MOASM) |
|---|---|---|---|---|
| Abdominal CT | 88.56%±2.2% | 97.3%±1.1% | 0.66%±0.01% | 0.13%±0.01% |
| Chest CT | 87.89%±0.15% | 98.3%±0.2% | 0.99%±0.02% | 0.17%±0.01% |
| Foot MRI | 90.01%±0.34% | 97.2%±0.3% | 0.41%±0.01% | 0.11%±0.01% |

Efficiency. Both methods are implemented in MATLAB on an Intel Pentium IV PC with a 3.4 GHz CPU. In determining the efficiency of a segmentation method, two aspects should be considered: the computation time ($T_c$) and the human operator time ($T_o$). The mean $T_c$ and $T_o$ per data set estimated over the three data sets for each experiment are listed in Table 3. $T_o$ measured here is the operator time required in the training step (for selecting landmarks). Table 3 shows that the operator time (training) required in MOASM is much less than that of ASM since far fewer landmarks are needed in MOASM. The computation time required in MOASM is higher than that of ASM (by 8–12 s per data set in Table 3) because of the 3LDP algorithm.

Table 3.

Mean operator time To and computational time Tc (in seconds) in all experiments for ASM and MOASM. The number of landmarks used is indicated by nℓ.

Data set | To (ASM) | To (MOASM) | Tc (ASM) | Tc (MOASM)
Abdominal CT | 160 (nℓ=132) | 60 (nℓ=44) | 18 | 30
Chest CT | 130 (nℓ=123) | 55 (nℓ=41) | 16 | 28
Foot MRI | 170 (nℓ=141) | 65 (nℓ=47) | 12 | 20

Evaluation of object recognition strategies

It is clear that the effectiveness of recognition depends very much on the integrated cost function IC, which in turn depends on the cumulative boundary cost function BC, and the cost structures K1,…,Km. This in turn implies that the recognition efficacy is influenced by the particular combination of objects included in MOASM. Here, we evaluate the relationship between object recognition and delineation accuracy and the number of objects considered in M and their actual spatial distribution.

Robust region. To characterize the robustness of the recognition method, we define a concept called robust region RR as

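$$RR = \{\, \text{pose vector } p \mid \text{when the recognized object assembly is at } p, \text{ the resulting delineation accuracy is acceptably high (say, with } \mathrm{TPVF} > 97\% \text{ and } \mathrm{FPVF} < 3\%) \,\}. \tag{9}$$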

RR is a useful factor to quantify the robustness of the recognition method and to compare among recognition strategies. The size of RR is an indicator of the robustness of the recognition strategy resulting from a particular choice of the objects included in MOASM. The larger the robust region is, the more robust the recognition method is. Figure 5 shows an illustration of the robust region RR (red region) for an image from the abdominal CT data set. The white point is the geometric center of the recognized shape. The red region is the robust region RR, considering only the x and y location components of p. In this case, the white point lies within the red region, which implies high delineation accuracy. The size of the robust region in the AAR method over all images in these three data sets is roughly 22 pixels×23 pixels×0.20, 23 pixels×24 pixels×0.20, and 20 pixels×22 pixels×0.16, respectively. Figure 6 illustrates how the size (volume) of RR (x range×y range×scale range) varies for different numbers of objects in these three data sets. The results suggest that the MOASM recognition process becomes more robust as the number of objects in the model increases. This holds even when only one object is of interest.

Figure 5.

Illustration of robust region RR for an image from the abdominal data set. The white point is the geometric center of the recognized shape assembly. In this case recognition is perfect. The darker region around the white point located within the vertebral body is the robust region RR considering just the x and y location components of p.

Figure 6.

The volume of the robust region in the (x, y, s) space for different numbers of objects. (a) Abdominal CT data set, (b) chest CT data set, and (c) foot MRI data set. The mean value and standard deviation over all test data sets are shown.
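
The robust region of Eq. (9) and its volume in (x, y, s) space can be sketched as follows. This is only an illustration under stated assumptions: delineation accuracy is taken to have been evaluated beforehand on a regular grid of poses, the extent along each axis is measured as the difference between the largest and smallest grid value inside RR, and the function names and toy arrays are hypothetical.

```python
import numpy as np

def robust_region(tpvf, fpvf, tpvf_min=0.97, fpvf_max=0.03):
    """Boolean mask over the pose grid where delineation is acceptably accurate."""
    return (np.asarray(tpvf) > tpvf_min) & (np.asarray(fpvf) < fpvf_max)

def rr_extent(mask, axes_values):
    """Range covered by RR along each pose axis (assumed: max minus min)."""
    idx = np.nonzero(mask)
    extents = []
    for axis, values in enumerate(axes_values):
        if idx[axis].size == 0:
            extents.append(0.0)  # empty robust region
        else:
            inside = np.asarray(values, dtype=float)[idx[axis]]
            extents.append(float(inside.max() - inside.min()))
    return extents

# Toy pose grid: 5 x-offsets, 5 y-offsets (pixels), 3 scale factors.
xs = np.arange(5)
ys = np.arange(5)
ss = np.array([0.9, 1.0, 1.1])

# Synthetic accuracy values: acceptable only in a sub-block of the grid.
tpvf = np.full((5, 5, 3), 0.90)
tpvf[1:4, 0:3, 0:2] = 0.98
fpvf = np.full((5, 5, 3), 0.01)

mask = robust_region(tpvf, fpvf)
extents = rr_extent(mask, (xs, ys, ss))
volume = float(np.prod(extents))
print(extents, volume)
```

The bounding-box product used here matches the "x range×y range×scale range" description only if RR is roughly box shaped; for irregular regions, counting the grid cells inside the mask would be a reasonable alternative.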

Accuracy dependence on the number of objects, objects’ distribution, and object size. It is interesting to note that the recognition accuracy improves with increasing number of objects. Figure 7 shows the experimental results on the three data sets. In order to test how the objects’ distribution may influence recognition accuracy, two objects were selected in different combinations for the three data sets. Figure 8 shows the experimental results on these three data sets. We observe that the recognition accuracy is higher if the objects are distributed more evenly. We also found that it is a good idea to include the encompassing object (such as the skin boundary), as it provides a strong and robust constraint. Figure 9 demonstrates that delineation accuracy also depends on the size of the object. The figure shows how the delineation accuracy of object 1 in the three data sets improves as the object used for recognition is chosen to have increasingly larger size. On the horizontal axis, label 1 indicates that only object 1 is used in the model for recognizing and delineating object 1. Label 2 indicates that, in addition to object 1, object 2 is also employed in the model, and so on.

Figure 7.

The dependence of delineation accuracy on the number of objects in the three data sets. (a) Abdominal CT data set, (b) chest CT data set, and (c) foot MRI data set. For labels 1, 2, and 3, all combinations of 1, 2, and 3 objects were considered. The mean value and standard deviation over all test data sets are shown.

Figure 8.

The dependence of delineation accuracy on the distribution of objects in the three data sets. Distribution: 1: objects 1, 2; 2: objects 1, 3; 3: objects 1, 4; 4: objects 2, 3; 5: objects 2, 4; 6: objects 3, 4. (a) Abdominal CT data set; (b) chest CT data set; (c) foot MRI data set.

Figure 9.

Illustration of how better recognition via larger objects included in MOASM influences delineation accuracy. The delineation accuracy is shown for object 1 in all three data sets. The horizontal axis indicates the number of the object included in the model (with increasing size) in addition to object 1 for facilitating recognition. (a) Abdominal CT data set. (b) Chest CT data set. (c) Foot MRI data set.

Interdependence between recognition and delineation. The effectiveness of AAR depends on the interaction between object recognition and object delineation; in other words, the pose at which the objects are initialized influences delineation accuracy. Conversely, the recognition accuracy itself depends on delineation accuracy since IC, BC, and Ki depend on the delineated boundaries. Experiments were also carried out to study this interdependence. We find (Fig. 10) that as recognition improves, delineation also becomes more accurate. This means that good delineation can help in recognition and that perfect recognition makes delineation most accurate. This is the spirit of the synergy established between ASM and live wire by MOASM. Delineation accuracy is shown over the same objects that are used in the model for recognition. Here, the recognition accuracy AR is defined as an inverse function of the distance of the geometric center loc of the model at the end of recognition from RR, as follows:

$$AR = \exp\bigl(-\alpha\,\mathrm{dist}(loc, RR)\bigr), \tag{10}$$

where α is a constant (here α=0.1). If the distance is 0, AR=1 (perfect recognition).
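
A minimal sketch of this recognition accuracy measure is given below. It assumes a negative exponent, so that AR decreases with distance and equals 1 when loc lies inside RR, and it takes dist(loc, RR) to be the Euclidean distance in pixels from loc to the nearest point of RR; the distance metric, the function name, and the toy region are assumptions made for illustration.

```python
import numpy as np

def recognition_accuracy(loc, rr_points, alpha=0.1):
    """AR = exp(-alpha * dist(loc, RR)), with dist assumed to be the
    Euclidean distance from loc to the closest point of RR."""
    loc = np.asarray(loc, dtype=float)
    pts = np.asarray(rr_points, dtype=float)
    dist = np.min(np.linalg.norm(pts - loc, axis=1))
    return float(np.exp(-alpha * dist))

# Toy robust region: a 5x5 block of (x, y) positions.
rr_points = [(x, y) for x in range(10, 15) for y in range(10, 15)]

ar_inside = recognition_accuracy((12, 12), rr_points)   # loc inside RR
ar_outside = recognition_accuracy((20, 12), rr_points)  # 6 pixels from RR
print(ar_inside, ar_outside)
```

With α = 0.1, a geometric center lying 6 pixels outside the robust region gives AR = exp(−0.6), so AR falls off gradually rather than dropping to zero as soon as the center leaves RR.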

Figure 10.

Recognition accuracy versus delineation accuracy in the abdominal CT data set. (a) TPVF versus recognition accuracy; (b) FPVF versus recognition accuracy.

How far is MOASM solution from the global optimum?

As described in Ref. 6, the single-object OASM algorithm finds a globally optimal boundary if the recognition step brings the model to a pose wherein the line profile drawn orthogonal to the model shape boundary at every landmark is intersected by the globally optimal boundary. This property also holds for the MOASM algorithm. From Figs. 6–10, it is also clear that MOASM has better performance than (the single-object) OASM in terms of both delineation and recognition accuracy. To determine how often globally optimum solutions are actually found, we examined the MOASM recognition results for each of the four objects in all (15×3=45) test data sets to ascertain how often the above optimality property is satisfied. Table 4 suggests that, overall, in 97% of the cases the globally optimal solution is actually found. We also computed the percent difference between the true optimum boundary cost (which can be determined since the true boundaries are known) and the optimum found by the MOASM method for the chest CT data set. These are listed in Table 5, which shows that the differences are quite small, although for object 3 (heart) the cost function can perhaps be improved. Better cost functions also lead to a larger RR and hence more efficient recognition.

Table 4.

Results showing how often globally optimal boundaries are found for the four objects in the three data sets.

Data set | Object 1 | Object 2 | Object 3 | Object 4 | Total
Abdominal CT | 15∕15=100% | 14∕15=93.3% | 15∕15=100% | 14∕15=93.3% | 43∕45=95.5%
Chest CT | 15∕15=100% | 15∕15=100% | 14∕15=93.3% | 15∕15=100% | 44∕45=97.7%
Foot MRI | 15∕15=100% | 15∕15=100% | 15∕15=100% | 14∕15=93.3% | 44∕45=97.7%
Total | 45∕45=100% | 44∕45=97.7% | 44∕45=97.7% | 43∕45=95.5% | 131∕135=97.0%

Table 5.

Difference between the true optimum cost and the cost of the optimum boundaries actually found for the different objects in the chest CT data set. Mean values over the 15 test images are shown.

Object 1 | Object 2 | Object 3 | Object 4 | Total
1.4% | 2.3% | 9.7% | 0.04% | 3.8%

All results reported thus far in this section were produced assuming the order O1, O2, O3, O4 for the objects in all three data sets when devising the cost function IC referred to in Fig. 2, where the number assignment to objects is as depicted in Fig. 3. To test the dependence of delineation accuracy on this order, we ran the entire segmentation experiment on the three data sets for five other object sequences. The results for all six sequences are summarized in Table 6. They suggest that the delineation accuracy is more or less independent of the object sequence employed for IC.

Table 6.

Delineation accuracy (mean and standard deviation) for six different object sequences considered in composing IC for the three data sets.

Order of objects | TPVF (Abdominal CT) | TPVF (Chest CT) | TPVF (Foot MRI) | FPVF (Abdominal CT) | FPVF (Chest CT) | FPVF (Foot MRI)
O1-O2-O3-O4 | 97.36%±1.12% | 98.33%±0.26% | 97.25%±0.35% | 0.13%±0.01% | 0.17%±0.01% | 0.11%±0.01%
O1-O3-O2-O4 | 97.05%±1.19% | 98.21%±0.29% | 97.15%±0.36% | 0.15%±0.01% | 0.18%±0.01% | 0.13%±0.01%
O2-O1-O3-O4 | 97.12%±1.20% | 98.15%±0.32% | 97.17%±0.31% | 0.14%±0.01% | 0.19%±0.01% | 0.13%±0.01%
O2-O3-O1-O4 | 96.96%±1.21% | 98.19%±0.31% | 97.12%±0.39% | 0.16%±0.01% | 0.18%±0.01% | 0.12%±0.01%
O3-O1-O2-O4 | 96.93%±1.25% | 98.17%±0.29% | 97.11%±0.38% | 0.15%±0.01% | 0.17%±0.01% | 0.14%±0.01%
O3-O2-O1-O4 | 97.31%±1.15% | 98.29%±0.29% | 97.26%±0.33% | 0.14%±0.01% | 0.16%±0.01% | 0.12%±0.01%
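
The six sequences in Table 6 appear to be the permutations of O1, O2, and O3 with O4 kept in the last position; this is read off the table rather than stated in the text. Under that assumption, the sketch below enumerates the sequences and computes the spread (largest minus smallest mean TPVF) across them, using the mean values from Table 6. The function and variable names are hypothetical.

```python
from itertools import permutations

def object_sequences(free=("O1", "O2", "O3"), last="O4"):
    """All orderings of the free objects, with one object fixed at the end."""
    return ["-".join(p + (last,)) for p in permutations(free)]

sequences = object_sequences()

# Mean TPVF (%) per sequence, copied from Table 6 in row order.
tpvf_by_sequence = {
    "Abdominal CT": [97.36, 97.05, 97.12, 96.96, 96.93, 97.31],
    "Chest CT": [98.33, 98.21, 98.15, 98.19, 98.17, 98.29],
    "Foot MRI": [97.25, 97.15, 97.17, 97.12, 97.11, 97.26],
}

# Spread in percentage points between the best and worst sequence.
spread = {name: max(v) - min(v) for name, v in tpvf_by_sequence.items()}

for seq in sequences:
    print(seq)
for name, value in spread.items():
    print(f"{name}: spread of mean TPVF = {value:.2f} percentage points")
```

In each data set the spread of the mean TPVF is smaller than the corresponding standard deviations reported in Table 6, which is in line with the observation that the sequence has little effect on delineation accuracy.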

CONCLUDING REMARKS

This paper presented a generalization of the single-object oriented ASM methodology to multiple objects. The rationale for this generalization was primarily to improve the accuracy and robustness of object recognition and, secondarily and consequently, to improve delineation accuracy. The robustness of recognition also implies automated recognition. The paper introduced several new concepts: (1) using an assembly of objects even when there is only one object of interest, (2) separating recognition from delineation although they are interdependent, (3) the notion of robust region, (4) assessing recognition accuracy via RR independently of delineation, and (5) the dependence of recognition and delineation accuracy on the number and spatial distribution of objects included in the MOASM. These concepts were illustrated and validated via routine clinical chest CT, abdominal CT, and foot MRI data sets. The evaluation results indicate that (a) an overall delineation accuracy of TPVF >97% and FPVF <0.2% can be achieved, suggesting the feasibility of AAR; (b) increasing the number of objects can considerably improve both recognition and delineation accuracy in clinical images; and (c) a more spread out arrangement of objects can lead to improved recognition and delineation accuracy. The larger the robust region is, the more robust the recognition method is. The robust region can be used to assess recognition accuracy and to compare among methods. It also helps to quantify the synergy in the method between recognition and delineation. Experimental verification indicates that the MOASM method almost always finds the global optimum and performs better than OASM.

Some ideas underlying MOASM can be further improved. First, the idea of creating an ordered sequence of the objects included in MOASM is somewhat artificial since there is no natural or meaningful sequential order for the objects. Our motivation for the idea was to achieve object level optimization within the existing computational framework of DP. Although, as illustrated in Table 6, there seems to be little variation in delineation accuracy when this order is changed, a more appropriate formulation would have been to define a complete graph of the objects for each 1≤ℓ≤L, to find an optimum spanning tree in each, and finally to choose the best among the L optimal spanning trees. Second, in MOASM, we have made the cost function BC object specific, which helps considerably in making recognition powerful via the integrated cost function IC. The latter can possibly be further improved by devising different cost functions for the section of the boundary between each pair of successive landmarks in each object. Since objects generally have interfaces with multiple neighboring objects, this strategy may match the cost function with the real variability in the characteristics commonly observed along object boundaries.

In this paper, the focus was on 2D images. Although we took 2D subsets of 3D images for illustration and evaluation, there are many 2D imaging modes and 2D type applications even in medical imaging where MOASM will be applicable. Generalizing the techniques to 3D images is certainly not trivial, mainly from the viewpoint of creating appropriate 3D models for OASM. Once the models are built and a 3D delineation algorithm is devised, the above concepts are directly applicable to 3D images. One way is to segment the organ slice by slice, the feasibility of which has been demonstrated for 3D organ initialization (Ref. 7). Alternatively, as demonstrated in Ref. 31 for OASM, multiple 2D object assembly models covering the body region can be employed to recognize and delineate 3D objects in 3D images.

ACKNOWLEDGMENTS

The authors’ work is supported by NIH Grant No. EB004395.

References

  1. Zhou Y. and Bai J., “Atlas-based fuzzy connectedness segmentation and intensity nonuniformity correction applied to brain,” IEEE Trans. Biomed. Eng. 54, 122–129 (2007). doi: 10.1109/TBME.2006.884645
  2. Hansegard J., Urheim S., Lunde K., and Rabben S. I., “Constrained active appearance models for segmentation of triplane echocardiograms,” IEEE Trans. Med. Imaging 26(10), 1391–1400 (2007). doi: 10.1109/TMI.2007.900692
  3. Horsfield M. A., Bakshi R., Rovaris M., Rocca M. A., Dandamudi S. V. S. R., Valsasina P., Eudica E., Lucchini F., Guttmann C. R. G., Sormani M. P., and Filippi M., “Incorporating domain knowledge into the fuzzy connectedness framework: Application to brain lesion volume estimation in multiple sclerosis,” IEEE Trans. Med. Imaging 26, 1670–1680 (2007). doi: 10.1109/TMI.2007.901431
  4. Yao J. and Chen D., “Live level set: A hybrid method of livewire and level set for medical image segmentation,” Med. Phys. 35, 4112–4120 (2008). doi: 10.1118/1.2968876
  5. Rousson M. and Paragios N., “Prior knowledge, level set representation and visual grouping,” Int. J. Comput. Vis. 76, 231–243 (2008). doi: 10.1007/s11263-007-0054-z
  6. Liu J. and Udupa J. K., “Oriented active shape models,” IEEE Trans. Med. Imaging 28, 571–584 (2009). doi: 10.1109/TMI.2008.2007820
  7. Chen X., Yao J., Zhuge Y., and Bagci U., in 3D Automatic Image Segmentation Based on Graph Cut-Oriented Active Appearance Models, Proceedings of 2010 IEEE 17th International Conference on Image Processing, Hong Kong, September 26–29, 2010, pp. 3653–3656.
  8. Besbes A., Komodakis N., Langs G., and Paragios N., in Shape Priors and Discrete MRFs for Knowledge-Based Segmentation, Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, June 20–25, 2009, pp. 1295–1302.
  9. Freedman D. and Zhang T., in Interactive Graph Cut Based Segmentation with Shape Priors, Proceedings of 2005 IEEE Conference on Computer Vision and Pattern Recognition, San Diego, June 20–25, 2005, pp. 755–762.
  10. Kohli P., Rihan J., Bray M., and Torr P. H. S., “Simultaneous segmentation and pose estimation of humans using dynamic graph cuts,” Int. J. Comput. Vis. 79, 285–298 (2008). doi: 10.1007/s11263-007-0120-6
  11. Frangi A. F., Rueckert D., Schnabel J. A., and Niessen W. J., “Automatic construction of multiple-object three-dimensional statistical shape models: Application to cardiac modeling,” IEEE Trans. Med. Imaging 21, 1151–1166 (2002). doi: 10.1109/TMI.2002.804426
  12. Park H., Bland P., and Meyer C., “Construction of an abdominal probabilistic atlas and its application in segmentation,” IEEE Trans. Med. Imaging 22, 483–492 (2003). doi: 10.1109/TMI.2003.809139
  13. Cuadra M. B., Pollo C., Bardera A., Cuisenaire O., Villemure J.-G., and Thiran J.-P., “Atlas-based segmentation of pathological MR brain images using a model of lesion growth,” IEEE Trans. Med. Imaging 23, 1301–1314 (2004). doi: 10.1109/TMI.2004.834618
  14. Bazin P. L. and Pham D. L., “Homeomorphic brain image segmentation with topological and statistical atlases,” Med. Image Anal. 12, 616–625 (2008). doi: 10.1016/j.media.2008.06.008
  15. Noble J. H. and Dawant B. M., “Automatic segmentation of the optic nerves and chiasm in CT and MR using the atlas-navigated optimal medial axis and deformable-model algorithm,” Proc. SPIE 7259, 725916 (2009). doi: 10.1117/12.810941
  16. Ehm M., Klinder T., Kneser R., and Lorenz C., “Automated vertebra identification in CT images,” Proc. SPIE 7259, 72590B–72590B (2009). doi: 10.1117/12.811836
  17. Mitchell S. C., Lelieveldt B. P. F., van der Geest R. J., Bosch H. G., Reiber J. H. C., and Sonka M., “Multistage hybrid active appearance model matching: Segmentation of left and right ventricles in cardiac MR images,” IEEE Trans. Med. Imaging 20, 415–423 (2001). doi: 10.1109/42.925294
  18. Christensen G., Rabbitt R., and Miller M., “3-D brain mapping using a deformable neuroanatomy,” Phys. Med. Biol. 39, 609–618 (1994). doi: 10.1088/0031-9155/39/3/022
  19. Kaneko T., Gu L., and Fujimoto H., “Recognition of abdominal organs using 3D mathematical morphology,” Syst. Comput. Japan 33, 75–83 (2002). doi: 10.1002/scj.1148
  20. Lee C.-C., Chung P.-C., and Tsai H.-M., “Identifying multiple abdominal organs from CT image series using a multimodule contextual neural network and spatial fuzzy rules,” IEEE Trans. Inf. Technol. Biomed. 7, 208–217 (2003). doi: 10.1109/TITB.2003.813795
  21. Haas B., Coradi T., Scholz M., Kunz P., Huber M., Oppitz U., Andre L., Lengkeek V., Huyskens D., van Esch A., and Reddick R., “Automatic segmentation of thoracic and pelvic CT images for radiotherapy planning using implicit anatomic knowledge and organ-specific segmentation strategies,” Phys. Med. Biol. 53, 1751–1771 (2008). doi: 10.1088/0031-9155/53/6/017
  22. Campadelli P., Casiraghi E., Pratissoli S., and Lombardi G., “Automatic abdominal organ segmentation from CT images,” Electronic Letters on Computer Vision and Image Analysis 8(1), 1–14 (2009).
  23. Seifert S., Barbu A., Zhou S. K., Liu D., Feulner J., Huber M., Suehling M., Cavallaro A., and Comaniciu D., “Hierarchical parsing and semantic navigation of full body CT data,” Proc. SPIE 7259, 7250902 (2009).
  24. Zhou X., Hayashi T., Han M., Chen H., Hara T., Fujita H., Yokoyama R., Kanematsu M., and Hoshi H., “Automated segmentation and recognition of the bone structure in non-contrast torso CT images using implicit anatomical knowledge,” Proc. SPIE 7259, 72593S–72583 (2009). doi: 10.1117/12.812945
  25. Ling H., Zhou S. K., Zheng Y., Georgescu B., Suehling M., and Comaniciu D., in Hierarchical, Learning-Based Automatic Liver Segmentation, Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, June 23–28, 2008, pp. 1–8.
  26. Liu F., Zhao B., Kijewski P. K., Wang L., and Schwartz L. H., “Liver segmentation for CT images using GVF snake,” Med. Phys. 32, 3699–3706 (2005). doi: 10.1118/1.2132573
  27. Falcao A. X., Udupa J. K., and Miyazawa F. K., “An ultra-fast user-steered image segmentation paradigm: Live wire on the fly,” IEEE Trans. Med. Imaging 19, 55–62 (2000). doi: 10.1109/42.832960
  28. Chen X. J., Udupa J. K., Zheng X. F., Alavi A., and Torigian D., “Automatic anatomy recognition via multi object oriented active shape models,” Proc. SPIE 7259, 72594P–72594P (2009). doi: 10.1117/12.812181
  29. Cootes T. F., Taylor C. J., Cooper D. H., and Graham J., “Active shape models – Their training and application,” Comput. Vis. Image Underst. 61, 38–59 (1995). doi: 10.1006/cviu.1995.1004
  30. Udupa J. K., Leblanc V. R., Zhuge Y., Imielinska C., Schmidt H., Currie L. M., Hirsch B. E., and Woodburn J., “A framework for evaluating image segmentation algorithms,” Comput. Med. Imaging Graph. 30, 75–87 (2006). doi: 10.1016/j.compmedimag.2005.12.001
  31. Liu J., “Synergistic hybrid image segmentation: Combining image and model-based strategies,” Ph.D. thesis, Department of Bioengineering, University of Pennsylvania, 2006.

Articles from Medical Physics are provided here courtesy of American Association of Physicists in Medicine
