Author manuscript; available in PMC: 2021 Dec 8.
Published in final edited form as: Proc SPIE Int Soc Opt Eng. 2021 Feb 15;11596:1159633. doi: 10.1117/12.2581234

Anatomy Recognition in CT Images of Head & Neck Region via Precision Atlases

Jieyu Li a,b, Jayaram K Udupa b,*, Yubing Tong b, Dewey Odhner b, Drew A Torigian b
PMCID: PMC8653545  NIHMSID: NIHMS1758540  PMID: 34887608

Abstract

Multi-atlas segmentation methods benefit from atlases that cover the complete spectrum of population patterns, but the difficulty of generating sufficiently large datasets and the computational burden of the segmentation procedure reduce their practicality in clinical application. In this work, we start from the viewpoint that different parts of the target object can be recognized by different atlases and propose a precision atlas selection strategy. By comparing regional similarity between the target image and the atlases, precision atlases are ranked and selected by their frequency of regional best match. These atlases need not be globally similar to the target subject at either the image level or the object level, which greatly increases the implicit patterns contained in the atlas set. In the proposed anatomy recognition method, atlas building is first achieved by all-to-template registration, where the minimum spanning tree (MST) strategy is used to select a registration template from a subset of radiologically near-normal images. Then, a two-stage recognition process is conducted: in rough recognition, sub-image level similarity is calculated between the test image and each image of the whole atlas set, and only the atlas with the highest similarity contributes to the recognition map regionally; in refined recognition, the atlases with the highest frequencies of best match are selected as the precision atlases and are used to further increase the accuracy of boundary matching. The proposed method is demonstrated on 298 computed tomography (CT) images and 9 organs in the Head & Neck (H&N) body region. Experimental results illustrate that our method is effective for organs with different segmentation challenges and samples with different image quality, where refined recognition markedly improves boundary interpretation and most objects achieve a localization error within 2 voxels.

Keywords: multi-atlas segmentation, anatomy recognition, precision atlas

1. INTRODUCTION

Typical prior-knowledge-based anatomy segmentation methods include those based on shape and geographic models [1], atlases [2], and deep neural network models [3]. Compared to model-based methods, which first determine models and then match target objects, multi-atlas segmentation (MAS) methods take great advantage of directly using raw intensity images from atlases for decision making on specific patient images, which is the basis of precision medicine.

An implicit assumption in the MAS method is that the atlas set should have complete annotation of target objects and cover individual differences of the whole population under study. However, this assumption is usually unrealistic and not satisfied in clinical practice, which will result in sub-optimal segmentation. The basic question of the minimum number of atlas images needed to be able to cover the subject-specific patterns of variation is only now beginning to be addressed [4].

Atlas selection is one of the basic components of MAS methods, which aims to fully utilize the limited atlas set, improve the computational efficiency, and avoid degrading segmentation accuracy by irrelevant atlases. Although numerous strategies have been proposed to improve performance of atlas-based segmentation and the importance of atlas selection has been emphasized from different aspects [5], the criteria for atlas ranking are all purely based on different kinds of similarity measures proposed in the literature, such as those based on image intensity, features, degree of overlap [6], and/or meta-information such as patient age and gender [7]. In this work, we start from a novel viewpoint that different parts of a target object in the novel image can be recognized by different atlases, and that the frequencies of regional best-match, instead of similarity itself, will be an effective strategy in selecting precision atlases.

2. METHODS

The proposed precision-atlas-based recognition method is outlined in Figure 1 and consists of atlas building and two-stage object recognition.

Figure 1.

A schematic representation of precision-atlas-based recognition method.

2.1. Atlas building

The atlas set is built by aligning all atlas images (Ia) into the same image space, and the corresponding binary masks of the target objects (O_l, l = 1, …, L) are geometrically transformed in the same manner. All-to-template registration is applied to keep the computation feasible in clinical practice, and a seven-parameter rigid alignment strategy is used in registration to unify the global scale, pose, and position of images while preserving inter-subject regional variations for precision atlas selection. The template image is chosen from a candidate subset of atlases, denoted by IRa, which are radiologically near-normal with the least amount of artifacts and pathological abnormalities.
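As an illustration of what a seven-parameter alignment can encode, the following minimal sketch builds a homogeneous transform from one isotropic scale, three rotation angles, and three translations. The function name `similarity_matrix`, the rotation order (Z·Y·X), and the example values are assumptions made for illustration; the source does not specify the parameterization or the registration optimizer.

```python
import numpy as np

def similarity_matrix(scale, angles, translation):
    """4x4 homogeneous matrix: isotropic scale, rotation (Rz @ Ry @ Rx), translation.

    The rotation order is an assumption; angles are in radians.
    """
    ax, ay, az = angles
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    m = np.eye(4)
    m[:3, :3] = scale * (rz @ ry @ rx)   # 1 scale + 3 rotation parameters
    m[:3, 3] = translation               # 3 translation parameters
    return m

# Hypothetical parameters: scale 1.1, 90-degree rotation about z, shift (5, 0, -2).
m = similarity_matrix(1.1, (0.0, 0.0, np.pi / 2), (5.0, 0.0, -2.0))
p = m @ np.array([1.0, 0.0, 0.0, 1.0])   # transform the point (1, 0, 0)
```

Because the scale is a single shared factor, such a transform changes global size, pose, and position but leaves regional shape differences between subjects untouched.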

The minimum spanning tree (MST) algorithm [8] is utilized to determine the template image. A complete weighted directed graph is first established where the nodes are the candidate images in IRa and the arc weights/costs are assigned by the dissimilarity between the node images. Mean Absolute Difference (MAD) is used as a metric to measure the dissimilarity between two candidate images, where only the image content inside the union foreground region is considered in MAD calculation to exclude the influence of background information such as the scanner table. After the graph is set up, an MST of the graph with least total cost is found.
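The graph construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names `mad` and `prim_mst` are hypothetical, the foreground masks are assumed to be given as boolean arrays, and because MAD is symmetric the graph is treated as undirected and solved with Prim's algorithm. How the root of the tree is chosen is not stated in the text, so the sketch stops at the tree itself.

```python
import numpy as np

def mad(img_a, img_b, fg_a, fg_b):
    """Mean absolute difference restricted to the union of two foreground masks."""
    union = fg_a | fg_b
    diff = img_a[union].astype(float) - img_b[union].astype(float)
    return float(np.abs(diff).mean())

def prim_mst(cost):
    """Edges (parent, child) of a minimum spanning tree of a symmetric cost matrix."""
    n = cost.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = cost[0].astype(float)         # cheapest known link from the tree to each node
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        closer = (cost[j] < best) & ~in_tree
        best[closer] = cost[j][closer]
        parent[closer] = j
    return edges

# Toy data standing in for the candidate images (random, for illustration only).
rng = np.random.default_rng(0)
imgs = [rng.integers(0, 100, size=(4, 8, 8)) for _ in range(4)]
fgs = [img > 20 for img in imgs]         # assumed foreground masks
n = len(imgs)
cost = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        cost[i, j] = cost[j, i] = mad(imgs[i], imgs[j], fgs[i], fgs[j])
edges = prim_mst(cost)
```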

The root image IRoota of the MST is used as the template image, to which all images are registered so that they share the same image space. The registered atlas set is denoted by I, and the transformed binary mask set is denoted by J for each object. Since the root image IRoota does not change during the registration process, we denote it simply by IRoot to keep the notation uniform.

2.2. Object Recognition

The recognition process starts from an initial registration step in which the target image Ia is registered to the template root image IRoot and transformed into I. Objects of interest in I are recognized one at a time. There are two levels of recognition, rough and refined. In rough recognition, the object O is localized (recognized) by examining all atlas images in I. Then, an atlas subset P which can be best associated with O in I is identified from I. In refined recognition, the locality of O is sharpened by examining only the atlas images in P.

As mentioned previously, different parts of the segmentation can come from different atlases. Only the atlas with best regional similarity contributes to the recognition map of the target object O in I and is determined as the precision atlas in this local region. The similarity is locally measured in the sliding ω×ω 2D windows. The frequency of local best match inside the region of the target object and its surrounding tissue is taken as the measure to determine the precision atlases P for O.

The fuzzy recognition map FM of O is generated by adding up the contributions from foreground and background parts from best-match atlases on each voxel v of I in local regions. For the ease of this process, we slightly modify the representation of binary images in J by changing background voxel values 0 to −1 but will still maintain the binary representation, and the updating process is illustrated in Figure 2.

Figure 2.

A 2D example of updating FMl. Regional similarity is calculated inside a 5 × 5 sub-image centered on the highlighted pixel. FMl (left) is updated into FMl (right) by adding up the binary mask of the regional best-match atlas (middle).
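The update illustrated in Figure 2 can be sketched in a few lines. This is a minimal 2D illustration under stated assumptions: the function name `update_fuzzy_map` is hypothetical, the window is assumed to lie entirely inside the image, and the atlas mask is given in the usual 0/1 form and converted to the signed −1/+1 form on the fly.

```python
import numpy as np

def update_fuzzy_map(fm, atlas_mask, center, omega):
    """Add the signed (+1 foreground, -1 background) atlas patch to FM in place."""
    h = omega // 2
    r, c = center
    rows = slice(r - h, r + h + 1)
    cols = slice(c - h, c + h + 1)
    signed = np.where(atlas_mask[rows, cols] > 0, 1, -1)
    fm[rows, cols] += signed
    return fm

fm = np.zeros((7, 7), dtype=int)
mask = np.zeros((7, 7), dtype=int)
mask[2:5, 2:5] = 1                       # toy object of the best-match atlas
update_fuzzy_map(fm, mask, center=(3, 3), omega=5)
```

After this single update, voxels of the 5 × 5 window that the atlas labels as object hold +1, the remaining window voxels hold −1, and voxels outside the window are unchanged.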

In rough recognition, an initial region of interest (ROI), denoted Rin, is first generated by dilating the union of the foreground regions of the images in J, so that O should be contained inside Rin after the alignment of image spaces. Then, the best-match atlas $I^*$ is determined for each voxel v inside Rin such that $I^* \in \arg\min_{K} \{\psi(V_{\omega,I}(v), V_{\omega,K}(v))\}$, where K ranges over the images of the registered atlas set, ψ stands for the dissimilarity function (sum of squared differences, SSD), and $V_{\omega,I}(v)$ stands for the sub-image of I with size ω×ω centered on the voxel v. Let $J^*$ be the binary image representing O in $I^*$. If the best match is at or above a certain confidence level, that is, if $\psi(V_{\omega,I}(v), V_{\omega,I^*}(v)) \le \theta$, then $V_{\omega,J^*}(v)$ is added to FM as in Figure 2. The threshold θ is used to avoid cases where no atlas is locally similar to the target sub-image. Meanwhile, the atlas index associated with $I^*$ is recorded in a map AM(v) at the position of v.
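The rough recognition loop can be sketched on a single 2D slice as follows. This is a simplified illustration rather than the authors' code: the names `ssd` and `rough_recognition` are hypothetical, windows that would extend past the image border are simply skipped, and the value −1 in AM is an assumed marker for voxels where no atlas passed the threshold θ.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized sub-images."""
    d = a.astype(float) - b.astype(float)
    return float((d * d).sum())

def rough_recognition(target, atlases, masks, roi, omega, theta):
    """Return (FM, AM) for one 2D slice; roi is a boolean array (R_in)."""
    h = omega // 2
    fm = np.zeros(target.shape, dtype=int)
    am = np.full(target.shape, -1, dtype=int)    # -1: no atlas accepted here
    for r, c in zip(*np.nonzero(roi)):
        if r < h or c < h or r + h >= target.shape[0] or c + h >= target.shape[1]:
            continue                             # simplification: skip border windows
        win = (slice(r - h, r + h + 1), slice(c - h, c + h + 1))
        costs = [ssd(target[win], a[win]) for a in atlases]
        k = int(np.argmin(costs))                # regional best-match atlas
        if costs[k] <= theta:                    # accept only confident matches
            fm[win] += np.where(masks[k][win] > 0, 1, -1)
            am[r, c] = k
    return fm, am

# Toy example: the target is identical to atlas 1, so atlas 1 wins everywhere.
atlas0 = np.zeros((6, 6), dtype=int)
atlas1 = np.arange(36).reshape(6, 6)
mask0 = np.zeros((6, 6), dtype=int)
mask1 = np.zeros((6, 6), dtype=int)
mask1[2:4, 2:4] = 1
roi = np.zeros((6, 6), dtype=bool)
roi[2:4, 2:4] = True
fm, am = rough_recognition(atlas1.copy(), [atlas0, atlas1], [mask0, mask1], roi, omega=3, theta=0)
```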

Two maps are generated from rough recognition: the fuzzy recognition membership map FM and the map AM recording the regional best-match atlases. From FM, a refined ROI (Rre) can be determined for the target object and image by first binarizing FM and then dilating the result appropriately. The value range of FM is [−ω×ω, +ω×ω], indicating the cumulative membership votes on v from all best-match atlas sub-images passing through v. The binarization threshold is set to 1, meaning that, among the best-match atlas sub-images passing through the voxel, at least one more labels it as foreground than as background. The frequency of best match for each atlas is counted from AM inside Rre, and the top δ% of atlases with the highest frequencies compose the precision atlas set P. While the counters of all atlases in I can have non-zero values, only atlases in P are examined in the next stage of refined recognition.
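The selection of precision atlases from FM and AM can be sketched as follows. This is a minimal 2D illustration with several assumptions: the names `dilate` and `select_precision_atlases` are hypothetical, the dilation uses a 4-neighborhood and a single iteration, AM uses −1 for voxels without an accepted atlas, and the top δ% is rounded up to at least one atlas.

```python
import numpy as np

def dilate(mask):
    """One step of binary dilation with a 4-neighborhood (assumed structuring element)."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out

def select_precision_atlases(fm, am, n_atlases, delta=0.2, threshold=1, dilate_iter=1):
    """Return (indices of precision atlases, per-atlas counts, refined ROI)."""
    rre = fm >= threshold                        # binarize the fuzzy map
    for _ in range(dilate_iter):
        rre = dilate(rre)
    votes = am[rre & (am >= 0)]                  # best-match indices inside R_re
    counts = np.bincount(votes, minlength=n_atlases)
    n_keep = max(1, int(np.ceil(delta * n_atlases)))
    order = np.argsort(-counts, kind="stable")   # most frequent first
    return order[:n_keep].tolist(), counts, rre

fm = np.zeros((5, 5), dtype=int)
fm[2, 2] = 3
am = np.full((5, 5), -1, dtype=int)
am[2, 2] = 2
am[2, 3] = 2
am[1, 2] = 0
am[0, 0] = 1                                     # lies outside R_re, so it is not counted
precision, counts, rre = select_precision_atlases(fm, am, n_atlases=5, delta=0.2)
```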

The implementation details of the refined recognition are similar to those of the rough recognition, except that a non-local searching strategy is used to alleviate regional individual differences and mis-registrations, which are difficult to account for in image-level registration. Each voxel v inside Rre is checked to determine the image $I^* \in P$ and the non-local best-match position $v^* \in R_f(v)$ such that $(I^*, v^*) \in \arg\min_{K \in P,\ v' \in R_f(v)} \{\psi(V_{\omega,I}(v), V_{\omega,K}(v'))\}$. The target sub-image $V_{\omega,I}(v)$, like a floating window, searches for the best-match atlas inside the 3D searching range $R_f(v)$, which is defined as an isotropic (in millimeters) range with (2r × fr + 1) × (2r × fr + 1) × (2 × fr + 1) voxels centered on the target position v. Here, fr refers to the radius of the maximum extended searching range in voxels in the z direction, and r represents the ratio between slice spacing and in-plane pixel size. Given a typical situation where the voxel size in a CT image is 1 mm × 1 mm × 2 mm and the maximum extended searching range fr is set to 2, the search for the best match is restricted to a range of 9 × 9 × 5 voxels.
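The size of the searching range follows directly from the formula above. The short sketch below reproduces the worked example; the function name `search_range_shape` is hypothetical, and rounding the in-plane extent to an integer when r is not a whole number is an assumption not covered by the text.

```python
def search_range_shape(pixel_size_mm, slice_spacing_mm, fr):
    """Voxel extent (x, y, z) of the isotropic-in-mm search range around a voxel."""
    r = slice_spacing_mm / pixel_size_mm       # ratio of slice spacing to pixel size
    side_xy = int(round(2 * r * fr)) + 1       # rounding is an assumption for non-integer r
    side_z = 2 * fr + 1
    return side_xy, side_xy, side_z

shape = search_range_shape(1.0, 2.0, fr=2)     # 1 mm x 1 mm x 2 mm voxels
```

With 1 mm pixels, 2 mm slices, and fr = 2, this gives r = 2 and a range of 9 × 9 × 5 voxels, which spans 8 mm on each axis.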

The proposed organ recognition process contains four parameters, including experimental parameters which are the threshold θ for the (dis)similarity function ψ and the sub-image size ω, and empirical parameters which are the ratio δ% for selecting precision atlases and the non-local floating window searching range fr. The experimental parameters are object-dependent and decided by experiments on the near-normal atlas subset IRa. A leave-one-out strategy is used in rough recognition with different combinations of θ and ω, and the combination yielding the best average Dice coefficient on binarized FM is utilized in the actual recognition procedure. The empirical parameters are determined based on the representability of the atlas set. If the test image can be well represented by very few atlases, then δ% should be determined as a small ratio. Conversely, the ratio should be large if the atlas set does not contain that many patterns. If only a limited atlas set is available, the searching range fr should be large to provide more chances for sub-image matching, while a large fr may also lead to the problem of mismatching with surrounding confounding objects.
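The selection of the experimental parameters can be sketched as a simple search over candidate (ω, θ) pairs scored by the Dice coefficient. In this minimal illustration the names `dice` and `pick_parameters` are hypothetical, and the per-atlas Dice scores are made-up numbers standing in for the outcome of the leave-one-out runs.

```python
import numpy as np

def dice(a, b):
    """Dice coefficient between two binary masks (1.0 if both are empty)."""
    a = np.asarray(a).astype(bool)
    b = np.asarray(b).astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else float(2.0 * (a & b).sum() / denom)

def pick_parameters(scores):
    """scores maps (omega, theta) to a list of leave-one-out Dice values."""
    return max(scores, key=lambda key: float(np.mean(scores[key])))

# Made-up leave-one-out scores for three candidate combinations.
scores = {
    (5, 400): [0.60, 0.70],
    (7, 400): [0.80, 0.70],
    (9, 800): [0.50, 0.90],
}
best = pick_parameters(scores)
d = dice([1, 1, 0, 0], [1, 0, 0, 0])
```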

3. EXPERIMENTS, RESULTS, AND DISCUSSION

3.1. Experiments

This study was conducted following approval from the Institutional Review Board at the Hospital of the University of Pennsylvania (HUP) along with a Health Insurance Portability and Accountability Act waiver. Experiments are conducted on computed tomography (CT) images of 298 patients with cancer in the Head & Neck (H&N) body region. Nine objects are considered in this work: cervical esophagus (CtEs), cervical spinal cord (CtSC), mandible (Mnd), orohypopharynx constrictor muscle (OHPh), supraglottic/glottic larynx (SpGLx), right parotid gland (RPG), left parotid gland (LPG), right submandibular gland (RSmG), and left submandibular gland (LSmG). All object samples are divided into groups of good quality (GQ) and poor quality (PQ) according to whether the object and its surrounding tissue are affected by pathology or whether the imaging quality is affected by artifacts [9]. Among all subjects, 36 show overall good quality on all considered objects and hence compose the radiologically near-normal atlas set IRa. The voxel size of the datasets varies from 0.93 × 0.93 × 1.5 mm³ to 1.6 × 1.6 × 3 mm³. The root images determined by the MST algorithm have a resolution of 1 × 1 × 3 mm³, and the sizes of all images are unified to 512 × 512 × 92 voxels after registration.

The method is N-fold cross validated on the whole dataset excluding the near-normal set IRa, from which experimental parameters are determined as shown in Table 1. Empirical parameters are set to δ%=20% and fr=2. Since the clinical contouring is done for different objects depending on the location of the tumor in clinical practice, ground truth masks for all objects are not necessarily available for each patient study. As such the dataset is separated differently for each object, typically in 2-4 folds, and cross-validation experiments are conducted with 100-200 atlases in each fold.

Table 1.

Experimentally determined parameters for sub-image size ω and similarity threshold θ for each object.

| Object | CtEs | CtSC | Mnd | OHPh | LPG, RPG | LSmG, RSmG | SpGLx |
|---|---|---|---|---|---|---|---|
| ω / θ | 9 / 400 | 17 / 200 | 5 / 1200 | 13 / 400 | 11 / 800 | 7 / 1200 | 7 / 400 |

The performance of rough and refined recognition in the proposed method is quantitatively evaluated with the localization error (LE) and the scale error (SE), which respectively measure the distance between geometric centers and the ratio of object sizes for the binarized recognition maps and the ground truth masks. Although binarized maps are not intended to be precise delineations of O, they can be used as rough indicators of the whereabouts of O in I.
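The two measures can be sketched as follows. This is an illustration under explicit assumptions: the function names are hypothetical, the geometric center is taken as the mean voxel coordinate scaled by the voxel spacing, and "size" is taken to be the voxel count, since the text does not define the size measure used for SE.

```python
import numpy as np

def localization_error(pred, truth, spacing):
    """Distance in mm between the geometric centers of two binary masks."""
    spacing = np.asarray(spacing, dtype=float)
    center_pred = np.array(np.nonzero(pred)).mean(axis=1) * spacing
    center_truth = np.array(np.nonzero(truth)).mean(axis=1) * spacing
    return float(np.linalg.norm(center_pred - center_truth))

def scale_error(pred, truth):
    """Ratio of object sizes, with size assumed to be the voxel count."""
    return float(np.count_nonzero(pred) / np.count_nonzero(truth))

truth = np.zeros((10, 10, 10), dtype=bool)
truth[2:4, 2:4, 2:4] = True
pred = np.zeros((10, 10, 10), dtype=bool)
pred[2:4, 2:4, 3:5] = True                     # same shape, shifted by one voxel on the last axis
le = localization_error(pred, truth, spacing=(1.0, 1.0, 3.0))
se = scale_error(pred, truth)
```

In this toy case the recognized object has the same size as the true one but is displaced by one 3 mm slice, so SE is 1 and LE equals the slice spacing.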

3.2. Results and discussions

Quantitative results for all considered objects are listed in Table 2. The improvement is evident when comparing the results of rough and refined recognition, where refined recognition gains accuracy by considering only precision atlases and using the non-local searching strategy. Localization accuracy is measured by LE, whose values are typically reduced by 1-2 mm and reach about 3 mm (the size of one voxel) or less after refinement. SE values for almost all objects move towards 1, showing that the recognized results are similar in size to the true objects. Standard deviations of both measures are reduced for almost all objects, showing that the stability of refined recognition is improved with better object matching. Most of the improvements are statistically significant with p-value < 0.05.

Table 2.

Mean ± standard deviation of quantitative results, compared with a model-based method (AAR-RT [10]).

| Object | Object-level image quality | Number of test samples | Rough recognition LE (mm) | Rough recognition SE | Refined recognition LE (mm) | Refined recognition SE | AAR-RT [10] LE (mm) |
|---|---|---|---|---|---|---|---|
| CtEs | GQ | 171 | 4.971 ± 3.259 | 0.969 ± 0.103 | 3.205 ± 2.368 | 0.971 ± 0.073 | 3.7 |
| CtEs | PQ | 75 | 5.549 ± 3.305 | 0.992 ± 0.118 | 3.244 ± 2.618 | 0.995 ± 0.092 | 15.5 |
| CtSC | GQ | 169 | 5.03 ± 4.271 | 0.983 ± 0.047 | 4.205 ± 3.287 | 0.994 ± 0.028 | 3.72 |
| CtSC | PQ | 83 | 6.157 ± 5.348 | 0.97 ± 0.066 | 3.12 ± 2.713 | 0.993 ± 0.026 | 12.47 |
| Mnd | GQ | 172 | 2.098 ± 1.76 | 0.997 ± 0.027 | 1.149 ± 0.918 | 1 ± 0.014 | 3.58 |
| Mnd | PQ | 82 | 2.715 ± 2.67 | 0.998 ± 0.041 | 1.47 ± 1.259 | 0.999 ± 0.02 | 16.08 |
| OHPh | GQ | 51 | 6.046 ± 4.664 | 0.968 ± 0.077 | 5.271 ± 4.69 | 0.939 ± 0.076 | 3.56 |
| OHPh | PQ | 178 | 4.845 ± 3.641 | 0.967 ± 0.091 | 4.248 ± 2.875 | 0.949 ± 0.072 | 15.4 |
| LPG | GQ | 79 | 3.882 ± 2.667 | 1.018 ± 0.089 | 2.626 ± 1.577 | 0.99 ± 0.081 | 4.33 |
| LPG | PQ | 96 | 3.879 ± 2.659 | 1.013 ± 0.102 | 2.534 ± 1.558 | 0.993 ± 0.086 | 12.63 |
| RPG | GQ | 79 | 3.755 ± 2.277 | 1.009 ± 0.093 | 2.647 ± 1.574 | 0.973 ± 0.082 | 4.26 |
| RPG | PQ | 95 | 3.895 ± 2.515 | 1.013 ± 0.106 | 2.804 ± 1.993 | 0.992 ± 0.097 | 11.24 |
| SpGLx | GQ | 41 | 6.362 ± 4.899 | 1.122 ± 0.108 | 3.802 ± 3.013 | 1.024 ± 0.086 | 3.97 |
| SpGLx | PQ | 55 | 6.018 ± 4.006 | 1.132 ± 0.15 | 4.31 ± 3.043 | 1.017 ± 0.107 | 15.99 |
| LSmG | GQ | 98 | 4.914 ± 3.869 | 1.125 ± 0.208 | 3.339 ± 3.659 | 1.062 ± 0.231 | 3.39 |
| LSmG | PQ | 31 | 6.055 ± 3.902 | 1.127 ± 0.158 | 4.881 ± 3.686 | 1.084 ± 0.164 | 17.33 |
| RSmG | GQ | 103 | 5.001 ± 3.993 | 1.119 ± 0.202 | 3.169 ± 2.453 | 1.038 ± 0.161 | 3.09 |
| RSmG | PQ | 30 | 6.142 ± 4.559 | 1.206 ± 0.236 | 4.765 ± 3.903 | 1.13 ± 0.269 | 16.3 |

Furthermore, recognition accuracies are similar for objects with GQ and PQ, showing that this method is less affected by image quality as long as there exist atlases with similar imperfections. The number of such samples does not matter, since only atlases with regional best match are considered in generating the recognition map. This method outperforms some model-based methods such as our previously proposed AAR-RT method [10], in which models are generated only from GQ samples and perform very differently on samples of different image quality. Moreover, these results are remarkable considering the challenges posed by these objects and our own recent results on object recognition via deep-learning methods that refine the initial recognition provided by model-based methods. Localizing these objects to within one voxel fully automatically in routine clinical patient images is very challenging.

Image examples are illustrated in Figure 3 for all considered objects. Ground truth masks (1st column) and fuzzy maps from the rough and refined recognition procedures (2nd and 3rd columns) are overlaid on 2D slices of gray scale images (1st row) and overlapped by ground truth contours (2nd row). The corresponding surface (for ground truth) or fuzzy volume renditions (for fuzzy recognition maps) are shown as well (3rd row). From the comparisons of the fuzzy maps, we observe an obvious tendency of sharpened boundaries from rough to refined recognition, which demonstrates the advantage of more precise atlases for the specific target object sample and the better matching introduced by the non-local floating window strategy in refined recognition.

Figure 3.

Image examples for objects in the H&N region.

4. CONCLUSIONS

In this paper, we introduce a new precision atlas selection approach for automatic anatomy recognition in medical images with pathology. The proposed method starts from a viewpoint that the recognition of different parts of the target object can be taken from different atlases with best regional (and not global) similarity, and the frequency of regional best match is taken as the measure for selecting precise atlases.

The recognition process starts from atlas building followed by two-stage recognition. The atlas set is constructed via all-to-template registration to transform all atlases and the target novel images into a unified image space, where the template image is determined by the minimum spanning tree strategy from a subset of radiologically near-normal atlases. Then, rough recognition is conducted to generate fuzzy membership maps, from which a refined ROI is determined, as well as a set of precision atlases with the highest frequency of regional best match. Subsequently, refined recognition is conducted with the refined ROI, the precision atlases, and the non-local strategy for better object matching. Experiments are conducted on the H&N region with 298 patient datasets and 9 objects.

We summarize our conclusions as follows. (i) The proposed method shows high accuracy and robustness in anatomy recognition. The tendency of gradual refinement in boundary matching from rough recognition to refined recognition can be observed from both quantitative results and image examples. (ii) Samples with different object qualities show little difference in recognition accuracy, which confirms one of the key ideas of precision atlas selection: only regionally similar atlases contribute to the recognition results, while other atlases have no influence. (iii) Although only CT images of the H&N body region are evaluated in the current experiments, the proposed method is applicable to other image modalities and other body regions as long as a set of atlases is available such that the patterns of different portions of the test sample can be represented by a part of the atlas set.

Recall that the goal of this work is object localization only and not detailed delineation. The high accuracy of the output from refined recognition can be exploited by a deep-learning network to subsequently delineate objects with very high accuracy. This is a topic of a separate paper the authors are working on.

References

  • [1] Udupa JK, Odhner D, Zhao L, Tong Y, Matsumoto MM, Ciesielski KC, Falcao AX, Vaideeswaran P, Ciesielski V, Saboury B, and Mohammadianrasanani S, "Body-wide hierarchical fuzzy modeling, recognition, and delineation of anatomy in medical images," Med. Image Anal. 18(5), 752–771 (2014).
  • [2] Iglesias JE, and Sabuncu MR, "Multi-atlas segmentation of biomedical images: a survey," Med. Image Anal. 24(1), 205–219 (2015).
  • [3] Cerrolaza JJ, Picazo ML, Humbert L, Sato Y, Rueckert D, Ballester MÁG, and Linguraru MG, "Computational anatomy for multi-organ analysis in medical imaging: A review," Med. Image Anal. 56, 44–67 (2019).
  • [4] Jin Z, Udupa JK, and Torigian DA, "How many models/atlases are needed as priors for capturing anatomic population variations?" Med. Image Anal., 101550 (2019).
  • [5] Schipaanboord B, Boukerroui D, Peressutti D, van Soest J, Lustberg T, Dekker A, van Elmpt W, and Gooding MJ, "An evaluation of atlas selection methods for atlas-based automatic segmentation in radiotherapy treatment planning," IEEE Trans. Med. Imaging 38(11), 2654–2664 (2019).
  • [6] Sanroma G, Wu G, Gao Y, and Shen D, "Learning to rank atlases for multiple-atlas segmentation," IEEE Trans. Med. Imaging 33(10), 1939–1953 (2014).
  • [7] Aljabar P, Heckemann RA, Hammers A, Hajnal JV, and Rueckert D, "Multi-atlas based segmentation of brain images: atlas selection and its effect on accuracy," Neuroimage 46(3), 726–738 (2009).
  • [8] Grevera GJ, Udupa JK, Odhner D, and Torigian DA, "Optimal atlas construction through hierarchical image registration," Medical Imaging 2016: Image-Guided Procedures, Robotic Interventions, and Modeling 9786, 97862C (2016).
  • [9] Pednekar GV, Udupa JK, McLaughlin DJ, Wu X, Tong Y, Simone CB II, …, and Torigian DA, "Image quality and segmentation," Medical Imaging 2018: Image-Guided Procedures, Robotic Interventions, and Modeling 10576, 105762N (2018).
  • [10] Wu X, Udupa JK, Tong Y, Odhner D, Pednekar GV, Simone CB II, …, and Torigian DA, "AAR-RT - A system for auto-contouring organs at risk on CT images for radiation therapy planning: Principles, design, and large-scale evaluation on head-and-neck and thoracic cancer cases," Med. Image Anal. 54, 45–62 (2019).
