Author manuscript; available in PMC: 2011 Mar 1.
Published in final edited form as: Ann Biomed Eng. 2010 Jun 23;38(12):3581–3591. doi: 10.1007/s10439-010-0103-6

An Automated Segmentation Approach for Highlighting the Histological Complexity of Human Lung Cancer

J C Sieren 1,3,5, J Weydert 2, A Bell 2, B De Young 2, A R Smith 1,5, J Thiesse 1, E Namati 1, Geoffrey McLennan 1,3,4,5
PMCID: PMC2996273  NIHMSID: NIHMS251946  PMID: 20571856

Abstract

Lung cancer nodules, particularly adenocarcinoma, contain a complex intermixing of cellular tissue types: incorporating cancer cells, fibroblastic stromal tissue, and inactive fibrosis. Quantitative proportions and distributions of the various tissue types may be insightful for understanding lung cancer growth, classification, and prognostic factors. However, current methods of histological assessment are qualitative and provide limited opportunity to systematically evaluate the relevance of lung nodule cellular heterogeneity. In this study we present both a manual and an automatic method for segmentation of tissue types in histological sections of resected human lung cancer nodules. A specialized staining approach incorporating immunohistochemistry with a modified Masson's Trichrome counterstain was employed to maximize color contrast in the tissue samples for automated segmentation. The developed, clustering-based, fully automated segmentation approach segments complete lung nodule cross-sectional histology slides in less than 1 min, compared to manual segmentation which requires multiple hours to complete. We found the accuracy of the automated approach to be comparable to that of the manual segmentation with the added advantages of improved time efficiency, removal of susceptibility to human error, and 100% repeatability.

Keywords: Adenocarcinoma, Lung nodule, Histology, Immunohistochemistry, Computer analysis, Pathology, Quantification

Introduction

The World Health Organization (WHO) has established and regularly amends classification guidelines for lung cancer.1 The principle of classifying lung cancers is to group patient populations with similar prognoses allowing the development and application of treatment approaches most likely to improve the clinical outcome. The WHO guidelines assist in the systematic categorization of lung cancers based on the origin of cancerous cells and their configuration. However, a cancer nodule is a complex biomass, containing cancerous cells as well as a variety of other cellular tissue types including but not limited to fibrosis and necrotic tissue. The arrangement of these various cellular tissue types in regard to proportions, adjacency, and intermixing is important to understand cancer growth and may significantly impact patient diagnosis, prognosis, and treatment planning.

In order to gain a comprehensive understanding of the content and architecture of the tissue type distributions throughout a nodule biomass, the cellular content of the nodule needs to be identified and segmented. Developing a method for the controlled evaluation of lung nodule architecture is advantageous to the clinical and research environment. A clear understanding of the composition and relationships between the histological tissue types comprising the lung nodule biomass is necessary to extend the knowledge of cancer growth patterns, clarify lung cancer classifications and outcomes, and to set sampling standards for biopsy-based diagnosis approaches.

Recent studies have found a valuable link between the proportion of tissue types, such as the fibrotic component, within a lung nodule and prognosis.6,12,16,20,23–25 Maeshima et al. found the fibrotic proportion of nodule content was an independent prognostic factor, with direct correlation between increased fibrotic component and decreased survival.12 Suzuki et al. reported a similar finding in patients with primary lung adenocarcinomas of less than 3 cm. They found a 100% 5-year survival rate for patients with less than 5 mm diameter central fibrosis, and less than 74% 5-year survival for patients with a larger central fibrosis region.23 Necrosis was found to be a negative prognostic factor in a study examining small-sized (<2 cm) adenocarcinoma of the lung performed by Inoue et al.6 Furthermore, tumors with a necrotic component were found to have an elevated incidence of nodal invasion.

However, all of these studies have relied on the qualitative evaluation of histopathological sections to score or grade the appearance of a particular tissue type, which provides only a limited understanding of the nodule's complete composition. A highly valuable expansion of these previously reported studies is now achievable using the automated segmentation approach for the evaluation of lung cancer nodules presented in this article. The developed segmentation approach is quantitative in nature and is not dependent on a human observer. In addition, the approach permits not only the evaluation of tissue type proportions but also the relationship between the present tissue types, across the whole nodule volume.

Characterization of the complex structural content of cancer nodules has been limited to sampling of two-dimensional (2D) gold-standard histopathology data, with the assumption that the 2D perspective may be extrapolated to represent the volume. With advances in volumetric imaging techniques such as multi-row detector computerized tomography (MDCT), magnetic resonance imaging (MRI), and positron emission tomography (PET) imaging, it is becoming increasingly evident that corresponding volumetric histopathology would be highly valued. A comprehensive volumetric histopathology dataset for lung cancer nodules would overcome the sampling limitation for structural characterization and is required to assess the effectiveness of using random histological sampling to deduce the true nodule tissue architecture.

An extensive process model for the collection, manipulation, and image acquisition of volumetric multi-modality datasets of lung nodules has been developed.21 A challenge with the designed process model was the time-consuming and subjective task of requiring expert pathologists to manually segment the tissue types of interest in the histology datasets. In this study we report the development of an automated technique for segmenting and labeling lung nodule tissue types from histology.

Previous studies investigating automated analysis of tumor histopathology have focused on the identification and assessment of nuclear morphology using standard Hematoxylin and Eosin (H&E) staining7,8,18,27 or immunohistochemical staining2 for distinguishing between benign and malignant tissue samples. The outputs from these studies are aimed at incorporation into the clinical environment to assist with the automated analysis of biopsy samples, in particular for breast cancer. The aim of this study, however, is to develop a method through which a greater understanding can be achieved not only of the cancerous portion of lung nodules, but also of the content and relationships within the complete biomass.

To assist in the automated segmentation of the cancerous and non-cancerous nodule content, we have explored and established a specialized staining approach which incorporates immunohistochemistry with a counterstain based on Masson's Trichrome. Masson's Trichrome is a useful three-color staining approach for generating high level contrast in histological data, in particular between collagen and other eosinophilic structures.26 It has been widely applied in the areas of highlighting scar tissue, such as glomerulosclerosis of the kidney11,13,22 or cirrhosis of the liver.3,10,19 Combined immunohistochemistry and Masson's Trichrome staining have been previously utilized to assist in the assessment of myocardial infarction using both sequential sections17 and by combining the immunohistochemistry with use of Masson's Trichrome as a counterstain.9 However, the application of this combined staining approach to cancer data for assisting automated segmentation has not been previously described.

Methods

Image Data

Eleven lobectomy specimens containing suspicious pulmonary nodules identified from radiographs were obtained from consented patients, in accordance with University of Iowa Institutional Review Board approval. All 11 specimens were found to contain cancerous nodules: seven adenocarcinomas, three squamous cell carcinomas, and one neuroendocrine carcinoma. The excised lobes were processed as previously described by Sieren et al.21 In brief, each lobectomy specimen was cannulated through the main airway and inflation-fixed using a modified Heitzman technique. Following fixation the nodules were isolated from the surrounding lobe via gross dissection. A custom built system, termed the large image microscope array (LIMA), was utilized to sequentially image and section throughout the entire nodule mass at 250 μm intervals.15 Each 250 μm nodule section underwent paraffin processing, was embedded and sectioned to generate a series of slides with 4 μm histology sections. Hence, the resulting histological dataset consisted of 4 μm histological sections at 250 μm intervals. Staining was performed and included one slide stained using standard H&E and additional slides stained with customized staining protocols established for the automated segmentation algorithm, as discussed below. The complete field of view of each generated histology slide, typically between 1 and 4 cm², was digitized using a ScanScope slide digitizer (Aperio Technology Inc, Vista, CA) at a magnification of 20×, resulting in an image pixel resolution of 2.54 μm.

Standard histological processing induces tissue distortion which prevents direct and accurate correlation between individual sections taken throughout the nodule. This is even more problematic in lung tissue, which has porous tissue architecture. Due to the unique imaging and sectioning process of the LIMA system coupled with the fixation process, the LIMA datasets contain no significant deformation of the tissue structure and can be used as a reliable basis to correct for the distorted histology data. Each digitized histology section image was non-rigidly registered via a landmark driven thin plate spline algorithm to the corresponding LIMA image.21 Thus any disruption to the structure caused by histological processing was corrected, restoring the 3D relationship between individual histology sections.21

Manual Segmentation

The most widely used staining approach for clinical diagnosis of lung cancer is H&E. A surgical pathologist (J.W.) with sub-specialty expertise in pulmonary pathology analyzed the digitized H&E histological images. An Intuos graphics tablet and pen (Wacom, Vancouver, WA) were used to generate image masks corresponding to the different tissue types. An example of a tissue type mask generated by the manual segmentation of an H&E section of lung adenocarcinoma tumor is shown in Fig. 1. The cellular-based tissue types identified and segmented included: solid regions of cancerous tumor cells, cancerous tumor cells in a bronchoalveolar carcinoma (BAC) configuration, necrotic cells, active fibroblastic stromal tissue, inactive (hyaline) fibrosis, blood, and normal tissue.

FIGURE 1.

A pathologist with a pulmonary sub-specialty manually traced the digitized H&E histological sections, resulting in a set of individual tissue type maps (binary images), which can be combined into a single representative image. The tissue type classes identified by the pathologist included cancerous tumor (in both solid and bronchoalveolar presentations), necrosis, active and inactive fibrotic tissue, red blood cells, and normal tissue. This image segmentation required approximately two and a half hours to complete (original image magnification of 20×).

For the manual tracing of these tissue classes, a set of definitions was established to promote consistency across the datasets. Cancer (solid) was identified as a solid grouping of cancer cells. Cancer (BAC) was defined as cancer cells following a BAC invasion pattern, which is a non-solid, alveolar pattern. Dead cells of any origin, including cancerous, were identified as necrosis. Tissue areas in which greater than 50% of the cells were fibroblasts were assigned the class of active fibrosis. Inactive fibrosis was defined as fibroblasts intermixed with collagen in which less than 50% of the cells were fibroblasts. Blood was identified as groupings of erythrocytes (red blood cells).

The task of manually tracing histology sections was extremely time consuming, with each section taking between 1 and 3 h to complete. This was a costly process as a pathologist with expertise in pulmonary pathology was required to conduct these tracings due to the complexity involved in defining the tissue boundaries. Due to the human observer component there are likely some inconsistencies in the definition of tissue type boundaries and there also exists a trade-off between the level of detail and the time required to complete the traced maps.

Automated Segmentation

A Specialized Staining Approach

Cytokeratins (CK), found in the cytoskeleton of epithelial cells, are effective in identifying epithelial derived carcinoma. Furthermore, different combinations of the 20 different forms of human CK can be used to characterize poorly differentiated carcinoma.14 Hence CK was chosen as the immunohistochemical target for highlighting cancerous tumor cells. It should be noted that CK immunohistochemistry does not selectively target neoplastic epithelial cells (vs. normal epithelial cells); however, in this study cohort it is expected that normal, non-neoplastic cells make up an insignificant portion of the biomass, since the nodules lie almost exclusively in alveolar regions of the peripheral lung. To ensure a positive immunohistochemical staining of all the cancerous tumor cells in any NSCLC nodule, a number of CK weights were targeted. A CK cocktail was used consisting of monoclonal mouse antibodies against CK 7 (1:50, Dako Corporation, Carpinteria, CA), CK 8/18 (1:200, Abcam, Cambridge, MA), and a pan-CK antibody, AE1/AE3 (1:200, Chemicon Int, Temecula, CA). Localization was achieved with a secondary antibody of goat anti-mouse horseradish peroxidase and reacted with DAB. In order to produce the most consistent and dependable immunohistochemical staining possible, an automated immunostaining system (Dako Corporation, Carpinteria, CA) was utilized. Each staining run was accompanied by a slide labeled with a non-specific IgG antibody of the same isotype and concentration as the primary antibody, to serve as a control.

Hematoxylin and variations of the Masson's Trichrome were explored for counterstaining suitability. Hematoxylin is a simple nuclear stain and hence highlighted the nuclei of all cell types present. Masson's Trichrome is a more complicated staining process involving multiple dyes for the differentiation of muscle, collagen, fibrin, and erythrocytes. The reagents for this staining technique include: Bouin's fixative, Biebrich Scarlet, Weigert's Iron Hematoxylin, Phosphotungstic–Phosphomolybdic acid solution, and Aniline Blue and when applied sequentially the resulting stain colors nuclei—black, cytoplasm—pink, erythrocytes and muscle—red, and collagen—blue. Two modifications of the traditional Masson's Trichrome counterstain were explored. The first (Mod 1) involved the exclusion of the Bouin's fixative step and the second (Mod 2) included the Bouin's fixative but excluded the Weigert's Iron Hematoxylin. Of these staining combinations, the pan-CK cocktail counterstained with the Masson's Trichrome (Mod 2) counterstain was found to produce the greatest contrast between the tissue types of interest. Of particular advantage was the increased contrast between areas of inactive and active fibrosis with dense collagen fibers stained blue and predominantly elastic type matrix stained pink. To increase the consistency of the counterstaining process an automated slide stainer, DRS-601 (Sakura Finetek, CA, USA), was used. The resulting developed staining protocol combining immunohistochemistry with a modified Masson's Trichrome counterstain required approximately eight times the processing time and was approximately six times more expensive to generate than a standard H&E slide.

Algorithm Design

The immunohistochemical staining approach was developed to increase the contrast between different tissue types within a nodule. The chosen staining approach using immunohistochemistry (pan-CK cocktail primary) followed by a modified Masson's Trichrome counterstain produced the greatest color contrast. Automated staining equipment was utilized for the application of the developed staining approach in an effort to minimize the variation in stain intensity between sections and also between nodule datasets. While these efforts proved successful in minimizing the stain variation between sections of a single dataset, the staining across datasets varied greatly. This proved to be a significant challenge in the development of an automated segmentation approach as the variation in the staining intensity across the datasets was larger than the separation of the color-based feature set.

The developed algorithm used the Lab and HSV color spaces as features for the segmentation of the tissue types. Both these color systems differ from the RGB color space in that they were developed to more closely represent the human perception of color. The a and b axes of the Lab system mark the variation from red to green and from yellow to blue, respectively, while the third channel (L) reflects the luminance. In the HSV color model the three channels represent hue, saturation, and value (intensity). In the case of features for the classification of immunohistochemically stained samples the Lab color space is advantageous in that the luminance can be excluded and color is explained by orthogonal axes. The HSV model is preferable when a single value is desired to represent the hue in the image. Figure 2 shows an example of representation of the colors in the developed immunohistochemical stain approach, in the RGB, Lab, and HSV color space.
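To make the two feature spaces concrete, the following minimal sketch converts a single RGB pixel to HSV and to CIE L*a*b*. It is an illustration only, not the authors' implementation: the sRGB primaries, the D65 white point, and the function name `srgb_to_lab` are assumptions, since the article does not state which conversion was used.

```python
import colorsys

# D65 reference white (assumed; the article does not name a white point).
WHITE = (0.95047, 1.0, 1.08883)


def srgb_to_lab(r, g, b):
    """Convert one sRGB pixel (components in 0..1) to CIE L*a*b*."""
    def linearize(c):
        # Undo the sRGB gamma curve.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear RGB to CIE XYZ (sRGB primaries, D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / WHITE[0]), f(y / WHITE[1]), f(z / WHITE[2])
    lightness = 116 * fy - 16
    a = 500 * (fx - fy)   # green (negative) to red (positive)
    b_axis = 200 * (fy - fz)  # blue (negative) to yellow (positive)
    return lightness, a, b_axis


def rgb_to_hue(r, g, b):
    """Hue in 0..1 from the standard-library HSV conversion."""
    return colorsys.rgb_to_hsv(r, g, b)[0]
```

Dropping the first returned component of `srgb_to_lab` leaves the (a, b) pair, which corresponds to excluding luminance as described above, while `rgb_to_hue` gives the single hue value.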

FIGURE 2.

A small histology sample, immunohistochemically stained with the pan-CK with Masson's Trichrome (Mod 2) counterstain, is shown along with its corresponding representations in the Red–Green–Blue (RGB), CIE L*a*b* (Lab), and Hue-Saturation-Value (HSV) color spaces.

The large variance in the staining of the different nodule cases excluded supervised classification approaches for the segmentation of the complete, multi-nodule, immunohistochemical dataset. A k-means clustering approach was chosen as a suitable unsupervised algorithm capable of accommodating the variations in staining across the nodule datasets.4,5,7 The k-means clustering algorithm attempts to optimally partition data into a set number of natural groups, k.5 In general, this is achieved by initializing k centroids, c. Each point in the data is assigned to a group based on its proximity to that group's centroid. Once all points are assigned, the updated centroids for each group are calculated and the process is repeated until there is no change in the location of the centroids, indicating a stable partitioning of the data into k groups. The first step in the developed segmentation approach was to use k-means clustering to segment the data into regions, using the a and b channels from the Lab color space. The Euclidean distance measure was used to determine the closest centroid for each point. To avoid the occurrence of partitions at local minima, the clustering was repeated five times. At each repetition, the total distance from all points to their centroid was calculated and the partition result with the lowest total distance was chosen.
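The clustering step described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the initial centroids are drawn at random from the data, an empty cluster keeps its previous centroid, and the names `kmeans_once` and `kmeans_best_of` are hypothetical. Each point is an (a, b) tuple.

```python
import random


def squared_distance(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run; returns (labels, centroids, total distance)."""
    centroids = rng.sample(points, k)  # random initialization (assumed)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[j])  # empty cluster: keep old centroid
        if updated == centroids:  # centroids stopped moving: stable partition
            break
        centroids = updated
    labels = assign(points, centroids)
    total = sum(
        squared_distance(p, centroids[lab]) ** 0.5 for p, lab in zip(points, labels)
    )
    return labels, centroids, total


def kmeans_best_of(points, k, repeats=5, seed=0):
    """Repeat the clustering and keep the run with the lowest total distance."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(repeats)]
    return min(runs, key=lambda run: run[2])
```

For a whole slide, `points` would hold one (a, b) tuple per pixel and `k` would be 3 or 4; a vectorized implementation would be needed in practice to reach the reported run time.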

The number of groups, k, was initialized by the user but was restricted to either 3 or 4 groups. All complete histopathological slides contained at least three tissue classes along with background data (k = 4). However, not all the small sample images used for validation contained background pixels (k = 3). All the histopathological slides in the dataset contained cancerous tumor, inactive fibrosis, and active fibrosis. The color separation of these tissue types, based on the devised staining approach, was the greatest and hence directly correlated to the partitioning of the cluster algorithm. However, the labeling of the regions output from the k-means clustering algorithm was randomly assigned and bore no reference to the properties of the feature set. Hence, a hue-based labeling step was created to assign labels based on the mean hue of the regions.

The mean hue for each region identified by the k-means clustering algorithm was calculated and these values sorted in descending order. The region with the highest mean hue was assigned a “mixed” class label with a pixel value of 1. “Active fibrosis” and “inactive fibrosis” class labels were assigned to the following ranked hue values and identified by pixel values of 2 and 3, respectively. Finally, if a k of 4 was selected, the lowest ranked region based on mean hue was assigned to “background,” with a pixel label of 4. This process is summarized in Fig. 3.
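The hue-based labeling step can be illustrated with the short sketch below. It assumes a plain arithmetic mean of the hue values (the circular nature of hue is ignored), and the function name `label_by_mean_hue` is hypothetical; the label values 1 to 4 follow the description above.

```python
LABEL_NAMES = {
    1: "mixed",
    2: "active fibrosis",
    3: "inactive fibrosis",
    4: "background",
}


def label_by_mean_hue(cluster_ids, hues):
    """Map arbitrary k-means cluster ids to tissue labels ranked by mean hue.

    cluster_ids: per-pixel cluster index from the clustering step.
    hues: per-pixel hue values (same length as cluster_ids).
    """
    totals, counts = {}, {}
    for cid, hue in zip(cluster_ids, hues):
        totals[cid] = totals.get(cid, 0.0) + hue
        counts[cid] = counts.get(cid, 0) + 1
    mean_hue = {cid: totals[cid] / counts[cid] for cid in totals}

    # Highest mean hue gets label 1, the next 2, and so on.
    ranked = sorted(mean_hue, key=mean_hue.get, reverse=True)
    mapping = {cid: rank for rank, cid in enumerate(ranked, start=1)}
    return [mapping[cid] for cid in cluster_ids], mapping
```

With three clusters the lowest-ranked region receives label 3 (inactive fibrosis); with four clusters it receives label 4 (background), as in the text.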

FIGURE 3.

Summary diagram for the automated segmentation approach designed to identify tissue types within immunohistochemically stained lung cancer nodule histology.

The accurate labeling of the cancerous tumor tissue was deemed the highest priority for the automated segmentation technique; however, based on the k-means clustering approach, the blood and cancerous tumor pixels were grouped together. A second-pass labeling approach was incorporated to further classify the mixed class into cancerous tumor and blood. A binary mask image was created containing only pixels with a “mixed” tissue type label. This binary mask contained many sub-regions, most of which corresponded to cancerous tumor and a few of which corresponded to red blood cells. Connected component labeling was used to assign new labels to each of these individual regions. For each new region the mode of the a channel (from Lab color space) and the area was calculated. Regions with a mode a value greater than 0.65 and an area greater than 5 pixels were assigned as “red blood cells” and given a pixel value of 5 in the original label image. The threshold of 0.65 for the mode a value separating the cancerous tumor and blood classes was empirically determined by locating the average minima of the mode a histogram from a number of sample images.
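A minimal sketch of this second-pass labeling is given below. Several details are assumptions, because the article does not specify them: 4-connectivity for the connected components, binning of the a values to two decimals before taking the mode, and an a channel already rescaled to the range in which the 0.65 threshold applies. The function name `split_mixed` is hypothetical.

```python
from collections import Counter, deque

MIXED, BLOOD = 1, 5


def split_mixed(labels, a_channel, a_threshold=0.65, min_area=5):
    """Relabel blood-like connected regions of the "mixed" class.

    labels: 2D list of integer class labels.
    a_channel: 2D list of (rescaled) Lab a values, same shape as labels.
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    seen = [[False] * cols for _ in range(rows)]

    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != MIXED or seen[r][c]:
                continue
            # Breadth-first flood fill collects one 4-connected region.
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and labels[ny][nx] == MIXED):
                        seen[ny][nx] = True
                        queue.append((ny, nx))

            # Mode of the binned a values within the region.
            binned = Counter(round(a_channel[y][x], 2) for y, x in region)
            mode_a = binned.most_common(1)[0][0]
            if mode_a > a_threshold and len(region) > min_area:
                for y, x in region:
                    out[y][x] = BLOOD
    return out
```

Regions that fail either test keep the "mixed" label, which at this stage is read as cancerous tumor.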

Mode filtering with a two-by-two neighborhood was applied to the final labeled image so that single pixel regions were removed. Some degree of smoothing of the image was desired as many single pixel regions would disrupt the further analysis of the dataset. For illustrative purposes, each label in the resulting segmented dataset was assigned the same color as used for the manually segmented result, making the distinction between classes clearer to view.
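The smoothing step can be sketched as below. The article does not say how the two-by-two window is anchored, how ties are broken, or how image borders are handled, so this sketch makes assumptions on all three: the window covers the pixel and its right, lower, and lower-right neighbors, border indices are clamped, and a tie keeps the pixel's own label when it is among the tied labels. The function name `mode_filter_2x2` is hypothetical.

```python
from collections import Counter


def mode_filter_2x2(labels):
    """Replace each label with the most frequent label in a 2x2 window."""
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = min(r + 1, rows - 1), min(c + 1, cols - 1)  # clamp at border
            window = [labels[r][c], labels[r][c2], labels[r2][c], labels[r2][c2]]
            counts = Counter(window)
            top = max(counts.values())
            tied = [value for value, n in counts.items() if n == top]
            # Keep the pixel's own label on a tie, otherwise take the mode.
            out[r][c] = labels[r][c] if labels[r][c] in tied else min(tied)
    return out
```

Under these assumptions an isolated interior pixel is outvoted three to one in its own window and takes the label of its surroundings.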

For validation purposes, a testing set was created. Twenty-five test sample images were randomly selected throughout the complete, multi-nodule immunohistochemical dataset. Two surgical pathologists (J.W. and A.B.) manually traced the tissue types present in the test samples using the Intuos graphics tablet and pen. The test sample images were limited to 200 by 200 pixels so that a high level of detail could be obtained through manual tracing, in approximately 15 min per sample image. Figure 4 illustrates five example cases from the testing set. This figure illustrates the five stained sample images as well as the corresponding three segmented images, achieved through manual tracing by Observer 1, manual tracing by Observer 2, and the result of the automated segmentation approach. The testing set permitted not only the qualitative comparison between the segmentation approaches (as reflected in Fig. 4 for five images) but also a quantitative assessment using confusion matrices.

FIGURE 4.

Some examples of the test sample images used for validation purposes. Two surgical pathologist observers manually traced regions of the different tissue types. These results were compared to the output of the automated classification approach.

Results

The developed algorithm effectively overcame the challenge of accommodating different staining intensities across datasets and was able to successfully segment cancerous tumor, inactive fibrosis, active fibrosis, and blood using the developed immunohistochemical staining approach. The approach was time efficient, delivering a segmentation result for a complete histopathology section in less than 1 min. Figure 5 illustrates the automated segmentation result for two sample slices, from two different nodule cases.

FIGURE 5.

Examples of the automated classification approach applied to large-scale data, for two different adenocarcinoma cases. The complete cross-section of the nodule was stained and digitized (left) with an original magnification of 20×, and the automated classification approach was applied to the complete field of view, allowing the segmentation of the complete nodule cross-section to be displayed as a color coded map (right). Case 1 (top) showed strong immunohistochemical staining of the cancerous tumor regions, while Case 2 (bottom) had much weaker immunohistochemical staining of cancerous regions. However, the clustering-based automated segmentation technique was able to adapt to this staining variation and produce an accurate segmentation for both datasets in less than 1 min.

Confusion matrices were generated for the evaluation and comparison of the observer performances and the automatically generated segmentation based on the validation testing set. The confusion matrix compares the classification via two methods and clearly illustrates the percentage of pixels in the testing set for which an agreement of tissue class label was achieved (percentages along the diagonal of the matrix). Perhaps more insightful is the illustration of the percentage disagreements between classifications. This can be useful in highlighting tissue classes which are challenging to distinguish from each other. Three matrices were generated, comparing Observer 1 with Observer 2, Observer 1 with the automated result, and Observer 2 with the automated result (Table 1).

TABLE 1.

Confusion matrices assist in the comparison between the manual and automated segmentation results; these matrices show the percentage of pixels classified into each category by the segmentation approach.

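Rows: manual tracing (Observer 2). Columns: manual tracing (Observer 1).

| Manual tracing (Observer 2) | Tumor (solid) (%) | Active fibrosis (%) | Inactive fibrosis (%) | Blood (%) | Background (%) | Necrosis (%) | Normal (%) |
|---|---|---|---|---|---|---|---|
| Tumor (solid) | 17.7 | 1.6 | 0.8 | 0 | 0 | 0 | 0 |
| Active fibrosis | 2.8 | 28.4 | 7 | 0 | 0 | 0 | 0.1 |
| Inactive fibrosis | 0.5 | 10.7 | 24.5 | 0.2 | 0.2 | 0 | 0 |
| Blood | 0.1 | 0.2 | 0.2 | 1 | 0 | 0 | 0 |
| Background | 0.9 | 0.1 | 0 | 0 | 0 | 0 | 0.1 |
| Necrosis | 0.3 | 0 | 0 | 0 | 0 | 0.1 | 0 |
| Normal | 0 | 0.7 | 0.2 | 0 | 0.2 | 0 | 1.3 |

Rows: automated segmentation. Columns: manual tracing (Observer 1).

| Automated segmentation | Tumor (solid) (%) | Active fibrosis (%) | Inactive fibrosis (%) | Blood (%) | Background (%) | Necrosis (%) | Normal (%) |
|---|---|---|---|---|---|---|---|
| Tumor (solid) | 18.3 | 3.5 | 0.7 | 0.2 | 0.1 | 0 | 0 |
| Active fibrosis | 3.6 | 33 | 11.7 | 0.2 | 0.3 | 0.1 | 0.8 |
| Inactive fibrosis | 0.3 | 5.2 | 20.2 | 0.1 | 0 | 0 | 0.7 |
| Blood | 0 | 0 | 0.1 | 0.7 | 0 | 0 | 0 |
| Background | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Necrosis | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Normal | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Rows: automated segmentation. Columns: manual tracing (Observer 2).

| Automated segmentation | Tumor (solid) (%) | Active fibrosis (%) | Inactive fibrosis (%) | Blood (%) | Background (%) | Necrosis (%) | Normal (%) |
|---|---|---|---|---|---|---|---|
| Tumor (solid) | 17.1 | 4.1 | 0.4 | 0.2 | 0.7 | 0.2 | 0.2 |
| Active fibrosis | 2.8 | 30 | 14.6 | 0.4 | 0.3 | 0.2 | 1.5 |
| Inactive fibrosis | 0.3 | 4.2 | 21 | 0.1 | 0.2 | 0 | 0.7 |
| Blood | 0 | 0 | 0.1 | 0.7 | 0 | 0 | 0 |
| Background | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Necrosis | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Normal | 0 | 0 | 0 | 0 | 0 | 0 | 0 |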

The percentages along the matrix diagonal show the agreement between the two approaches, and hence the summation along the diagonal reveals the accuracy. The accuracy was 73% between Observer 1 and Observer 2; 72% between Observer 1 and the automated approach and 69% between Observer 2 and the automated result.

Classification accuracies can be easily determined from the confusion matrices by summation across the diagonal. The comparison of the two observer tracings to each other revealed an accuracy of 73% between the pixel classifications across all tissues. An accuracy of 72% was found between the pixel classifications of Observer 1 and the automated result and 69% between Observer 2 and the automated result.
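As a worked check of the diagonal summation, the sketch below uses the Observer 1 versus Observer 2 percentages from Table 1. The helper names `agreement` and `largest_disagreement` are introduced here for illustration only.

```python
CLASSES = [
    "Tumor (solid)", "Active fibrosis", "Inactive fibrosis",
    "Blood", "Background", "Necrosis", "Normal",
]

# Table 1, first matrix: rows are Observer 2, columns are Observer 1,
# entries are percentages of test-set pixels.
OBSERVER2_VS_OBSERVER1 = [
    [17.7, 1.6, 0.8, 0.0, 0.0, 0.0, 0.0],
    [2.8, 28.4, 7.0, 0.0, 0.0, 0.0, 0.1],
    [0.5, 10.7, 24.5, 0.2, 0.2, 0.0, 0.0],
    [0.1, 0.2, 0.2, 1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0, 0.0, 0.0, 0.1],
    [0.3, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0],
    [0.0, 0.7, 0.2, 0.0, 0.2, 0.0, 1.3],
]


def agreement(matrix):
    """Sum of the diagonal: percentage of pixels given the same label."""
    return sum(matrix[i][i] for i in range(len(matrix)))


def largest_disagreement(matrix, classes):
    """Off-diagonal cell with the highest percentage."""
    cells = [
        (matrix[i][j], classes[i], classes[j])
        for i in range(len(matrix))
        for j in range(len(matrix))
        if i != j
    ]
    value, row_class, col_class = max(cells)
    return row_class, col_class, value
```

Here `agreement(OBSERVER2_VS_OBSERVER1)` sums to 73.0, in line with the reported 73%, and the largest off-diagonal cell is the 10.7% of pixels that Observer 2 traced as inactive fibrosis and Observer 1 traced as active fibrosis.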

The relative sensitivity and specificity for the classification of each tissue class were also calculated. Sensitivity is mathematically defined as the number of true positives divided by the number of true positives plus the number of false negatives. Hence, sensitivity was calculated as the agreement between the two assessment methods (number of pixels labeled by both methods as the same tissue class) divided by the total number of pixels assigned to that tissue class by either one or both of the assessment methods. Specificity is mathematically defined as the number of true negatives divided by the number of true negatives plus false positives. Comparable sensitivity and specificity results were obtained using the automated segmentation technique when compared to the manual tracings. The average sensitivity and specificity, with standard error, of the segmentation result for each tissue type class are graphed in Figs. 6 and 7.
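The per-class measures can be derived from a confusion matrix with a one-versus-rest reduction, as in the sketch below. It assumes that rows hold the method under test and columns hold the method treated as reference; the function names and the toy pixel counts are hypothetical and are not taken from the study. The `overlap` function follows the wording above, in which agreement is divided by the pixels assigned to the class by either method.

```python
def one_vs_rest(matrix, i):
    """True/false positives and negatives for class i.

    matrix[r][c]: pixels labeled r by the tested method and c by the reference.
    """
    total = sum(sum(row) for row in matrix)
    tp = matrix[i][i]
    fp = sum(matrix[i]) - tp                  # tested says i, reference disagrees
    fn = sum(row[i] for row in matrix) - tp   # reference says i, tested disagrees
    tn = total - tp - fp - fn
    return tp, fp, fn, tn


def sensitivity(matrix, i):
    tp, fp, fn, tn = one_vs_rest(matrix, i)
    return tp / (tp + fn)


def specificity(matrix, i):
    tp, fp, fn, tn = one_vs_rest(matrix, i)
    return tn / (tn + fp)


def overlap(matrix, i):
    """Agreement divided by pixels assigned to class i by either method."""
    tp, fp, fn, tn = one_vs_rest(matrix, i)
    return tp / (tp + fp + fn)


# Toy two-class pixel counts, for illustration only.
TOY = [
    [40, 10],
    [5, 45],
]
```

For class 0 of the toy matrix the counts are 40 true positives, 10 false positives, 5 false negatives, and 45 true negatives, giving a sensitivity of 40/45, a specificity of 45/55, and an overlap of 40/55. A class that never occurs in the reference would need a guard against division by zero.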

FIGURE 6.

Comparison of the sensitivity for each tissue type class with respect to the three segmentation approaches: manual segmentation by Observer 1, manual segmentation by Observer 2, and the developed automated segmentation algorithm.

FIGURE 7.

Comparison of the specificity for each tissue type class with respect to the segmentation approaches.

The repeatability of the automated segmentation approach was also tested by running the algorithm six times over the testing set and comparing the confusion matrices. A standard error of 0 ± 0 pixels was found for all tissue classes, for all test images, indicating 100% repeatability of the algorithm.

Discussion

We have presented a manual and an automated methodology for generating detailed volumetric histopathology segmentations for lung cancer nodules. The segmentations of the nodules are designed to highlight, both visually and quantitatively, the relative proportions and distributions of a variety of histological tissue types commonly present.

The described specialized staining approach combined immunostaining targeting multi-weighted CK with a modification of a classic Masson's Trichrome counterstain. This stain approach resulted in a high level of color contrast between the histological tissue classes for segmentation, including cancerous tumor (maroon), red blood cells (red), active fibrotic tissue which was identified as predominantly elastic type matrix (pink), and inactive fibrotic tissue consisting of dense collagen fibers (blue), and is highly suited to computer controlled segmentation of these tissues. A weakness of the presented staining is the variability in stain intensity present across the lung nodule cases, largely due to the specific tumor binding of the multiple antibodies against various weighted CK. This weakness in the staining approach was overcome through the selection of a clustering approach for the automated segmentation algorithm. The preparation of slides with this staining technique, as opposed to standard H&E processing, requires significantly longer processing time as well as more expensive reagents. However, for research purposes the advantages of automated segmentation of these data outweigh the added time and expense for slide preparation. The incorporation of a strategy based on this technique into the clinical environment may assess these advantages and disadvantages differently.

The possible methods for the automated segmentation of data are extensive, each with specific advantages and limitations. The clustering approach used as the basis for the segmentation of the tissue types was advantageous because it delivered consistent results despite variation in staining intensity. In addition, the segmentation was not reliant on training data. Training data are used for supervised segmentation approaches and require an extensive dataset in which labels have been assigned. For the automated analysis of histological data, generating a labeled training dataset that accurately depicts the wide variety of nodule cases and staining variations is not practically feasible, and the assignment of labels would likely have to be performed manually by a human observer. Therefore, the unsupervised approach of clustering, which avoids reliance on a labeled training set, is a great benefit.
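To illustrate why clustering needs no labeled training data, the following is a minimal sketch of k-means applied to pixel colors. It is an assumption-laden example rather than the algorithm used in this study: it uses plain Lloyd iterations on raw RGB values, a deterministic farthest-point initialization, and a hypothetical set of six toy pixels. The feature space, initialization, and k-means variant of the actual method are not specified here.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two color tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(pixels, k, max_iter=50):
    """Lloyd's k-means on a list of (R, G, B) tuples; returns (labels, centers)."""
    # Deterministic farthest-point initialization (an assumption, chosen so
    # that repeated runs give identical results).
    centers = [pixels[0]]
    while len(centers) < k:
        centers.append(max(
            pixels,
            key=lambda p: min(squared_distance(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each pixel joins its nearest cluster center.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                      for p in pixels]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: each center moves to the mean color of its members.
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels, centers

# Hypothetical pixels: two maroon-like, two pink-like, two blue-like colors,
# each pair differing slightly in intensity.
toy_pixels = [(128, 0, 32), (120, 10, 40),
              (255, 180, 200), (240, 170, 190),
              (30, 60, 200), (40, 70, 220)]
toy_labels, toy_centers = kmeans(toy_pixels, k=3)
```

Because the cluster centers are estimated from each image's own pixels, a slide that is stained more weakly or strongly shifts the centers rather than the grouping, and no manually labeled examples are required. Mapping each cluster to a tissue name would need an additional rule that is not shown in this sketch.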

The performance of the automated approach was compared to that of two surgical pathologist observers who manually traced tissue regions. The analysis of the segmentation results generated by the two pathologists (Fig. 4 and Table 1) shows that the task of segmenting tissue types is subjective. Comparable accuracy, sensitivity, and specificity levels were achieved between the automated segmentation and the two observers. Additional advantages of the automated approach over manual tracing were a consistent, repeatable result and a shorter processing time. The segmentation of a digitized slide took between 1 and 3 h for manual tracing and less than 1 min for the automated result.
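As an illustration of how accuracy, sensitivity, and specificity can be derived from a confusion matrix, the sketch below uses the standard one-vs-rest definitions for each tissue class. The function name `per_class_metrics` and the 3 × 3 matrix are hypothetical, and the exact definitions used for Table 1 are assumed rather than confirmed.

```python
def per_class_metrics(matrix):
    """Return {class: (accuracy, sensitivity, specificity)} from a confusion matrix.

    Rows are reference classes and columns are predicted classes. Each class
    is scored one-vs-rest against all other classes combined.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    metrics = {}
    for c in range(n):
        tp = matrix[c][c]                                  # true positives
        fn = sum(matrix[c]) - tp                           # missed pixels of class c
        fp = sum(matrix[r][c] for r in range(n)) - tp      # pixels wrongly called c
        tn = total - tp - fn - fp                          # everything else
        accuracy = (tp + tn) / total
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        specificity = tn / (tn + fp) if tn + fp else float("nan")
        metrics[c] = (accuracy, sensitivity, specificity)
    return metrics

# Hypothetical pixel counts for three tissue classes.
toy_matrix = [[8, 2, 0],
              [1, 6, 3],
              [0, 1, 9]]
toy_metrics = per_class_metrics(toy_matrix)
```

The same function can score one observer against another or the automated result against an observer, which allows inter-observer disagreement and algorithm performance to be compared on the same scale.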

Along with these advantages, the developed approach also has limitations. The algorithm was developed to segment malignant lung nodules and is dependent on the assumption that at least three tissue types are present within the nodule. This assumption is built upon a priori knowledge of the tissue composition of lung cancer nodules and has performed effectively for the cases examined to date; however, it is a potential limitation of the algorithm. In particular, this algorithm would not perform appropriately if presented with a benign nodule case.

Human lung cancer nodules are currently acquired from patients undergoing surgical resection, via lobectomy, of a biopsy-diagnosed cancerous nodule. Hence, our current protocol does not provide access to non-cancerous nodules. Modifications to this protocol are planned to broaden the scope of nodule cases to include non-cancerous as well as cancerous nodules. A first-pass categorizing step will then be added to the protocol to determine whether the nodule is cancerous or non-cancerous.

A second limitation of the developed automated segmentation approach is the inability to segment necrosis and normal airway and vessel wall tissues. At this time, our interest lies in the cancerous and fibrotic tissues, and hence the algorithm is applicable for this purpose. Recently, the necrotic portion of the tumor has been linked to prognosis and hence further development of our automated segmentation approach will be conducted to improve the distinction of necrosis.6 Texture analysis, with or without adjustments to the staining protocol, will be explored for this modification.

The developed segmentation approach assists in quantifying the histological content of lung cancer nodules in a consistent, reliable, and time-efficient manner with similar accuracy to expert human observers. This approach will be highly useful in future studies exploring the prognostic significance of histological tissue type composition in lung cancer, and it presents the opportunity to quantitatively examine the relationship between tissue types.

Acknowledgments

The authors thank Dr. M. Iannettoni for support of this research and assistance with patient identification. We also thank Dr. L. Van Natta, Dr. W. Lynch, Dr. K. Parekh, Ms. J. Rick-McGillin, and Ms. K. McLauglin for assisting patient recruitment; Ms. J. Rodgers, Ms. K. Walters, and Mr. A. Stessman for technical assistance with histopathological preparation. Research for this project was supported by funding from the National Institutes of Health (R01 CA129022).

Footnotes

The authors have no conflict of interest.

References

1. Brambilla E, Travis WD, Colby TV, Corrin B, Shimosato Y. The new World Health Organization classification of lung tumours. Eur Respir J. 2001;18:1059–1068. doi: 10.1183/09031936.01.00275301.
2. Di Cataldo S, Ficarra E, Acquaviva A, Macii E. Achieving the way for automated segmentation of nuclei in cancer tissue images through morphology-based approach: a quantitative evaluation. Comput Med Imaging Graph. 2010 doi: 10.1016/j.compmedimag.2009.12.008. in press.
3. Dufour JF, DeLellis R, Kaplan MM. Reversibility of hepatic fibrosis in autoimmune hepatitis. Ann Intern Med. 1997;127:981–985. doi: 10.7326/0003-4819-127-11-199712010-00006.
4. Guillemin F, Devaux M, Guillon F. Evaluation of plant histology by automatic clustering based on individual cell morphological features. Image Anal Stereol. 2004;23:13–22.
5. Hartigan A, Wong MA. A k-means clustering algorithm. Appl Stat. 1979;28:100–108.
6. Inoue M, Takakuwa T, Minami M, Shiono H, Utsumi T, Kadota Y, Nasu T, Aozasa K, Okumura M. Clinicopathologic factors influencing postoperative prognosis in patients with small-sized adenocarcinoma of the lung. J Thorac Cardiovasc Surg. 2008;135:830–836. doi: 10.1016/j.jtcvs.2007.10.034.
7. Karacali B, Vamvakidou AP, Tozeren A. Automated recognition of cell phenotypes in histology images based on membrane- and nuclei-targeting biomarkers. BMC Med Imaging. 2007;7:7. doi: 10.1186/1471-2342-7-7.
8. Latson L, Sebek B, Powell KA. Automated cell nuclear segmentation in color images of hematoxylin and eosin-stained breast biopsy. Anal Quant Cytol Histol. 2003;25:321–331.
9. Lazarous DF, Shou M, Unger EF. Combined bromodeoxyuridine immunohistochemistry and Masson trichrome staining: facilitated detection of cell proliferation in viable vs. infarcted myocardium. Biotech Histochem. 1992;67:253–255. doi: 10.3109/10520299209110032.
10. Lee WM. Acute liver failure. N Engl J Med. 1993;329:1862–1872. doi: 10.1056/NEJM199312163292508.
11. Ma LJ, Fogo AB. Model of robust induction of glomerulosclerosis in mice: importance of genetic background. Kidney Int. 2003;64:350–355. doi: 10.1046/j.1523-1755.2003.00058.x.
12. Maeshima AM, Niki T, Maeshima A, Yamada T, Kondo H, Matsuno Y. Modified scar grade: a prognostic indicator in small peripheral lung adenocarcinoma. Cancer. 2002;95:2546–2554. doi: 10.1002/cncr.11006.
13. Meyrier A. Mechanisms of disease: focal segmental glomerulosclerosis. Nat Clin Pract Nephrol. 2005;1:44–54. doi: 10.1038/ncpneph0025.
14. Moll R, Franke WW, Schiller DL, Geiger B, Krepler R. The catalog of human cytokeratins: patterns of expression in normal epithelia, tumors and cultured cells. Cell. 1982;31:11–24. doi: 10.1016/0092-8674(82)90400-7.
15. Namati E, De Ryk J, Thiesse J, Towfic Z, Hoffman E, McLennan G. Large image microscope array for the compilation of multimodality whole organ image databases. Anat Rec (Hoboken) 2007;290:1377–1387. doi: 10.1002/ar.20600.
16. Okudera K, Kamata Y, Takanashi S, Hasegawa Y, Tsushima T, Ogura Y, Nakanishi K, Sato H, Okumura K. Small adenocarcinoma of the lung: prognostic significance of central fibrosis chiefly because of its association with angiogenesis and lymphangiogenesis. Pathol Int. 2006;56:494–502. doi: 10.1111/j.1440-1827.2006.01997.x.
17. Ouyang J, Guzman M, Desoto-Lapaix F, Pincus MR, Wieczorek R. Utility of desmin and a Masson's trichrome method to detect early acute myocardial infarction in autopsy tissues. Int J Clin Exp Pathol. 2009;3:98–105.
18. Petushi S, Garcia FU, Haber MM, Katsinis C, Tozeren A. Large-scale computations on histology images reveal grade-differentiating parameters for breast cancer. BMC Med Imaging. 2006;6:14. doi: 10.1186/1471-2342-6-14.
19. Rudolph KL, Chang S, Millard M, Schreiber-Agus N, DePinho RA. Inhibition of experimental liver cirrhosis in mice by telomerase gene delivery. Science. 2000;287:1253–1258. doi: 10.1126/science.287.5456.1253.
20. Sakao Y, Miyamoto H, Sakuraba M, Oh T, Shiomi K, Sonobe S, Izumi H. Prognostic significance of a histologic subtype in small adenocarcinoma of the lung: the impact of nonbronchioloalveolar carcinoma components. Ann Thorac Surg. 2007;83:209–214. doi: 10.1016/j.athoracsur.2006.07.051.
21. Sieren JC, Weydert J, Namati E, Thiesse J, Sieren JP, Reinhardt JM, Hoffman E, McLennan G. A process model for direct correlation between computed tomography and histopathology—application in lung cancer. Acad Radiol. 2009;17:169–180. doi: 10.1016/j.acra.2009.09.006.
22. Sugimoto H, Grahovac G, Zeisberg M, Kalluri R. Renal fibrosis and glomerulosclerosis in a new mouse model of diabetic nephropathy and its regression by bone morphogenic protein-7 and advanced glycation end product inhibitors. Diabetes. 2007;56:1825–1833. doi: 10.2337/db06-1226.
23. Suzuki K, Yokose T, Yoshida J, Nishimura M, Takahashi K, Nagai K, Nishiwaki Y. Prognostic significance of the size of central fibrosis in peripheral adenocarcinoma of the lung. Ann Thorac Surg. 2000;69:893–897. doi: 10.1016/s0003-4975(99)01331-4.
24. Terasaki H, Niki T, Matsuno Y, Yamada T, Maeshima A, Asamura H, Hayabuchi N, Hirohashi S. Lung adenocarcinoma with mixed bronchioloalveolar and invasive components: clinicopathological features, subclassification by extent of invasive foci, and immunohistochemical characterization. Am J Surg Pathol. 2003;27:937–951. doi: 10.1097/00000478-200307000-00009.
25. Travis WD, Garg K, Franklin WA, Wistuba II, Sabloff B, Noguchi M, Kakinuma R, Zakowski M, Ginsberg M, Padera R, Jacobson F, Johnson BE, Hirsch F, Brambilla E, Flieder DB, Geisinger KR, Thunnisen F, Kerr K, Yankelevitz D, Franks TJ, Galvin JR, Henderson DW, Nicholson AG, Hasleton PS, Roggli V, Tsao MS, Cappuzzo F, Vazquez M. Evolving concepts in the pathology and computed tomography imaging of lung adenocarcinoma and bronchioloalveolar carcinoma. J Clin Oncol. 2005;23:3279–3287. doi: 10.1200/JCO.2005.15.776.
26. Wick MR. Diagnostic Histochemistry. Cambridge: Cambridge University Press; 2008. p. 20.
27. Wolberg WH, Street WN, Mangasarian OL. Computer-derived nuclear features compared with axillary lymph node status for breast carcinoma prognosis. Cancer. 1997;81:172–179. doi: 10.1002/(sici)1097-0142(19970625)81:3<172::aid-cncr7>3.0.co;2-t.
