Author manuscript; available in PMC: 2026 Jan 20.
Published in final edited form as: IEEE Trans Autom Sci Eng. 2025 Jan 20;22:11205–11218. doi: 10.1109/tase.2025.3530936

Oral-Anatomical Knowledge-Informed Semi-Supervised Learning for 3D Dental CBCT Segmentation and Lesion Detection

Yeonju Lee 1, Min Gu Kwak 2, Rui Qi Chen 3, Hao Yan 4, Mel Mupparapu 5, Fleming Lure 6, Frank C Setzer 7, Jing Li 8
PMCID: PMC12410824  NIHMSID: NIHMS2072985  PMID: 40918035

Abstract

Cone beam computed tomography (CBCT) is a widely used imaging modality in dental healthcare. Segmenting each 3D CBCT image, which involves labeling lesions, bone, teeth, and restorative material on a voxel-by-voxel basis, is an important task, as it aids lesion detection, diagnosis, and treatment planning. Current clinical practice relies on manual segmentation, which is labor-intensive and demands considerable expertise. Leveraging Artificial Intelligence (AI) to fully automate the segmentation process could tremendously improve the quality and efficiency of dental healthcare. The main hurdle in this advancement is reducing AI's reliance on a large quantity of manually labeled images to train robust, accurate, and generalizable algorithms. To tackle this challenge, we propose a novel Oral-Anatomical Knowledge-informed Semi-Supervised Learning (OAK-SSL) model for 3D CBCT image segmentation and lesion detection. The uniqueness of OAK-SSL is its capability of integrating qualitative oral-anatomical knowledge of plausible lesion locations into the deep learning design. Specifically, the design of OAK-SSL includes three key elements: transformation of qualitative knowledge into quantitative representation, a knowledge-informed dual-task learning architecture, and a knowledge-informed semi-supervised loss function. We apply OAK-SSL to a real-world dataset, focusing on segmenting CBCT images that contain small lesions. This task is inherently challenging yet holds significant clinical value, as treating lesions at their early stages leads to an excellent prognosis. OAK-SSL demonstrated significantly better performance than a range of existing methods.

Index Terms—: Artificial intelligence, knowledge-informed deep learning, healthcare automation, dental CBCT segmentation and lesion detection

Note to Practitioners—

This study tackles the challenges arising from a limited amount of labeled data due to the time-consuming manual segmentation of 3D dental cone beam computed tomography (CBCT) images. The scarcity of labeled data often impedes AI models from accurately segmenting periapical lesions, which is crucial for the successful automation of CBCT segmentation. To overcome this, we introduce a novel semi-supervised learning algorithm that integrates oral-anatomical knowledge about lesion location for 3D CBCT image segmentation. Our method effectively segments periapical lesions, even small ones, without relying solely on labeled data. The proposed method offers two significant benefits to clinicians. First, it reduces the necessity for large amounts of labeled data, particularly easing the burden of manually segmenting early-stage periapical lesions. Second, it helps reduce intra- and inter-observer disagreements and human errors by providing consistent and automated segmentation maps. These benefits not only simplify the segmentation process in dental imaging but also improve its reliability. As a result, our automated algorithm is easier and more trustworthy for practitioners to adopt. However, practitioners should be aware that the effectiveness of our method relies on the assumption of consistency between labeled and unlabeled data. When applying this method, it is crucial to carefully consider the characteristics of the unlabeled dataset. Significant differences in image quality, patient demographics, or acquisition parameters between labeled and unlabeled data might affect model performance.

I. Introduction

DENTAL cone beam computed tomography (CBCT) is widely used in dentistry to provide 3D images of the oral cavity. It is known for its superior capability in assisting with the diagnosis of odontogenic lesions compared to other imaging modalities [1]. Segmentation of each CBCT image, which involves labeling odontogenic lesions, bones, teeth, and restorative materials on a voxel-by-voxel basis, is important for accurate lesion detection, diagnosis, and treatment planning. However, manual labeling is labor-intensive and demands considerable expertise. While semi-automatic segmentation software can provide some initial assistance, thorough review and adjustments by clinicians are often needed. Fully automating the segmentation with artificial intelligence (AI) could tremendously improve the quality and efficiency of dental healthcare. The main hurdle in this advancement is reducing AI's reliance on a large quantity of manually labeled images to train robust, accurate, and generalizable algorithms.

To address the challenge of limited labeled data, researchers have explored the use of unlabeled data [2]–[4], an approach known as semi-supervised learning (SSL). SSL has shown its effectiveness in semantic segmentation tasks by performing auxiliary tasks with unlabeled data. These auxiliary tasks enable AI models to observe objects of various sizes and shapes. Furthermore, SSL can produce more robust representations [5]. Leveraging SSL to train AI models for CBCT segmentation can help alleviate the need for a larger amount of labeled data while preserving segmentation performance.

This paper focuses on the segmentation of CBCT images with periapical lesions. Periapical lesions are a specific type of odontogenic lesion that occurs near tooth roots. Due to their high prevalence, the timely detection and accurate segmentation of these lesions are of crucial importance [6]. This can facilitate diagnosis and treatment, benefiting a broad patient population. However, periapical lesions have certain challenging characteristics that significantly complicate their segmentation compared to other labels on CBCT. First, periapical lesions are usually much smaller than adjacent oral structures [7], which makes AI models more susceptible to overfitting when segmenting them [8]. As in many other medical applications, the detection of early-stage lesions is of great interest, as it can facilitate timely treatment and ensure the best possible prognosis [6]. However, early-stage periapical lesions are very small, making their detection and segmentation difficult. Furthermore, periapical lesions exhibit intensities similar to those of surrounding oral structures, which complicates their distinction due to ambiguous boundaries [9]. Last but not least, the considerable variations in the size and shape of periapical lesions introduce further challenges.

Just as clinicians use anatomical knowledge to enhance the efficiency of their manual labeling process, researchers have similarly leveraged such knowledge to train AI models effectively. They have incorporated a wide range of anatomical knowledge into AI models, including the shapes of organs or lesions [4], [10] and relationships among neighboring anatomical structures [7]. Zheng et al. [7] incorporated anatomical knowledge that prohibits the connectivity of certain structures. These integrations have yielded more accurate and realistic segmentation results with a limited amount of labeled data. Additionally, knowledge incorporation can guide models to concentrate their search within a limited space, preventing the generation of counter-intuitive results [7].

In this study, we propose an Oral-Anatomical Knowledge-informed Semi-Supervised Learning (OAK-SSL) model for automated 3D dental CBCT segmentation. Among the multiple labels present on a CBCT image, periapical lesions present the most significant challenge due to the aforementioned characteristics. To address this, OAK-SSL is designed to leverage the oral-anatomical knowledge that periapical lesions must occur near tooth roots. As illustrated in Figure 1, periapical lesions originate at the tooth root and expand outward (leftmost figure). The lesions must not exist in isolation from the tooth (middle figure) and must not attach to non-tooth root areas of the tooth (rightmost figure). Due to the qualitative nature of this knowledge, we first propose a distance map to quantify the closeness of each voxel to the nearest tooth root. Furthermore, we propose a dual-task learning architecture for the OAK-SSL model in which the prediction of the distance map is used as an auxiliary task to assist the segmentation task. This design helps guide the model's focus to anatomically plausible locations of lesions, facilitating lesion detection and segmentation. Additionally, OAK-SSL enhances its capabilities by effectively utilizing both labeled and unlabeled images. Labeled images inform the model through supervised learning, which aims to maximize the segmentation and distance prediction accuracies simultaneously. For unlabeled images, we propose an unsupervised loss comprised of two important components: a confidence loss aimed at enhancing the model's confidence in lesion segmentation, especially for voxels at anatomically plausible locations informed by predicted distances, and a stability loss aimed at boosting the model's training stability by integrating the Mean Teacher (MT) strategy. The main contributions of this study are as follows:

  • Transformation of qualitative knowledge into quantitative representation: While the knowledge that 'periapical lesions must occur near tooth roots' is easy for clinicians to apply when looking for lesions on CBCT, its qualitative nature makes it difficult to integrate into deep learning (DL) design. We propose a novel approach to transform this knowledge into a quantitative representation with distance maps, enabling its seamless integration with DL.

  • Knowledge-informed dual-task learning architecture: We propose a dual-task learning architecture for the OAK-SSL model, which enhances segmentation accuracy by utilizing distance map prediction as an auxiliary task. As the knowledge is encoded in the distance map, this architecture directs the model’s focus toward anatomically plausible lesion locations, thereby facilitating more accurate lesion detection and segmentation.

  • Knowledge-informed semi-supervised loss function design: For labeled images, we propose a supervised loss to simultaneously maximize the segmentation and distance prediction accuracies. For unlabeled images, we propose a novel unsupervised loss that includes confidence and stability components, which not only improves the model’s confidence in lesion segmentation at anatomically plausible locations informed by predicted distances but also integrates an MT strategy to ensure training stability. The supervised and unsupervised losses are combined into a total loss to enable end-to-end training of OAK-SSL.

  • Superior performance in real-world CBCT datasets with small periapical lesions: We apply OAK-SSL to a real-world dataset in collaboration with dental experts at Penn Dental Medicine. We focus on the challenging task of segmenting CBCT images that contain small lesions. This task holds significant clinical value, as treating lesions at their early stages leads to an excellent prognosis. OAK-SSL performs significantly better than competing methods even when the labeled images used in training lack small lesions. This not only highlights OAK-SSL's robustness but also minimizes the labeling effort required from clinicians.

Fig. 1.

Example of periapical lesions at an anatomically plausible location (left) and implausible locations (middle and right). The left figure shows a genuine periapical lesion adhering to the anatomical knowledge that ‘periapical lesions must be near the tooth root’. The middle and right figures display artificially created red markings representing anatomically impossible lesions, which demonstrate violations of domain knowledge. In the middle figure, the red marking is isolated in the background, away from any tooth. In the right figure, the red marking is attached to non-tooth root areas.

II. Related Works

A. Applications of AI models in Dental CBCT

AI models have been widely applied in CBCT image analysis, including classification and segmentation tasks. Convolutional neural networks (CNNs) play a critical role in the successful application of AI in CBCT image analysis. The convolution operations in a CNN can extract image features while preserving spatial information. Existing studies have focused on training models using 2D slices of 3D CBCT images, referred to as 2D CBCT images hereafter. Miki et al. employed AlexNet, a specific CNN architecture, for classifying the individual teeth in 2D CBCT images [11]. Esmaeilyfard et al. used a CNN to classify the presence of dental decay and to identify dental decay types from 2D CBCT images in three different views (axial, sagittal, and coronal) [12]. Furthermore, several studies automated the differential diagnosis between cysts and granulomas using a CNN applied to whole 2D CBCT images and their cropped counterparts [13].

A significant portion of the research in CBCT image analysis focuses on segmentation tasks. The segmentation tasks have mainly employed U-shaped CNN architectures, commonly called U-net, which generate segmentation maps equal in size to the input images. Jang et al. proposed a two-step approach using a 2D U-net for 2D CBCT image segmentation, with the aim of identifying individual teeth, especially in cases where adjacent teeth are difficult to separate [14]. Given the challenges of segmenting lesions, Setzer et al. applied a 2D U-net to segment oral structures with a focus on lesions by assigning higher class weights to the lesions [6]. Several studies enhanced U-net with specific oral-anatomical knowledge for more precise lesion segmentation. Kirnbauer et al. developed a two-step approach to incorporate the knowledge of possible lesion locations. They first predicted tooth coordinates and then segmented lesions with a 2D U-Net using cropped images based on those coordinates [15]. Zheng et al. incorporated the knowledge that some labels cannot be adjacent to each other spatially, such as ‘lesions cannot be in the background’ and ‘materials cannot be connected to bones’, into the loss function of a 2D U-Net using posterior regularization [7].

In this study, we tackle three primary limitations in existing research. First, existing studies have relied solely on fully labeled data; however, sufficient labeled data are hard to obtain in practice. Second, these existing studies have converted 3D CBCT images to 2D slices. However, transforming 3D CBCT images into 2D slices risks losing vital anatomical details, as the human head and oral structures are inherently 3D [16]. Third, the two-step approach might lose contextual information and complicate optimization. To address these limitations, we propose an end-to-end approach that utilizes unlabeled data, guided by the oral-anatomical knowledge related to lesion sites, in order to generate more realistic and precise 3D segmentation maps for CBCT images.

B. Semantic Segmentation under Limited Labeled Data

Training robust segmentation models requires substantial amounts of labeled data, posing significant challenges, particularly in medical image segmentation. To address this challenge, researchers have explored various approaches. Self-supervised learning has emerged as a promising solution, which utilizes pretext tasks to learn meaningful image representations without relying on segmentation labels [17]. Yang et al. proposed a self-supervised method with a pretext task that reformulates the jigsaw puzzle problem as a patch-wise classification task solvable using fully convolutional networks. This approach learns representations by predicting the original positions of randomly shuffled image patches, potentially benefiting image segmentation tasks [18]. Felfeliyan et al. introduced SS-MRCNN, which aims to learn representations by localizing and recovering artificially distorted areas in unlabeled images [19]. Despite these advancements, self-supervised learning often requires fine-tuning for application to segmentation tasks. Moreover, the pretext tasks might not be directly aligned with downstream semantic segmentation tasks, potentially limiting their effectiveness [20].

A recent significant advancement in semantic segmentation is the development of Segment Anything Model (SAM) [21]. SAM is a foundation model specifically designed for image segmentation tasks, demonstrating robust segmentation accuracy across diverse domains and datasets. It utilizes user inputs such as points and bounding boxes to define and refine segmentation targets, effectively operating under conditions of limited labeled data. SAM has shown promising applicability in medical image segmentation [22]. However, SAM’s reliance on user inputs, such as manually placed points or drawn bounding boxes, introduces human intervention during the segmentation process. This requirement for interactive user involvement might limit SAM’s suitability for developing fully automated segmentation models [23].

Another approach to address the challenge of limited labeled data in semantic segmentation is semi-supervised learning. This method leverages both labeled and unlabeled data to improve model performance. Semi-supervised learning can be categorized into three types based on their auxiliary tasks: consistency regularization, adversarial learning, and pseudo-labeling.

A consistency regularization method is based on the SSL assumption that the model should not change its predictions under small perturbations [24]. Several studies encourage consistent predictions even for perturbed inputs [25] or across different models [26]. Olsson et al. proposed ClassMix for precise segmentation, ensuring consistent predictions from manipulated images by selectively transferring objects from one image to another [25]. On the other hand, Cui et al. applied the MT framework to a brain segmentation task [26]. The MT framework utilizes paired student and teacher models with identical architectures. The teacher model's parameters are an exponential moving average (EMA) of the student model's parameters, promoting consistency between previous and current student models. Employing the EMA method ensures stable predictions throughout training epochs and allows for straightforward integration with other methods, as it does not require training additional models [2].

An adversarial learning method uses two networks: a generator and a discriminator. The generator generates fake segmentation maps, while the discriminator's role is to distinguish the fake maps from manual segmentations. This process helps the discriminator learn more refined and robust representations based on the manual segmentations [27]. Souly et al. proposed generating class-specific fake images by providing the generator with class-specific information and feeding them into the discriminator to enhance segmentation performance [28]. However, training adversarial models is challenging due to the delicate balance required between the generator and discriminator.

A pseudo-labeling method generates labels for unlabeled data based on the model predictions. It assigns the class label with the highest predicted probability as the 'pseudo-label', thereby improving model performance by augmenting the training dataset with pseudo-labeled unlabeled data. This approach exposes the model to a broader range of data, akin to having more labeled data, enhancing its ability to generalize. It typically involves minimizing the entropy of predictions to induce the model to be more confident in its predictions and reduce its uncertainty. Consequently, pseudo-labeling encourages the model to learn the low-density decision boundary that differentiates the representations of different labels [29]. Wang et al. proposed a pseudo-labeling method for semantic segmentation that incorporated contrastive learning to verify the reliability of pseudo-labels [30]. Similarly, Basak and Yin introduced pseudo-labels to guide the selection of unlabeled patches for contrastive learning in medical image segmentation [31]. In a related approach, Yang et al. proposed a class-aware semi-supervised learning approach that enforced the distinctions between classes in pseudo-labels [32]. Additionally, Thompson et al. utilized a pseudo-labeling method for brain tumor segmentation and refined the pseudo-labels using superpixels, which are clusters of adjacent pixels [33].

Holistic approaches have also been developed that combine multiple semi-supervised learning methods. MixMatch [34] and FixMatch [35] integrated consistency regularization with pseudo-labeling. These approaches have been extended to integrate various tasks [36], [37]. Lu et al. additionally integrated uncertainty estimation to improve left atrium segmentation [38].

OAK-SSL distinguishes itself from existing methods for semantic segmentation under limited labeled data by incorporating oral-anatomical knowledge about potential lesion sites. Its novelties include quantitative knowledge representation, a knowledge-informed dual-task learning architecture, and a knowledge-informed semi-supervised loss function design. Furthermore, OAK-SSL operates without the need for fine-tuning or additional user inputs during the segmentation process, enhancing its automation capabilities.

C. Incorporation of Anatomical Knowledge in Semantic Segmentation

Incorporating anatomical knowledge, such as the size, shape, and location of organs or lesions, into the AI model is an effective strategy for achieving precise segmentation. Knowledge can be incorporated in three ways: post-processing, generative models, and regularization [39]. Post-processing and generative models require additional steps and models, while integrating knowledge directly into the model as a form of regularization is more straightforward. Thus, we limit our scope of the literature to studies that have integrated knowledge as regularization.

Mirikharaji and Hamarneh introduced a method to encode star-shape prior into the loss function, penalizing skin lesion segmentations that do not adhere to star shapes [40]. Xue et al. introduced a task for predicting a signed distance map (SDM), indicating the minimal distance from specific organs or tissues, to capture geometric shape distributions. They encouraged the consistency between an SDM and a predicted segmentation map to ensure geometrically accurate segmentation [41]. Kervadec et al. incorporated a task of predicting the sizes of target areas and applied a penalty for the sizes outside predefined bounds [4]. Gupta et al. encoded topological interactions between organs into their model and penalized unrealistic configurations [42]. Zheng et al. regularized the approximate posterior distribution with relative positional information of different labels, stating that one oral structure must not be connected to another [7].

However, most existing studies have incorporated 'must not be' knowledge to regularize the segmentation models, focusing on penalizing unrealistic shapes, sizes, and relative positions. In contrast, our method focuses on incorporating 'must be' knowledge related to anatomically plausible locations of lesions. This strategy allows us to design a model that not only avoids unrealistic segmentation but also actively directs its focus toward a limited search space. By precisely defining what the model should look for, our method offers a straightforward and effective way to apply anatomical knowledge to regularize the segmentation models.

III. Proposed Method

In this section, we describe the proposed OAK-SSL model, designed to segment multiple labels including ‘teeth’, ‘bone’, ‘material’ (restorative materials), ‘lesion’ (periapical lesions), and ‘background’ on each 3D CBCT image. Figure 2 illustrates the graphical overview of OAK-SSL. OAK-SSL incorporates oral-anatomical knowledge, specifically that ‘periapical lesion must be near tooth root’, into its architecture. This knowledge is utilized through dual-task learning (Sec. III-B.1), where a distance prediction task guides the model’s attention to anatomically plausible lesion locations by predicting proximity to tooth roots. Additionally, in the unsupervised learning component (Sec. III-B.3), we incorporate this anatomical knowledge into the loss function to enhance the model’s confidence in detecting lesions at anatomically plausible locations in unlabeled images.

Fig. 2.

Graphical overview of the proposed OAK-SSL model. The anatomical knowledge 'periapical lesion must be near the tooth root' is integrated throughout the architecture, informing both the supervised and unsupervised learning paths. In the supervised learning path of OAK-SSL, each labeled image ($X_l$) is processed by the dual-task learning model ($f$), which outputs predicted segmentation probability maps ($\hat{p}_l$) and a distance map ($\hat{d}_l$). $\mathcal{L}_{seg}$ and $\mathcal{L}_{dist}$ are computed from these outputs to compose the supervised loss $\mathcal{L}_s$. In the unsupervised learning path of OAK-SSL, each unlabeled image ($X_u$) is processed by $f$, also yielding predicted segmentation probability maps ($\hat{p}_u$) and a distance map ($\hat{d}_u$). $\mathcal{L}_{confidence}$ is calculated by comparing a predicted lesion probability map ($\hat{p}_{les}$) with a binary lesion pseudo-label map ($\hat{y}_{les}$), utilizing $\hat{d}_u$ as weights. $\mathcal{L}_{stability}$ is calculated by comparing the predictions of $f$ and the teacher model ($t$). $\mathcal{L}_{confidence}$ and $\mathcal{L}_{stability}$ are combined into the unsupervised loss $\mathcal{L}_u$. The total loss is $\mathcal{L} = \mathcal{L}_s + \mathcal{L}_u$, which is used in the end-to-end training of OAK-SSL with anatomical knowledge incorporation.

A. Oral-Anatomical Knowledge Representation

OAK-SSL is designed to leverage the oral-anatomical domain knowledge that periapical lesions must be located near tooth roots [11]. However, while this knowledge is easy for clinicians to apply when looking for lesions on CBCT images, its qualitative nature makes it difficult to integrate into the DL design. To address this challenge, we propose to use a distance map to quantify the distance of each voxel in a CBCT image to the nearest tooth root. Voxels with shorter distance scores are closer to tooth roots and are thus more probable lesion sites. In the OAK-SSL model design introduced in Sec. III-B, prediction of this distance map is utilized as an auxiliary task to augment the primary segmentation task. This helps the model direct its focus to potential lesion locations, thereby enhancing its ability to detect and segment lesions, the most important yet challenging label.

To establish this distance map, a practical challenge arises. Although teeth are segmented in the labeled images under one label, tooth roots are not separately segmented. Complicating this further, there is no exact definition of tooth roots, so even clinicians cannot precisely identify which voxels correspond to them. To address this challenge, we propose using the tooth boundary (TB) as a proxy for tooth roots. We define TB as the subset of voxels labeled as 'teeth' that are situated adjacent to voxels labeled as 'bone' or 'lesion'. This definition allows us to identify TB with a simple neighbor-checking algorithm: it screens each voxel labeled as 'teeth' and assigns the voxel to TB if at least one of its neighbors is labeled 'bone' or 'lesion'.
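As a concrete illustration, the neighbor-checking step might be implemented as below; the integer label codes and 6-connectivity are assumptions, since the paper specifies neither the label encoding nor the neighborhood definition:

import numpy as np
from scipy import ndimage

TEETH, BONE, LESION = 1, 2, 3  # assumed integer codes for the voxel labels

def tooth_boundary(label_map):
    # TB: 'teeth' voxels with at least one 6-connected 'bone'/'lesion' neighbor.
    teeth = label_map == TEETH
    bone_or_lesion = (label_map == BONE) | (label_map == LESION)
    # Dilating the bone/lesion mask by one voxel and intersecting with the
    # teeth mask keeps exactly the teeth voxels that touch bone or lesion.
    struct = ndimage.generate_binary_structure(3, 1)  # 6-connectivity
    touched = ndimage.binary_dilation(bone_or_lesion, structure=struct)
    return teeth & touched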

With TB identified, we can generate the distance map for each labeled CBCT image as follows:

$$d(x) = \begin{cases} 0, & \text{if } x \in \mathrm{TB} \\ \inf_{z \in \mathrm{TB}} \lVert x - z \rVert_2, & \text{if } x \notin \mathrm{TB}. \end{cases} \quad (1)$$

Herein, a value of zero is assigned to a voxel x that is considered to be TB. For the other voxels that are not part of the TB, a positive Euclidean distance to the nearest TB (z) is assigned. To ensure numerical stability, we normalize the values of d(x) to a range between zero and one using min-max scaling. Figure 3(b) shows the distance map corresponding to the labeled CBCT image in Figure 3(a). In Figure 3(b), darker red areas indicate a shorter distance to TB, while the colors fade to lighter hues and eventually to blue, signifying an increase in distance. It is worth noting that while TB may include regions extending beyond tooth roots, it provides an effective proxy and guides the model’s focus to localized areas pertinent to lesions. More importantly, TB can be easily algorithmically identified while tooth roots cannot. Our experiments in Sec. IV-D provide further evidence to support the effectiveness of TB in enabling the model to utilize the oral-anatomical knowledge.
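For illustration, Eq. (1) together with the min-max scaling can be realized with a Euclidean distance transform; a sketch, assuming the boolean TB mask from the previous step:

import numpy as np
from scipy import ndimage

def distance_map(tb_mask):
    # distance_transform_edt measures the distance to the nearest zero entry,
    # so the complement of the TB mask is passed: TB voxels get exactly zero,
    # all other voxels get their Euclidean distance to the nearest TB voxel.
    d = ndimage.distance_transform_edt(~tb_mask)
    return (d - d.min()) / (d.max() - d.min() + 1e-8)  # min-max scaling to [0, 1]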

Fig. 3.

(a) A labeled CBCT image (only the sagittal view is shown for simplicity) and (b) its corresponding distance map. The distance map illustrates the proximity to the nearest TB, with regions closer to TB in darker red and those farther away in a lighter hue and eventually blue.

B. Oral-Anatomical Knowledge Integration into DL design

B.1. Dual-Task Architecture

The proposed OAK-SSL model employs a dual-task learning framework, in which prediction of the distance map defined in Sec. III-A is used as an auxiliary task to augment the segmentation task by directing the model's focus to anatomically probable lesion sites. Let $f: X \rightarrow (\hat{p}, \hat{d})$ denote the OAK-SSL model, where the input $X$ is a 3D CBCT image and the output includes segmentation probability maps corresponding to each label, $\hat{p}$, and a predicted distance map $\hat{d}$. The segmentation probability maps $\hat{p}$ have dimensions of (num_classes, H, W, D), where num_classes is the number of labels, and H, W, D represent the height, width, and depth of the input patch, respectively. The predicted distance map $\hat{d}$ has dimensions of (1, H, W, D). Specifically, as depicted in Figure 2, the architecture of OAK-SSL employs a U-net backbone with an encoder (En) and a decoder (De). The detailed architecture of the U-net backbone is illustrated in Figure 4. This choice is motivated by U-net's simplicity, its extensibility in incorporating domain knowledge, and its proven effectiveness in handling segmentation tasks [43], [44]. We choose not to use a pre-trained backbone network due to the heterogeneity between CBCT and public medical image datasets from different domains and imaging modalities [45]. To enable dual-task learning, we add a segmentation layer (S) and, in parallel, a distance prediction layer (D) after the decoder. S predicts segmentation probability maps that represent the likelihood of each voxel belonging to a specific class label. D outputs a single-channel distance map predicting the distance of each voxel to TB. Unlike traditional approaches, we refrain from applying a ReLU activation function in D in order to preserve true zero values, which are critical for our distance predictions as they indicate TB. This dual-task design enables the two tasks to exchange information, improving the performance of both. The distance prediction task helps the segmentation task capture and segment lesions by guiding the model's focus to potential lesion sites. Conversely, the segmentation task helps the distance prediction task by providing spatial context. While our primary task is segmentation, accurate distance predictions are also important because the predicted distances of voxels are incorporated into a proposed unsupervised loss for leveraging unlabeled images (details in Sec. III-B.3), which further helps the segmentation task.
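To make the dual-head design concrete, a minimal PyTorch sketch of the parallel output layers S and D follows; the decoder channel width and class count are assumptions rather than values from the paper:

import torch
import torch.nn as nn

class DualTaskHead(nn.Module):
    # Two 1x1x1 convolutional heads on top of a shared U-net decoder.
    def __init__(self, decoder_channels=16, num_classes=5):
        super().__init__()
        self.seg_head = nn.Conv3d(decoder_channels, num_classes, kernel_size=1)  # S
        self.dist_head = nn.Conv3d(decoder_channels, 1, kernel_size=1)           # D

    def forward(self, decoder_features):
        p_hat = torch.softmax(self.seg_head(decoder_features), dim=1)  # (B, C, H, W, D)
        # No ReLU on the distance head, so exact zeros marking TB are preserved.
        d_hat = self.dist_head(decoder_features)                       # (B, 1, H, W, D)
        return p_hat, d_hat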

Fig. 4.

Modified U-Net architecture incorporating a distance prediction layer. Blue boxes represent feature maps with their dimensions indicated. Arrows demonstrate operations such as convolutions, up-sampling, and skip connections. The outputs include both segmentation and distance maps.

B.2. Incorporation of Labeled Images

Consider a training set that consists of $L$ labeled and $U$ unlabeled images, $\{(X_l, y_l, d_l)\}_{l=1}^{L} \cup \{X_u\}_{u=1}^{U}$, where $X_l$ and $X_u$ denote labeled and unlabeled images, respectively; $y_l$ and $d_l$ denote the voxel-wise labels and distance scores of the labeled image $X_l$. In this section, we focus on discussing how OAK-SSL incorporates labeled images, leaving the discussion on unlabeled images to the next section. The OAK-SSL model $f$ proposed in Sec. III-B.1 takes a labeled image $X_l$ as input and outputs segmentation probability maps $\hat{p}_l$ and a predicted distance map $\hat{d}_l$. The model computes a supervised loss $\mathcal{L}_s$ that is composed of two components: a segmentation loss $\mathcal{L}_{seg}$ and a distance loss $\mathcal{L}_{dist}$, i.e.,

$$\mathcal{L}_s = \mathcal{L}_{seg} + \alpha \mathcal{L}_{dist}, \quad (2)$$

where $\alpha$ is a balancing hyperparameter. $\mathcal{L}_{seg}$ minimizes a weighted average of the Dice loss $\mathcal{L}_{dice}$ and the cross-entropy loss $\mathcal{L}_{CE}$. The Dice loss is a continuous, differentiable approximation of the Dice metric, allowing for direct optimization of the segmentation evaluation metric used in this work. Since the Dice loss is not very smooth by itself, the cross-entropy loss is jointly employed, which helps stabilize the training process. $\mathcal{L}_{seg}$ is computed as follows:

$$\mathcal{L}_{seg}(\hat{p}_l, y_l) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{2}\left[\mathcal{L}_{dice}(\hat{p}_{l,i}, y_{l,i}) + \mathcal{L}_{CE}(\hat{p}_{l,i}, y_{l,i})\right], \quad (3)$$

where

$$\mathcal{L}_{dice}(\hat{p}_{l,i}, y_{l,i}) = 1 - \frac{2\sum_j \hat{p}_{l,i,j}\, y_{l,i,j}}{\sum_j \hat{p}_{l,i,j} + \sum_j y_{l,i,j}}, \quad (4)$$
$$\mathcal{L}_{CE}(\hat{p}_{l,i}, y_{l,i}) = -\sum_j y_{l,i,j} \log \hat{p}_{l,i,j}, \quad (5)$$

and $i$ indexes the voxels in $X_l$. The objective of minimizing $\mathcal{L}_{seg}$ is to align $\hat{p}_l$ as closely as possible with the labels in $y_l$.

Furthermore, $\mathcal{L}_{dist}$ aims to accurately predict the distance of each voxel to TB, i.e.,

$$\mathcal{L}_{dist}(\hat{d}_l, d_l) = \frac{1}{N}\sum_{i=1}^{N} \mathrm{MSE}(\hat{d}_{l,i}, d_{l,i}), \quad (6)$$

where MSE is the mean square error between the predicted and true distances of voxels in the image.
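A compact sketch of the supervised loss in Eqs. (2)–(6) is shown below, assuming one-hot labels and softmax probability maps (the exact voxel/class reduction is our reading of Eq. (3)):

import torch
import torch.nn.functional as F

def supervised_loss(p_hat, y_onehot, d_hat, d_true, alpha=1.0, eps=1e-6):
    # p_hat, y_onehot: (B, C, H, W, D); d_hat, d_true: (B, 1, H, W, D).
    dims = (2, 3, 4)                                  # spatial dimensions
    inter = (p_hat * y_onehot).sum(dim=dims)
    dice = 1 - 2 * inter / (p_hat.sum(dim=dims) + y_onehot.sum(dim=dims) + eps)  # Eq. (4)
    ce = -(y_onehot * torch.log(p_hat + eps)).sum(dim=1).mean(dim=(1, 2, 3))     # Eq. (5)
    l_seg = 0.5 * (dice.mean() + ce.mean())           # Eq. (3)
    l_dist = F.mse_loss(d_hat, d_true)                # Eq. (6)
    return l_seg + alpha * l_dist                     # Eq. (2)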

B.3. Incorporation of Unlabeled Images

In this section, we discuss how OAK-SSL incorporates the unlabeled images $\{X_u\}_{u=1}^{U}$ in the training set. An unlabeled image $X_u$ is processed by the same model $f$ used for labeled images, outputting segmentation probability maps $\hat{p}_u$ and a predicted distance map $\hat{d}_u$. Given the absence of ground-truth labels and distance scores, we propose an unsupervised loss to leverage these unlabeled images. The loss is composed of two components, $\mathcal{L}_{confidence}$ and $\mathcal{L}_{stability}$, i.e.,

$$\mathcal{L}_u = \beta \mathcal{L}_{confidence} + \gamma \mathcal{L}_{stability}, \quad (7)$$

where $\beta$ and $\gamma$ serve as balancing hyperparameters. $\mathcal{L}_{confidence}$ aims to enhance the model's confidence in lesion segmentation, especially for voxels at anatomically plausible locations. The design of $\mathcal{L}_{confidence}$ involves several steps:

First, we transform the multi-label segmentation map $\hat{p}_u$ into a binary lesion segmentation map $\hat{p}_{les}$, where $\hat{p}_{les,j}$ denotes the predicted probability of voxel $j$ being 'lesion' in the unlabeled CBCT image. The model is encouraged to assign high $\hat{p}_{les,j}$ values to voxels it identifies as 'lesion' and low $\hat{p}_{les,j}$ to all other voxels. This differentiation helps avoid a scenario where the model indiscriminately predicts all voxels with uniform confidence. To achieve this, we define a binary lesion pseudo-label $\hat{y}_{les,j}$, which is one if voxel $j$ is predicted to be 'lesion' and zero otherwise. We further propose a modified cross-entropy loss $\mathcal{L}_{MCE}$ to align $\hat{p}_{les,j}$ with $\hat{y}_{les,j}$, i.e.,

$$\mathcal{L}_{MCE}(\hat{p}_{les,j}, \hat{y}_{les,j}) = \mathbb{1}\left(\max \hat{p}_{u,j} > \rho\right) \mathcal{L}_{CE}(\hat{p}_{les,j}, \hat{y}_{les,j}), \quad (8)$$

where $\mathbb{1}\left(\max \hat{p}_{u,j} > \rho\right)$ is an indicator function that is one if the highest segmentation probability among all the labels for voxel $j$ exceeds a threshold $\rho$, and zero otherwise. This indicator function serves to exclude voxels with unreliable pseudo-labels from affecting the training process. The threshold $\rho$ is dynamic, starting high (e.g., 0.95) early in training and gradually decreasing to a lower value (e.g., 0.80), allowing more voxels to contribute to training as the model's reliability improves.

Furthermore, we refine $\mathcal{L}_{MCE}$ to formulate $\mathcal{L}_{confidence}$ by incorporating the predicted distances, i.e.,

$$\mathcal{L}_{confidence}(\hat{p}_{les}, \hat{d}_u, \hat{y}_{les}) = \frac{\sum_{j=1}^{M} e^{-\hat{d}_{u,j}}\, \mathcal{L}_{MCE}(\hat{p}_{les,j}, \hat{y}_{les,j})}{\sum_{j=1}^{M} \mathbb{1}\left(\max \hat{p}_{u,j} > \rho\right)}. \quad (9)$$

The role of $e^{-\hat{d}_{u,j}}$ is to give higher weights to voxels closer to TB, thereby emphasizing confidence in predictions for voxels at anatomically plausible lesion sites. To ensure numerical stability, $\mathcal{L}_{confidence}$ is set to zero when the denominator becomes zero.
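Combining Eqs. (8)–(9), a sketch of $\mathcal{L}_{confidence}$ is given below; the lesion channel index is an assumption, and the dynamic threshold $\rho$ is supplied by the caller per the schedule above:

import torch

def confidence_loss(p_hat_u, d_hat_u, rho, eps=1e-8):
    # p_hat_u: (B, C, H, W, D) softmax maps for an unlabeled batch;
    # d_hat_u: (B, 1, H, W, D) predicted distance map. LESION is an assumed
    # channel index; the paper does not specify the label ordering.
    LESION = 1
    p_les = p_hat_u[:, LESION]                      # predicted lesion probability
    max_prob, argmax = p_hat_u.max(dim=1)           # most confident label per voxel
    y_les = (argmax == LESION).float()              # binary lesion pseudo-label
    mask = (max_prob > rho).float()                 # drop unreliable voxels, Eq. (8)
    ce = -(y_les * torch.log(p_les + eps)
           + (1 - y_les) * torch.log(1 - p_les + eps))
    w = torch.exp(-d_hat_u.squeeze(1))              # upweight voxels near TB, Eq. (9)
    denom = mask.sum()
    if denom == 0:                                  # loss defined as zero in this case
        return p_les.sum() * 0.0
    return (w * mask * ce).sum() / denom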

In addition to $\mathcal{L}_{confidence}$, the proposed unsupervised loss in Eq. (7) also includes $\mathcal{L}_{stability}$, which aims to ensure training stability by integrating the MT training strategy [2] with the dual-task architecture of OAK-SSL, i.e.,

$$\mathcal{L}_{stability}(f(X_u), t(X_u)) = \frac{1}{M}\sum_{j=1}^{M} \left[\mathrm{MSE}(\hat{p}_{u,j}, \hat{p}_{t,j}) + \mathrm{MSE}(\hat{d}_{u,j}, \hat{d}_{t,j})\right]. \quad (10)$$

Herein, $t$ denotes the teacher model for the OAK-SSL model $f$, which is considered the student. $\hat{p}_t$ and $\hat{d}_t$ are the segmentation probability maps and distance map predicted for $X_u$ by model $t$, respectively. $t$ shares the same architecture as $f$, as illustrated in Figure 2, and both models process $X_u$. $f$ is updated through stochastic gradient descent optimization, whereas $t$'s parameters $\theta_t$ are updated via the EMA of $f$'s parameters $\theta_f$. The update rule of $\theta_t$ is formulated as follows:

$$\theta_t \leftarrow m\,\theta_t + (1 - m)\,\theta_f, \quad (11)$$

where $m \in [0, 1)$ is the momentum coefficient. Following common practice [2], we set $m$ to a large value, specifically 0.95. $t$ effectively acts as an ensemble of $f$ across all training epochs. The loss function $\mathcal{L}_{stability}$ employs the MSE loss to foster consistency between the predictions of $t$ and $f$, thereby guiding $f$ to learn from a powerful model without the need to train additional advanced models.
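A sketch of the stability loss in Eq. (10) and the EMA update in Eq. (11) follows, assuming PyTorch modules with matching parameter order:

import torch
import torch.nn.functional as F

def stability_loss(student_out, teacher_out):
    # Eq. (10): MSE between student and teacher predictions, for both the
    # segmentation probability maps and the distance maps.
    p_s, d_s = student_out
    p_t, d_t = teacher_out
    return F.mse_loss(p_s, p_t.detach()) + F.mse_loss(d_s, d_t.detach())

@torch.no_grad()
def ema_update(teacher, student, m=0.95):
    # Eq. (11): theta_t <- m * theta_t + (1 - m) * theta_f.
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(m).add_(s_param, alpha=1 - m)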

B.4. Total Loss Function and Algorithm

The total loss function for OAK-SSL is composed of the supervised loss proposed in Sec. III-B.2, $\mathcal{L}_s$, and the unsupervised loss in Sec. III-B.3, $\mathcal{L}_u$:

$$\mathcal{L} = \mathcal{L}_s + \mathcal{L}_u. \quad (12)$$

Algorithm 1 provides the pseudocode of training the proposed OAK-SSL model.

Algorithm 1.

Oral-Anatomical Knowledge-informed Semi-Supervised Learning (OAK-SSL) for 3D dental CBCT segmentation

Input: Labeled dataset $\{(X_l, y_l, d_l)\}_{l=1}^{L}$ and unlabeled dataset $\{X_u\}_{u=1}^{U}$
Output: OAK-SSL model $f$
 1: Initialize model $f$ with random weights
 2: while training do
 3:  ▷ Compute supervised loss $\mathcal{L}_s$
 4:  Sample and augment a batch of labeled data $X_l$
 5:  Compute $\mathcal{L}_{seg}(\hat{p}_l, y_l)$
 6:  Compute $\mathcal{L}_{dist}(\hat{d}_l, d_l)$
 7:  Compute $\mathcal{L}_s$
 8:  ▷ Compute unsupervised loss $\mathcal{L}_u$
 9:  Sample and augment a batch of unlabeled data $X_u$
 10: Generate the binary lesion pseudo-label map ($\hat{y}_{les}$) and predicted distance-based weights $e^{-\hat{d}_u}$
 11: Compute $\mathcal{L}_{confidence}(\hat{p}_{les}, \hat{d}_u, \hat{y}_{les})$
 12: Compute $\mathcal{L}_{stability}(f(X_u), t(X_u))$
 13: Compute $\mathcal{L}_u$
 14: Compute $\mathcal{L} = \mathcal{L}_s + \mathcal{L}_u$
 15: Update $f$ by a gradient step on $\mathcal{L}$
 16: Update $t$ with an EMA of $f$
 17: end while
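For illustration, one training iteration of Algorithm 1 could be wired together as in the following sketch, reusing the loss and EMA helpers sketched in Sec. III-B (these helper names are ours, not the authors' code):

import torch

def train_step(student, teacher, labeled_batch, unlabeled_batch,
               optimizer, alpha, beta, gamma, rho):
    # Supervised path (Algorithm 1, steps 4-7).
    X_l, y_l, d_l = labeled_batch
    p_l, d_hat_l = student(X_l)
    loss_s = supervised_loss(p_l, y_l, d_hat_l, d_l, alpha)          # Eq. (2)
    # Unsupervised path (steps 9-13); the teacher runs without gradients.
    X_u = unlabeled_batch
    p_u, d_hat_u = student(X_u)
    with torch.no_grad():
        p_t, d_t = teacher(X_u)
    loss_u = (beta * confidence_loss(p_u, d_hat_u, rho)              # Eq. (9)
              + gamma * stability_loss((p_u, d_hat_u), (p_t, d_t)))  # Eq. (10)
    # Total loss and updates (steps 14-16).
    loss = loss_s + loss_u                                           # Eq. (12)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student)                                     # Eq. (11)
    return float(loss)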

B.5. Hyperparameter Tuning Strategy

In our OAK-SSL model, three main hyperparameters exist: $\alpha$, $\beta$, and $\gamma$. $\alpha$ controls the weight of $\mathcal{L}_{dist}$ within $\mathcal{L}_s$. Similarly, $\beta$ and $\gamma$ adjust the impact of $\mathcal{L}_{confidence}$ and $\mathcal{L}_{stability}$ within $\mathcal{L}_u$, respectively. It is advisable to gradually increase $\alpha$ to 1.0 as the segmentation layer acquires sufficient contextual information. This progression promotes effective dual-task learning. Similarly, the incremental adjustment of $\beta$ and $\gamma$ might be beneficial as $f$ starts to yield more reliable pseudo-labels and teacher models. We found that a smooth warmup that gradually increases all three hyperparameters during the first 500 epochs works well in our experiments. Full details of the warmup schedule can be found in Appendix A. Additionally, we scaled $\gamma$ by a constant so that $\mathcal{L}_{stability}$ has a magnitude comparable to the other loss terms.

IV. Experiments

A. Data Description

Our experiment utilized 145 3D CBCT images collected from the Department of Endodontics, Penn Dental Medicine, University of Pennsylvania. The CBCT images had been acquired with a Morita Veraviewepocs 3D F40 (J. Morita Mfg, Kyoto, Japan) featuring a 40 mm field of view and a voxel size of 0.125 mm, with settings of 90 kVp at 3 mA. All images include at least one root with lesions. These images have a size of 341 × 341 × 341 voxels, with a few exceptions. For the few images with slightly different sizes, we kept their original sizes, as our method works on cropped patches. To create labeled images, manual segmentation was obtained using ITK-SNAP, a specialized software tool that provides semi-automatic segmentation. It was then further reviewed and refined by dental and oral radiology experts. The manual segmentation includes five categories: 'lesion', 'materials', 'bone', 'teeth', and 'background'.

Through our experiments, we will demonstrate that OAK-SSL can precisely segment small lesions without training on labeled data that includes such small lesions. For this purpose, we first sorted the labeled samples based on the total lesion voxels within an image in ascending order. The 50 samples with the fewest lesion voxels were categorized into the ‘small lesion’ group. The remaining samples were classified as the ‘regular lesion’ group. The training set comprises 20 labeled samples from the regular lesion group and 80 unlabeled samples, among which 20 are from the small lesion group and 60 from the regular lesion group. For validation, we used 15 labeled samples from the regular lesion group. The test dataset includes 30 labeled samples, all from the small lesion group.

We customized preprocessing methods by dataset. For the labeled data in the training and validation datasets, we first normalized image intensities to a range between 0 and 1 using min-max scaling. This step was followed by normalization based on the intensity statistics of the images. Then, we selectively cropped patches containing lesions, identified through manual segmentation, to a size of 128 × 128 × 128 voxels. To augment the data, we applied various transformations such as random rotations, zooming, and adding noise to simulate different imaging conditions. For the unlabeled data in the training dataset, which does not have manual segmentation, we focused on cropping patches around the teeth, where lesions are most likely to be found. This decision was influenced by the fact that lesions are generally indistinguishable from other structures based on intensity alone. Therefore, we used the mean tooth intensity from the labeled data in the training dataset as a reference for more precise cropping. This process excluded regions darker than the teeth. The test dataset underwent the same normalization process as the labeled data in the training dataset but was exempt from other transformations. Instead of cropping the test images, we employed a sliding-window approach over each test image, using patches of 128 × 128 × 128 voxels during inference.
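For illustration, the sliding-window inference could be implemented as in the sketch below; the stride, overlap averaging, and handling of volume edges are our assumptions, since the paper only specifies the 128 × 128 × 128 patch size:

import torch

def sliding_window_predict(model, volume, patch=128, stride=64, num_classes=5):
    # volume: (1, 1, H, W, D) normalized CBCT tensor. Overlapping patch
    # predictions are averaged; remainder voxels at the far edges are
    # omitted here for brevity.
    _, _, H, W, D = volume.shape
    probs = torch.zeros(1, num_classes, H, W, D)
    counts = torch.zeros(1, 1, H, W, D)
    model.eval()
    with torch.no_grad():
        for x in range(0, H - patch + 1, stride):
            for y in range(0, W - patch + 1, stride):
                for z in range(0, D - patch + 1, stride):
                    crop = volume[..., x:x+patch, y:y+patch, z:z+patch]
                    p_hat, _ = model(crop)  # dual-task model returns (p_hat, d_hat)
                    probs[..., x:x+patch, y:y+patch, z:z+patch] += p_hat
                    counts[..., x:x+patch, y:y+patch, z:z+patch] += 1
    return probs / counts.clamp(min=1)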

B. Competing Methods and Performance Metrics

To set up baseline performances, we trained a supervised U-net model without a distance prediction layer (i.e., no knowledge integration) with 20 labeled samples, denoted as ‘Supervised’, to evaluate the effectiveness of SSL. Furthermore, to demonstrate the effectiveness of OAK-SSL over other SSL models, we selected four representative SSL models for comparison. These models can incorporate unlabeled images:

  • Π-model [46]: it incorporates unlabeled images by encouraging consistent predictions for these images under random input perturbations.

  • MT [2]: it integrates unlabeled images by ensuring consistency in the predictions made by both the teacher and student models for those images.

  • UA-MT [47]: it incorporates unlabeled images by promoting consistency between the predictions of student and teacher models, while simultaneously filtering out unreliable predictions through uncertainty estimation with Monte Carlo dropout.

  • SASSNet [10]: it integrates unlabeled images by enforcing consistency between the predictions for labeled and unlabeled images through adversarial loss.

Segmentation performance was measured using the Dice score, which evaluates the overlap between the predicted segmentation map and the manual segmentation. The Dice score is calculated as follows:

$$\mathrm{Dice}(A, B) = \frac{2|A \cap B|}{|A| + |B|}. \quad (13)$$

A Dice score closer to 1 indicates higher similarity with the manual segmentation, whereas a score closer to 0 indicates inaccurate segmentation. Additionally, we computed the per-root lesion detection accuracy [7]. We counted a true positive (TP) when the model correctly identified a lesion in a tooth root where a true lesion exists, and a false negative (FN) when it failed to detect one. If the model predicted a lesion in the root of a healthy tooth, it was classified as a false positive (FP). From these TP, FP, and FN counts, we calculated precision and recall for each image. Furthermore, we also reported the number of cases where the model predicted a lesion at an anatomically impossible location, specifically far from any tooth root, such as within the background; each such prediction was counted as one case. We conducted one-tailed paired t-tests to confirm that the Dice scores and detection accuracy of OAK-SSL were significantly higher than those of competing methods by comparing p-values. To address the issue of multiple comparisons, we applied the Bonferroni correction [48].
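For a single binary label, the Dice score in Eq. (13) can be computed as in this brief sketch:

import numpy as np

def dice_score(pred_mask, true_mask):
    # Eq. (13): Dice(A, B) = 2|A ∩ B| / (|A| + |B|) for boolean voxel masks.
    intersection = np.logical_and(pred_mask, true_mask).sum()
    total = pred_mask.sum() + true_mask.sum()
    return 2.0 * float(intersection) / float(total) if total > 0 else 1.0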

C. Training Process

For all models, the training batch size for both the labeled and unlabeled datasets was set to two. We employed the AdamW optimizer [49] with a learning rate of 0.001 and a weight decay of 0.0001. All models were trained for 1000 epochs. We applied an early stopping strategy to find the optimal model: if the validation loss did not improve over three consecutive checks, performed every 100 epochs, we selected the corresponding best model to evaluate its test performance.

D. Experimental Results

Table I presents the average test Dice scores and corresponding p-values for each model. As illustrated in Table I, OAK-SSL achieved the highest Dice scores across all class labels compared to competing methods, except for materials. Among all labels, the lesion Dice score demonstrated the largest difference between OAK-SSL and the other methods. This superiority highlights that OAK-SSL can effectively segment small lesions even when the training set does not include labeled images with small lesions. This robust performance can be attributed to the integration of the oral-anatomical knowledge regarding anatomically plausible lesion locations. Additionally, the teeth Dice score showed the second-largest difference, indicating the effectiveness of OAK-SSL in teeth segmentation, which could be facilitated by the identification of the TB through the distance prediction layer. The supervised U-net model exhibited the lowest Dice scores across all labels, showing the adverse effects of insufficient labeled data on U-net's performance. This emphasizes the importance of incorporating unlabeled data for effective CBCT segmentation. Among the SSL methods, Π-model had the lowest lesion Dice score. The performance degradation observed in Π-model implies that its consistency regularization strategy, which encourages consistent predictions under minor input perturbations, may inadvertently hinder the accurate representation and segmentation of small lesions, as small lesions are particularly sensitive to such perturbations [50]. While MT outperformed Π-model, SASSNet and UA-MT showed better Dice scores than MT. This suggests that including unreliable predictions during training may adversely impact model learning.

TABLE I.

Average test Dice scores and p-values for comparing each model with OAK-SSL. The highest performances are highlighted in bold. The p-values marked with an asterisk (*) are considered statistically significant after Bonferroni correction for multiple comparisons.

Model Lesion Material Bone Teeth Background
Dice Supervised 0.215 0.578 0.556 0.463 0.955
Π-model 0.414 0.786 0.842 0.790 0.983
MT 0.486 0.772 0.825 0.762 0.979
SASSNet 0.511 0.795 0.846 0.798 0.983
UA-MT 0.588 0.795 0.856 0.821 0.985
OAK-SSL 0.647 0.792 0.861 0.834 0.986
p-value Supervised vs. OAK-SSL <0.001* <0.001* <0.001* <0.001* <0.001*
Π-model vs. OAK-SSL <0.001* 0.316 <0.001* <0.001* <0.001*
MT vs. OAK-SSL <0.001* 0.103 <0.001* <0.001* <0.001*
SASSNet vs. OAK-SSL <0.001* 0.585 <0.001* <0.001* <0.001*
UA-MT vs. OAK-SSL 0.005* 0.617 0.047 <0.001* 0.020

An interesting observation is that OAK-SSL exhibited a significant advantage over the competing methods in segmenting smaller lesions. We categorized the 30 test images into groups based on the total number of lesion voxels in each image. The 'Top 10 smallest' group consists of the images with the fewest lesion voxels, up to the 10th fewest count. The 'Top 20 smallest' group extends this range to include images with lesion voxel counts up to the 20th fewest. The 'Top 30 smallest' group encompasses all 30 images, i.e., the entire test dataset. This grouping allows us to analyze and compare lesion segmentation performance across images with varying total lesion voxels. Table II presents the average test Dice scores for each model across the different groups. We noted that as the total lesion voxels increased from the 'Top 10 smallest' group to the 'Top 30 smallest' group, the performance of all models improved, suggesting that larger lesions are easier to segment. OAK-SSL consistently showed the highest lesion Dice scores across all groups. In the 'Top 10 smallest' group, the difference in lesion Dice scores between OAK-SSL and the supervised model was the largest, highlighting the supervised model's inability to segment small lesions. The performance gap between OAK-SSL and the supervised model remained consistent across all groups, while the gap between OAK-SSL and the SSL models Π-model, MT, and SASSNet decreased as the total lesion voxels in an image increased. Π-model, MT, and SASSNet showed their largest performance differences with OAK-SSL in the 'Top 10 smallest' group, with differences of 0.351, 0.215, and 0.191, respectively. These differences narrowed in the 'Top 20 smallest' and 'Top 30 smallest' groups, which demonstrates that the competing methods struggled to segment small lesions. On the other hand, the performance gap between UA-MT and OAK-SSL remained consistent across groups, with the largest difference observed in the 'Top 20 smallest' group.

TABLE II.

Average test Dice scores for each group, where ‘Top n smallest’ includes images with lesion voxel counts up to the nth smallest. The highest performances are highlighted in bold.

Model Top 10 smallest Top 20 smallest Top 30 smallest
Supervised 0.080 0.205 0.215
Π-model 0.181 0.389 0.414
MT 0.317 0.464 0.486
SASSNet 0.341 0.514 0.511
UA-MT 0.474 0.556 0.588
OAK-SSL 0.532 0.640 0.647

In Figure 5, we provide 2D visualizations of the segmentation maps predicted by all models except the supervised model, across four different CBCT test images. Figures 5(a) and 5(b) display cases with smaller lesions in the sagittal view, while Figures 5(c) and 5(d) show cases with comparatively larger lesions in the test set, in axial and sagittal views, respectively. In Figure 5(a), only OAK-SSL and UA-MT accurately segmented the small lesions, while the other competing methods incorrectly identified them as background. As illustrated in Figure 5(b), Π-model failed to segment both lesions, whereas MT detected the upper lesion but missed the lower one. Although UA-MT successfully segmented the actual lesions in Figure 5(c), it also erroneously predicted lesions at anatomically implausible locations, specifically in the background and within the bones. In Figure 5(d), all models except Π-model detected both lesions. Among these models, OAK-SSL most precisely captured both the size and shape of the lesions. This result suggests that OAK-SSL could provide more accurate information to clinicians regarding lesion size and shape, which is crucial for clinical decision-making. Unlike the competing methods that erroneously segmented lesions at anatomically implausible locations, OAK-SSL identified lesions at anatomically plausible sites, demonstrating its effectiveness in discriminating between anatomically plausible and implausible lesion locations.

Fig. 5.

Examples of segmentation maps: Each column represents segmentation maps from the same model across different cases. Rows (a) and (b) show segmentations for images with smaller lesions in the test set, while rows (c) and (d) display segmentations for images with comparatively larger lesions in the test set. Images in rows (a), (b), and (d) are presented in sagittal view, whereas images in row (c) are depicted in axial view.

Average detection accuracy and the total number of cases where the models segmented lesions at anatomically implausible locations are reported in Table III. Notably, OAK-SSL exhibited the highest precision and recall, implying fewer FPs and FNs, with only one reported case of predicting a lesion at an implausible location. In contrast, Π-model showed lower recall, while MT had lower precision with numerous predictions at anatomically impossible locations. These results suggest that Π-model tended to under-segment, resulting in many FNs, whereas MT was prone to over-segment, leading to many FPs. SASSNet and UA-MT also demonstrated comparatively lower precision and generated predictions at anatomically implausible locations, which underscores the importance of incorporating knowledge about plausible lesion locations.

TABLE III.

Average test detection accuracy and total cases of anatomically impossible lesion predictions per patient for each model. The highest performances are in bold. The p-values marked with an asterisk (*) are considered statistically significant after Bonferroni correction for multiple comparisons.

Metric Model Precision Recall Impossible Location
Detection Accuracy Π-model 0.642 0.617 6
MT 0.493 0.839 35
SASSNet 0.580 0.917 20
UA-MT 0.616 0.906 14
OAK-SSL 0.791 0.933 1
p-value Π-model vs. OAK-SSL 0.040 <0.001* -
MT vs. OAK-SSL <0.001* 0.060 -
SASSNet vs. OAK-SSL <0.001* 0.373 -
UA-MT vs. OAK-SSL <0.001* 0.242 -

E. Ablation Studies

In this section, we focus on comparing models on the Dice scores of 'lesion' and 'teeth', as OAK-SSL shows the largest performance improvements for these two labels in Table I.

E.1. Evaluation of Including Unlabeled Images in OAK-SSL

We evaluate the significance of incorporating unlabeled images in OAK-SSL by comparing it with OAK-SSL-supervised. The latter utilizes only labeled images and thus incorporates solely the supervised loss $\mathcal{L}_s$ in Eq. (2). Figure 6(a) shows that OAK-SSL significantly outperformed OAK-SSL-supervised in terms of Dice scores for both lesion and teeth segmentation. This superiority is attributed to the unsupervised loss $\mathcal{L}_u$, which facilitates the model's ability to learn informative features for segmenting lesions and teeth. The results underscore the efficacy of leveraging unlabeled images for effective CBCT image segmentation.

Fig. 6.

Average test Dice scores for lesion and teeth segmentation per patient, comparing the performance of OAK-SSL model configurations in an ablation study. The highest performances are highlighted in bold and p-values are provided for statistical significance. (a) Comparison between OAK-SSL and OAK-SSL-supervised, where OAK-SSL-supervised removes the unsupervised loss $\mathcal{L}_u$ from OAK-SSL. (b) Comparison between OAK-SSL and OAK-SSL-constant, where OAK-SSL-constant removes the knowledge-based weights in $\mathcal{L}_u$.

E.2. Evaluation of the Unsupervised Loss in OAK-SSL

As demonstrated in the previous section, the inclusion of unlabeled images, and consequently the unsupervised loss $\mathcal{L}_u$, in OAK-SSL is important. We now focus on evaluating an important design aspect of $\mathcal{L}_u$: the inclusion of $e^{-\hat{d}_{u,j}}$ in Eq. (9). The role of $e^{-\hat{d}_{u,j}}$ is to assign greater weights to voxels nearer to TB, thereby reinforcing prediction confidence at anatomically plausible lesion locations. We compared OAK-SSL with OAK-SSL-constant, where the latter employs a uniform weight irrespective of the voxels' proximity to TB, to assess the impact of this weighted approach. Figure 6(b) presents the Dice scores for lesion and teeth segmentation achieved by OAK-SSL and OAK-SSL-constant. OAK-SSL yielded significantly better lesion and teeth Dice scores than OAK-SSL-constant. The knowledge-based weighting strategy for $\mathcal{L}_u$ considerably improved the lesion Dice score, which proved its efficacy in directing the model more precisely to potential lesion sites when employing unlabeled data. This approach compensates for the lack of manual segmentation, validating the advantage of knowledge-based weights over constant ones.

F. Explanation of OAK-SSL’s High Performance

Gradient-weighted Class Activation Mapping (Grad-CAM) is a visualization technique that highlights areas within an image that significantly influence the model's decision for a specific class. It operates by computing the gradients of the specific class with respect to the feature maps of the final convolutional layer [51]. Grad-CAM can be applied to our segmentation task to reveal the significance of each voxel for the model's lesion segmentation [52]. Higher Grad-CAM values indicate that the model considers the voxel more significant for lesion segmentation. Grad-CAM visualizations for two test samples across the models are depicted in Figure 7. All competing methods referred to non-lesion areas when segmenting the lesions, frequently resulting in FPs [53]. In contrast, OAK-SSL focused on actual lesion areas and retained high Grad-CAM values at their boundaries, effectively ignoring non-lesion areas in its lesion segmentation. This Grad-CAM analysis demonstrates OAK-SSL's strength in differentiating between lesion and non-lesion areas compared to the other competing methods [54]. The incorporation of oral-anatomical knowledge enables OAK-SSL to concentrate on relevant areas for lesion segmentation, which is likely to generate more focused gradients. These focused gradients lead to more distinctive lesion representations [55], enhancing the model's robustness against perturbations.

Fig. 7.

Grad-CAM visualizations for lesion segmentation of each model: the Grad-CAM values illustrate which regions within the CBCT images most influence each model's lesion segmentation.

V. Conclusion

Automating the segmentation of CBCT images significantly eases clinicians' workload and supports their decision-making. In this study, we proposed OAK-SSL, a model that integrates oral-anatomical knowledge into the deep learning design. We demonstrated the superior performance of OAK-SSL compared to existing supervised and semi-supervised learning algorithms. OAK-SSL excels at identifying small lesions without requiring labeled examples of such lesions. Our proposed method could significantly reduce the need for manual segmentation while still ensuring accurate segmentation.

While OAK-SSL showed promising results, some limitations remain to be addressed in future research. First, the knowledge integrated into OAK-SSL focused primarily on the locations of periapical lesions, which provides only partial information to AI models. Enriching the model with additional lesion attributes, such as shape and size, could augment the existing knowledge about location; for example, incorporating the domain knowledge that periapical lesions tend to be circular in shape [56] could lead to more precise lesion segmentation. Second, OAK-SSL treats various periapical lesion types uniformly, yet distinguishing among lesion types, such as radicular cysts and periapical granulomas, is crucial for determining appropriate treatments. Advancing OAK-SSL to facilitate differential diagnosis would not only improve its diagnostic capabilities but also aid in selecting the most effective treatment plan for each specific lesion type. Third, our method was developed and validated using data from a single clinical site and imaging machine. Distribution shifts may occur when applying the model to CBCT images from different clinical sites, imaging machines, or patient populations, and future research could investigate techniques to make the model more robust to such shifts. Last but not least, although our method incorporates knowledge about periapical lesions, it could also include information on other odontogenic lesions; this additional incorporation could aid in detecting malignant lesions, where early detection is imperative.

ACKNOWLEDGMENTS

This study was supported by NIH grant DE031485.

Appendix

A. Hyperparameter Tuning

To gradually increase the values of α, β, and γ, we use a carefully designed exponential function (see Eqs. (14)–(16)). The rationale behind this function is to ensure a smooth and controlled increase of α, β, and γ over time. The exponential form yields a gradual increase whose rate is regulated by the epoch progression. The scaling factor of −5 controls the steepness of the curve, preventing the increase from being too abrupt or too slow. Additionally, the epoch normalization term ensures that α, β, and γ increase steadily, allowing the corresponding loss to be reflected in the overall loss function once the model is mature enough to deliver reliable loss values. The square term inside the exponential introduces non-linearity, making the increase smoother and more controlled as epochs advance. Finally, since the stability loss, which is weighted by γ, has a smaller scale than the other losses, we multiply the exponential by a coefficient of 8 for γ to ensure that it increases to a more substantial level.

$\alpha = e^{-5\left(1 - \min\left(\frac{\mathrm{epoch} - 10}{500},\, 1\right)\right)^{2}}$ (14)
$\beta = e^{-5\left(1 - \min\left(\frac{\mathrm{epoch} - 50}{500},\, 1\right)\right)^{2}}$ (15)
$\gamma = 8 \times e^{-5\left(1 - \min\left(\frac{\mathrm{epoch} - 50}{500},\, 1\right)\right)^{2}}$ (16)
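As a minimal illustration of this schedule, the sketch below evaluates Eqs. (14)–(16) as a function of the training epoch; the helper name ramp_up is ours, not from the paper's implementation.

    import math

    def ramp_up(epoch, delay, length=500, coeff=1.0):
        # coeff * exp(-5 * (1 - min((epoch - delay) / length, 1))^2), per Eqs. (14)-(16).
        t = min((epoch - delay) / length, 1.0)
        return coeff * math.exp(-5.0 * (1.0 - t) ** 2)

    for epoch in (0, 10, 260, 510, 560):
        alpha = ramp_up(epoch, delay=10)             # Eq. (14)
        beta  = ramp_up(epoch, delay=50)             # Eq. (15)
        gamma = ramp_up(epoch, delay=50, coeff=8.0)  # Eq. (16)
        print(epoch, round(alpha, 4), round(beta, 4), round(gamma, 4))

Under this schedule, each weight remains near coeff × e^(−5) ≈ 0.0067 × coeff until its delay epoch (10 for α, 50 for β and γ) and then rises smoothly to its maximum (1 for α and β, 8 for γ) over the following 500 epochs.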

Contributor Information

Yeonju Lee, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.

Min Gu Kwak, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.

Rui Qi Chen, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.

Hao Yan, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ 85281, USA.

Mel Mupparapu, School of Dental Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.

Fleming Lure, MS Technologies Corporation, Rockville, MD 20850, USA.

Frank C. Setzer, School of Dental Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.

Jing Li, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.

References

[1] Orhan K, Bayrakdar I, Ezhov M, Kravtsov A, and Özyürek T, "Evaluation of artificial intelligence for detecting periapical pathosis on cone-beam computed tomography scans," International Endodontic Journal, vol. 53, no. 5, pp. 680–689, 2020.
[2] Tarvainen A and Valpola H, "Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results," Advances in Neural Information Processing Systems, vol. 30, 2017.
[3] Miyato T, Maeda S.-i., Koyama M, and Ishii S, "Virtual adversarial training: A regularization method for supervised and semi-supervised learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 8, pp. 1979–1993, 2018.
[4] Kervadec H, Dolz J, Granger E, and Ben Ayed I, "Curriculum semi-supervised segmentation," in Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part II. Springer, 2019, pp. 568–576.
[5] Jiao R, Zhang Y, Ding L, Xue B, Zhang J, Cai R, and Jin C, "Learning with limited annotations: A survey on deep semi-supervised learning for medical image segmentation," Computers in Biology and Medicine, p. 107840, 2023.
[6] Setzer FC, Shi KJ, Zhang Z, Yan H, Yoon H, Mupparapu M, and Li J, "Artificial intelligence for the computer-aided detection of periapical lesions in cone-beam computed tomographic images," Journal of Endodontics, vol. 46, no. 7, pp. 987–993, 2020.
[7] Zheng Z, Yan H, Setzer FC, Shi KJ, Mupparapu M, and Li J, "Anatomically constrained deep learning for automating dental CBCT segmentation and lesion detection," IEEE Transactions on Automation Science and Engineering, vol. 18, no. 2, pp. 603–614, 2020.
[8] Li Z, Kamnitsas K, and Glocker B, "Analyzing overfitting under class imbalance in neural networks for image segmentation," IEEE Transactions on Medical Imaging, vol. 40, no. 3, pp. 1065–1077, 2020.
[9] Kim D, Ku H, Nam T, Yoon T-C, Lee C-Y, and Kim E, "Influence of size and volume of periapical lesions on the outcome of endodontic microsurgery: 3-dimensional analysis using cone-beam computed tomography," Journal of Endodontics, vol. 42, no. 8, pp. 1196–1201, 2016.
[10] Li S, Zhang C, and He X, "Shape-aware semi-supervised 3D semantic segmentation for medical images," in Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part I. Springer, 2020, pp. 552–561.
[11] Miki Y, Muramatsu C, Hayashi T, Zhou X, Hara T, Katsumata A, and Fujita H, "Classification of teeth in cone-beam CT using deep convolutional neural network," Computers in Biology and Medicine, vol. 80, pp. 24–29, 2017.
[12] Esmaeilyfard R, Bonyadifard H, and Paknahad M, "Dental caries detection and classification in CBCT images using deep learning," International Dental Journal, 2023.
[13] Ver Berne J, Saadi SB, Politis C, and Jacobs R, "A deep learning approach for radiological detection and classification of radicular cysts and periapical granulomas," Journal of Dentistry, p. 104581, 2023.
[14] Jang TJ, Kim KC, Cho HC, and Seo JK, "A fully automated method for 3D individual tooth identification and segmentation in dental CBCT," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 10, pp. 6562–6568, 2021.
[15] Kirnbauer B, Hadzic A, Jakse N, Bischof H, and Stern D, "Automatic detection of periapical osteolytic lesions on cone-beam computed tomography using deep convolutional neuronal networks," Journal of Endodontics, vol. 48, no. 11, pp. 1434–1440, 2022.
[16] Jacobs R, "Dental cone beam CT and its justified use in oral health care," Journal of the Belgian Society of Radiology, vol. 94, no. 5, pp. 254–265, 2011.
[17] Zhai X, Oliver A, Kolesnikov A, and Beyer L, "S4L: Self-supervised semi-supervised learning," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1476–1485.
[18] Yang Z, Yu H, He Y, Sun W, Mao Z-H, and Mian A, "Fully convolutional network-based self-supervised learning for semantic segmentation," IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 1, pp. 132–142, 2022.
[19] Felfeliyan B, Forkert ND, Hareendranathan A, Cornel D, Zhou Y, Kuntze G, Jaremko JL, and Ronsky JL, "Self-supervised-RCNN for medical image segmentation with limited data annotation," Computerized Medical Imaging and Graphics, vol. 109, p. 102297, 2023.
[20] Pan H, Guo Y, Deng Q, Yang H, Chen J, and Chen Y, "Improving fine-tuning of self-supervised models with contrastive initialization," Neural Networks, vol. 159, pp. 198–207, 2023.
[21] Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L, Xiao T, Whitehead S, Berg AC, Lo W-Y et al., "Segment anything," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4015–4026.
[22] Ma J, He Y, Li F, Han L, You C, and Wang B, "Segment anything in medical images," Nature Communications, vol. 15, no. 1, p. 654, 2024.
[23] Zhu Y, Xiong C, Zhao H, and Yao Y, "SAM-Att: A prompt-free SAM-related model with an attention module for automatic segmentation of the left ventricle in echocardiography," IEEE Access, 2024.
[24] Ouali Y, Hudelot C, and Tami M, "An overview of deep semi-supervised learning," arXiv preprint arXiv:2006.05278, 2020.
[25] Olsson V, Tranheden W, Pinto J, and Svensson L, "ClassMix: Segmentation-based data augmentation for semi-supervised learning," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021, pp. 1369–1378.
[26] Cui W, Liu Y, Li Y, Guo M, Li Y, Li X, Wang T, Zeng X, and Ye C, "Semi-supervised brain lesion segmentation with an adapted mean teacher model," in Information Processing in Medical Imaging: 26th International Conference, IPMI 2019, Hong Kong, China, June 2–7, 2019, Proceedings. Springer, 2019, pp. 554–565.
[27] Hung W-C, Tsai Y-H, Liou Y-T, Lin Y-Y, and Yang M-H, "Adversarial learning for semi-supervised semantic segmentation," arXiv preprint arXiv:1802.07934, 2018.
[28] Souly N, Spampinato C, and Shah M, "Semi supervised semantic segmentation using generative adversarial network," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 5688–5696.
[29] Lee D-H et al., "Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks," in Workshop on Challenges in Representation Learning, ICML, vol. 3, no. 2. Atlanta, 2013, p. 896.
[30] Wang Y, Wang H, Shen Y, Fei J, Li W, Jin G, Wu L, Zhao R, and Le X, "Semi-supervised semantic segmentation using unreliable pseudo-labels," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 4248–4257.
[31] Basak H and Yin Z, "Pseudo-label guided contrastive learning for semi-supervised medical image segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 19786–19797.
[32] Yang F, Wu K, Zhang S, Jiang G, Liu Y, Zheng F, Zhang W, Wang C, and Zeng L, "Class-aware contrastive semi-supervised learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 14421–14430.
[33] Thompson BH, Di Caterina G, and Voisey JP, "Pseudo-label refinement using superpixels for semi-supervised brain tumour segmentation," in 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI). IEEE, 2022, pp. 1–5.
[34] Berthelot D, Carlini N, Goodfellow I, Papernot N, Oliver A, and Raffel CA, "MixMatch: A holistic approach to semi-supervised learning," Advances in Neural Information Processing Systems, vol. 32, 2019.
[35] Sohn K, Berthelot D, Carlini N, Zhang Z, Zhang H, Raffel CA, Cubuk ED, Kurakin A, and Li C-L, "FixMatch: Simplifying semi-supervised learning with consistency and confidence," Advances in Neural Information Processing Systems, vol. 33, pp. 596–608, 2020.
[36] Yang L, Qi L, Feng L, Zhang W, and Shi Y, "Revisiting weak-to-strong consistency in semi-supervised semantic segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 7236–7246.
[37] Chen X, Yuan Y, Zeng G, and Wang J, "Semi-supervised semantic segmentation with cross pseudo supervision," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 2613–2622.
[38] Lu L, Yin M, Fu L, and Yang F, "Uncertainty-aware pseudo-label and consistency for semi-supervised medical image segmentation," Biomedical Signal Processing and Control, vol. 79, p. 104203, 2023.
[39] Xie X, Niu J, Liu X, Chen Z, Tang S, and Yu S, "A survey on incorporating domain knowledge into deep learning for medical image analysis," Medical Image Analysis, vol. 69, p. 101985, 2021.
[40] Mirikharaji Z and Hamarneh G, "Star shape prior in fully convolutional networks for skin lesion segmentation," in Medical Image Computing and Computer Assisted Intervention–MICCAI 2018: 21st International Conference, Granada, Spain, September 16–20, 2018, Proceedings, Part IV. Springer, 2018, pp. 737–745.
[41] Xue Y, Tang H, Qiao Z, Gong G, Yin Y, Qian Z, Huang C, Fan W, and Huang X, "Shape-aware organ segmentation by predicting signed distance maps," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 07, 2020, pp. 12565–12572.
[42] Gupta S, Hu X, Kaan J, Jin M, Mpoy M, Chung K, Singh G, Saltz M, Kurc T, Saltz J et al., "Learning topological interactions for multi-class medical image segmentation," in European Conference on Computer Vision. Springer, 2022, pp. 701–718.
[43] Azad R, Aghdam EK, Rauland A, Jia Y, Avval AH, Bozorgpour A, Karimijafarbigloo S, Cohen JP, Adeli E, and Merhof D, "Medical image segmentation review: The success of U-Net," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
[44] Chen RQ, Lee Y, Yan H, Mupparapu M, Lure F, Li J, and Setzer FC, "Leveraging pre-trained transformers for efficient segmentation and lesion detection in cone-beam CT scans," Journal of Endodontics, 2024.
[45] Moran M, Faria M, Bastos L, Giraldi G, and Conci A, "Combining image processing and artificial intelligence for dental image analysis: Trends, challenges, and applications," Trends and Advancements of Image Processing and Its Applications, pp. 75–105, 2022.
[46] Laine S and Aila T, "Temporal ensembling for semi-supervised learning," arXiv preprint arXiv:1610.02242, 2016.
[47] Yu L, Wang S, Li X, Fu C-W, and Heng P-A, "Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation," in Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part II. Springer, 2019, pp. 605–613.
[48] Armstrong RA, "When to use the Bonferroni correction," Ophthalmic and Physiological Optics, vol. 34, no. 5, pp. 502–508, 2014.
[49] Loshchilov I and Hutter F, "Decoupled weight decay regularization," arXiv preprint arXiv:1711.05101, 2017.
[50] Tramèr F, Behrmann J, Carlini N, Papernot N, and Jacobsen J-H, "Fundamental tradeoffs between invariance and sensitivity to adversarial perturbations," in International Conference on Machine Learning. PMLR, 2020, pp. 9561–9571.
[51] Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, and Batra D, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618–626.
[52] Vinogradova K, Dibrov A, and Myers G, "Towards interpretable semantic segmentation via gradient-weighted class activation mapping (student abstract)," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 10, 2020, pp. 13943–13944.
[53] Zhang Y, Hong D, McClement D, Oladosu O, Pridham G, and Slaney G, "Grad-CAM helps interpret the deep learning models trained to classify multiple sclerosis types using clinical brain magnetic resonance imaging," Journal of Neuroscience Methods, vol. 353, p. 109098, 2021.
[54] Ghosal S and Shah P, "Interpretable and synergistic deep learning for visual explanation and statistical estimations of segmentation of disease features from medical images," arXiv preprint arXiv:2011.05791, 2020.
[55] Wu J, Fan H, Zhang X, Lin S, and Li Z, "Semi-supervised semantic segmentation via entropy minimization," in 2021 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2021, pp. 1–6.
[56] Boubaris M, Cameron A, Love R, and George R, "Sphericity of periapical lesion and its relation to the novel CBCT periapical volume index," Journal of Endodontics, vol. 48, no. 11, pp. 1395–1399, 2022.
