Explainable one-class feature extraction by adaptive resonance for anomaly detection in quality assurance

Hootan Kamran; Dionne Aleman; Chris McIntosh; Tom Purdie

doi:10.1371/journal.pone.0321968

. 2025 Jun 10;20(6):e0321968. doi: 10.1371/journal.pone.0321968

Explainable one-class feature extraction by adaptive resonance for anomaly detection in quality assurance

Hootan Kamran ¹, Dionne Aleman ^1,^*, Chris McIntosh ², Tom Purdie ²

Editor: Zeyar Aung³

PMCID: PMC12151409 PMID: 40493678

Abstract

In this study, we address the inherent challenges in radiotherapy (RT) plan quality assessment (QA). RT, a prevalent cancer treatment, utilizes high-energy beams to target tumors while sparing adjacent healthy tissues. Typically, an RT plan is refined through several QA cycles by experts to ensure it meets clinical and operational objectives before being considered safe for patient treatment. This iterative process tends to eliminate unacceptable plans, creating a significant class imbalance problem for machine learning efforts aimed at automating the classification of RT plans as either acceptable or not. The complexity of RT treatment plans, coupled with the aforementioned class imbalance issue, introduces a generalization problem that significantly hinders the efficacy of traditional binary classification approaches. We introduce a novel one-class classification framework, using an adaptive neural network architecture, that outperforms both traditional binary and standard one-class classification methods in this imbalanced and complex context, despite the inherent disadvantage of not learning from unacceptable plans. Unlike its predecessors, our method enhances anomaly detection for RT plan QA without compromising on interpretability—a critical feature in healthcare applications, where understanding and trust in automated decisions are paramount. By offering clear insights into decision-making processes, our method allows healthcare professionals to quickly identify and address specific deficiencies in RT plans deemed unacceptable, thereby streamlining the QA process and enhancing patient care efficiency and safety.

Author summary

Neural Networks, Adaptive Resonance Theory, Anomaly Detection, Outlier Detection, One-Class Classification, Quality Assurance, Explainable, Interpretable

1 Introduction

An estimated 225,800 new cancer cases and 83,300 cancer deaths occurred in Canada in 2020, about 40% of which will be treated with external beam radiation therapy [1]. RT treatment plans must undergo a comprehensive quality assurance (QA) process that requires highly-skilled interdisciplinary man-hours. Machine learning algorithms can help automate QA by analyzing features that led to the creation of acceptable and unacceptable plans in the past [2]. However, a comprehensive RT QA requires machine learning on high-dimensional datasets that store information about the patient’s anatomy, the dose volumes, and the machine deliverability. On the other hand, real clinical datasets have a small number of plans (because of inter-institutional variabilities [3] and changing treatment guidelines even within institutions), and are label-imbalanced in favor of acceptable plans (because unacceptable plans, even if recorded, may be overridden by improved plans). We therefore treat RT QA as an anomaly detection problem on small-sample high-dimensional datasets where anomalies are unacceptable. We developed a one-class classifier (OCC) based on adaptive resonance theory (ART), which we call OART, to detect unacceptable treatment plans. OART selects stable features and extracts their memory values to improve anomaly detection by dimensionality reduction, and allows for direct interpretation of classification results in terms of the input features.

The quality of a plan may be unacceptable for a variety of reasons: poorly delineated regions of interest (ROIs) may lead to unrealistic estimations of the dosimetric variables [4], poorly designed delivery plans may lead to delivery complexities [5], and inaccurately-estimated dose volumes may lead to under/overdosages of the tumour(s)/organs at risk (OARs) [6], making traditional binary classification of acceptable v. unacceptable plans challenging, especially as there is a risk that a future unacceptable plan is a truly novel anomaly in that it does not look like historical unacceptable plans. Prior work demonstrated the feasibility of binary classification for RT QA [2], but these approaches rely on the assumption that all error types have been observed in training. This limitation motivates the use of anomaly detection methods, such as our OART model, which learns decision boundaries from only acceptable plans and generalizes to novel errors. We therefore develop a one-class framework for RT QA where norms are learned by training on data from only one class (the majority; here, the acceptable plans) and in testing, objects that deviate from the learned norms are considered unacceptable.

To measure adherence to immediate clinical standards, anatomical [7], dosimetric [8], and machine deliverability [9] features must all be present in the training data. Handling the sparsity in high-dimensional feature spaces is crucial, as highlighted in previous studies on RT planning [10]. Our approach, OART, explicitly addresses this challenge by incorporating a feature selection mechanism that stabilizes anomaly detection results while maintaining interpretability. In such problems as RT planning, where datasets have small sample sizes, high-dimensional feature spaces introduce a particularly high overfitting risk, especially of the underrepresented class [49], in this case, unacceptable plans. We additionally perform binary classification as a baseline to evaluate performance against OCC using our OART algorithm.

1.1 Binary classification

To compare or baseline against OCC, we use the following binary classification algorithms: logistic regression, random forests, support vector machines (SVM), naive Bayes, and multilayer perceptron artificial neural networks (NN). These methods are briefly described here.

The binary classification problem takes a set of m inputs, each with n real-valued features, and one binary label. Given $X \in ℝ^{m \times n}, Y \in {0, 1}^{m \times 1}$ , find $f^{θ}$ such that $f^{θ} (X)$ approximates Y as well as possible, the error is called loss. The idea is that, a trained function $f^{θ^{*}} (X)$ that achieved low loss may then, by induction, be used on a new input, say, $X^{'}$ to predict their unknown labels $Y^{'}$ . Depending on the algorithm, the function $f^{θ}$ may be a logistic function applied to a weighted sum (logistic regression), stacked nonlinearities (random forests, neural nets), a hyperplane (SVM), or class-conditional probabilities (NB).

1.2 One-class classification

One common approach for anomaly detection is with one-class classification [11], where only acceptable samples are available in training. Unacceptable samples are included in the testing set for performance metrics. There are algorithms for OCC based on both neural networks [12–14] and support vector machines (SVMs) [15]. Neural autoencoders, for example, learn to optimally reconstruct acceptable plans by backpropagating reconstruction error through a given network structure of neural units [16]. Once the weights are tuned to reconstruct training plans, testing plans’ reconstruction errors are used for classification. Unacceptable plans are expected to exhibit higher reconstruction error than acceptable plans, because the network has not been trained to reconstruct them.

We compare OART to other unsupervised feature extractors applicable to a one-class setting: principal component analysis (PCA) [17], independent component analysis (ICA) [18], and autoencoders (AEs) [19]. One-class classifiers in the literature are usually based on neural networks (restricted Boltzmann machines [20] and AE classifiers [21]), or one-class SVMs (OSVMs) [15,22]. Restricted Boltzmann machines require deep structures and very large training datasets [23], while RT datasets, including ours, generally have small sample sizes. We therefore implement AE and OSVM as classifiers after feature selection with OART.

OSVM algorithms identify anomalies by defining the smallest hyperspace that encloses most of the acceptable plans and classifying any outliers as anomalies. Depending on the approach, this hyperspace can be a sphere [22] or a hyperplane [15], with both methods showing comparable performance [11]. However, OSVM struggles in high-dimensional feature spaces [24], such as those encountered in RT treatment datasets, where the complexity of anatomical, dosimetric, and machine deliverability features increases the risk of overfitting and inaccurate margins. Unlike OSVM, our OART method leverages adaptive resonance theory to dynamically refine feature selection, mitigating overfitting while preserving decision boundaries that align with clinical QA needs. By focusing on stable feature extraction, OART provides a more interpretable and robust approach to anomaly detection in RT planning.

Deep One-Class Classification [25], while being widely used and effective for one-class classification tasks, requires data from both classes during training. In contrast, our algorithm leverages the power of adaptive resonance theory and requires only samples from a single class for training. This characteristic allows our model to excel in scenarios where data from the other class might be scarce or unavailable.

HRN [26], on the other hand, offers a one-class learning approach and can be trained with a single class. However, it employs a multilayer perceptron with a specific cost function, which can make the model less interpretable. In contrast, our model maintains transparency and interpretability, allowing for easier understanding of the decision-making process.

ART is a simplistic mathematical model of feature learning in the biological brain. Since its introduction in 1987 [27], many different variations have been proposed to improve the existing methodology, or to provide context-specific solutions. ART-based solutions have been used in different machine learning tasks, namely unsupervised, supervised and reinforcement learning [28]. PART [29] is a particular extension of ART, that was developed as a projective version of ART for pattern recognition in high-dimensional spaces. PART-TSP [30,31] was developed later as a supervised version of the projective ART.

The high-dimensionality (with mixed feature types), and class imbalance are the two main challenges in out RT QA problem. An ART-based model solves both problems simultaneously in an interpretable fashion: 1) The adaptive learning process removes reliance on backpropagation and allows for learning from only acceptable samples. 2) ART-based systems provide scalable, fast, and reliable machine learning solutions for mixed-domain input [32] which high dimensionality [33], which is the case for a comprehensive RT QA.

In OART, learning happens as a result of an adaptive resonance procedure that focuses on stable similarities between acceptable samples. The primary contribution of this work is one-class feature extraction that selectively computes stable long-term values and quantifies deviations from them to measure anomaly. One-class learning can happen without any samples required from the unacceptable class, thanks to the adaptive learning that does not require backpropagation. The extracted features may be combined with any OCC, or be used as a standalone OCC with interpretable classification scores.

While ART has been widely studied and applied in various domains, our contribution extends beyond the mere application to radiotherapy treatment planning. OART is the only one-class supervised version of the ART algorithm, specifically designed for anomaly detection. By leveraging the one-class classification framework, OART learns solely from acceptable plans, allowing for interpretable and explainable quality assurance. This adaptation introduces a novel perspective in the field of RT plan classification, contributing to both the advancement of ART and the explainable enhancement of patient safety in radiotherapy treatments.

Since clinical norms and standards may change over time, and the QA system is expected to adapt, we use a variation of adaptive resonance theory (ART) [34] as the base learner in our methodology. Furthermore, such networks learn adaptively from matches, rather than backpropagating mismatches; and are therefore a natural match for our one-class problem statement. Our neural network assigns two weights to each feature: a binary switch to indicate the presence of that feature in the memory and a real-valued weight to store long-term memory. It adaptively learns the values of these two sets of weights according to a resonance procedure that requires samples from only one class. By keeping the features unmixed in the feature extraction phase, OART provides interpretable results, that can be particularly valuable in RT QA, where the planners can use feature-specific feedback for plan quality improvement.

We used two clinical datasets of breast and prostate treatment plans to validate our model. We show that for both datasets, OART outperforms other common unsupervised feature extraction methods in improving the performance of OSVM and AE as one-class classifiers. We also show that OART, as a classifier by itself, provides unique feature-wise interpretability, while achieving only slightly lower AUC than the best performer (OART+AE).

While previous research in binary classification and one-class approaches has shown promise in RT QA, these methods either struggle with novel anomalies or suffer from poor performance in high-dimensional, small-sample datasets. Our method, OART, addresses these limitations by integrating adaptive resonance theory with feature selection, enabling robust and interpretable anomaly detection in RT planning.

2 Data

We use two clinically curated radiotherapy datasets of breast and prostate cancer patients. They are two of the most common cancers with 12.5% and 12.2% respectively [35], and thus not only are an impactful group to study but have a lot of data. The breast dataset has 160 treatments (142 acceptable and 18 unacceptable) each with 1,727 numeric features, and the prostate dataset has 434 treatments (421 acceptable and 13 unacceptable) each with 2,697 numeric features, owing to the larger number of OARs in prostate cancer (Table 1).

Table 1. Plan features and feature counts.

Category	Subcategory	Details
(feat. count¹)
ROI (1550, 2453)	Shape	centroid
		variation along PCs²
		volume
		SDF³ histogram
		maximum thickness
	Intensity	mean
		mode
		standard deviation
		minimum and maximum
		histogram
	Joint	shape and dose features
Dose (13, 80)	DVH	mean
		mode
		standard deviation
		minimum and maximum
Machine (164, 164)	Beam	angle, distance, area
	Fraction	number of beams
	Plan	number of fractions
	Plan	Complexity ⁴

Open in a new tab

1 Count of features in breast, and prostate datasets, respectively 2 Principal components 3 Signed distance function has positive values inside the OAR and negative values outside [36]. 4 Modulation complexity score [5]

2.1 Plan features

Each plan has three categories of features: ROI features, dose features, and machine features. Table 1 summarizes feature details and feature counts. The ROI features capture the anatomical shape and intensity of the OARs and the tumour(s). If a particular ROI is present in only a subset of the plans, the corresponding ROI features are set to zero for the plans in which that ROI is not present. Breast will include lung, heart, and breast at minimum, and prostate will include prostate, left femur, right femur, bladder, and rectum. The number of ROI and dose features are higher in the prostate dataset, as typically more organs are involved in a prostate case than in a breast case. But since both breast and prostate plans are designed for the same technique (volumetric modulated arc therapy), the number of machine features are similar.

The dosimetric features capture how much beam radiation each ROI receives. Dose volume histograms [37] measure what proportion of the OARs and the tumour(s) receive what proportion of the prescribed dose. Ideally, most of the tumour tissues receive at least a certain amount of radiation, while as little of the healthy organs as possible receive more than the prescribed dose.

The machine features capture the radiation variables from the source’s point of view. Each plan is divided into several fractions, each of which will be delivered in a separate session. Each fraction, then, includes a set of beams whose intensities, distances to ROI surfaces, and effective areas are calculated. An overall complexity score [5] is also included to reflect deliverability complexities. Plans with high complexity are undesirable.

Since the number of beams and control points vary per plan, beam and control point features are variable in length, but all records should contain the same number of features for classification. We therefore used a bag-of-words model combined with dictionary learning [38] to map these variable number of beam and control point related features to a constant length feature space based on their membership to unsupervised clusters of data points in full dimensionality. The bag-of-words model can be split into a training phase, where the system learns a dictionary of basis functions, and an encoding phase, where the dictionary is used to extract features from new inputs.

2.2 Plan samples

Let T be the set of all available plans that are all labelled as acceptable (-1) or unacceptable (+1) by clinical experts.

Unlike binary classification, during training for one-class classification, all unacceptable plans (T⁺) are left out of training dataset and are only used in the testing dataset. The acceptable plans ( $T^{-}$ -) are randomly split (80%-20%) into $T_{train}^{-}$ and $T_{test}^{-}$ . Therefore, the training is performed on only the acceptable plans, while testing is performed on a mix of acceptable and unacceptable plans.

3 Methods

3.1 Data acquisition and ethical considerations

The datasets were obtained over three years from Princess Margaret Cancer Centre (Toronto, Canada) under an institutional ethics-approved research protocol. We obtained Research Ethics Board (REB) approval from the University of Toronto. The protocol reference number is 14-8165. Since we used retrospective study of medical records or archived samples and the data was de-identified (anonymized), no patient identifying information is available. The need for consent was waived.

3.2 OART structure and training

The goal of our training approach is to adaptively learn from samples that resonate with long-term memory. We utilize an ART-based neural network structure, called OART, to form a stable and adaptable long-term memory of acceptable plans. By measuring the deviation of a new plan x from this memory, we can predict its acceptance or rejection.

OART is a neural network consisting of two sets of weights: a vector of boolean switches $f \in {0, 1}^{m}$ and a vector of real-valued long-term memories $z \in [0, 1]^{m}$ , where m is the number of original input features. Each input dimension $i = 1, \dots, m$ has a switch $f [i]$ that acts as a gate for the sensory input to reach the long-term memory for comparison. This mechanism allows the network to focus its attention on a subset of features with stable memories and discard the rest. The real-valued $z [i]$ represents the long-term memory for feature i and is adaptively updated during resonance learning.

To mitigate the order dependence in vanilla adaptive learning, we adopt an unsupervised implementation of ART with distributed dual vigilance [39]. For our one-class problem, we repeat the training for E rounds of randomly-ordered training plans. At each round, the training starts with a shuffled full epoch of training plans and continues until the number of surviving features stabilizes (i.e., does not increase by more than 0.1%).

At the beginning of each training round, the memory values are set to the first training plan, and all feature switches are initially active:

z [i] = T_{train}^{-} [1, i] i = 1, \dots, m

f [i] = 1 i = 1, \dots, m

Here, $T_{train}^{-}$ is the training matrix where the first index represents the sample, and the second index represents the feature.

During each training step, learning occurs through a resonance check between each feature of the old memory and that of the new plan, followed by an adaptive update of the memory towards resonant features of the new plan. Only features that match (i.e., are within a distance of $σ$ ) between the new plan and the old memory remain active, while the rest are deactivated ( $f [i] = 0, i \notin {i : | x [i] - z [i] | \leq σ}$ ) for the remainder of that training round. Resonant features undergo an adaptation procedure where the long-term memory is shifted towards the new plan. Another plan is then presented as input in the next training step. Fig 1 shows the structure of OART.

Fig 1 — Bold arrow indicates vectors and regular arrow indicates scalars.

For a given distance parameter $σ$ , let $z_{σ}$ and $f_{σ}$ denote the weights. Then the learning equations are as follows:

z_{σ}^{new} = (1 - α) z_{σ}^{old} + α x

(1)

f_{σ}^{new} = f_{σ}^{old} ⊙ h_{σ} (x, z^{old})

where $α \in (0, 1)$ is the learning rate, $⊙$ denotes element-wise multiplication, and

h_{σ} (x, z^{old}) [i] = {\begin{matrix} 1 & if | x [i] - z^{old} [i] | \leq σ \\ 0 & otherwise \end{matrix} i = 1, \dots, m

performs the similarity check.

Fig 2 — Bold arrow indicates vectors and regular arrow indicates scalars.

Fig 3 — Bold arrow indicates vectors and regular arrow indicates scalars.

Fig 4 — Bold arrow indicates vectors and regular arrow indicates scalars.

3.3 Feature extraction and classification

At the end of each training round $e = 1, \dots, E$ , the terminal values of the weights provide the survived features and their corresponding values, denoted as $f_{σ}^{e}$ and $z_{σ}^{e}$ , respectively. Features that survive in more than a desired proportion $γ$ of training rounds are considered stable features and their memory values are extracted as the average of the terminal training values across all rounds (Figs 5 and 6). We define $F_{σ}$ as the set of stable features:

Fig 5 — Left, middle: The number of survived features are for $σ = 0.2$ (left) and $σ = 0.8$ (middle) over different runs. Dashed lines show number of features that survived a particular run as an example. Solid lines show the count of features who survived most of the runs, and therefore are labelled stable. Right: The number of stable features at each value of $σ \in {0.01, 0.03, \dots, 0.99}$ , with $σ = 0.19$ and $σ = 0.79$ denoted 0.2 and 0.8 on the x-axis.

Fig 6 — Left, middle: The number of survived features are for $σ = 0.2$ (left) and $σ = 0.8$ (middle) over different runs. Dashed lines show number of features that survived a particular run as an example. Solid lines show the count of features who survived most of the runs, and therefore are labelled stable. Right: The number of stable features at each value of $σ \in {0.01, 0.03, \dots, 0.99}$ , with $σ = 0.19$ and $σ = 0.79$ denoted 0.2 and 0.8 on the x-axis.

F_{σ} = {i : \sum_{e = 1}^{E} f_{σ}^{e} [i] \geq γ E, i = 1, \dots, m}

(2)

{\overset{―}{z}}_{σ} [i] = \frac{1}{E} \sum_{e = 1}^{E} z_{σ}^{e} [i] i = 1, \dots, m

{\overset{―}{f}}_{σ} [i] = {\begin{matrix} 1 & i \in F_{σ} \\ 0 & otherwise \end{matrix} i = 1, \dots, m

To measure the deviation of a plan x from the memory values ${\overset{―}{z}}_{σ}$ of the features in $F_{σ}$ , we define $Δ_{σ} (x)$ as follows:

Δ_{σ} (x) [i] = | x [i] - {\overset{―}{z}}_{σ} [i] | i \in F_{σ}

(3)

The classification is then performed on the deviations $Δ_{σ} (x)$ , which reside in $ℝ_{+}^{| F_{σ} |}$ . We boost the classification results by adding classification scores over a set of $σ$ values, where smaller values represent tighter similarity conditions, resulting in a smaller number of features surviving in stable memory.

It is important to note that there are many variants of autoencoders, each with its own unique characteristics and use cases. For example, convolutional autoencoders [40] are designed for image data, while recurrent autoencoders [41] are designed for sequential data. Sparse autoencoders [42] are designed to learn sparse representations of the input data, while contractive autoencoders [43] are designed to be more robust to small perturbations in the input data. Laplacian autoencoders [44] are designed to learn a low-dimensional representation of the input data that preserves the local structure of the data. However, we only used a vanilla as a representative, because our main claimed advantage of OART is explainability, which is absent in all AE variants as a result of their multilayer perceptron structure.

In the case of using an Autoencoder for one-class classification, we follow a two-step process. Firstly, we employ the AE as a feature extractor by cutting the network at the middle layer, which serves as the bottleneck layer capturing the deep features of the input data. Once the deep features are extracted, we utilize the AE as a classifier. To make one-class classification decisions, we use the reconstruction mean squared error (MSE) as the classification score. Outlier instances are expected to exhibit higher MSE during the reconstruction process.

To determine the classification threshold, we conduct testing on a set of labeled data. By varying the threshold from the minimum to the maximum MSE observed during testing, we can explore different trade-offs between false positives and false negatives.

The OSVM was implemented in the libsvm package with Gaussian kernel (we excluded linear, polynomial, and sigmoid kernels, as none provided AUC larger than 0.5). The algorithm looks for the smallest hyperspace in the kernel space that contains most of training plans. Let $Φ$ be a feature map $ℝ_{+}^{| F_{σ} |} \to ℱ$ , i.e., a map from $Δ_{σ}$ space into an inner product space $ℱ$ such that the inner product can be computed by evaluating some simple kernel [45]. Then, the separating hyperspace is found by solving the following optimization problem:

\begin{array}{l} \underset{W \in F, s \in ℝ_{+}^{n}, b \in ℝ}{minimize} \frac{1}{2} | | W | |^{2} + \frac{1}{ν \times n} \sum_{η = 1}^{n} (s [η] - b) ​ \\ subject to W \cdot Φ (Δ_{σ} (x)) \geq b - s [η] \end{array}

where b is the bias, s is the vector of the slack variables, and $ν$ is a parameter in (0,1] that trades off between the two objective terms: hyperspace and the proportion of acceptable plans to encompass. Note that a positive dot product $W \cdot Φ (Δ (x))$ indicates that x is within acceptable hyperspace and therefore its class label is acceptable, and vice versa.

Let $W_{σ}^{ν}$ and $b_{σ}^{ν}$ denote the optimal values of W and b for distance parameter $σ$ and model complexity $ν$ . The decision boundary is then

S_{σ}^{ν} (x) = W_{σ}^{ν} \cdot Φ (Δ_{σ} (x)) - b_{σ}^{ν}

which is positive for most training acceptable plans. Unacceptable plans should have $S_{σ}^{ν} (x) < 0$ . The magnitude indicates how far from the boundary the plan is. Different values $ν \in V = {0.01, 0.02, \dots, 0.99}$ are considered. Classification scores from distance parameters $σ \in U$ and complexities $ν \in V$ are aggregated by summation into a single score:

\overset{―}{S} (x) = \sum_{ν \in V} \sum_{σ \in U} S_{σ}^{ν} (x)

4 Results

We first present the classification results obtained from OSVM and AE classifiers with and without applying feature extractors. Then, we present the clinical interpretations of classification on OART’s features.

4.1 Classification results

Our MATLAB implementation of OART is available on github, and the link is https://github.com/d-aleman/oarthere. We ran OART on a 3.5 GHz Intel Core i7 processor with 16GB RAM.

We utilized min-max scaling to normalize the features. This normalization technique was chosen to ensure that all features have a consistent scale, thus preventing any particular feature from dominating the classification process.

Although the breast dataset has fewer features, it has so many fewer samples that its record-to-feature ratio is still lower than that of the prostate dataset. There are 119 training samples for breast with 1,727 features (a record-to-feature ratio of 6.89%), and 345 training samples for prostate with 2,697 features (record-to-feature ratio of 12.79%). In Sect 5, we discuss the potential repercussions of the breast dataset’s especially sparse training space.

The values of parameters $α$ and $γ$ cannot be optimized end-to-end because there is no binary classification error metric. The value of $α$ , the adaptive learning rate in Eq 1, is set to a small number ( $α = 0.1$ ) to only slightly shift the long-term memory of OART towards the new resonant plans. The value of $γ$ , the survival stability rate in Eq 2 is set to a large number ( $γ = 0.9$ ) to keep the features that survived almost all training shuffles (the difference is highlighted by solid line and the dashed line in Fig 5 for breast; and Fig 6 for prostate).

Training iterations stop when the marginal change is less than 0.1%. In a single training round $e = 1, \dots, E$ , the size of survived dimensions $\sum_{i = 1}^{m} f [i]$ decreases by less than 0.1% before the end of the second epoch (Figs 5 and 6), so each training round e includes two shuffled epochs of training plans. For breast and prostate, the size of the stable features $| F_{σ} |$ changes by less than 0.1% after 12 and 17 training rounds, respectively, so we set E = 20 as a conservative number of epochs to reach convergence.

Our approach involves a two-pronged strategy for applying the $σ$ values. Initially, the data features are linearly normalized into the range [0,1]. Given that the distance parameter dictates the degree of similarity necessary for resonance learning, we compare two cases: learning from a full range between 0 and 1, in increments of 0.02 ( $U = 0.01, 0.03, \dots, 0.99$ ) and learning only from the more sensitive (smaller) $σ$ values ( $\tilde{U} = 0.01, 0.03, \dots, 0.49$ ). Each $σ$ value, regardless of being larger or smaller, has the potential to offer valuable insights. Without comprehensive binary validation, it would be presumptuous to deem any $σ$ values as ’useless’, hence our inclusive approach.

Once OART features are extracted for various $σ \in U$ , we have two paths for classification. Firstly, the raw sum of deviations can be treated as a classification score, providing an intuitive yet effective classification criterion - higher total deviations suggest a higher likelihood of an entity being an outlier. This method ensures interpretability of the classification score. Alternatively, these raw features can be forwarded to classifiers like OSVM and the AE. While this could offer a slight gain in accuracy, it comes at the expense of explainability. Regardless of the path chosen, the scores from different $σ$ values are aggregated, hence the use of summation.

For lower $σ$ values, most of stable features are machine features and for larger $σ$ values, ROI and dose features are also included in the stable memory (Figs 5 and 6). Machine features of the acceptable plans are therefore expected not to deviate much from memory values. Testing plans that exhibit large deviations on machine features are likely to be classified as unacceptable. Larger $σ$ means more tolerance to deviation from memory, larger number of survived dimensions in each training round, and therefore more stable features. Both breast and prostate plans exhibit similar performance, so only prostate plots are shown.

It is essential to clarify the data shuffling method employed in our study. Initially, the entire set of training samples is shuffled, forming the first epoch. Subsequently, another epoch is shuffled and appended, resulting in a sequence of twice the size, in which each data point appears once in the first half and once in the second. This procedure ensures a fair distribution and exposure of data points throughout the training process. The observed plateau in Figs 5 and 6 is indicative of the model’s learning behavior that is explained below. Essentially, the model size stabilizes after the first epoch, reflecting that the first epoch is sufficient to extract the significant patterns from the data. The consistency of this behavior across all training rounds suggests that the algorithm’s learning is not significantly boosted by subsequent passes over the data, hence the stabilization of model size during the second epoch, and no need for further epochs.

OART performance is compared to PCA, ICA, and AEs. Since the number of plans is smaller than the number of original features, PCA and ICA provide as many components as the samples. as the samples. All of these components are included in the classification. We determined the size and structure of the autoencoder based on the reconstruction error. We considered three structures: shallow, deep, and bottleneck using 50, 100, 200, 400, 800, and 1600 units with ${tan}^{- 1}$ activation function and dropout regularization. The reconstruction error for all structures and unit numbers is 0.026 with $< 0.1 %$ variation between structures. We selected the structure and unit size with the lowest reconstruction MSE (bottleneck with 1600 units).

The features from each feature extraction method are then inputted into OSVM and AE for one-class classification (e.g., PCA+OSVM and OART+AE). Since OART’s features indicate deviation from stable memory values, we also performed classification based on the sum of OART’s feature values (OART+ $Σ$ ). Tables 2 and 3 summarize the performance metrics.

Table 2. Breast: 95% confidence intervals for performance metrics. Highlight indicates best performance in each dataset.

	Features	Classifier	${Spec}_{100}$	${Spec}_{80}$	${Sens}_{100}$	${Sens}_{80}$	AUC
Binary	original	LR	0	0	${0.19}_{\pm 0.08}$	${0.55}_{\pm 0.03}$	${0.75}_{\pm 0.02}$
		RF	${0.16}_{\pm 0.03}$	${0.39}_{\pm 0.04}$	${0.35}_{\pm 0.04}$	${0.42}_{\pm 0.04}$	${0.65}_{\pm 0.03}$
		SVM	0	${0.06}_{\pm 0.06}$	${0.21}_{\pm 0.08}$	${0.60}_{\pm 0.03}$	${0.78}_{\pm 0.02}$
		NB	0	0	${0.03}_{\pm 0.04}$	${0.42}_{\pm 0.04}$	${0.64}_{\pm 0.02}$
		NN	0	0	${0.14}_{\pm 0.03}$	${0.53}_{\pm 0.03}$	${0.72}_{\pm 0.02}$
	LASSO	$LR$	0	0	0	${0.21}_{\pm 0.03}$	${0.53}_{\pm 0.08}$
	PCA	$LR$	0	0	${0.15}_{\pm 0.06}$	${0.57}_{\pm 0.03}$	${0.73}_{\pm 0.02}$
		RF	${0.12}_{\pm 0.07}$	${0.20}_{\pm 0.05}$	${0.34}_{\pm 0.03}$	${0.45}_{\pm 0.02}$	${0.67}_{\pm 0.02}$
		SVM	0	0	${0.18}_{\pm 0.04}$	${0.53}_{\pm 0.04}$	${0.73}_{\pm 0.03}$
		NB	${0.05}_{\pm 0.06}$	${0.07}_{\pm 0.02}$	${0.10}_{\pm 0.02}$	${0.42}_{\pm 0.02}$	${0.66}_{\pm 0.02}$
		NN	0	0	0	${0.32}_{\pm 0.04}$	${0.68}_{\pm 0.02}$
	ICA	$LR$	0	0	${0.15}_{\pm 0.05}$	${0.60}_{\pm 0.03}$	${0.75}_{\pm 0.02}$
		RF	${0.10}_{\pm 0.03}$	${0.27}_{\pm 0.04}$	${0.30}_{\pm 0.03}$	${0.45}_{\pm 0.02}$	${0.65}_{\pm 0.03}$
		SVM	0	0	${0.19}_{\pm 0.04}$	${0.50}_{\pm 0.02}$	${0.76}_{\pm 0.03}$
		$NB$	0	0	${0.10}_{\pm 0.07}$	${0.49}_{\pm 0.04}$	${0.73}_{\pm 0.02}$
		NN	0	0	${0.05}_{\pm 0.05}$	${0.30}_{\pm 0.02}$	${0.63}_{\pm 0.02}$
	AE	$LR$	0	${0.11}_{\pm 0.03}$	0	${0.38}_{\pm 0.03}$	${0.63}_{\pm 0.02}$
		RF	${0.08}_{\pm 0.04}$	${0.21}_{\pm 0.03}$	${0.23}_{\pm 0.03}$	${0.36}_{\pm 0.02}$	${0.65}_{\pm 0.03}$
		SVM	${0.07}_{\pm 0.05}$	${0.22}_{\pm 0.03}$	${0.31}_{\pm 0.02}$	${0.44}_{\pm 0.03}$	${0.78}_{\pm 0.02}$
		$NB$	0	0	${0.07}_{\pm 0.03}$	${0.30}_{\pm 0.03}$	${0.73}_{\pm 0.02}$
		NN	0	${0.14}_{\pm 0.03}$	${0.23}_{\pm 0.02}$	${0.39}_{\pm 0.02}$	${0.75}_{\pm 0.02}$
OCC	original	$AE$	${0.07}_{\pm 0.03}$	${0.53}_{\pm 0.06}$	${0.05}_{\pm 0.03}$	${0.66}_{\pm 0.04}$	${0.73}_{\pm 0.02}$
		OSVM	${0.09}_{\pm 0.02}$	${0.33}_{\pm 0.03}$	${0.08}_{\pm 0.03}$	${0.48}_{\pm 0.03}$	${0.65}_{\pm 0.02}$
	PCA	AE	${0.07}_{\pm 0.02}$	${0.23}_{\pm 0.04}$	${0.24}_{\pm 0.04}$	${0.36}_{\pm 0.03}$	${0.59}_{\pm 0.02}$
		OSVM	${0.03}_{\pm 0.02}$	${0.41}_{\pm 0.05}$	${0.05}_{\pm 0.03}$	${0.49}_{\pm 0.04}$	${0.66}_{\pm 0.02}$
	ICA	AE	${0.11}_{\pm 0.02}$	${0.38}_{\pm 0.04}$	${0.23}_{\pm 0.03}$	${0.47}_{\pm 0.03}$	${0.69}_{\pm 0.02}$
		OSVM	${0.00}_{\pm 0.01}$	${0.20}_{\pm 0.04}$	${0.22}_{\pm 0.02}$	${0.42}_{\pm 0.03}$	${0.60}_{\pm 0.02}$
	AE	OSVM	${0.20}_{\pm 0.03}$	${0.51}_{\pm 0.03}$	${0.06}_{\pm 0.03}$	${0.52}_{\pm 0.03}$	${0.72}_{\pm 0.02}$
	${OART}^{U}$	$Σ$	${0.04}_{\pm 0.02}$	${0.35}_{\pm 0.05}$	${0.22}_{\pm 0.05}$	${0.63}_{\pm 0.03}$	${0.70}_{\pm 0.02}$
		AE	${0.26}_{\pm 0.05}$	${0.73}_{\pm 0.04}$	${0.10}_{\pm 0.05}$	${0.78}_{\pm 0.03}$	${0.81}_{\pm 0.01}$
		OSVM	${0.11}_{\pm 0.04}$	${0.57}_{\pm 0.05}$	${0.08}_{\pm 0.03}$	${0.60}_{\pm 0.04}$	${0.74}_{\pm 0.02}$
	${OART}^{\tilde{U}}$	$Σ$	${0.24}_{\pm 0.02}$	${0.58}_{\pm 0.05}$	${0.24}_{\pm 0.03}$	${0.69}_{\pm 0.03}$	${0.78}_{\pm 0.02}$
		AE	${0.34}_{\pm 0.02}$	${0.67}_{\pm 0.05}$	${0.11}_{\pm 0.05}$	${0.75}_{\pm 0.03}$	${0.80}_{\pm 0.01}$
		OSVM	${0.35}_{\pm 0.03}$	${0.67}_{\pm 0.04}$	${0.05}_{\pm 0.02}$	${0.66}_{\pm 0.04}$	${0.78}_{\pm 0.01}$

Open in a new tab

Table 3. Prostate: 95% confidence intervals for performance metrics. Highlight indicates best performance in each dataset.

	Features	Classifier	${Spec}_{100}$	${Spec}_{80}$	${Sens}_{100}$	${Sens}_{80}$	AUC
Binary	original	LR	0	${0.03}_{\pm 0.05}$	${0.06}_{\pm 0.04}$	${0.53}_{\pm 0.04}$	${0.70}_{\pm 0.02}$
		RF	${0.25}_{\pm 0.03}$	${0.51}_{\pm 0.07}$	${0.35}_{\pm 0.06}$	${0.70}_{\pm 0.04}$	${0.79}_{\pm 0.02}$
		SVM	0	0	${0.17}_{\pm 0.06}$	${0.44}_{\pm 0.03}$	${0.71}_{\pm 0.02}$
		NB	0	0	0	${0.29}_{\pm 0.03}$	${0.61}_{\pm 0.02}$
		NN	${0.14}_{\pm 0.03}$	${0.28}_{\pm 0.03}$	${0.34}_{\pm 0.03}$	${0.69}_{\pm 0.02}$	${0.81}_{\pm 0.03}$
	LASSO	$LR$	0	0	0	${0.32}_{\pm 0.06}$	${0.63}_{\pm 0.07}$
	PCA	$LR$	0	0	0	${0.45}_{\pm 0.03}$	${0.69}_{\pm 0.03}$
		RF	${0.06}_{\pm 0.03}$	${0.11}_{\pm 0.03}$	0	${0.23}_{\pm 0.02}$	${0.65}_{\pm 0.03}$
		SVM	0	0	${0.05}_{\pm 0.03}$	${0.29}_{\pm 0.02}$	${0.67}_{\pm 0.03}$
		NB	0	0	0	${0.14}_{\pm 0.02}$	${0.60}_{\pm 0.02}$
		NN	0	0	${0.09}_{\pm 0.03}$	${0.29}_{\pm 0.02}$	${0.67}_{\pm 0.02}$
	ICA	$LR$	0	0	${0.16}_{\pm 0.03}$	${0.47}_{\pm 0.02}$	${0.72}_{\pm 0.02}$
		RF	${0.07}_{\pm 0.02}$	${0.12}_{\pm 0.03}$	${0.13}_{\pm 0.02}$	${0.39}_{\pm 0.02}$	${0.69}_{\pm 0.03}$
		SVM	0	0	${0.13}_{\pm 0.03}$	${0.39}_{\pm 0.02}$	${0.70}_{\pm 0.04}$
		$NB$	0	${0.05}_{\pm 0.03}$	${0.07}_{\pm 0.06}$	${0.53}_{\pm 0.03}$	${0.73}_{\pm 0.02}$
		NN	0	0	0	${0.35}_{\pm 0.02}$	${0.67}_{\pm 0.03}$
	AE	$LR$	0	0	${0.23}_{\pm 0.04}$	${0.57}_{\pm 0.03}$	${0.75}_{\pm 0.02}$
		RF	${0.10}_{\pm 0.02}$	${0.16}_{\pm 0.03}$	${0.12}_{\pm 0.03}$	${0.41}_{\pm 0.03}$	${0.67}_{\pm 0.03}$
		SVM	${0.21}_{\pm 0.03}$	${0.48}_{\pm 0.03}$	${0.31}_{\pm 0.03}$	${0.75}_{\pm 0.02}$	${0.80}_{\pm 0.03}$
		$NB$	0	0	${0.12}_{\pm 0.06}$	${0.39}_{\pm 0.04}$	${0.73}_{\pm 0.02}$
		NN	${0.24}_{\pm 0.02}$	${0.41}_{\pm 0.03}$	${0.33}_{\pm 0.02}$	${0.60}_{\pm 0.02}$	${0.78}_{\pm 0.03}$
OCC	original	$AE$	${0.26}_{\pm 0.03}$	${0.55}_{\pm 0.05}$	${0.53}_{\pm 0.03}$	${0.69}_{\pm 0.03}$	${0.81}_{\pm 0.02}$
		OSVM	${0.27}_{\pm 0.02}$	${0.48}_{\pm 0.04}$	${0.61}_{\pm 0.03}$	${0.65}_{\pm 0.03}$	${0.78}_{\pm 0.02}$
	PCA	AE	${0.02}_{\pm 0.01}$	${0.20}_{\pm 0.04}$	${0.19}_{\pm 0.02}$	${0.39}_{\pm 0.03}$	${0.59}_{\pm 0.02}$
		OSVM	${0.16}_{\pm 0.03}$	${0.40}_{\pm 0.04}$	${0.23}_{\pm 0.03}$	${0.36}_{\pm 0.03}$	${0.65}_{\pm 0.02}$
	ICA	AE	${0.01}_{\pm 0.01}$	${0.11}_{\pm 0.03}$	${0.36}_{\pm 0.04}$	${0.49}_{\pm 0.03}$	${0.59}_{\pm 0.02}$
		OSVM	${0.05}_{\pm 0.03}$	${0.26}_{\pm 0.04}$	${0.09}_{\pm 0.03}$	${0.15}_{\pm 0.03}$	${0.53}_{\pm 0.02}$
	AE	OSVM	${0.15}_{\pm 0.04}$	${0.46}_{\pm 0.04}$	${0.23}_{\pm 0.03}$	${0.40}_{\pm 0.03}$	${0.68}_{\pm 0.02}$
	${OART}^{U}$	$Σ$	${0.23}_{\pm 0.04}$	${0.63}_{\pm 0.06}$	${0.55}_{\pm 0.04}$	${0.74}_{\pm 0.04}$	${0.81}_{\pm 0.03}$
		AE	${0.80}_{\pm 0.01}$	${0.80}_{\pm 0.01}$	${0.00}_{\pm 0.00}$	${0.75}_{\pm 0.05}$	${0.86}_{\pm 0.01}$
		OSVM	${0.48}_{\pm 0.02}$	${0.67}_{\pm 0.04}$	${0.58}_{\pm 0.03}$	${0.71}_{\pm 0.03}$	${0.81}_{\pm 0.01}$
	${OART}^{\tilde{U}}$	$Σ$	${0.32}_{\pm 0.05}$	${0.74}_{\pm 0.04}$	${0.47}_{\pm 0.04}$	${0.76}_{\pm 0.03}$	${0.86}_{\pm 0.01}$
		AE	${0.81}_{\pm 0.01}$	${0.81}_{\pm 0.01}$	${0.00}_{\pm 0.00}$	${0.82}_{\pm 0.05}$	${0.87}_{\pm 0.01}$
		OSVM	${0.29}_{\pm 0.06}$	${0.71}_{\pm 0.04}$	${0.50}_{\pm 0.03}$	${0.71}_{\pm 0.03}$	${0.85}_{\pm 0.02}$

Open in a new tab

In binary classification, testing metrics are derived through leave-one-out cross validation, where training occurs on all data from both classes except for the sample currently being tested. Consequently, these metrics are not directly comparable to those of One-Class Classification (OCC), where only 80% of the negative samples are used. Nevertheless, the one-class scenario involves utilizing less training data for each test sample, making it technically more challenging.

The performance comparison in our study includes five metrics that assess the machine learning model’s performance at extreme misclassification costs (100%) and at the average human vigilance level in industrial Quality Assurance (QA) settings (80% [46]). Specifically, we consider two metrics: $Sens 80$ and $Spec 80$ .

${Sens}_{80}$ represents the maximum sensitivity of the model when it maintains at least 80% specificity. It indicates the highest percentage of unacceptable plans that can be correctly detected while ensuring that at least 80% of acceptable plans are identified accurately. This metric allows us to evaluate the model’s performance in scenarios where maintaining high specificity is crucial.

Similarly, $Spec 80$ represents the maximum specificity of the model when it maintains at least 80% sensitivity. It measures the percentage of acceptable plans that can be accepted without human review while minimizing the risk of accepting unacceptable plans. In the context of real-time Quality Assurance (QA), $Spec 80$ is particularly important, as it indicates the level of confidence in accepting plans without manual inspection.

We also consider $Sens 100$ and $Spec 100$ , which represent the maximum sensitivity and specificity, respectively, when the model is required to be 100% specific or sensitive. These metrics offer insights into the model’s performance under strict conditions where there is no tolerance for accepting misclassified plans.

Additionally, AUC provides an overall comparison by measuring the area under the operating characteristic curve. Testing classification scores are bootstrapped with replacement 100 times and a 95% t-distribution confidence interval is calculated for the mean of each of the performance metrics. The ROC curves and performance metrics are shown in Figs 7 and 8.

Fig 7 — The highest performing algorithm in black and the rest are grey. The specific OART implemention illustrated is the better of ${OART}^{U}$ and ${OART}^{\tilde{U}}$ .

Fig 8 — The highest performing algorithm in black and the rest are grey. The specific OART implemention illustrated is the better of ${OART}^{U}$ and ${OART}^{\tilde{U}}$ .

OART-based methods as feature extractors improved the performance of AE and OSVM classifiers. OART’s improvements are the highest among other feature extractors for both breast and prostate datasets. The weak performances of PCA and ICA were expected as both of these methods exploit the variation in the data, while one-class classification is intrinsically based on the similarities of the existing training samples from the one class.

Note that ${OART}^{\tilde{U}} + Σ$ achieves higher AUC than ${OART}^{U} + Σ$ (0.78 compared to 0.70 in breast, and 0.86 compared to 0.81 for prostate) by focusing on the more sensitive half of OART learners. Overall, $OART + Σ$ slightly underperforms the top algorithm, $OART + AE$ (0.78 compared to 0.70 in breast, and 0.86 compared to 0.81 for prostate), but it provides feature-wise classification interpretability for clinical purposes, as described in Sect 4.2.

4.2 Clinical interpretation

Both autoencoders and support vector machines perform a joint analysis of input features for classification. Their classification scores cannot be directly traced back to the individual input features for interpretation. More specifically, when AE or OSVM classify a plan as unacceptable, they cannot tell which features caused the classifier to classify as such. In a clinical setting, however, transparency and interpretability of classification results are invaluable features for a classifier to have as they can help clinicians initiate targeted improvements to the unacceptable plans.

In OART+ $Σ$ , first the set of stable features and their memory values are extracted from the training plans, and then the classification score is the sum of deviations from memory values of stable feature. Therefore, it provides a unique capability in interpretation of the classification results in terms of the input features. Not only can OART+ $Σ$ tell how much each feature (or feature category) of each plan is contributing to its classification score, but also it can suggest automatic improvements by providing a comparison of the stable long-term memory of the features with those of that plan.

For simplicity, we aggregated the features based on the three feature categories described in Sect 2 (ROI, dose, and machine features), but the methodology allows for individual feature analysis as well. Since the performance of ${OART}^{\tilde{U}} + Σ$ is better than ${OART}^{U} + Σ$ , we show the results from $\tilde{U}$ . The average deviations indicates the average feature-wise deviation from memory values:

\frac{1}{| \tilde{U} | \times | F_{σ} |} \sum_{σ \in \tilde{U}} \sum_{i \in F_{σ}} Δ_{σ} (x)

Figs 9 and 10 show the breakdown of overall deviations according to feature categories.

For example, the first five unacceptable testing plans of the prostate dataset (plan numbers 77 to 81) exhibit large deviations on the machine features. The next step for a clinician to improve the quality of the plan may then be to examine machine features for anomalies.

5 Discussion

As expected, one-class classifiers generally outperformed binary classifiers, and our OART approach was the best performer in all metrics for both the breast and prostate datasets. This performance is in spite of the fact that the one-class setting suffers from lower training/testing ratios, as mentioned in Sect 4, indicating that a one-class setting is the better anomaly detector. One previous binary classifier [2] used sophisticated class-imbalance mitigation techniques (namely, guided undersampling) to achieve better AUCs for these datasets, but it is theoretically limited in that it needs anomalies to learn, and may not perform as well in the presence of other types of unacceptable plans.

When OART is used as a classifier by summing deviations from stored memory values ( $OART + Σ$ ), though it is slightly less accurate than $OART + AE$ (Figs 7 and 8), it provides transparently interpretable scores, contrary to AE and OSVM (Figs 9 and 10). Moreover, although $OART + Σ$ commits to individual feature analysis for transparency, we included joint features (such as pairwise ROI and joint dose and shape features) in the original input features to accommodate some clinically-relevant mixed effects between features.

The difference in sparsity between the breast (more sparse) and prostate datasets shows that sparser datasets are more challenging in a one-class setting. Specifically, it becomes especially difficult to learn from $\approx$ 100 samples with $\approx$ 1700 features. A comparison of AUCs (Tables 2 and 3) clearly shows breast was a harder dataset supports this observation.

OART relies on distance parameter $σ$ , but as a result of the one-class setting, it cannot be treated as a hyperparameter and tuned based on performance on the validation set. However, we did expect that OART benefit from upper-bounded distance parameter; As it would exclude classification scores coming from too high values of $σ$ , where all features are stable enough to be included in the memory and practically no feature reduction is performed. The results showed that, especially in the case of $OART + Σ$ , where no additional mixed feature classifier is applied, smaller $σ$ values indeed provide higher AUC. This observation provides additional insights into the interpretation of the distance parameter, and suggests that the accuracy may improve as a result of using variable $σ$ values for each feature.

Our results indicate that an automated QA can instantaneously accept up to 35% of breast and 81% of prostate plans, with no risk of accepting an unacceptable plan by mistake. Although implementation of such automated QA in a clinical setting requires further examination and compare between its accuracy and speed with those of human QA, our findings shed light on the sort of benefits that automation may provide through proper data storage and analytics.

6 Conclusions

We developed a feature extraction method, where features with stable values are selected into the memory, and the value of each selected feature is replaced by its deviation from the long-term memory of that feature. This method requires samples from only one class and is therefore suitable for such anomaly detection problems as quality assurance. Automated quality assurance, once developed to be reliable enough, offers greater speed and cost efficiency than human QA, saving valuable time for highly skilled personnel to focus on research.

The responsible use of AI necessitates the establishment of governance frameworks and compliance measures. Global AI policies and regulations are being developed, with a focus on sensitive areas such as healthcare[47]. Organizations are forming AI committees to guide decision-making and provide oversight. AI risk management frameworks help identify and mitigate potential risks throughout the AI lifecycle. Adhering to AI trust principles, such as integrity, explainability, fairness, and resilience, is crucial. Our explainable algorithm aligns with these governance frameworks and compliance requirements. Its transparency and interpretability enable stakeholders, including regulatory bodies, to understand and verify its decision-making process, fostering trust and supporting compliance with AI policies. By incorporating these frameworks and an explainable algorithm, our overall approach demonstrates a commitment to responsible and accountable AI deployment.

In this work, we considered deviations of equal magnitude as equally-significant evidence for anomalous behaviour. The future direction for this research may focus on modelling deviations probabilistically. Specifically, depending on the empirical distribution of feature values, an equal amount of deviation may not always imply an equal probability of an anomaly. A Bayesian approach, learning priors from historical data, can estimate the likelihood of a plan being acceptable or unacceptable.

The empirical results obtained from our study are promising, demonstrating the effectiveness of the our method from an empirical perspective. Future work may further validate the generalizability of these findings by applying the method to additional cancer types and data from various institutions. However, further theoretical analysis is needed to establish the convergence properties of our specific ART adaptation. While existing mathematical proofs demonstrate the convergence of ART networks to stable categories under certain conditions [48], additional theoretical work is needed to validate the convergence of our approach and applicability to other RT sites.

Supporting information

S1 Archive

Archive.zip containing plotting data and code used to generate figures and results reported in the manuscript.

(ZIP)

pone.0321968.s001.zip^{(57.3KB, zip)}

Data Availability

Data cannot be shared publicly. We use two clinically curated radiotherapy datasets of breast and prostate cancer patients. The datasets were obtained over three years from Princess Margaret Cancer Centre (Toronto, Canada) under an institutional ethics-approved research protocol. We obtained Research Ethics Board (REB) approval from the University of Toronto. The protocol reference numbers are 12-5225, and 14-8165.

Funding Statement

No authors have competing interests.

References

1.Brenner DR, Weir HK, Demers AA, Ellison LF, Louzado C, Shaw A, et al. Projected estimates of cancer in Canada in 2020. CMAJ. 2020;192(9):E199–E205. doi: 10.1503/cmaj.191292 [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Brown WE, Sung K, Aleman DM, Moreno-Centeno E, Purdie TG, McIntosh CJ. Guided undersampling classification for automated radiation therapy quality assurance of prostate cancer treatment. Med Phys. 2018;45(4):1306–16. doi: 10.1002/mp.12757 [DOI] [PubMed] [Google Scholar]
3.Berry SL, Boczkowski A, Ma R, Mechalakos J, Hunt M. Interobserver variability in radiation therapy plan output: results of a single-institution study. Pract Radiat Oncol. 2016;6(6):442–9. doi: 10.1016/j.prro.2016.04.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Nelms B, Robinson G, Magliari J, Roeske J. MO-EE-A2-01: On the dosimetric/DVH impact of variation in organ delineation: A multi-institutional study and proposed quality system. Med Phys. 2010;37(6 Part 26):3348. doi: 10.1118/1.3469085 [DOI] [Google Scholar]
5.McNiven AL, Sharpe MB, Purdie TG. A new metric for assessing IMRT modulation complexity and plan deliverability. Med Phys. 2010;37(2):505–15. doi: 10.1118/1.3276775 [DOI] [PubMed] [Google Scholar]
6.Engberg L, Forsgren A, Eriksson K, Hårdemark B. Explicit optimization of plan quality measures in intensity-modulated radiation therapy treatment planning. Med Phys. 2017;44(6):2045–53. doi: 10.1002/mp.12146 [DOI] [PubMed] [Google Scholar]
7.Zhu X, Ge Y, Li T, Thongphiew D, Yin F-F, Wu QJ. A planning quality evaluation tool for prostate adaptive IMRT based on machine learning. Med Phys. 2011;38(2):719–26. doi: 10.1118/1.3539749 [DOI] [PubMed] [Google Scholar]
8.Mayo CS, Yao J, Eisbruch A, Balter JM, Litzenberg DW, Matuszak MM, et al. Incorporating big data into treatment plan evaluation: Development of statistical DVH metrics and visualization dashboards. Adv Radiat Oncol. 2017;2(3):503–14. doi: 10.1016/j.adro.2017.04.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Craft D, Süss P, Bortfeld T. The tradeoff between treatment plan quality and required number of monitor units in intensity-modulated radiotherapy. Int J Radiat Oncol Biol Phys. 2007;67(5):1596–605. doi: 10.1016/j.ijrobp.2006.11.034 [DOI] [PubMed] [Google Scholar]
10.Chawla N. In: Maimon O, Rokach L, editors. Data mining for imbalanced datasets: An overview. Boston, MA: Springer US; 2010. p. 875–86. [Google Scholar]
11.Chawla N In: Maimon O, Rokach L, editors. Data mining for imbalanced datasets: An overview. Boston, MA: Springer US; 2010. p. 875–86. [Google Scholar]
12.Oza P, Patel VM. One-class convolutional neural network. IEEE Signal Process Lett. 2019;26(2):277–81. doi: 10.1109/lsp.2018.2889273 [DOI] [Google Scholar]
13.Perera P, Patel VM. Learning deep features for one-class classification. IEEE Trans Image Process. 2019;28(11):5450–63. doi: 10.1109/TIP.2019.2917862 [DOI] [PubMed] [Google Scholar]
14.Sabokrou M, Fathy M, Zhao G, Adeli E. Deep end-to-end one-class classifier. IEEE Trans Neural Netw Learn Syst. 2021;32(2):675–84. doi: 10.1109/TNNLS.2020.2979049 [DOI] [PubMed] [Google Scholar]
15.Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC. Estimating the support of a high-dimensional distribution. Neural Comput. 2001;13(7):1443–71. doi: 10.1162/089976601750264965 [DOI] [PubMed] [Google Scholar]
16.Deng L, Seltzer M, Yu D, Acero A, Mohamed Ar, Hinton G. Binary coding of speech spectrograms using a deep auto-encoder. In: Interspeech 2010. International Speech Communication Associatio; n; 2010. [Google Scholar]
17.Pearson K. LIII. On lines and planes of closest fit to systems of points in space. Lond Edinburgh Dublin Philos Mag J Sci. 1901;2(11):559–72. doi: 10.1080/14786440109462720 [DOI] [Google Scholar]
18.Hyvärinen A, Oja E. Independent component analysis: Algorithms and applications. Neural Netw. 2000;13(4–5):411–30. doi: 10.1016/s0893-6080(00)00026-5 [DOI] [PubMed] [Google Scholar]
19.Vincent P, Larochelle H, Bengio Y, Manzagol P-A. Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th international conference on Machine learning – ICML ’08; 2008. pp. 1096–103. doi: 10.1145/1390156.1390294 [DOI] [Google Scholar]
20.Dahl G, Ranzato MA, Mohamed AR, Hinton GE. Phone recognition with the mean-covariance restricted Boltzmann machine. In: Lafferty J, Williams C, Shawe-Taylor J, Zemel R, Culotta A, editors. Advances in neural information processing systems, vol. 23. Curran Associates, Inc.; 2010. [Google Scholar]
21.Leng Q, Qi H, Miao J, Zhu W, Su G. One-class classification with extreme learning machine. Math Probl Eng. 2015;2015(1):412957. [Google Scholar]
22.Tax DM, Duin RP. Data domain description using support vectors. In: European symposium on artificial neural networks, vol. 99; 1999. p. 251–6. [Google Scholar]
23.Erfani SM, Rajasegarar S, Karunasekera S, Leckie C. High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recogn. 2016;58:121–34. doi: 10.1016/j.patcog.2016.03.028 [DOI] [Google Scholar]
24.Huang FJ, LeCun Y. Large-scale learning with SVM and convolutional for generic object categorization. In: 2006 IEEE computer society conference on computer vision and pattern recognition – Volume 1 (CVPR’06), vol. 1; 2006. pp. 284–91. doi: 10.1109/cvpr.2006.164 [DOI] [Google Scholar]
25.Ruff L, Vandermeulen R, Goernitz N, Deecke L, Siddiqui SA, Binder A, et al. Deep one-class classification. In: Dy J, Krause A, editors. Proceedings of the 35th international conference on machine learning. vol. 80 of proceedings of machine learning research. PMLR; 2018. pp. 4393–402. [Google Scholar]
26.Hu W, Wang M, Qin Q, Ma J, Liu B. HRN: A holistic approach to one class learning. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, editors. Advances in neural information processing systems, vol. 33. Curran Associates, Inc.; 2020. pp. 19111–24. [Google Scholar]
27.Grossberg S. Competitive learning: From interactive activation to adaptive resonance. Cognit Sci. 1987;11(1):23–63. doi: 10.1111/j.1551-6708.1987.tb00862.x [DOI] [Google Scholar]
28.Brito da Silva LE, Elnabarawy I, Wunsch DC 2nd. A survey of adaptive resonance theory neural network models for engineering applications. Neural Netw. 2019;120:167–203. doi: 10.1016/j.neunet.2019.09.012 [DOI] [PubMed] [Google Scholar]
29.Cao Y, Wu J. Dynamics of projective adaptive resonance theory model: The foundation of PART algorithm. IEEE Trans Neural Netw. 2004;15(2):245–60. doi: 10.1109/TNN.2004.824261 [DOI] [PubMed] [Google Scholar]
30.Habibkhani HK. Supervised projective adaptive resonance theory with top scoring pair (part-tsp) [Master’s thesis]. York University; 2012. [Google Scholar]
31.Kamran H, Aleman DM, McIntosh C, Purdie TG. SuPART: Supervised projective adapted resonance theory for automatic quality assurance approval of radiotherapy treatment plans. Phys Med Biol. 2022;67(6):065004. doi: 10.1088/361-6560/ac568f [DOI] [PubMed] [Google Scholar]
32.Seiffertt J. Adaptive resonance theory in the time scales calculus. Neural Netw. 2019;120:32–9. doi: 10.1016/j.neunet.2019.08.010 [DOI] [PubMed] [Google Scholar]
33.Meng L, Tan A-H, Miao C. Salience-aware adaptive resonance theory for large-scale sparse data clustering. Neural Netw. 2019;120:143–57. doi: 10.1016/j.neunet.2019.09.014 [DOI] [PubMed] [Google Scholar]
34.Grossberg S. Adaptive pattern classification and universal recoding. I. Parallel development and coding of neural feature detectors. Biol Cybern. 1976;23(3):121–34. doi: 10.1007/BF00344744 [DOI] [PubMed] [Google Scholar]
35.World Cancer Research Fund International. Cancer Trends; 2022. [cited 2025 Feb 4]. Available from: https://www.wcrf.org/cancer-trends/worldwide-cancer-data/ [Google Scholar]
36.Purdie T G, McIntosh C J, Svistoun I. Method and system for automated quality assurance in radiation therapy; 2023. US Patent 11,735,309. [cited 2025 Feb 4]. Available from: https://patents.google.com/patent/US11735309B2 [Google Scholar]
37.Drzymala RE, Mohan R, Brewster L, Chu J, Goitein M, Harms W, et al. Dose-volume histograms. Int J Radiat Oncol Biol Phys. 1991;21(1):71–8. doi: 10.1016/0360-3016(91)90168-4 [DOI] [PubMed] [Google Scholar]
38.Coates A, Ng AY. Learning feature representations with k-means. In: Neural networks: Tricks of the trade: 2nd ed. Springer; 2012. p. 561–80. [Google Scholar]
39.Brito da Silva LE, Elnabarawy I, Wunsch DC 2nd. Distributed dual vigilance fuzzy adaptive resonance theory learns online, retrieves arbitrarily-shaped clusters, and mitigates order dependence. Neural Netw. 2020;121:208–28. doi: 10.1016/j.neunet.2019.08.033 [DOI] [PubMed] [Google Scholar]
40.Zhang Y. A better autoencoder for image: Convolutional autoencoder. In: ICONIP17-DCEC; 2018. [cited 2024 Oct 28]. Available from: http://users.cecs.anu.edu.au/Tom.Gedeon/conf/ABCs2018/paper/ABCs2018_paper_58.pdf [Google Scholar]
41.Kieu T, Yang B, Guo C, Jensen CS. Outlier detection for time series with recurrent autoencoder ensembles. In: IJCAI; 2019. p. 2725–32. [Google Scholar]
42.Su Y, Marinoni A, Li J, Plaza J, Gamba P. Stacked nonnegative sparse autoencoders for robust hyperspectral unmixing. IEEE Geosci Remote Sens Lett. 2018;15(9):1427–31. doi: 10.1109/lgrs.2018.2841400 [DOI] [Google Scholar]
43.Rifai S, Bengio Y, Dauphin Y, Vincent P. A generative process for sampling contractive auto-encoders; arXiv preprint 2012. arXiv:1206.6434. [cited 2025 Feb 4]. Available from: https://arxiv.org/abs/1206.6434 [Google Scholar]
44.Miani M, Warburg F, Moreno-Muñoz P, Skafte N, Hauberg S. Laplacian autoencoders for learning stochastic representations. Adv Neural Inform Process Syst. 2022;35:21059–72. [Google Scholar]
45.Boser BE, Guyon IM, Vapnik VN. A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on computational learning theory; 1992. p. 144–52. doi: 10.1145/130385.130401 [DOI] [Google Scholar]
46.Juran J, Godfrey AB. Quality handbook. Republished McGraw-Hill; 1999. p. 173–8. [Google Scholar]
47.Commission E. Laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain union legislative acts. Eur Comm. 2021;106:1–108. [Google Scholar]
48.Grossberg S. Adaptive resonance theory: How a brain learns to consciously attend, learn, and recognize a changing world. Neural Netw. 2013;37:1–47. doi: 10.1016/j.neunet.2012.09.017 [DOI] [PubMed] [Google Scholar]
49.Pes B. Learning from high-dimensional biomedical datasets: the issue of class imbalance. IEEE Access. 2020;8:13527–40. doi: 10.1109/access.2020.2966296 [DOI] [Google Scholar]

PLoS One. 2025 Jun 10;20(6):e0321968. doi: 10.1371/journal.pone.0321968.r001

Author response to Decision Letter 0

1 Apr 2024

PLoS One. doi: 10.1371/journal.pone.0321968.r002

Decision Letter 0

Zeyar Aung

7 May 2024

PONE-D-24-09697Explainable one-class feature extraction by adaptive resonance for anomaly detection in quality assurancePLOS ONE

Dear Dr. Kamran,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

Major revision: * Please address the comments of the reviewers carefully in the revised submission. Thank you.

==============================

Please submit your revised manuscript by Jun 21 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Zeyar Aung

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please ensure you have stated the date of data collection in the Methods section of your manuscript text to fully comply with the PLOS ONE policy on reporting research involving human participants. This information is currently provided only in the Human Participants Research Checklist, which will not be published with your manuscript files.

3. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

4. We note that you have indicated that there are restrictions to data sharing for this study. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

Before we proceed with your manuscript, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., a Research Ethics Committee or Institutional Review Board, etc.). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of recommended repositories, please see

https://journals.plos.org/plosone/s/recommended-repositories. You also have the option of uploading the data as Supporting Information files, but we would recommend depositing data directly to a data repository if possible.

We will update your Data Availability statement on your behalf to reflect the information you provide.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Looking into the manuscript, it is clear that the topic is timely and has potential implications to participants in the stock market. However, there are some significant issues that need to be addressed before the paper is considered for publication.

1. Introduction. The structure of the paper should be improved. In particular, the introduction section should be more structured, faster, and cover the key five elements: motivation, objectives, literature gap and paper's contributions to the literature, justification of the methodology, and summary of the results. Furthermore, in the introduction section, the authors clearly describe the added value of their analysis and the positioning in light of previous findings.

2. Related Work. I suggest the authors to analyze more papers that are recently published. Moreover, make sure that literature review is linked to a systematic.

3. The method used should be compared with various methods and the results should be strengthened. The methodological explanation of the method used should be made with the help of assumptions and formulas.

5. The results should be discussed more thoroughly in light of the existing literature. This is a very important point to address.

6. In the conclusion section, a more detailed discussion of the policy implications would make the paper richer and more informative for investors and policymakers.

Reviewer #2: Provide more examples or case studies to demonstrate the practical implementation of the proposed model in clinical settings. This would help in illustrating its applicability and effectiveness in real-world scenarios.

Expand the discussion on the limitations and challenges of your approach, especially focusing on situations where the model may face difficulties due to the unique characteristics of individual treatment plans or variations in equipment across different healthcare facilities.

Include a more detailed comparative analysis with other anomaly detection models that have been applied in similar healthcare settings. Highlighting the specific improvements your model offers over these existing methods would provide a clearer context for your contributions.

Discuss the model's adaptability to other types of quality assurance tasks outside of radiotherapy, potentially broadening the scope of your findings.

Enhance the explanation of how the adaptive resonance features work in layman's terms, to make the paper more accessible to readers who are not specialists in machine learning or medical physics.

Consider the ethical implications of automating such critical aspects of healthcare, particularly the reliance on algorithmic decisions in clinical environments. Discuss safeguards that could be implemented to prevent potential negative outcomes.

Propose future directions for this research, especially in terms of integrating this technology with other AI-driven tools used in healthcare, which could lead to a more comprehensive system for automated quality assurance across various treatment modalities.

Cite this https://doi.org/10.1109/ACCESS.2024.3373697 to improve your literature

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Saad Mohamed Darwish

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2025 Jun 10;20(6):e0321968. doi: 10.1371/journal.pone.0321968.r003

Author response to Decision Letter 1

7 Oct 2024

Reviewer 1

1. R1C1: Introduction. The structure of the paper should be improved. In particular, the

introduction section should be more structured, faster, and cover the key five elements: moti-

vation, objectives, literature gap and paper’s contributions to the literature, justification of the

methodology, and summary of the results. Furthermore, in the introduction section, the au-

thors clearly describe the added value of their analysis and the positioning in light of previous

findings.

We significantly re-wrote the Introduction section as you suggested and moved most of the

previous introduction to a new Related Work section. The Related Work section includes an

expanded literature review as requested in comments R1C1, R1C2, R1C4, and R2C3.

2. R1C2: Related Work. I suggest the authors to analyze more papers that are recently published.

Moreover, make sure that literature review is linked to a systematic.

We have updated the literature review with the following additional citations, which include

several papers from 2023 and 2024: Commission (2021); Gorokhov et al. (2017); Kehayias

et al. (2023); Li et al. (2023); Lundberg and Lee (2017); Ribeiro et al. (2016, 2018); Saab et al.

(2024); Salam et al. (2024); Union (2024)

3. R1C3: The method used should be compared with various methods and the results should be

strengthened. The methodological explanation of the method used should be made with the

help of assumptions and formulas.

We added a simple explanation of the main idea behind Adaptive Resonance Theory at the

beginning of the Methods section. Additionally, we extensively compare the various methods

and their results in the Discussion section.

4. R1C4: The results should be discussed more thoroughly in light of the existing literature. This

is a very important point to address.

We included four additional papers: Altman et al. (2015); Jensen et al. (2018); Kalendralis

et al. (2023); Kehayias et al. (2023). Our study stands out by using a comprehensive dataset

with over 100 variables and real erroneous plans, unlike previous works that focused on specific

error types, lower-dimensional data, or simulated errors. Our one-class approach overcomes

the limitations of requiring erroneous plans for training, providing a more robust and adapt-

able framework for detecting a wide range of unacceptable plans in radiation therapy. This

discussion is highlighted in blue on page 17 of the revised document.

5. R1C5: In the conclusion section, a more detailed discussion of the policy implications would

make the paper richer and more informative for investors and policymakers.

We added explanation on page 20, in accordance with the most prominent AI regulation

document, AI Act published by European Union Union (2024).

Reviewer 2

1. R2C1: Provide more examples or case studies to demonstrate the practical implementation of

the proposed model in clinical settings. This would help in illustrating its applicability and

effectiveness in real-world scenarios.

We added discussions about implementation and limitations in the Discussion section on page

20. Additional studies have been included in the discussion section (See R1C3 and R1C4 on

page 17), comparing their approaches, their results, and some of the implications of the clinical

implementation Kalendralis et al. (2023).

2. R2C2: Expand the discussion on the limitations and challenges of your approach, especially

focusing on situations where the model may face difficulties due to the unique characteristics

of individual treatment plans or variations in equipment across different healthcare facilities.

We added additional discussion of implementation challenges and limitations in the Discussion

section.

3. R2C3: Include a more detailed comparative analysis with other anomaly detection models that

have been applied in similar healthcare settings. Highlighting the specific improvements your

model offers over these existing methods would provide a clearer context for your contributions.

The Related Work section now includes a comparison with alternative methods, highlighting

that deep learning approaches often require additional algorithms for interpretation, lacking

inherent interpretability.

4. R2C4: Discuss the model’s adaptability to other types of quality assurance tasks outside of

radiotherapy, potentially broadening the scope of your findings.

We added a paragraph about other possible applications in the Discussion section.

5. R2C5: Enhance the explanation of how the adaptive resonance features work in layman’s terms,

to make the paper more accessible to readers who are not specialists in machine learning or

medical physics.

See R1C3.

6. R2C6: Consider the ethical implications of automating such critical aspects of healthcare,

particularly the reliance on algorithmic decisions in clinical environments. Discuss safeguards

that could be implemented to prevent potential negative outcomes.

See R1C5.

7. R2C7: Propose future directions for this research, especially in terms of integrating this tech-

nology with other AI-driven tools used in healthcare, which could lead to a more comprehensive

system for automated quality assurance across various treatment modalities.

We added a paragraph about future directions in the Conclusion section.

8. R2C8: Cite this https://doi.org/10.1109/ACCESS.2024.3373697 to improve your litera-

ture.

This paper (Salam et al., 2024) was cited as part of R2C4.

References

MB Altman, JA Kavanaugh, HO Wooten, OL Green, TA DeWees, H Gay, WL Thorstad, H Li, and

S Mutic. A framework for automated contour quality assurance in radiation therapy including

adaptive techniques. Physics in Medicine & Biology, 60(13):5199, 2015.

European Commission. Laying down harmonised rules on artificial intelligence (artificial intelligence

act) and amending certain union legislative acts. Eur Comm, 106:1–108, 2021.

Oleg Gorokhov, Mikhail Petrovskiy, and Igor Mashechkin. Convolutional neural networks for unsu-

pervised anomaly detection in text data. In International Conference on Intelligent Data Engi-

neering and Automated Learning, pages 500–507. Springer, 2017.

N Jensen, K Boye, S Damkjær, and I Wahlstedt. Impact of automation in external beam radiation

therapy treatment plan quality control on error rates and productivity. International Journal of

Radiation Oncology, Biology, Physics, 102(3):S149–S150, 2018.

Petros Kalendralis, Samuel MH Luk, Richard Canters, Denis Eyssen, Ana Vaniqui, Cecile Wolfs,

Lars Murrer, Wouter van Elmpt, Alan M Kalet, Andre Dekker, et al. Automatic quality assurance

of radiotherapy treatment plans using bayesian networks: A multi-institutional study. Frontiers

in oncology, 13:1099994, 2023.

CE Kehayias, D Bontempi, S Quirk, S Friesen, JS Bredfeldt, MA Huynh, H Aerts, RH Mak, and

CV Guthier. Deep learning-based automated quality assurance for palliative spinal treatment

planning in radiotherapy. International Journal of Radiation Oncology, Biology, Physics, 117(2):

S50, 2023.

Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection.

ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.

Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances

in neural information processing systems, 30, 2017.

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the

predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference

on knowledge discovery and data mining, pages 1135–1144, 2016.

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Anchors: High-precision model-agnostic

explanations. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1), 2018.

Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang,

Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike

Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil

Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean baptiste Alayrac,

Neil Houlsby, Nenad Tomasev, Jan Freyberg, Charles Lau, Jonas Kemp, Jeremy Lai, Shekoofeh

Azizi, Kimberly Kanada, SiWai Man, Kavita Kulkarni, Ruoxi Sun, Siamak Shakeri, Luheng He,

Ben Caine, Albert Webson, Natasha Latysheva, Melvin Johnson, Philip Mansfield, Jian Lu, Ehud

Rivlin, Jesper Anderson, Bradley Green, Renee Wong, Jonathan Krause, Jonathon Shlens, Ewa

Dominowska, S. M. Ali Eslami, Katherine Chou, Claire Cui, Oriol Vinyals, Koray Kavukcuoglu,

James Manyika, Jeff Dean, Demis Hassabis, Yossi Matias, Dale Webster, Joelle Barral, Greg Cor-

rado, Christopher Semturs, S. Sara Mahdavi, Juraj Gottweis, Alan Karthikesalingam, and Vivek

Natarajan. Capabilities of gemini models in medicine, 2024.

Abdu Salam, Mohammad Abrar, Farhan Amin, Faizan Ullah, Izaz Ahmad Khan, Bader Fahad

Alkhamees, and Hussain AlSalman. Securing smart manufacturing by integrating anomaly de-

tection with zero-knowledge proofs. IEEE Access, 12:36346–36360, 2024. doi: 10.1109/ACCESS.

2024.3373697.

European Union. Artificial intelligence act. european parliamentary research service, 2023, 2024.

Attachment

Submitted filename: PLOS_Response_newfile.pdf

pone.0321968.s002.pdf^{(78.5KB, pdf)}

PLoS One. doi: 10.1371/journal.pone.0321968.r004

Decision Letter 1

Zeyar Aung

1 Dec 2024

PONE-D-24-09697R1Explainable one-class feature extraction by adaptive resonance for anomaly detection in quality assurancePLOS ONE

Dear Dr. Kamran,

Minor revision: Some more revisions are still required. Please address the comments by the reviewers.

Please submit your revised manuscript by Jan 15 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

We look forward to receiving your revised manuscript.

Kind regards,

Zeyar Aung

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: I Don't Know

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #2: Yes

Reviewer #3: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Reviewer #2: The paper needs to check for spelling, typo and grammatical mistakes. The rest of the paper is now ok.

Reviewer #3: 1. Although additional citations were included, the review could benefit from a systematic and thematic organization. There are still areas where the links between referenced studies and the proposed method remain unclear.

2. The inclusion of comparative analysis with alternative methods is appreciated, but the discussion is surface-level. A deeper dive into why the proposed method outperforms others with specific examples would be more impactful.

3. Figures and Tables - Ensure clarity and alignment with the narrative.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: No

**********

PLoS One. 2025 Jun 10;20(6):e0321968. doi: 10.1371/journal.pone.0321968.r005

Author response to Decision Letter 2

6 Feb 2025

\section*{Journal Requirements:}

``Please review your reference list to ensure that it is complete and correct.''

\textcolor{darkgreen}{We have reviewed the reference list to ensure that it is complete and correct. We have checked all 49 citations, and none of them have been retracted.}

\section*{Response to reviewers}

We have carefully revised our manuscript, "Explainable one-class feature extraction by adaptive resonance for anomaly detection in quality assurance" (\# PONE-D-24-09697), based on the insightful feedback provided by the editor and reviewers. This letter details our point-by-point responses to each comment, clarifying how we have incorporated the suggestions into the revised manuscript.

For ease of reference

\begin{itemize}

\item Reviewers' comments: Black text

\item Our responses: Dark green text

\item Changes in the manuscript: Marked in blue and tagged with the corresponding comment number (e.g., R2C1 for Reviewer 2, Comment 1).

\end{itemize}

The changes include spelling and grammatical corrections to address Reviewer 2's comments and revisions to the Related Work section based on Reviewer 3's feedback.

\subsection*{Reviewer 2}

Reviewer \#2: The paper needs to check for spelling, typo and grammatical mistakes. The rest of the paper is now ok.

\textcolor{darkgreen}{We appreciate Reviewer 2's careful reading and helpful suggestions for improving the clarity and correctness of our manuscript. We have made the following changes, marked in blue with the code R2C1 in the manuscript:}

\textcolor{darkgreen}{

\begin{itemize}

\item \textbf{In the Abstract:}

\begin{itemize}

\item Improved the flow of the sentence beginning with "This issue" to enhance readability.

\end{itemize}

\item \textbf{In Related Work:}

\begin{itemize}

\item Corrected the spelling of "occurred" (previously "occured").

\item Ensured consistent use of "a small number of plans" throughout.

\item Removed the hyphen in "label imbalanced" for better readability.

\item Changed "in favour of" to "in favor of" to maintain consistency with American English spelling.

\end{itemize}

\item \textbf{In Section 2:}

\begin{itemize}

\item Changed "OSVM" to "one-class SVMs" to maintain consistency with the use of spelled-out forms.

\end{itemize}

\item \textbf{In Section 4:}

\begin{itemize}

\item Removed the redundant word "hyperreferencced."

\item Corrected the capitalization of "Breast dataset" and "Prostate dataset."

\item Corrected the spelling of "features" (previously "featues").

\item Rephrased the sentence about the stopping condition for training iterations for clarity.

\item Rephrased the sentence about the two cases for conciseness.

\item Maintained consistent use of either spelled-out forms or abbreviations for "PCA," "ICA," and "autoencoders."

\item Split the sentence about PCA and ICA components for clarity.

\item Maintained consistent use of the spelled-out form for "autoencoder."

\item Replaced "each with" with "using" for better flow.

\item Replaced "smallest" with "lowest" for better word choice.

\item Rephrased the sentence about the classification based on OART's feature values for clarity.

\end{itemize}

\item \textbf{In Section 5:}

\begin{itemize}

\item Changed "in general" to "generally" for better grammar.

\item Replaced "demonstrates" with "shows" for better word choice.

\item Rephrased the sentence about the hyperparameter for clarity.

\item Replaced "do in fact" with "indeed" for better flow.

\item Rephrased the sentence about automated QA for clarity.

\item Rephrased the sentence about governance frameworks for clarity.

\end{itemize}

\item \textbf{In the Conclusions:}

\begin{itemize}

\item Rephrased the sentence about the speed and cost-efficiency of automated QA for clarity and impact.

\item Rephrased the sentence about future research directions for conciseness.

\item Rephrased the sentence about Bayesian approaches for clarity.

\item Rephrased the sentence about validating the generalizability of findings for clarity.

\item Rephrased the sentence about the need for further theoretical analysis for conciseness.

\item Rephrased the sentence about existing mathematical proofs for clarity.

\item Rephrased the sentence about future research addressing convergence for conciseness.

\item Rephrased the sentence about combining empirical evidence and theoretical foundations for impact.

\end{itemize}

\end{itemize}}

\subsection*{Reviewer 3}

\begin{enumerate}

\item{{\textbf{R3C1}} Although additional citations were included, the review could benefit from a systematic and thematic organization. There are still areas where the links between referenced studies and the proposed method remain unclear. The inclusion of comparative analysis with alternative methods is appreciated, but the discussion is surface-level. A deeper dive into why the proposed method outperforms others with specific examples would be more impactful.}

\textcolor{darkgreen}{We appreciate Reviewer 3's careful reading and helpful suggestions for improving the clarity and correctness of our manuscript. We have made the following changes, marked in blue with the code {\textbf{R3C1}} in the manuscript:}

\textcolor{darkgreen}{We have revised multiple parts of the manuscript to improve the connection between previous studies and our proposed method:

\begin{itemize}

\item {We clarified why handling sparsity in high-dimensional feature spaces is a key challenge in RT planning and explicitly referenced prior work addressing this issue. }

\item {We strengthened the explanation of why binary classification methods are insufficient for RT QA, emphasizing the risk of failing to detect novel anomalies. }

\item {We improved the description of how our OART method incorporates feature selection to enhance anomaly detection while maintaining interpretability.}

\end{itemize}

}

\item {{\textbf{R3C2}} Figures and Tables - Ensure clarity and alignment with the narrative.}

\end{enumerate}

\textcolor{darkgreen}{We have carefully reviewed all figures and tables in the manuscript to ensure their clarity and alignment with the surrounding narrative. This review included checks for proper referencing, clear descriptions, high-resolution images, and accurate labels and explanations. If there are any additional specific issues with the tables and figures, please let us know the details so we can address them effectively.

}

Attachment

Submitted filename: PLOS_Response_Feb25.pdf

pone.0321968.s003.pdf^{(50.5KB, pdf)}

PLoS One. doi: 10.1371/journal.pone.0321968.r006

Decision Letter 2

Zeyar Aung

14 Mar 2025

Explainable one-class feature extraction by adaptive resonance for anomaly detection in quality assurance

PONE-D-24-09697R2

Dear Dr. Kamran,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager at Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Zeyar Aung

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

The paper is acceptable now. Please remove the tags R3C1, etc. from the final submission. Thank you.

Reviewers' comments:

PLoS One. doi: 10.1371/journal.pone.0321968.r007

Acceptance letter

Zeyar Aung

PONE-D-24-09697R2

PLOS ONE

Dear Dr. Kamran,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Zeyar Aung

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Archive

Archive.zip containing plotting data and code used to generate figures and results reported in the manuscript.

(ZIP)

pone.0321968.s001.zip^{(57.3KB, zip)}

Attachment

Submitted filename: PLOS_Response_newfile.pdf

pone.0321968.s002.pdf^{(78.5KB, pdf)}

Attachment

Submitted filename: PLOS_Response_Feb25.pdf

pone.0321968.s003.pdf^{(50.5KB, pdf)}

Data Availability Statement

[pone.0321968.ref001] 1.Brenner DR, Weir HK, Demers AA, Ellison LF, Louzado C, Shaw A, et al. Projected estimates of cancer in Canada in 2020. CMAJ. 2020;192(9):E199–E205. doi: 10.1503/cmaj.191292 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0321968.ref002] 2.Brown WE, Sung K, Aleman DM, Moreno-Centeno E, Purdie TG, McIntosh CJ. Guided undersampling classification for automated radiation therapy quality assurance of prostate cancer treatment. Med Phys. 2018;45(4):1306–16. doi: 10.1002/mp.12757 [DOI] [PubMed] [Google Scholar]

[pone.0321968.ref003] 3.Berry SL, Boczkowski A, Ma R, Mechalakos J, Hunt M. Interobserver variability in radiation therapy plan output: results of a single-institution study. Pract Radiat Oncol. 2016;6(6):442–9. doi: 10.1016/j.prro.2016.04.005 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0321968.ref004] 4.Nelms B, Robinson G, Magliari J, Roeske J. MO-EE-A2-01: On the dosimetric/DVH impact of variation in organ delineation: A multi-institutional study and proposed quality system. Med Phys. 2010;37(6 Part 26):3348. doi: 10.1118/1.3469085 [DOI] [Google Scholar]

[pone.0321968.ref005] 5.McNiven AL, Sharpe MB, Purdie TG. A new metric for assessing IMRT modulation complexity and plan deliverability. Med Phys. 2010;37(2):505–15. doi: 10.1118/1.3276775 [DOI] [PubMed] [Google Scholar]

[pone.0321968.ref006] 6.Engberg L, Forsgren A, Eriksson K, Hårdemark B. Explicit optimization of plan quality measures in intensity-modulated radiation therapy treatment planning. Med Phys. 2017;44(6):2045–53. doi: 10.1002/mp.12146 [DOI] [PubMed] [Google Scholar]

[pone.0321968.ref007] 7.Zhu X, Ge Y, Li T, Thongphiew D, Yin F-F, Wu QJ. A planning quality evaluation tool for prostate adaptive IMRT based on machine learning. Med Phys. 2011;38(2):719–26. doi: 10.1118/1.3539749 [DOI] [PubMed] [Google Scholar]

[pone.0321968.ref008] 8.Mayo CS, Yao J, Eisbruch A, Balter JM, Litzenberg DW, Matuszak MM, et al. Incorporating big data into treatment plan evaluation: Development of statistical DVH metrics and visualization dashboards. Adv Radiat Oncol. 2017;2(3):503–14. doi: 10.1016/j.adro.2017.04.005 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0321968.ref009] 9.Craft D, Süss P, Bortfeld T. The tradeoff between treatment plan quality and required number of monitor units in intensity-modulated radiotherapy. Int J Radiat Oncol Biol Phys. 2007;67(5):1596–605. doi: 10.1016/j.ijrobp.2006.11.034 [DOI] [PubMed] [Google Scholar]

[pone.0321968.ref010] 10.Chawla N. In: Maimon O, Rokach L, editors. Data mining for imbalanced datasets: An overview. Boston, MA: Springer US; 2010. p. 875–86. [Google Scholar]

[pone.0321968.ref011] 11.Chawla N In: Maimon O, Rokach L, editors. Data mining for imbalanced datasets: An overview. Boston, MA: Springer US; 2010. p. 875–86. [Google Scholar]

[pone.0321968.ref012] 12.Oza P, Patel VM. One-class convolutional neural network. IEEE Signal Process Lett. 2019;26(2):277–81. doi: 10.1109/lsp.2018.2889273 [DOI] [Google Scholar]

[pone.0321968.ref013] 13.Perera P, Patel VM. Learning deep features for one-class classification. IEEE Trans Image Process. 2019;28(11):5450–63. doi: 10.1109/TIP.2019.2917862 [DOI] [PubMed] [Google Scholar]

[pone.0321968.ref014] 14.Sabokrou M, Fathy M, Zhao G, Adeli E. Deep end-to-end one-class classifier. IEEE Trans Neural Netw Learn Syst. 2021;32(2):675–84. doi: 10.1109/TNNLS.2020.2979049 [DOI] [PubMed] [Google Scholar]

[pone.0321968.ref015] 15.Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC. Estimating the support of a high-dimensional distribution. Neural Comput. 2001;13(7):1443–71. doi: 10.1162/089976601750264965 [DOI] [PubMed] [Google Scholar]

[pone.0321968.ref016] 16.Deng L, Seltzer M, Yu D, Acero A, Mohamed Ar, Hinton G. Binary coding of speech spectrograms using a deep auto-encoder. In: Interspeech 2010. International Speech Communication Associatio; n; 2010. [Google Scholar]

[pone.0321968.ref017] 17.Pearson K. LIII. On lines and planes of closest fit to systems of points in space. Lond Edinburgh Dublin Philos Mag J Sci. 1901;2(11):559–72. doi: 10.1080/14786440109462720 [DOI] [Google Scholar]

[pone.0321968.ref018] 18.Hyvärinen A, Oja E. Independent component analysis: Algorithms and applications. Neural Netw. 2000;13(4–5):411–30. doi: 10.1016/s0893-6080(00)00026-5 [DOI] [PubMed] [Google Scholar]

[pone.0321968.ref019] 19.Vincent P, Larochelle H, Bengio Y, Manzagol P-A. Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th international conference on Machine learning – ICML ’08; 2008. pp. 1096–103. doi: 10.1145/1390156.1390294 [DOI] [Google Scholar]

[pone.0321968.ref020] 20.Dahl G, Ranzato MA, Mohamed AR, Hinton GE. Phone recognition with the mean-covariance restricted Boltzmann machine. In: Lafferty J, Williams C, Shawe-Taylor J, Zemel R, Culotta A, editors. Advances in neural information processing systems, vol. 23. Curran Associates, Inc.; 2010. [Google Scholar]

[pone.0321968.ref021] 21.Leng Q, Qi H, Miao J, Zhu W, Su G. One-class classification with extreme learning machine. Math Probl Eng. 2015;2015(1):412957. [Google Scholar]

[pone.0321968.ref022] 22.Tax DM, Duin RP. Data domain description using support vectors. In: European symposium on artificial neural networks, vol. 99; 1999. p. 251–6. [Google Scholar]

[pone.0321968.ref023] 23.Erfani SM, Rajasegarar S, Karunasekera S, Leckie C. High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recogn. 2016;58:121–34. doi: 10.1016/j.patcog.2016.03.028 [DOI] [Google Scholar]

[pone.0321968.ref024] 24.Huang FJ, LeCun Y. Large-scale learning with SVM and convolutional for generic object categorization. In: 2006 IEEE computer society conference on computer vision and pattern recognition – Volume 1 (CVPR’06), vol. 1; 2006. pp. 284–91. doi: 10.1109/cvpr.2006.164 [DOI] [Google Scholar]

[pone.0321968.ref025] 25.Ruff L, Vandermeulen R, Goernitz N, Deecke L, Siddiqui SA, Binder A, et al. Deep one-class classification. In: Dy J, Krause A, editors. Proceedings of the 35th international conference on machine learning. vol. 80 of proceedings of machine learning research. PMLR; 2018. pp. 4393–402. [Google Scholar]

[pone.0321968.ref026] 26.Hu W, Wang M, Qin Q, Ma J, Liu B. HRN: A holistic approach to one class learning. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, editors. Advances in neural information processing systems, vol. 33. Curran Associates, Inc.; 2020. pp. 19111–24. [Google Scholar]

[pone.0321968.ref027] 27.Grossberg S. Competitive learning: From interactive activation to adaptive resonance. Cognit Sci. 1987;11(1):23–63. doi: 10.1111/j.1551-6708.1987.tb00862.x [DOI] [Google Scholar]

[pone.0321968.ref028] 28.Brito da Silva LE, Elnabarawy I, Wunsch DC 2nd. A survey of adaptive resonance theory neural network models for engineering applications. Neural Netw. 2019;120:167–203. doi: 10.1016/j.neunet.2019.09.012 [DOI] [PubMed] [Google Scholar]

[pone.0321968.ref029] 29.Cao Y, Wu J. Dynamics of projective adaptive resonance theory model: The foundation of PART algorithm. IEEE Trans Neural Netw. 2004;15(2):245–60. doi: 10.1109/TNN.2004.824261 [DOI] [PubMed] [Google Scholar]

[pone.0321968.ref030] 30.Habibkhani HK. Supervised projective adaptive resonance theory with top scoring pair (part-tsp) [Master’s thesis]. York University; 2012. [Google Scholar]

[pone.0321968.ref031] 31.Kamran H, Aleman DM, McIntosh C, Purdie TG. SuPART: Supervised projective adapted resonance theory for automatic quality assurance approval of radiotherapy treatment plans. Phys Med Biol. 2022;67(6):065004. doi: 10.1088/361-6560/ac568f [DOI] [PubMed] [Google Scholar]

[pone.0321968.ref032] 32.Seiffertt J. Adaptive resonance theory in the time scales calculus. Neural Netw. 2019;120:32–9. doi: 10.1016/j.neunet.2019.08.010 [DOI] [PubMed] [Google Scholar]

[pone.0321968.ref033] 33.Meng L, Tan A-H, Miao C. Salience-aware adaptive resonance theory for large-scale sparse data clustering. Neural Netw. 2019;120:143–57. doi: 10.1016/j.neunet.2019.09.014 [DOI] [PubMed] [Google Scholar]

[pone.0321968.ref034] 34.Grossberg S. Adaptive pattern classification and universal recoding. I. Parallel development and coding of neural feature detectors. Biol Cybern. 1976;23(3):121–34. doi: 10.1007/BF00344744 [DOI] [PubMed] [Google Scholar]

[pone.0321968.ref035] 35.World Cancer Research Fund International. Cancer Trends; 2022. [cited 2025 Feb 4]. Available from: https://www.wcrf.org/cancer-trends/worldwide-cancer-data/ [Google Scholar]

[pone.0321968.ref036] 36.Purdie T G, McIntosh C J, Svistoun I. Method and system for automated quality assurance in radiation therapy; 2023. US Patent 11,735,309. [cited 2025 Feb 4]. Available from: https://patents.google.com/patent/US11735309B2 [Google Scholar]

[pone.0321968.ref037] 37.Drzymala RE, Mohan R, Brewster L, Chu J, Goitein M, Harms W, et al. Dose-volume histograms. Int J Radiat Oncol Biol Phys. 1991;21(1):71–8. doi: 10.1016/0360-3016(91)90168-4 [DOI] [PubMed] [Google Scholar]

[pone.0321968.ref038] 38.Coates A, Ng AY. Learning feature representations with k-means. In: Neural networks: Tricks of the trade: 2nd ed. Springer; 2012. p. 561–80. [Google Scholar]

[pone.0321968.ref039] 39.Brito da Silva LE, Elnabarawy I, Wunsch DC 2nd. Distributed dual vigilance fuzzy adaptive resonance theory learns online, retrieves arbitrarily-shaped clusters, and mitigates order dependence. Neural Netw. 2020;121:208–28. doi: 10.1016/j.neunet.2019.08.033 [DOI] [PubMed] [Google Scholar]

[pone.0321968.ref040] 40.Zhang Y. A better autoencoder for image: Convolutional autoencoder. In: ICONIP17-DCEC; 2018. [cited 2024 Oct 28]. Available from: http://users.cecs.anu.edu.au/Tom.Gedeon/conf/ABCs2018/paper/ABCs2018_paper_58.pdf [Google Scholar]

[pone.0321968.ref041] 41.Kieu T, Yang B, Guo C, Jensen CS. Outlier detection for time series with recurrent autoencoder ensembles. In: IJCAI; 2019. p. 2725–32. [Google Scholar]

[pone.0321968.ref042] 42.Su Y, Marinoni A, Li J, Plaza J, Gamba P. Stacked nonnegative sparse autoencoders for robust hyperspectral unmixing. IEEE Geosci Remote Sens Lett. 2018;15(9):1427–31. doi: 10.1109/lgrs.2018.2841400 [DOI] [Google Scholar]

[pone.0321968.ref043] 43.Rifai S, Bengio Y, Dauphin Y, Vincent P. A generative process for sampling contractive auto-encoders; arXiv preprint 2012. arXiv:1206.6434. [cited 2025 Feb 4]. Available from: https://arxiv.org/abs/1206.6434 [Google Scholar]

[pone.0321968.ref044] 44.Miani M, Warburg F, Moreno-Muñoz P, Skafte N, Hauberg S. Laplacian autoencoders for learning stochastic representations. Adv Neural Inform Process Syst. 2022;35:21059–72. [Google Scholar]

[pone.0321968.ref045] 45.Boser BE, Guyon IM, Vapnik VN. A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on computational learning theory; 1992. p. 144–52. doi: 10.1145/130385.130401 [DOI] [Google Scholar]

[pone.0321968.ref046] 46.Juran J, Godfrey AB. Quality handbook. Republished McGraw-Hill; 1999. p. 173–8. [Google Scholar]

[pone.0321968.ref047] 47.Commission E. Laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain union legislative acts. Eur Comm. 2021;106:1–108. [Google Scholar]

[pone.0321968.ref048] 48.Grossberg S. Adaptive resonance theory: How a brain learns to consciously attend, learn, and recognize a changing world. Neural Netw. 2013;37:1–47. doi: 10.1016/j.neunet.2012.09.017 [DOI] [PubMed] [Google Scholar]

[pone.0321968.ref049] 49.Pes B. Learning from high-dimensional biomedical datasets: the issue of class imbalance. IEEE Access. 2020;8:13527–40. doi: 10.1109/access.2020.2966296 [DOI] [Google Scholar]

PERMALINK

Explainable one-class feature extraction by adaptive resonance for anomaly detection in quality assurance

Hootan Kamran

Dionne Aleman

Chris McIntosh

Tom Purdie

Roles

Abstract

Author summary

1 Introduction

1.1 Binary classification

1.2 One-class classification

2 Data

Table 1. Plan features and feature counts.

2.1 Plan features

2.2 Plan samples

3 Methods

3.1 Data acquisition and ethical considerations

3.2 OART structure and training

Fig 1. Structure of OARTσ.

Fig 2. OART as standalone classifier OART+Σ.

Fig 3. OART as feature extractor: OART+OSVM.

Fig 4. OART as feature extractor: OART+AE.

3.3 Feature extraction and classification

Fig 5. Feature survival for breast.

Fig 6. Feature survival for prostate.

4 Results

4.1 Classification results

Table 2. Breast: 95% confidence intervals for performance metrics. Highlight indicates best performance in each dataset.

Table 3. Prostate: 95% confidence intervals for performance metrics. Highlight indicates best performance in each dataset.

Fig 7. Receiver operating characteristic curve for the breast dataset.

Fig 8. Receiver operating characteristic curve for the prostate dataset.

4.2 Clinical interpretation

Fig 9. Interpretable testing scores of OART for breast: The dashed line indicates the actual class separation, with unacceptable plans to the right.

Fig 10. Interpretable testing scores of OART for prostate: The dashed line indicates the actual class separation, with unacceptable plans to the right.

5 Discussion

6 Conclusions

Supporting information

Data Availability

Funding Statement

References

Author response to Decision Letter 0

Decision Letter 0

Zeyar Aung

Roles

Author response to Decision Letter 1

Decision Letter 1

Zeyar Aung

Roles

Author response to Decision Letter 2

Decision Letter 2

Zeyar Aung

Roles

Acceptance letter

Zeyar Aung

Roles

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases

Fig 1. Structure of ${OART}_{σ}$ .

Fig 2. OART as standalone classifier $OART + Σ$ .

Fig 3. OART as feature extractor: $OART + OSVM$ .

Fig 4. OART as feature extractor: $OART + AE$ .