Journal of Medical Physics. 2025 Mar 24;50(1):46–54. doi: 10.4103/jmp.jmp_195_24

Analysis of Aperture-based Complexity Metrics and Their Effect on Patient-specific Quality Assurance in Intensity-modulated Radiation Therapy Planning

Dinesh Kumar Saroj 1,2, Suresh Yadav 3, Neetu Paliwal 2, Ravindra Bhagwat Shende 1, Gaurav Gupta 1
PMCID: PMC12005648  PMID: 40256179

Abstract

Background:

Identifying plans at risk of patient-specific quality assurance (PSQA) failure through complexity metrics can reduce the workload while maintaining quality. This study evaluates complexity metrics as predictors of PSQA outcomes.

Materials and Methods:

A retrospective analysis was conducted on 192 IMRT plans for head-and-neck cancer. Complexity metrics were calculated using an in-house Python program. PSQA was performed with 3%/2-mm gamma passing rate (GPR) criteria, with plans classified as “Pass” (GPR ≥95%) or “Fail.” Statistical analyses, including Spearman’s correlation and receiver operating characteristic analysis, assessed the metrics’ predictive value.

Results:

Passing plans had an average GPR of 98.64 ± 1.33%, compared to 92.17 ± 2.35% for failing plans. The mean small aperture score at 5 mm (MSAS5), with a threshold of 0.085, achieved a true positive rate of 38.17% and a false positive rate of 3.1%. Beam modulation and beam area indices also differed significantly between passing and failing plans.

Conclusion:

MSAS5 and edge metrics showed strong potential for identifying high-risk plans. These metrics can guide targeted PSQA, improving workflow efficiency without compromising treatment safety.

Keywords: Gamma passing rate, intensity-modulated radiation therapy, modulation complexity score, patient-specific quality assurance, receiver operating characteristic curve

INTRODUCTION

Intensity-modulated radiation therapy (IMRT) techniques offer enhanced precision in delivering radiation doses for the treatment of cancerous lesions with concavities or proximity to multiple critical structures.[1] Nevertheless, well-known drawbacks are associated with employing overly complex IMRT fields.[2,3] Existing literature has suggested using a smoothing penalty during inverse optimization to reduce the complexity of the treatment plan.[4] Treating patients with complex beams of small-field segments necessitates prolonged beam-on time (higher monitor unit [MU]/Gy), which increases the risk of patient movement and elevates leakage doses.[2,3] In addition, such treatments result in increased mechanical stress, a greater likelihood of treatment delivery errors, and, consequently, an augmented workload for quality assurance (QA).

Owing to the complex planning and delivery processes, measurement-based patient-specific QA (PSQA) is a crucial clinical procedure for ensuring the safety of treatments. PSQA for IMRT typically involves a comparison between the measured and calculated dose distributions using a detector array before the commencement of treatment. The gamma index, a widely employed metric, evaluates the agreement between two distributions by considering both the percentage dose difference and distance-to-agreement (DTA).[5] The gamma passing rate (GPR), which indicates the percentage of measurement points with a gamma index <1, can be computed with global or local normalization and with various dose difference and DTA criteria, such as 3%/3 mm. Previously, AAPM Task Group 119 recommended that a 90% GPR with 3%/3 mm is sufficient for clinical acceptance of an IMRT plan. However, because more reliable detectors and software are now available, AAPM Task Group 218 has released updated recommendations that apply stricter criteria, such as a 95% GPR with 3%/2 mm.[6,7] Lower gamma passing rates can be attributed to several sources of uncertainty and error. According to earlier research, these include detector resolution, calibration, and phantom setup.[8] Additional contributors include fluctuations in the beam output and profile on the day of measurement, challenges associated with beam modeling, and the inherent complexity of treatment plans.
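To make the gamma passing rate concrete, the following is a minimal one-dimensional sketch, not the portal dosimetry implementation used in this study. It assumes two dose profiles sampled on the same grid, global normalization to the maximum reference dose, a brute-force search over reference points without interpolation, and that points with a gamma value at or below 1 count as passing. The function name `gamma_passing_rate` and its parameters are illustrative only.

```python
import math


def gamma_passing_rate(reference, measured, spacing_mm,
                       dose_pct=3.0, dta_mm=2.0, low_dose_pct=10.0):
    """Return the global 1D gamma passing rate in percent (brute-force search)."""
    max_dose = max(reference)
    dose_tol = dose_pct / 100.0 * max_dose       # global normalization (assumption)
    cutoff = low_dose_pct / 100.0 * max_dose     # low-dose threshold
    evaluated = passed = 0
    for i, d_meas in enumerate(measured):
        if d_meas < cutoff:
            continue                             # skip points below the threshold
        evaluated += 1
        # Squared gamma: minimum over all reference points of the combined
        # distance term and dose-difference term.
        gamma_sq = min(
            ((i - j) * spacing_mm / dta_mm) ** 2
            + ((d_meas - d_ref) / dose_tol) ** 2
            for j, d_ref in enumerate(reference)
        )
        if math.sqrt(gamma_sq) <= 1.0:
            passed += 1
    return 100.0 * passed / evaluated if evaluated else float("nan")


flat = [100.0] * 5
spiked = [100.0, 100.0, 110.0, 100.0, 100.0]
print(gamma_passing_rate(flat, flat, spacing_mm=1.0))    # 100.0
print(gamma_passing_rate(flat, spiked, spacing_mm=1.0))  # 80.0
```

In the second call, the single point that is 10% too high cannot be matched by any neighbouring reference point within 3%/2 mm, so four of five points pass. Clinical software works in two dimensions and usually interpolates the reference distribution, which this sketch omits.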

Interest in machine learning and deep learning techniques for PSQA has increased noticeably in recent years.[9,10,11,12] These models use a variety of input data, including radiomics analysis, features extracted by convolutional neural networks (CNNs) from images such as dose or fluence maps, and complexity metrics obtained from treatment plans and machine-related parameters.[13,14] Using complexity metrics as training data, Valdes et al. trained a Poisson regression model with Lasso regularization to predict the 3%/3 mm GPR for 498 plans.[15] The prediction model was verified at additional institutions using various measurement techniques. Tomori et al. trained a CNN model with 60 prostate IMRT plans using planar dose distributions, the volumes of the planning target volume (PTV), rectum, and overlapping region, and the MU values for each field.[13] Furthermore, Hirashima et al. improved the prediction and classification performance of GPR by including plan complexity and dosimetric characteristics in their model.[16] Such prediction models for the PSQA process are potentially useful for alerting physicists to treatment plans that have a high likelihood of failing to meet the clinical criteria.

Creating complexity metrics correlated with dosimetric accuracy in IMRT plans can enhance plan optimization and verification. By identifying and controlling parameters that impact accuracy, such as leaf travel, plans with lower complexity and higher accuracy can be prioritized, potentially reducing the workload of QA efforts. This study aimed to identify a reliable metric for identifying treatment plans that are likely to fail pretreatment QA tests. The ultimate goal is to enhance the efficiency and reliability of the IMRT treatment planning and delivery processes.

MATERIALS AND METHODS

Patient dataset and treatment plan

This study retrospectively analyzed data from 192 patients who were diagnosed with head-and-neck cancer and treated with IMRT. Treatment plans utilizing a 6 MV photon beam and a dose rate of 600 MU/min were created using the Eclipse Treatment Planning System (TPS) version 13.7.29 (Varian Medical Systems, Palo Alto, CA). IMRT treatment plans were delivered using a TrueBeam (SVC, Varian Medical Systems) linear accelerator equipped with a Millennium multileaf collimator (MLC120, Varian Medical Systems). The MLC consisted of 120 leaves (60 leaf pairs), with 40 leaf pairs in the middle and 10 leaf pairs on each side. The leaf width projected at the isocenter was 5 mm for the middle leaves and 10 mm for the outer leaves. The maximum leaf speed was set to 2.5 cm/s in the Eclipse TPS. The MLC transmission factor was 1.2% for the 6 MV photon beam, and the corresponding dosimetric leaf gap was 0.26 cm. The positional accuracy of the MLC was < 1 mm as measured in routine picket fence test analysis. Depending on the size and geometry of the PTV, a total of 7–9 equally spaced beam angles were utilized to create the IMRT treatment plan. Of the 192 patients, 110 had carcinoma of the buccal mucosa and 82 had carcinoma of the tongue, with dose prescriptions of 66 Gy and 60 Gy in 30 fractions, respectively. Photon optimizer (PO) (Version 13.7.29, Varian Medical Systems) was selected for inverse optimization with a 2.5 mm grid size. The resulting dose was calculated using the Anisotropic Analytical Algorithm (Version 13.7.29, Varian Medical Systems). Varian’s leaf motion calculator (version 13.7.29) was used to generate the fluence pattern. A total of 1448 IMRT treatment fields were used to calculate the complexity metrics of the treatment plans.

Portal dosimetry and patient-specific quality assurance

PSQA was conducted using the electronic portal imaging device (EPID). The EPID used in this study was a Varian aS1200 model featuring a 43 cm × 43 cm active area and a matrix of 1280 × 1280 pixels with a resolution of 0.34 mm. Calibration of the EPID involves acquiring a dark field (DF) image, which captures pixel background offsets without radiation, and a flood field (FF) image, obtained by irradiation with an open field to determine individual pixel sensitivity differences. Varian’s portal dosimetry preconfiguration package, which accounts for backscatter from the supporting arm, provides a two-dimensional profile correction image for calibration. Dosimetric calibration was performed to define the calibration units (CUs), such that 100 CUs correspond to the central axis value of a 10 cm × 10 cm field at a 100 cm source-to-surface distance with the delivery of 100 MUs. The preconfigured portal dosimetry package from Varian was imported into an Eclipse workstation for portal dosimetry analysis.

The evaluation was based on gamma index analysis with a 3% dose difference and a 2 mm DTA, along with a 10% dose threshold. As a rule of thumb to distinguish between plans that are more likely to show dose differences between measurements and TPS calculations, a tolerance limit with a GPR of 95% was used.

Complexity metrics calculation

The complexity level of modulated treatment plans varies based on patient anatomy, dosimetric constraints, optimization algorithms, and linear accelerator (linac) capabilities. Various complexity metrics have been proposed, but there is no consensus on them. These metrics can be categorized as follows: Fluence Metrics: These evaluate the heterogeneity of the fluence, assuming that higher variability indicates greater complexity compared to conventional radiotherapy; Deliverability Metrics: These assess the machine’s capability to deliver treatment accurately, considering variations in mechanical (gantry, MLC) and dosimetric (dose rate, MU) parameters; and Accuracy Metrics: These quantify factors that may compromise dose calculation accuracy in the TPS, such as gaps between MLC leaves, off-axis leaf apertures, leaf leakage doses, and aperture irregularities.

Complexity analyses of the IMRT plans were performed based on the following complexity metrics:

Beam area

The beam area (BA) is the MU-weighted mean aperture area of a beam and is calculated as follows:[9]

$$\mathrm{BA}_i = \frac{\sum_{j} \mathrm{MU}_{ij} \times \mathrm{AA}_{ij}}{\mathrm{MU}_i}$$

where MUij and AAij represent the monitor units and the area of segment Aij, while MUi represents the total monitor units of beam i. A decrease in beam area is suggested to indicate an increase in beam complexity.
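The beam area definition can be illustrated with a short sketch. This is not the in-house program used in the study; it assumes each segment is given as a pair `(mu, leaf_pairs)`, where `leaf_pairs` lists the `(left, right)` leaf positions in cm for each leaf pair and `leaf_widths` gives the matching leaf widths in cm. All names are hypothetical.

```python
def aperture_area(leaf_pairs, leaf_widths):
    """Area (cm^2) of one segment: sum of leaf gap times leaf width."""
    return sum(max(right - left, 0.0) * width
               for (left, right), width in zip(leaf_pairs, leaf_widths))


def beam_area(segments, leaf_widths):
    """MU-weighted mean aperture area of a beam."""
    total_mu = sum(mu for mu, _ in segments)
    weighted = sum(mu * aperture_area(leaf_pairs, leaf_widths)
                   for mu, leaf_pairs in segments)
    return weighted / total_mu


leaf_widths = [0.5, 0.5]                        # two 5 mm leaf pairs
segments = [
    (30.0, [(-1.0, 1.0), (-1.0, 1.0)]),         # 2.0 cm^2 open rectangle
    (10.0, [(0.0, 1.0), (0.0, 0.0)]),           # 0.5 cm^2, second pair closed
]
print(beam_area(segments, leaf_widths))         # 1.625
```

The small second segment carries only a quarter of the MUs, so it lowers the beam area from 2.0 cm² to 1.625 cm² rather than to the unweighted mean of 1.25 cm².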

Beam modulation

Beam modulation (BM) is defined as one minus the ratio of the beam area to the union area of all apertures of the beam.[17] BM reflects the extent to which a large open area is divided into several smaller segments.

$$\mathrm{BM}_i = 1 - \frac{\sum_{j} \mathrm{MU}_{ij} \times \mathrm{AA}_{ij}}{\mathrm{MU}_i \times U(\mathrm{AA}_{ij})}$$

where MUij represents the monitor units of segment Aij, MUi represents the total monitor units of beam i, and U(AAij) represents the union area of all the segments of beam i.
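A hedged sketch of beam modulation follows. It assumes BM takes the form 1 − BA/UAA, where the union area is computed row by row as the union of the leaf gap intervals over all segments. The segment layout `(mu, leaf_pairs)` and every function name are illustrative assumptions, not the authors' code.

```python
def aperture_area(leaf_pairs, leaf_widths):
    """Area (cm^2) of one segment: sum of leaf gap times leaf width."""
    return sum(max(right - left, 0.0) * width
               for (left, right), width in zip(leaf_pairs, leaf_widths))


def union_length(intervals):
    """Total length covered by a set of (left, right) intervals."""
    total, end = 0.0, None
    for left, right in sorted(iv for iv in intervals if iv[1] > iv[0]):
        if end is None or left > end:
            total += right - left        # disjoint from what was covered so far
            end = right
        elif right > end:
            total += right - end         # extends the current covered run
            end = right
    return total


def union_aperture_area(segments, leaf_widths):
    """Area covered by at least one segment, accumulated per leaf pair row."""
    return sum(width * union_length([leaf_pairs[row] for _, leaf_pairs in segments])
               for row, width in enumerate(leaf_widths))


def beam_modulation(segments, leaf_widths):
    """BM = 1 - (MU-weighted mean aperture area) / (union aperture area)."""
    total_mu = sum(mu for mu, _ in segments)
    ba = sum(mu * aperture_area(lp, leaf_widths) for mu, lp in segments) / total_mu
    return 1.0 - ba / union_aperture_area(segments, leaf_widths)


leaf_widths = [0.5, 0.5]
segments = [
    (30.0, [(-1.0, 1.0), (-1.0, 1.0)]),
    (10.0, [(0.0, 1.0), (0.0, 0.0)]),
]
print(union_aperture_area(segments, leaf_widths))   # 2.0
print(beam_modulation(segments, leaf_widths))       # 0.1875
```

A beam made of a single open aperture gives BM = 0 under this form, and BM moves toward 1 as the MUs are spread over segments that are small compared with their union.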

Small aperture score

The small aperture score (SAS) is defined as the proportion of a plan delivered using small apertures and ranges from 0 to 1. Small apertures were defined as leaf pair gaps of < 2, 5, and 20 mm, resulting in three distinct quantities. The MSAS represents the average value of the corresponding small-aperture score.[9]
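The small aperture score can be sketched as follows. The sketch assumes that the score of a segment is the fraction of its open leaf pairs whose gap is below the threshold and that segments are weighted by their MUs; both choices are assumptions about details the text does not spell out, and the names are hypothetical.

```python
def small_aperture_score(segments, threshold_cm):
    """MU-weighted fraction of open leaf pairs with a gap below threshold_cm."""
    total_mu = sum(mu for mu, _ in segments)
    score = 0.0
    for mu, leaf_pairs in segments:
        gaps = [right - left for left, right in leaf_pairs if right > left]
        if gaps:                                   # ignore fully closed segments
            small = sum(1 for gap in gaps if gap < threshold_cm)
            score += mu * small / len(gaps)
    return score / total_mu


segments = [
    (30.0, [(-1.0, 1.0), (0.0, 0.3)]),   # gaps of 2.0 cm and 0.3 cm
    (10.0, [(0.0, 0.1), (0.0, 0.0)]),    # one open gap of 0.1 cm
]
print(small_aperture_score(segments, 0.2))   # 2 mm criterion  -> 0.25
print(small_aperture_score(segments, 0.5))   # 5 mm criterion  -> 0.625
print(small_aperture_score(segments, 2.0))   # 20 mm criterion -> 0.625
```

Because the criterion is a strict inequality here, the 2.0 cm gap is not counted as small at the 20 mm threshold.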

Mean field area

The mean field area (MFA) was computed as the weighted mean of the area between open leaf pairs for all segments of the beam. Each segment is weighted by the number of MUs delivered.[9] MFA is sensitive to the field aperture: small, irregular apertures can result in significant dose variation.

$$\mathrm{MFA}_i = \frac{\sum_{j} \mathrm{MU}_{ij} \times \mathrm{AA}_{ij}}{\sum_{j} \mathrm{MU}_{ij}}$$

Monitor unit per control point

The mean MU per control point (MUCP) for each beam represents the average of the MUs per control point within that specific beam.[9]

Mean asymmetry distance

The mean asymmetry distance (MAD) is the weighted average of the distance between the center of every open leaf pair aperture and the central beam axis.[9]

Modulation complexity score

The modulation complexity score (MCS) is a plan complexity assessment metric initially developed by McNiven et al.[18] for step-and-shoot treatments. Masi et al.[19] later applied the MCS formalism to volumetric-modulated arc therapy treatments. The score uses two parameters to describe the modulation of the fluence:

Aperture area variability

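Aperture area variability (AAV) represents the variability in the area of segments, measured as the ratio of the summed leaf pair openings of a segment to the summed maximum leaf pair openings over all segments in the beam.[18,19]

$$\mathrm{AAV}_{cp} = \frac{\sum_{a=1}^{A} \left( p_{a}^{L} - p_{a}^{R} \right)_{cp}}{\sum_{a=1}^{A} \left( \max(p_{a}^{L}) - \max(p_{a}^{R}) \right)_{\mathrm{beam}}}$$

where p_a^L and p_a^R are the positions of the left-bank and right-bank leaves of leaf pair a, A is the number of leaf pairs, and the maxima are taken over all control points of the beam.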

Leaf sequence variability

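Leaf sequence variability (LSV) represents the variability in the shape of segments and is measured as the variation in position between adjacent leaves on the same leaf bank.[18,19]

$$\mathrm{LSV}_{cp} = \left[ \frac{\sum_{i=1}^{N-1} \left( p_{\max} - \left| p_i - p_{i+1} \right| \right)}{(N-1) \times p_{\max}} \right]_{\text{left bank}} \times \left[ \frac{\sum_{i=1}^{N-1} \left( p_{\max} - \left| p_i - p_{i+1} \right| \right)}{(N-1) \times p_{\max}} \right]_{\text{right bank}}$$

where pi is the coordinate of the ith leaf position, pmax is the maximum distance between positions for a given leaf bank summed over all control points, and N is the number of leaves in the bank.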

$$\mathrm{MCS} = \sum_{i=1}^{I-1} \left[ \frac{\mathrm{AAV}_{cp_i} + \mathrm{AAV}_{cp_{i+1}}}{2} \times \frac{\mathrm{LSV}_{cp_i} + \mathrm{LSV}_{cp_{i+1}}}{2} \times \frac{\mathrm{MU}_{cp_{i,i+1}}}{\mathrm{MU}_{\mathrm{beam}}} \right]$$

where MUcpi,i+1 indicates the MUs delivered between two successive control points (cpi and cpi+1), I is the number of control points, and MUbeam is the total MUs of the beam (or arc). To calculate the MCS, the mean values of LSVcp and AAVcp between neighboring control points are multiplied. The MCS is then the sum of this product over all control points, weighted by the proportion of MUs delivered between the two successive control points. These calculations ensure that the MCS ranges between 0 and 1, where MCS = 0 represents a highly complex plan and MCS = 1 represents a simple plan.
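The MCS calculation described above can be sketched as below. This is an interpretation rather than the published reference implementation: it assumes each control point is a pair `(left_positions, right_positions)` in cm, takes the LSV span from the leaf positions of a single control point, and takes the AAV denominator as the widest opening of each leaf pair over all control points. All names are hypothetical.

```python
def lsv_bank(positions):
    """Leaf sequence variability of one leaf bank at one control point."""
    span = max(positions) - min(positions)
    if span == 0:
        return 1.0                       # all leaves aligned: no variability
    n = len(positions)
    return sum(span - abs(positions[k] - positions[k + 1])
               for k in range(n - 1)) / ((n - 1) * span)


def lsv(control_point):
    left, right = control_point
    return lsv_bank(left) * lsv_bank(right)


def aav(control_point, control_points):
    """Opening of this control point relative to the beam's widest openings."""
    left, right = control_point
    envelope = sum(
        max(cp[1][a] for cp in control_points) - min(cp[0][a] for cp in control_points)
        for a in range(len(left))
    )
    return sum(r - l for l, r in zip(left, right)) / envelope


def mcs(control_points, mu_between):
    """MCS from control points and the MUs delivered between successive ones."""
    total_mu = sum(mu_between)
    score = 0.0
    for i, mu in enumerate(mu_between):
        mean_aav = (aav(control_points[i], control_points)
                    + aav(control_points[i + 1], control_points)) / 2.0
        mean_lsv = (lsv(control_points[i]) + lsv(control_points[i + 1])) / 2.0
        score += mean_aav * mean_lsv * mu / total_mu
    return score


open_cp = ([-1.0, -1.0, -1.0], [1.0, 1.0, 1.0])     # plain rectangular opening
jagged_cp = ([-1.0, 0.0, -1.0], [1.0, 1.0, 1.0])    # middle left leaf pushed in
print(mcs([open_cp, open_cp], [10.0]))      # 1.0, the simplest case
print(mcs([open_cp, jagged_cp], [10.0]))    # about 0.458
```

The jagged control point lowers both factors: its left bank has an LSV of 0, and its opening is 5/6 of the widest opening, which gives 11/24 for the pair.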

Edge metrics

Younge et al. defined the edge metric (EM) as the ratio of the MLC side edge length to the aperture area.[20] The EM index, which is directly correlated with the degree of the tongue-and-groove effect, increases with differences in the positions of neighboring leaves.
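As an illustration of the edge metric, the sketch below computes an MU-weighted ratio of side edge length to aperture area. It assumes that the side edge length is the aperture perimeter parallel to the direction of leaf travel (leaf ends excluded) and that segments are weighted by MUs; the data layout and function names are hypothetical.

```python
def aperture_area(leaf_pairs, leaf_widths):
    """Area of one segment: sum of leaf gap times leaf width."""
    return sum(max(right - left, 0.0) * width
               for (left, right), width in zip(leaf_pairs, leaf_widths))


def side_edge_length(leaf_pairs):
    """Length of aperture edges parallel to leaf travel (leaf ends excluded)."""
    closed = (0.0, 0.0)
    # Pad with closed rows so the top and bottom edges are counted as well.
    rows = [closed] + [(l, r) if r > l else closed for l, r in leaf_pairs] + [closed]
    total = 0.0
    for (l1, r1), (l2, r2) in zip(rows, rows[1:]):
        overlap = max(0.0, min(r1, r2) - max(l1, l2))
        # Exposed edge between two rows: the parts of each gap not shared.
        total += (r1 - l1) + (r2 - l2) - 2.0 * overlap
    return total


def edge_metric(segments, leaf_widths):
    """MU-weighted mean of side edge length divided by aperture area."""
    total_mu = sum(mu for mu, _ in segments)
    score = 0.0
    for mu, leaf_pairs in segments:
        area = aperture_area(leaf_pairs, leaf_widths)
        if area > 0:
            score += mu * side_edge_length(leaf_pairs) / area
    return score / total_mu


leaf_widths = [0.5, 0.5]
segments = [
    (30.0, [(-1.0, 1.0), (-1.0, 1.0)]),   # rectangle: edge 4.0 cm, area 2.0 cm^2
    (10.0, [(-1.0, 1.0), (0.0, 1.0)]),    # stepped:   edge 4.0 cm, area 1.5 cm^2
]
print(edge_metric(segments, leaf_widths))   # about 2.167 per cm
```

The stepped segment has the same edge length as the rectangle but a smaller area, so it raises the metric.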

Union of aperture area

Union of aperture area (UAA) represents the area of the union of all apertures of a beam, which is greater than or equal to the area of any individual aperture.[21] One approach to achieving better target conformity is to divide the beam into many small segments. However, each small segment carries its own dosimetric uncertainty, which can lead to deviations in the delivered dose.

Data analysis

Gamma index analysis was conducted with criteria set at 3% and 2 mm for dose difference and distance-to-agreement, respectively. A 10% low-dose threshold and global dose normalization were applied. Plans were classified as passing if the GPR was at least 95%; otherwise, they were considered as failing. The complexity metrics were derived from the planning files using a Python script.[9] Spearman’s rank correlation coefficient (rs) was used to evaluate the correlations between the GPR and complexity metrics.
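For readers who want to reproduce the correlation step, a minimal pure-Python sketch of Spearman's rank correlation is given below. It is not the script used in this study: it ranks both variables (tied values share the average rank) and then computes the Pearson correlation of the ranks. The function names and the example values are illustrative only.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                                   # extend the run of ties
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rs: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Made-up values: a complexity metric against GPR for five plans.
metric = [0.62, 0.70, 0.75, 0.81, 0.88]
gpr = [99.1, 98.4, 97.0, 95.6, 93.2]
print(spearman(metric, gpr))   # -1.0, a perfectly monotonic decrease
```

Because only ranks enter the calculation, the coefficient is insensitive to the units of the complexity metric and to any monotonic rescaling of it.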

Receiver operating characteristic (ROC) curves were generated to assess whether complexity metrics can be used to identify plans with GPRs below the tolerance limit. The threshold values for each complexity metric were varied to classify the plans as Pass or Fail. True positives were defined as plans flagged as complex by the metric threshold that also had GPRs below the tolerance limit. False positives were defined as plans flagged as complex by the metric threshold but with GPRs above the tolerance limit. The area under the curve (AUC) was calculated to evaluate the performance of each ROC curve for classification. For a useful classifier, the AUC values fall between 0.5 and 1.0, where 1.0 denotes perfect accuracy and 0.5 denotes chance accuracy.
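The AUC can be computed without drawing the curve, using its equivalence to the probability that a randomly chosen failing plan has a higher score than a randomly chosen passing plan. The sketch below uses that pairwise form. It assumes that a higher score means a more complex plan, so metrics that decrease with complexity (such as MCS) would need to be negated first. The names and example values are illustrative.

```python
def roc_auc(scores, failed):
    """AUC as the fraction of (failing, passing) pairs ranked correctly.

    scores: complexity value per plan (higher = more complex, by assumption)
    failed: True where the plan's GPR is below the tolerance limit
    """
    positives = [s for s, f in zip(scores, failed) if f]
    negatives = [s for s, f in zip(scores, failed) if not f]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0          # failing plan correctly scored higher
            elif p == n:
                wins += 0.5          # ties count as half
    return wins / (len(positives) * len(negatives))


scores = [0.9, 0.3, 0.8, 0.2]
failed = [True, True, False, False]
print(roc_auc(scores, failed))       # 0.75
```

Three of the four failing and passing pairs are ordered correctly in this toy example, which gives 0.75. A value below 0.5 means that the metric ranks plans in the opposite direction to the one assumed.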

RESULTS

Patient-specific quality assurance analysis

Table 1 shows the GPR results for the pass-PSQA and failed-PSQA categories. A total of 192 IMRT treatment plans were studied and analyzed. Among these, 160 plans fall into the pass category of PSQA (GPR ≥95%), while the remaining 32 plans fall into the failed PSQA (GPR <95%) category. According to the 3% and 2 mm evaluation criteria, the mean GPR for the pass category PSQA was 98.64 ± 1.33 (%), whereas it was 92.17 ± 2.35 (%) for the failed-PSQA category. As shown in Figure 1, the minimum GPR values reported for the pass and failed PSQA were 95.1% and 84.4%, respectively. The first quartile values of GPR for pass and failed PSQA were 97.9% and 91.2%, respectively.

Table 1. Results of patient-specific quality assurance for intensity-modulated radiation therapy treatment fields

| Parameters | Pass PSQA | Fail PSQA |
|---|---|---|
| Number of treatment plans | 160 | 32 |
| Mean GPR (%) | 98.64 ± 1.33 | 92.17 ± 2.35 |
| Min-GPR (%) | 95.1 | 84.8 |
| Max-GPR (%) | 100 | 94.8 |
| First quartile (%) | 97.9 | 91.2 |

PSQA: Patient-specific quality assurance, Max-GPR: Maximum gamma passing rate, Min-GPR: Minimum gamma passing rate, Fail PSQA: Treatment field with gamma passing rates below 95%, Pass PSQA: Treatment field with gamma passing rates above 95%

Figure 1. Box plot of gamma passing rate (GPR) for IMRT treatment plans above and below the 95% passing rate. Pass (GPR ≥95%): plans with GPR at or above the 95% threshold; Fail (GPR <95%): plans with GPR below 95%. GPR: Gamma passing rate

Complexity metrics

Table 2 summarizes the average complexity scores for the pass and failed categories in the PSQA analysis of IMRT treatment plans. The average BM values for pass and failed PSQA were 0.678 ± 0.122 and 0.739 ± 0.112, respectively. The average UAA values were 130.14 ± 52.46 cm² and 139.11 ± 51.38 cm², and the average BA values were 40.37 ± 20.11 cm² and 33.40 ± 14.79 cm², for pass and failed PSQA, respectively. The average MSAS2, MSAS5, and MSAS20 values were 0.032 ± 0.025, 0.089 ± 0.060, and 0.477 ± 0.214 for pass PSQA and 0.031 ± 0.015, 0.096 ± 0.048, and 0.550 ± 0.174 for failed PSQA. Similarly, the MFA averages in pass and failed PSQA were 40.18 ± 20.09 cm² and 33.21 ± 14.70 cm², the MAD averages were 10.39 ± 2.68 cm and 10.76 ± 2.86 cm, and the MUCP averages were 0.708 ± 0.277 and 0.716 ± 0.206, respectively. Moreover, the average values of the ratio of the average aperture area to the area defined by the jaws (AAJA) for pass and failed PSQA were 0.208 ± 0.083 and 0.181 ± 0.068, respectively. The MCS averages for pass and failed PSQA were 0.378 ± 0.132 and 0.340 ± 0.102, respectively, and the average EM scores were 0.069 ± 0.033 and 0.074 ± 0.024, respectively.

Table 2. Average values of the various complexity metrics for the pass and failed categories of patient-specific quality assurance

| Complexity indices | Pass PSQA | Fail PSQA |
|---|---|---|
| BM | 0.678 ± 0.122 | 0.739 ± 0.112 |
| UAA (cm²) | 130.14 ± 52.46 | 139.11 ± 51.38 |
| BA (cm²) | 40.37 ± 20.11 | 33.40 ± 14.79 |
| MSAS 2 mm | 0.032 ± 0.025 | 0.031 ± 0.015 |
| MSAS 5 mm | 0.089 ± 0.060 | 0.096 ± 0.048 |
| MSAS 20 mm | 0.477 ± 0.214 | 0.550 ± 0.174 |
| MFA (cm²) | 40.18 ± 20.09 | 33.21 ± 14.70 |
| MAD (cm) | 10.39 ± 2.68 | 10.76 ± 2.86 |
| MUCP | 0.708 ± 0.277 | 0.716 ± 0.206 |
| AAJA | 0.208 ± 0.083 | 0.181 ± 0.068 |
| MCS | 0.378 ± 0.132 | 0.340 ± 0.102 |
| EM | 0.069 ± 0.033 | 0.074 ± 0.024 |

Pass PSQA: Treatment field with gamma passing rates above 95%, Fail PSQA: Treatment field with gamma passing rates below 95%, BA: Beam area, BM: Beam modulation, UAA: Union of aperture area, MSAS (2, 5, 20): Mean small aperture score for 2 mm, 5 mm, and 20 mm, MFA: Mean field area, MAD: Mean asymmetry distance, MUCP: Monitor unit per control point, AAJA: Ratio of the average area of an aperture over the area defined by jaws, MCS: Modulation complexity score, EM: Edge metrics

Correlation analysis

Figure 2 shows the Spearman’s rank correlation coefficients between the GPR and various treatment plan complexity metrics for the IMRT treatment plans. GPR showed no strong correlation (rs) with any of the complexity metrics. BM had the strongest negative correlation with AAJA. UAA showed the strongest positive correlation (|rs| ≥0.7) with MAD. MSAS2 and MSAS5 showed no correlation with UAA but showed the strongest correlation with EM. MSA20 showed strong positive and negative correlations with BM, EM, BA, and MFA, respectively, and was negatively correlated with MSAS20. MUCP was weakly correlated with UAA and MAD. MCS showed a strong positive correlation with AAJA, whereas it showed a strong negative correlation with EM, BM, MSAS5, and MSAS20.

Figure 2. Heatmap showing the Spearman correlation coefficients between gamma passing rate (GPR) and various complexity metrics. GPR: Gamma passing rates with 3% and 3 mm, BA: Beam area, BM: Beam modulation, UAA: Union of average area, MSAS (2, 5, 20): Mean small aperture score for 2 mm, 5 mm and 20 mm, MFA: Mean field area, MAD: Mean asymmetric distance, MUCP: Monitor unit per control points, AAJA: Average area of aperture over the area, MCS: Modulation complexity score, EM: Edge metrics

Receiver operating characteristic curve analysis

Table 3 shows the threshold values, true positive rates (TPRs), false positive rates (FPRs), and areas under the curve for various complexity metrics. MSAS5 has a threshold value of 0.085 and the highest TPR of 38.17%, with a corresponding FPR of 3.1%. MUCP had the lowest TPR (18.30%) and corresponding FPR (3.1%), with a threshold value of 0.98. The second-highest TPR, 37.17%, was shown by EM, with an FPR of 3.25% and a threshold value of 0.085. The MCS yielded a TPR of 37.12%, an FPR of 9.3%, and a threshold of 0.39. Figure 5 shows the ROCs for all complexity metrics with GPR values of 3% and 3 mm. The BM showed good performance in terms of the AUC, as it had the highest AUC (82%) compared to the other complexity metrics. BA, MSAS2, MSAS5, MFA, MAD, and EM had AUCs of 72%, 77%, 76%, 75%, 73%, and 76%, respectively. In contrast, UAA, MSAS20, AAJA, and MCS had AUCs of 56%, 47%, 59%, and 48%, respectively.

Table 3. Threshold values, true positive rates, false positive rates, and areas under the curve for various complexity metrics

| Complexity indices | Threshold | TPR (%) | FPR (%) | AUC (%) |
|---|---|---|---|---|
| BM | 0.68 | 31.42 | 9.3 | 80.71 |
| MSAS 2 mm | 0.044 | 31.42 | 9.3 | 71.43 |
| MSAS 5 mm | 0.085 | 38.17 | 3.1 | 73.57 |
| MSAS 20 mm | 0.491 | 24.81 | 9.3 | 56.43 |
| MUCP | 0.98 | 18.3 | 3.1 | 73.93 |
| AAJA | 0.24 | 24.4 | 6.25 | 60.28 |
| MCS | 0.39 | 37.12 | 9.3 | 59.29 |
| EM | 0.085 | 37.17 | 3.25 | 77.50 |
| BA (cm²) | 72.80 | 42.76 | 6.28 | 71.86 |
| MFA (cm²) | 78.53 | 29.76 | 9.25 | 56.07 |
| MAD (cm) | 13.95 | 29.41 | 9.2 | 73.05 |
| UAA (cm²) | 133.45 | 33.5 | 6.8 | 56.05 |

Figure 3. Scatter plots of various complexity metrics versus gamma passing rate (GPR) for the 3% and 3 mm evaluation criteria: (a) Beam area versus GPR, (b) Beam modulation versus GPR, (c) Edge metrics versus GPR, (d) Modulation complexity score versus GPR, (e) Mean field area versus GPR, (f) Mean asymmetric distance versus GPR. BM: Beam modulation, GPR: Gamma passing rate, BA: Beam area, EM: Edge metrics, MCS: Modulation complexity score, MFA: Mean field area, MAD: Mean asymmetric distance

DISCUSSION

This study examines the relationship between 12 distinct complexity indicators and GPR in 192 IMRT treatment plans. Although these metrics were presented in earlier research, it has been shown that a specific analysis is required for their use across different linear accelerators.[22] The analysis technique investigated and suggested in this study seeks to provide a readily adaptable strategy for use in different institutions. In addition, this study aimed to offer a simple yet reliable method for choosing a metric and establishing its cutoff. One of the obstacles for PSQA triage systems is the difficulty in selecting a threshold and the number of complexity measures.[22,23] To overcome this limitation, correlation analysis with GPR data and ROC curve analysis can be performed by setting the threshold for complexity metrics with fixed GPR criteria. Absolute dosimetry was performed, and the dose variation was kept within a limit of 1%. The mechanical accuracy of the EPID was checked during routine measurements and was found to be within ± 1 mm. As per the department protocol, the DF and FF image acquisitions were performed on a monthly basis.

Figure 1 shows the box plot of the GPRs for IMRT plans that fall into the pass and fail categories. In this study, 83.33% of the plans met the 3%/2 mm passing criterion with a GPR greater than or equal to 95%, whereas 16.67% of the plans had a GPR <95%. We observed a higher GPR failure rate than in the previous study, possibly because the PSQA data for this study were collected over a 3- to 4-year period. The PSQA results were greatly affected by the planning constraints, target volume delineation, planning strategy, presence of nearby OARs, and machine state at the time of QA. The gamma evaluation criteria used also affect the PSQA results. Plans that failed were reoptimized and, in case of further failure, remeasured with different detectors.

Furthermore, the regions in which the gamma test failed were checked for clinical relevance before the plan was delivered to the patient. Table 2 shows that the complexity metrics may indicate a highly complex plan, for which differences may arise between the planned and measured dose distributions. PSQA measurements depend on the type of detector used, measurement and setup errors, linac commissioning, and TPS commissioning. PSQA measurements on phantoms with different geometries and physical properties using a variety of detectors may also affect the results. A correlation analysis of the complexity metrics with the PSQA results was performed to identify the most relevant complexity metrics that could be used to determine the complexity of the treatment plan, as shown in Figure 2. A previous study of various complexity metrics with PSQA results showed weak-to-moderate correlations.[23] Similarly, in our study, none of the metrics showed a strong correlation with GPR. The complexity metrics used in our study mainly describe the aperture shape of the treatment beam and MLC movement. Interestingly, many of the complexity metrics showed strong correlations (|rs| ≥0.7) with each other. Figure 3 illustrates that the BA, BM, and MCS had small ranges of values that overlapped with the corresponding complexity metrics of the failing plans. The majority of the failed plans had BA values <40.0 cm², BM values above 0.8, and MCS values of 0.35. However, MAD and EM showed large ranges of overlap between failing and passing PSQA plans. Similar trends were observed for MSAS2, MSAS5, MSAS20, MUCP, and UAA, as shown in Figure 4. Hence, identifying whether a treatment plan will pass or fail PSQA from its complexity values alone is difficult. However, the scatter plots in Figures 3 and 4 illustrate that at the extreme end of the complexity values, the majority of the plans tend to show large disagreements between the calculated and measured dose distributions.

The angular dependency, spatial resolution, and physical properties of the detectors may lead to increased uncertainty in PSQA measurements.[24] Figure 5 shows the receiver operating characteristic (ROC) curves for the various complexity metrics calculated for the IMRT treatment plans. ROC curve analysis was adopted to evaluate the classification performance of each complexity metric. The AUC, a commonly used indicator ranging from 0.5 (representing random classification) to 1 (indicating perfect classification), was used to summarize the performance. For most of the complexity metrics analyzed in this investigation, AUC values were between 0.6 and 0.82. MCS and MSAS20 had AUCs of < 0.5. Our finding for the MCS was similar to that of Park et al., who reported an AUC of 0.527 for the MCS using the 2%/2 mm criterion with a 90% tolerance limit.[17] To employ complexity metrics as surrogates for dose verification measurements, a threshold value can be established to ascertain whether the complexity of the treatment plan suggests a high degree of dose uncertainty. Such treatment plans may then warrant re-planning. In this scenario, the threshold value should be set to minimize false positives and prevent unnecessary flagging of clinically acceptable plans, while maximizing true positives to identify highly complex plans. Nevertheless, the selection of any threshold value inevitably involves a trade-off between the FPR and the TPR. The threshold values provided in Table 3 were chosen to ensure that the FPRs remained below 10%. Younge et al.[20] used the same constraint on the FPR and reported that the aperture complexity metric yielded a 44% TPR with a 7% false-positive rate. In our study, BM resulted in a 31.42% true-positive rate and a 9.3% false-positive rate. MSAS5 resulted in a 38.17% TPR with a 3.1% FPR, whereas EM resulted in a 37.17% TPR with a 3.25% FPR.
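The threshold selection under an FPR constraint can be sketched as a simple search. The procedure below is an assumption about how such a threshold might be chosen, not a description of the authors' script: it tries every observed metric value as a threshold, flags plans at or above it, and keeps the threshold with the highest TPR among those whose FPR stays below the limit. Higher scores are assumed to mean higher complexity, and all names and values are illustrative.

```python
def rates(scores, failed, threshold):
    """Return (TPR, FPR) when plans with score >= threshold are flagged."""
    tp = sum(1 for s, f in zip(scores, failed) if s >= threshold and f)
    fp = sum(1 for s, f in zip(scores, failed) if s >= threshold and not f)
    n_fail = sum(1 for f in failed if f)
    n_pass = len(failed) - n_fail
    return tp / n_fail, fp / n_pass


def pick_threshold(scores, failed, max_fpr=0.10):
    """Highest-TPR threshold whose FPR is strictly below max_fpr, or None."""
    best = None
    for threshold in sorted(set(scores)):
        tpr, fpr = rates(scores, failed, threshold)
        if fpr < max_fpr and (best is None or tpr > best[1]):
            best = (threshold, tpr, fpr)
    return best


# Made-up metric values: ten passing plans followed by four failing plans.
scores = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.80,
          0.30, 0.60, 0.70, 0.90]
failed = [False] * 10 + [True] * 4
print(pick_threshold(scores, failed))   # (0.9, 0.25, 0.0)
```

With only ten passing plans, a single false positive already gives an FPR of 10%, so the search settles on a threshold that flags just one of the four failing plans. This mirrors the trade-off described above: a tight FPR limit keeps clinically acceptable plans from being flagged at the cost of a modest TPR.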

Figure 4. Scatter plots of various complexity metrics versus gamma passing rate (GPR) for the 3% and 3 mm evaluation criteria: (a) Average area of aperture over the area defined by jaw versus GPR, (b) Monitor unit per control points versus GPR, (c) Mean small aperture score for 20 mm versus GPR, (d) Mean small aperture score for 5 mm versus GPR, (e) Mean small aperture score for 2 mm versus GPR, (f) Union of average area versus GPR. GPR: Gamma passing rate, AAJA: Average area of aperture over the area, MUCP: Monitor unit per control points, MSAS20: Mean small aperture score for 20 mm, MSAS5: Mean small aperture score for 5 mm, MSAS2: Mean small aperture score for 2 mm, UAA: Union of average area

Figure 5. Receiver operating characteristic (ROC) curves for the various complexity metrics calculated for the IMRT treatment plans. The black dotted line shows the random classification performance. AUC: Area under the curve, BA: Beam area, BM: Beam modulation, UAA: Union of average area, MSAS (2, 5, 20): Mean small aperture score for 2 mm, 5 mm, and 20 mm, MFA: Mean field area, MAD: Mean asymmetric distance, MUCP: Monitor unit per control points, AAJA: Average area of aperture over the area, MCS: Modulation complexity score, EM: Edge metrics, ROC: Receiver operating characteristic

The outcomes of the complexity metric analysis are notably influenced by the unique attributes of each institution, thereby posing challenges for direct comparisons between different institutions. In addition, the chosen correlation method, criteria set for PSQA, and quantity of treatment plans investigated, along with their corresponding treatment volumes, contributed to the complexity of the analysis. Although specific findings may lack generalizability to other institutions, the methodology can serve as a foundation for developing institution-specific PSQA tools, thereby assisting in the refinement of the treatment planning process.

In such studies, ROC analysis is recommended as the preferred method for evaluating the performance of complexity metrics. ROC analysis provides a comprehensive assessment of the classification performance of complexity metrics that serve as PSQA tools. This approach allows the establishment of threshold values tailored to individual machines, target sites, or treatment techniques. By contrast, methods that yield single values, such as correlation tests or AUC, may not fully capture the nuanced performance of complexity metrics. However, correlation tests and AUC analysis can still offer valuable insights, and their results should align. A lack of correlation or AUC values close to 0.5 indicates that a given complexity metric cannot effectively discriminate between treatment plans with higher or lower GPRs and, consequently, should not be considered for practical use. In summary, the findings of this study suggest congruence among the results, indicating that the investigated complexity metrics possess a moderate ability to discern the degree of agreement between dose distributions.

CONCLUSION

Many of the complexity metrics exhibited moderate-to-strong correlations with one another. In the ROC analysis of classification performance, both MSAS5 and EM showed relatively high true-positive rates in identifying highly modulated plans, with corresponding false-positive rates below 10%. The efficacy of these complexity metrics in identifying complex plans should be further scrutinized in future studies for different clinical sites and other modern planning techniques. Simplifying IMRT plans offers several significant clinical benefits, such as increased plan deliverability and shorter treatment duration. Quantifying treatment plan complexity may help guide the evaluation of IMRT plans and eventually lower radiation dose uncertainty. Plan complexity tools should be incorporated into the inverse planning system to address plan complexity.

Conflicts of interest

There are no conflicts of interest.

Funding Statement

Nil.

REFERENCES

  • 1. Intensity Modulated Radiation Therapy Collaborative Working Group. Intensity-modulated radiotherapy: Current status and issues of interest. Int J Radiat Oncol Biol Phys. 2001;51:880–914. doi: 10.1016/s0360-3016(01)01749-7.
  • 2. Lee MT, Purdie TG, Eccles CL, Sharpe MB, Dawson LA. Comparison of simple and complex liver intensity modulated radiotherapy. Radiat Oncol. 2010;5:115. doi: 10.1186/1748-717X-5-115.
  • 3. Nauta M, Villarreal-Barajas JE, Tambasco M. Fractal analysis for assessing the level of modulation of IMRT fields. Med Phys. 2011;38:5385–93. doi: 10.1118/1.3633912.
  • 4. Saroj DK, Yadav S, Paliwal N. Does fluence smoothing reduce the complexity of the intensity-modulated radiation therapy treatment plan? A dosimetric analysis. J Med Phys. 2022;47:336–43. doi: 10.4103/jmp.jmp_81_22.
  • 5. Low DA, Harms WB, Mutic S, Purdy JA. A technique for the quantitative evaluation of dose distributions. Med Phys. 1998;25:656–61. doi: 10.1118/1.598248.
  • 6. Ezzell GA, Burmeister JW, Dogan N, LoSasso TJ, Mechalakos JG, Mihailidis D, et al. IMRT commissioning: Multiple institution planning and dosimetry comparisons, a report from AAPM Task Group 119. Med Phys. 2009;36:5359–73. doi: 10.1118/1.3238104.
  • 7. Miften M, Olch A, Mihailidis D, Moran J, Pawlicki T, Molineu A, et al. Tolerance limits and methodologies for IMRT measurement-based verification QA: Recommendations of AAPM task group no. 218. Med Phys. 2018;45:e53–83. doi: 10.1002/mp.12810.
  • 8. Huang JY, Pulliam KB, McKenzie EM, Followill DS, Kry SF. Effects of spatial resolution and noise on gamma analysis for IMRT QA. J Appl Clin Med Phys. 2014;15:4690. doi: 10.1120/jacmp.v15i4.4690.
  • 9. Lam D, Zhang X, Li H, Deshan Y, Schott B, Zhao T, et al. Predicting gamma passing rates for portal dosimetry-based IMRT QA using machine learning. Med Phys. 2019;46:4666–75. doi: 10.1002/mp.13752.
  • 10. Li J, Wang L, Zhang X, Liu L, Li J, Chan MF, et al. Machine learning for patient-specific quality assurance of VMAT: Prediction and classification accuracy. Int J Radiat Oncol Biol Phys. 2019;105:893–902. doi: 10.1016/j.ijrobp.2019.07.049.
  • 11. Granville DA, Sutherland JG, Belec JG, La Russa DJ. Predicting VMAT patient-specific QA results using a support vector classifier trained on treatment plan characteristics and linac QC metrics. Phys Med Biol. 2019;64:095017. doi: 10.1088/1361-6560/ab142e.
  • 12. Interian Y, Rideout V, Kearney VP, Gennatas E, Morin O, Cheung J, et al. Deep nets versus expert designed features in medical physics: An IMRT QA case study. Med Phys. 2018;45:2672–80. doi: 10.1002/mp.12890.
  • 13. Tomori S, Kadoya N, Takayama Y, Kajikawa T, Shima K, Narazaki K, et al. A deep learning-based prediction model for gamma evaluation in patient-specific quality assurance. Med Phys. 2018;45:4055–65. doi: 10.1002/mp.13112.
  • 14. Tomori S, Kadoya N, Kajikawa T, Kimura Y, Narazaki K, Ochi T, et al. Systematic method for a deep learning-based prediction model for gamma evaluation in patient-specific quality assurance of volumetric modulated arc therapy. Med Phys. 2021;48:1003–18. doi: 10.1002/mp.14682.
  • 15. Valdes G, Chan MF, Lim SB, Scheuermann R, Deasy JO, Solberg TD. IMRT QA using machine learning: A multi-institutional validation. J Appl Clin Med Phys. 2017;18:279–84. doi: 10.1002/acm2.12161.
  • 16. Hirashima H, Ono T, Nakamura M, Miyabe Y, Mukumoto N, Iramina H, et al. Improvement of prediction and classification performance for gamma passing rate by using plan complexity and dosiomics features. Radiother Oncol. 2020;153:250–7. doi: 10.1016/j.radonc.2020.07.031.
  • 17. Park JM, Park SY, Kim H, Kim JH, Carlson J, Ye SJ. Modulation indices for volumetric modulated arc therapy. Phys Med Biol. 2014;59:7315–40. doi: 10.1088/0031-9155/59/23/7315.
  • 18. McNiven AL, Sharpe MB, Purdie TG. A new metric for assessing IMRT modulation complexity and plan deliverability. Med Phys. 2010;37:505–15. doi: 10.1118/1.3276775.
  • 19. Masi L, Doro R, Favuzza V, Cipressi S, Livi L. Impact of plan parameters on the dosimetric accuracy of volumetric modulated arc therapy. Med Phys. 2013;40:071718. doi: 10.1118/1.4810969.
  • 20. Younge KC, Matuszak MM, Moran JM, McShan DL, Fraass BA, Roberts DA. Penalization of aperture complexity in inversely planned volumetric modulated arc therapy. Med Phys. 2012;39:7160–70. doi: 10.1118/1.4762566.
  • 21. Du W, Cho SH, Zhang X, Hoffman KE, Kudchadker RJ. Quantification of beam complexity in intensity-modulated radiation therapy treatment plans. Med Phys. 2014;41:021716. doi: 10.1118/1.4861821.
  • 22. Chiavassa S, Bessieres I, Edouard M, Mathot M, Moignier A. Complexity metrics for IMRT and VMAT plans: A review of current literature and applications. Br J Radiol. 2019;92:20190270. doi: 10.1259/bjr.20190270.
  • 23. Antoine M, Ralite F, Soustiel C, Marsac T, Sargos P, Cugny A, et al. Use of metrics to quantify IMRT and VMAT treatment plan complexity: A systematic review and perspectives. Phys Med. 2019;64:98–108. doi: 10.1016/j.ejmp.2019.05.024.
  • 24. Basran PS, Woo MK. An analysis of tolerance levels in IMRT quality assurance procedures. Med Phys. 2008;35:2300–7. doi: 10.1118/1.2919075.

Articles from Journal of Medical Physics are provided here courtesy of Wolters Kluwer -- Medknow Publications
