Abstract
Radiation therapy (RT) is a frontline approach to treating cancer. While the target of radiation dose delivery is the tumor, there is an inevitable spill of dose to nearby normal organs, causing complications. This phenomenon is known as radiotherapy toxicity. To predict the toxicity outcome, statistical models can be built on the dosimetric variables describing the dose received by the normal organ at risk (OAR); such models are known as Normal Tissue Complication Probability (NTCP) models. To tackle the challenge of the high dimensionality of dosimetric variables and limited clinical sample sizes, statistical models with variable selection techniques are viable choices. However, existing variable selection techniques are data-driven and do not integrate medical domain knowledge into the model formulation. We propose a knowledge-constrained generalized linear model (KC-GLM). KC-GLM includes a new mathematical formulation to translate three pieces of domain knowledge into non-negativity, monotonicity, and adjacent similarity constraints on the model coefficients. We further propose an equivalent transformation of the KC-GLM formulation, which makes it possible to solve for the model coefficients using existing optimization solvers. Furthermore, we compare KC-GLM with several well-known variable selection techniques via a simulation study and on two real datasets of prostate cancer and lung cancer, respectively. These experiments show that KC-GLM selects variables with better interpretability, avoids producing counter-intuitive and misleading results, and has better prediction accuracy.
Keywords: Statistical modeling, generalized linear models, variable selection techniques, radiation toxicity prediction
1. Introduction
As one of the frontline treatment methods for cancer, radiation therapy (RT) aims to deliver a sufficient amount of dose to the tumor while keeping the surrounding normal tissues minimally injured. These two goals are difficult to achieve simultaneously and perfectly because normal organs lie close to the organ bearing the tumor. Inevitable scattering of radiation may deliver some energy to the normal organs, which could result in toxicity/complication or even death. To reduce the risk of toxicity, statistical models are built to predict the probability for a normal organ at risk (OAR) to develop a certain complication given the planning dose distribution on the OAR extracted from treatment planning systems. Such models are known as Normal Tissue Complication Probability (NTCP) models (Brodin et al., 2018).
There are several ways to build an NTCP model. One of the most popular methods is to convert the dose map on a normal OAR to a Dose-Volume Histogram (DVH) (Holyoake et al., 2017). Then, features from the DVH are used to predict the probability for the OAR to develop a certain complication by a statistical model such as a regression. Such methods are known as DVH-based NTCP models (Troeller et al., 2015). Two types of features extracted from the DVH are commonly used in NTCP models due to their clear meaning, which facilitates clinical interpretation: $V_x$ and $D_v$. $V_x$ is the fractional volume of the organ that receives a dose level greater than or equal to $x$ Gy, where $x$ ranges from the lowest dose to the highest dose of an RT protocol with a pre-specified bin size; this results in a collection of $V_x$'s as the feature set. $D_v$ is the dose level above which volume $v$ of the normal organ falls, where $v$ can range from a low to a high percentage, resulting in a collection of $D_v$'s as the feature set. In this paper, the $V_x$'s and $D_v$'s are referred to as "dosimetric variables". Please see Fig. 1 for the workflow of building a DVH-based NTCP model.
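To make the $V_x$ definition concrete, the following is a minimal sketch of computing $V_x$ features from a per-voxel dose array; the function name, dose values, and 1 Gy spacing are illustrative assumptions, not details of the datasets analyzed later.

```python
import numpy as np

def compute_vx_features(voxel_doses, x_levels):
    """For each dose level x (in Gy), compute V_x: the fraction of the
    organ's voxels receiving a dose greater than or equal to x."""
    voxel_doses = np.asarray(voxel_doses, dtype=float)
    return np.array([(voxel_doses >= x).mean() for x in x_levels])

# Illustrative example: hypothetical per-voxel doses for one OAR,
# with V_x evaluated every 1 Gy from 1 to 80 Gy.
rng = np.random.default_rng(0)
doses = rng.uniform(0, 80, size=10_000)   # placeholder dose map
x_levels = np.arange(1, 81)               # 1 Gy bin size
vx = compute_vx_features(doses, x_levels)
print(vx[:5])                             # V_1, ..., V_5
```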
Figure 1. Workflow of building a DVH-based NTCP model using rectal complication prediction of prostate cancer patients as an example.
The simplest DVH-based NTCP model is univariate analysis, which links the clinical endpoint of toxicity/complication with one dosimetric variable at a time. Depending on the type of the endpoint variable, the univariate method computes a Pearson correlation or fits a simple regression if the endpoint is numerical, or performs hypothesis testing or fits a simple logistic regression to compare the groups with and without the development of the complication (i.e., for a binary endpoint variable) (Robertson et al., 2010). After performing the univariate analysis for each dosimetric variable, the significance of the association with the clinical endpoint can be assessed using a p-value.
The advantage of univariate analysis is its simplicity. On the other hand, its drawbacks are also obvious. First, since there are many dosimetric variables and thereby many univariate tests performed, the p-values must be corrected using multiple test correction methods to control the False Discovery Rate (FDR). However, because there are many p-values to correct, univariate analysis has a substantial risk of not finding any dosimetric variable significantly associated with the clinical endpoint of toxicity. Second, the inherent assumption of univariate analysis that the dosimetric variables take effect independently is not true in reality. Dosimetric variables are strongly correlated because of the physics of dose deposition in clinically realistic dose delivery methods. The strength of the correlations depends on the disease site and the treatment modality being used. Using univariate analysis without considering correlations can lead to results that are neither generalizable nor consistent among different studies (Zhang et al., 2019). It is thus critical to develop generalizable statistical analysis methods which account for correlations and consider that the dosimetric variables, or a subset of them, contribute to the toxicity endpoint jointly. This stresses the need for multivariate analysis.
In theory, any multivariate classification algorithm or predictive model can be used to link dosimetric variables with the toxicity endpoint (Buettner et al., 2011; Rossi et al., 2018). Machine learning algorithms such as support vector machines (SVM), random forests, and neural networks have been used. For example, Chen et al. (2007) studied lung radiation-induced pneumonitis by training two SVM models. Ospina et al. (2014) used random forests to model the probability of rectal toxicity in patients with prostate cancer. Ibragimov et al. (2018) used convolutional neural networks (CNNs) trained on 3D CT images to predict hepatobiliary toxicity after liver SBRT. Interested readers can refer to a recent review paper on machine learning methods for the prediction of toxicity outcomes in radiotherapy (Isaksson et al., 2020).
Compared to the aforementioned machine learning methods, multivariate regression-based methods such as the Generalized Linear Model (GLM) have some advantages. GLM is a white-box model and easy to interpret. The coefficients of a GLM represent the impacts of explanatory variables on the response variable, with the signs and p-values of the coefficients revealing the directions and significances of the impacts, respectively. While some black-box machine learning models may provide a reasonable predictor, they may not be as interpretable as a GLM, e.g., it can be difficult to assess the impacts of explanatory variables with directions and statistical significances. Furthermore, GLM is quite flexible in the sense that it can handle different types of clinical endpoints, such as numerical, binary, and survival data, by using different link functions to connect the explanatory variables with the response/endpoint.
A significant challenge for building a GLM-based NTCP model is the high dimensionality of the dosimetric variables. Not all dosimetric variables are useful in predicting the clinical endpoint. Some existing work conducted a multivariate analysis after selecting the most relevant factors (Beetz et al., 2012); however, only the mean dose of the OAR rather than the whole dose distribution was considered in the analysis. This situation is further complicated by the limited sample sizes of clinical datasets. Therefore, it is important for the GLM to have the capacity of selecting a subset of important dosimetric variables. Conventional approaches include forward selection, backward selection, and stepwise selection (Mavroidis et al., 2018). More modern approaches, a.k.a. variable selection techniques, include lasso, elastic net, and fused lasso. A typical form of these methods minimizes a loss function subject to a penalty on the regression coefficients, which allows an important subset of variables to be selected simultaneously. Specifically, consider the predictors to be $p$ dosimetric variables such as the $V_x$'s on the OAR, denoted by $V_{x_1}, \ldots, V_{x_p}$. Let $y$ denote the response variable, which is a clinical endpoint for evaluating the toxicity. In GLM, $y$ is assumed to follow an exponential family distribution with mean $\mu$. A link function $g(\cdot)$ is used to link $\mu$ with the predictors, i.e., $g(\mu) = \beta_0 + \sum_{j=1}^{p} \beta_j V_{x_j}$. Depending on the type of the endpoint (e.g., numerical, binary, survival data), different link functions are used in the GLM. For example, when $y$ is binary, a logistic link function is used and the GLM becomes a logistic regression. When $y$ is survival data, the GLM becomes a proportional hazards model. To estimate the coefficients of a GLM, variable selection techniques typically solve the following optimization:
$$\min_{\beta_0,\,\boldsymbol{\beta}} \; L(\beta_0, \boldsymbol{\beta}) + \lambda P(\boldsymbol{\beta}) \tag{1}$$
where the first term $L(\beta_0, \boldsymbol{\beta})$ is a loss function such as the negative log-likelihood of the coefficients, the second term $P(\boldsymbol{\beta})$ is a penalty on the coefficients (the intercept $\beta_0$ is typically not penalized), and $\lambda$ is a tuning parameter for controlling the tradeoff. Different forms of the penalty function have been proposed, such as the $L_1$ penalty used by lasso (Tibshirani, 1996), the combination of $L_1$ and $L_2$ penalties used by elastic net (Zou & Hastie, 2005), and the fusion penalty on differences of successive coefficients used by fused lasso (Tibshirani et al., 2005).
However, the existing variable selection techniques are data-driven and do not integrate medical domain knowledge into the model formulation. As a result, they may produce counter-intuitive results. In this paper, we propose a knowledge-constrained generalized linear model (KC-GLM). KC-GLM includes a new mathematical formulation to translate three pieces of domain knowledge into non-negativity, monotonicity, and adjacent similarity constraints on the model coefficients. We further propose an equivalent transformation of the original formulation, which makes it possible to solve for the model coefficients using existing optimization solvers. Furthermore, we apply KC-GLM to a simulation dataset and to two real datasets of prostate cancer and lung cancer, respectively, and demonstrate its performance in comparison with alternative approaches.
The rest of this paper is organized as follows. Section 2 presents the development of KC-GLM, the application datasets, and how KC-GLM and competing methods are applied to the datasets. Section 3 shows the results of the application and comparison. Section 4 discusses the medical implication of the results and the limitations of this study. Section 5 is the conclusion.
2. Materials and methods
2.1. Development of the proposed KC-GLM
To develop the KC-GLM, we first need to identify the medical domain knowledge that can be integrated. We focus on the case when the dosimetric variables included in the model are $V_{x_1}, \ldots, V_{x_p}$, where $x_1$ and $x_p$ are the lowest and highest doses of an RT protocol. That is, our model includes the $V_{x_j}$'s as multivariate predictors, while a similar formulation can be developed for other dosimetric variables such as the $D_v$'s.
Specifically, there are three pieces of domain knowledge to be integrated into the KC-GLM formulation:
Non-negativity. Increasing any $V_{x_j}$, $j = 1, \ldots, p$, is associated with an increase in, or no impact on, the risk of toxicity, but must not decrease the risk. This is intuitive and based on simple biology: increasing the fractional volume of the normal organ receiving dose $x_j$ can only pose a higher risk (or at minimum no significant change in risk) of inducing radiation toxicity for the normal organ; it cannot lower the risk. This domain knowledge translates to a non-negativity constraint on the coefficients of the proposed KC-GLM, i.e., $\beta_j \geq 0$ for $j = 1, \ldots, p$.
Monotonicity. The greater the dose $x$, the higher the risk of toxicity posed by the corresponding $V_x$. For example, consider two dose levels $x_i < x_j$. Increasing $V_{x_j}$, i.e., the fractional volume of the normal organ receiving the higher dose $x_j$, should pose a higher risk of toxicity than increasing $V_{x_i}$, or at minimum no significantly different risk. A lower risk posed by $V_{x_j}$ than by $V_{x_i}$, however, is counter-intuitive and violates biological principles. To prohibit this from happening, we impose a constraint on the corresponding coefficients. The general form of this constraint is the following: for any two dose levels $x_i$ and $x_j$ with $x_i < x_j$, let $\beta_i$ and $\beta_j$ be the coefficients for $V_{x_i}$ and $V_{x_j}$, respectively; then $\beta_i \leq \beta_j$. Note that the '=' sign is included to avoid the strong assumption of a strictly increasing risk.
Adjacent similarity. Finally, due to the definition of $V_x$, the $V_x$'s with adjacent dose levels are highly correlated. This means that the coefficients of these $V_x$'s should be similar.
To mathematically incorporate the aforementioned knowledge constraints, we propose the following formulation for KC-GLM:
$$\min_{\beta_0,\,\boldsymbol{\beta}} \; -l(\beta_0, \boldsymbol{\beta}) + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=2}^{p} |\beta_j - \beta_{j-1}| \tag{2}$$

subject to: $0 \leq \beta_1 \leq \beta_2 \leq \cdots \leq \beta_p$.
Here, $l(\beta_0, \boldsymbol{\beta})$ is the log-likelihood of the GLM. For notation simplicity, we assume $x_1 < x_2 < \cdots < x_p$. The constraint in the above optimization obeys the knowledge of non-negativity and monotonicity. In the objective function, the first penalty, $\lambda_1 \sum_{j=1}^{p} |\beta_j|$, helps variable selection. The second penalty, $\lambda_2 \sum_{j=2}^{p} |\beta_j - \beta_{j-1}|$, encourages adjacent similarity. The objective function is similar to that of fused lasso (Tibshirani et al., 2005), but KC-GLM additionally imposes a constraint to make sure the knowledge of non-negativity and monotonicity is integrated.
2.2. Proposed algorithm to solve KC-GLM
The challenge in solving the KC-GLM optimization in Eq. (2) is how to produce estimates for $\beta_1, \ldots, \beta_p$ that obey the monotonicity constraint on these coefficients. To resolve this challenge, we propose the following procedure. First, we define a set of variables $\theta_1, \ldots, \theta_p$ to represent the increments between adjacent regression coefficients, i.e.,

$$\theta_1 = \beta_1, \qquad \theta_j = \beta_j - \beta_{j-1}, \quad j = 2, \ldots, p.$$

Under this definition, the constraint $0 \leq \beta_1 \leq \cdots \leq \beta_p$ in Eq. (2) is equivalent to $\theta_j \geq 0$ for $j = 1, \ldots, p$.
Through some algebra, we can express $\beta_1, \ldots, \beta_p$ as functions of $\theta_1, \ldots, \theta_p$, i.e.,

$$\beta_j = \sum_{k=1}^{j} \theta_k, \quad j = 1, \ldots, p. \tag{3}$$
Then, we can convert the optimization in Eq. (2) with respect to $\beta_0, \boldsymbol{\beta}$ to an optimization with respect to $\beta_0, \boldsymbol{\theta}$. Specifically, to convert the log-likelihood function to a function with respect to $\boldsymbol{\theta}$, we note that the log-likelihood is a function of the linear predictor

$$\eta = \beta_0 + \sum_{j=1}^{p} \beta_j V_{x_j}.$$

Inserting Eq. (3) into the linear predictor and exchanging the order of summation, we can get

$$\eta = \beta_0 + \sum_{j=1}^{p} \Big( \sum_{k=1}^{j} \theta_k \Big) V_{x_j} = \beta_0 + \sum_{k=1}^{p} \theta_k \tilde{V}_k, \qquad \tilde{V}_k = \sum_{j=k}^{p} V_{x_j}.$$
This converts the log-likelihood function in Eq. (2) to a new function with $\boldsymbol{\theta}$ as the parameters and the transformed variables $\tilde{V}_1, \ldots, \tilde{V}_p$ as the data. Denote this new log-likelihood function by $\tilde{l}(\beta_0, \boldsymbol{\theta})$. Furthermore, using Eq. (3) and the non-negativity of the $\theta_k$'s, we can convert the two penalty terms in Eq. (2) into a penalty with respect to $\boldsymbol{\theta}$, i.e.,

$$\lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=2}^{p} |\beta_j - \beta_{j-1}| = \lambda_1 \sum_{k=1}^{p} (p - k + 1)\,\theta_k + \lambda_2 \sum_{k=2}^{p} \theta_k.$$
Finally, let $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_p)^T$ and $w_k = \lambda_1 (p - k + 1) + \lambda_2\, \mathbb{1}(k \geq 2)$ for $k = 1, \ldots, p$, where $\mathbb{1}(\cdot)$ is the indicator function. The KC-GLM optimization in Eq. (2) can be written as an optimization with respect to $\boldsymbol{\theta}$, i.e.,
$$\min_{\beta_0,\,\boldsymbol{\theta}} \; -\tilde{l}(\beta_0, \boldsymbol{\theta}) + \sum_{k=1}^{p} w_k \theta_k, \quad \text{subject to } \theta_k \geq 0, \; k = 1, \ldots, p. \tag{4}$$
This optimization is a weighted $L_1$-penalized GLM with non-negativity constraints, which can be solved by the "penalized" function in the R package penalized (Goeman, 2018). Once we obtain the solutions for $\theta_1, \ldots, \theta_p$, Eq. (3) can be used to obtain $\beta_1, \ldots, \beta_p$, which are the solutions for the original KC-GLM problem in Eq. (2).
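To make the reparameterization concrete, the following is a minimal sketch of this procedure for a binary endpoint with the logistic link. It is not the authors' implementation (which uses the R penalized solver); the function name is illustrative, and a general-purpose box-constrained optimizer stands in for the dedicated solver. Because the penalty is linear on the feasible region $\theta_k \geq 0$, a smooth optimizer with bounds suffices.

```python
import numpy as np
from scipy.optimize import minimize

def fit_kc_glm_logistic(V, y, lam1, lam2):
    """Sketch of KC-GLM for a binary endpoint via the theta-reparameterization.

    V: (n, p) matrix of V_x features, columns ordered by increasing dose.
    Returns (beta0, beta) with 0 <= beta_1 <= ... <= beta_p by construction.
    """
    n, p = V.shape
    # Transformed data: V_tilde[:, k] = sum_{j >= k} V[:, j], computed
    # with a reversed cumulative sum (this folds Eq. (3) into the
    # linear predictor).
    V_tilde = np.cumsum(V[:, ::-1], axis=1)[:, ::-1]
    # Penalty weights w_k = lam1 * (p - k + 1) + lam2 * 1(k >= 2), k = 1..p.
    k = np.arange(1, p + 1)
    w = lam1 * (p - k + 1) + lam2 * (k >= 2)

    def objective(params):
        b0, theta = params[0], params[1:]
        eta = b0 + V_tilde @ theta
        # Negative Bernoulli log-likelihood in a numerically stable form.
        nll = np.sum(np.logaddexp(0.0, eta) - y * eta)
        return nll + w @ theta  # penalty is linear since theta >= 0

    x0 = np.zeros(p + 1)
    bounds = [(None, None)] + [(0.0, None)] * p  # intercept free, theta_k >= 0
    res = minimize(objective, x0, method="L-BFGS-B", bounds=bounds)
    b0, theta = res.x[0], res.x[1:]
    return b0, np.cumsum(theta)  # recover beta via Eq. (3)
```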
2.3. Variable selection techniques in regression
KC-GLM is closely related to variable selection techniques in regression, such as lasso (Tibshirani, 1996), elastic net (Zou & Hastie, 2005), and fused lasso (Tibshirani et al., 2005). In this section, we briefly introduce these existing methods and point out the differences between KC-GLM and these methods. For notation simplicity, we assume the intercept $\beta_0 = 0$ in presenting each method in the following.
Lasso:
This model solves the following $L_1$-penalized formulation:

$$\min_{\boldsymbol{\beta}} \; L(\boldsymbol{\beta}) + \lambda \sum_{j=1}^{p} |\beta_j|,$$

where $L(\boldsymbol{\beta})$ is a loss function such as the negative log-likelihood function. Compared with KC-GLM, lasso includes only one penalty to help variable selection, and it does not integrate any domain knowledge.
Elastic net:
Lasso works well when there is little correlation between the predictors, but its performance is suboptimal when some predictors are highly correlated (Zou & Hastie, 2005). In our application, the dosimetric variables such as the $V_x$'s with adjacent dose levels have a high correlation. This is a natural result of the definition of $V_x$. To account for the correlation, elastic net may be a better choice, which uses a combined $L_1$- and $L_2$-penalized formulation to make it possible to select groups of correlated variables, i.e.,

$$\min_{\boldsymbol{\beta}} \; L(\boldsymbol{\beta}) + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2,$$

where $\lambda_2 \sum_{j=1}^{p} \beta_j^2$ is the $L_2$ penalty.
Fused lasso:
This model addresses the situation when the predictors are highly correlated and have a natural order (Tibshirani et al., 2005), and aims to select successive predictors as groups. This seems to be a more appropriate model than elastic net for our problem to honor the knowledge of adjacent similarity. Fused lasso imposes two penalties on the coefficients, i.e.,

$$\min_{\boldsymbol{\beta}} \; L(\boldsymbol{\beta}) + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=2}^{p} |\beta_j - \beta_{j-1}|.$$

This formulation is similar to KC-GLM, but without the constraint to integrate the knowledge of non-negativity and monotonicity.
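As a point of reference, the two off-the-shelf baselines can be sketched with scikit-learn; this is an illustrative choice (the paper does not name its software for these baselines), and fused lasso is not available in scikit-learn, so it is omitted here. Note that scikit-learn parameterizes the penalty strength as C = 1/λ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
V = rng.normal(size=(80, 20))        # placeholder V_x feature matrix
y = rng.integers(0, 2, size=80)      # placeholder binary toxicity endpoint

# L1-penalized logistic regression (lasso baseline); C = 1/lambda.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(V, y)

# Elastic net requires the "saga" solver; l1_ratio trades off L1 vs. L2.
enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=1.0, max_iter=5000).fit(V, y)
print(np.count_nonzero(lasso.coef_), np.count_nonzero(enet.coef_))
```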
2.4. Datasets and experiments
We include three datasets in this paper: a simulation dataset, a prostate cancer dataset, and a lung cancer dataset.
2.4.1. Simulation data generation and experiments
We first generate simulation data for the dosimetric variables $V_x$, where $V_x$ is the fractional volume of the OAR that receives a dose level greater than or equal to $x$ Gy. To make the data resemble real data characteristics, we refer to the prostate cancer dataset and choose the lowest and highest dose levels to cover the range of dose for the RT protocol, and then generate $p = 80$ dosimetric variables 1 Gy apart from each other, i.e., $V_1, V_2, \ldots, V_{80}$. Recall that in generating the data for the $V_x$'s, we consider the fact that closer variables in this sequence have a high correlation. Thus, for each patient, we sample $(V_1, \ldots, V_{80})^T$ from a multivariate normal distribution $\mathcal{N}(\boldsymbol{\mu}, \Sigma)$, where $\Sigma$ is an 80 × 80 matrix with $\rho^{|i-j|}$ on the $i$-th row and $j$-th column.
Furthermore, given the dosimetric variables generated for each patient, $V_1, \ldots, V_{80}$, we combine them into a linear predictor using the respective coefficients $\beta_1, \ldots, \beta_{80}$, i.e., $\eta = \sum_{j=1}^{80} \beta_j V_j$. We set the coefficients to be a step function with $\beta_j = 0$ for dose levels below 51 Gy, $\beta_j = 0.1$ for dose levels from 51 Gy to 75 Gy, and $\beta_j = 0.5$ for dose levels from 76 Gy to 80 Gy. The reason for setting the coefficients up to a certain dose to be zero is based on clinical experience that normal tissue can compensate for cell depletion due to RT up to a threshold cell survival fraction. The two levels of non-zero coefficients (0.1 and 0.5) reflect commonly observed clinical scenarios in which there may be multiple distinct sub-types of a toxicity of interest (e.g., one mild, one more severe) with different sensitivities to the cell survival fraction.
Then, considering a binary response variable/endpoint $y$, where $y = 1$ or 0 corresponds to whether or not a patient develops a certain complication of the OAR, a logistic link function is used to link the linear predictor with the probability of $y = 1$, i.e., $p = 1/(1 + e^{-\eta})$. Finally, we generate the binary $y$ for each patient by sampling from a Bernoulli distribution with parameter $p$. The proportion of patients with toxicity is assumed to be 50%.
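A minimal sketch of this generating process follows; the correlation parameter rho, the mean vector, and the centering step used to reach the 50% toxicity proportion are illustrative assumptions not specified above.

```python
import numpy as np

def simulate_cohort(n, rho=0.9, mu=0.5, seed=0):
    """Sketch of the simulation design: 80 correlated V_x features
    (1 Gy apart), step-function coefficients with steps at 51 and 76 Gy,
    and a Bernoulli endpoint through a logistic link. rho and mu are
    illustrative assumptions; the paper does not report their values."""
    rng = np.random.default_rng(seed)
    p = 80
    # Correlation decaying with the distance between dose levels.
    Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    V = rng.multivariate_normal(np.full(p, mu), Sigma, size=n)
    # Step-function coefficients: 0 below 51 Gy, 0.1 for 51-75 Gy,
    # 0.5 for 76-80 Gy (index j-1 corresponds to V_j).
    beta = np.zeros(p)
    beta[50:75] = 0.1
    beta[75:] = 0.5
    eta = V @ beta
    eta -= np.median(eta)   # center so that roughly 50% develop toxicity
    prob = 1.0 / (1.0 + np.exp(-eta))
    y = rng.binomial(1, prob)
    return V, y, beta
```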
Following the aforementioned procedure, we generate a training dataset of 80 patients, a sample size similar to that of the clinical datasets used in the following sections. We also generate a validation set of 20 patients and a test set of 50 patients. The validation set is used to determine the tuning parameters of each model by maximizing the Area Under the Curve (AUC). The test set is used to report the generalization performance of the four models.
We repeat the data generation process 200 times, which allows us to assess and compare model performances with respect to randomness in the dataset. We report several performance metrics: (1) mean absolute error (MAE) between the true coefficients and estimated coefficients; (2) AUC on the validation set and the test set; (3) patterns of the detected zero and non-zero coefficients against medical knowledge/intuition.
2.4.2. Prostate cancer data and experiments
Dataset description: This dataset includes 79 patients with prostate cancer from Mayo Clinic Arizona (MCA). All patients received intensity-modulated radiation therapy (IMRT). The detailed protocol of the IMRT can be found in a previous publication on this dataset (Liu et al., 2019). The endpoint variable to be predicted is grade 2+ acute rectal complication with symptoms including anal pain, diarrhea, and rectal obstruction. Among the 79 patients, 16 developed the complication.
DVH feature extraction: The OAR in this study is the rectum. The rectum of each patient was drawn as a whole organ bounded by the ischial tuberosity inferiorly and the sigmoid flexure superiorly. Then, $V_x$'s were extracted from the rectal DVH, with the lowest and highest dose levels covering the range of the RT protocol. Between the lowest and highest dose levels, we extracted "dense features", which are $V_x$'s 1 Gy apart, and "sparse features", which are $V_x$'s 5 Gy apart. We compare the different methods on these two feature sets.
Application of KC-GLM and other methods: We apply each method to predict the probability of grade 2+ acute rectal complication (a binary outcome) for each patient using the DVH features and the patient covariates of Prostate Specific Antigen (PSA) level and medication use of statin. These patient covariates have been previously found to be related to this complication (Liu et al., 2019). Only the coefficients of the DVH features are penalized; the coefficients of the patient covariates are not. We apply each method to the dataset with the tuning parameters of each method selected by maximizing the leave-one-out cross-validation (LOOCV) AUC. We compare the patterns of the coefficient estimates by each method against medical knowledge/intuition. We also report the LOOCV AUC.
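A sketch of the LOOCV AUC computation used for tuning is given below; the fit/predict interface is a hypothetical placeholder standing in for any of the four models with a fixed set of tuning parameters.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def loocv_auc(fit_fn, predict_fn, X, y):
    """Leave-one-out cross-validated AUC. fit_fn and predict_fn are
    placeholders for any of the four models (e.g., KC-GLM with a
    fixed pair of tuning parameters)."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i            # hold out patient i
        model = fit_fn(X[mask], y[mask])
        preds[i] = predict_fn(model, X[i:i + 1])[0]
    return roc_auc_score(y, preds)          # AUC over held-out predictions

# Tuning: evaluate loocv_auc over a grid of tuning parameter values and
# keep the combination with the highest value.
```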
2.4.3. Lung cancer data and experiments
Dataset description: This dataset includes 134 patients with stage III non-small-cell lung cancer (NSCLC) from MCA. All the patients received RT delivered using the involved-field technique and either 3-dimensional conformal RT (3D-CRT) or IMRT. The detailed protocol can be found in a previous publication on this dataset (Liu et al., 2020). It is well-known that lung cancer patients under RT may be at risk of radiation spill to their heart, leading to a reduction of overall survival (OS); this is known as cardiac toxicity (Wang et al., 2017). Thus, we focus on predicting OS as the outcome variable in this study. Among the 134 patients, the median follow-up time is 1.49 years (range: 0.13–6.77), with 53/81 patients being alive/deceased at the time of the last follow-up.
DVH feature extraction: For each patient, the OAR is the heart. The whole heart was contoured for each patient on the radiation image using the Eclipse treatment planning system (Varian, Inc). Then, $V_x$'s were extracted from the heart DVH, with the lowest and highest dose levels covering the range of the RT protocol. Between the lowest and highest dose levels, we extracted dense features and sparse features in the same way as for the prostate cancer data.
Application of KC-GLM and other methods: We apply each method to predict the OS for each patient using the DVH features and the patient covariates of age, stage of lung cancer, and use of chemotherapy. These patient covariates have been previously found to be related to OS (Liu et al., 2020). Only the coefficients of the DVH features are penalized; the coefficients of the patient covariates are not. Also, since the clinical endpoint is survival data, the likelihood function in the optimization formulation of each method is written in the same way as that in a proportional hazards model (Wennberg et al., 2011). We apply each method to the dataset with the tuning parameters of each method selected by maximizing the LOOCV partial log-likelihood (Dai & Breheny, 2019). We compare the patterns of the coefficient estimates by each method against medical knowledge/intuition.
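For reference, assuming no ties in the event times (a simplifying assumption), the partial log-likelihood that plays the role of the log-likelihood term in each method's objective has the standard proportional hazards form

$$l(\boldsymbol{\beta}) = \sum_{i:\,\delta_i = 1} \Big[ \eta_i - \log \sum_{j:\, t_j \geq t_i} \exp(\eta_j) \Big],$$

where $t_i$ is the observed follow-up time of patient $i$, $\delta_i = 1$ indicates death, and $\eta_i$ is the linear predictor formed from the DVH features and patient covariates.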
3. Results
3.1. Results from the simulation dataset
We apply KC-GLM and three competing methods (lasso, elastic net, and fused lasso) to the simulation dataset generated according to the procedures described in Section 2.4.1, which consists of 100 patients. The data of 80 patients (i.e., the training set) are used for model training, with the tuning parameter(s) of each model selected by maximizing the AUC on the remaining 20 patients (i.e., the validation set). After this training process is done, we get the estimated coefficients for each model, which are compared with the true coefficients to compute the mean absolute error (MAE). We repeat the experiment 200 times so that we can get the distribution of MAE for each method, which allows us to construct a bar chart to compare the different methods. Figure 2 shows the bar chart of MAE for each method. A two-sample t test is used to compare KC-GLM and each competing method on MAE based on the results from the 200 repeated experiments. Since there are three competing methods, three t tests are performed, all having p-values less than 0.0001. After correction for multiple comparisons, the adjusted p-values are at most 0.0003, which indicates that KC-GLM has a significantly lower MAE than the three competing methods.
Figure 2. MAE between the true and estimated coefficients on a simulation dataset of 100 patients. The simulation is repeated 200 times to get the distribution of MAE for each method, which is represented by a bar chart. KC-GLM has a significantly lower MAE than all others, with adjusted p-values at most 0.0003 after correction for multiple comparisons.
Furthermore, we compare the AUC of the four methods on the validation set (Figure 3, left bar chart). We also apply the trained models to a blind/unseen test set of 50 patients and report the AUC on the test set (Figure 3, right bar chart). It can be seen that the AUCs on the validation set are not significantly different between the four methods. However, the AUC of KC-GLM on the test set is significantly higher than that of each competing method based on a two-sample t test (p < 0.01 for all three t tests). After correction for multiple comparisons, the adjusted p-values are at most 0.03, which indicates that KC-GLM has a significantly higher test AUC than the three competing methods. This implies that KC-GLM has a better generalization capability on unseen data.
Figure 3. AUC on the validation set of 20 patients (left) and on an unseen test set of 50 patients (right) under the four methods. The simulation is repeated 200 times to get the distribution of AUC for each method, which is represented by a bar chart. KC-GLM has a significantly higher AUC than all others on the test set, with adjusted p-values at most 0.03 after correction for multiple comparisons.
Figure 4. Zero/non-zero coefficients estimated by KC-GLM (red dash) and fused lasso (blue dash) compared with the true coefficients (black solid) in three simulation runs.
Additionally, using KC-GLM, 168 out of the 200 repeated runs of the experiment detected the coefficient step at 51 Gy within a ±5 Gy error margin, and 197 out of the 200 runs detected the coefficient step at 76 Gy within a ±5 Gy error margin. Among the three competing methods, fused lasso can produce step-like coefficients, but the coefficients may be negative and/or non-monotonic (see the middle and right plots in Fig. 4), which is counter-intuitive. Lasso and elastic net fail to detect the step-like coefficients.
Finally, we show in Fig. 4 the detected zero/non-zero coefficients by KC-GLM in comparison with the true coefficients for an example of three runs. As fused lasso has the best MAE and AUC performance among the three competing methods, we also add the detected zero/non-zero coefficients by fused lasso in Fig. 4. Except for the first run, KC-GLM detects the non-zero coefficients, including their magnitudes, more consistently with the true coefficients than fused lasso. Because fused lasso does not consider the knowledge of non-negativity and monotonicity like KC-GLM does, it identifies some coefficients to be negative and some coefficients corresponding to a higher dose to be smaller than those corresponding to a lower dose, which is counter-intuitive.
3.2. Results from the prostate cancer dataset
Figure 5 shows the coefficient estimates from the four methods under the dense and sparse feature settings. The observations are summarized as follows:
Figure 5. Coefficient estimates by the four methods under (a) the dense feature setting and (b) the sparse feature setting.
There is a difference in the variable selection result by each method. First focus on the results from the dense feature experiment (Fig. 5(a)). Lasso tends to select individual variables, i.e., a few isolated $V_x$'s with non-zero coefficients. In general, when there is a high correlation between variables, lasso is known not to work very well: it tends to select one variable from a highly correlated group and downplay the significance of the other variables in the group (Zou & Hastie, 2005). Elastic net works better by selecting three groups of highly correlated variables. Fused lasso selects similar groups of variables but through a different mechanism than elastic net: the additional penalty on the differences of successive coefficients encourages adjacent variables in the ordered set to have the same coefficient, which produces the "flat" look of the coefficients within each group. In general, the three competing methods select variables at similar segments of the entire range of the $V_x$'s. In contrast, the variables selected by KC-GLM have a very different, "staircase" type of pattern. There is a step at 35 Gy, meaning that increasing the fractional volume of the rectum receiving higher than 35 Gy will increase the probability of developing grade 2+ rectal complication.
Comparing the pattern of the variable selection by KC-GLM with the other three methods, the other methods produce hard-to-interpret and counter-intuitive results in two aspects: 1) Some selected variables have negative coefficients. For example, one $V_x$ at a relatively low dose level is selected by lasso with a coefficient of −1.11. This means that an increase in this variable is associated with a decrease in the probability of developing grade 2+ rectal complication. Irradiating more fractional volume of the rectum, even with a relatively lower dose, should not alleviate the complication; the best hope would be not to induce significant complication. A negative coefficient is therefore counter-intuitive. Both elastic net and fused lasso also produce negative coefficients. 2) In the lasso model, variables corresponding to a range of relatively high doses, e.g., around 40 Gy and above, have zero coefficients. This means that changes in any of these variables do not incur any change in the probability of complication. This is misleading because it is equivalent to saying that increasing the fractional volume of the rectum receiving higher than 40 Gy (or any other dose level in the range) will not increase the complication probability. In contrast, the pattern of the variable selection by KC-GLM makes more sense.
Furthermore, when we examine the results from the sparse feature experiment (Fig. 5(b)), KC-GLM selects the same group of variables as in the dense feature experiment. This pattern of variable selection by KC-GLM makes more sense, while the patterns by the three competing methods are counter-intuitive for the same reasons discussed previously for the dense feature experiment.
Finally, the LOOCV AUCs for the four models are reported in Table 1, with the highest LOOCV AUCs highlighted in bold. The LOOCV AUC for KC-GLM is the highest: 0.82 for the dense experiment and 0.80 for the sparse experiment. The AUCs of the competing methods are lower (0.75–0.78). Also, as discussed previously, the competing methods fail to select the dosimetric variables in a clinically meaningful way.
Table 1.
LOOCV AUC for four different models and two experiments.
| | Lasso | Elastic net | Fused lasso | KC-GLM |
|---|---|---|---|---|
| Dense experiment | 0.78 | 0.76 | 0.76 | **0.82** |
| Sparse experiment | 0.78 | 0.75 | 0.77 | **0.80** |
Compared with the existing works summarized by a recent review paper (Isaksson et al., 2020), where the AUCs of different machine learning algorithms for classification of binary toxicity outcomes after radiotherapy range from 0.58 to 0.88, black-box methods do not necessarily produce a higher AUC than white-box methods such as generalized linear models. Focusing on prostate cancer, random forest achieved an AUC of 0.7; artificial neural networks achieved AUCs from 0.7 to 0.78; generalized linear models achieved AUCs from 0.58 to 0.73; and a better AUC of 0.83 was achieved by independent component analysis. Our method has competitive AUC performance and, meanwhile, produces interpretable results for the coefficients of the DVH variables.
3.3. Results from the lung cancer dataset
Figure 6 shows the coefficient estimates from the four methods under the dense and sparse feature settings. The observations are summarized as follows:
Figure 6. Coefficient estimates by the four methods under (a) the dense feature setting and (b) the sparse feature setting.
First focus on the dense feature experiment (Fig. 6(a)). Lasso selects three nonzero variables, one of which has a negative coefficient, which is again counter-intuitive. Elastic net and fused lasso both select three groups of variables, around the low, median, and high dose levels. The pattern is similar to the result of lasso, but elastic net and fused lasso tend to select highly correlated variables. In contrast, the variables selected by KC-GLM have a "staircase" type of pattern with a stair in the high dose range.
The variable selection patterns by lasso, elastic net, and fused lasso are hard to interpret in comparison to the result by KC-GLM. Specifically, while all three competing methods select variables around the median dose, none of them selects any variable in the higher dose range. This means that changes in any variable within the high dose range do not incur any change in the probability of complication, which is misleading. In contrast, the pattern of the variable selection by KC-GLM makes more sense: KC-GLM recognizes one group of variables within the high dose range that affects the probability of complication in a positive way.
Furthermore, when we examine the results from the sparse feature experiment (Fig. 6(b)), we find that lasso, elastic net, and fused lasso each estimate a negative coefficient for one of the dosimetric variables. This result is counter-intuitive for the same reason discussed previously for the dense feature experiment. Fused lasso and KC-GLM select all the variables within the high dose range. This result makes more sense.
Finally, in Fig. 7 we show the survival curves predicted by all four models under the dense feature experiment for a patient who died 1.2 years post-therapy. The survival curves predicted by the three competing models are similar because these models select similar regions of dosimetric variables (Fig. 6(a)). Compared with the predictions by the competing methods, the survival probability predicted by KC-GLM at the time of death is substantially lower, and it drops much faster around the time of death. This demonstrates that KC-GLM generates a better prediction for this patient. Upon checking the predicted survival curves for all patients, we found the following advantage of KC-GLM: among the 81 deceased patients, seven have a substantially lower survival curve under KC-GLM than under the other methods (similar to the pattern in Fig. 7), whereas the survival curves of the remaining patients are similar across the four methods.
Figure 7. Predicted survival probability curves by the four methods for a deceased patient.
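As a reminder of how such curves are produced (standard proportional hazards prediction, not a KC-GLM-specific step), with $\hat{\eta}$ denoting a patient's estimated linear predictor and $\hat{S}_0(t)$ the estimated baseline survival function (e.g., from the Breslow estimator), each method's predicted curve is

$$\hat{S}(t \mid \mathbf{x}) = \hat{S}_0(t)^{\exp(\hat{\eta})}.$$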
4. Discussion
Automated treatment planning and computer-controlled treatment delivery form the backbone of contemporary RT. In general, doses that are large enough to assure local control of the tumor are also large enough to cause significant damage to normal organs at risk immediately adjacent to the tumor. One copes with this limitation of RT by shaping dose distributions to conform to the target while sparing the adjacent tissue, a task that requires an inverse optimization algorithm to derive the optimum dose distribution.
A practical use of an inverse optimization algorithm requires tools that the treatment planner can use to communicate optimization goals to the algorithm. The DVH variables described in this paper are such a tool. Currently available optimizers accept a set of limits, upper or lower, on DVH variables. The values of the DVH limits are derived from clinical studies that correlate DVH variables with observed clinical outcomes: a DVH constraint to be used in future optimization is derived by setting an acceptable limit on the likelihood of clinical toxicity, as predicted by the NTCP model.
When searching for the most predictive DVH variable for a particular clinical outcome, the commonsense approach is to let the computer "scan" the space of DVH variables, one at a time, looking for the most predictive one. This univariate approach has some limitations. A major one is the loss of statistical power due to the need for correction for multiple comparisons (i.e., FDR control), which can make it nearly impossible for any DVH variable in a study to reach statistical significance.
A multivariate model that combines the DVH variables into a single predictor for the toxicity outcome is more appropriate. However, as demonstrated in this paper, such models may be prone to generating counter-intuitive, hard-to-interpret results due to their purely data-driven nature. To overcome this limitation, our proposed KC-GLM model integrates medical domain knowledge into the data-driven model formulation, which warrants better interpretability, better consistency with knowledge constraints, and better generalization on unseen data. In the simulation study and two case studies, the results by KC-GLM demonstrated compliance with the assumptions of non-negativity and monotonicity, whereas the competing methods of lasso, elastic net, and fused lasso failed to produce results that comply with these assumptions. Furthermore, we performed additional experiments to find out whether adding two-way interaction effects of dosimetric variables to the competing methods may help produce more meaningful results. Specifically, to accommodate the added interactions, some adjustments are needed for each method. For lasso, the $L_1$ penalty is imposed on all regression coefficients, including those for the main effects and interaction effects. Similarly, the penalties for elastic net are imposed on all regression coefficients. Fused lasso has two penalties, one on the regression coefficients and the other on the differences of regression coefficients of successive predictors, given that the predictors have a natural order. The first penalty is imposed on all regression coefficients; the second penalty is imposed only on the coefficients of the main effects because there is no natural order for the interaction effects. We applied these methods to the prostate cancer and lung cancer datasets, focusing on the sparse feature experiment for each dataset. Detailed results can be found in the Supplementary Materials. We found that adding interaction effects to the competing methods helps alleviate the risk of generating negative coefficients for the dosimetric variables to some extent. However, such a strategy still cannot prevent the risk of generating zero coefficients in the high dose range. Both risks can be effectively prevented by KC-GLM due to the knowledge-informed constraints of non-negativity and monotonicity. A sketch of the interaction augmentation is given below.
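The sketch below shows one way to build the interaction-augmented lasso variant with scikit-learn; this is an illustrative reconstruction, not the authors' code, and the fused lasso variant (whose fusion penalty applies only to the ordered main effects) would require a custom solver.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Augment the V_x features with all two-way interactions, then fit an
# L1-penalized model so that main and interaction effects are all penalized.
model = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
)
# model.fit(V, y)  # V: (n, p) sparse-feature matrix, y: binary endpoint
```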
KC-GLM is built upon a basic, biophysics-driven notion that there exists an underlying dose threshold beyond which the dose-volume relationship becomes predictive of clinical toxicity, and there could be multiple thresholds for toxicities of increasing severity. The model can detect such thresholds through an increase in the model coefficients. The result of applying KC-GLM to a particular clinical dataset can shed some light on how to set constraints on DVH variables (i.e., which variables should be constrained and by how much) in order to reduce the probability of radiation toxicity. One promising future direction of this work is to develop optimization algorithms that can take the output from KC-GLM and integrate it as constraints into a treatment planning optimization algorithm for each patient.
There are several limitations of this study. First, except for the simulation study, the two clinical datasets have limited sample sizes. To assess the impact of the sample sizes, we performed a retrospective power analysis (Green & MacLeod, 2016; Kumle et al., 2021; Zhang et al., 2019) and estimated the powers of the prostate cancer and lung cancer case studies to be 0.75 and 0.83, respectively. Detailed steps of the power analysis can be found in the Supplementary Materials. Future research may include more data to improve the power and to evaluate the generalization capability of KC-GLM on unseen data. Second, the lung cancer study was based on combined datasets under 3D-CRT and IMRT. We converted the dose under the different treatment protocols to the 2 Gy equivalent dose to keep the same standard. This conversion is recommended by the QUANTEC committee, which has historically set the standards for clinical outcome studies, and is necessary if we want to compare treatment regimens delivered with different fractionation schedules. Our results did not show a significant impact from the potential data inhomogeneity of the different protocols. Even so, we plan to further evaluate our result based on larger datasets under different treatment protocols, which would help assess the robustness of the proposed method and generate insight for clinical practice. Third, the development of KC-GLM is based on $V_x$'s extracted from the DVH. As mentioned in the Introduction, other DVH variables such as the $D_v$'s are also commonly used in NTCP models. Extension of KC-GLM to other DVH variables can be done in future work. Fourth, the best method for choosing the maximum dose considered in the model remains uncertain. The maximum dose should be set at the dose beyond which DVH data become sparse, but a more precise definition of the maximum dose limit is left to future work. Fifth, KC-GLM cannot be used directly in commercially available treatment optimization algorithms. One must use approximate constraints in the process of optimization and use KC-GLM for the estimate of NTCP in the final plan. Addressing this limitation will require significant new developments in treatment optimization algorithms. Finally, we want to emphasize that KC-GLM was developed as an NTCP model for radiation therapy. The validity of the proposed knowledge-driven constraints in KC-GLM was verified by field experts (i.e., medical physicists in radiation oncology). Adoption of KC-GLM in other application domains needs to be done cautiously and requires field expertise from the specific domain to justify the use; proper adjustment of the model may be needed before implementation.
5. Conclusion and future work
We developed a KC-GLM model for radiotherapy toxicity prediction by integrating medical domain knowledge into the data-driven model formulation. KC-GLM shows better performance than commonly used variable selection techniques in simulation experiments and two real-world clinical datasets. The clinical implication and utility of KC-GLM were discussed. Limitations and future research directions were pointed out.
Supplementary Material
Funding
Funding support for this work was provided by the Department of Radiation Oncology at Mayo Clinic Arizona with additional assistance from the North Central Cancer Treatment Group. JL acknowledges support of research time from NSF DMS-2053170. SES acknowledges support for his research time from the North Central Cancer Treatment Group and Mayo Clinic with funding from the Public Health Service (CA-25224, CA-37404, CA-35267, CA-35431, CA-35195, CA-63848, CA-63849, CA-35113, CA-35103, CA-35415, CA-35101, CA-35119, CA-35090).
Role of the funder
The content is solely the responsibility of the authors and does not necessarily represent the views of the National Cancer Institute or the National Institute of Health.
Footnotes
Supplemental data for this article can be accessed online at https://doi.org/10.1080/24725579.2023.2227199.
Consent and approval statement
The study has been approved by a local institutional review board (IRB).
Disclosure statement of interest
The authors report no conflict of interest.
References
- Beetz I, Schilstra C, Burlage FR, Koken PW, Doornaert P, Bijl HP, Chouvalova O, Leemans CR, De Bock GH, Christianen MEMC, Van Der Laan BFAM, Vissink A, Steenbakkers RJHM, & Langendijk JA (2012). Development of NTCP models for head and neck cancer patients treated with three-dimensional conformal radiotherapy for xerostomia and sticky saliva: The role of dosimetric and clinical factors. Radiotherapy and Oncology, 105(1), 86–93. 10.1016/j.radonc.2011.05.010
- Brodin NP, Kabarriti R, Garg MK, Guha C, & Tomé WA (2018). Systematic review of normal tissue complication models relevant to standard fractionation radiation therapy of the head and neck region published after the QUANTEC reports. International Journal of Radiation Oncology, Biology, Physics, 100(2), 391–407. 10.1016/j.ijrobp.2017.09.041
- Buettner F, Gulliford SL, Webb S, & Partridge M (2011). Modeling late rectal toxicities based on a parameterized representation of the 3D dose distribution. Physics in Medicine and Biology, 56(7), 2103–2118. 10.1088/0031-9155/56/7/013
- Chen S, Zhou S, Yin FF, Marks LB, & Das SK (2007). Investigation of the support vector machine algorithm to predict lung radiation-induced pneumonitis. Medical Physics, 34(10), 3808–3814. 10.1118/1.2776669
- Dai B, & Breheny PJ (2019). Cross validation approaches for penalized Cox regression. arXiv preprint arXiv:1905.10432.
- Goeman JJ (2018). L1 penalized estimation in the Cox proportional hazards model. Biometrical Journal, 52(1), 70–84.
- Green P, & MacLeod CJ (2016). SIMR: An R package for power analysis of generalized linear mixed models by simulation. Methods in Ecology and Evolution, 7(4), 493–498. 10.1111/2041-210X.12504
- Holyoake D, Aznar M, Mukherjee S, Partridge M, & Hawkins MA (2017). Modelling duodenum radiotherapy toxicity using cohort dose-volume-histogram data. Radiotherapy and Oncology, 123(3), 431–437. 10.1016/j.radonc.2017.04.024
- Ibragimov B, Toesca D, Chang D, Yuan Y, Koong A, & Xing L (2018). Development of deep neural network for individualized hepatobiliary toxicity prediction after liver SBRT. Medical Physics, 45(10), 4763–4774. 10.1002/mp.13122
- Isaksson LJ, Pepa M, Zaffaroni M, Marvaso G, Alterio D, Volpe S, Corrao G, Augugliaro M, Starzyńska A, Leonardi MC, Orecchia R, & Jereczek-Fossa BA (2020). Machine learning-based models for prediction of toxicity outcomes in radiotherapy. Frontiers in Oncology, 10, 790. 10.3389/fonc.2020.00790
- Kumle L, Võ MLH, & Draschkow D (2021). Estimating power in (generalized) linear mixed models: An open introduction and tutorial in R. Behavior Research Methods, 53(6), 2528–2543. 10.3758/s13428-021-01546-0
- Liu X, Fatyga M, Schild ES, & Li J (2020). Detecting spatial susceptibility to cardiac toxicity of radiation therapy for lung cancer. IISE Transactions on Healthcare Systems Engineering, 10(4), 243–250. 10.1080/24725579.2020.1795012
- Liu X, Fatyga M, Wu T, & Li J (2019). Integration of biological and statistical models toward personalized radiation therapy of cancer. IISE Transactions, 51(3), 311–321. 10.1080/24725854.2018.1486054
- Mavroidis P, Grimm J, Cengiz M, Das S, Tan X, Yazici G, & Ozyigit G (2018). Fitting NTCP models to SBRT dose and carotid blowout syndrome data. Medical Physics, 45(10), 4754–4762. 10.1002/mp.13121
- Ospina JD, Zhu J, Chira C, Bossi A, Delobel JB, Beckendorf V, Dubray B, Lagrange J-L, Correa JC, Simon A, Acosta O, & de Crevoisier R (2014). Random forests to predict rectal toxicity following prostate cancer radiation therapy. International Journal of Radiation Oncology, Biology, Physics, 89(5), 1024–1031. 10.1016/j.ijrobp.2014.04.027
- Robertson JM, Söhn M, & Yan D (2010). Predicting grade 3 acute diarrhea during radiation therapy for rectal cancer using a cutoff-dose logistic regression normal tissue complication probability model. International Journal of Radiation Oncology, Biology, Physics, 77(1), 66–72. 10.1016/j.ijrobp.2009.04.048
- Rossi L, Bijman R, Schillemans W, Aluwini S, Cavedon C, Witte M, Incrocci L, & Heijmen B (2018). Texture analysis of 3D dose distributions for predictive modelling of toxicity rates in radiotherapy. Radiotherapy and Oncology, 129(3), 548–553. 10.1016/j.radonc.2018.07.027
- Tibshirani R (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288. 10.1111/j.2517-6161.1996.tb02080.x
- Tibshirani R, Saunders M, Rosset S, Zhu J, & Knight K (2005). Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(1), 91–108. 10.1111/j.1467-9868.2005.00490.x
- Troeller A, Yan D, Marina O, Schulze D, Alber M, Parodi K, Belka C, & Söhn M (2015). Comparison and limitations of DVH-based NTCP models derived from 3D-CRT and IMRT data for prediction of gastrointestinal toxicities in prostate cancer patients by using propensity score matched pair analysis. International Journal of Radiation Oncology, Biology, Physics, 91(2), 435–443. 10.1016/j.ijrobp.2014.09.046
- Wang K, Eblan MJ, Deal AM, Lipner M, Zagar TM, Wang Y, Mavroidis P, Lee CB, Jensen BC, Rosenman JG, Socinski MA, Stinchcombe TE, & Marks LB (2017). Cardiac toxicity after radiotherapy for stage III non-small-cell lung cancer: Pooled analysis of dose-escalation trials delivering 70 to 90 Gy. Journal of Clinical Oncology, 35(13), 1387–1394. 10.1200/JCO.2016.70.0229
- Wennberg BM, Baumann P, Gagliardi G, Nyman J, Drugge N, Hoyer M, Traberg A, Nilsson K, Morhed E, Ekberg L, Wittgren L, Lund JÅ, Levin N, Sederholm C, Lewensohn R, & Lax I (2011). NTCP modelling of lung toxicity after SBRT comparing the universal survival curve and the linear quadratic model for fractionation correction. Acta Oncologica, 50(4), 518–527. 10.3109/0284186X.2010.543695
- Zhang TW, Snir J, Boldt RG, Rodrigues GB, Louie AV, Gaede S, McGarry RC, Urbanic JJ, Daly ME, & Palma DA (2019). Is the importance of heart dose overstated in the treatment of non-small cell lung cancer? A systematic review of the literature. International Journal of Radiation Oncology, Biology, Physics, 104(3), 582–589. 10.1016/j.ijrobp.2018.12.044
- Zhang Y, Hedo R, Rivera A, Rull R, Richardson S, & Tu XM (2019). Post hoc power analysis: Is it an informative and meaningful analysis? General Psychiatry, 32(4), e100069. 10.1136/gpsych-2019-100069
- Zou H, & Hastie T (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. 10.1111/j.1467-9868.2005.00503.x