Biostatistics (Oxford, England). 2021 Mar 19;23(3):990–1006. doi: 10.1093/biostatistics/kxab005

Structure-preserving integrated analysis for risk stratification with application to cancer staging

Tianjie Wang 1, Rui Chen 2, Wenshuo Liu 3, Menggang Yu 4
PMCID: PMC9608615  PMID: 33738474

Summary

To provide an appropriate and practical level of health care, it is critical to group patients into relatively few strata that have distinct prognoses. Such grouping or stratification is typically based on well-established risk factors and clinical outcomes. A well-known example is the American Joint Committee on Cancer staging for cancer, which uses tumor size, node involvement, and metastasis status. We consider a statistical method for such grouping based on individual patient data from multiple studies. The method encourages a common grouping structure as a basis for borrowing information, but acknowledges data heterogeneity, including unbalanced data structures across multiple studies. We build on the “lasso-tree” method, which is more versatile than the well-known classification and regression tree method in generating possible grouping patterns. In addition, the parametrization of the lasso-tree method makes it very natural to incorporate the underlying order information in the risk factors. In this article, we also strengthen the lasso-tree method by establishing its theoretical properties, which Lin and others (2013. Lasso tree for cancer staging with survival data. Biostatistics 14, 327–339) did not pursue. We evaluate our method in extensive simulation studies and an analysis of multiple breast cancer data sets.

Keywords: Cancer staging, Data heterogeneity, Individual patient data, Integrated analysis, Survival analysis

1. Introduction

Risk stratifying patients using established risk factors is a perpetual health care research theme. This topic was brought to the center stage of cancer research when the American Joint Committee on Cancer (AJCC) initiated its 8th edition on cancer staging in 2018. All previous editions have focused on anatomical factors alone, but the new edition tries to incorporate validated biological factors to provide more precise stratification.

Take breast cancer as an example. The new edition adds estrogen receptor (ER) and progesterone receptor (PR) status, human epidermal growth factor receptor 2 (HER2) status, histologic grade, and oncogene expression to the previous 7th edition, which uses only the anatomical assessment of tumor size, regional lymph node involvement, and distant metastasis (Amin and others, 2017; National Cancer Institute, 2004). This more “personalized” approach to patient classification is an inevitable step in this era of precision molecular oncology. However, the availability of many factors, or of many levels of the established factors, leads to an exceedingly refined categorization of patients.

Compared with the 7th edition staging table, which can be depicted on a single page, the 8th edition staging algorithm spans more than three pages and is rather complex and hard to follow. A large part of the algorithm is based on expert opinions that, while sensible, still need to be validated through continued accumulation of data. This naturally raises a practical question for health care management: is the categorization or refinement clinically significant and, if not, how can we combine similar categories into relatively few strata with clinically distinct prognoses so that the staging system is relatively simple to follow? This can be crucial for effective and simple care delivery and disease management given limited resources.

In our motivating example with three breast cancer data sets, the anatomical stage, ER or PR status, and HER2 status were collected, but histologic grade and oncogene expression were not. Therefore, we cannot verify the new AJCC 8th edition cancer staging. However, it is still methodologically and scientifically interesting to investigate how the biological factors of ER or PR status and HER2 status enhance the anatomical staging. In our data sets, the anatomical feature based on the 7th edition of AJCC has four stages: I, IIa, IIb, and III. The biomarker feature, based on the ER/PR and HER2 statuses, has four well-known subtypes (Onitilo and others, 2009): ER/PR+ and HER2+, ER/PR+ and HER2−, ER/PR− and HER2+, and ER/PR− and HER2−. The combination of the two factors, anatomical (A) and biomarker (B), leads to 16 categories as in a typical A × B table. The goal is to simplify the categories into fewer risk strata based on patients' clinical outcomes using data-driven algorithms.

Statistically, this task is a supervised clustering or stratification problem where the outcome is usually a time to event (such as recurrence or death) that is subject to possible censoring due to loss to follow-up. An intuitive solution is to use the well-known classification and regression tree (CART) method through recursive partitioning (Breiman, 1984; Loh, 2014). However, as argued by Lin and others (2013), CART does not capture all types of A and B combinations. At each split, CART must have a complete separation with respect to the splitting variable. In the case of the A × B table, this means that CART will split fully along all columns or rows conditional on the existing splits. In other words, the grouping that results from CART must have partitions in straight lines. Yet the true staging system might have a different configuration (Amin and others, 2017). Hence, Lin and others (2013) proposed using a Cox proportional hazards model with penalization on the neighboring coefficients for cancer staging. Specifically, they utilized the sparsity-enforcing property of the lasso penalty (Tibshirani, 1997) to merge neighboring coefficients that are close and, therefore, form clusters. Their method, known as “lasso-tree”, has the advantage of generating more general partitioning patterns.

To enable CART to generate arbitrary patterns of the A × B table, one could create an interaction term between A and B and then apply CART directly to this interaction term. In our setting, both the anatomical and biomarker features are ordered in the sense that their corresponding cancer prognoses satisfy the following: I ≻ IIa ≻ IIb ≻ III and (ER/PR+; HER2+) ≻ (ER/PR+; HER2−) ≻ (ER/PR−; HER2+) ≻ (ER/PR−; HER2−), where “≻” means better prognosis. When A and B are ordered, the levels of the interaction term become a partially ordered set, as off-diagonal levels are not comparable. For example, A1B1 has better prognosis than A1B2, A2B1, A2B2, etc., but A1B2 and A2B1 are not directly comparable. Such partial ordering information of the interaction term, which is derived from the ordered properties of A and B, is clearly not utilized in the usual CART algorithm. On the other hand, the lasso-tree method directly incorporates the ordering into its algorithm.
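The partial order described above can be illustrated with a short sketch. It assumes, purely for illustration, that the levels of A and B are coded 1 to 4 with smaller codes meaning better prognosis; the function names are hypothetical and not part of any published software.

```python
from itertools import product

def dominates(cell_x, cell_y):
    """True if cell_x is at least as good as cell_y on both factors (product order)."""
    return cell_x[0] <= cell_y[0] and cell_x[1] <= cell_y[1]

def comparable(cell_x, cell_y):
    """Two cells are comparable when one dominates the other."""
    return dominates(cell_x, cell_y) or dominates(cell_y, cell_x)

# Assumed coding: (a, b) with a, b in 1..4; smaller means better prognosis.
cells = list(product(range(1, 5), range(1, 5)))

# Unordered pairs of cells that the ordering of A and B cannot rank.
incomparable_pairs = [
    (x, y) for x in cells for y in cells if x < y and not comparable(x, y)
]
```

Under this coding, A1B1 dominates every other cell, whereas A1B2 and A2B1 fall into `incomparable_pairs`, which is exactly the information a plain CART fit on the interaction term has no way to use.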

This article extends the lasso-tree method to perform integrated analysis that uses individual participant data (IPD) from multiple sources. Such an integrated analysis can synthesize more information to borrow strength across different studies. On the other hand, the analysis needs to fully acknowledge data heterogeneity. In our motivating study, the data sets have differences in sample sizes, lengths of follow-up, study periods, and population composition and characteristics. These underlying differences are likely associated with challenges, such as varying magnitudes of signals, missing data, and population variations across different studies.

Therefore, we extend Lin and others (2013) in a non-trivial way to emphasize a common structure of risk stratification across different data sets. This basis for borrowing information is robust because it otherwise places no assumption or restriction on the magnitude of the parameters. The common structure is enforced by a special adaptive group lasso penalty. In addition, we strengthen the method of Lin and others (2013) by establishing its theoretical properties. We evaluate our method in extensive simulation studies and in an analysis of our motivating breast cancer data sets.

2. Penalized Cox regression

Suppose the data are collected from Inline graphic studies where Study Inline graphic has Inline graphic observations for Inline graphic. The total sample size Inline graphic and for subject Inline graphic we observe Inline graphic as the event indicator describing whether the Inline graphic is an event time (Inline graphic) or a censoring time (Inline graphic). Let Inline graphic be the study indicator of this subject Inline graphic. In addition, we also observe covariates, in particular, including staging factors Inline graphic and Inline graphic, which are ordinal variables. Because we adopt a regression framework, adjustment for other non-staging covariates, such as age and family history can be easily incorporated too. Our method works for more than two factors. However, for simpler presentation and better result visualization, we first focus on the two staging factors and without loss of generality, we suppose Inline graphic, with larger values corresponding to worse prognoses.

We use the Cox proportional hazard model as our modeling framework. The hazard function for Study Inline graphic is assumed as

graphic file with name Equation1.gif (2.1)

where Inline graphic corresponds to the two staging features Inline graphic and Inline graphic. The baseline hazard function Inline graphic is unspecified as in the traditional Cox model. Note that we allow both the baseline functions Inline graphic and coefficients Inline graphic to be study-specific to account for possible heterogeneity among the studies. To make Inline graphic identifiable, we restrict Inline graphic.

We further denote

graphic file with name Equation2.gif (2.2)

where Inline graphic is the vectorization transformation which stacks matrix columns into a vector.

The logarithm of partial likelihood corresponding to Study Inline graphic is

graphic file with name Equation3.gif (2.3)

where Inline graphic is the set of all failure events and Inline graphic is the risk set of subject Inline graphic in Study Inline graphic. The overall log-likelihood is

graphic file with name Equation4.gif (2.4)

where Inline graphic.
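As a rough illustration of (2.3) and (2.4), the sketch below evaluates a Cox log partial likelihood separately within each study and then sums the results. It is a minimal version under stated assumptions: no tied event times, a linear predictor `eta` supplied directly for each subject, and hypothetical function names. It is not the GLIDARS implementation.

```python
import math

def log_partial_likelihood(times, events, eta):
    """Cox log partial likelihood for one study (assumes no tied event times).

    times:  observed follow-up times
    events: 1 for an event, 0 for censoring
    eta:    linear predictor (log relative risk) of each subject
    """
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 1:
            # Risk set: subjects of the same study still under observation at t_i.
            risk = [math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i]
            total += eta[i] - math.log(sum(risk))
    return total

def overall_log_likelihood(studies):
    """Sum of study-specific log partial likelihoods.

    studies: list of (times, events, eta) triples, one per study.
    Risk sets never mix subjects from different studies, which is how
    study-specific baseline hazards drop out of the objective.
    """
    return sum(log_partial_likelihood(*study) for study in studies)
```

Because each study contributes its own risk sets, the baseline hazards never need to be estimated or assumed equal across studies.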

To construct a staging system, one can first maximize (2.4) with respect to Inline graphic, or maximize Inline graphic with respect to Inline graphic for Inline graphic, and then inspect the resulting estimates for possible clustering of adjacent cells. However, such an endeavor may be very ineffective due to the large number of free parameters, that is, Inline graphic, in Inline graphic. In addition, a small Study Inline graphic may cause unstable estimation of Inline graphic, leading to difficulty in synthesizing findings across studies. Therefore, it is critical to streamline the analysis for construction of an interpretable and unified staging system.

As patients with similar values of the risk factors are more likely to have similar disease prognosis, the adjacent cells in the grid generated by the ordinal risk factors are more likely to cluster in the same stage. We focus on working with the vector of differences between the neighboring coefficients in Study Inline graphic. To simplify notation, we let Inline graphic be the set of neighboring pairs and define Inline graphic as follows:

graphic file with name Equation5.gif (2.5)

Here the matrix Inline graphic can be written as

graphic file with name Equation6.gif

where Inline graphic and Inline graphic are identity matrices, Inline graphic is the Kronecker product, and

graphic file with name Equation7.gif
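One common way to build such a neighbor-difference matrix from Kronecker products is sketched below. The exact layout of the authors' matrix is not recoverable from this text, so the construction is an assumption: coefficients are stacked column by column (cell (i, j) of an n_a by n_b grid sits at position j * n_a + i), and the rows list the differences along A first and then along B. All names are hypothetical.

```python
def first_difference(n):
    """(n-1) x n matrix whose r-th row maps x to x[r+1] - x[r]."""
    return [[1 if c == r + 1 else -1 if c == r else 0 for c in range(n)]
            for r in range(n - 1)]

def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def neighbor_difference_matrix(n_a, n_b):
    """Rows give differences between adjacent cells of an n_a x n_b grid."""
    along_a = kron(identity(n_b), first_difference(n_a))  # neighbors within a B level
    along_b = kron(first_difference(n_b), identity(n_a))  # neighbors within an A level
    return along_a + along_b

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]
```

For the 4 × 4 staging table this gives 24 neighboring pairs (12 along each factor), so a single study has 24 differences available for fusing.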

To borrow strength across the studies and also enhance result interpretation, we consider the following adaptive group lasso penalization (Yuan and Lin, 2006; Zou, 2006; Wang and Leng, 2008) in the form of

graphic file with name Equation8.gif (2.6)

where

graphic file with name Equation9.gif (2.7)

Unlike the usual adaptive group lasso, our penalty is placed directly on Inline graphic, instead of Inline graphic. This penalization on differences of adjacent parameters facilitates a “fusing” or clustering of neighboring coefficients for risk stratification. By the well-known group-wise selection property of the group lasso penalty (Yuan and Lin, 2006), all the terms under a given square root are either forced to 0 simultaneously or not. That is, for a given Inline graphic, all Inline graphic, Inline graphic, are either all estimated to be 0 or not. Therefore, our penalization is aimed at “preserving” the stratifying structure across the Inline graphic studies. Otherwise, we allow the coefficients Inline graphic to vary freely. The adaptive part, realized by Inline graphic, leads to much more stable computation and paves the way for rigorous theoretical analysis of our method.
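The grouping behavior described above can be made concrete with a small sketch. It assumes a penalty of the generic adaptive group lasso form, a tuning parameter times the weighted sum over neighboring pairs of the L2 norm of that pair's differences across studies. The exact scaling in (2.6) and (2.7) is not recoverable here, and the function names are hypothetical.

```python
import math

def group_fusion_penalty(theta, weights, lam):
    """Adaptive group lasso penalty on neighbor differences (assumed form).

    theta[k][l]: difference for neighboring pair l in study k
    weights[l]:  adaptive weight of pair l
    lam:         tuning parameter
    """
    total = 0.0
    for l, w in enumerate(weights):
        # One group per neighboring pair, collecting that pair across all studies.
        group_norm = math.sqrt(sum(study[l] ** 2 for study in theta))
        total += w * group_norm
    return lam * total

def shared_structure(theta, tol=1e-8):
    """A pair counts as fused only if its difference vanishes in every study."""
    n_pairs = len(theta[0])
    return [all(abs(study[l]) <= tol for study in theta) for l in range(n_pairs)]
```

Because a group's norm is zero only when every study's difference is zero, shrinking a group to zero fuses the same pair of cells in all studies at once, while the nonzero differences remain free to differ in size.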

Because the staging factors are ordinal, the following ordering constraints are imposed,

graphic file with name Equation10.gif (2.8)

The above constraints (2.8) mean that all the elements of Inline graphic are non-negative, which we denote by Inline graphic.

In summary, we seek the solution of the following problem

graphic file with name Equation11.gif (2.9)

When there is only one study Inline graphic, our formulation essentially reduces to the lasso-tree method (Lin and others, 2013) when the adaptive weights are kept constant.

We use the well-known theory of counting processes (Andersen and Gill, 1982) to establish the oracle properties of the proposed penalized Cox regression approach. To guarantee the local asymptotic quadratic property for the partial likelihood function (2.4), conditions A–D on page 1105 in Andersen and Gill (1982) are assumed in the whole section. We call these conditions the “Andersen–Gill conditions”. Suppose Inline graphic are the true values of Inline graphic and Inline graphic. Let Inline graphic be the indices of adjacent grids that should be grouped together. Then we have the following theorem.

Theorem 1

Assume the Andersen–Gill conditions as in Andersen and Gill (1982) and Condition (2.7). In addition, assume that Inline graphicInline graphic, and Inline graphic as Inline graphic. Then the solution to (2.9), Inline graphic satisfies the following properties as Inline graphic:

  1. (Consistency in grouping) Inline graphic, where Inline graphic;

  2. (Asymptotic normality) Inline graphic, where Inline graphic is the asymptotic covariance that one would get by using a standard Cox model as in Andersen and Gill (1982), and Inline graphic is any full-rank Inline graphic matrix satisfying Inline graphic for all Inline graphic.

The proof of Theorem 1 largely follows Andersen and Gill (1982) and Zou (2006) and is relegated to the Supplementary Materials available at Biostatistics online.

3. Computation and algorithm

The minimization in Problem (2.9) is non-trivial. Unlike the usual adaptive group lasso, which can be solved by a block coordinate descent algorithm (Friedman and others, 2007), our penalty is on Inline graphic, instead of Inline graphic. As a result, each Inline graphic can appear under multiple square roots, which makes it impossible to reparameterize the problem so that block coordinate descent applies. Before we introduce our algorithm, we first revisit the existing algorithm for the Cox model with the lasso penalty proposed in Tibshirani (1997). We have implemented our algorithm in an R package called “Group Lasso assisted Integrated Data Analysis for Risk Stratification” (GLIDARS), which is available at https://github.com/WangTJ/glidars.

3.1. Iteratively reweighted least squares algorithm

In the presence of the well-known lasso penalty, the minimization of the negative log partial likelihood of the Cox model is achieved by a quasi-Newton method, usually referred to as the iteratively reweighted least squares (IRLS) algorithm (Green, 1984; Tibshirani, 1997).

For notational convenience, we introduce a dummy vector Inline graphic for subject Inline graphic such that Inline graphic. Let Inline graphic be a matrix whose Inline graphicth row is Inline graphic. Then the hazard function for subject Inline graphic under the Cox model (2.1) can be rewritten as Inline graphic. The IRLS algorithm starts from an initial guess of the optimizer, Inline graphic, and then calculates the Hessian and gradient of Inline graphic at Inline graphic:

graphic file with name Equation12.gif

Then Inline graphic is updated to Inline graphic which minimizes

graphic file with name Equation13.gif

with respect to Inline graphic under the lasso penalty. This process is then iterated until convergence. It is worth mentioning that the calculation of the second-order derivative Inline graphic in Inline graphic, which has the dimension of the total sample size Inline graphic, is the bottleneck of the computation. Tibshirani (1997) suggested keeping only the diagonal elements of the matrix Inline graphic, because they are much larger than the off-diagonal ones. With this simplification, more iterations are needed before convergence, but each iteration is much faster.

To deal with the penalty term in (2.6), we use a local quadratic approximation algorithm similar to that of Fan and Li (2001). Define Inline graphic. Note that the penalty term (2.6) is actually a summation of the L2 norms of the vectors Inline graphic over all the groups, or Inline graphic

The key idea is to replace the L2 norms in the penalty with quadratic terms that are simple to deal with in optimization. Suppose that in the Inline graphic-th step, Inline graphic from Inline graphic is obtained; then the L2 norm of Inline graphic can be approximated by

graphic file with name Equation14.gif (3.10)

Now the numerator is quadratic in Inline graphic, and can be combined with the quadratic terms in the IRLS. The denominator is replaced with Inline graphic that can be calculated from Inline graphic, thus freed from the unknown when optimizing for Inline graphic.

One needs to pay special attention if the norm of some group of Inline graphic approaches 0, Inline graphic, which is expected for the purpose of grouping through variable selection when the tuning parameter Inline graphic is large enough. In that case, we use a small cutoff Inline graphic to preclude the denominator in (3.10) from being 0 and modify (3.10) as

graphic file with name Equation15.gif

where Inline graphic is the maximum operator.
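The approximation with a cutoff can be sketched as follows. This uses the standard local quadratic approximation of an L2 norm around the previous iterate in the style of Fan and Li (2001); the constants in (3.10) may be arranged differently, and the function names and the default cutoff value are assumptions made for illustration.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def lqa_norm(theta, theta_prev, eps=1e-10):
    """Quadratic surrogate for the L2 norm of theta, built around theta_prev.

    The denominator depends only on the previous iterate, so the surrogate is
    quadratic in theta. The cutoff eps keeps the denominator away from 0 when
    a group has already been shrunk to (numerically) zero.
    """
    denom = max(l2_norm(theta_prev), eps)
    return sum(x * x for x in theta) / (2.0 * denom) + denom / 2.0
```

The surrogate equals the true norm at the previous iterate and lies above it elsewhere, so each quadratic subproblem cannot increase the penalized objective. Once a group's norm falls below the cutoff, the quadratic term carries a very large coefficient, which effectively pins that group at zero in later iterations.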

The cutoff Inline graphic should be chosen much smaller than the convergence tolerance Inline graphic, Inline graphic, to ensure an accurate staging in the result.

The complete algorithm is described in Algorithm 1 below. The updating rule (3.10) has been rewritten in terms of Inline graphic. To do so, we write Inline graphic where Inline graphic Then define the following matrix for Inline graphic, Inline graphic where Inline graphic is the vector of length Inline graphic whose Inline graphic-th component is 1 and other components are 0. The functionality of these matrices is to link Inline graphic with Inline graphic so that

graphic file with name Equation16.gif

With these notations, the approximation (3.10) can be written as a quadratic term of Inline graphic,

graphic file with name Equation17.gif (3.11)

graphic file with name kxab005a1.jpg

Algorithm 1 The GLIDARS algorithm

Due to the constraint Inline graphic, the updating step for Inline graphic is a quadratic programming problem, which can be solved by many standard solvers, such as the R package quadprog. In our setup, the Hessian will have many zero eigenvalues, leading to unstable computation. We deal with this by adding a very small perturbation Inline graphic to the Hessian, where Inline graphic is the precision tolerance. The perturbation also induces a moderate fusion of coefficients that are very close to each other.

3.2. Tuning and adaptive weights

For the adaptive weights, Inline graphicInline graphic should be a Inline graphic-consistent estimator of Inline graphic to ensure the theoretical results from Theorem 1. We propose obtaining such an estimate of Inline graphic by fitting the Cox model without penalization to each data set.

With adaptive weights fixed, we propose to use the Bayesian information criterion (BIC) (Schwarz, 1978) to select the tuning parameter Inline graphic:

graphic file with name Equation18.gif (3.12)

Here Inline graphic is the log-partial likelihood with the penalty tuning parameter Inline graphic, and Inline graphic is the degrees of freedom of the model, taken as the number of unique parameters. The BIC is calculated over a grid of values of Inline graphic, chosen uniformly on the log-scale from 0 to a large value. The value Inline graphic yielding the lowest estimated BIC is selected.
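The selection step can be sketched as below. The code assumes the textbook form of BIC, minus twice the log partial likelihood plus the degrees of freedom times the log of the sample size; the exact expression in (3.12) is not recoverable from this text. The tolerance used to decide when two coefficients count as equal, and all function names, are hypothetical.

```python
import math

def count_unique(coefs, tol=1e-6):
    """Number of distinct coefficient values, treating values within tol as fused."""
    distinct = []
    for c in sorted(coefs):
        if not distinct or c - distinct[-1] > tol:
            distinct.append(c)
    return len(distinct)

def bic(log_lik, coefs, n):
    """Assumed textbook BIC with df taken as the number of unique coefficients."""
    return -2.0 * log_lik + count_unique(coefs) * math.log(n)

def select_lambda(fits, n):
    """Pick the tuning parameter with the lowest BIC.

    fits: list of (lam, log_lik, coefs) tuples, one per grid value,
          produced by whatever fitting routine is in use.
    """
    return min(fits, key=lambda fit: bic(fit[1], fit[2], n))[0]
```

A larger tuning parameter fuses more neighboring cells, which lowers the degrees of freedom but also lowers the likelihood; BIC picks the point where further fusing costs more fit than it saves in complexity.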

4. Simulation Study

We demonstrate the performance of our method by simulation studies under three scenarios listed in Table 1. Across scenarios, we vary the following aspects: sample sizes, distribution of patients over the A × B combinations, relative risks, and baseline functions. All the results are based on 500 simulations. Scenarios 1 and 2 have two data sets and Scenario 3 has four data sets. The underlying staging structures, highlighted in different colored shades, are the same across the data sets. However, the actual coefficients can vary across data sets. We also allow the patient distributions over the A × B combinations to vary across data sets. The survival outcomes were generated from the exponential distribution and the censoring rates were kept around 0.2 for all data sets.
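A data-generating step of this kind might look like the sketch below. The censoring mechanism is an assumption, since the text only states the target censoring rate: here censoring times are drawn from an independent exponential distribution, so that with event hazard h and censoring hazard c the censoring probability is c / (h + c). The function name and arguments are hypothetical.

```python
import math
import random

def simulate_study(n, log_hazard, cell_probs, censor_hazard, rng):
    """Draw (cell, observed time, event indicator) triples for one study.

    log_hazard[cell]: log relative risk of each A x B cell
    cell_probs[cell]: probability that a patient falls into the cell
    censor_hazard:    rate of the (assumed) exponential censoring time
    """
    cells = list(cell_probs)
    weights = [cell_probs[c] for c in cells]
    data = []
    for _ in range(n):
        cell = rng.choices(cells, weights=weights)[0]
        event_time = rng.expovariate(math.exp(log_hazard[cell]))
        censor_time = rng.expovariate(censor_hazard)
        observed = min(event_time, censor_time)
        data.append((cell, observed, int(event_time <= censor_time)))
    return data
```

For example, with a unit event hazard, a censoring hazard of 0.25 gives an expected censoring rate of 0.25 / 1.25 = 0.2. Setting a cell's probability to zero reproduces the empty cells used in Scenario 2.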

Table 1. Simulation scenarios [table provided as an image in the source; not reproduced]

For comparison, we apply the lasso-tree method from Lin and others (2013) and CART using the R package rpart. As these comparison methods are not designed for integrated analysis, we apply them both to each data set separately and to the pooled data.

To evaluate and compare results from different methods, we define staging or clustering “accuracy” as the proportion of correctly identified zero differences Inline graphic between neighboring coefficients for Inline graphic. The accuracy takes values between 0 and 1, with larger values indicating better results. Because GLIDARS and the lasso-tree method also estimate parameters, we additionally present results for the sum of squared errors (SSE) of Inline graphic in the Supplementary Materials available at Biostatistics online.
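One way to compute such an accuracy is sketched below. It reads the definition as the share of neighboring pairs whose fused or not-fused status is recovered correctly, which is an interpretation rather than a confirmed formula; the tolerance and the function name are hypothetical.

```python
def staging_accuracy(true_diffs, est_diffs, tol=1e-6):
    """Share of neighboring pairs whose zero / nonzero status matches the truth.

    true_diffs, est_diffs: differences between neighboring coefficients,
    listed in the same order of neighboring pairs.
    """
    matches = sum(
        (abs(t) <= tol) == (abs(e) <= tol)
        for t, e in zip(true_diffs, est_diffs)
    )
    return matches / len(true_diffs)
```

For instance, `staging_accuracy([0, 0, 1, 2], [0, 0.5, 1.1, 0])` counts the first and third pairs as correct and the other two as wrong.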

In Scenario 1, the sample sizes of the two data sets are Inline graphic and Inline graphic. All cells in the A × B table have some observations. In Scenario 2, we allow some cells of the A × B table to be empty, similar to our real data. In particular, the first data set has a sample size of Inline graphic with no missing cells. The second data set has a larger sample size, Inline graphic, but with no observations for B3 and for the cell A3B2. Scenario 3 has four data sets with the sample sizes Inline graphic, Inline graphic, Inline graphic, and Inline graphic. We chose relatively larger sample sizes for the data sets with smaller signals or coefficients so that the separate analyses could be conducted stably.

The results from different methods are in Figure 1. Besides providing boxplots of stratification “accuracy” across 500 simulations, we also plot the percentage of simulations in which each method yielded the highest accuracy. We see that GLIDARS clearly outperforms the other methods in terms of accuracy. The separate analysis suffers from the small sample sizes. GLIDARS successfully makes use of the compensating distributions of the data sets to overcome this problem. Due to the heterogeneity in both the values of coefficients and the patient distributions in the A × B table, naive pooling also leads to inferior results. In terms of SSE, GLIDARS also outperforms the lasso-tree, according to the results presented in the Supplementary Materials available at Biostatistics online.

Fig. 1. Staging accuracy results are shown in box plots (left), and the proportions of being the best model in each simulation are shown in the bar plots (right).

We further conducted simulation studies when the underlying true staging may differ across different data sets. The simulation scenarios and results are presented in the Supplementary Materials available at Biostatistics online. By an inspection of our theoretical development, the final integrated staging in this case should asymptotically be the intersection of all the individual data stagings because of the way we penalize the differences of adjacent coefficients. All nonzero differences will remain nonzero in the integrated results given large sample sizes. Consequently, the integrated staging will be finer than all the individual data stagings. We take comfort in this result as this means more refined care for patients. In addition, the estimated coefficients from different stages can provide further information about the necessity of the integrated staging. That is, if the coefficients from two adjacent stages are very close to each other, we may combine the two stages into one stage.
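The “intersection” of individual stagings can be pictured as their common refinement: two cells share an integrated stage only if they share a stage in every data set. The sketch below computes that refinement for stagings given as simple cell-to-label mappings. It ignores the adjacency structure of the grid and is only an illustration with hypothetical names, not part of the method's implementation.

```python
def common_refinement(stagings):
    """Coarsest staging that is at least as fine as every individual staging.

    stagings: list of dicts mapping cell -> stage label, one dict per data set,
              all defined on the same cells.
    Returns a dict mapping cell -> integrated stage number (1, 2, ...).
    """
    signature_to_stage = {}
    integrated = {}
    for cell in stagings[0]:
        # Cells are merged only when their labels agree in every data set.
        signature = tuple(staging[cell] for staging in stagings)
        if signature not in signature_to_stage:
            signature_to_stage[signature] = len(signature_to_stage) + 1
        integrated[cell] = signature_to_stage[signature]
    return integrated
```

In the small example of two data sets that group cells {a, b}, {c} and {a}, {b, c} respectively, the refinement separates all three cells, which is why the integrated staging is finer than each individual one.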

5. Application to Breast Cancer Staging

We perform integrated analysis using our method and the comparators on three different studies: CALGB 49907 and CALGB 9741 conducted by the Cancer and Leukemia Group B (CALGB) and N9831 by the North Central Cancer Treatment Group (NCCTG). CALGB 49907 included 298 early-stage breast cancer patients who were 65 years or older (Partridge and others, 2010). CALGB 9741 included 798 patients with positive lymph nodes (Citron and others, 2002). N9831 has the largest sample size of 2256 (Perez and others, 2006). As Table 2 shows, the patient distributions differ considerably across the studies. Both CALGB 9741 and N9831 have no observations in some of the A × B combinations. Follow-up times and censoring rates are also shown in the table.

Table 2. Summary statistics and patient distributions for the three studies

(a) Overall
Study                          49907    9741    N9831
Sample size                    298      798     2256
Mean follow-up time (years)    6.93     6.43    8.96
Censoring rate                 0.68     0.69    0.84

Study 49907
Stage    ER/PR+, HER2+    ER/PR+, HER2−    ER/PR−, HER2+    ER/PR−, HER2−
I        1                9                4                11
IIa      8                86               8                36
IIb      5                55               7                19
III      1                36               3                10

Study 9741
Stage    ER/PR+, HER2+    ER/PR+, HER2−    ER/PR−, HER2+    ER/PR−, HER2−
I        0                0                0                0
IIa      0                136              0                41
IIb      0                179              0                100
III      0                221              0                94

Study N9831
Stage    ER/PR+, HER2+    ER/PR+, HER2−    ER/PR−, HER2+    ER/PR−, HER2−
I        98               14               181              17
IIa      592              134              451              36
IIb      0                0                0                0
III      347              66               303              13

As in our simulation study, we compare GLIDARS with the lasso-tree and CART methods. Because our goal for the real data is to construct a unified staging system for all patients, we applied the comparison methods only to the pooled data. For both GLIDARS and the lasso-tree, BIC is used to determine the optimal value of the tuning parameter from a uniform grid on the log-scale from 0 to 15.

Table 3 shows results from the GLIDARS method, where the stratification results are shown in Table 3a and the corresponding coefficient estimates are in Table 3b. Figure 2 shows the estimated Kaplan–Meier survival curves based on GLIDARS's stratification into five stages. The stratification led to clear separation, except between Stage 3 and Stage 4 patients for CALGB 49907. However, these two groups of patients were well separated in the other two studies.

Table 3. The stratifying schemes and values of the coefficients on extending the AJCC 7th edition staging with biomarkers for three selected studies [table provided as an image in the source; not reproduced]

Fig. 2. The survival curves from the GLIDARS stratification.

Results from the comparator methods are in Table 4 and Figure 3. The lasso-tree method based on pooled data also had five stages; however, the separation of their Kaplan–Meier survival curves is not as distinct. For CART, the default pruning option led to three stages, and there was clear separation of their Kaplan–Meier survival curves. However, we think this stratification may not be scientifically valid, as it is well known that triple negative breast cancer (ER/PR− and HER2−) is more aggressive than the other subtypes, yet this fact is not reflected in the CART result for patients with anatomical stages I and IIa. We then specified more conservative options for CART that led to four-stage and five-stage stratifications. The same issue persisted for these CART results. Meanwhile, the separation of their Kaplan–Meier survival curves became less distinct.

Table 4. The stratifying schemes of comparator methods [table provided as an image in the source; not reproduced]

Fig. 3. The stratified survival curves from the comparator methods.

6. Discussion and conclusion

In this article, we developed an integrated analysis method for risk stratification using data sets from different sources. In particular, we made the minimal assumption that all data sets share the same structure of risk stratification, but otherwise let all the parameters vary freely, including the baseline hazard function. We enforced this assumption by using a special adaptive group lasso penalty on the differences of adjacent parameters. We developed our method based on the well-known Cox model for time-to-event outcomes. Our idea can be directly applied to other types of outcomes. We studied asymptotic properties of our approach. When simplified to a single study setting (i.e. Inline graphic), our theoretical study also filled an important gap left by the original lasso-tree paper (Lin and others, 2013).

Even though we only considered two staging factors in our article, our algorithm can be applied to more than two factors. The increase in computational burden is manageable to a certain degree, as the workhorse of our algorithm is a quadratic programming step that can be solved efficiently.

We motivated our method on risk stratification for breast cancer patients. Our method is directly applicable to other settings as well. For example, for patients with many chronic conditions, risk stratification is also encountering challenges due to the need to incorporate many risk factors. These risk factors, similar to our breast cancer example, are typically ordered. Considering a risk stratification system based on these factors can lead to personalized care but overly refined classification can be challenging for efficient and equitable delivery of health care.

7. Software

The code for this paper is publicly available at https://github.com/WangTJ/glidars.


Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable input on improving the original version of this article.

Conflict of Interest: None declared.

Contributor Information

Tianjie Wang, Department of Statistics, University of Wisconsin, Madison, WI, USA.

Rui Chen, Department of Statistics, University of Wisconsin, Madison, WI, USA.

Wenshuo Liu, Department of Research & Innovation, Interactions LLC, 31 Hayward Street Suite E, Franklin, MA 02038, USA.

Menggang Yu, Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA.

Supplementary material

Supplementary material is available at http://biostatistics.oxfordjournals.org.

Funding

The authors’ research was partially supported by the Specialized Program of Research Excellence (SPORE) program, through the National Institute for Dental and Craniofacial Research (NIDCR), and National Cancer Institute (NCI), grant P50DE026787. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health (NIH).

References

  1. Amin, M. B., Edge, S. B., Greene, F. L., Byrd, D. R., Brookland, R. K., Washington, M. K., Gershenwald, J. E., Compton, C. C., Hess, K. R., Sullivan, D. C. and others. (2017). AJCC Cancer Staging Manual. New York: Springer.
  2. Andersen, P. K. and Gill, R. D. (1982). Cox’s regression model for counting processes: a large sample study. The Annals of Statistics 10, 1100–1120.
  3. Breiman, L. (1984). Classification and Regression Trees. New York: Routledge.
  4. Citron, M., Berry, D., Cirrincione, C., Carpenter, J., Hudis, C., Gradishar, W., Davidson, N., Ingle, J., Martino, S., Livingston, R. and others. (2002). Superiority of dose-dense over conventional scheduling and equivalence of sequential (sc) vs. combination adjuvant chemotherapy for node-positive breast cancer (CALGB 9741, Int C9741). Breast Cancer Research and Treatment 76, S32.
  5. Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96, 1348–1360.
  6. Friedman, J., Hastie, T., Hofling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. The Annals of Applied Statistics 1, 302–332.
  7. Green, P. J. (1984). Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives. Journal of the Royal Statistical Society. Series B (Methodological) 46, 149–192.
  8. Lin, Y., Wang, S. and Chappell, R. J. (2013). Lasso tree for cancer staging with survival data. Biostatistics 14, 327–339.
  9. Loh, W.-Y. (2014). Fifty years of classification and regression trees. International Statistical Review 82, 329–348.
  10. National Cancer Institute. (2004). National Cancer Institute Fact Sheet: Cancer Staging. https://www.cancer.gov/about-cancer/diagnosis-staging/staging.
  11. Onitilo, A. A., Engel, J. M., Greenlee, R. T. and Mukesh, B. N. (2009). Breast cancer subtypes based on ER/PR and Her2 expression: comparison of clinicopathologic features and survival. Clinical Medicine and Research 7, 4–13.
  12. Partridge, A. H., Archer, L., Kornblith, A. B., Gralow, J., Grenier, D., Perez, E., Wolff, A. C., Wang, X., Kastrissios, H., Berry, D. and others. (2010). Adherence and persistence with oral adjuvant chemotherapy in older women with early-stage breast cancer in CALGB 49907: adherence companion study 60104. Journal of Clinical Oncology 28, 2418.
  13. Perez, E. A., Suman, V. J., Davidson, N. E., Martino, S., Kaufman, P. A., Lingle, W. L., Flynn, P. J., Ingle, J. N., Visscher, D. and Jenkins, R. B. (2006). Her2 testing by local, central, and reference laboratories in specimens from the North Central Cancer Treatment Group N9831 intergroup adjuvant trial. Journal of Clinical Oncology 24, 3032–3038.
  14. Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6, 461–464.
  15. Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine 16, 385–395.
  16. Wang, H. and Leng, C. (2008). A note on adaptive group lasso. Computational Statistics and Data Analysis 52, 5277–5286.
  17. Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68, 49–67.
  18. Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101, 1418–1429.

