Summary
To provide an appropriate and practical level of health care, it is critical to group patients into relatively few strata that have distinct prognoses. Such grouping or stratification is typically based on well-established risk factors and clinical outcomes. A well-known example is the American Joint Committee on Cancer staging for cancer, which uses tumor size, node involvement, and metastasis status. We consider a statistical method for such grouping based on individual patient data from multiple studies. The method encourages a common grouping structure as a basis for borrowing information, but acknowledges data heterogeneity, including unbalanced data structures across multiple studies. We build on the "lasso-tree" method, which is more versatile than the well-known classification and regression tree method in generating possible grouping patterns. In addition, the parametrization of the lasso-tree method makes it very natural to incorporate the underlying order information in the risk factors. In this article, we also strengthen the lasso-tree method by establishing its theoretical properties, which Lin and others (2013. Lasso tree for cancer staging with survival data. Biostatistics 14, 327–339) did not pursue. We evaluate our method in extensive simulation studies and an analysis of multiple breast cancer data sets.
Keywords: Cancer staging, Data heterogeneity, Individual patient data, Integrated analysis, Survival analysis
1. Introduction
Risk stratifying patients using established risk factors is a perpetual health care research theme. This topic was brought to the center stage of cancer research when the American Joint Committee on Cancer (AJCC) initiated its 8th edition on cancer staging in 2018. All previous editions have focused on anatomical factors alone, but the new edition tries to incorporate validated biological factors to provide more precise stratification.
Take breast cancer as an example: the new edition adds estrogen receptor (ER) and progesterone receptor (PR) status, human epidermal growth factor receptor 2 (HER2) status, histologic grade, and oncogene expression to the previous 7th edition, which uses only the anatomical assessment of tumor size, regional lymph node involvement, and distant metastasis (Amin and others, 2017; National Cancer Institute, 2004). This more "personalized" approach to patient classification is an inevitable step in this era of precision molecular oncology. However, the availability of many factors, or of many levels of the established factors, leads to an exceedingly refined categorization of patients.
Compared with the 7th-edition staging table, which can be depicted on a single page, the 8th-edition staging algorithm spans over three pages and is rather complex and hard to follow. A large part of the algorithm is based on expert opinions that, while sensible, still need to be validated through continued accumulation of data. This naturally raises a practical question for health care management: is the categorization or refinement clinically significant, and if not, how can we combine similar categories into relatively few strata with clinically distinct prognoses so that the staging system is relatively simple to follow? This can be crucial for effective and simple care delivery and disease management given limited resources.
In our motivating example with three breast cancer data sets, the anatomical assessment factor, ER or PR status, and HER2 status were collected. However, histologic grade and oncogene expression information were not. Therefore, we cannot verify the new AJCC 8th edition cancer staging. However, it is still methodologically and scientifically interesting to investigate how the biological factors of ER or PR status and HER2 status enhance the anatomical staging. In our data sets, the anatomical feature based on the 7th edition of AJCC has four stages: I, IIa, IIb, and III. The biomarker feature, based on the ER/PR and HER2 statuses, has four well-known subtypes (Onitilo and others, 2009): ER/PR+ and HER2+, ER/PR+ and HER2−, ER/PR− and HER2+, and ER/PR− and HER2−. The combination of the two factors, anatomical (A) and biomarker (B), leads to 16 categories as in a typical 4×4 A×B table. The goal is to simplify the categories into fewer risk strata based on patients' clinical outcomes using data-driven algorithms.
Statistically, this task is a supervised clustering or stratification problem where the outcome is usually time to an event (such as recurrence or death) that is subject to possible censoring due to loss to follow-up. An intuitive solution is to use the well-known classification and regression tree (CART) method through recursive partitioning (Breiman, 1984; Loh, 2014). However, as argued by Lin and others (2013), CART does not capture all types of A and B combinations. At each split, CART must have a complete separation with respect to the splitting variable. In the case of the A×B table, this means that CART will split fully along all columns or rows conditional on the existing splits. In other words, the grouping that results from CART must have partitions in straight lines. Yet the true staging system might have a different configuration (Amin and others, 2017). Hence, Lin and others (2013) proposed using a Cox proportional hazards model with penalization on the neighboring coefficients for cancer staging. Specifically, they utilized the sparsity-enforcing property of the lasso penalty (Tibshirani, 1997) to merge neighboring coefficients that are close and, therefore, form clusters. Their method, known as "lasso-tree", has the advantage of generating more general partitioning patterns.
To enable CART to generate arbitrary patterns of the A×B table, one could create an interaction term between A and B and then apply CART directly to this interaction term. In our setting, both the anatomical and biomarker features are ordered in the sense that their corresponding cancer prognoses satisfy the following: I ≺ IIa ≺ IIb ≺ III and (ER/PR+, HER2+) ≺ (ER/PR+, HER2−) ≺ (ER/PR−, HER2+) ≺ (ER/PR−, HER2−), where "≺" means better prognosis. When A and B are ordered, the levels of the interaction term become a partially ordered set, as off-diagonal levels are not comparable. For example, A1B1 has better prognosis than A1B2, A2B1, A2B2, etc., but A1B2 and A2B1 are not directly comparable. Such partial ordering information of the interaction term, which is derived from the ordered properties of A and B, is clearly not utilized in the usual CART algorithm. On the other hand, the lasso-tree method directly incorporates the ordering into its algorithm.
This article extends the lasso-tree method to perform integrated analysis that uses individual participant data (IPD) from multiple sources. Such an integrated analysis can synthesize more information to borrow strength across different studies. On the other hand, the analysis needs to fully acknowledge data heterogeneity. In our motivating study, the data sets have differences in sample sizes, lengths of follow-up, study periods, and population composition and characteristics. These underlying differences are likely associated with challenges, such as varying magnitudes of signals, missing data, and population variations across different studies.
Therefore, we extend Lin and others (2013) in a non-trivial way to emphasize a common structure of risk stratification across different data sets. This basis for borrowing information is robust, as it otherwise makes no assumption or restriction on the magnitudes of the parameters. The common-structure assumption is enforced through a special adaptive group lasso penalty. In addition, we strengthen the method of Lin and others (2013) by establishing its theoretical properties. We evaluate our method in extensive simulation studies and in analyzing our motivating breast cancer data sets.
2. Penalized Cox regression
Suppose the data are collected from $K$ studies, where Study $k$ has $n_k$ observations for $k = 1, \ldots, K$. The total sample size is $n = \sum_{k=1}^{K} n_k$, and for subject $i$ we observe $(T_i, \delta_i)$, with $\delta_i$ the event indicator describing whether the observed time $T_i$ is an event time ($\delta_i = 1$) or a censoring time ($\delta_i = 0$). Let $k_i \in \{1, \ldots, K\}$ be the study indicator of subject $i$. In addition, we also observe covariates, in particular including the staging factors $A_i$ and $B_i$, which are ordinal variables. Because we adopt a regression framework, adjustment for other non-staging covariates, such as age and family history, can easily be incorporated too. Our method works for more than two factors. However, for simpler presentation and better result visualization, we focus on the two staging factors and, without loss of generality, suppose $A_i \in \{1, \ldots, I\}$ and $B_i \in \{1, \ldots, J\}$, with larger values corresponding to worse prognoses.
We use the Cox proportional hazards model as our modeling framework. The hazard function for Study $k$ is assumed to be

$$\lambda_k(t \mid A_i = a, B_i = b) = \lambda_{0k}(t)\,\exp(\theta_{k,ab}), \tag{2.1}$$

where the matrix $\theta_k = (\theta_{k,ab})_{I \times J}$ corresponds to the two staging features $A$ and $B$. The baseline hazard function $\lambda_{0k}(t)$ is unspecified as in the traditional Cox model. Note that we allow both the baseline functions $\lambda_{0k}$ and the coefficients $\theta_k$ to be study-specific to account for possible heterogeneity among the studies. To make $\theta_k$ identifiable, we restrict $\theta_{k,11} = 0$.
We further denote

$$\beta_k = \operatorname{vec}(\theta_k), \tag{2.2}$$

where $\operatorname{vec}(\cdot)$ is the vectorization transformation which stacks matrix columns into a vector.
The logarithm of the partial likelihood corresponding to Study $k$ is

$$\ell_k(\beta_k) = \sum_{i \in D_k} \left[ \theta_{k, A_i B_i} - \log \sum_{j \in R_k(T_i)} \exp\!\left(\theta_{k, A_j B_j}\right) \right], \tag{2.3}$$

where $D_k$ is the set of all failure events and $R_k(T_i)$ is the risk set of subject $i$ in Study $k$. The overall log-likelihood is

$$\ell(\beta) = \sum_{k=1}^{K} \ell_k(\beta_k), \tag{2.4}$$

where $\beta = (\beta_1^\top, \ldots, \beta_K^\top)^\top$.
To construct a staging system, one can first maximize (2.4) with respect to $\beta$, or maximize $\ell_k(\beta_k)$ with respect to $\beta_k$ for $k = 1, \ldots, K$, and then inspect the resulting estimates for possible clustering of adjacent cells. However, such an endeavor may be very ineffective due to the large number of free parameters, that is, $K(IJ - 1)$, in $\beta$. In addition, a small Study $k$ may cause unstable estimation of $\beta_k$, leading to difficulty in synthesizing findings across studies. Therefore, it is critical to streamline the analysis for the construction of an interpretable and unified staging system.
As patients with similar values of the risk factors are more likely to have similar disease prognoses, the adjacent cells in the grid generated by the ordinal risk factors are more likely to cluster in the same stage. We therefore work with the vector of differences between the neighboring coefficients in Study $k$. To simplify notation, we let $E$ be the set of neighboring pairs and define $\eta_k$ as follows:

$$\eta_k = D \beta_k. \tag{2.5}$$

Here the matrix $D$ can be written as

$$D = \begin{pmatrix} I_J \otimes D_I \\ D_J \otimes I_I \end{pmatrix},$$

where $I_I$ and $I_J$ are identity matrices of dimensions $I$ and $J$, $\otimes$ is the Kronecker product, and $D_m$ is the $(m-1) \times m$ first-difference matrix

$$D_m = \begin{pmatrix} -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{pmatrix}.$$
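To make the difference operator concrete, the following minimal Python sketch (an illustrative translation by us; the authors' implementation is in R, and the function names here are ours) builds $D$ for an $I \times J$ grid and applies it to a vectorized coefficient matrix:

```python
import numpy as np

def first_diff(m):
    # (m-1) x m first-difference matrix D_m: row r has -1 at r and +1 at r+1
    return np.eye(m, k=1)[: m - 1] - np.eye(m)[: m - 1]

def diff_matrix(I, J):
    # D = [ I_J kron D_I ; D_J kron I_I ], acting on beta = vec(theta)
    return np.vstack([
        np.kron(np.eye(J), first_diff(I)),   # neighbors within columns (A direction)
        np.kron(first_diff(J), np.eye(I)),   # neighbors across columns (B direction)
    ])

I, J = 4, 4
# toy coefficient grid: theta[a, b] = b*I + a, so row steps differ by 1, column steps by 4
theta = np.array([[b * I + a for b in range(J)] for a in range(I)], dtype=float)
beta = theta.flatten(order="F")              # vec: stack columns into a vector
eta = diff_matrix(I, J) @ beta               # all neighboring differences
```

For this toy grid, the first $J(I-1)$ entries of `eta` (within-column differences) are all 1 and the remaining $I(J-1)$ entries (across-column differences) are all 4, matching the construction of $\theta$.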
To borrow strength across the studies and also enhance result interpretation, we consider the following adaptive group lasso penalization (Yuan and Lin, 2006; Zou, 2006; Wang and Leng, 2008) in the form of

$$P_\lambda(\beta) = \lambda \sum_{(s,t) \in E} w_{st} \sqrt{\sum_{k=1}^{K} \eta_{k,st}^2}, \tag{2.6}$$

where the adaptive weights are

$$w_{st} = \left( \sum_{k=1}^{K} \tilde{\eta}_{k,st}^2 \right)^{-1/2}, \tag{2.7}$$

with $\tilde{\eta}_{k,st} = d_{st}^\top \tilde{\beta}_k$ computed from initial estimators $\tilde{\beta}_k$ and $d_{st}^\top$ the row of $D$ corresponding to the neighboring pair $(s,t)$. Note that, different from the usual adaptive group lasso, our penalty is directly on $\eta_k = D\beta_k$ instead of $\beta_k$. This penalization on differences of adjacent parameters facilitates a "fusing" or clustering of neighboring coefficients for risk stratification. By the well-known group-wise selection property of the group lasso penalization (Yuan and Lin, 2006), all the terms under the square root are enforced to be 0 simultaneously or not at all. That is, for a given pair $(s,t) \in E$, the differences $\eta_{k,st}$, $k = 1, \ldots, K$, are either all estimated to be 0 or all left nonzero. Therefore, our penalization preserves the stratifying structure across the $K$ studies; otherwise, we allow the coefficients $\beta_k$ to vary freely. The adaptive part, realized by $w_{st}$, leads to much more stable computation and paves the way for a rigorous theoretical analysis of our method.
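To see the group-wise mechanism numerically, here is a small Python sketch (our own illustration, not the GLIDARS source) that evaluates a penalty of the form (2.6) given the per-study difference vectors; `lam` and `w` stand for the tuning parameter and adaptive weights:

```python
import numpy as np

def group_lasso_penalty(etas, lam, w):
    """etas: (P, K) array with one row per neighboring pair (s, t)
    and one column per study; w: length-P adaptive weights.
    Returns lam * sum over pairs of w_st * ||eta_st||_2, cf. (2.6)."""
    return lam * float(np.sum(w * np.linalg.norm(etas, axis=1)))
```

Because each row enters only through its Euclidean norm, a pair's differences are driven to zero in all $K$ studies at once, which is exactly the "preserved stratifying structure" described above.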
Because the staging factors are ordinal, the following ordering constraints are imposed:

$$D \beta_k \geq 0, \qquad k = 1, \ldots, K. \tag{2.8}$$

The above constraints (2.8) mean that all the elements of $\eta_k$ are non-negative, which we denote by $\eta_k \geq 0$.

In summary, we seek the solution of the following problem:

$$\hat{\beta} = \arg\min_{\beta:\, \eta_k \geq 0,\, k = 1, \ldots, K} \left\{ -\ell(\beta) + P_\lambda(\beta) \right\}. \tag{2.9}$$

When there is only one study ($K = 1$) and the adaptive weights are kept constant, our formulation reduces essentially to the lasso-tree method (Lin and others, 2013).
We use the well-known theory of counting processes (Andersen and Gill, 1982) to establish the oracle properties of the proposed penalized Cox regression approach. To guarantee the local asymptotic quadratic property of the partial likelihood function (2.4), Conditions A–D on page 1105 of Andersen and Gill (1982) are assumed throughout this section. We call these conditions the "Andersen–Gill conditions". Suppose $\beta_k^0$, $k = 1, \ldots, K$, are the true values of $\beta_k$, and write $\eta_k^0 = D \beta_k^0$ and $\beta^0 = (\beta_1^{0\top}, \ldots, \beta_K^{0\top})^\top$. Let $E_0 = \{(s,t) \in E : \eta_{k,st}^0 = 0 \text{ for all } k\}$ be the indices of adjacent grids that should be grouped together. Then we have the following theorem.

Theorem 1

Assume the Andersen–Gill conditions as in Andersen and Gill (1982) and Condition (2.7). In addition, assume that $\lambda_n / \sqrt{n} \to 0$ and $\lambda_n \to \infty$ as $n \to \infty$. Then the solution to (2.9), $\hat{\beta}$, satisfies the following properties as $n \to \infty$:

(Consistency in grouping) $P(\hat{E}_0 = E_0) \to 1$, where $\hat{E}_0 = \{(s,t) \in E : \hat{\eta}_{k,st} = 0 \text{ for all } k\}$;

(Asymptotic normality) $\sqrt{n}\, C (\hat{\beta} - \beta^0) \to_d N(0, C \Sigma C^\top)$, where $\Sigma$ is the asymptotic covariance that one would obtain by using a standard Cox model as in Andersen and Gill (1982), and $C$ is any full-rank matrix satisfying $C (e_k \otimes d_{st}) = 0$ for all $(s,t) \in E_0$ and $k = 1, \ldots, K$.
The proof of Theorem 1 largely follows Andersen and Gill (1982) and Zou (2006) and is relegated to the Supplementary Materials available at Biostatistics online.
3. Computation and algorithm
The minimization in Problem (2.9) is non-trivial. Different from the usual adaptive group lasso, which can be solved by applying a block coordinate descent algorithm (Friedman and others, 2007), our penalty is on $\eta_k = D\beta_k$ instead of $\beta_k$. As a result, each element of $\beta_k$ can appear under multiple square roots, which makes it impossible to reparameterize so as to apply the block coordinate descent algorithm. Before we introduce our algorithm, we first revisit the existing algorithm for the Cox model with the lasso penalty proposed in Tibshirani (1997). We have developed an R package called "Group Lasso assisted Integrated Data Analysis for Risk Stratification" (GLIDARS), which is available at https://github.com/WangTJ/glidars.
3.1. Iteratively reweighted least squares algorithm
In the presence of the well-known lasso penalty, the minimization of the negative log partial likelihood of the Cox model is achieved by a quasi-Newton method, usually referred to as the iteratively reweighted least squares (IRLS) algorithm (Green, 1984; Tibshirani, 1997).

For notational convenience, we introduce a dummy vector $x_i$ for subject $i$ such that $x_i^\top \beta = \theta_{k_i, A_i B_i}$. Let $X$ be the matrix whose $i$th row is $x_i^\top$. Then the hazard function for subject $i$ under the Cox model (2.1) can be rewritten as $\lambda_{0 k_i}(t)\exp(x_i^\top \beta)$. The IRLS algorithm starts from an initial guess of the optimizer, $\beta^{(0)}$, and then calculates the gradient and Hessian of $-\ell$ with respect to the linear predictor $f = X\beta$ at $\beta^{(0)}$:

$$u = \left.\frac{\partial (-\ell)}{\partial f}\right|_{f = X\beta^{(0)}}, \qquad H = \left.\frac{\partial^2 (-\ell)}{\partial f\, \partial f^\top}\right|_{f = X\beta^{(0)}}.$$

Then $\beta^{(0)}$ is updated to $\beta^{(1)}$, which minimizes the quadratic approximation

$$(z - X\beta)^\top H (z - X\beta), \qquad z = X\beta^{(0)} - H^{-1} u,$$

with respect to $\beta$ under the lasso penalty. This process is then iterated until convergence. It is worth mentioning that the calculation of the second-order derivative matrix $H$, whose dimension is the total sample size $n$, is the bottleneck of the computation. Tibshirani (1997) suggested keeping only the diagonal elements of the matrix $H$, because they are much larger than the off-diagonal ones. With such a method, more iterations are needed before convergence, but each iteration is much faster.
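For intuition, the gradient and the diagonal of the Hessian of the negative log partial likelihood with respect to the linear predictor can be computed directly from the risk sets. The following Python sketch (our own illustration, assuming no tied event times; not part of the GLIDARS package) does this for a single data set:

```python
import numpy as np

def cox_irls_weights(time, event, f):
    """Gradient u and diagonal Hessian h of the negative log partial
    likelihood with respect to the linear predictor f (no tied times)."""
    n = len(time)
    u = -np.asarray(event, dtype=float)        # each subject's own event term
    h = np.zeros(n)
    ef = np.exp(f)
    for r in range(n):
        if event[r]:                           # each event contributes via its risk set
            risk = time >= time[r]
            pi = np.where(risk, ef, 0.0) / ef[risk].sum()
            u += pi                            # gradient: -delta_i + sum of risk-set shares
            h += pi * (1.0 - pi)               # diagonal curvature terms
    return u, h
```

A quick sanity check of the sketch: the gradient entries always sum to zero, since each event's risk-set shares sum to one and cancel that event's own term.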
To deal with the penalty term in (2.6), we use a local quadratic approximation similar to Fan and Li (2001). Define $\eta_{st} = (\eta_{1,st}, \ldots, \eta_{K,st})^\top$. Note that the penalty term (2.6) is actually a summation of the $\ell_2$ norms of the vectors $\eta_{st}$ over all the groups, or

$$P_\lambda(\beta) = \lambda \sum_{(s,t) \in E} w_{st} \|\eta_{st}\|_2.$$

The key idea is to replace the $\ell_2$ norms in the penalty with quadratic terms that are simple to deal with in optimization. Suppose in the $m$th step $\eta_{st}^{(m)}$ is obtained from $\beta^{(m)}$; then the $\ell_2$ norm of $\eta_{st}$ can be approximated by

$$\|\eta_{st}\|_2 \approx \frac{\|\eta_{st}\|_2^2}{\|\eta_{st}^{(m)}\|_2}. \tag{3.10}$$

Now the numerator is quadratic in $\beta$ and can be combined with the quadratic terms in the IRLS. The denominator is replaced with $\|\eta_{st}^{(m)}\|_2$, which can be calculated from $\beta^{(m)}$ and is thus free of the unknown when optimizing for $\beta^{(m+1)}$.
One needs to pay special attention when the norm of some group approaches 0, $\|\eta_{st}^{(m)}\|_2 \to 0$, which is expected for the purpose of grouping through variable selection when the tuning parameter $\lambda$ is large enough. In that case, we use a small cutoff $\epsilon_0$ to preclude the denominator in (3.10) from being 0 and modify (3.10) as

$$\|\eta_{st}\|_2 \approx \frac{\|\eta_{st}\|_2^2}{\max\!\left( \|\eta_{st}^{(m)}\|_2,\; \epsilon_0 \right)},$$

where $\max$ is the maximum operator. The cutoff $\epsilon_0$ should be chosen much smaller than the convergence tolerance $\epsilon$, that is, $\epsilon_0 \ll \epsilon$, to ensure an accurate staging in the result.
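Note that the approximation is exact at the expansion point, which is what makes iterating it stable. A minimal Python sketch of the modified rule (our own illustration; the names are ours):

```python
import numpy as np

def surrogate_norm(eta, eta_m, eps0=1e-10):
    # Local quadratic approximation of ||eta||_2 around the previous
    # iterate eta_m, with cutoff eps0 guarding a vanishing denominator.
    return float(eta @ eta) / max(float(np.linalg.norm(eta_m)), eps0)
```

At `eta = eta_m = (3, 4)` this returns 25/5 = 5, the exact norm; once a group has collapsed (`eta_m` near 0), the surrogate's very large curvature keeps it at zero, producing the fusing behavior described above.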
The complete algorithm is described in Algorithm 1 below. The updating rule (3.10) has been rewritten in terms of $\beta$. To do so, we write $\eta_{k,st} = d_{st}^\top \beta_k$, where $d_{st}^\top$ is the row of $D$ corresponding to the neighboring pair $(s,t)$. Then define the following matrix for each $(s,t) \in E$:

$$G_{st} = \sum_{k=1}^{K} e_k \left( e_k^\top \otimes d_{st}^\top \right) = I_K \otimes d_{st}^\top,$$

where $e_k$ is the vector of length $K$ whose $k$th component is 1 and other components are 0. The functionality of these matrices is to link $\eta_{st}$ with $\beta$ so that

$$\eta_{st} = G_{st} \beta.$$

With these notations, the approximation (3.10) can be written as a quadratic term of $\beta$:

$$\|\eta_{st}\|_2 \approx \frac{\beta^\top G_{st}^\top G_{st}\, \beta}{\max\!\left( \|G_{st} \beta^{(m)}\|_2,\; \epsilon_0 \right)}. \tag{3.11}$$
Algorithm 1 The GLIDARS algorithm
Due to the constraint $\eta_k \geq 0$, the updating step for $\beta$ is a quadratic programming problem which can be solved by many standard solvers, such as the R package quadprog. In our setup, the Hessian has many zero eigenvalues, leading to unstable computation. We deal with this by adding a very small disturbance $\epsilon I$ to the Hessian, where $\epsilon$ is the precision tolerance. This also generates a moderate fusion of very close coefficients.
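As an illustration of this constrained update (the GLIDARS package itself uses quadprog in R), the following Python sketch solves a generic quadratic program with non-negativity constraints on $D\beta$ via SciPy, with the ridge jitter added as described above; all names are ours:

```python
import numpy as np
from scipy.optimize import minimize

def solve_constrained_qp(H, g, D, eps=1e-8):
    """Minimize 0.5 * b'Hb + g'b subject to D b >= 0.
    A small ridge eps*I stabilizes a rank-deficient Hessian."""
    Hs = H + eps * np.eye(H.shape[0])
    res = minimize(
        lambda b: 0.5 * b @ Hs @ b + g @ b,
        np.zeros(H.shape[0]),
        jac=lambda b: Hs @ b + g,
        constraints={"type": "ineq", "fun": lambda b: D @ b,
                     "jac": lambda b: D},
        method="SLSQP",
    )
    return res.x
```

For example, with $H = I_2$, $g = (1, -1)^\top$, and $D = I_2$, the unconstrained minimizer $(-1, 1)$ is infeasible and the solver returns approximately $(0, 1)$, the projection onto the feasible cone.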
3.2. Tuning and adaptive weights
For the adaptive weights, the initial estimators $\tilde{\beta}_k$ should be $\sqrt{n}$-consistent for $\beta_k$ to ensure the theoretical results of Theorem 1. We propose obtaining $\tilde{\beta}_k$ by fitting the Cox model without penalization to each data set.

With the adaptive weights fixed, we propose to use the Bayesian information criterion (BIC) (Schwarz, 1978) to select the tuning parameter $\lambda$:

$$\mathrm{BIC}(\lambda) = -2\, \ell(\hat{\beta}_\lambda) + \log(n)\, \mathrm{df}_\lambda. \tag{3.12}$$

Here $\ell(\hat{\beta}_\lambda)$ is the log partial likelihood at the estimate obtained with penalty tuning parameter $\lambda$, and $\mathrm{df}_\lambda$ is the degrees of freedom of the model, taken as the number of unique parameters. The BIC is calculated over a grid of values of $\lambda$, which we choose uniformly on the log-scale from 0 to a large value. The value of $\lambda$ yielding the lowest BIC is selected.
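This grid search can be sketched as follows (illustrative Python; the `fit` callable is a placeholder for the actual penalized model fit, not part of GLIDARS):

```python
import numpy as np

def select_lambda(fit, n, lam_grid):
    """fit(lam) returns (loglik, df) for the model fitted at lam;
    pick the lam minimizing BIC = -2*loglik + log(n)*df, cf. (3.12)."""
    bics = []
    for lam in lam_grid:
        loglik, df = fit(lam)
        bics.append(-2.0 * loglik + np.log(n) * df)
    return lam_grid[int(np.argmin(bics))]
```

With a stub fit whose log likelihood peaks at $\lambda = 1$ and constant degrees of freedom, the function returns 1 from the grid $\{0, 1, 2\}$, as expected.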
4. Simulation study
We demonstrate the performance of our method in simulation studies under the three scenarios listed in Table 1. Across scenarios, we vary the following aspects: sample sizes, distribution of patients over the A×B combinations, relative risks, and baseline functions. All the results are based on 500 simulations. Scenarios 1 and 2 have two data sets and Scenario 3 has four data sets. The underlying staging structures, highlighted in different colored shades, are the same across the data sets. However, the actual coefficients can vary between data sets. We also allow the patient distributions over the A×B combinations to vary between data sets. The survival outcomes were generated from the exponential distribution and the censoring rates were kept around 0.2 for all data sets.
Table 1.
Simulation scenarios
For comparison, we apply the lasso-tree method from Lin and others (2013) and CART using the R package rpart. As these comparison methods are not designed for integrated analysis, we apply them both to each data set separately and to the pooled data.
To evaluate and compare results from different methods, we define staging or clustering "accuracy" as the proportion of correctly identified zero differences $\eta_{k,st} = 0$ between neighboring coefficients over the pairs $(s,t) \in E$. The accuracy takes values between 0 and 1, with larger values indicating better results. For our method, GLIDARS, and the lasso-tree method, because there are also parameters to be estimated, we additionally present results for the sum of squared errors (SSE) of $\hat{\beta}$ in the Supplementary Materials available at Biostatistics online.
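Under one natural reading of this metric (our own illustration; the paper's exact convention may differ), the accuracy compares the estimated and true zero patterns of the neighboring differences:

```python
import numpy as np

def grouping_accuracy(eta_hat, eta_true, tol=1e-8):
    # Proportion of neighboring pairs whose zero / nonzero status
    # (merged vs. separated cells) is recovered by the estimate.
    return float(np.mean((np.abs(eta_hat) <= tol) == (np.abs(eta_true) <= tol)))
```

For instance, if the truth merges one of three neighboring pairs and the estimate merges that pair plus one extra, the accuracy is 2/3.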
In Scenario 1, both data sets have observations in all cells of the A×B table. In Scenario 2, we allow some cells of the A×B table to be empty, similar to our real data: the first data set has no missing cells, while the second data set has a larger sample size but no observations for B3 or for the cell A3B2. Scenario 3 has four data sets. The sample sizes for all scenarios are given in Table 1; we chose relatively larger sample sizes for the data sets with smaller signals or coefficients so that the separate analyses were stably conducted.
The results from the different methods are in Figure 1. Besides boxplots of stratification accuracy across the 500 simulations, we also plot the percentage of simulations in which each method yielded the highest accuracy. We see that GLIDARS clearly outperforms the other methods in terms of accuracy. The separate analyses suffer from the small sample sizes; GLIDARS successfully uses the compensating distributions of the data sets to overcome this problem. Due to the heterogeneity in both the values of the coefficients and the patient distributions over the A×B table, naive pooling also leads to inferior results. In terms of SSE, GLIDARS also outperforms the lasso-tree method, as shown in the Supplementary Materials available at Biostatistics online.
Fig. 1.
Staging accuracy results are shown in box plots (left), and the proportions of being best model in each simulation are shown in the bar plots (right).
We further conducted simulation studies when the underlying true staging may differ across different data sets. The simulation scenarios and results are presented in the Supplementary Materials available at Biostatistics online. By an inspection of our theoretical development, the final integrated staging in this case should asymptotically be the intersection of all the individual data stagings because of the way we penalize the differences of adjacent coefficients. All nonzero differences will remain nonzero in the integrated results given large sample sizes. Consequently, the integrated staging will be finer than all the individual data stagings. We take comfort in this result as this means more refined care for patients. In addition, the estimated coefficients from different stages can provide further information about the necessity of the integrated staging. That is, if the coefficients from two adjacent stages are very close to each other, we may combine the two stages into one stage.
5. Application to Breast Cancer Staging
We perform integrated analysis using our method and the comparators on three different studies: CALGB 49907 and CALGB 9741, conducted by the Cancer and Leukemia Group B (CALGB), and N9831, conducted by the North Central Cancer Treatment Group (NCCTG). CALGB 49907 included 298 early-stage breast cancer patients who were 65 years or older (Partridge and others, 2010). CALGB 9741 included 798 patients with positive lymph nodes (Citron and others, 2002). N9831 has the largest sample size of 2256 (Perez and others, 2006). According to Table 2, the patient distribution is quite different across the studies. Both CALGB 9741 and N9831 have missing observations in some of the A×B combinations. Follow-up and event rates are also shown in the table.
Table 2.
Summary statistics and patient distributions for the three studies
| (a) Overall | |||
|---|---|---|---|
| Study | 49907 | 9741 | N9831 |
| Sample size | 298 | 798 | 2256 |
| Mean follow-up time (years) | 6.93 | 6.43 | 8.96 |
| Censor rate | 0.68 | 0.69 | 0.84 |
| Study 49907 | ER/PR+, HER2+ | ER/PR+, HER2− | ER/PR−, HER2+ | ER/PR−, HER2− |
|---|---|---|---|---|
| I | 1 | 9 | 4 | 11 |
| IIa | 8 | 86 | 8 | 36 |
| IIb | 5 | 55 | 7 | 19 |
| III | 1 | 36 | 3 | 10 |
| Study 9741 | ER/PR+, HER2+ | ER/PR+, HER2− | ER/PR−, HER2+ | ER/PR−, HER2− |
|---|---|---|---|---|
| I | 0 | 0 | 0 | 0 |
| IIa | 0 | 136 | 0 | 41 |
| IIb | 0 | 179 | 0 | 100 |
| III | 0 | 221 | 0 | 94 |
| Study N9831 | ER/PR+, HER2+ | ER/PR+, HER2− | ER/PR−, HER2+ | ER/PR−, HER2− |
|---|---|---|---|---|
| I | 98 | 14 | 181 | 17 |
| IIa | 592 | 134 | 451 | 36 |
| IIb | 0 | 0 | 0 | 0 |
| III | 347 | 66 | 303 | 13 |
Similar to our simulation study, we compare GLIDARS with the lasso-tree and CART methods. Note that because our goal for the real data is to construct a unified staging system for all patients, we only apply the comparison methods to the pooled data. For both GLIDARS and the lasso-tree, the BIC is used to determine the optimal value of $\lambda$ from a uniform grid on the log-scale from 0 to 15.
Table 3 shows results from the GLIDARS method: the stratification results are in Table 3a and the corresponding coefficient estimates are in Table 3b. Figure 2 shows the estimated Kaplan–Meier survival curves based on GLIDARS's stratification into five stages. The stratification led to clear separation except between Stage 3 and Stage 4 patients for CALGB 49907. However, these two groups of patients were well separated in the other two studies.
Table 3.
The stratifying schemes and values of the coefficients on extending the AJCC 7th edition staging with biomarkers for three selected studies
Fig. 2.
The survival curves from the GLIDARS stratification.
Results from the comparator methods are in Table 4 and Figure 3. The lasso-tree method based on the pooled data also yielded five stages; however, the separation of their Kaplan–Meier survival curves is not as distinct. For CART, the default pruning option led to three stages with clear separation of their Kaplan–Meier survival curves. However, we think this stratification may not be scientifically valid: it is well known that triple-negative breast cancer (ER/PR− and HER2−) is more aggressive than the other subtypes, yet this fact is not reflected in the CART result for patients with anatomical stages I and IIa. We then specified more conservative options for CART, which led to four-stage and five-stage stratifications. The same issue persisted for these CART results, and the separation of their Kaplan–Meier survival curves became less distinct.
Table 4.
The stratifying schemes of comparator methods
Fig. 3.
The stratified survival curves from the comparator methods.
6. Discussion and conclusion
In this article, we developed an integrated analysis method for risk stratification using data sets from different sources. In particular, we made the minimal assumption that all data sets share the same structure of risk stratification, but otherwise let all the parameters vary freely, including the baseline hazard functions. We enforced this assumption through a special adaptive group lasso penalty on the differences of adjacent parameters. We developed our method based on the well-known Cox model for time-to-event outcomes; our idea can be directly applied to other types of outcomes. We also studied the asymptotic properties of our approach. When simplified to a single-study setting (i.e., $K = 1$), our theoretical study also fills an important gap left by the original lasso-tree paper (Lin and others, 2013).
Even though we only considered two staging factors in this article, our algorithm can be applied to more than two factors. The increase in computational burden is manageable to a certain degree, as the workhorse of our algorithm is a quadratic program that can be solved efficiently.
We motivated our method on risk stratification for breast cancer patients. Our method is directly applicable to other settings as well. For example, for patients with many chronic conditions, risk stratification is also encountering challenges due to the need to incorporate many risk factors. These risk factors, similar to our breast cancer example, are typically ordered. Considering a risk stratification system based on these factors can lead to personalized care but overly refined classification can be challenging for efficient and equitable delivery of health care.
7. Software
The programming codes for this paper are publicly available at https://github.com/WangTJ/glidars.
Acknowledgments
The authors would like to thank the anonymous reviewers for their valuable input on improving the original version of this article. Conflict of Interest: None declared.
Contributor Information
Tianjie Wang, Department of Statistics, University of Wisconsin, Madison, WI, USA.
Rui Chen, Department of Statistics, University of Wisconsin, Madison, WI, USA.
Wenshuo Liu, Department of Research & Innovation, Interactions LLC, 31 Hayward Street Suite E, Franklin, MA 02038, USA.
Menggang Yu, Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA.
Supplementary material
Supplementary material is available at http://biostatistics.oxfordjournals.org.
Funding
The authors' research was partially supported by the Specialized Program of Research Excellence (SPORE) program, through the National Institute of Dental and Craniofacial Research (NIDCR) and the National Cancer Institute (NCI), grant P50DE026787. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health (NIH).
References
- Amin, M. B., Edge, S. B., Greene, F. L., Byrd, D. R., Brookland, R. K., Washington, M. K., Gershenwald, J. E., Compton, C. C., Hess, K. R., Sullivan, D. C. and others. (2017). AJCC Cancer Staging Manual. New York: Springer.
- Andersen, P. K. and Gill, R. D. (1982). Cox's regression model for counting processes: a large sample study. The Annals of Statistics 10, 1100–1120.
- Breiman, L. (1984). Classification and Regression Trees. New York: Routledge.
- Citron, M., Berry, D., Cirrincione, C., Carpenter, J., Hudis, C., Gradishar, W., Davidson, N., Ingle, J., Martino, S., Livingston, R. and others. (2002). Superiority of dose-dense over conventional scheduling and equivalence of sequential (sc) vs. combination adjuvant chemotherapy for node-positive breast cancer (CALGB 9741, Int C9741). Breast Cancer Research and Treatment 76, S32.
- Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96, 1348–1360.
- Friedman, J., Hastie, T., Hofling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. The Annals of Applied Statistics 1, 302–332.
- Green, P. J. (1984). Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives. Journal of the Royal Statistical Society, Series B (Methodological) 46, 149–192.
- Lin, Y., Wang, S. and Chappell, R. J. (2013). Lasso tree for cancer staging with survival data. Biostatistics 14, 327–339.
- Loh, W.-Y. (2014). Fifty years of classification and regression trees. International Statistical Review 82, 329–348.
- National Cancer Institute. (2004). National Cancer Institute Fact Sheet: Cancer Staging. https://www.cancer.gov/about-cancer/diagnosis-staging/staging.
- Onitilo, A. A., Engel, J. M., Greenlee, R. T. and Mukesh, B. N. (2009). Breast cancer subtypes based on ER/PR and Her2 expression: comparison of clinicopathologic features and survival. Clinical Medicine and Research 7, 4–13.
- Partridge, A. H., Archer, L., Kornblith, A. B., Gralow, J., Grenier, D., Perez, E., Wolff, A. C., Wang, X., Kastrissios, H., Berry, D. and others. (2010). Adherence and persistence with oral adjuvant chemotherapy in older women with early-stage breast cancer in CALGB 49907: adherence companion study 60104. Journal of Clinical Oncology 28, 2418.
- Perez, E. A., Suman, V. J., Davidson, N. E., Martino, S., Kaufman, P. A., Lingle, W. L., Flynn, P. J., Ingle, J. N., Visscher, D. and Jenkins, R. B. (2006). HER2 testing by local, central, and reference laboratories in specimens from the North Central Cancer Treatment Group N9831 intergroup adjuvant trial. Journal of Clinical Oncology 24, 3032–3038.
- Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6, 461–464.
- Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine 16, 385–395.
- Wang, H. and Leng, C. (2008). A note on adaptive group lasso. Computational Statistics and Data Analysis 52, 5277–5286.
- Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68, 49–67.
- Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101, 1418–1429.