Statistical Framework in Support of a Revised Children's Oncology Group Neuroblastoma Risk Classification System

Arlene Naranjo; Meredith S Irwin; Michael D Hogarty; Susan L Cohn; Julie R Park; Wendy B London

doi:10.1200/CCI.17.00140

2018 Jul 20;2:CCI.17.00140.

PMCID: PMC6421832 NIHMSID: NIHMS1016453 PMID: 30652588

Abstract

Purpose

The International Neuroblastoma Risk Group (INRG) Staging System (INRGSS) was developed through international consensus to provide a presurgical staging system that uses clinical and imaging data at diagnosis. A revised Children's Oncology Group (COG) neuroblastoma (NB) risk classification system is needed that incorporates the INRGSS within the context of modern therapy. Herein, we provide statistical support for the clinical validity of a revised COG risk classification system.

Patients and Methods

Nine factors were tested for potential statistical and clinical significance in 4,569 patients diagnosed with NB who were enrolled in the COG biology/banking study ANBL00B1 (2006-2016). Recursive partitioning was performed to create a survival-tree regression (STR) analysis of event-free survival (EFS), generating a split by selecting the strongest prognostic factor among those that were statistically significant. The least absolute shrinkage and selection operator (LASSO) was applied to obtain the most parsimonious model for EFS. COG patients were risk classified using STR, LASSO, and per the 2009 INRG classification (generated using an STR analysis of INRG data). Results were descriptively compared among the three classification approaches.

Results

The 3-year EFS and overall survival (± SE) were 72.9% ± 0.9% and 84.5% ± 0.7%, respectively (N = 4,569). In each approach, the most statistically and clinically significant factors were diagnostic category (eg, NB, ganglioneuroblastoma), INRGSS, MYCN status, International Neuroblastoma Pathology Classification, ploidy, and 1p/11q status. The results of the STR analysis were more concordant with those of the INRG classification system than with LASSO, although both methods showed moderate agreement with the INRG system.

Conclusion

These analyses provide a framework to develop a new COG risk classification incorporating the INRGSS. There is statistical evidence to support the clinical validity of each of the three classifications: STR, LASSO, and INRG.

INTRODUCTION

Neuroblastoma is a cancer of the sympathetic nervous system; it most commonly occurs in the adrenal glands and nerve tissue extending from the neck to the pelvis. It is the most common extracranial solid tumor in childhood, with > 650 cases diagnosed yearly in North America.^1,2 Risk stratification, incorporating clinical and biologic factors, has been used for over two decades to predict prognosis and assign patients to appropriate therapeutic intensity. The International Neuroblastoma Risk Group Staging System (INRGSS)³ was developed to define extent of disease at diagnosis, before any treatment, including surgical resection. In contrast, the International Neuroblastoma Staging System (INSS)^4,5 is a postsurgical classification of extent of disease. INSS stages 1 and 2 refer to completely or partially resected locoregional tumors, stage 3 denotes large locoregional tumors crossing the midline, and stage 4 denotes tumors with distant metastases (Fig 1). Stage 4S describes tumors in patients < 12 months of age with stage 1 or 2 primary tumors and metastatic disease limited to skin, liver, and < 10% of bone marrow without cortical bone involvement. In the INRGSS, L1 and L2 are locoregional tumors in the absence or presence of image-defined risk factors (IDRF),⁶ respectively. Widely disseminated disease is classified as stage M. Stage MS describes L1 or L2 tumors associated with metastatic disease limited to skin, liver, and < 10% of bone marrow without cortical bone involvement in patients < 18 months old.

Fig 1. — Relationship between the INSS and the INRGSS. IDRF, image-defined risk factor; INRGSS, International Neuroblastoma Risk Group Staging System; INSS, International Neuroblastoma Staging System; mIBG, metaiodobenzylguanidine.

The goal of the INRG task force was to harmonize risk classifications across international groups. To create the INRG risk groups, 23 prognostic factors were tested in an EFS survival-tree regression (STR) analysis (N = 8,800 patients diagnosed worldwide, 1990 through 2002), resulting in a classification using INSS, age, diagnostic category, grade of differentiation, MYCN status, 11q aberration, and ploidy.⁷ Treatments have evolved significantly since the period 1990 through 2002, resulting in improved survival, especially for patients with high-risk neuroblastoma (NB).^8,9 Since 2006, the Children's Oncology Group (COG) has collected INRGSS data to study its prognostic strength and impact on risk classification.

The goal of this paper is to provide the statistical modeling framework to support a revised COG risk classification system within the context of modern therapy and with INSS replaced by INRGSS. We have chosen to explore and descriptively compare two different statistical approaches: STR and least absolute shrinkage and selection operator (LASSO). We are not attempting to quantify the superiority of any one approach; rather, we provide statistical evidence of the clinical validity of each approach.

STR, or recursive partitioning, provides a graphical way of representing the prognostic structure of data by successively splitting the covariate space into relatively homogeneous groups of observations, or nodes, and maximizes between-node separation in terms of the outcome measure. The classification and regression tree algorithm originally described by Breiman et al¹⁰ was extended to accommodate censored survival data, including methods on the basis of the two-sample log-rank test^11,12 and the Cox proportional hazards (PH) model.^13,14 Tree-structured methods have the advantage of being simply explained and understood, identifying groups of patients with distinct survival outcomes, and allowing easy classification of new patients.

The second method, LASSO, is a linear regression method for both variable selection and improving prediction accuracy. Introduced by Tibshirani,¹⁵ and later extended to the Cox PH model,¹⁶ the LASSO achieves covariate selection and regularization by minimizing the sum of squared errors subject to a constraint (via a tuning parameter) on the sum of the absolute values of the coefficients. This removes the weakest covariates, leaving the most parsimonious model. LASSO identifies the most important variables associated with outcome that minimize the prediction error.
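As a brief illustration of the constraint described above, the LASSO estimate can be written out for the linear model and for the Cox PH model. The symbols used here (y_i, x_i, β, t, λ, and the log partial likelihood ℓ) are notation introduced for this sketch and do not appear in the original text.

```latex
% Linear-regression LASSO (constrained form)
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - x_i^{\top}\beta \right)^2
\quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t

% Cox PH LASSO: the sum of squared errors is replaced by the
% negative log partial likelihood \ell(\beta)
\hat{\beta} = \arg\min_{\beta} \; -\ell(\beta)
\quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t

% Equivalent penalized (Lagrangian) form with tuning parameter \lambda \ge 0
\hat{\beta} = \arg\min_{\beta} \; -\ell(\beta) + \lambda \sum_{j=1}^{p} |\beta_j|
```

A smaller bound t (equivalently, a larger λ) forces more coefficients to exactly zero, which is how the weakest covariates drop out of the model.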

Herein, we report survival data for COG subsets of patients with NB diagnosed between 2006 and 2016 using the INRGSS. Our STR analysis identified patient subgroups with poor outcome in otherwise well-performing cohorts and subgroups with more favorable outcomes among patients with poor survival. The fitted LASSO model predicted patient survival outcomes based on the most prognostic patient characteristics. These methods provide the basis for developing a revised COG classification system, within the context of modern therapy, incorporating the INRGSS.

PATIENTS AND METHODS

Patients newly diagnosed with NB, ganglioneuroblastoma (GNB), or ganglioneuroma (GN; Schwannian stroma-dominant), maturing subtype (mature GN were not eligible), with tumor sample submission and without prior chemotherapy, were eligible for ANBL00B1, the COG neuroblastoma biology and banking study. Eligibility criteria for the present analysis were enrollment in ANBL00B1 (between August 18, 2006, and June 30, 2016) with known diagnostic category, IDRF status,⁶ and INSS. Institutional review board approval was obtained at participating sites. Written informed consent was obtained before enrollment in ANBL00B1.

The risk factors tested in this analysis have repeatedly proven to be prognostic, and most are used in the current COG risk stratification (Appendix Table A1). The starting variables for the STR and LASSO models were age at diagnosis (< 18 months v ≥ 18 months),^17-19 INRGSS (L1 v L2 v M v MS v M/MS Indeterminate [Ind]), MYCN status (nonamplified v amplified),²⁰ ploidy (hyperdiploid v diploid),²¹ diagnostic category (ganglioneuroblastoma, intermixed [GNBI] v NB and GNB/nodular),²² grade of differentiation (differentiating v totally undifferentiated/poorly differentiated), mitosis-karyorrhexis index (MKI; low/intermediate v high), International Neuroblastoma Pathology Classification (INPC; favorable v unfavorable),^23,24 and 1p and/or 11q segmental chromosome deletion (no loss v loss of either).²⁵ All biomarker assays were performed at diagnosis by the COG reference laboratory and pathology was centrally reviewed.

INSS and IDRF status were used to determine INRGSS (Fig 1). The presence or absence of distant metastases was determined on the basis of INSS. INSS used a 12-month age cut point for 4S, but INRGSS adopted an 18-month cut point for MS. In our cohort, metastatic-site information for patients 12 to 18 months old with INSS stage 4 disease at diagnosis was not collected; such patients were denoted INRGSS M/MS Ind because the MS versus M distinction was indeterminate.
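The derivation of INRGSS from INSS and IDRF status can be sketched as a small lookup function. This is an illustrative reading of the rules stated in the text, not the study's actual derivation code: the function name, the stage labels as strings, and the exact age bounds (12 months inclusive to 18 months exclusive) are assumptions, and Fig 1 remains the authoritative mapping.

```python
def inrgss_from_inss(inss_stage, idrf_present, age_months):
    """Map INSS stage plus IDRF status to an INRGSS stage (illustrative rules only)."""
    if inss_stage in ("1", "2A", "2B", "3"):
        # Locoregional tumors: L1 without IDRFs, L2 with at least one IDRF.
        return "L2" if idrf_present else "L1"
    if inss_stage == "4S":
        return "MS"
    if inss_stage == "4":
        # Assumed bounds: for 12 <= age < 18 months, metastatic sites were not
        # collected, so M versus MS cannot be resolved.
        if 12 <= age_months < 18:
            return "M/MS Ind"
        return "M"
    raise ValueError(f"unrecognized INSS stage: {inss_stage!r}")


if __name__ == "__main__":
    print(inrgss_from_inss("3", True, 40))    # locoregional with an IDRF
    print(inrgss_from_inss("4", False, 14))   # stage 4 in the 12-to-18-month window
```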

The 1p/11q variable was defined as follows: loss of heterozygosity at either 1p or 11q was “loss of either”; absence of loss of heterozygosity at both 1p and 11q was “no loss.” The diagnostic category GNBI comprised GNBI (Schwannian stroma-rich) and GN, maturing subtype tumors; the NB and GNB/nodular group included NB (Schwannian stroma-poor); peripheral neuroblastic tumors; and GNB, nodular (composite).
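The combined 1p/11q variable might be coded as follows. The function name and the use of `None` for an unassessed locus are hypothetical, and the handling of partially known data (one locus without loss, the other not assessed) is an assumption, because the text does not state it.

```python
def combine_1p_11q(loh_1p, loh_11q):
    """Combine two loss-of-heterozygosity results into the 1p/11q factor.

    Each argument is True (loss present), False (no loss), or None (not assessed).
    """
    if loh_1p is True or loh_11q is True:
        return "loss of either"
    if loh_1p is False and loh_11q is False:
        return "no loss"
    # Assumption: anything else cannot be classified and is treated as missing.
    return "unknown"
```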

The primary end point was time to event, calculated from diagnosis until the first occurrence of relapse, progression, secondary malignancy, or death; patients without an event were censored on the date of last contact. Time to death (overall survival [OS]) was a secondary end point; patients alive were censored on the date of last contact. Values quoted for EFS and OS are at 3 years ± SE,^26,27 and curves were compared using a log-rank test. Analyses, including the manual STR procedure (PROC PHREG), were performed using SAS, version 9.4 (SAS Institute, Cary, NC). Survival curve generation and LASSO modeling were performed in R (R Project for Statistical Computing, https://www.r-project.org/).
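To make the "3-year estimate ± SE" concrete, here is a minimal Kaplan-Meier sketch in plain Python. It is not the SAS or R code used in the study. The standard error uses Greenwood's formula, which is an assumption swapped in for illustration (the method cited in the text may differ), and the toy follow-up data are invented.

```python
import math


def kaplan_meier_at(times, events, horizon):
    """Return (survival, standard_error) at `horizon`.

    times:  follow-up time per patient (years from diagnosis)
    events: 1 if the patient had an event at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    var_sum = 0.0  # running Greenwood sum of d / (n * (n - d))
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d = 0        # events at time t
        leaving = 0  # patients leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            leaving += 1
            i += 1
        if d > 0:
            surv *= 1 - d / at_risk
            if at_risk > d:
                var_sum += d / (at_risk * (at_risk - d))
        at_risk -= leaving
    return surv, surv * math.sqrt(var_sum)


if __name__ == "__main__":
    # Invented toy data: 8 patients, 4 events, 4 censored.
    toy_times = [0.5, 1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.0]
    toy_events = [1, 0, 1, 0, 1, 0, 0, 1]
    efs, se = kaplan_meier_at(toy_times, toy_events, horizon=3.0)
    print(f"3-year EFS = {100 * efs:.1f}% ± {100 * se:.1f}%")
```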

STR Analysis

Recursive partitioning was performed to create a “survival tree.” Starting with the overall patient cohort, univariate Cox PH models of EFS identified statistically significant (P ≤ .05) factors, and the one with the largest hazard ratio (HR) was selected manually to create two subgroups. If the factor had more than two levels (eg, INRGSS), all levels were first compared individually and grouped together if not significantly different from each other, until only significantly different groupings remained. Within each subgroup, the remaining factors were tested and the partitioning process repeated manually until the sample size was too small or no statistically significant factors remained.⁷ The PH assumption was tested in the terminal splits by testing a covariate by survival-time interaction term in the Cox model.²⁸ The HR is the increased risk of an event compared with the reference level. (Hereafter, in the article text, * denotes the reference category for the HR.)
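The split-selection rule at a single node can be sketched as below. This assumes the univariate Cox models have already been fit elsewhere and that each HR is expressed relative to the lower-risk reference level; the function name, the dictionary layout, and the example numbers are hypothetical and are not results from the study.

```python
def choose_split(candidates, alpha=0.05):
    """Pick the factor to split on at one node of the survival tree.

    candidates: list of dicts with keys 'factor', 'hr', and 'p', one per
    univariate Cox model fit within the current subgroup.
    Returns the chosen candidate, or None if the node is terminal.
    """
    significant = [c for c in candidates if c["p"] <= alpha]
    if not significant:
        return None  # no statistically significant factor remains
    return max(significant, key=lambda c: c["hr"])


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    node_results = [
        {"factor": "factor_1", "hr": 2.1, "p": 0.001},
        {"factor": "factor_2", "hr": 3.4, "p": 0.20},   # large HR but not significant
        {"factor": "factor_3", "hr": 2.8, "p": 0.03},
    ]
    print(choose_split(node_results)["factor"])
```

In this toy node, factor_2 has the largest HR but is excluded because it does not meet the significance threshold, so the split is made on factor_3.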

The data were randomly split into two evenly sized groups, stratified by INRGSS stage, and the STR was performed in each dataset as internal validation. If the STR methodology yielded similar results in each dataset, the two datasets were to be recombined for the definitive analysis.
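A stratified random split of this kind could be implemented along the following lines. This is a minimal sketch, not the procedure used in the study: the record layout, the field name `inrgss`, the seed, and the rule for strata with an odd count (the extra patient goes to the second half) are all assumptions.

```python
import random
from collections import defaultdict


def stratified_halves(patients, stratum_key, seed=0):
    """Randomly split patient records into two halves within each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for patient in patients:
        by_stratum[patient[stratum_key]].append(patient)

    half_a, half_b = [], []
    for members in by_stratum.values():
        rng.shuffle(members)
        cut = len(members) // 2  # odd strata: the extra patient goes to half_b
        half_a.extend(members[:cut])
        half_b.extend(members[cut:])
    return half_a, half_b


if __name__ == "__main__":
    cohort = [{"id": i, "inrgss": "L1"} for i in range(10)]
    cohort += [{"id": 100 + i, "inrgss": "M"} for i in range(6)]
    a, b = stratified_halves(cohort, "inrgss", seed=1)
    print(len(a), len(b))
```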

Age, together with diagnostic category, grade, and MKI, is used to define INPC as favorable or unfavorable; as a result, these factors are statistically confounded with INPC (Appendix Table A2). Therefore, if INPC was identified as the most strongly prognostic factor, age, diagnostic category, grade, and MKI were not tested thereafter. In addition to the objective statistical criteria used to create splits, subgroups that were historically treated with different levels of treatment intensity, yet currently have similar outcomes, were maintained as separate subgroups using a “clinical split” on the factor historically used to direct treatment intensity (eg, MYCN, described later in this article). A clinical split overrode the factor chosen by the STR (ie, the one with the largest statistically significant HR).

LASSO

LASSO requires complete data for each factor included in the model; therefore, to permit inclusion of patients with unknown factors, a series of binary dummy variables, one for each factor, was created for the missing category (yes = 1; no = 0). For each factor, the initial LASSO model included the dummy variable for missing data and a term for the known nonreference level of the factor, leaving the other category as reference. This approach prevented potential selection bias that could occur if only patients with complete data were included in the model.^18,29,30
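The missing-indicator coding described above can be illustrated for a single two-level factor. The function name and the use of `None` for an unknown value are hypothetical, and the choice of reference level in the example is an assumption made only for illustration.

```python
def encode_binary_factor(value, reference, nonreference):
    """Return (nonreference_indicator, missing_indicator) for one factor.

    The reference level is coded (0, 0), the known nonreference level (1, 0),
    and an unknown value (0, 1), so no patient is dropped for missing data.
    """
    if value is None:
        return (0, 1)
    if value == nonreference:
        return (1, 0)
    if value == reference:
        return (0, 0)
    raise ValueError(f"unexpected level: {value!r}")


if __name__ == "__main__":
    # Assumed reference level for illustration: nonamplified MYCN.
    for status in ("nonamplified", "amplified", None):
        print(status, encode_binary_factor(status, "nonamplified", "amplified"))
```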

Factors with more than two categories required more than one binary variable in the LASSO model. To ensure that all covariates encoding a given factor were either included in or excluded from the model as a group, the “group” LASSO was applied.³¹ The group LASSO was fit using the cv.grpsurv function in the R package grpreg (https://CRAN.R-project.org/package=grpreg).³² Cross-validation (10-fold) was used to select the tuning parameter value that minimized the mean cross-validated error while providing some factor reduction. The tuning parameter controls the strength of the penalty; as it increases, more coefficients are shrunk to zero and fewer variables are retained in the final model. When the tuning parameter is zero, the penalty vanishes and the fit reduces to the unpenalized model (ordinary least squares in the linear regression setting). The relative risk (RR), or increased risk of event in comparison with the reference category, was reported for the selected factors in the final model for EFS. For comparability with STR and INRG, interactions were not tested in the LASSO model.
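The all-in-or-all-out behavior of the group penalty can be seen in the group soft-thresholding operator, sketched here in plain Python. This is not the grpreg implementation and does not fit a Cox model: it shows only the shrinkage step for one group of coefficients under a simplified setting (orthonormal covariates within the group, no group-size weighting), and the example coefficients are invented.

```python
import math


def group_soft_threshold(coefs, lam):
    """Shrink one group of coefficients toward zero as a unit.

    If the Euclidean norm of the group is at most `lam`, every coefficient in
    the group becomes zero; otherwise all are scaled by the same factor.
    """
    norm = math.sqrt(sum(b * b for b in coefs))
    if norm <= lam:
        return [0.0] * len(coefs)
    scale = 1 - lam / norm
    return [b * scale for b in coefs]


if __name__ == "__main__":
    # Invented coefficients for the dummy variables of one multi-level factor.
    weak_group = [0.3, 0.4]
    strong_group = [3.0, 4.0]
    print(group_soft_threshold(weak_group, lam=0.6))    # whole group removed
    print(group_soft_threshold(strong_group, lam=1.0))  # whole group kept, shrunk
```

Because the threshold is applied to the norm of the whole group, the dummy variables that encode a single factor are never split between "selected" and "dropped".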

In addition, within each prognostic variable, we assessed whether values were missing completely at random with respect to EFS. Kaplan-Meier EFS curves²⁶ were generated for the reference, known nonreference, and missing groups. If values were missing completely at random, the missing group is expected to be a mixture of patients with and without the attribute, and the Kaplan-Meier curve for the missing group should fall between those of the reference and known nonreference groups.³⁰
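The reasoning behind this check is that the survival function of a mixture of two groups is a weighted average of the two groups' survival functions, so it lies between them at every time point. A minimal sketch with invented survival probabilities (the function names and values are hypothetical):

```python
def mixture_survival(s_reference, s_nonreference, frac_reference):
    """Survival of a mixture: weighted average of the two curves at each time."""
    return [
        frac_reference * r + (1 - frac_reference) * n
        for r, n in zip(s_reference, s_nonreference)
    ]


def lies_between(s_missing, s_reference, s_nonreference):
    """True if the 'missing' curve sits between the two known curves throughout."""
    return all(
        min(r, n) <= m <= max(r, n)
        for m, r, n in zip(s_missing, s_reference, s_nonreference)
    )


if __name__ == "__main__":
    # Invented EFS values at years 1, 2, and 3.
    s_ref = [0.95, 0.92, 0.90]
    s_non = [0.80, 0.65, 0.55]
    s_mix = mixture_survival(s_ref, s_non, frac_reference=0.4)
    print(lies_between(s_mix, s_ref, s_non))
```

A missing-group curve that falls clearly outside the two known curves would suggest that the data are not missing completely at random.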

Methodology Comparison

The INRG pretreatment classification system⁷ was used as a descriptive comparator (Table 1). To avoid confusion with the revised COG risk classification system still in development, EFS risk groups were assigned generic labels (ie, groups A, B, C, and D; Table 1). These correspond to 3-year EFS values of > 90% for group A, > 80 to ≤ 90% for group B, ≥ 55 to ≤ 80% for group C, and < 55% for group D, which are similar to the EFS cutoffs used in the INRG system (5-year EFS of > 85%, > 75 to ≤ 85%, ≥ 50 to ≤ 75%, and < 50%, respectively).
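The 3-year EFS cut points quoted above translate directly into a small classification function. The function name is hypothetical; the thresholds are those stated in the text.

```python
def efs_group(efs_3yr_pct):
    """Assign the generic EFS group from a 3-year EFS estimate in percent."""
    if efs_3yr_pct > 90:
        return "A"   # > 90%
    if efs_3yr_pct > 80:
        return "B"   # > 80 to <= 90%
    if efs_3yr_pct >= 55:
        return "C"   # >= 55 to <= 80%
    return "D"       # < 55%
```

For example, a terminal node with a 3-year EFS of exactly 90% falls in group B, and one with exactly 55% falls in group C.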

Table 1.

Categorization of 3,944 ANBL00B1 Patients Using the International Neuroblastoma Risk Group Consensus Pretreatment Classification Schema⁷

STR and LASSO analyses were compared with the INRG classification system, with differences noted. The 3-year EFS for each terminal node of the STR and LASSO was classified into EFS groups A through D. Each approach (ie, STR, LASSO) was compared with the INRG classification system by summing the number of concordant patients and dividing by the total number of patients categorized by the two systems compared. The level of agreement among the INRG classification system, STR, and LASSO methods was assessed using weighted κ.^33,34
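The two agreement measures can be computed from paired group labels as sketched below. The concordance calculation follows the description in the text. The weighted κ uses linear disagreement weights, which is an assumption because the weighting scheme is not stated here; the function name and toy labels are invented.

```python
def concordance_and_weighted_kappa(labels_a, labels_b, categories=("A", "B", "C", "D")):
    """Return (proportion concordant, linearly weighted kappa) for paired labels."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(labels_a)

    # Cross-tabulate the two classifications.
    table = [[0] * k for _ in range(k)]
    for a, b in zip(labels_a, labels_b):
        table[index[a]][index[b]] += 1

    concordance = sum(table[i][i] for i in range(k)) / n

    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # assumed linear weights
            observed += weight * table[i][j] / n
            expected += weight * row[i] * col[j]

    # Note: undefined (division by zero) if both systems put every patient
    # in the same single category.
    kappa = 1 - observed / expected
    return concordance, kappa


if __name__ == "__main__":
    system_1 = ["A", "B", "C", "D"]
    system_2 = ["A", "B", "D", "D"]
    print(concordance_and_weighted_kappa(system_1, system_2))
```

Weighting penalizes a disagreement of A versus D more heavily than A versus B, which suits ordered risk groups.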

RESULTS

The analytic cohort of 4,569 eligible patients was used in the STR and the LASSO analyses (Table 2). The overall 3-year EFS and OS were 72.9% ± 0.9% and 84.5% ± 0.7%, respectively, with median follow-up time of 3.1 years in 3,487 patients alive without event. The degree of missing data varied from none (ie, age, stage, diagnostic category) to moderate (range, 5% to 17% for INPC, MYCN status, grade, MKI, and ploidy), to high (54.5%) for 1p/11q.

Table 2.

Clinical, Biologic, and Genetic Patient Characteristics of 4,569 Patients Enrolled on the COG Neuroblastoma Biology and Banking Study ANBL00B1 Between August 18, 2006, and June 30, 2016

We examined how INSS mapped to INRGSS for patients with locoregional disease (Fig 2). As would be predicted, the proportion of patients with at least one IDRF present was higher in patients with more advanced INSS.