Abstract
Pan-cancer genomic analyses based on the magnitude of pathway activity are currently lacking. Focusing on the cell cycle, we examined the DNA mutations and chromosome arm-level aneuploidy within tumours with low, intermediate and high cell cycle activity in 9 515 pan-cancer patients with 32 different tumour types. Boxplots showed that cell cycle activity varied broadly across and within all cancers. TP53 and PIK3CA mutations were common in all cell cycle score (CCS) tertiles, but their frequency increased as cell cycle activity increased (P < 0.001). Mutations in BRAF and gains in 16p were less frequent in CCS high tumours (P < 0.001). In Kaplan-Meier analysis, patients whose tumours were CCS Low had a longer Progression Free Interval (PFI) than those with CCS Intermediate or High tumours (P < 0.001), and this association remained significant in multivariable analysis (CCS Intermediate: HR = 1.37, 95% CI 1.17 – 1.60; CCS High: HR = 1.54, 95% CI 1.29 – 1.84; CCS Low = Ref). These results demonstrate that whilst similar DNA alterations can be found at all cell cycle activity levels, some notable exceptions exist. Moreover, independent prognostic information can be derived on a pan-cancer level from a simple measure of cell cycle activity.
Introduction
The Nobel prize-winning research of Hartwell1, Nurse2,3 and Hunt4 in the nineteen seventies and eighties fundamentally changed our understanding of the cell cycle and provided broad insight into the molecules governing its regulation. These seminal discoveries have shaped our modern view of the cell cycle and its separation into four distinct phases commonly referred to as G1, S, G2 and M. Transitions between these phases are governed by the cyclin family of proteins along with their binding partners, the cyclin-dependent kinases (CDKs)5. Disruptions to the function of cyclin-CDK holoenzymes or other cell cycle pathway members can lead to impaired control over the cycle and sustained proliferation, a hallmark of cancer6.
Large-scale pan-cancer studies have sought to understand human malignancies at a molecular level through the integration of multiple high-throughput data types. This approach has yielded a number of clinically relevant findings, including the coalescence of lung squamous, head and neck, and some bladder cancers into a single pan-cancer subtype, and the ability to classify tumours into prognostic subgroups at a pan-cancer level7. More recently, data from over eleven thousand patients have shown actionable mutations in up to fifty-seven percent of tumours8, a positive correlation between aneuploidy and cell cycle genes9, and frequent co-alterations in the p53 and cell cycle pathways10. To date, the analysis of genomic aberrations in these studies has typically focused on all pan-cancer tumours at once8, on subgroups of tumours that cluster together on the basis of DNA, RNA and protein expression (termed iClusters)8, or on tumours with a common genetic alteration such as chromosome 3p loss9. Given the varying degrees of oncogenic pathway activation/suppression across cancer types10, we hypothesized that basing genomic analyses on the magnitude of pathway activity may also provide important biological information and clinical insight. In view of the fundamental biological role of the cell cycle in cancer and the frequent genomic alterations of its pathway members, the cell cycle represents a compelling choice for a pathway activity-based analysis.
Here, in order to test our hypothesis, we compare the most prevalent genomic alterations in tumours with low, intermediate and high levels of cell cycle activity by integrating data from multiple genomic platforms in over nine-thousand tumours from The Cancer Genome Atlas (TCGA). Specifically, we examine gene expression levels, gene mutational frequency and chromosome arm-level alterations across pan-cancer tumours grouped into low, intermediate and high tertiles of cell cycle activity on the basis of our cell cycle score (CCS) gene signature11,12. Finally, we also determine the clinical relevance of this signature across and within cancer types using survival analyses including Kaplan-Meier graphs and multivariable Cox proportional hazards modeling adjusting for patient and tumour characteristics.
Results
Cohort clinico-pathological characteristics in relation to CCS subgroups
In line with our aim to compare genomic alterations in tumours with differing levels of cell cycle activity, we applied our CCS signature (genes are shown in Supplemental Table 1) to gene expression data from the tumours of 9,515 pan-cancer patients. Clinico-pathological characteristics of the pan-cancer cohort split by low, intermediate and high CCS tertile classification are shown in Table 1, and a CONSORT diagram showing the exclusion criteria for this study is shown in Supplemental Figure 1. Statistically significant associations were found between CCS subgroup and patient age, gender, pathological stage and radiotherapy (Table 1, Chi-squared test: P < 0.001 for all comparisons). After adjusting for cancer type, only stage and radiotherapy remained statistically significant, with CCS High tumours being more likely to be stage IV and to have received radiotherapy (data not shown).
Table 1.
Clinical characteristics of all patients split by CCS - pan-cancer
| Variables | Low, n (%) | Intermediate, n (%) | High, n (%) | p |
|---|---|---|---|---|
| Pan-cancer (n = 9515) | 3145 (33) | 3184 (33.5) | 3186 (33.5) | |
| Age | | | | |
| ≤ 54 | 1290 (41) | 876 (28) | 1061 (34) | **< 0.001** |
| 54 – 66 | 1044 (33) | 1062 (33) | 996 (31) | |
| ≥ 66 | 808 (26) | 1236 (39) | 1119 (35) | |
| Missing cases = 23 | | | | |
| Gender | | | | |
| Male | 1771 (56) | 1372 (43) | 1494 (47) | **< 0.001** |
| Female | 1374 (44) | 1812 (57) | 1692 (53) | |
| Pathological stage | | | | |
| Stage I | 859 (45) | 601 (26) | 444 (20) | **< 0.001** |
| Stage II | 480 (25) | 820 (35) | 768 (35) | |
| Stage III | 419 (22) | 639 (28) | 575 (27) | |
| Stage IV | 150 (8) | 260 (11) | 382 (18) | |
| Missing cases & excluded cases° = 3118 | | | | |
| Radiotherapy | | | | |
| No | 1954 (73) | 2047 (73) | 1820 (65) | **< 0.001** |
| Yes | 709 (27) | 770 (27) | 993 (35) | |
| Missing cases = 1222 | | | | |

° Excluded cases: Stage I/II NOS and Stage 0/IS/X. In bold: significant p < 0.05.
Broad variation in cell cycle activity across cancers and COCA subtypes
We next assessed tumour cell cycle activity by creating pan-cancer, Cluster-of-Cluster-Assignments (COCA) and iCluster boxplots using the continuous CCS. The highest levels of cell cycle activity were found in Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBC), Testicular Germ Cell tumours (TGCT), Head and Neck squamous cell carcinoma (HNSC) and Cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) tumours, and the lowest in Kidney Chromophobe (KICH), Pheochromocytoma and Paraganglioma (PCPG), Kidney renal papillary cell carcinoma (KIRP) and Prostate adenocarcinoma (PRAD) tumours (Figure 1A). Similar results were found using the COCA algorithm, a classification strategy that clusters samples by integrating data from multiple cross-platform technologies: CA17 (TGCT) and CA4 (PAN-SCC, mainly HNSC, Lung squamous cell carcinoma (LUSC) and CESC tumours) were the two subgroups with the highest cell cycle activity (Figure 1B). CA10 (Breast Invasive Carcinoma (BRCA), basal-like) and CA25 (Hematologic/lymphatic, mainly Thymoma (THYM) and DLBC tumours) also showed high cell cycle activity, whilst CA1 (CNS/Endocrine, mainly PCPG tumours), CA14 (PRAD) and CA21 (PAN-Kidney) showed the lowest levels of all COCA subtypes (Figure 1B). Analogous results were obtained using the iCluster classification strategy (Supplemental Figure 2). Heatmap analysis of the CCS genes demonstrated that tumours with low levels of cell cycle activity (and thus classified as CCS Low) show low expression of the majority of genes in all cell cycle phases (G1 to M), whilst the opposite is true for tumours with high levels of cell cycle activity (Figure 1C, compare tumours with black column-side colour to those with yellow).
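The cancer-type boxplots summarise the continuous CCS within each tumour type. As a rough illustration only (not the code used to draw Figure 1), the Python sketch below computes the quartiles and median of toy CCS values per type and ranks types from highest to lowest median. The helper name `boxplot_summary` and the data are hypothetical, and `statistics.quantiles` uses its default 'exclusive' method, so its quartiles can differ slightly from the hinges drawn by a given plotting package.

```python
from __future__ import annotations

from statistics import median, quantiles


def boxplot_summary(scores_by_type: dict[str, list[float]]) -> list[tuple[str, float, float, float]]:
    """Return (cancer type, Q1, median, Q3), sorted from highest to lowest median CCS.

    Hypothetical helper: each list needs at least two scores for quantiles().
    """
    rows = []
    for cancer_type, scores in scores_by_type.items():
        q1, _, q3 = quantiles(scores, n=4)  # quartile cut points ('exclusive' method)
        rows.append((cancer_type, q1, median(scores), q3))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Toy continuous CCS values for three hypothetical cancer types
toy_scores = {
    "type_A": [1.0, 2.0, 3.0, 4.0],
    "type_B": [9.0, 10.0, 11.0, 12.0],
    "type_C": [5.0, 6.0, 7.0, 8.0],
}
for cancer_type, q1, med, q3 in boxplot_summary(toy_scores):
    print(cancer_type, q1, med, q3)
```

Ranking by the median rather than the mean keeps the ordering robust to the long tails that boxplots are designed to show.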
Figure 1. CCS score across cancer types and COCA subtypes.

Boxplots comparing CCS across (A) pan-cancer types and (B) COCA subtypes. Numbers in parentheses represent the number of tumours in each cancer type and/or COCA subtype. (C) Heatmap of CCS genes across pan-cancer tumours. Column-side colours (horizontal, above the heatmap) represent cell cycle score, cancer types and COCA subtypes as indicated in the figure legend. Row-side colours (vertical, left-hand side of the heatmap) represent cell cycle phases.
TP53 and PIK3CA mutations display increasing frequency across cell cycle activity subgroups
To more clearly delineate the frequency of DNA mutations in relation to the magnitude of cell cycle activity, we next examined the mutational frequency of 299 well-defined oncogene and tumour suppressor driver genes within CCS subgroups. TP53 was the most frequently mutated gene in all three CCS subgroups, and its mutational frequency increased with increasing CCS activity (Figure 2A, Supplemental Table 2, Chi-squared test: P < 0.001). In CCS Low tumours, 40% of TP53 mutations were found in Brain Lower Grade Glioma (LGG), whereas in CCS High tumours TP53 mutations were most common in HNSC (18%), LUSC (17%) and BRCA (13%) (highlighted in Figure 2A). PIK3CA was the second most commonly mutated gene in CCS Intermediate and High tumours and the fifth most common in CCS Low tumours (Figure 2A). It was also more frequently mutated in CCS Intermediate and High tumours than in CCS Low tumours (Supplemental Table 2, P < 0.001). PIK3CA mutations in BRCA and Uterine Corpus Endometrial Carcinoma (UCEC) were common across all CCS subgroups and, in CCS High tumours, were additionally found in HNSC and CESC (Figure 2A). Of interest, whilst BRAF mutations were prominent in the CCS Low and Intermediate subgroups, as the third and eleventh most mutated gene respectively, they were absent from the top 15 in CCS High tumours (Figure 2A, red arrows, Supplemental Table 2, P < 0.001). These findings suggest that genes other than BRAF are more commonly mutated in tumours with high cell cycle activity. Of note, the higher number of BRAF mutations in CCS Low tumours is mainly driven by a single tumour type, Thyroid carcinoma (THCA; pink colour under the red arrow in Figure 2A), whereas in CCS Intermediate (and High) tumours BRAF mutations are mostly found in Skin Cutaneous Melanoma (SKCM) (Supplemental Table 3).
The top 50 most frequently mutated genes in all CCS subgroups are shown in Supplemental Table 4.
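Each Chi-squared comparison of mutation frequency across CCS tertiles is a test on a 2 × 3 contingency table (mutated vs. not mutated, by Low/Intermediate/High). A minimal Python sketch of that test follows, assuming per-tertile counts are available; the counts are invented for illustration and the function name `chi2_mutated_by_tertile` is hypothetical. Because a 2 × 3 table has (2 - 1)(3 - 1) = 2 degrees of freedom, the p value has the closed form exp(-χ²/2), so no statistics library is needed; `scipy.stats.chi2_contingency` should give the same result for tables of this shape.

```python
from __future__ import annotations

import math


def chi2_mutated_by_tertile(mutated: list[int], totals: list[int]) -> tuple[float, float]:
    """Pearson chi-squared test for a 2 x 3 table (mutated / not mutated by CCS tertile).

    mutated[k]: tumours carrying the mutation in tertile k (Low, Intermediate, High)
    totals[k]:  all tumours in tertile k
    """
    table = [list(mutated), [t - m for m, t in zip(mutated, totals)]]
    grand_total = sum(totals)
    row_totals = [sum(row) for row in table]
    stat = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * totals[c] / grand_total
            stat += (observed - expected) ** 2 / expected
    # df = (2 - 1) * (3 - 1) = 2, and the chi-squared survival function for 2 df is exp(-x / 2)
    return stat, math.exp(-stat / 2)


# Invented counts: mutation frequency rising from 10% to 30% across tertiles
stat, p = chi2_mutated_by_tertile([100, 200, 300], [1000, 1000, 1000])
print(round(stat, 1), p < 0.001)
```

The closed-form p value only holds for exactly 2 degrees of freedom; a table with more categories would need a general chi-squared distribution function.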
Figure 2. Top 15 most commonly mutated genes or chromosomal arm-level alterations within CCS subgroups.

Pan-cancer tumours were divided into tertiles on the basis of low, intermediate or high CCS. Within each subgroup, the top 15 (A) most frequently mutated oncogenes and tumour suppressor genes, (B) arm-level gains and (C) arm-level losses are shown. The cancer type colour key is shown at the bottom of the figure. Red arrows indicate BRAF mutations and 16p gains in the CCS Low and Intermediate subgroups.
Higher levels of chromosomal gains and losses in CCS intermediate and high tumours
We next performed the same subgroup analysis, this time focusing on chromosome arm-level gains and losses. All CCS subgroups showed a high number of gains to arms 20q, 8q and 7p and losses to arms 17p and 8p (Figure 2B and C, respectively). Moreover, these chromosomal aberrations all increased in frequency with increasing CCS activity (Supplemental Table 2, Chi-squared test: P < 0.001 for all comparisons, not adjusted for multiple testing). Overall, gains in KIRP (Figure 2B, highlighted) and losses in PCPG (Figure 2C, highlighted) were more common in CCS Low tumours than in the CCS Intermediate and High subgroups, as could be anticipated given the low cell cycle activity of these tumour types and their grouping into the CCS Low subgroup (Figure 1A). Analogous to our BRAF mutation findings, gains to 16p (Figure 2B, red arrows) were more common in the CCS Low and Intermediate subgroups than in the CCS High subgroup (Supplemental Table 2, P < 0.001). In CCS Low tumours, 29% of 16p gains were found in KIRP, whereas in CCS Intermediate and High tumours they occurred most commonly in BRCA (Supplemental Table 3). The frequencies of chromosomal arm gains and losses in all CCS subgroups are shown in Supplemental Table 5.
Next, we examined genomic alterations more broadly within CCS subgroups and found the number of gene mutations and chromosomal arm gains and losses to be greater in the CCS Intermediate and High groups than in the Low group (Figure 3A – C, Tukey HSD test; 3A, top 50 DNA mutations: P < 0.001 and P < 0.001; 3B, chromosomal gains: P = 0.018 and P < 0.001; 3C, chromosomal losses: P < 0.001 and P < 0.001; for Low vs. Intermediate and Low vs. High, respectively). Similarly, the recently derived aneuploidy score9, which counts the total number of chromosome arms with arm-level copy number changes in a given sample, increased significantly with increasing CCS activity (Figure 3D, P < 0.001 for all comparisons).
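The aneuploidy score and the group comparisons in Figure 3 can be sketched in a few lines of Python. The sketch assumes arm-level calls coded as +1 (gain), -1 (loss), 0 (neutral) or None (not called); this encoding is an assumption about the input rather than a description of the PanCanAtlas files, and the function names are hypothetical. It computes the one-way ANOVA F statistic across CCS groups; converting F into a p value and running the post hoc Tukey HSD comparisons needs a statistics package, for example `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd` (SciPy 1.8 or later) or R's `TukeyHSD`.

```python
from __future__ import annotations

from statistics import mean


def aneuploidy_score(arm_calls: dict[str, int | None]) -> int:
    """Count chromosome arms called as gained (+1) or lost (-1).

    Assumed encoding: +1 gain, -1 loss, 0 neutral, None = not called (ignored).
    """
    return sum(1 for call in arm_calls.values() if call in (-1, 1))


def one_way_anova_f(groups: list[list[float]]) -> float:
    """F statistic comparing group means (e.g. Low / Intermediate / High CCS)."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy arm-level calls for one tumour: two gains and one loss
print(aneuploidy_score({"1p": 0, "1q": 1, "8p": -1, "8q": 1, "13p": None}))
# Toy aneuploidy scores for three CCS groups
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

The ANOVA only says whether any group means differ; the Tukey HSD step is what supports the pairwise Low vs. Intermediate and Low vs. High statements.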
Figure 3. Boxplots comparing frequency of DNA alterations across CCS subgroups.

Pan-cancer tumours were divided into tertiles on the basis of low, intermediate or high CCS. Within each subgroup, the number of (A) total mutations in the top 50 most mutated oncogenes or tumour suppressor genes, (B) total chromosomal arm-level gains, (C) total chromosomal arm-level losses and (D) the aneuploidy score are shown. (E) Kaplan-Meier analysis of CCS subgroups with Progression Free Interval (PFI) as the clinical endpoint. Low/Inter/High = Low/Intermediate/High CCS subgroups. P values in boxplots (ANOVA with post hoc Tukey HSD test): NS > 0.05, * < 0.05, ** < 0.01, *** < 0.001; the p value in the Kaplan-Meier plot refers to the log-rank test.
CCS signature provides independent prognostic information at pan-cancer level
We next assessed the relationship between CCS and Progression Free Interval (PFI) using Kaplan-Meier analysis and multivariable Cox proportional hazards regression. In univariate Kaplan-Meier analysis, patients whose tumours were classified as CCS Low had a significantly longer PFI than those classified as CCS Intermediate or High (Figure 3E, log-rank test: P < 0.001). This association remained significant after adjusting for tumour type, age, gender, pathological stage and radiotherapy in Cox proportional hazards analysis (Table 2; CCS Intermediate: HR 1.37, 95% CI 1.17 – 1.60; CCS High: HR 1.54, 95% CI 1.29 – 1.84; tumour type not shown). The upper age tertile (≥ 66) also remained statistically significant in the same model (HR 1.19, 95% CI 1.05 – 1.35 vs. ≤ 54), as did all pathological stages vs. the Stage I reference group. As many cancers contain additional molecular subgroups (e.g. breast cancer), we also performed a similar analysis adjusting for COCA subtypes rather than pan-cancer types and found comparable independent prognostic capacity for the CCS (data not shown).
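For readers reproducing the univariate survival comparison, the sketch below implements a Kaplan-Meier estimator and a three-group log-rank test in plain Python. It is illustrative only, not the study's code: the function names and toy data are hypothetical, and events are coded 1 for a PFI event and 0 for censoring. The covariate-adjusted Cox model in Table 2 would in practice be fitted with a survival package such as R's `survival::coxph` or lifelines' `CoxPHFitter`. With three groups the log-rank statistic has 2 degrees of freedom, so its p value is again exp(-χ²/2).

```python
from __future__ import annotations

import math


def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Kaplan-Meier survival estimate at each distinct event time (1 = event, 0 = censored)."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, n_events, n_leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # pool tied times
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            surv *= 1 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving  # events and censorings both leave the risk set
    return curve


def logrank_three_groups(times: list[float], events: list[int], groups: list[str]) -> tuple[float, float]:
    """Log-rank test across exactly three groups; returns (chi-squared, p) with 2 df."""
    labels = sorted(set(groups))
    if len(labels) != 3:
        raise ValueError("expected exactly three groups")
    o_minus_e = [0.0, 0.0, 0.0]
    cov = [[0.0] * 3 for _ in range(3)]
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        n_k = [sum(1 for ti, g in zip(times, groups) if g == lab and ti >= t) for lab in labels]
        d_k = [sum(1 for ti, e, g in zip(times, events, groups) if g == lab and ti == t and e)
               for lab in labels]
        n, d = sum(n_k), sum(d_k)
        for a in range(3):
            o_minus_e[a] += d_k[a] - n_k[a] * d / n  # observed minus expected events
            if n > 1:  # hypergeometric (co)variance of the event counts at time t
                for b in range(3):
                    cov[a][b] += d * (n - d) / (n * n * (n - 1)) * n_k[a] * ((n if a == b else 0) - n_k[b])
    # The three O - E values sum to zero, so use the first two groups and a 2 x 2 inverse
    z0, z1 = o_minus_e[0], o_minus_e[1]
    v00, v01, v10, v11 = cov[0][0], cov[0][1], cov[1][0], cov[1][1]
    det = v00 * v11 - v01 * v10
    stat = (z0 * (v11 * z0 - v01 * z1) + z1 * (-v10 * z0 + v00 * z1)) / det
    return stat, math.exp(-stat / 2)  # chi-squared survival function for 2 df


# Toy data: High-CCS patients progress first, Low-CCS patients last
times = list(range(1, 16))
events = [1] * 15
groups = ["High"] * 5 + ["Intermediate"] * 5 + ["Low"] * 5
print(kaplan_meier(times[:5], events[:5]))
print(logrank_three_groups(times, events, groups))
```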
Table 2.
Multivariable evaluation of prognostic markers in patients characterized by Cell Cycle Score

Pan-cancer (n = 5421)*

| Variables | N (%) | HR | 95% CI | p |
|---|---|---|---|---|
| Age | | | | |
| ≤ 54 | 1679 (31) | Ref | - | - |
| 54 – 66 | 1761 (32) | 1.04 | 0.91 – 1.18 | 0.551 |
| ≥ 66 | 1981 (37) | 1.19 | 1.05 – 1.35 | **0.008** |
| Missing cases = 23 | | | | |
| Gender | | | | |
| Male | 2757 (51) | Ref | - | - |
| Female | 2664 (49) | 0.96 | 0.87 – 1.07 | 0.483 |
| Pathological stage | | | | |
| Stage I | 1561 (29) | Ref | - | - |
| Stage II | 1852 (34) | 1.60 | 1.38 – 1.86 | **< 0.001** |
| Stage III | 1378 (25) | 2.41 | 2.08 – 2.79 | **< 0.001** |
| Stage IV | 630 (12) | 5.04 | 4.21 – 6.03 | **< 0.001** |
| Missing cases = 3118 | | | | |
| Radiotherapy | | | | |
| No | 3997 (74) | Ref | - | - |
| Yes | 1424 (26) | 0.97 | 0.84 – 1.11 | 0.658 |
| Missing cases = 1222 | | | | |
| Cell cycle score | | | | |
| Low | 1505 (28) | Ref | - | - |
| Intermediate | 2013 (37) | 1.37 | 1.17 – 1.60 | **< 0.001** |
| High | 1903 (35) | 1.54 | 1.29 – 1.84 | **< 0.001** |

*Adjusted for cancer type. Ref: reference group; N: number of patients; HR: hazard ratio; CI: confidence interval. In bold: significant p < 0.05.
To examine more closely the individual cancer types that the signature splits into two or three CCS subgroups, we again performed Kaplan-Meier and Cox proportional hazards modelling, this time within individual cancers. CCS provided significant independent prognostic information in four cancer types: Kidney renal clear cell carcinoma (KIRC) (P = 0.042), LGG (P < 0.001), Sarcoma (SARC) (P = 0.001) and Uveal Melanoma (UVM) (P = 0.013) (Supplemental Figure 3, alphabetical ordering, adjusted for multiple testing). Finally, as the CCS subgroups are based on a tertile split of cell cycle activity on a pan-cancer level, we hypothesized that deriving subgroups in this manner may provide superior prognostic information to a simple tertile split within (intra) each cancer type. To test this hypothesis, we compared our pan-cancer CCS tertile subgroups to intra-cancer CCS tertile subgroups. Whilst both cut-offs provided significant prognostic information in these four cancer types (compare Kaplan-Meier curves for pan-cancer CCS to intra-cancer CCS, Supplemental Figure 4), the pan-cancer cut-off provided more prognostic information, as measured by the likelihood ratio (LR) test, in KIRC (LR = 24.7), LGG (LR = 31.1), SARC (LR = 18.5) and UVM (LR = 17.1) (Table 3, compare the pan-cancer columns to the intra-cancer columns). These findings suggest that deriving transcriptional biomarker cut-points on a pan-cancer level may be advantageous relative to deriving them within a single cancer type. For completeness, hazard ratios and 95% confidence intervals for pan-cancer and intra-cancer tertile subgroups in individual cancers are shown in Supplemental Table 6.
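The difference between the two cut-off strategies is easy to see on toy data. In the Python sketch below (hypothetical names and values), pan-cancer tertile cut-points are taken from all tumours pooled, whereas intra-cancer cut-points come from one cancer type alone. The intra-cancer split always yields roughly equal thirds, while the pan-cancer split places a tumour type wherever its scores fall in the pooled distribution. The sketch does not reproduce the likelihood ratio comparison in Table 3, which would require fitting a Cox model for each labelling.

```python
from __future__ import annotations

from collections import Counter
from statistics import quantiles


def label(score: float, cuts: tuple[float, float]) -> str:
    """Assign a CCS class from (lower, upper) tertile cut-points; ties go to the lower class."""
    lower, upper = cuts
    if score <= lower:
        return "Low"
    return "Intermediate" if score <= upper else "High"


def compare_cutpoints(scores_by_type: dict[str, list[float]], cancer_type: str) -> tuple[Counter, Counter]:
    """Class counts for one cancer type under pan-cancer vs. intra-cancer tertile cut-points."""
    pooled = [s for scores in scores_by_type.values() for s in scores]
    pan_cuts = tuple(quantiles(pooled, n=3))  # cut-points from all cancers together
    target = scores_by_type[cancer_type]
    intra_cuts = tuple(quantiles(target, n=3))  # cut-points from this cancer type only
    pan_counts = Counter(label(s, pan_cuts) for s in target)
    intra_counts = Counter(label(s, intra_cuts) for s in target)
    return pan_counts, intra_counts


toy = {
    "type_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "type_B": [5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "type_C": [20.0, 21.0, 22.0, 23.0, 24.0, 25.0],
}
pan_counts, intra_counts = compare_cutpoints(toy, "type_B")
print(dict(pan_counts), dict(intra_counts))
```

Here type_B is split across two pan-cancer classes, mirroring the situation in which the pan-cancer CCS divides a single cancer type into two or three subgroups.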
Table 3.
Prognostic value of the Cell Cycle Score (CCS) based on the likelihood ratio chi-squared statistic (LR-χ²) and concordance index (C-index), pan-cancer and intra-cancer

| Cancer type (univariate model) | Total | Events | Pan-cancer C-index | Pan-cancer LR-χ² | Pan-cancer p | Intra-cancer C-index | Intra-cancer LR-χ² | Intra-cancer p |
|---|---|---|---|---|---|---|---|---|
| KIRC | 480 | 148 | 0.589 | 24.7 | **< 0.001** | 0.567 | 9.9 | **0.001** |
| LGG | 506 | 192 | 0.609 | 31.1 | **< 0.001** | 0.627 | 21.9 | **< 0.001** |
| SARC | 242 | 124 | 0.610 | 18.5 | **< 0.001** | 0.614 | 14.3 | **< 0.001** |
| UVM | 79 | 24 | 0.732 | 17.1 | **< 0.001** | 0.707 | 11.5 | **< 0.001** |

KIRC: Kidney renal clear cell carcinoma; LGG: Brain Lower Grade Glioma; SARC: Sarcoma; UVM: Uveal Melanoma; Events: Progression Free Interval (PFI) events, in which patients had a new tumour event (progression of disease, local recurrence, distant metastasis or new primary tumour at any site) or died with the cancer without a new tumour event; CCS: Cell cycle score; LR-χ²: likelihood ratio chi-squared statistic; C-index: concordance index. In bold: significant p < 0.05.
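The C-index columns in Table 3 measure how well the CCS classes rank patients by time to a PFI event. A minimal sketch of Harrell's C-index follows, assuming one risk value per patient (for example hypothetical tertile codes Low = 0, Intermediate = 1, High = 2). In this sketch, pairs with tied times are skipped and tied risks earn half credit, which is one common convention among several. With only three risk levels many pairs are tied, which may partly explain why a tertile classifier yields modest C-index values even when the LR test is highly significant. As a cross-check, lifelines provides `lifelines.utils.concordance_index`, which expects scores where higher values mean longer survival, so risks would need to be negated.

```python
from __future__ import annotations


def harrell_c_index(times: list[float], events: list[int], risk: list[float]) -> float:
    """Harrell's C: share of usable pairs in which the higher-risk patient fails first."""
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # only an observed event can anchor a usable pair
        for j in range(len(times)):
            if times[j] > times[i]:  # patient j was still event-free when i had its event
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk (e.g. same CCS tertile): half credit
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy PFI data with hypothetical tertile codes (0 = Low, 1 = Intermediate, 2 = High)
times = [2, 4, 6, 8, 10, 12]
events = [1, 1, 0, 1, 1, 0]
tertile_code = [2, 2, 1, 1, 0, 0]
print(harrell_c_index(times, events, tertile_code))
```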
Discussion
The present study integrates gene expression, DNA mutation, DNA copy number and clinico-pathological data from 9 515 pan-cancer patients in order to better understand the DNA-level alterations present in tumours with low, intermediate and high cell cycle activity. Our main findings show, first, that cell cycle activity varies broadly across and within cancer types; second, that TP53, PIK3CA and chromosomal alterations (including gains to 20q, 8q and 7p and losses to arms 17p and 8p) occur with increasing frequency in tumours with increasing cell cycle activity; third, that whilst similar mutations and arm-level alterations are generally present in tumours with low, intermediate and high cell cycle activity, mutations in BRAF and gains in 16p were less frequent in tumours with high cell cycle activity; and fourth, that deriving cut-points for biomarkers on a pan-cancer level may provide more prognostic information than deriving them within specific cancer types. These analyses are the first to provide broad insight into the genetic alterations occurring within tumours grouped on the basis of cell cycle activity, advancing our understanding of a pathway that is frequently dysregulated in human malignancies.
In pan-cancer analyses, TP53, PIK3CA, KRAS, PTEN and ARID1A have all previously been shown to be mutated in over 15 different cancer types8. These genes also featured heavily in our mutational analysis, with TP53 and PIK3CA showing the highest mutational frequencies across CCS subgroups. This implies that mutations in these genes are found in tumours with a broad range of cell cycle activity and are not confined to highly cycling cancers, despite their very clear links to cell cycle progression13,14. Whilst we found ARID1A to be mutated in all CCS subgroups, BRAF was notable for appearing in the top 15 of the CCS Low and Intermediate subgroups only, implying that other genes, such as TP53 and PIK3CA, are more commonly mutated in tumours with high cell cycle activity. This result is partially driven by the cancer types found in each of the CCS subgroups; for example, BRAF mutations are predominantly found in THCA in the CCS Low subgroup and in SKCM in CCS Intermediate and High tumours. It is important to highlight that the CCS tumour subgroups were derived on the basis of biological cell cycle pathway activity alone. Our aim was to map and characterize the DNA aberrations within tumours on the basis of pathway magnitude; as such, even if a specific aberration is enriched owing to a certain tumour type, it is still one characteristic of tumours with low levels of cell cycle activity, albeit one associated with a specific cancer type.
It has recently been demonstrated that tumour aneuploidy is inversely correlated with immune signalling genes and positively correlated with cell cycle and pro-proliferation pathways9. Our findings are in line with this, showing a stepwise increase in aneuploidy score with increasing CCS activity levels. Related to this, whilst most of the predominant chromosome arm-level alterations we observed overlapped with those from the pan-cancer publication9, our within-subgroup analysis yielded some novel findings. In particular, and analogous to our mutational results, we found that gains to 16p were more frequent in the CCS Low and Intermediate subgroups than in the CCS High subgroup (Figure 2B, red arrows). This raises the possibility that this chromosomal alteration could be used as a novel clinical biomarker for more indolent tumours in cancers of unknown primary origin.
We found that our cell cycle score gene expression signature, which has previously been applied in a breast cancer setting11,12, provided independent prognostic information on a pan-cancer level. This signature was originally conceived as a simple biological measure of cell cycle activity, in response to the dependence of more established commercial gene expression signatures on multiple cell cycle/cell proliferation genes for their prognostic capacity15. The signature genes were chosen through the aggregation of three different biological pathway databases16–18, meaning that the signature is not cancer-specific and can be applied to any tissue sample. In keeping with its descriptive nature, we have not attempted to maximise the signature's prognostic capacity by selecting genes that are the strongest predictors of the study's clinical endpoint, progression free interval. Despite this, the signature performed well in both Kaplan-Meier and multivariable analyses, likely owing to its ability to select for faster growing, more aggressive tumours. As we are applying the CCS to a pan-cancer cohort, the prognostic capacity of the signature should be viewed in the context of all cancers in general, not necessarily within specific cancer types. For example, when we split the continuous CCS into tertiles, the majority of prostate cancers (PRAD) are classified into the CCS Low subgroup and thus as “good” prognosis based on our analyses. Conversely, glioblastomas (GBM) were predominantly classified into the CCS Intermediate and High subgroups and thus as “poor” prognosis. In line with this, the median time to a PFI event for PRAD patients in the pan-cancer cohort is 18.4 months, whereas for GBM it is 6.1 months19. As such, the prognostic capacity of the CCS signature when applied to all tumours cannot be judged on the basis of its strength within a single cancer type, but only in the context of all cancers.
Interestingly, however, some cancers were split into two or three different CCS subgroups and upon further examination of these cancer types we found that deriving CCS tertiles of activity on a pan-cancer level may provide more prognostic information than deriving them within a specific cancer type. This could be of utility in a clinical setting where a gene transcript is being used as a biomarker for treatment response, such as the recent example of cyclin E expression and Palbociclib efficacy in metastatic breast cancer patients20. In this instance it is conceivable that re-defining a cyclin E cut-point on the basis of pan-cancer expression levels of the gene may more clearly delineate which patients are likely to be resistant to the drug.
When applying a gene expression signature to any dataset, a choice regarding the best cut-offs for sample subgrouping is typically inherent to the analysis. Here, we chose to divide the continuous CCS into three equal groups, resulting in low, intermediate and high expression subgroups. This decision was largely based on our experience with other gene expression signatures in the breast cancer field, where three subgroups are common, such as the 21-gene recurrence score21 and the biology-based gene expression modules22. Moreover, given that the CCS is a continuum of values (as shown in Figure 1) without any clear bimodal distribution, it would not make sense to force a simple binary high/low grouping on the data. Instead, we opted for tertiles that reflect this continuum, with high and low expression groups and a third, intermediate subgroup covering the range of samples transitioning from low to high expression. Another important point to consider is that we are applying the CCS signature to data extracted from an entire tumour and are therefore obtaining an average gene signal across the whole sample. This means that heterogeneity, both in the cellular composition of the tumour and in the expression of the CCS across different tumour regions, is not taken into account. Newer technologies, including single-cell sequencing and spatial transcriptomics, can be used to examine tumour heterogeneity at single-cell resolution23. However, as this type of data is not currently available for the tumours of the pan-cancer cohort, we cannot assess the intratumour variation of the CCS in this material. A second, more traditional way to take heterogeneity into account is through the examination of whole tumour sections under the microscope.
Given that the CCS is interlinked with cell proliferation, which in turn is an important component of tumour grading (in the form of mitotic count), one could speculate as to the merits of grading in addition to or in place of applying the CCS. Unfortunately, tumour grade information was missing for over 50% of the tumours included in this study, so grade was not included in multivariable analyses. More importantly, grading systems differ greatly between cancer types; for example, the three-level Nottingham histologic grade is used in breast cancer, whilst Gleason grading with up to 5 different groups is used in prostate cancer. This renders the application of grade at a pan-cancer level currently unfeasible. Relatedly, we have previously demonstrated the propensity of the CCS and other gene expression signatures to outperform visual assessment of the proliferation marker Ki67 on whole tumour sections11,24.
There are three main strengths to our study: first, we utilise a novel methodology to examine the DNA alterations in subgroups of tumours based on the magnitude of cell cycle activity, both across and within cancer types; second, our analysis provides an expansive, descriptive overview of the frequency of DNA mutations and chromosomal gains and losses in subgroups of low, intermediate and high cell cycle activity; and third, we demonstrate the translational relevance of our work by relating our CCS signature to a clinical survival endpoint, PFI. The limitations are as follows: first, our analysis focuses on DNA and RNA technologies only, with no protein or methylation array data included; second, we chose to study broad chromosomal gains and losses rather than gene-centric copy number changes, to avoid a situation where the most altered genes within a given CCS subgroup would all come from the same chromosomal location; third, we did not adjust the CCS for every molecular subgroup within every cancer type in multivariable analysis, a general limitation of any pan-cancer study, although we did perform additional analyses adjusting for COCA subtypes, which capture more molecular heterogeneity than cancer types alone, and found analogous results; and fourth, no external validation was performed for the CCS signature, although we are not aware of any other pan-cancer dataset in which it could be validated and, more importantly, we are not currently proposing it for use in a clinical setting, but rather as a general tool to examine the cell cycle activity of a given tumour.
In summary, this study describes the DNA mutations and chromosomal alterations found in tumours with low, intermediate and high levels of cell cycle activity and also demonstrates the ability of a simple cell cycle gene expression signature to provide independent prognostic information at a pan-cancer level.
Materials and Methods
Study population and specimens
The Pan-Cancer Atlas (PanCanAtlas) project compared and contrasted genomic and cellular differences between tumour types profiled as part of TCGA. The project consists of 11 069 patients with primary tumours from 32 different cancer types, including Adrenocortical carcinoma (ACC), Bladder Urothelial Carcinoma (BLCA), Brain Lower Grade Glioma (LGG), Breast Invasive Carcinoma (BRCA), Cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), Cholangiocarcinoma (CHOL), Colon adenocarcinoma (COAD), Esophageal carcinoma (ESCA), Glioblastoma multiforme (GBM), Head and Neck squamous cell carcinoma (HNSC), Kidney Chromophobe (KICH), Kidney renal clear cell carcinoma (KIRC), Kidney renal papillary cell carcinoma (KIRP), Liver hepatocellular carcinoma (LIHC), Lung adenocarcinoma (LUAD), Lung squamous cell carcinoma (LUSC), Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBC), Ovarian serous cystadenocarcinoma (OV), Pancreatic adenocarcinoma (PAAD), Pheochromocytoma and Paraganglioma (PCPG), Prostate adenocarcinoma (PRAD), Rectum adenocarcinoma (READ), Sarcoma (SARC), Skin Cutaneous Melanoma (SKCM), Stomach adenocarcinoma (STAD), Testicular Germ Cell tumours (TGCT), Thymoma (THYM), Thyroid carcinoma (THCA), Uterine Carcinosarcoma (UCS), Uterine Corpus Endometrial Carcinoma (UCEC) and Uveal Melanoma (UVM). Of the original 11 069 patients, 9 515 were included in our study; reasons for exclusion were missing or unmatched gene expression data (n = 795), copy number data (n = 498) or clinico-pathological information (n = 261). A CONSORT diagram showing the exclusion criteria for this study is shown in Supplemental Figure 1. This cohort was chosen owing to its large sample size, ensuring sufficient power for the statistical testing being performed. All clinical, gene expression, mutation and chromosome arm-level data from the PanCanAtlas study were taken from the publicly available database of the National Institutes of Health (NIH) (https://gdc.cancer.gov/about-data/publications/pancanatlas).
mRNA data, clustering and the Cell Cycle Score (CCS)
Fully processed, batch-corrected RNA-sequencing data were accessed from the NIH Genomic Data Commons (GDC) database (https://gdc.cancer.gov). The PanCanAtlas investigators performed all data quality control, normalisation and gene-level counting, as described in their original publication25. iCluster assignments were retrieved from the same publication. COCA classifications were performed by the PanCanAtlas investigators as described in Hoadley et al.7, resulting in 32 different tumour clusters. Clusters with fewer than 20 tumours were excluded from further analysis.
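The following is a minimal sketch of the cluster-size filter described above. It assumes cluster assignments are available as a plain mapping from sample ID to cluster label. The function name and toy data are hypothetical and are not part of the PanCanAtlas resources.

```python
from collections import Counter

MIN_CLUSTER_SIZE = 20  # clusters with fewer than 20 tumours are excluded


def filter_small_clusters(assignments, min_size=MIN_CLUSTER_SIZE):
    """Drop samples belonging to clusters with fewer than `min_size` tumours.

    assignments: dict mapping sample ID -> cluster label (hypothetical format).
    Returns the filtered mapping and the size of every original cluster.
    """
    sizes = Counter(assignments.values())
    kept = {s: c for s, c in assignments.items() if sizes[c] >= min_size}
    return kept, sizes


if __name__ == "__main__":
    # Toy data: cluster "C1" has 25 tumours, "C2" only 5.
    toy = {f"A{i}": "C1" for i in range(25)}
    toy.update({f"B{i}": "C2" for i in range(5)})
    kept, sizes = filter_small_clusters(toy)
    print(dict(sizes), sorted(set(kept.values())))
```

A cluster of exactly 20 tumours is kept, because the text excludes only clusters with fewer than 20.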
The Cell Cycle Score (CCS) signature comprises 463 genes, originally identified by aggregating three pathway-related databases16–18 (KEGG, HGNC and Cyclebase 3.0). Because these databases describe general biology rather than cancer specifically, the CCS genes can be regarded as representative of general cell cycle activity and can be applied to any tissue sample, whether normal or tumour. Although the signature has previously been applied in a breast cancer setting, the gene list has not been reduced or altered on the basis of those studies. For clarity and reproducibility, all 463 signature genes are listed in Supplemental Table 1. The table indicates which genes were present in previous breast cancer studies and which are present in the current pan-cancer dataset. Of the 463 original CCS signature genes, 441 were present in, and extracted from, the pan-cancer dataset. The expression values of these genes were summed for each tumour, giving a single score of cell cycle activity per sample. This continuous variable was then divided into tertiles to classify tumours as having Low, Intermediate or High cell cycle activity on a broad, pan-cancer level. Some cancer types showed independent prognostic information from the pan-cancer CCS in multivariable Cox proportional hazards models (KIRC, LGG, SARC and UVM). These were also assessed using within-cancer (intra-cancer) CCS tertiles.
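The score construction can be illustrated with a short sketch. It assumes each tumour's expression profile is a mapping from gene symbol to a normalised expression value. It also assumes tertile cut points are obtained by linear interpolation, using Python's `statistics.quantiles` with `method="inclusive"`, which follows the same rule as R's default `quantile` type 7. The study does not state its exact cut-point rule. The gene names and cancer-type labels below are placeholders, not CCS genes or TCGA types.

```python
import statistics
from collections import defaultdict


def cell_cycle_score(expression, signature):
    """Sum expression over the signature genes present in one tumour's profile.

    expression: dict gene symbol -> normalised expression value (assumed format).
    Signature genes missing from the profile are skipped, mirroring the use of
    only those CCS genes found in the pan-cancer data.
    """
    return sum(expression[g] for g in signature if g in expression)


def tertile_labels(scores):
    """Label each sample Low / Intermediate / High by tertiles of its score.

    Cut points use linear interpolation (method="inclusive", as R's default
    type 7); the rule used in the original study is an assumption here.
    """
    q1, q2 = statistics.quantiles(scores.values(), n=3, method="inclusive")
    labels = {}
    for sample, s in scores.items():
        if s <= q1:
            labels[sample] = "Low"
        elif s <= q2:
            labels[sample] = "Intermediate"
        else:
            labels[sample] = "High"
    return labels


def intra_cancer_tertiles(scores, cancer_type):
    """Tertiles computed separately within each cancer type (intra-cancer CCS)."""
    by_type = defaultdict(dict)
    for sample, s in scores.items():
        by_type[cancer_type[sample]][sample] = s
    labels = {}
    for group_scores in by_type.values():
        labels.update(tertile_labels(group_scores))
    return labels


if __name__ == "__main__":
    signature = ["geneA", "geneB", "geneC"]  # placeholder gene names
    # geneC is absent from every profile, so it is simply skipped.
    tumours = {f"T{i}": {"geneA": float(i), "geneB": 0.0} for i in range(1, 7)}
    scores = {t: cell_cycle_score(e, signature) for t, e in tumours.items()}
    cancer_type = {"T1": "X", "T2": "X", "T3": "X", "T4": "Y", "T5": "Y", "T6": "Y"}
    print(tertile_labels(scores))
    print(intra_cancer_tertiles(scores, cancer_type))
```

Intra-cancer cut points can place a tumour in a different group than the pan-cancer cut points do. In the toy data, T4 is Intermediate on the pan-cancer scale but Low within its own hypothetical cancer type "Y".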
Mutational analysis
Fully processed mutation data derived from exome sequencing were taken from the GDC database as a Mutation Annotation Format (MAF) file (https://gdc.cancer.gov). The PanCanAtlas investigators performed all data quality control, processing and mutation calling, as described in their original publication8. We limited our analysis to the 299 cancer driver genes manually annotated by experts in the pan-cancer field8. Mutation counts within CCS subgroups were calculated with the maftools package in the R statistical environment. For each tumour, a gene was scored as mutated (1) or not mutated (0), regardless of the number of mutations within that gene.
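The 1/0 scoring rule above amounts to collapsing a MAF to the set of mutated driver genes in each tumour. The minimal sketch below assumes MAF rows are read as dictionaries keyed by the standard `Tumor_Sample_Barcode` and `Hugo_Symbol` columns. It is a plain-Python illustration, not the maftools implementation. The driver set in the example is a toy stand-in for the 299-gene list.

```python
from collections import defaultdict


def mutation_calls(maf_rows, driver_genes):
    """Collapse MAF rows to the set of mutated driver genes per tumour.

    maf_rows: iterable of dicts with MAF columns 'Tumor_Sample_Barcode' and
    'Hugo_Symbol'. A gene counts once per tumour however many mutations it has.
    """
    mutated = defaultdict(set)
    for row in maf_rows:
        gene = row["Hugo_Symbol"]
        if gene in driver_genes:
            mutated[row["Tumor_Sample_Barcode"]].add(gene)
    return mutated


def mutation_frequency(mutated, ccs_group, gene):
    """Fraction of tumours in each CCS group with at least one mutation in `gene`.

    ccs_group: dict sample -> CCS group; it defines the denominators, so
    tumours absent from the MAF count as not mutated.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for sample, group in ccs_group.items():
        totals[group] += 1
        if gene in mutated.get(sample, set()):
            hits[group] += 1
    return {g: hits[g] / totals[g] for g in totals}


if __name__ == "__main__":
    drivers = {"TP53", "PIK3CA", "BRAF"}  # toy stand-in for the 299-gene list
    rows = [
        {"Tumor_Sample_Barcode": "S1", "Hugo_Symbol": "TP53"},
        {"Tumor_Sample_Barcode": "S1", "Hugo_Symbol": "TP53"},  # second hit, same gene
        {"Tumor_Sample_Barcode": "S1", "Hugo_Symbol": "TTN"},   # not in driver set
        {"Tumor_Sample_Barcode": "S2", "Hugo_Symbol": "TP53"},
    ]
    groups = {"S1": "High", "S2": "High", "S3": "Low", "S4": "Low"}
    calls = mutation_calls(rows, drivers)
    print(mutation_frequency(calls, groups, "TP53"))
```

The denominators come from the CCS assignments rather than from the MAF. Tumours with no driver mutation therefore still count towards their group.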
Chromosomal arm-level alterations and Aneuploidy score
Fully processed chromosome arm-level alteration data and tumour aneuploidy scores, derived from Affymetrix SNP 6.0 arrays, were accessed from the GDC database (https://gdc.cancer.gov). The PanCanAtlas investigators performed all data quality control and processing, as described in the original publication26. Chromosome arm-level alterations are coded as +1, 0 and −1 for arm gain, no arm-level aneuploidy and arm loss, respectively9.
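With the +1/0/−1 coding, the frequency of a given arm event (for example, a 16p gain) in each CCS subgroup is a simple proportion. The sketch below assumes calls are stored per tumour as a mapping from arm name to code, with `None` for arms that were not called. Excluding uncalled arms from the denominator is an assumption of this sketch. The `altered_arm_count` helper is hypothetical: it counts gained or lost arms, in the spirit of the arm-level aneuploidy score of Taylor et al.9, whereas the study itself used the precomputed scores.

```python
from collections import defaultdict

GAIN, NEUTRAL, LOSS = 1, 0, -1  # arm-level coding used in the text


def arm_event_frequency(arm_calls, ccs_group, arm, event=GAIN):
    """Fraction of tumours per CCS group with `event` on chromosome `arm`.

    arm_calls: dict sample -> {arm: 1, 0, -1 or None}; None means not called.
    Tumours without a call for `arm` are left out of that group's denominator
    (an assumption of this sketch).
    """
    called = defaultdict(int)
    hits = defaultdict(int)
    for sample, group in ccs_group.items():
        call = arm_calls.get(sample, {}).get(arm)
        if call is None:
            continue
        called[group] += 1
        if call == event:
            hits[group] += 1
    return {g: hits[g] / called[g] for g in called}


def altered_arm_count(calls):
    """Hypothetical helper: number of arms called as gained or lost."""
    return sum(1 for c in calls.values() if c in (GAIN, LOSS))


if __name__ == "__main__":
    arm_calls = {
        "S1": {"16p": GAIN, "8q": GAIN, "17p": LOSS},
        "S2": {"16p": NEUTRAL, "8q": None},
        "S3": {"16p": GAIN},
        "S4": {"16p": LOSS},
    }
    groups = {"S1": "Low", "S2": "Low", "S3": "High", "S4": "High"}
    print(arm_event_frequency(arm_calls, groups, "16p", GAIN))
    print(arm_event_frequency(arm_calls, groups, "16p", LOSS))
    print({s: altered_arm_count(c) for s, c in arm_calls.items()})
```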
Statistical Analysis
χ2 tests were used to assess associations between the clinico-pathological characteristics of tumour samples and CCS subgroups. Clinical and survival data were retrieved from the GDC database (https://gdc.cancer.gov/about-data/publications/pancanatlas). Univariate Kaplan-Meier analysis of the CCS was performed in all pan-cancer tumours together and in individual cancer types. As previously recommended19, the clinical endpoint was PFI censored at 15 years. PFI is defined as the time during or after treatment in which the disease does not progress, ending at the first loco-regional recurrence or second malignancy, or at death from any cause. Multivariable Cox proportional hazards models were used to determine the independent prognostic capacity of the CCS subgroups in all pan-cancer tumours together and in individual cancer types. These models adjusted for cancer type, age (grouped in tertiles), gender, radiation therapy and pathological stage. Tumour grade was missing for over 50% of pan-cancer samples and was therefore not included in multivariable analyses. To compare the prognostic capacity of pan-cancer and intra-cancer CCS cutoffs, we used the likelihood ratio (LR) statistic, which can be interpreted as a goodness-of-fit test. LR and concordance index (c-index) values were extracted from the output of the coxph function of the survival package in R. Genomic alterations were compared between the three CCS subgroups using ANOVA with post-hoc Tukey HSD tests. These alterations included gene mutation frequencies, chromosome arm gains and losses, and the aneuploidy score. All tests were two-sided and p < 0.05 was considered statistically significant (* p < 0.05, ** p < 0.01, *** p < 0.001). All p values in the Kaplan-Meier curves were corrected for multiple comparisons using the Benjamini-Hochberg method. The data met the assumptions of the above tests.
Continuous CCS was normally distributed and variation was <1% between groups. All statistical analyses were performed using R statistical software27 (version 3.5.3).
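Two of the steps above can be sketched in plain Python: censoring follow-up at 15 years before Kaplan-Meier estimation, and the Benjamini-Hochberg adjustment of p values. This is an illustration under stated assumptions, not the survival-package code used in the study; in R, the corresponding steps would be handled by `survfit` and `p.adjust(..., method = "BH")`. Time is assumed to be in years, with events coded 1 (PFI event) or 0 (censored).

```python
def censor_at(time, event, horizon=15.0):
    """Administrative censoring: follow-up beyond `horizon` is cut at the horizon.

    An event observed after the horizon becomes a censored observation (event=0).
    """
    if time > horizon:
        return horizon, 0
    return time, event


def kaplan_meier(times, events):
    """Product-limit estimator returning [(event time, survival probability)].

    events: 1 = PFI event, 0 = censored. Subjects censored at an event time are
    still counted in the risk set at that time (the usual convention).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving
    return curve


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # from the largest p value to the smallest
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [(1.0, 1), (2.0, 1), (2.0, 0), (3.0, 1), (18.0, 1)]
    censored = [censor_at(t, e) for t, e in raw]
    times, events = zip(*censored)
    print(kaplan_meier(times, events))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Patients followed beyond the horizon are censored at 15 years rather than dropped. Their follow-up up to that point therefore stays in the risk set, as a fixed-horizon endpoint requires.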
Supplementary Material
Competing Interests:
Funding:
This work was supported by the Iris, Stig och Gerry Castenbäcks Stiftelse for cancer research (to N.P.T.), the King Gustaf V Jubilee Foundation (N.P.T. and J.B.), BRECT, the Swedish Cancer Society, the Cancer Society in Stockholm Personalised Cancer Medicine (PCM), the King Gustaf V Jubilee Foundation, the Swedish Breast Cancer Association (BRO) and the Swedish Research Council (J. Bergh). C.M.P. was supported by funds from the NCI Breast SPORE program (P50-CA58223–09A1), by R01-CA195754–01, by Susan G. Komen (SAC-160074) and by the Breast Cancer Research Foundation.
Conflict of interest statement: J.B. has no conflict of interest related to the present work. Unrelated to the present work, he received research funding from Merck paid to Karolinska Institutet and from Amgen, Bayer, Pfizer, Roche and Sanofi-Aventis paid to Karolinska University Hospital. No personal payments. Payment from UpToDate for a chapter on breast cancer prediction was paid to Asklepios Medicine AB. CMP is an equity stockholder in, and consultant for, BioClassifier LLC. CMP and JP are also listed as inventors on patents on the Breast PAM50 Subtyping assay. All remaining authors have declared no conflicts of interest.
Footnotes
Publisher's Disclaimer: This Author Accepted Manuscript is a PDF file of an unedited peer-reviewed manuscript that has been accepted for publication but has not been copyedited or corrected. The official version of record that is published in the journal is kept up to date and so may therefore differ from this version.
Data deposition and code
The data used in this study are publicly available on the NIH website (https://gdc.cancer.gov/about-data/publications/pancanatlas). R code to reproduce the main and supplemental results of this study is publicly available at https://github.com/arianlundberg/PANCAN.analysis.
References
- 1. Hartwell LH, Culotti J, Pringle JR, Reid BJ. Genetic control of the cell division cycle in yeast. Science 1974; 183: 46–51.
- 2. Nurse P, Thuriaux P, Nasmyth K. Genetic control of the cell division cycle in the fission yeast Schizosaccharomyces pombe. Mol Gen Genet MGG 1976; 146: 167–178.
- 3. Lee MG, Nurse P. Complementation used to clone a human homologue of the fission yeast cell cycle control gene cdc2. Nature 1987; 327: 31–35.
- 4. Evans T, Rosenthal ET, Youngblom J, Distel D, Hunt T. Cyclin: A protein specified by maternal mRNA in sea urchin eggs that is destroyed at each cleavage division. Cell 1983; 33: 389–396.
- 5. Hartwell LH, Kastan MB. Cell cycle control and cancer. Science 1994; 266: 1821–1828.
- 6. Hanahan D, Weinberg RA. Hallmarks of Cancer: The Next Generation. Cell 2011; 144: 646–674.
- 7. Hoadley KA, Yau C, Wolf DM, Cherniack AD, Tamborero D, Ng S et al. Multiplatform Analysis of 12 Cancer Types Reveals Molecular Classification within and across Tissues of Origin. Cell 2014; 158: 929–944.
- 8. Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, Weerasinghe A et al. Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 2018; 173: 371–385.e18.
- 9. Taylor AM, Shih J, Ha G, Gao GF, Zhang X, Berger AC et al. Genomic and Functional Approaches to Understanding Cancer Aneuploidy. Cancer Cell 2018; 33: 676–689.e3.
- 10. Sanchez-Vega F, Mina M, Armenia J, Chatila WK, Luna A, La KC et al. Oncogenic Signaling Pathways in The Cancer Genome Atlas. Cell 2018; 173: 321–337.e10.
- 11. Lundberg A, Lindström LS, Harrell JC, Falato C, Carlson JW, Wright PK et al. Gene Expression Signatures and Immunohistochemical Subtypes Add Prognostic Value to Each Other in Breast Cancer Cohorts. Clin Cancer Res Off J Am Assoc Cancer Res 2017; 23: 7512–7520.
- 12. Tobin NP, Lundberg A, Lindström LS, Harrell JC, Foukakis T, Carlsson L et al. PAM50 Provides Prognostic Information When Applied to the Lymph Node Metastases of Advanced Breast Cancer Patients. Clin Cancer Res Off J Am Assoc Cancer Res 2017; 23: 7225–7231.
- 13. Chang F, Lee JT, Navolanic PM, Steelman LS, Shelton JG, Blalock WL et al. Involvement of PI3K/Akt pathway in cell cycle progression, apoptosis, and neoplastic transformation: a target for cancer chemotherapy. Leukemia 2003; 17: 590–603.
- 14. Chen J. The Cell-Cycle Arrest and Apoptotic Functions of p53 in Tumor Initiation and Progression. Cold Spring Harb Perspect Med 2016; 6: a026104.
- 15. Sotiriou C, Pusztai L. Gene-expression signatures in breast cancer. N Engl J Med 2009; 360: 790–800.
- 16. Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res 2016; 44: D457–462.
- 17. Santos A, Wernersson R, Jensen LJ. Cyclebase 3.0: a multi-organism database on cell-cycle regulation and phenotypes. Nucleic Acids Res 2015; 43: D1140–D1144.
- 18. Gray KA, Yates B, Seal RL, Wright MW, Bruford EA. Genenames.org: the HGNC resources in 2015. Nucleic Acids Res 2015; 43: D1079–1085.
- 19. Liu J, Lichtenberg T, Hoadley KA, Poisson LM, Lazar AJ, Cherniack AD et al. An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell 2018; 173: 400–416.e11.
- 20. Turner NC, Liu Y, Zhu Z, Loi S, Colleoni M, Loibl S et al. Cyclin E1 Expression and Palbociclib Efficacy in Previously Treated Hormone Receptor-Positive Metastatic Breast Cancer. J Clin Oncol Off J Am Soc Clin Oncol 2019; JCO1800925.
- 21. Sparano JA, Gray RJ, Ravdin PM, Makower DF, Pritchard KI, Albain KS et al. Clinical and Genomic Risk to Guide the Use of Adjuvant Therapy for Breast Cancer. N Engl J Med 2019; 380: 2395–2405.
- 22. Desmedt C, Haibe-Kains B, Wirapati P, Buyse M, Larsimont D, Bontempi G et al. Biological processes associated with breast cancer clinical outcome depend on the molecular subtypes. Clin Cancer Res Off J Am Assoc Cancer Res 2008; 14: 5158–5165.
- 23. Asp M, Bergenstråhle J, Lundeberg J. Spatially Resolved Transcriptomes-Next Generation Tools for Tissue Exploration. BioEssays News Rev Mol Cell Dev Biol 2020; e1900221.
- 24. Tobin NP, Lindström LS, Carlson JW, Bjöhle J, Bergh J, Wennmalm K. Multi-level gene expression signatures, but not binary, outperform Ki67 for the long term prognostication of breast cancer patients. Mol Oncol 2014; 8: 741–752.
- 25. Hoadley KA, Yau C, Hinoue T, Wolf DM, Lazar AJ, Drill E et al. Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer. Cell 2018; 173: 291–304.e6.
- 26. Carter SL, Cibulskis K, Helman E, McKenna A, Shen H, Zack T et al. Absolute quantification of somatic DNA alterations in human cancer. Nat Biotechnol 2012; 30: 413–421.
- 27. R Development Core Team (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.