Abstract
Cyclin-dependent kinase 4 and 6 inhibitors (CDK4/6is) have revolutionized breast cancer therapy. However, <50% of patients have an objective response, and nearly all patients develop resistance during therapy. To elucidate the underlying mechanisms, we constructed an interpretable deep learning model of the response to palbociclib, a CDK4/6i, based on a reference map of multiprotein assemblies in cancer. The model identifies eight core assemblies that integrate rare and common alterations across 90 genes to stratify palbociclib-sensitive versus palbociclib-resistant cell lines. Predictions translate to patients and patient-derived xenografts, whereas single-gene biomarkers do not. Most predictive assemblies can be shown by CRISPR–Cas9 genetic disruption to regulate the CDK4/6i response. Validated assemblies relate to cell-cycle control, growth factor signaling and a histone regulatory complex that we show promotes S-phase entry through the activation of the histone modifiers KAT6A and TBL1XR1 and the transcription factor RUNX1. This study enables an integrated assessment of how a tumor’s genetic profile modulates CDK4/6i resistance.
Subject terms: Systems biology, Machine learning, Breast cancer, Cancer therapeutic resistance
Ideker and colleagues present NeST-VNN, a deep learning model based on cancer protein assembly data that can be used to predict the response and resistance of cancer cells to CDK4/6 inhibitors.
Main
Cell-cycle activation and sustained proliferation are hallmarks of cancer1. Cyclin-dependent kinases 4 and 6 (CDK4/6) trigger cells to pass the G1/S cell-cycle restriction point by phosphorylating the retinoblastoma (RB) transcriptional repressor and its paralogs. Inhibiting these kinases has been of high interest in cancer drug development2,3. Thus far, three CDK4/6 inhibitors (CDK4/6is) (palbociclib, ribociclib and abemaciclib) have been approved in combination with endocrine therapy for the treatment of hormone receptor-positive, human epidermal growth factor (EGF) receptor 2-negative (HR+, HER2−) breast cancer, and clinical trials are underway in a spectrum of other tissue types4. In metastatic breast cancer, these agents have appreciably improved treatment outcomes, increasing progression-free and overall survival with manageable adverse effects4,5. However, objective tumor response is observed in <50% of patients who receive CDK4/6 inhibitors as first-line therapy, and nearly all initially responsive patients develop drug resistance with subsequent mortality6,7.
Studies of resistance to these drugs have largely defined two groups of molecular biomarkers: loss-of-function alterations to antiproliferative CDK pathway genes (for example, CDKN2A/B/C or RB1) or gain-of-function alterations to progrowth genes (for example, CDK2, CDK4/6, CCND1, CCNE1, E2F or PIK3CA). These markers have been characterized predominantly in preclinical in vitro studies, with clinical assessments obtained primarily through retrospective analyses4,5. RB1 mutation has the strongest supporting evidence, having been extensively associated with CDK4/6 drug resistance in cell lines and patient cohorts8. However, it and other markers have met with inconsistent results in prospective clinical trials8, suggesting that our understanding of this drug response is still incomplete.
Deep learning is a powerful general methodology in precision medicine, including the use of molecular profiles to predict drug responses9. Such models are typically trained to maximize the accuracy of outcome prediction (for example, whether a patient will respond to a drug) without attempting to reveal the internal cellular and molecular mechanisms by which that outcome is achieved. In this regard, it is notoriously difficult to interpret which molecular features are relevant and even more difficult to describe how these features integrate with one another in the logic of molecular pathways10. To create models that are both predictive and interpretable11,12, we and others have advanced a series of ‘visible’ neural network (VNN) architectures13–16 that are guided by knowledge maps of cellular components and functions. For example, using such a model, Elmarakeby et al. found that metastatic outcomes in prostate cancer were well predicted by convergent genetic alterations within a mouse double minute (MDM)–tumor protein 53 (TP53) inhibition pathway, implicating MDM4 in resistance to antiandrogen therapy15.
Thus far, VNN models have been structured using Gene Ontology17 or Reactome18, two general human expert-curated databases of known cellular components and functions that have not been explicitly designed to capture the molecular pathways of cancer. To define and discover cancer mechanisms systematically, we recently developed a hierarchical map of multiprotein assemblies called NeST (Nested Systems in Tumors)19. To build this map, we used affinity purification–mass spectrometry (AP–MS) to interrogate the physical interactions of a broad set of frequently altered cancer proteins. These data were integrated with other systematic omics datasets to create a large cancer protein–protein association network. Structural analysis of this network revealed a hierarchy of protein assemblies in which small, specific complexes nest within larger communities corresponding to broad processes and organelles. NeST was defined as the final hierarchy of 395 assemblies found to be under significant selection pressure for somatic mutations in one or more adult tumor types (Fig. 1a)19. Beyond the identification of the mutated protein assemblies, NeST has not yet been used to inform drug response models.
Here, we use this experimentally derived NeST map as the foundation for a visible deep learning approach to understand how patterns of genetic alterations govern the tumor response to CDK4/6 inhibition. This model is functionally predictive of palbociclib treatment outcomes and can be structurally interpreted, revealing a focal set of protein assemblies on which common and rare cancer mutations converge to affect drug resistance or sensitivity (Fig. 1b).
Results
Implementation of a cancer-oriented VNN
We defined a set of 718 genes assessed by one or more clinical cancer gene panels and studies, including the FoundationOne CDx panel20, Tempus xT21 and Project GENIE (Genomics Evidence Neoplasia Information Exchange)22. We then queried NeST to extract a hierarchy of 131 protein assemblies containing proteins encoded by the clinically assessed genes. This hierarchy was used to guide the architecture of a VNN following a previously described approach13 (Fig. 1b, Extended Data Fig. 1a and Methods). This model of cancer cell structure and response, which we call NeST-VNN, allowed for three binary input features per gene, describing the presence or absence of point mutation/insertion/deletion, copy number amplification (CNA) or copy number deletion (CND) (Fig. 1b and Methods). These gene-level input features were integrated within their respective protein assemblies in subsequent layers of the NeST-VNN, with each assembly represented by a bank of artificial neurons, reflecting its biological state or ‘in silico activity’ (Extended Data Fig. 1a). Connections were also established between the neurons of an assembly and those of larger assemblies that contain it (Extended Data Fig. 1b), allowing for the flow of genetic information from small focal assemblies (for example, ‘CDK holoenzyme complex’) to larger-scale assemblies and superassemblies (for example, ‘cell cycle’). The final protein assembly at the root of the hierarchy represented the model output—the predicted drug response of a tumor sample given the input set of genetic alterations (Fig. 1b).
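The flow of information through an assembly hierarchy can be illustrated with a minimal sketch. The toy hierarchy, gene names, random weights and the `forward` helper below are hypothetical and are not the published NeST-VNN implementation; the sketch omits training, batch normalization and dropout, and only shows how each assembly's neurons read from its child assemblies and directly attached genes.

```python
import math
import random

# Hypothetical toy hierarchy: assembly -> (child assemblies, directly attached genes).
HIERARCHY = {
    "CDK holoenzyme": ([], ["CDK4", "CDK6", "CCND1", "RB1"]),
    "Growth signaling": ([], ["EGFR", "ERBB2"]),
    "Cell cycle": (["CDK holoenzyme"], ["E2F1"]),
    "root": (["Cell cycle", "Growth signaling"], []),
}
N_NEURONS = 2  # neurons per assembly (a hyperparameter)


def init_weights(hierarchy, n_neurons, seed=0):
    """Draw random weights; each assembly sees its children's neurons plus its own genes."""
    rng = random.Random(seed)
    weights = {}
    for name, (children, genes) in hierarchy.items():
        n_inputs = len(children) * n_neurons + len(genes)
        weights[name] = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
                         for _ in range(n_neurons)]
    return weights


def forward(gene_values, hierarchy, weights, root="root"):
    """Compute the 'in silico activity' of every assembly, children before parents."""
    state = {}

    def visit(name):
        if name not in state:
            children, genes = hierarchy[name]
            inputs = [v for child in children for v in visit(child)]
            inputs += [gene_values.get(g, 0.0) for g in genes]
            state[name] = [math.tanh(sum(w * x for w, x in zip(row, inputs)))
                           for row in weights[name]]
        return state[name]

    visit(root)
    return state


weights = init_weights(HIERARCHY, N_NEURONS)
activity = forward({"CDK4": 1.0, "RB1": 1.0}, HIERARCHY, weights)
print(activity["root"])
```

Because connections exist only along the hierarchy, an alteration in EGFR can change the state of 'Growth signaling' and 'root' but never that of 'CDK holoenzyme', which is the property that makes the intermediate states interpretable.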
To train NeST-VNN, we leveraged drug response data for 1,244 genomically characterized tumor cell lines23, obtained by harmonizing the Cancer Therapeutics Response Portal (CTRP)24,25 and Genomics of Drug Sensitivity in Cancer (GDSC)26,27 databases (Methods). These data included the response to the CDK4/6i palbociclib, which had been well characterized in 947 cell lines. For comparative benchmarking, we also examined 50 non-CDK-related drugs investigated in at least 200 cell lines, for which the cellular responses displayed sufficient variability, with many examples of sensitivity and resistance (Methods).
Evaluation of prediction performance
We constructed NeST-VNN drug response models for palbociclib and separately for each of the 50 benchmark drugs, using standard neural network learning procedures based on backpropagation (Methods). Each model was trained to use the gene alteration profile of a cell line to predict the corresponding area under the dose–response curve (AUC). Training and performance assessment were conducted using nested fivefold cross-validation (Methods), with each fold setting aside 64% of cell lines for training, 16% for validation (used for tuning hyperparameters) and 20% for testing. Although nested cross-validation is computationally intensive, it fully insulates model testing from parameter tuning while maximizing the amount of testing that can be performed. We compared the NeST-VNN approach to three state-of-the-art alternatives: ElasticNet, random forest (RF) and a conventional black-box artificial neural network (ANN) (Extended Data Fig. 2a,b and Supplementary Table 1). The overall performance of NeST-VNN was generally comparable to that of the state-of-the-art models and often better, with NeST-VNN achieving the best performance for more than half of the tested drugs (62.7%; Extended Data Fig. 2a,b). NeST-VNN trained for palbociclib was one of the top-performing models, significantly outperforming the ElasticNet and ANN models and slightly, but not significantly, outperforming RF (Extended Data Fig. 2c and Supplementary Table 1).
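The 64%/16%/20% partition can be sketched as follows. This is a simplified illustration, assuming a single inner train/validation split per outer fold; the function name `nested_splits` and the shuffling scheme are hypothetical, and the actual procedure described in Methods may differ.

```python
import random


def nested_splits(samples, k=5, seed=0):
    """Yield (train, validation, test) lists for each of k outer folds.

    Each outer fold holds out 1/k of the samples for testing (20% when k=5);
    1/k of the remainder is then set aside for validation (16%), leaving 64%
    for training.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        rest = [s for j, fold in enumerate(folds) if j != i for s in fold]
        n_val = len(rest) // k
        yield rest[n_val:], rest[:n_val], test


for train, val, test in nested_splits(range(1000)):
    print(len(train), len(val), len(test))
```

Every sample appears in exactly one test set, so test performance is never measured on data that were used to fit weights or tune hyperparameters.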
To translate predictions to discrete tumor response outcomes, we thresholded the AUC such that predictions below a value t_low were labeled ‘sensitive’, those above a value t_high were labeled ‘resistant’ and those between these two thresholds were labeled ‘undefined’ (Fig. 2a). At the most inclusive setting, t_low = t_high = median(AUC), NeST-VNN could accurately discriminate between actual sensitive and resistant cell lines in heldout test data, with a diagnostic odds ratio (OR) of 6.0. Discriminative power increased substantially with more stringent thresholds. For instance, setting the thresholds 1 s.d. from the median (t_low, t_high = median(AUC) ± s.d.(AUC)) yielded a very high OR of 40.1, indicating that samples predicted as resistant were approximately 40 times more likely to test as resistant than samples predicted as sensitive (Fig. 2b). The trade-off for higher accuracy was that many samples (66%) were left undefined, increasing specificity but decreasing sensitivity (Extended Data Fig. 2d).
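The thresholding and odds ratio calculation can be sketched as below. The function names are hypothetical, and several details are assumptions rather than the published procedure: thresholds are derived from a reference list of AUC values, the population standard deviation is used, and a 0.5 continuity correction is applied when any cell of the 2 × 2 table is empty.

```python
from statistics import median, pstdev


def label_predictions(predicted_auc, reference_auc, k=1.0):
    """Label each prediction using thresholds at median ± k s.d. of reference_auc."""
    center = median(reference_auc)
    spread = pstdev(reference_auc)  # assumption: population s.d.
    t_low, t_high = center - k * spread, center + k * spread
    labels = []
    for auc in predicted_auc:
        if auc < t_low:
            labels.append("sensitive")
        elif auc > t_high:
            labels.append("resistant")
        else:
            labels.append("undefined")
    return labels


def diagnostic_odds_ratio(labels, actually_resistant):
    """Odds ratio of true resistance for predicted-resistant vs predicted-sensitive samples."""
    tp = fp = fn = tn = 0
    for label, resistant in zip(labels, actually_resistant):
        if label == "resistant":
            tp, fp = tp + resistant, fp + (not resistant)
        elif label == "sensitive":
            fn, tn = fn + resistant, tn + (not resistant)
        # 'undefined' samples are excluded from the table
    cells = [tp, fp, fn, tn]
    if 0 in cells:  # assumption: continuity correction for empty cells
        tp, fp, fn, tn = (c + 0.5 for c in cells)
    return (tp * tn) / (fp * fn)


aucs = [0.2, 0.5, 0.6, 0.7, 1.0]
print(label_predictions(aucs, aucs, k=1.0))
print(label_predictions(aucs, aucs, k=0.0))
```

Setting `k=0` reproduces the most inclusive setting (t_low = t_high = median), while larger `k` leaves more samples undefined in exchange for more confident calls.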
Translation to patient-derived xenografts and patients
Next, we examined the performance of NeST-VNN in a study of patient-derived xenografts (PDXs)28, including n = 172 samples treated with a CDK4/6i (ribociclib). Each PDX sample was classified as sensitive, resistant or undefined using thresholds at 1 s.d. from the median (t_low, t_high = median(AUC) ± s.d.(AUC)). PDX samples predicted to be sensitive exhibited significantly longer progression-free survival (duration from the start of treatment to the doubling of tumor volume) than those predicted to be resistant, suggesting that predicted sensitivity was associated with impaired tumor growth (log-rank P = 0.04, hazard ratio 0.53, 95% confidence interval 0.30–0.97; Fig. 2c and Methods).
After this analysis in xenografts, we evaluated model performance in predicting treatment outcomes for 226 patients with breast cancer from the GENIE metastatic breast cancer cohort22. These patients had been treated with baseline endocrine therapy with (n = 67) or without (n = 159) a CDK4/6i. Each patient was classified as ‘sensitive’ or ‘resistant’ using the threshold t_low = t_high = median(AUC) (no ‘undefined’ category was used because the number of treated samples was less than that for the earlier PDX or cell line analysis). The resistant category was further equally split to denote ‘partially resistant’ and ‘strongly resistant’ subgroups. For patients treated with a CDK4/6i, those predicted to be sensitive had significantly longer survival than those predicted to be strongly resistant (log-rank P = 0.02, hazard ratio 0.21, 95% confidence interval 0.05–0.91; Fig. 2d). Moreover, for the strongly resistant subgroup, the addition of the CDK4/6i failed to produce a significant increase in overall survival compared to baseline therapy (P = 0.37). These predictions outperformed single-gene markers of palbociclib resistance29 (RB1 mutation) or sensitivity30,31 (CCND1 CNA) that had been previously suggested, consistent with the mixed results of these markers in clinical trials8 (Extended Data Fig. 2e,f). In patients who did not receive a CDK4/6i, no significant survival differences were observed among the predicted sensitive/partially resistant/strongly resistant class labels (all comparisons with log-rank P > 0.1; Fig. 2d). These results indicate that the NeST-VNN palbociclib model translates to the population of patients with breast cancer and is specifically predictive of response rather than generally prognostic of patient survival.
Interpreting the model to identify important protein assemblies
Having seen that the NeST-VNN model was predictive of drug response in tumor cell lines, PDX samples and patients, we sought to interpret which protein assemblies were important to this process. Following a previous method13, we computed a quantitative importance score for each assembly according to how well its in silico activity was associated with the final drug response prediction (Fig. 3a, Supplementary Table 2 and Methods). Assemblies containing the primary CDK4 and CDK6 drug targets were of significantly higher importance than expected by chance, serving as positive controls (P = 5 × 10⁻⁵; Fig. 3a and Supplementary Table 2). For example, one of the important CDK assemblies was NeST:110 (CDK holoenzyme complex I; Fig. 3b,c), comprising the cyclin D–CDK4–CDK6 complex along with upstream inhibitors (CDKN1/2 protein families) and downstream targets (RB1). Positive control assemblies were also observed for other top-performing drug models; the model for the drug nutlin-3a, which targets TP53 activity through MDM2, placed high importance on assemblies containing these proteins (P = 6.8 × 10⁻¹⁰; Extended Data Figs. 2a and 3a).
For all drug models, assembly importance tended to increase with size and depth in the hierarchy, reflecting the progressive integration of genetic information. Assembly importance was similar between cell lines and patient tumors (Fig. 3d) or PDX samples (Fig. 3e). In contrast, little correlation was observed between cell lines and clinical samples when examining the importance of individual gene mutations (Fig. 3f) or copy number aberrations (Extended Data Fig. 3b,c). These results are consistent with the premise that most individual genetic alterations are rare, with variable incidence across contexts32, and suggest that the effects of genetic alterations on protein assemblies can be substantially more stable.
Of 33 assemblies that were of high importance for palbociclib response prediction in cell lines (importance ≥ 0.5), we focused on eight distinct minimally overlapping assemblies whose importance scores remained significant under multiple-hypothesis correction (hereafter referred to as ‘core assemblies’; Methods). Beyond regulation of CDK activity, core assemblies spanned histone and chromatin regulation, DNA damage response and growth factor signaling (Fig. 3a), integrating rare and common genetic alterations across 90 genes (Extended Data Fig. 3d). Most core assemblies were also important for predicting outcomes in clinical and PDX samples (Fig. 3d,e).
Systematic validation of core assemblies by loss-of-function screens
We next sought to validate the palbociclib core assemblies using two CRISPR (clustered regularly interspaced short palindromic repeat) loss-of-function screens (Fig. 4a): a published chemogenetic screen involving genome-wide knockout (KO) of single genes in combination with palbociclib treatment33 and a de novo dual CRISPR screen in which we paired gene KOs in selected NeST-VNN assemblies with a second gene KO targeting CDK4 or CDK6 (Fig. 4a and Methods). For the chemogenetic screen, we assessed each assembly in NeST for the enrichment of genes whose KOs modulate cell fitness in the context of palbociclib treatment (Methods). The enrichments of the eight core assemblies tended to be significantly higher than those of nonimportant controls (P = 0.005, Mann–Whitney U test), with four of these assemblies in particular (regulation of CDK activity, histone-mediated transcription regulation, DNA damage response, promyelocytic leukemia (PML) body) showing stronger effects than any assembly in the control set (Fig. 4b and Supplementary Table 3). Such enrichment was due to KOs in a diversity of genes, including roughly a dozen with extreme loss-of-fitness phenotypes (Fig. 4c; for example, BCL6, CCND3, CDK4, CDK5, RAD51C, TOP2A, BARD1, AURKA, AURKB) and several causing gain of fitness (for example, BRCA2, CTNNB1, CDKN2B, MSH6, MLH3). Enrichment was not observed for a genome-wide KO screen without palbociclib treatment34, indicating that at least some of the effect was due to gene–drug interactions rather than independent gene essentiality (Methods and Fig. 4d). We then moved on to our de novo dual CRISPR KO screen and noted that this screen and the earlier chemogenetic screen were reasonably consistent with respect to gene KO fitness effects (Pearson ρ = 0.48; Fig. 4e). Disruptions in all six of the core assemblies with sufficient coverage in our gene KO panel displayed a trend toward increased cell fitness (Fig. 4f,g and Supplementary Table 3).
Taken together, these results indicate that engineered genetic disruptions in protein assemblies identified by NeST-VNN can influence tumor cell growth in the setting of CDK4/6 inhibition, whether such inhibition is induced by a drug (Fig. 4b) or CDK4/6 KO (Fig. 4f,g).
Exploration of gain-of-function alterations in a histone transcriptional assembly
An open question is how CDK4/6 and the G1/S transcriptional program interact with other cell functions, including upstream modulators and downstream effectors. A notable assembly in this regard was NeST:85 (histone-mediated transcription regulation), a densely connected complex of 15 proteins with roles in histone acetylation, deacetylation and transcriptional activation (Fig. 5a). This assembly was important for the CDK4/6i response in cell lines (Fig. 3a), PDX samples (Fig. 3e and Extended Data Fig. 4b) and patients (Fig. 3d). It had also been validated by CRISPR loss-of-function analysis (Fig. 4b and Supplementary Table 3). However, most of the frequent genetic alterations affecting this assembly in tumor cell lines or patients were not loss-of-function events but gene CNAs (Fig. 5b), which were especially prevalent in lung, oropharyngeal and gynecologic tumors (frequencies 15–35%; Extended Data Fig. 4a). CNAs also accounted for the top five genetic alterations in this assembly that were most predictive of palbociclib resistance, in particular those of MYC, TERT, KAT6A, TBL1XR1 and RUNX1 (Fig. 5c and Methods). Each of these amplifications had a resistance OR of approximately 2.0, indicating that cells harboring CNAs are twice as likely to exhibit resistance to palbociclib as cells without CNAs (Fig. 5c).
Motivated by these findings, we turned to the technique of CRISPR activation (CRISPRa), which uses the dCas9 (dead Cas9 endonuclease)–VPR (VP64–p65–Rta) transcriptional activator to increase expression from gene promoters targeted with CRISPR single guide RNAs (Fig. 5d). For these experiments, we selected A549 lung carcinoma epithelial cells, which harbor few genetic alterations in the NeST:85 assembly compared to many other common tumor cell models for which multiple genes are already amplified (Fig. 5b). We transfected single guide RNAs (sgRNAs) targeting KAT6A, TBL1XR1, RUNX1, TERT or MYC into A549 cells expressing dCas9–VPR and confirmed by qPCR that constructs exhibited substantial overexpression of the target gene compared to nontargeting control (NTC) sgRNAs (all except for MYC; Extended Data Fig. 4c). sgRNAs targeting MYC did not have a significant effect, consistent with prior reports that this gene is already highly expressed in A549 cells35. We used the thymidine analog 5-ethynyl-2′-deoxyuridine (EdU) to count the fraction of cells undergoing active DNA replication in the S phase. Overexpression of the histone modifiers KAT6A and TBL1XR1 produced significant increases in the proportion of cells entering the S phase under palbociclib treatment compared to the untreated group (Fig. 5e,f; approximately 2.5-fold; P < 0.05); the transcription factor RUNX1 also led to significant increases, albeit to a lesser degree (1.5-fold). We also examined the effects of KAT6A or TBL1XR1 overexpression on the phosphorylation status of RB1, the direct target of the CDK4–CDK6–cyclin D complex. Capillary western blot analysis demonstrated that overexpression of these factors is indeed associated with a more than twofold increase in phospho-RB levels (Fig. 5g,h, Extended Data Fig. 4d and Methods). Together, these results indicate several NeST:85 genes whose overexpression serves to promote the cell cycle, supporting our earlier observation (Fig. 5c) that CNAs in these genes are predictive of palbociclib resistance.
Discussion
CDK4/6 inhibitors are a well-studied class of drugs for which numerous candidate biomarkers have been identified8. Why has the prediction of CDK4/6i responses remained challenging? One reason is that markers with promise in cell lines (for example, CCND1 amplification) do not consistently translate to patient populations30,31. Another is that individual genetic alterations that are clinically predictive may occur too rarely to have broad utility (for example, RB1 deletion or mutation). A wider, more integrative analysis is needed to understand CDK4/6i resistance fully5,8.
Toward this goal, NeST-VNN synthesizes both rare and common genetic events across a repertoire of drug response pathways, with the aim of facilitating a quantitative, integrated assessment of drug response. The modeling process begins with a map of tumor cell components, which is used to guide the topology of deep neural network models as they learn to translate genetic alterations to drug responses (Extended Data Fig. 1). The key subcellular assemblies of models that accurately capture drug responses in vitro and that translate to in vivo (for example, PDX) and clinical settings (Fig. 2) can be validated through directed CRISPR loss-of-function and/or activation screens (Figs. 4 and 5). Assemblies that pass this validation pipeline are a source of candidate biomarkers in downstream precision medicine applications. Alternatively, a model can be used in its entirety to produce a single resistance score integrating the mutational status of all proteins and assemblies.
NeST-VNN is based on NeST, a whole-cell map of cancer protein complexes derived from systematic proteomics data (see the ‘Structural architecture of the NeST-VNN model’ section in Methods). Previous drug response models have generally not incorporated outside knowledge of cell structure (many approaches, reviewed here9,36) or have modeled structure using databases of cellular components or pathways drawn from literature curation9,14,15. Biological insights informed by NeST-VNN are uniquely dependent on the composition of NeST, generating both strengths and limitations. One strength is that the model can incorporate information from numerous rare mutations in predicting a drug response insofar as these rare alterations aggregate to affect the activity of commonly altered protein assemblies with documented cancer relevance. A limitation is that NeST almost certainly does not include all relevant protein assemblies (false negatives), and some assemblies that are included may be imperfect or irrelevant to a given tumor population (false positives). Regardless, the NeST knowledgebase positions the precision medicine model as a dynamic entity, which can be updated either functionally with new incoming drug response data or structurally as NeST (or another future map) is improved by additional data. These new data need not be limited to AP–MS experiments (the primary source informing NeST thus far37) but, in the future, might incorporate information from complementary proteomics technologies such as proximity ligation38, size-exclusion chromatography39,40 or spatial imaging41. While pathway databases are sometimes treated as gold standards (especially literature-curated databases such as Gene Ontology and Reactome), knowledge of molecular pathways remains incomplete, particularly as it relates to specific tumor states and subtypes.
Using this platform, we identified a set of eight core assemblies for which genetic alterations are associated with anti-CDK4/6 response, seven of which were validated by one or more CRISPR screens (Figs. 4 and 5). These assemblies are not focused solely on cyclin-dependent control of the cell cycle (Fig. 3a). Nonetheless, ample literature support can be found for the involvement of many of these other assemblies in anti-CDK responses, such as those related to androgen receptor (AR) signaling42, EGF/fibroblast growth factor (FGF) signaling43, DNA damage response44 and the MDM2–p53 pathway45. Regarding the identification of an EGF/FGF signaling assembly, recent studies have found that the EGF receptors EGFR and ERBB2 are associated with palbociclib response46 and that the genetic alteration status of FGFR1/2 and their associated FGF ligands has promise as a marker of acquired resistance43. Furthermore, ongoing clinical trials are assessing the combination of anti-CDK4/6 treatments with insulin-like growth factor inhibition (trial no. NCT03099174) or with EGFR inhibition (trial no. NCT03065387) in various tumor types. In NeST-VNN, the EGF/FGF complex combines each of these alterations, which have largely been reported separately, into a single integrated effect including alterations in yet additional receptor tyrosine kinases (for example, ERBB3/4).
The model also highlights a notable role for NeST:85 (histone-mediated transcription regulation), which integrates both well-known and understudied factors. Treatment with CDK4/6 inhibitors induces chromatin structure remodeling mediated by histone acetyltransferases and histone deacetylases (HDACs), leading to the expression signatures of senescence and cell differentiation3. Accordingly, genetic alterations affecting proteins of the NeST:85 assembly, including the histone acetylases CREBBP and EP300 (ref. 47), the HDACs HDAC1 and HDAC2 (ref. 48), and transcription factors such as TP53 (ref. 49) and MYC50, have been previously documented to modulate the anti-CDK4/6 drug response. Using CRISPRa to model the effects of CNAs, we observed that increased expression of KAT6A and TBL1XR1, which are also components of the NeST:85 assembly, leads to increased S-phase entry (Fig. 5). KAT6A, also known as MYST3/MOZ, encodes a histone lysine acetyltransferase that is amplified in many cancer types51 (Fig. 5b). Relevant to the NeST:85 assembly, KAT6A has been previously documented to regulate cell-cycle arrest and differentiation through the transcription factors p53 (ref. 52) and RUNX1 (ref. 53); it is a frequent translocation partner of other assembly members such as EP300 and CREBBP54. TBL1XR1, also known as TBLR1, is an F-box-like protein involved in the recruitment of the ubiquitin conjugation system to histone modifier and transcriptional repression complexes55,56. Subsequent proteasomal degradation of these complexes is essential for transcriptional activation by AR, as captured by the NeST:85 assembly, as well as other transcription factors such as the estrogen receptor (ER)57. Notably, increases in KAT6A and TBL1XR1 expression were associated with higher phosphorylation levels of RB, the central transcriptional repressor targeted by CDK cell-cycle control (Fig. 5g,h), suggesting that they may promote drug resistance by increasing the transcription, abundance or activity of the upstream CDK4–CDK6–cyclin D regulatory complex. The possible combination of HDAC inhibitor therapies with cell-cycle inhibitors has been previously proposed48; this study further underscores this potential and delineates alternative targets. Indeed, KAT6A inhibitors are under development and have demonstrated promising effectiveness for inducing cellular senescence58–60 (clinical trial NCT04606446). In the tumor cells characterized here (A549; Fig. 5), TBL1XR1 has a T290A missense mutation of unknown significance whose impact will require further investigation.
In summary, the predictive models presented in this study build from and substantially develop the concept of an integrated response to therapy. In such an integrated response, diverse effects converge on biological machinery at multiple levels to produce an overall treatment outcome. This concept may explain the difficulty in identifying individual genetic biomarkers of palbociclib drug response. It also speaks to the challenge of patient-to-patient heterogeneity and illustrates one means by which knowledge of cellular machinery can be used to score a diverse population of cancer patients presenting unique patterns of mutational aberrations. Such an integrated model may provide a worthwhile asset in achieving improved outcomes for patients and in efforts to evaluate novel therapeutics to overcome resistance.
Methods
Drug response data for model training
Drug response data were retrieved from the GDSC and CTRP databases24–27, covering a total of 692,859 cell line–drug pairs comprising 1,244 cell lines and 888 drugs. The data from the two datasets were harmonized as follows. Drug information: each molecule’s published name, synonym or SMILES (Simplified Molecular Input Line Entry System) string was queried using PubChemPy. The associated InChIKey was extracted and used to identify duplicate drugs (within or between datasets). Cell viability data: for CTRP, the vehicle control-normalized average percent viability files were used. For GDSC1 and GDSC2, data were normalized to ‘cells-only’ and ‘dimethyl sulfoxide (DMSO) control’ wells, respectively, on a per-plate basis. Data were averaged across replicates within each dataset. For drug response measurement, we used AUC, in which AUC = 0 corresponds to complete cell killing and AUC = 1 corresponds to no cell killing; AUC > 1 represents a growth advantage conferred by the drug. AUCs calculated in this study agreed with AUCs reported by the original consortia (Pearson correlations of 0.92, 0.83, 0.91 and 0.91 for CTRP1, CTRP2, GDSC1 and GDSC2, respectively). For multiple AUCs for the same drug across different consortia, we used each replicate sample as a separate training instance. Genetic alteration data: a panel of 718 clinical genes was assembled from the union of genes assessed by FoundationOne CDx20, Tempus xT21, PALOMA-3 trial61 or Project GENIE22, each of which assesses mutations and/or copy number aberrations. To compile genotypes for all cell lines, we extracted nonsynonymous coding mutations and copy number alterations for the 718 clinical panel genes from the Cancer Cell Line Encyclopedia (CCLE, release 22Q1)23. Genes were marked as either mutated (‘1’) or unmutated (‘0’), with mutations filtered for the following types: missense/nonsense/nonstop mutations, frameshift insertions/deletions, splice site/region variations and in-frame insertions/deletions. 
Similarly, genes were marked as amplified (‘1’) or unamplified (‘0’) and deleted (‘1’) or undeleted (‘0’). Together, mutations, CNAs and CNDs served as features for each of the clinical panel genes. Of the 888 drugs available from the CTRP and/or GDSC, we selected the 51 drugs (palbociclib and 50 others) with the highest variation in the observed drug responses across cell lines (corresponding to s.d. ≥ 0.3).
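The feature encoding and drug selection steps can be illustrated with a short sketch. The helper names `encode_sample` and `select_drugs` are hypothetical, the toy gene sets are invented for illustration, and the use of the sample standard deviation together with a minimum cell line count is an assumption about how the two stated criteria combine.

```python
from statistics import stdev


def encode_sample(panel_genes, mutated, amplified, deleted):
    """Return one [mutation, CNA, CND] row of 0/1 flags per panel gene."""
    return [[int(g in mutated), int(g in amplified), int(g in deleted)]
            for g in panel_genes]


def select_drugs(auc_by_drug, min_sd=0.3, min_lines=200):
    """Keep drugs tested in enough cell lines whose AUCs vary widely enough."""
    return [drug for drug, aucs in auc_by_drug.items()
            if len(aucs) >= min_lines and stdev(aucs) >= min_sd]


panel = ["RB1", "CCND1", "CDKN2A"]
features = encode_sample(panel, mutated={"RB1"}, amplified={"CCND1"},
                         deleted={"CDKN2A", "RB1"})
print(features)
```

A gene can carry more than one flag at once, which is why the three alteration types are kept as separate binary columns rather than a single categorical value.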
Structural architecture of the NeST-VNN model
Construction of the NeST hierarchy of cancer protein assemblies has been thoroughly detailed elsewhere19. Briefly, AP–MS protein interaction data for 61 known cancer proteins were integrated with a compendium of other systematically generated datasets informing protein–protein associations, including protein–protein interaction, mRNA coexpression, protein coexpression, genetic codependency and sequence similarity. Such integration resulted in a large network of approximately 1.8 × 10^8 protein–protein interactions among 19,035 proteins. Multiscale community detection was performed to detect approximately 2,300 densely connected sets of proteins, herein called protein assemblies. Assemblies were nested (that is, organized hierarchically), with larger assemblies containing smaller ones, forming ‘parent–child’ assembly relations. This hierarchy has been used earlier19 to perform a comprehensive analysis of somatic coding mutations in The Cancer Genome Atlas (TCGA)62, identifying significant convergence of mutations on a set of 395 protein assemblies, named NeST19. Here, we filtered the NeST hierarchy to identify the subset of assemblies encoded by at least five genes represented on the 718-gene clinical panel, producing a final hierarchy of 131 assemblies distributed over seven layers.
Model training
The filtered NeST hierarchy was used to embed a deep neural network for drug response prediction, which we refer to as NeST-VNN (Extended Data Fig. 1a). We define an m × 3 input matrix as I, where I_{i,j} ∈ {0,1}, with m denoting the number of genes and 3 the number of gene alteration types (mutation, CNA and CND). For any input sample (tumor cell line, PDX or patient tumor), each gene and alteration type is marked as 1 if a somatic genetic alteration is present and 0 otherwise. The first layer in NeST-VNN converts these input features to gene-level representations as follows:
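A form of this first-layer transformation that is consistent with the three operations named in the text can be written as below. The symbol G for the gene-level representation, the row index i and the composition order (linear map, then tanh, then batch normalization, following the order in which the operations are listed) are assumptions made here for illustration.

```latex
G_i = \mathrm{BatchNorm}\Bigl(\tanh\bigl(\mathrm{Linear}(I_{i,1},\, I_{i,2},\, I_{i,3})\bigr)\Bigr),
\qquad i = 1, \dots, m
```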
‘BatchNorm’ indicates batch normalization63; ‘tanh’ indicates a hyperbolic tangent function; and ‘Linear’ indicates a linear transformation. Here, the linear transformation is applied to each row of I so that the three gene alteration values for each gene are converted into a single value. The remaining seven layers of NeST-VNN follow the structure of the NeST protein assembly hierarchy, where each assembly is represented by some number of neurons N, a hyperparameter. A dropout64 of 0.3 (selected through hyperparameter optimization) was added to the last four layers. The state of an assembly is defined as a function of the states of its K child assemblies and M additional genes (genes for which the protein products are not present in any descendant assemblies). Denoting an assembly input vector as I_s and an output vector as O_s, we have
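Assuming each assembly applies the same three operations as the gene layer (an assumption of this sketch, based on the operations being defined only once in the text), the assembly state can be written as:

```latex
O_s = \mathrm{BatchNorm}\Bigl(\tanh\bigl(\mathrm{Linear}(I_s)\bigr)\Bigr)
```

Under this reading, the input I_s collects the N-neuron outputs of the K child assemblies together with the values of the M additional genes.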
Here, I_s has dimension N × (N × K + M) and O_s has dimension N. We define ‘in silico activity’, a representative singular value for assembly state, as the first principal component65. The NeST-VNN objective function (Loss) aggregates the mean squared error (MSE) across every assembly in the hierarchy:
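One way to write such an objective using the quantities named in the text (α, β, y, W, the assembly outputs O_i and ‘Linear’) is sketched below. Treating the root assembly r separately from the remaining assemblies, weighting the latter by α and penalizing the weights with an L2 norm are assumptions in the style of earlier visible neural network models13,14 rather than details stated here.

```latex
\mathrm{Loss} = \mathrm{MSE}\bigl(\mathrm{Linear}(O_r),\, y\bigr)
  + \alpha \sum_{i \neq r} \mathrm{MSE}\bigl(\mathrm{Linear}(O_i),\, y\bigr)
  + \beta \,\lVert W \rVert_2
```

Under this form, every assembly is asked to predict the drug response on its own, which encourages informative intermediate states rather than only an accurate final output.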
The parameter α was set to 0.3; β is a tuned hyperparameter. y represents the actual AUC. ‘Linear’ denotes the linear function used for transforming the vector O_i to a scalar. W denotes the weights of the neural network. Weight optimization was performed using AdamW66.
Model benchmarking
For baseline benchmarking, we trained RF67, ElasticNet68 and black-box ANN69 models (the ANN was allotted the same number of neurons and layers as the NeST-VNN model) using the Python scikit-learn package70. For all models, including NeST-VNN, we used nested fivefold cross-validation71, producing five models for each drug. In each fold, we assigned 64% of cell lines to a training set, 16% to a validation set (used for hyperparameter tuning) and 20% to a test set, ensuring that replicate measurements of a cell line (for example, from different datasets) were never split between the test and training sets. Hyperparameters were optimized with Optuna72. NeST-VNN was implemented in PyTorch and trained using five GPU (graphics processing unit) servers containing four NVIDIA Tesla V100s, each with 5,120 CUDA (Compute Unified Device Architecture) cores and 32 GB of HBM2 memory. All five NeST-VNN models were evaluated in downstream analyses.
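The splitting scheme can be sketched as follows: partitions are drawn over unique cell lines, so all replicate measurements of a line fall on the same side of the split. The function names, the fixed seed and the single validation split per outer fold are assumptions made for this illustration; the actual nested procedure may differ in detail.

```python
import random


def split_cell_lines(cell_lines, fold, n_folds=5, val_fraction=0.2, seed=0):
    """Return (train, val, test) sets of cell line names for one outer fold."""
    unique = sorted(set(cell_lines))
    random.Random(seed).shuffle(unique)  # same seed -> same order in every fold
    test = set(unique[fold::n_folds])    # every n-th line: about 20% of lines
    rest = [name for name in unique if name not in test]
    n_val = round(len(rest) * val_fraction)  # 20% of the remaining 80% = 16%
    return set(rest[n_val:]), set(rest[:n_val]), test


def partition_rows(rows, train, val, test):
    """Assign (cell_line, source, auc) rows to partitions by cell line."""
    parts = {"train": [], "val": [], "test": []}
    for row in rows:
        cell_line = row[0]
        if cell_line in test:
            parts["test"].append(row)
        elif cell_line in val:
            parts["val"].append(row)
        elif cell_line in train:
            parts["train"].append(row)
    return parts


# Toy data: 100 hypothetical cell lines, each measured in two datasets.
toy_rows = [(f"CL{i:03d}", source, 0.5)
            for i in range(100) for source in ("CTRP", "GDSC")]
toy_lines = [row[0] for row in toy_rows]
train, val, test = split_cell_lines(toy_lines, fold=0)
parts = partition_rows(toy_rows, train, val, test)
```

Because the shuffle uses the same seed for every fold, the five test sets are disjoint and together cover all cell lines.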
Translation to cancer patients
Data from the American Association for Cancer Research Project GENIE metastatic breast cohort22 were used to validate the performance of the NeST-VNN model in retrospective clinical application. We extracted nonsynonymous coding mutations, CNAs and CNDs across 360 genes for 226 patients with ER+, HER2– metastatic breast cancer along with their overall survival (months) and censorship information. We did not consider gender or sex. Of these patients, 67 had been treated with CDK4/6i plus endocrine therapy. The remaining 159 patients were treated with endocrine therapy alone. Patients were excluded if they had been treated with additional targeted therapies, such as mammalian target of rapamycin (mTOR) or AKT inhibitors. Tumor genomic data were converted to calls (0 = unaltered, 1 = altered) for all gene mutation, CNA and CND features. Features used by NeST-VNN that were not assessed in the GENIE cohort were represented as unaltered. We predicted patient response to CDK4/6 inhibition using the average AUC over the five pretrained palbociclib models and then thresholded this value as described in the main text (Fig. 2d). Patients whose status label was ‘living’ at 120 months were censored.
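The encoding and ensembling steps can be illustrated with the sketch below. It builds the m × 3 binary input for one sample, filling genes that the clinical panel did not assess with zeros, and averages the predictions of several models. The function names are hypothetical and the ‘models’ are toy stand-ins (plain callables), not the pretrained NeST-VNN networks; the threshold applied afterwards is given in the main text and is not reproduced here.

```python
ALTERATION_TYPES = ("mutation", "CNA", "CND")


def encode_sample(model_genes, assessed_genes, alterations):
    """Build an m x 3 binary matrix in the gene order expected by the model.

    alterations is a set of (gene, alteration_type) pairs called as altered.
    Genes outside assessed_genes are represented as unaltered (all zeros).
    """
    matrix = []
    for gene in model_genes:
        if gene in assessed_genes:
            row = [1 if (gene, kind) in alterations else 0
                   for kind in ALTERATION_TYPES]
        else:
            row = [0, 0, 0]  # not assessed on this panel -> unaltered
        matrix.append(row)
    return matrix


def ensemble_auc(models, features):
    """Average the predicted AUC over all models."""
    predictions = [model(features) for model in models]
    return sum(predictions) / len(predictions)


# Toy example with three model genes, one of which the panel does not cover.
toy_features = encode_sample(
    model_genes=["RB1", "CCNE1", "TP53"],
    assessed_genes={"RB1", "TP53"},
    alterations={("RB1", "CND"), ("CCNE1", "CNA")},
)
# Five stand-in models: a baseline plus a small term per altered feature.
toy_models = [lambda m, b=b: b + 0.01 * sum(map(sum, m))
              for b in (0.6, 0.7, 0.8, 0.9, 1.0)]
toy_prediction = ensemble_auc(toy_models, toy_features)
```

Representing unassessed genes as unaltered keeps the input dimensions fixed, at the cost of possibly missing alterations that the panel could not detect.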
Translation to PDXs
We analyzed a PDX dataset28, which contained 172 tumor samples treated with a CDK4/6i (ribociclib) across five tumor types (breast carcinoma, non-small cell lung carcinoma, cutaneous melanoma, colorectal cancer and pancreatic ductal carcinoma). Treatment responses had been measured by changes in the volume of the tumor xenograft over time, with an accompanying determination of treatment time and a classification according to the RECIST (Response Evaluation Criteria in Solid Tumors) standard (including categories of progressive disease, stable disease, partial response and complete response). PDX samples had been genomically characterized, covering 660 of the 718 genes in the NeST-VNN gene set. Similar to the procedure for cell lines and patients, tumor genomic data were converted to calls (0 = unaltered, 1 = altered) for all gene mutation, CNA and CND features. Features used by NeST-VNN that were not assessed in the PDX data were represented as unaltered. We predicted the responses of PDX tumors to CDK4/6 inhibition as the average AUC over the five pretrained NeST-VNN models for palbociclib and then thresholded this score as described in the main text (Fig. 2c).
Model dependence on the number of genes used for prediction
Given the difference in the number of genes used for prediction in cell lines (n = 718) versus GENIE analysis (n = 360) or PDX analysis (n = 660), we systematically studied the dependence of model performance on the number of genes for which genetic alteration data are provided. We computed the average predictive performance of the pretrained NeST-VNN model when it is supplied with data for diminishing numbers of genes (Extended Data Fig. 5a). We found that, at a gene set size of 350 (similar to the number of genes characterized in the GENIE study), the average performance is only slightly less than that obtained when using all genes (ρ = 0.30 versus ρ = 0.33), with a more precipitous fall in performance seen for 200 genes or fewer. A similar pattern was observed when we compared the assembly importance scores with their enrichments for gene KOs that modulate the response to palbociclib treatment (Extended Data Fig. 5b). Notably, we also found that the precise panel of genes used by GENIE performs better than expected compared to a random subsampling (Extended Data Fig. 5).
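A subsampling analysis of this kind can be sketched as follows: a random subset of genes is retained, all other genes are set to unaltered, and the rank correlation between predicted and observed responses is averaged over repeats. The function names, the number of repeats and the toy predictor are assumptions for illustration only; the hand-written Spearman helper returns 0.0 when either input is constant, which is a convention of this sketch.

```python
import random


def average_ranks(values):
    """Return 1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant input; 0.0 by convention here
    return cov / (var_x * var_y) ** 0.5


def mask_genes(matrix, keep):
    """Zero out every gene row whose index is not in keep."""
    return [list(row) if i in keep else [0] * len(row)
            for i, row in enumerate(matrix)]


def performance_by_panel_size(samples, observed, predict, sizes,
                              n_repeats=10, seed=0):
    """Mean Spearman correlation for random gene panels of each size."""
    rng = random.Random(seed)
    n_genes = len(samples[0])
    result = {}
    for size in sizes:
        scores = []
        for _ in range(n_repeats):
            keep = set(rng.sample(range(n_genes), size))
            preds = [predict(mask_genes(s, keep)) for s in samples]
            scores.append(spearman(preds, observed))
        result[size] = sum(scores) / len(scores)
    return result


# Toy data: four samples over four genes; the stand-in predictor counts alterations.
toy_samples = [
    [[1, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]],
    [[1, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]],
    [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]],
]
toy_observed = [1, 2, 3, 0]
toy_predict = lambda matrix: sum(map(sum, matrix))
perf = performance_by_panel_size(toy_samples, toy_observed, toy_predict, sizes=[4, 2])
```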
Identifying important assemblies and genes (model interpretation)
To determine which assemblies were important for drug response prediction in cell lines, PDX or clinical samples, we adopted a variation of the ‘relative local improvement in predictive power’ method as previously reported13. Each assembly was modeled using linear regression, with the aim of evaluating how well its NeST-VNN neuron values capture the NeST-VNN overall drug response prediction. Each assembly k was assigned a g × N matrix P_k, where g is the number of samples and N is the number of neurons. P_k was then used in a linear ridge regression73 model to predict the NeST-VNN drug response D, creating models M_1, M_2, …, M_k. The following function was minimized for each model:
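The standard ridge regression objective, which matches the roles of w and α described in the text, is shown below for assembly k. Whether an intercept term was fitted is not stated, and it is omitted in this sketch.

```latex
\min_{w} \; \lVert D - P_k\, w \rVert_2^{2} + \alpha \, \lVert w \rVert_2^{2}
```

Note that α here is the ridge penalty strength and is distinct from the α that weights assembly terms in the NeST-VNN loss.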
where w is a vector of the coefficients of length N and α imposes an L2 penalty on coefficient complexity. Assembly ‘importance’ (Fig. 3 and Extended Data Figs. 3 and 5) is the Spearman correlation (ρ) between the predictions of M_k and D. The mean correlation of the five NeST-VNN models was reported. A higher score indicates an assembly whose neuron values contributed more strongly to NeST-VNN predictions and can, therefore, be considered important. To assess statistical significance, we generated a null distribution of assembly importance scores, as follows. We randomly rearranged gene assembly memberships in the NeST-VNN while preserving the assembly size and parent–child relationships. We trained 500 null models with these random rearrangements and calculated assembly importance for each null. One-tailed t tests were used to evaluate whether the assembly importance scores from the five NeST-VNN models were greater than the assembly importance scores from the nulls, with a Benjamini–Hochberg control for false discovery rate (FDR; Fig. 3a). Finally, we defined ‘core assemblies’ as those with an importance score of ≥0.5 and an FDR of ≤0.1, while excluding less important redundant assemblies (Jaccard similarity > 0.5).
To identify specific genetic alterations in the NeST:85 assembly associated with palbociclib resistance (Fig. 5c), we performed L1-norm regularized logistic regression74,75. Genetic alterations (mutations, CNAs, CNDs) for the 15 assembly genes were used as regression features to predict binarized AUCs. AUC values in the top 30% were encoded as 1 to represent resistance, whereas AUC values in the bottom 30% were encoded as 0 to represent sensitivity. Nonzero coefficients from the fitted model were recognized as important alterations governing drug response, with the sign indicating whether the presence of alterations contributed to resistance (plus) or sensitivity (minus). We used scikit-learn70 with logistic regression settings of penalty = ‘l1’, C = 0.01 (default for other parameters).
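The selection of core assemblies can be sketched as below: P values are adjusted with the Benjamini–Hochberg procedure, assemblies are filtered on importance and FDR, and redundant assemblies are dropped greedily in order of decreasing importance. The function names, the greedy order and the toy inputs are assumptions of this example; the P values are taken as given and the t tests against the null models are not reproduced.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def jaccard(a, b):
    """Jaccard similarity between two gene collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_core_assemblies(importance, p_values, genes,
                           min_importance=0.5, max_fdr=0.1, max_jaccard=0.5):
    """Keep important, significant assemblies; drop less important redundant ones."""
    names = list(importance)
    fdr = dict(zip(names, benjamini_hochberg([p_values[n] for n in names])))
    candidates = sorted(
        (n for n in names if importance[n] >= min_importance and fdr[n] <= max_fdr),
        key=lambda n: -importance[n],
    )
    core = []
    for name in candidates:
        # Keep only if not too similar to an already kept, more important assembly.
        if all(jaccard(genes[name], genes[kept]) <= max_jaccard for kept in core):
            core.append(name)
    return core


# Toy inputs (hypothetical assembly names, scores and gene sets).
toy_importance = {"A": 0.8, "B": 0.7, "C": 0.6, "D": 0.3, "E": 0.55}
toy_p = {"A": 0.001, "B": 0.002, "C": 0.5, "D": 0.001, "E": 0.01}
toy_genes = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g2", "g3", "g5"},  # Jaccard with A = 3/5, so redundant
    "C": {"g6", "g7"},
    "D": {"g8", "g9"},
    "E": {"g10", "g11"},
}
core = select_core_assemblies(toy_importance, toy_p, toy_genes)
```

In the toy data, C fails the FDR cutoff, D fails the importance cutoff and B is dropped as redundant with the more important A.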
Comparison of the interpretability of NeST-VNN and RF
We systematically evaluated the assembly importance scores provided by NeST-VNN versus RFs67 using the genome-wide loss-of-function screen for palbociclib treatment. To determine the assembly importance score for the RF models, we performed gene set enrichment analysis (GSEA76, implemented using GSEApy77) on the gene list ranked according to the gene-level feature importance scores derived from the RF models. The absolute normalized enrichment scores generated from GSEA were used as assembly importance scores for the RF models. Assembly importances in the NeST-VNN versus RF models were moderately but not completely correlated (ρ = 0.31; Extended Data Fig. 6a). Relevant to the differences, we found that the NeST-VNN importance of an assembly was also moderately correlated with its enrichment for gene KOs conferring palbociclib sensitivity or resistance (ρ = 0.33; Extended Data Fig. 6b); in contrast, RF assembly importance showed a correlation that was substantially weaker (ρ = 0.07; Extended Data Fig. 6c). Thus, while RF models can achieve comparable predictive performance by identifying individual gene mutations that are indicative of drug response (Extended Data Fig. 2a,c), NeST-VNN demonstrates its strength by integrating the effects of such mutations within predictive cancer protein assemblies.
Genome-wide CRISPR KO chemogenetic screen
Core protein assemblies were validated using a genome-wide CRISPR–Cas9 screen in MCF7 cells exposed to palbociclib treatment33 (Fig. 4a) (Gene Expression Omnibus accession no. GSE192525). This screen had been run previously using the GeCKO (genome-wide CRISPR KO) v2 library. Gene-level z scores (referred to as ‘normZ’) from that study were used to indicate the effects of gene KO on cell fitness in the context of CDK4/6 inhibition (Fig. 4b). As a reference, the cell fitnesses of gene KOs (provided as Chronos scores78) in the MCF7 cell line in the absence of CDK4/6i treatment (Fig. 4c) were obtained from the DepMap (Dependency Map) project34 (https://depmap.org/portal/).
Dual CRISPR KO combinatorial screen
The genome-wide chemogenetic data (above section) were complemented by a de novo dual CRISPR screen performed in-house in MCF7 (HTB-22), MCF10A (CRL-10317) and MDA-MB-231 (CRM-HTB-26) cell lines from American Type Culture Collection (ATCC) (Fig. 4a). Cells were grown in DMEM with 10% FBS, screened for Mycoplasma contamination by PCR and verified by short tandem repeat (STR) testing (IDEXX BioAnalytics). CRISPR–Cas9 nuclease was stably integrated by a lentivirus. LentiCas9-Blast (Addgene plasmid no. 52962) and lentiCRISPR v2 (Addgene plasmid no. 52961) were gifts from F. Zhang79. Blasticidin was used to select Cas9 stable integrants. Cas9 protein expression was confirmed by capillary western blot analysis (Wes, ProteinSimple). We constructed a library of double gRNA constructs targeting druggable targets (such as CDK4 and CDK6), tumor suppressors and oncogenes. Here, we analyzed a subset of data from individual genes from core assemblies (sgRNA1) together with CDK4 or CDK6 (sgRNA2) (Supplementary Table 3). The library was packaged into lentiviruses, and cells were infected to achieve a multiplicity of infection of 0.3. Puromycin (2.5 mg ml−1) selection was started 2 days after transduction. Selection continued for 7 days, after which puromycin was removed for the remainder of the screen. Cells were maintained in exponential growth by isolating and removing a fraction of cells every 2–3 days. We analyzed data from two time points at approximately 14 and 21 days. DNA was extracted from cells with a Blood and Cell Culture DNA Mini kit (Qiagen). To assess the relative frequencies of gRNAs before and after selection, we amplified gRNA sequences from genomic DNA by PCR and prepared them for HiSeq 4000 sequencing (Illumina). Standard Illumina primers were used for library preparation, and 100-bp paired-end reads were collected. Data quality was assessed with FastQC. 
The fitness effects of gene KOs at a time point were determined as the fold enrichment of a construct compared to the relative abundance of that construct in the plasmid library. Fitness measurements were normalized to the median fitness for nontargeting guides. The mean z score across two biological replicates, two time points and genes in each assembly was then determined and plotted (Fig. 4f,g).
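The fitness calculation can be illustrated with the sketch below, which converts read counts to relative abundances, takes the fold enrichment of each construct over the plasmid library and centers the result on the median of the nontargeting guides. The log2 scale, the pseudocount of 1, normalization by subtraction and all construct names are assumptions of this example; the subsequent z scoring and averaging across replicates and time points are not shown.

```python
import math
from statistics import median


def relative_abundance(counts, pseudocount=1):
    """Convert raw read counts to fractions of the library."""
    total = sum(count + pseudocount for count in counts.values())
    return {name: (count + pseudocount) / total for name, count in counts.items()}


def construct_fitness(sample_counts, plasmid_counts, nontargeting):
    """Log2 fold enrichment versus plasmid, centered on nontargeting guides."""
    sample = relative_abundance(sample_counts)
    plasmid = relative_abundance(plasmid_counts)
    log_fc = {name: math.log2(sample[name] / plasmid[name]) for name in sample}
    baseline = median(log_fc[name] for name in nontargeting)
    return {name: value - baseline for name, value in log_fc.items()}


# Toy read counts (hypothetical construct names).
toy_plasmid = {"NT1": 99, "NT2": 99, "GENE_A+CDK4": 99, "GENE_B+CDK4": 99}
toy_sample = {"NT1": 99, "NT2": 99, "GENE_A+CDK4": 24, "GENE_B+CDK4": 199}
toy_fitness = construct_fitness(toy_sample, toy_plasmid, nontargeting=["NT1", "NT2"])
```

In the toy data, the construct that drops to a quarter of the nontargeting abundance scores −2 and the one that doubles scores +1, whereas nontargeting guides score 0 by construction.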
Production of a dCas9-expressing stable cell line
CRISPRa experiments were performed in A549 cells stably expressing dCas9 together with the VPR transcriptional activation complex. For this purpose, 293T cells (CRL-3216, ATCC) were cotransfected with a second-generation packaging plasmid (pCMV-dR8.2, Addgene 8455), vesicular stomatitis virus-G envelope-expressing plasmid (pMD2.G, Addgene 12259) and dCas9–VPR lentiviral plasmid (hCMV-Blast-dCas9-VPR, Horizon Discovery) using Lipofectamine 3000 (Invitrogen, L3000015). Viral supernatant was collected and cleared of cell debris by centrifugation and Steriflip column (Millipore, SE1M003M00). Lentivirus was concentrated using Amicon Ultra-15 centrifugal filters (Millipore, Z706345). Viral titer was determined through serial dilution. Subsequently, A549 cells (CCL-185, ATCC) were grown in a virus-containing medium (DMEM/F12: 10% FBS, 100 IU ml−1 penicillin/streptomycin) with 8 µg ml−1 polybrene for 72 h, followed by medium washout and selection with blasticidin (3.5 µg ml−1) for 6 days. After selection, cells were cultured with maintenance-dose blasticidin (0.35 µg ml−1) every other passage. The identity of stable dCas9 A549 cells was confirmed by STR testing (IDEXX BioAnalytics, August 31, 2020).
CRISPRa screen
A custom panel of sgRNA expression plasmids targeting genes in the NeST:85 assembly was obtained from Horizon Discovery (Fig. 5 and Supplementary Table 4). Controls included an NTC sgRNA and an overexpression (positive control) sgRNA targeting OCT4 (not a component of NeST:85). dCas9–VPR stable A549 cells were plated in a complete medium and transfected the next day with sgRNA plasmids for 24 h using FuGENE HD (Promega). Cells were selected with puromycin (0.44 µg ml−1) for 48 h and then lifted onto appropriate plates for further experimentation, where they were permitted to recover for 72 h. RNA was collected using the TRIzol reagent (Invitrogen, 15596026) and the RNeasy Mini kit (Qiagen, 74104). cDNA was synthesized using the iScript cDNA kit (Bio-Rad, 1708891). qPCR was performed using SYBR green, and cycle threshold (Ct) values were compared for genes overexpressed by CRISPRa versus NTC samples (Extended Data Fig. 4c and Supplementary Table 4).
EdU assays for S-phase entry
Transfected cells were plated in collagen-coated glass-bottom 96-well plates in a complete medium containing palbociclib (4 µM) for 24 h. Components of EdU Click-iT (Thermo Fisher Scientific, C10337) were prepared as instructed. Cells were labeled for 4 h with 10 µM EdU-labeling solution in the medium and then counterstained with Hoechst dye (1:10,000) for 10 min. Cells were fixed in 3.75% formaldehyde for 10 min at room temperature and then washed, permeabilized and stained according to the manufacturer’s instructions. Images were collected using a Keyence microscope (BZ-X800) fitted with a 4× objective and green fluorescent protein/fluorescein isothiocyanate (Chroma, C209879) and DAPI (Chroma, C209877) filters. Images were processed in bulk using scikit-image80. Cells were identified using Hoechst counterstain and then assessed for EdU incorporation (Fig. 5e,f).
Capillary western assays for RB status
Transfected cells were treated with palbociclib for 24 h and then trypsinized and washed in cold PBS; pellets were frozen at –80 °C. Protein was extracted in a hot 1× MES SDS running buffer (Invitrogen, NP0002) for 10 min. Cooled samples were vortexed for 2 min with glass beads (Sigma, G8772). cOmplete EDTA-free protease inhibitor cocktail (Roche, 04693132001) and PhosSTOP (Roche, 4906845001) were added to the cleared lysate. Protein was quantified using the Pierce 660-nm protein assay reagent (Thermo Fisher Scientific, 22662). Protein analysis was performed on a capillary-based western blot system (Wes, ProteinSimple, product no. 004-600) according to the manufacturer’s instructions using the 12- to 230-kDa separation module (ProteinSimple, SM-W001) and either the anti-rabbit detection module (ProteinSimple, DM-001) or the anti-mouse detection module (ProteinSimple, DM-002). Protein samples were diluted to 1 μg ml−1 in 0.1× sample buffer (ProteinSimple, 042-195) and then mixed with fluorescent master mix and heated at 95 °C for 5 min. Anti-phospho-RB Ser807/811 (mouse monoclonal antibody, clone D20B12, 1:100, Cell Signaling, 8516) or anti-actin (rabbit polyclonal antibody, 2 μM, Novus, NB600-532) was used as the primary antibody, whereas a horseradish peroxidase-conjugated anti-rabbit antibody (ProteinSimple, DM-001) was used as a secondary antibody. Program settings were as follows: separation at 375 V, 25 min; blocking reagent, 15 min; 20-s wash (for runs with phospho-RB only); primary antibody blocking, 35 min; two 150-s washes; secondary antibody blocking, 35 min; 150-s wash; chemiluminescence detection, from 1 to 512 s. Electropherograms (Fig. 5g,h) were inspected to check whether automatic peak detection required manual correction.
Statistics and reproducibility
All wet laboratory experiments were performed in biological duplicates with three to four technical replicates. No statistical method was used to predetermine sample sizes. The experiments were not randomized. The investigators were not blinded to allocation during experiments or outcome assessments. EdU assays were evaluated computationally with data-quality threshold filters as described above. For the survival analysis, patients were excluded if they had been documented to receive a targeted therapy other than a CDK4/6i (that is, an mTOR or AKT inhibitor), as these other targeted therapies were not the focus of our study. Statistical tests were performed as described in each section assuming data were normally distributed where appropriate, but this was not formally tested.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
We are grateful to A. Sun and J. Ma for their helpful comments. This work was supported by the National Institutes of Health through the Cancer Cell Map Initiative (CA274502), the National Resource for Network Biology (GM103504), the Network Data Exchange (CA184427) and other grants from the National Cancer Institute (CA236404 and F30CA236404).
Author contributions
S.P., E.S. and A.S. contributed equally to the paper. S.P., E.S., A.S. and T.I. conceived the idea. S.P., E.S. and A.S. developed the methods. S.P., E.S., A.S., M.R.K. and T.I. interpreted the results. K.L., I.P. and C.F. conducted the CRISPRa experiment. S.F. and J.J.Y.L. conducted the dual CRISPR KO combinatorial screen experiment. M.R.K., X.Z., R.B. and T.I. reviewed the experimental analyses. X.Z., B.A.P., K.T.Y. and T.I. reviewed the data analyses. S.P., E.S., A.S. and T.I. wrote the paper. All authors reviewed the paper.
Peer review
Peer review information
Nature Cancer thanks Shom Goel, Jianyang Zeng and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Data availability
The datasets used in this study are all publicly available. GDSC version 1: https://www.cancerrxgene.org/downloads/bulk_download; GDSC version 2: https://www.cancerrxgene.org/downloads/bulk_download; CTRP version 1: https://portals.broadinstitute.org/ctrp.v1/; CTRP version 2: https://portals.broadinstitute.org/ctrp.v2.1/; DepMap 22Q1: 10.6084/m9.figshare.19139906.v1; PDX: https://www.nature.com/articles/nm.3954; Project GENIE: https://genie.cbioportal.org/study/summary?id=brca_akt1_genie_2019; genome-wide CRISPR KO chemogenetic screen: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE192525 (Gene Expression Omnibus accession no. GSE192525). The Cytoscape session containing the NeST-VNN hierarchy and the pretrained models are available on GitHub. Cytoscape session: https://github.com/idekerlab/nest_vnn/blob/main/misc/NeST_VNN_Palbociclib.cys; pretrained models: https://github.com/idekerlab/nest_vnn/tree/main/pretrained_models/palbociclib. Source data are provided with this paper.
Code availability
The source code of NeST-VNN is available on GitHub (https://github.com/idekerlab/nest_vnn). Other supporting software is available as follows: scikit-learn (http://scikit-learn.org/stable/index.html), PyTorch (http://pytorch.org).
Competing interests
T.I. is a cofounder, member of the advisory board and has an equity interest in Data4Cure and Serinus Biosciences. T.I. is a consultant for and has an equity interest in IDEAYA Biosciences. The terms of these arrangements have been reviewed and approved by the University of California, San Diego, in accordance with its conflict of interest policies. B.A.P. has received research support to the institution from Pfizer, Genentech/Roche, Novartis, GlaxoSmithKline and Oncternal Therapeutics and receives consulting income from Daré Bioscience.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Sungjoon Park, Erica Silva, Akshat Singhal.
Extended data is available for this paper at 10.1038/s43018-024-00740-1.
Supplementary information
The online version contains supplementary material available at 10.1038/s43018-024-00740-1.
References
- 1.Hanahan, D. & Weinberg, R. A. The hallmarks of cancer. Cell100, 57–70 (2000). [DOI] [PubMed] [Google Scholar]
- 2.Goel, S., Bergholz, J. S. & Zhao, J. J. Targeting CDK4 and CDK6 in cancer. Nat. Rev. Cancer22, 356–372 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Watt, A. C. & Goel, S. Cellular mechanisms underlying response and resistance to CDK4/6 inhibitors in the treatment of hormone receptor-positive breast cancer. Breast Cancer Res.24, 17 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Fassl, A., Geng, Y. & Sicinski, P. CDK4 and CDK6 kinases: from basic science to cancer therapy. Science375, eabc1495 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Xu, X.-Q. et al. Intrinsic and acquired resistance to CDK4/6 inhibitors and potential overcoming strategies. Acta Pharmacol. Sin.42, 171–178 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Gao, J. J. et al. CDK4/6 inhibitor treatment for patients with hormone receptor-positive, HER2-negative, advanced or metastatic breast cancer: a US Food and Drug Administration pooled analysis. Lancet Oncol.21, 250–260 (2020). [DOI] [PubMed] [Google Scholar]
- 7.Li, J. et al. Association of cyclin-dependent kinases 4 and 6 inhibitors with survival in patients with hormone receptor-positive metastatic breast cancer: a systematic review and meta-analysis. JAMA Netw. Open3, e2020312 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.McCartney, A. et al. Mechanisms of resistance to CDK4/6 inhibitors: potential implications and biomarkers for clinical practice. Front. Oncol.9, 666 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Rafique, R., Islam, S. M. R. & Kazi, J. U. Machine learning in the prediction of cancer therapy. Comput. Struct. Biotechnol. J.19, 4003–4017 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Yu, M. K. et al. Visible machine learning for biomedicine. Cell173, 1562–1565 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Kim, Y.-A. et al. Identifying drug sensitivity subnetworks with NETPHIX. iScience23, 101619 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Jin, I. & Nam, H. HiDRA: hierarchical network for drug response prediction with attention. J. Chem. Inf. Model.61, 3858–3867 (2021). [DOI] [PubMed] [Google Scholar]
- 13.Ma, J. et al. Using deep learning to model the hierarchical structure and function of a cell. Nat. Methods15, 290–298 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Kuenzi, B. M. et al. Predicting drug response and synergy using a deep learning model of human cancer cells. Cancer Cell38, 672–684 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Elmarakeby, H. A. et al. Biologically informed deep neural network for prostate cancer discovery. Nature598, 348–352 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Huang, X. et al. ParsVNN: parsimony visible neural networks for uncovering cancer-specific and drug-sensitive genes and pathways. NAR Genom. Bioinform.3, lqab097 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Ashburner, M. et al. Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet.25, 25–29 (2000). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Gillespie, M. et al. The reactome pathway knowledgebase 2022. Nucleic Acids Res.50, D687–D692 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Zheng, F. et al. Interpretation of cancer mutations using a multiscale map of protein systems. Science374, eabf3067 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Frampton, G. M. et al. Development and validation of a clinical cancer genomic profiling test based on massively parallel DNA sequencing. Nat. Biotechnol.31, 1023–1031 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Beaubier, N. et al. Clinical validation of the Tempus xT next-generation targeted oncology sequencing assay. Oncotarget10, 2384–2396 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Smyth, L. M. et al. Characteristics and outcome of AKT1E17K-mutant breast cancer defined through AACR Project GENIE, a clinicogenomic registry. Cancer Discov.10, 526–535 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature483, 603–607 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Seashore-Ludlow, B. et al. Harnessing connectivity in a large-scale small-molecule sensitivity dataset. Cancer Discov.5, 1210–1223 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Basu, A. et al. An interactive resource to identify cancer genetic and lineage dependencies targeted by small molecules. Cell154, 1151–1161 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Garnett, M. J. et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature483, 570–575 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell166, 740–754 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Gao, H. et al. High-throughput screening using patient-derived tumor xenografts to predict clinical trial drug response. Nat. Med.21, 1318–1325 (2015). [DOI] [PubMed] [Google Scholar]
- 29.Li, Z. et al. Loss of the FAT1 tumor suppressor promotes resistance to CDK4/6 inhibitors via the Hippo pathway. Cancer Cell34, 893–905 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Finn, R. S. et al. The cyclin-dependent kinase 4/6 inhibitor palbociclib in combination with letrozole versus letrozole alone as first-line treatment of oestrogen receptor-positive, HER2-negative, advanced breast cancer (PALOMA-1/TRIO-18): a randomised phase 2 study. Lancet Oncol. 16, 25–35 (2015).
- 31. DeMichele, A. et al. CDK 4/6 inhibitor palbociclib (PD0332991) in Rb+ advanced breast cancer: phase II activity, safety, and predictive biomarker assessment. Clin. Cancer Res. 21, 995–1001 (2015).
- 32. Garraway, L. A. & Lander, E. S. Lessons from the cancer genome. Cell 153, 17–37 (2013).
- 33. Carpintero-Fernández, P. et al. Genome wide CRISPR/Cas9 screen identifies the coagulation factor IX (F9) as a regulator of senescence. Cell Death Dis. 13, 163 (2022).
- 34. Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576 (2017).
- 35. Fukazawa, T. et al. Inhibition of Myc effectively targets KRAS mutation-positive lung cancer expressing high levels of Myc. Anticancer Res. 30, 4193–4200 (2010).
- 36. Adam, G. et al. Machine learning approaches to drug response prediction: challenges and recent progress. NPJ Precis. Oncol. 4, 19 (2020).
- 37. Richards, A. L., Eckhardt, M. & Krogan, N. J. Mass spectrometry-based protein–protein interaction networks for the study of human diseases. Mol. Syst. Biol. 17, e8792 (2021).
- 38. Go, C. D. et al. A proximity-dependent biotinylation map of a human cell. Nature 595, 120–124 (2021).
- 39. Salas, D., Stacey, R. G., Akinlaja, M. & Foster, L. J. Next-generation interactomics: considerations for the use of co-elution to measure protein interaction networks. Mol. Cell. Proteomics 19, 1–10 (2020).
- 40. Heusel, M. et al. Complex-centric proteome profiling by SEC–SWATH–MS. Mol. Syst. Biol. 15, e8438 (2019).
- 41. Qin, Y. et al. A multi-scale map of cell structure fusing protein images and interactions. Nature 600, 536–542 (2021).
- 42. Ji, W. et al. Combined androgen receptor blockade overcomes the resistance of breast cancer cells to palbociclib. Int. J. Biol. Sci. 15, 522–532 (2019).
- 43. Mao, P. et al. Acquired FGFR and FGF alterations confer resistance to estrogen receptor (ER) targeted therapy in ER+ metastatic breast cancer. Clin. Cancer Res. 26, 5974–5989 (2020).
- 44. Wang, T.-H. et al. Palbociclib induces DNA damage and inhibits DNA repair to induce cellular senescence and apoptosis in oral squamous cell carcinoma. J. Formos. Med. Assoc. 120, 1695–1705 (2021).
- 45. Fernández-Aroca, D. M. et al. P53 pathway is a major determinant in the radiosensitizing effect of palbociclib: implication in cancer therapy. Cancer Lett. 451, 23–33 (2019).
- 46. Pancholi, S. et al. Tumour kinome re-wiring governs resistance to palbociclib in oestrogen receptor positive breast cancers, highlighting new therapeutic modalities. Oncogene 39, 4781–4797 (2020).
- 47. Shu, S. et al. Synthetic lethal and resistance interactions with BET bromodomain inhibitors in triple-negative breast cancer. Mol. Cell 78, 1096–1113 (2020).
- 48. Zhou, M. et al. Combining histone deacetylase inhibitors (HDACis) with other therapies for cancer therapy. Eur. J. Med. Chem. 226, 113825 (2021).
- 49. Wang, B. et al. Pharmacological CDK4/6 inhibition reveals a p53-dependent senescent state with restricted toxicity. EMBO J. 41, e108946 (2022).
- 50. Ji, W. et al. c-myc regulates the sensitivity of breast cancer cells to palbociclib via c-myc/miR-29b-3p/CDK6 axis. Cell Death Dis. 11, 760 (2020).
- 51. Wiesel-Motiuk, N. & Assaraf, Y. G. The key roles of the lysine acetyltransferases KAT6A and KAT6B in physiology and pathology. Drug Resist. Updat. 53, 100729 (2020).
- 52. Rokudai, S. et al. MOZ increases p53 acetylation and premature senescence through its complex formation with PML. Proc. Natl Acad. Sci. USA 110, 3895–3900 (2013).
- 53. Kitabayashi, I., Aikawa, Y., Nguyen, L. A., Yokoyama, A. & Ohki, M. Activation of AML1-mediated transcription by MOZ and inhibition by the MOZ–CBP fusion protein. EMBO J. 20, 7184–7196 (2001).
- 54. Borrow, J. et al. The translocation t(8;16)(p11;p13) of acute myeloid leukaemia fuses a putative acetyltransferase to the CREB-binding protein. Nat. Genet. 14, 33–41 (1996).
- 55. Li, J. Y., Daniels, G., Wang, J. & Zhang, X. TBL1XR1 in physiological and pathological states. Am. J. Clin. Exp. Urol. 3, 13–23 (2015).
- 56. Tomita, A., Buchholz, D. R. & Shi, Y.-B. Recruitment of N-CoR/SMRT–TBLR1 corepressor complex by unliganded thyroid hormone receptor for gene repression during frog development. Mol. Cell. Biol. 24, 3337–3346 (2004).
- 57. Perissi, V. et al. TBL1 and TBLR1 phosphorylation on regulated gene promoters overcomes dual CtBP and NCoR/SMRT transcriptional repression checkpoints. Mol. Cell 29, 755–766 (2008).
- 58. Priebbenow, D. L. et al. Discovery of acylsulfonohydrazide-derived inhibitors of the lysine acetyltransferase, KAT6A, as potent senescence-inducing anti-cancer agents. J. Med. Chem. 63, 4655–4684 (2020).
- 59. Baell, J. B. et al. Inhibitors of histone acetyltransferases KAT6A/B induce senescence and arrest tumour growth. Nature 560, 253–257 (2018).
- 60. Su, J. et al. The role of MOZ/KAT6A in hematological malignancies and advances in MOZ/KAT6A inhibitors. Pharmacol. Res. 174, 105930 (2021).
- 61. Lira, M. E. et al. Abstract 2749. Liquid biopsy testing allows highly-sensitive detection of plasma cfDNA mutations in 87 breast cancer-related genes. Cancer Res. 77, 2749 (2017).
- 62. Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 174, 1034–1035 (2018).
- 63. Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proc. 32nd International Conference on Machine Learning Vol. 37 (eds Bach, F. & Blei, D.) 448–456 (JMLR.org, 2015).
- 64. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).
- 65. Jolliffe, I. Principal component analysis. in Wiley StatsRef: Statistics Reference Online (Wiley, 2014); 10.1002/9781118445112.stat06472
- 66. Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. Preprint at arXiv 10.48550/arXiv.1711.05101 (2017).
- 67. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
- 68. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).
- 69. Hinton, G. E. Connectionist learning procedures. Artif. Intell. 40, 185–234 (1989).
- 70. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
- 71. Stone, M. Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Ser. B (Methodol.) 36, 111–133 (1974).
- 72. Akiba, T., Sano, S., Yanase, T., Ohta, T. & Koyama, M. Optuna: a next-generation hyperparameter optimization framework. in KDD ’19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2623–2631 (Association for Computing Machinery, 2019); 10.1145/3292500.3330701
- 73. Hoerl, A. E. & Kennard, R. W. Ridge regression: biased estimation for nonorthogonal problems. Technometrics 42, 80–86 (2000).
- 74. McCullagh, P. & Nelder, J. A. Generalized Linear Models 2nd edn (CRC, 1989).
- 75. Tibshirani, R. Regression Shrinkage and Selection via the Lasso (Stanford University Department of Statistics, 1994).
- 76. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
- 77. Fang, Z., Liu, X. & Peltz, G. GSEApy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics 39, btac757 (2023).
- 78. Dempster, J. M. et al. Chronos: a cell population dynamics model of CRISPR experiments that improves inference of gene fitness effects. Genome Biol. 22, 343 (2021).
- 79. Sanjana, N. E., Shalem, O. & Zhang, F. Improved vectors and genome-wide libraries for CRISPR screening. Nat. Methods 11, 783–784 (2014).
- 80. van der Walt, S. et al. scikit-image: image processing in Python. PeerJ 2, e453 (2014).
Data Availability Statement
The datasets used in this study are all publicly available. GDSC version 1: https://www.cancerrxgene.org/downloads/bulk_download; GDSC version 2: https://www.cancerrxgene.org/downloads/bulk_download; CTRP version 1: https://portals.broadinstitute.org/ctrp.v1/; CTRP version 2: https://portals.broadinstitute.org/ctrp.v2.1/; DepMap 22Q1: 10.6084/m9.figshare.19139906.v1; PDX: https://www.nature.com/articles/nm.3954; Project GENIE: https://genie.cbioportal.org/study/summary?id=brca_akt1_genie_2019; genome-wide CRISPR KO chemogenetic screen: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE192525 (Gene Expression Omnibus accession no. GSE192525). The Cytoscape session containing the NeST-VNN hierarchy and the pretrained models are available on GitHub. Cytoscape session: https://github.com/idekerlab/nest_vnn/blob/main/misc/NeST_VNN_Palbociclib.cys; pretrained models: https://github.com/idekerlab/nest_vnn/tree/main/pretrained_models/palbociclib. Source data are provided with this paper.
The source code of NeST-VNN is available on GitHub (https://github.com/idekerlab/nest_vnn). Other supporting software is available as follows: scikit-learn (http://scikit-learn.org/stable/index.html), PyTorch (http://pytorch.org).