Abstract
Investigating the human genome is vital for identifying risk factors and devising effective therapies to combat genetic disorders and cancer. Despite extensive knowledge of the "light genome", the "dark genome" remains poorly understood and understudied. In this study, we integrated data from 20,412 protein-coding genes in Pharos and 8,395 patient-derived tumours from The Cancer Genome Atlas (TCGA) to examine the genetic and pharmacological dependencies in human cancers and their treatment implications. We discovered that dark genes exhibited high mutation rates in certain cancers, similar to light genes. By combining the drug response profiles of cancer cells with cell fitness post-CRISPR-mediated gene knockout, we identified the crucial vulnerabilities associated with both dark and light genes. Our analysis also revealed that tumours harbouring dark gene mutations displayed worse overall and disease-free survival rates than those without such mutations. Furthermore, dark gene expression levels significantly influenced patient survival outcomes. Our findings demonstrated a similar distribution of genetic and pharmacological dependencies across the light and dark genomes, suggesting that targeting the dark genome holds promise for cancer treatment. This study underscores the need for ongoing research on the dark genome to better comprehend the underlying mechanisms of cancer and develop more effective therapies.
Introduction
The human genome comprises approximately 19,000–25,000 protein-coding gene sequences, accounting for 1–2% of the genome [1–5]. Perturbations within the genome, such as those caused by environmental stress factors like ultraviolet (UV) radiation, irradiation, and chemicals, can result in changes in cellular phenotypes and behavioural alterations [6–10]. Such disruptions in protein-coding regions are closely associated with the onset of and susceptibility to human diseases, including cancer and other genetic disorders [11,12]. For example, tumour development and progression are strongly influenced by the activation of oncogenes and the inactivation of tumour suppressor genes, thereby providing cancer-specific hallmarks [9,13–17]. The introduction of high-throughput next-generation sequencing techniques has led to the identification of numerous known and novel causal genes [18,19] through the rapid sequencing of complete genomes and generation of extensive genome data [20–23]. Despite extensive research efforts to understand and combat cancer, it remains the second leading cause of death worldwide [24–27].
The "dark genome", "ignorome", or "Tdark" comprises protein-coding genes with limited or no known function in the literature, representing over a third of all genes [28]. This vast amount of unexplored genetic information highlights the significant gaps in our understanding of their roles and importance [28–30]. Our growing understanding of the human genome has undeniably facilitated the diagnosis of genetic diseases and the development of targeted therapies for various pathological disorders, such as cancer [5]. Nevertheless, the lack of knowledge surrounding dark genes poses a considerable obstacle to the advancement of personalised medicine [28,29,31]. The inability to decipher the functions and significance of these genes limits our ability to tailor medical treatments based on individual genetic variations, thereby impeding the full realisation of personalised medicine's potential [28]. Overcoming this obstacle requires extensive research aimed at determining the functions of dark genes and unlocking their therapeutic potential.
Large molecular profiling projects, such as The Cancer Genome Atlas (TCGA) project [32], have extensively profiled human cancers, providing valuable insights into their molecular landscapes and potential therapeutic targets. The Achilles project utilises CRISPR technology to explore gene dependencies in cancer cells [33], whereas the Genomics of Drug Sensitivity in Cancer (GDSC) [34] and Cancer Cell Line Encyclopaedia (CCLE) [35] projects screen thousands of cancer cell lines for small-molecule inhibitor responses. These efforts have generated vast publicly accessible data for advancing cancer understanding and treatment and uncovering the mysteries of the dark genome. The Illuminating the Druggable Genome (IDG) project has compiled dark gene data from over 60 sources to identify new therapeutic targets and advance cancer understanding [30,36]. This effort led to the development of the Target Central Resource Database (TCRD), which is a comprehensive database that integrates various data types. To facilitate easy exploration and sharing of this data, a web-based platform called Pharos was created [37].
Integrating the different datasets from TCGA, Achilles, GDSC, CCLE, and IDG projects is essential for illuminating the dark genome and overcoming obstacles posed by unstudied human genes, thus facilitating the advancement of personalised medicine. In this study, we extensively analysed the distribution of genes in both light and dark genomes, investigated the mutation frequency of dark genes across various human cancer types, and assessed their impact on the chemosensitivity of cancer cell lines and aggressiveness of cancer. Our comprehensive analysis aimed to provide valuable insights into the role of the dark genome in human cancers and contribute to a better understanding of the genetic landscape of these diseases. Furthermore, by leveraging information from large-scale molecular profiling projects, we underscore the importance of continued research on the dark genome and the integration of multiple datasets to enhance our understanding of its impact on cancer, ultimately promoting more effective, targeted therapies for cancer and other genetic disorders.
Methods
The study protocol received approval from the University of Zambia Health Sciences Research Ethics Committee (IRB00011000). The analyses in this study utilised publicly available datasets and de-identified clinical information collected by the TCGA, CCLE, Achilles, IDG, and GDSC projects. These datasets were made accessible through their respective project databases. The methods employed in this study adhered to the relevant policies, regulations, and guidelines established by the TCGA, CCLE, DepMap, IDG, and GDSC projects for the analysis of their datasets and the reporting of findings.
We analysed a dataset comprising 10,528 patient-derived tumours representing 32 distinct human cancers, obtained from cBioPortal [32] version 3.1.9 (http://www.cbioportal.org). The acquired data included somatic gene mutations (point mutations and small insertions/deletions), mRNA expression, and comprehensive de-identified clinical data. We further filtered the dataset to include only cancer studies with clinical information for profiled patients. The final datasets encompassed 8,395 patient-derived tumour samples representing 28 distinct human cancer types (see S1 File for cancer study details).
Distribution and research focus on light and dark genes across the target development levels (TDLs)
We obtained human gene classifications based on target development levels (TDLs) from the Pharos [30] interface (version 3.15.1) (https://pharos.nih.gov/). The IDG project classified proteins into four TDLs based on the level of clinical, biological, and chemical investigations conducted on each protein. These TDLs include light genes (Tbio, Tclin, and Tchem) and dark genes (Tdark) [38]:
Tclin proteins are drug targets associated with at least one approved drug, and their mechanism of action (MoA) is known [30,36].
Tchem proteins do not have established connections to approved drugs based on MoA. However, they are recognised for their exceptional ability to bind with high potency to small molecules, surpassing the bioactivity cutoff values: ≤ 30 nM for kinases, ≤ 100 nM for GPCRs and nuclear receptors, ≤ 10 μM for ion channels, and ≤ 1 μM for other target families [30,38].
Tbio are proteins with well-studied biology and meet certain criteria, such as having a fractional publication count above 5, being annotated with a Gene Ontology Molecular Function or Biological Process with an Experimental Evidence code and having confirmed OMIM phenotype(s) [30,36].
Tdark are understudied proteins that do not meet the criteria for the above three categories [36]. They meet at least two of the following conditions: a PubMed text-mining score < 5, ≤ 3 GeneRIFs, or ≤ 50 antibodies available per Antibodypedia [38].
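The numeric thresholds in the list above can be written down as simple checks. The following Python snippet is only an illustrative sketch and is not the IDG or Pharos implementation: the function names, the family labels, and the input fields are hypothetical, and the full Tdark definition additionally requires that a protein fails the Tclin, Tchem, and Tbio criteria.

```python
# Potency cutoffs (in nM) for Tchem, as quoted in the list above.
TCHEM_CUTOFF_NM = {
    "kinase": 30.0,
    "gpcr": 100.0,
    "nuclear receptor": 100.0,
    "ion channel": 10_000.0,  # 10 uM
}
DEFAULT_CUTOFF_NM = 1_000.0  # 1 uM for all other target families


def passes_tchem_cutoff(family: str, best_potency_nm: float) -> bool:
    """Return True if the most potent ligand meets the family-specific cutoff."""
    cutoff = TCHEM_CUTOFF_NM.get(family.lower(), DEFAULT_CUTOFF_NM)
    return best_potency_nm <= cutoff


def meets_tdark_conditions(pubmed_score: float, gene_rifs: int, antibodies: int) -> bool:
    """Return True if at least two of the three low-knowledge conditions hold."""
    conditions = [pubmed_score < 5, gene_rifs <= 3, antibodies <= 50]
    return sum(conditions) >= 2


# Hypothetical inputs, for illustration only.
print(passes_tchem_cutoff("Kinase", 25.0))     # 25 nM is within the 30 nM kinase cutoff
print(meets_tdark_conditions(2.1, 1, 200))     # two of the three conditions hold
```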
We analysed a dataset of 20,412 targets: light genes (Tbio [n = 12,058], Tclin [n = 704], Tchem [n = 1,971]) and dark genes (Tdark [n = 5,679]). Additionally, we compared the gene distribution and TDLs between TCRD versions 4.3.4 and 6.13.4 to investigate gene classification changes over time. We also gathered information on antibody counts, monoclonal antibody counts, and PubMed publication counts from Pharos for further insight into the research focus within each TDL.
Determination of the extent to which dark genes are mutated in human cancer types
To assess the extent of dark gene mutations in cancer compared to light genes, we integrated TDL-annotated protein-coding gene information from the Pharos database with the TCGA project [32] dataset of 8,395 patient-derived tumours representing 28 distinct human cancers. We determined the overall mutation frequencies in genes of each developmental level (Tdark, Tclin, Tchem, and Tbio) across the tumours of each cancer type and all cancer types. These analyses enabled us to understand how dark genes are altered within and across the 28 most common human cancer types.
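As a rough illustration of this integration step, the Python sketch below computes the percentage of tumours carrying at least one mutated gene in each TDL class. It is a minimal sketch under stated assumptions, not the authors' pipeline (which was run in MATLAB): the function name, the input layout, and the toy tumour identifiers are hypothetical, while the TDL labels of the four example genes follow the classifications reported in this paper.

```python
from collections import defaultdict


def tdl_mutation_frequency(mutations, gene_tdl, n_tumours):
    """Percentage of tumours with at least one mutated gene in each TDL class.

    mutations: iterable of (tumour_id, gene_symbol) pairs
    gene_tdl:  dict mapping gene_symbol -> TDL label
    n_tumours: total number of profiled tumours (the denominator)
    """
    tumours_by_tdl = defaultdict(set)
    for tumour_id, gene in mutations:
        tdl = gene_tdl.get(gene)
        if tdl is not None:  # skip genes without a TDL annotation
            tumours_by_tdl[tdl].add(tumour_id)
    return {tdl: 100.0 * len(ids) / n_tumours for tdl, ids in tumours_by_tdl.items()}


# Toy data: four hypothetical tumours.
gene_tdl = {"TP53": "Tchem", "PKHD1L1": "Tdark", "DNAH9": "Tdark", "EGFR": "Tclin"}
mutations = [("T1", "TP53"), ("T1", "PKHD1L1"), ("T2", "TP53"),
             ("T3", "DNAH9"), ("T4", "EGFR")]
freq = tdl_mutation_frequency(mutations, gene_tdl, n_tumours=4)
print(freq)
```

Counting tumours in sets means a tumour with several mutated genes of the same class is counted once for that class.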
Assessment of the impact of Tdark genes on cancer cell lines
To further evaluate the potential role of dark genes in cancer, we mined datasets from the Achilles project at the DepMap Portal version 21Q1 on the fitness of over 80 cell lines derived from 35 different human cancer types following CRISPR knockouts of 18,333 individual genes. See https://depmap.org/portal/ for information on Achilles’ CRISPR-derived gene dependency descriptions. Briefly, a lower score indicates a higher likelihood of cell line dependency on a given gene. A score of 0 corresponds to a non-essential gene, while -1 corresponds to the median of all “common essential genes”. The database groups genes into four primary categories based on cell line fitness after CRISPR-mediated gene knockouts:
Common essential genes–consistently ranked in the top X most depleted genes in at least 90% of cell lines.
Strongly selective genes–genes with a dependency skewed-likelihood ratio test (LRT) value > 100.
Essential genes–associated with cell fitness in only one or a few cell lines, but with a dependency skewed-LRT value < 100.
Non-essential genes–show no effect on the cell fitness in any of the 688 tested cell lines.
We incorporated TDL information from Pharos and CRISPR data to examine the impact of Tdark genes on cancer cell lines. Our assessment involved comparing the mean gene dependency scores obtained from CRISPR data to determine the importance of genes within each developmental stage for cancer cell lines across all cancer types and within each cancer type.
Additionally, we calculated and compared the number of common essential and non-essential genes across the Pharos development levels (S1 File) to understand how gene dependence relates to TDL classes. We then compared the number of PubMed publications, monoclonal antibodies, and polyclonal antibodies available for each group of genes (common essential versus non-essential genes).
Impact of common essential genes on cell fitness
To further investigate gene essentiality, we collected mRNA transcription data for 756 cancer cell lines from the Cancer Cell Line Encyclopaedia (CCLE) database [35]. Using CRISPR-derived dependence scores, we identified essential genes across cell lines by counting the number of instances in which each gene’s score was less than -0.5. This threshold was chosen based on the recommendation of the Achilles project, indicating reduced cell fitness after CRISPR-mediated gene knockouts. We then tallied instances where the CRISPR-derived dependence score was less than -0.5 for each gene in each cell line to identify genes associated with a reduction in cell fitness across all cell lines. Genes with a significant number of instances were identified as essential, as they were critical for cell survival and growth. Finally, we compared Achilles CRISPR-based fitness scores with transcription profiles.
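The counting step described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, the nested-dictionary layout, and the toy gene and cell line labels are assumptions, and only the -0.5 threshold is taken from the text.

```python
DEPENDENCY_THRESHOLD = -0.5  # threshold quoted in the text


def count_dependent_lines(scores):
    """Count, per gene, the cell lines whose dependency score is below the threshold.

    scores: dict mapping gene -> dict mapping cell_line -> dependency score
    """
    return {
        gene: sum(1 for s in by_line.values() if s < DEPENDENCY_THRESHOLD)
        for gene, by_line in scores.items()
    }


# Toy scores for two hypothetical genes across three hypothetical cell lines.
scores = {
    "GENE_A": {"CL1": -1.1, "CL2": -0.7, "CL3": -0.2},
    "GENE_B": {"CL1": 0.1, "CL2": -0.5, "CL3": -0.6},
}
counts = count_dependent_lines(scores)
print(counts)
```

The comparison is strict, so a score of exactly -0.5 is not counted as a dependency.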
Evaluation of the extent to which the dark genome affects the chemosensitivity of cancer cells in relation to the light genome
We analysed drug-response data from the Genomics of Drug Sensitivity in Cancer (GDSC) database [34] (www.cancerRxgene.org), to investigate the influence of the dark genome on cancer cell chemosensitivity compared to the light genome. The GDSC database provides comprehensive information on human cancer cell lines treated with a wide range of anticancer drugs that target various signalling pathways. We retrieved the dose-responses for 380 cancer cell lines to 397 drugs that target components of 24 different pathways. Additionally, we used Pharos to obtain information on the TDL annotation of light and dark genes.
We conducted a series of analyses for each pathway gene (e.g., MFSD1) in the context of the CRISPR dependency scores and mutations in cancer cell lines. First, we divided the cell lines into two groups: (1) those with a high CRISPR-derived dependency on the gene (e.g., MFSD1 dependence score < -0.5), and (2) those with low dependence on that gene (e.g., MFSD1 dependence score > 0.5). Subsequently, we used the Wilcoxon rank-sum test to compare the IC50 values of 397 pathway inhibitors between these two groups of cell lines. Additionally, for each pathway gene, we further categorised the cell lines into two groups: (1) those with mutations in the specific gene (e.g., cell lines with MFSD1 mutations), and (2) those without mutations in that gene (e.g., cell lines with no MFSD1 mutations). Next, we compared the logarithm-transformed IC50 values of each anticancer drug between these two groups of cell lines using the Wilcoxon rank-sum test.
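The grouping and comparison described above can be sketched as follows. This Python snippet is an illustration rather than the authors' MATLAB code: the function names and the toy cell line values are hypothetical, and the test shown is the large-sample normal approximation of the Wilcoxon rank-sum test without tie or continuity correction, which is only a rough guide for groups as small as those in the toy example.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation). Returns (z, p)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


def split_by_dependency(dependency, log_ic50, high_cut=-0.5, low_cut=0.5):
    """Split log IC50 values into high-dependency and low-dependency groups."""
    high = [log_ic50[c] for c, s in dependency.items() if s < high_cut and c in log_ic50]
    low = [log_ic50[c] for c, s in dependency.items() if s > low_cut and c in log_ic50]
    return high, low


# Toy data for one gene and one drug; CL6 falls in neither group.
dependency = {"CL1": -0.9, "CL2": -0.7, "CL3": -0.6, "CL4": 0.6, "CL5": 0.8, "CL6": 0.1}
log_ic50 = {"CL1": 0.2, "CL2": 0.5, "CL3": 0.1, "CL4": 2.1, "CL5": 1.8, "CL6": 1.0}
dependent, independent = split_by_dependency(dependency, log_ic50)
z, p = rank_sum_test(dependent, independent)
print(round(z, 3), round(p, 3))
```

A negative z here means the high-dependency group has lower log IC50 values, that is, greater drug sensitivity.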
By conducting these analyses, we aimed to determine the extent to which dark and light genes affect cancer chemosensitivity and to identify the types of drugs and targeted pathways that are more relevant for tumours with alterations in specific dark and light genes.
Assessing the impact of dark genome alterations on cancer aggressiveness
To evaluate the influence of dark genome alterations on cancer aggressiveness, we integrated genetic alteration data, including mRNA transcription changes and gene mutations of 28 human cancer types, with clinical outcomes of patients in the TCGA. First, for each cancer type, we segregated patients' tumours into two groups in one of two ways: by mutation status, comparing (1) tumours without mutations with (2) tumours with mutations across all dark genes; or by expression, comparing (1) tumours expressing higher mRNA transcript levels of a particular dark gene with (2) tumours expressing lower levels of that gene. We then applied the Kaplan-Meier method and the log-rank test to compare the duration of overall survival (OS) and disease-free survival (DFS) between the two groups of tumours across all cancer types.
By comparing the OS and DFS durations, we sought to determine whether alterations in dark genes influence cancer aggressiveness and, if so, the extent of their impact. Furthermore, we investigated the association between mRNA expression levels of specific dark genes and the aggressiveness of disease in each cancer type and across all cancer types. This assessment allowed us to identify potential dark gene targets that could have clinical significance in the prognosis and treatment of cancer.
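For readers unfamiliar with the Kaplan-Meier method used in this section, the Python sketch below shows the product-limit estimate for a single group of patients. It is a minimal illustration with hypothetical follow-up times, not the authors' MATLAB code, and it omits the log-rank comparison between groups.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct event time.

    times:  follow-up durations (e.g., months)
    events: 1 if the event (death or relapse) was observed, 0 if censored
    Returns a list of (time, survival_probability) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # both events and censored patients leave the risk set
    return curve


# Toy cohort of five hypothetical patients.
curve = kaplan_meier(times=[5, 8, 8, 12, 15], events=[1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Patients censored at the same time as an event are treated as still at risk at that time, which is the usual convention.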
Statistical analyses
All statistical analyses were performed using MATLAB R2021a. Where appropriate, we used the independent-samples Student's t-test, the Welch test, the Wilcoxon rank-sum test, and one-way analysis of variance (ANOVA) to compare groups of continuous variables. Statistical tests were considered significant at p < 0.05 for single comparisons, whereas the p-values of multiple comparisons were adjusted using the Bonferroni correction method.
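The Bonferroni adjustment mentioned above multiplies each p-value by the number of comparisons and caps the result at 1. A minimal Python sketch, with a hypothetical function name and made-up p-values, is:

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: p * m, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Three hypothetical comparisons.
adjusted = bonferroni_adjust([0.01, 0.04, 0.5])
print(adjusted)
```

With three comparisons, only the first of these hypothetical p-values would remain below 0.05 after adjustment.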
Results
Distribution and potential of dark genes as therapeutic targets
We obtained human gene classification information based on target development levels (TDLs) from Pharos [version 3.15.1] (https://pharos.nih.gov), a multimodal web interface that presents data from the Target Central Resource Database (TCRD) [30,37]. Pharos classified genes/proteins into four TDLs: light genes (Tbio [n = 12,058], Tclin [n = 704], and Tchem [n = 1,971]), and dark genes (Tdark [n = 5,679]) (see S1A Fig and “Methods” section for the description of TDLs). Our evaluation of TDL genes revealed that only 3.5% are currently utilised as drug targets, suggesting that many genes have the potential to be developed as drug targets (Tchem), and that a significant proportion (27.8%) of the dark genome encoding Tdark proteins remains to be understood. This study further filtered the dataset to include genes with information on publications, monoclonal antibodies, and antibody counts. The final datasets encompassed 19,387 genes, including light genes (Tbio [n = 11,724], Tclin [n = 689], and Tchem [n = 1,946]) and dark genes (Tdark [n = 5,028]). The distribution of genes within each development level is illustrated in Fig 1A (see S1 File).
Fig 1. Distribution of dark and light genes and their mutations in human cancers.
A. Number of genes in each target development level. B. Percentage of cancers with mutations at each target development level. C. Normalised number of mutated genes at each target development level across the 28 cancer types. ACC: Adrenocortical carcinoma; BRCA: Breast cancer; CESC: Cervical squamous cell carcinoma; CHOL: Cholangiocarcinoma; COADREAD: Colorectal cancer; ESCA: Oesophageal carcinoma; GBM: Glioblastoma multiforme; HNSC: Head and neck squamous cell carcinoma; KICH: Kidney chromophobe; KIRC: Kidney renal clear cell carcinoma; KIRP: Kidney renal papillary cell carcinoma; LAML: Acute myeloid leukaemia; LGG: Brain lower grade glioma; LIHC: Liver hepatocellular carcinoma; LUAD: Lung adenocarcinoma; LUSC: Lung squamous cell carcinoma; OV: Ovarian serous cystadenocarcinoma; PAAD: Pancreatic adenocarcinoma; PCPG: Pheochromocytoma and paraganglioma; PRAD: Prostate adenocarcinoma; SARC: Sarcoma; SKCM: Skin cutaneous melanoma; TGCT: Testicular germ cell tumours; THCA: Thyroid carcinoma; THYM: Thymoma; UCEC: Uterine corpus endometrial carcinoma; UCS: Uterine carcinosarcoma; UVM: Uveal melanoma.
We observed a reduction in dark genes between TCRD versions 4.3.4 and 6.13.4: version 4.3.4 contained 7,003 Tdark genes, whereas version 6.13.4 contained 5,679, a decrease of 1,324 genes (see S1B Fig). Most of the reclassified genes, including ACTR8, CSMD3, LSM3, and AASDH, are now categorised as Tbio, likely because of new information regarding their functions. This reduction in the number of dark genes between the two database versions reflects the progress in our understanding of the human genome and its potential as a source of novel therapeutic targets. Furthermore, it shows that genes currently labelled “dark” may prove to have identifiable functions after careful analysis and further study.
Research focus and trends in target development levels
To determine the research focus for each development level, we performed a one-way analysis of variance (ANOVA) to compare the mean scores of target development levels on publication count, antibody count, and monoclonal antibody count. The statistical analysis revealed significant main effects of target development levels on publication count (F(3, 19383) = 6.07 × 10^3, p = 1 × 10^−300) (Fig 2A), antibody count (F(3, 19383) = 3.85 × 10^3, p = 1 × 10^−300) (Fig 2B), and monoclonal antibody count (F(3, 19383) = 2.62 × 10^3, p = 1 × 10^−300) (Fig 2C). These results suggest significant impacts of target development levels on publication, antibody, and monoclonal antibody counts. We further analysed the differences between target development levels using Bonferroni post hoc comparisons, revealing that Tclin had the highest publication count (mean = 4.53) and Tdark the lowest (mean = 1.84). Similarly, Tclin had the highest antibody count (mean = 5.40), and Tdark had the lowest (mean = 2.85). Tclin also had the highest monoclonal antibody count (mean = 3.21), while Tdark had the lowest (mean = 0.39). Our findings suggest that the research focus is consistently high in Tclin, followed by Tchem, Tbio, and Tdark, offering valuable insights into research priorities and trends within different development levels and informing future research efforts.
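To make the reported F statistics and their degrees of freedom easier to interpret, the Python sketch below computes a one-way ANOVA F statistic from raw groups. It is an illustration with made-up numbers, not the authors' MATLAB analysis, and the function name is hypothetical. For the 19,387 genes in four development levels analysed here, the same formulas give 4 − 1 = 3 and 19,387 − 4 = 19,383 degrees of freedom, matching the values reported above.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(v for g in groups for v in g) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Three small hypothetical groups.
f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_stat, 2), df_b, df_w)
```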
Fig 2. Research distribution across target development levels.
Comparison of the A. publication count, B. antibody count, and C. monoclonal antibody count among the four target development levels. Comparison of the D. publication count, E. antibody count, and F. monoclonal antibody count between common essential (n = 2073) and non-essential (n = 722) genes. The boxplots indicate the distribution of the publication, antibody, and monoclonal antibody counts. On each box, the central mark indicates the median, and the bottom edge represents the 25th percentile, whereas the top edge of the box represents the 75th percentile. The whiskers extend to the most extreme data points that are not considered outliers. The scatter points within each box plot show the overall distribution of data points.
To identify potential differences between essential and non-essential genes in research and development efforts, we compared the mean publication count, antibody count, and monoclonal antibody count between the common essential (n = 2073) and non-essential (n = 722) genes in the Achilles project dataset. We found significantly higher mean publication counts for common essential genes (mean = 3.25) than for non-essential genes (mean = 2.56) (Welch test: t = 14.66, p = 1.28 × 10^−44) (Fig 2D). Additionally, both the antibody count (mean = 4.47 versus 4.02; t = 8.86, p = 2.97 × 10^−18) and the monoclonal antibody count (mean = 1.91 versus 1.22; t = 14.66, p = 2.27 × 10^−23) were significantly higher in common essential genes than in non-essential genes (Fig 2E and 2F), suggesting that essential genes receive more research attention.
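The Welch test used for these comparisons does not assume equal variances in the two groups. The Python sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom for two small made-up samples; it is an illustration with a hypothetical function name, not the authors' MATLAB code, and it stops short of computing the p-value.

```python
import math


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    # Unbiased sample variances.
    v1 = sum((a - m1) ** 2 for a in x) / (n1 - 1)
    v2 = sum((b - m2) ** 2 for b in y) / (n2 - 1)
    se2 = v1 / n1 + v2 / n2
    t = (m1 - m2) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Two hypothetical samples with unequal variances.
t_stat, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t_stat, 3), round(df, 2))
```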
Dark genes are as frequently mutated across human cancers as light genes
To investigate the extent of mutations in dark and light genes in cancer, we obtained a dataset of 8,395 human cancer cases across 28 primary tumour types from TCGA [32], consisting of gene copy number alterations and somatic mutations. We integrated this information with the TDL classification of genes from Pharos and assessed the number of cancers with mutations in each TDL class. Our results revealed that most cancers harboured Tbio gene mutations (99.88%), followed by Tdark (97.71%), Tchem (96.68%), and Tclin (92.94%) (Fig 1B and S1 File).
We further compared the number of mutated dark genes to light genes across each of the 28 cancer types. Our findings showed that the extent of dark gene mutations varied greatly depending on cancer type. Specifically, UCEC (Uterine corpus endometrial carcinoma) exhibited the highest proportion of dark gene mutations (20.39%), followed by SKCM (Skin cutaneous melanoma) (18.25%), COADREAD (Colorectal cancer) (8.66%), LUAD (Lung adenocarcinoma) (6.52%), LUSC (Lung squamous cell carcinoma) (6.52%), and CESC (Cervical squamous cell carcinoma) (4.90%). A more granular analysis of SKCM revealed the following normalised and absolute values: 0.17 (624) Tbio genes, 0.18 (118) Tchem genes, 0.19 (56) Tclin genes, and 0.18 (179) Tdark genes were highly mutated. Likewise, for UCEC, we found the following normalised and absolute values: 0.20 (716) Tbio genes, 0.20 (131) Tchem genes, 0.17 (50) Tclin genes, and 0.20 (200) Tdark genes were highly mutated (Fig 1C, also see S1C Fig). Our findings suggest numerous potential therapeutic targets for treating these cancer types, particularly Tbio, Tchem and Tdark genes.
We then examined the specific mutated dark genes within each cancer type to identify the most frequently affected genes. In SKCM, the following Tdark genes exhibited the highest mutation rates: PKHD1L1 (52.07%), DNAH9 (45.18%), THSD7B (44.90%), DNAH3 (43.52%), and RP1 (42.98%). Similarly, in UCEC, the five most frequently mutated Tdark genes were MDN1 (20.71%), DNHD1 (19.72%), DNAH3 (19.53%), SSPO (19.33%), and DNAH9 (19.13%). Within COADREAD, the predominant mutations were observed in DCHS2 (16.98%), DNAH17 (13.74%), MDN1 (13.36%), UNC13C (13.36%), and SSPO (13.17%). For LUAD, the frequently mutated Tdark genes included DNAH9 (22.07%), SSPO (18.49%), BAGE2 (18.09%), ZNF831 (16.90%), and PKHD1L1 (15.71%). In the case of LUSC, the most mutated Tdark genes were PKHD1L1 (23.82%), DNAH9 (19.74%), BAGE2 (18.24%), ZNF804B (17.38%), and SPHKAP (17.38%). Similarly, CESC exhibited frequent mutations in MDN1 (8.36%), DNAH3 (8.36%), DNAH6 (8.36%), SSPO (8.00%), and FSIP2 (7.63%) among Tdark genes (S1 File).
Notably, PKHD1L1 emerged as the most frequently mutated Tdark gene across all 28 cancer types, highlighting its potential significance in cancer development and progression. Conversely, TP53 (Tchem) stood out as the most frequently mutated light gene across all 28 cancer types, especially in OV (Ovarian cancer) (94.5%) (S1 File). This finding aligns with previous observations and reinforces the established association between TP53 mutations and OV [39–47].
Dark genes strongly impact the fitness of cancer cells
To assess the importance of dark genes compared with light genes, we used Achilles [33] data to analyse the impact of genes in each target development level (Tclin, Tchem, Tbio, and Tdark) on cancer cell lines. In the Achilles project, gene dependency scores measure the effect of gene perturbation on cell fitness, where a score close to -1 indicates reduced fitness (increased dependency), a score close to 1 indicates increased fitness (reduced dependency), and a score of 0 indicates no change in fitness (independence).
A one-way ANOVA test was performed to compare the mean gene dependency scores among the target development levels, revealing a statistically significant difference, F(3, 14120423) = 5.29 × 10^4, p < 1 × 10^−300. The mean gene dependency scores for Tclin, Tchem, Tbio, and Tdark were -0.10, -0.14, -0.17, and -0.07, respectively. Tbio had the lowest mean gene dependency score, signifying the greatest negative impact on cell viability, followed by Tchem, Tclin, and Tdark, with the least negative impact. However, the effect sizes suggest that this difference accounted for a relatively small proportion of the overall variance. Specifically, the eta-squared (η²) value was 0.0111, indicating that 1.11% of the total variance in our outcome variable can be attributed to group membership. The omega-squared (ω²) value was 0.0111, suggesting a similar proportion of explained variance when adjusted for bias. Given the very large sample size of 14,148,860 instances, these small effect sizes suggest that the practical significance of the differences between groups may be limited, despite the statistical significance indicated by the p-value (Fig 3).
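The effect sizes quoted above can be recovered from the F statistic and its degrees of freedom using the standard identities η² = F·df_between / (F·df_between + df_within) and ω² = df_between·(F − 1) / (df_between·F + df_within + 1). The Python sketch below applies these identities to the reported values; the function names are hypothetical, and the calculation is a consistency check rather than the authors' own computation.

```python
def eta_squared(f_stat, df_between, df_within):
    """Eta-squared from a one-way ANOVA F statistic."""
    return f_stat * df_between / (f_stat * df_between + df_within)


def omega_squared(f_stat, df_between, df_within):
    """Omega-squared (bias-adjusted) from a one-way ANOVA F statistic."""
    return df_between * (f_stat - 1.0) / (df_between * f_stat + df_within + 1)


# Values reported in the text: F(3, 14120423) = 5.29 x 10^4.
eta2 = eta_squared(5.29e4, 3, 14120423)
omega2 = omega_squared(5.29e4, 3, 14120423)
print(round(eta2, 4), round(omega2, 4))
```

Both values round to 0.0111, in agreement with the text: with a sample this large, the bias adjustment in ω² is negligible.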
Fig 3. Comparison of gene dependency scores among target development levels (TDLs) in cancer cell lines from the Achilles project.
Boxplots show the gene dependency scores corresponding to the Tbio, Tchem, Tclin, and Tdark TDLs. On each box, the central mark indicates the median, and the bottom edge represents the 25th percentile, whereas the top edge of the box represents the 75th percentile. The whiskers extend to the most extreme data points that are not considered outliers. The scatter points within each box plot show the overall distribution of the data points.
Correlation between gene essentiality and mRNA expression of dark genes
The gene essentiality signature of cell lines has been reported to be related to their mRNA transcription signature [48]. Therefore, we sought to assess the impact of the common essential dark and light genes frequently mutated in cancer on cell fitness. To this end, we examined the correlation between mRNA expression and dependency scores. Among the dark genes with the most significant correlations, brain cancer cell lines showed significantly greater dependency on NRDE2 than other cell lines (p < 1 × 10^−300; Fig 4A). Furthermore, cell lines with NRDE2 mutations exhibited a greater dependency on NRDE2 expression than other cell lines (p < 1 × 10^−300). Similarly, pancreatic cancer cell lines showed significantly greater dependency on the dark gene WDR7 than other cell lines (p < 1 × 10^−300; Fig 4B), whereas cell lines with WDR7 mutations exhibited a slightly lower dependency on WDR7 expression than other cell lines (p < 1 × 10^−300). This pattern of dependence was similar to that observed for many light genes, including CRNKL1 in breast cancer cell lines (Fig 4C) and MED14 in leukaemia cell lines (Fig 4D). These findings suggest that both dark and light genes play significant roles in cell fitness and may contribute to the development or progression of specific cancer types (S2 File). Consequently, these results provide valuable insights into the genetic mechanisms involved in these cancers and suggest that these genes could serve as potential targets for the development of novel therapies. Therefore, the CRISPR-based gene editing technology used in the Achilles project has the potential to provide valuable insights for developing new cancer treatments and ultimately improving outcomes for cancer patients [33,49–57].
Fig 4. Relationship between gene essentiality and mRNA expression.
A. From left to right: correlation between NRDE2 transcript levels and NRDE2 gene dependence scores, the mean difference in NRDE2 dependence between brain cancer cell lines and all other cancer cell lines, and the mean difference in NRDE2 dependence score between NRDE2 mutant cell lines and cell lines that do not harbour NRDE2 mutations. B. (Left) Correlation between WDR7 transcript levels and WDR7 dependence scores, and the mean difference in WDR7 dependence scores between pancreatic cancer cell lines and all other cancer cell lines. (Right) The mean difference in the WDR7 dependence score between WDR7 mutant cell lines and cell lines that do not harbour WDR7 mutations. C. (Left) Correlation between CRNKL1 transcript levels and CRNKL1 dependence scores, and the mean difference in the CRNKL1 dependence score between breast cancer cell lines and all other cancer cell lines. (Right) The mean difference in the CRNKL1 dependence score between CRNKL1 mutant cell lines and cell lines that do not harbour CRNKL1 mutations. D. (Left) Correlation between MED14 transcript levels and MED14 dependence scores, and the mean difference in the MED14 dependence score between leukaemia cell lines and all other cancer cell lines. (Right) The mean difference in the MED14 dependence score between MED14 mutant cell lines and cell lines that do not harbour MED14 mutations.
We further hypothesised that common essential genes are likely to be highly expressed in cancer cell lines. Therefore, we compared the mean transcript levels between the common essential genes and other genes and found that the common essential genes are indeed significantly more highly expressed in each target development level (Welch t-test; t = 709.7; p < 1 × 10^−300; see S2 Fig). Overall, these findings suggest that the “common essential” genes may be potential targets for cancer treatment, and further research is warranted to explore this possibility.
Dark and light genes similarly impact the chemosensitivity of cancer cells
The sensitivity of cancer cell lines to pathway inhibitors is influenced by various factors, such as genetic mutations [58,59], the targeted pathway [60–63], the specific inhibitor used [64,65], and the level of dependence on targeted pathway components [48]. Therefore, we investigated whether CRISPR-derived measures of cellular dependence on pathway components correlate with the response of cell lines to existing drug molecules that inhibit these components, which could inform the identification of optimal drug targets within the pathways (see “Methods” section).
We classified cancer cell lines from the GDSC database into two groups based on their CRISPR-derived dependency on each gene: one group with a higher dependency and the other with a lower dependency. We then compared the mean dose responses to 397 pathway inhibitors (S3A Fig; see S3 File for the list of inhibitors) between these two groups of cancer cell lines and retained only the significant results (p < 0.05). We found that 164,225 cases met this criterion, indicating a significant association between gene dependency and the response of cancer cell lines to pathway inhibitors. Notably, these cases encompassed both light genes (Tbio [n = 114,660], Tchem [n = 21,066], and Tclin [n = 7,173]) and dark genes (Tdark [n = 21,326]) (see S3C Fig and S3 File). Among the cell lines dependent on dark genes, the top three inhibitors with the most significant efficacy differences between the groups were cediranib (p = 5.3 × 10⁻¹⁹), sepantronium bromide (p = 2.8 × 10⁻¹⁷), and KRAS (G12C) inhibitor-12 (p = 1.9 × 10⁻¹⁶; Fig 5). These findings suggest that dark genes play a significant role in influencing specific cancer responses to these drugs. Regarding the light genes, the top three inhibitors that demonstrated notable differences in efficacy were nutlin-3a (-) (p = 7.9 × 10⁻⁵⁸; MDM2 (Tchem)), rTRAIL (p = 9.8 × 10⁻³²; PLAGL2 (Tbio)) and UNC0638 (p = 2.2 × 10⁻²⁸; IRF4 (Tbio); see S4 Fig). Similarly, among the Tclin genes, the top three inhibitors that demonstrated notable differences in efficacy were afatinib (p = 8.9 × 10⁻²⁵ (ERBB2) and p = 3.0 × 10⁻²² (EGFR)), SNX-2112 (p = 4.6 × 10⁻²⁰) and daporinad (p = 4.7 × 10⁻²⁰; see S5 Fig). These findings highlight the potential of using dependence scores and drug responses from GDSC to identify additional drug targets in cancer cell lines.
Specifically, the Tclin genes associated with drugs whose dose response profiles vary can serve as potential targets for further exploration and therapeutic intervention. Additionally, we found that cell lines with high dependency on pathway genes showed better responses to pathway inhibitors than those with lower dependency (S3E Fig).
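The grouping step described above can be sketched as follows. This is an illustration only: the median split, the convention that a more negative score means stronger dependence, and all cell line names and values are assumptions, not details confirmed by the text.

```python
from statistics import mean, median


def split_by_dependency(dependency, log_ic50):
    """Split cell lines at the median dependency score and return the
    log IC50 values of the (higher-dependence, lower-dependence) groups."""
    shared = sorted(set(dependency) & set(log_ic50))
    cutoff = median(dependency[c] for c in shared)
    # Assumption: a more negative score means stronger dependence on the gene.
    higher = [log_ic50[c] for c in shared if dependency[c] < cutoff]
    lower = [log_ic50[c] for c in shared if dependency[c] >= cutoff]
    return higher, lower


# Hypothetical dependency scores and drug responses for six cell lines.
dependency = {"A": -1.2, "B": -0.9, "C": -0.7, "D": -0.1, "E": 0.0, "F": 0.1}
log_ic50 = {"A": -2.0, "B": -1.5, "C": -1.0, "D": 1.0, "E": 1.5, "F": 2.0}

higher, lower = split_by_dependency(dependency, log_ic50)
mean_difference = mean(higher) - mean(lower)
print(f"mean log IC50 difference (higher - lower): {mean_difference:.2f}")
```

A negative difference means the higher-dependence group has lower IC50 values, that is, it responds better to the inhibitor. A significance test on the two groups would follow this step.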
Fig 5. Relationship between Achilles gene dependence scores and the responses of cell lines to pathway inhibitors for Tdark genes.
Comparison of the dose-response profiles to pathway inhibitors (a: Cediranib, b: Sepantronium bromide, c: KRAS(G12C) Inhibitor-12, d: C-75, e: Vinorelbine, f: UNC0638, g: rTRAIL, h: Uprosertib, i: Paclitaxel) between the cancer cell lines with lower dependence (boxplots coloured blue) on the signalling pathway and those with higher dependence (boxplots coloured red) on the signalling pathway. Boxplots show logarithm-transformed mean IC50 values of the cancer cell lines of each group. On each box, the central mark indicates the median, and the bottom edge represents the 25th percentile, whereas the top edge of the box represents the 75th percentile. The whiskers extend to the most extreme data points not considered outliers, and the outliers are plotted individually using the ‘+’ symbol. The scatter points within each box plot show the overall distribution of the data points.
We also compared the mean dose responses to 397 pathway inhibitors (S3B Fig; see S3 File for the list of inhibitors) between cancer cell lines with and without specific gene mutations and retained only the significant results (p < 0.05). We identified 143,443 instances that met this criterion, indicating a significant association between gene mutations and the response of the cancer cell lines to pathway inhibitors. Notably, these instances encompassed both light genes (Tbio [n = 92,931], Tchem [n = 18,191], and Tclin [n = 8,039]) and dark genes (Tdark [n = 24,282]) (see S3D Fig and S3 File). Among the cell lines with mutated dark genes, the top three inhibitors with the most significant efficacy differences between groups were vinorelbine (p = 1.5 × 10⁻⁹), daporinad (p = 2.2 × 10⁻⁹), and Wee1 inhibitor (p = 4.7 × 10⁻⁹; Fig 6). These findings highlight the significant role of dark gene mutations in influencing specific cancer responses to these drugs. Additionally, the top three inhibitors associated with light genes were nutlin-3a (-) (p = 2.4 × 10⁻³⁸; TP53 (Tchem)), paclitaxel (p = 1.8 × 10⁻¹⁴; PLEKHA5 (Tbio)) and uprosertib (p = 6.5 × 10⁻¹²; PTEN (Tchem)) (see S6 Fig). Furthermore, our study revealed that cancer cells with mutations in specific pathway genes are more responsive to pathway inhibitors than those without mutations (S3F Fig).
Fig 6. Relationship between mutations of different pathway genes and the responses of the cell lines to pathway inhibitors for Tdark genes.
Comparison of the dose-response profiles to pathway inhibitors (a: Vinorelbine, b: Daporinad, c: Wee 1 Inhibitor, d: Afatinib, e: Sepantronium bromide, f: Uprosertib, g: Vinblastine, h: PLX-4720, i: Trametinib) between the cancer cell lines without mutations (boxplots coloured blue) in the signalling pathway and those with mutations (boxplots coloured red) in the signalling pathway. Boxplots show logarithm-transformed mean IC50 values of the cancer cell lines of each group. On each box, the central mark indicates the median, and the bottom edge represents the 25th percentile, whereas the top edge of the box represents the 75th percentile. The whiskers extend to the most extreme data points not considered outliers, and the outliers are plotted individually using the ‘+’ symbol. The scatter points within each box plot show the overall distribution of the data points.
Using a chi-squared test of independence, we examined the potential relationship between the target development level of genes and drug sensitivity in cancer cell lines and obtained a p-value of 0.2133. The association was therefore not statistically significant, suggesting that mutations in no particular gene class had a disproportionate effect on the drug sensitivity of cancer cells.
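As an illustration of the test used above, the sketch below computes the chi-squared statistic for a contingency table of target development level against drug response. The table layout (four levels by two outcomes) and the counts are hypothetical. The closed-form tail probability applies only to three degrees of freedom, which is what such a 4 × 2 table yields.

```python
import math


def chi_squared_independence(table):
    """Return (statistic, dof) for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


def chi2_sf_3dof(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Hypothetical counts: rows are Tclin, Tchem, Tbio, Tdark;
# columns are (increased sensitivity, increased resistance).
counts = [[40, 60], [90, 110], [480, 520], [105, 95]]

stat, dof = chi_squared_independence(counts)
p_value = chi2_sf_3dof(stat)  # valid here because dof == 3
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p_value:.4f}")
```

A p-value above 0.05, as reported in the text, means the observed counts are compatible with drug response being independent of target development level.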
We further analysed the response of the cell lines to inhibitors targeting 24 different signalling pathways (S3 File). Interestingly, we observed that some cell lines with a CRISPR-derived dependency on specific genes were sensitive to inhibitors of more than one pathway. For example, cell lines dependent on MEF2C, RUNX1, and ALAD were sensitive to 80%, 44%, and 44%, respectively, of the PI3K/mTOR signalling inhibitors. Similarly, these cell lines were sensitive to 72%, 76%, and 40%, respectively, of the RTK pathway inhibitors profiled by GDSC (see S3 File). Furthermore, we found that cell lines dependent on TUBB4B and KLF5 showed resistance to PI3K/mTOR signalling inhibitors (44% and 40%, respectively) as well as to RTK pathway inhibitors (64% and 68%, respectively), as profiled by GDSC (see S3 File). Additionally, we observed that dark gene-dependent cell lines exhibited notable sensitivity to pathway inhibitors, particularly in the presence of mutations. For example, in cell lines with mutations in the PI3K/mTOR signalling pathway, 14 of the top 50 genes were dark genes, including NPVF, FAM122C, and TMEM144, which were associated with mixed responses (i.e., significantly increased sensitivity to some of the inhibitors and significantly decreased sensitivity to others) (Fig 7B). By contrast, in cell lines with mutations in the RTK pathway, 10 of the top genes were dark genes, including ANKRD39, OR7D2, and CCDC43 (7 associated with a mixed response, 2 with resistance and 1 with increased sensitivity; see S7B Fig).
Fig 7. PI3K/mTOR pathway.
The relationship between gene dependencies (a) or mutations (b) and drug responses across cancer cell lines in the PI3K/mTOR pathway. From top to bottom, the panels indicate the following. Dependence: the overall CRISPR-derived gene dependence scores of the gene along each column (a), or the overall mutation frequencies observed for the gene along each column (b). Clustered heatmap: the marks on the heatmap are coloured based on how a high dependence on, or mutations in, the gene along each column affect the efficacy of the drug given along each row: (1) green denotes significantly (10% false discovery rate) increased sensitivity, (2) grey denotes no statistically significant difference between cell lines with a higher and lower dependence on the gene, or between cell lines with and without a mutation in the gene, and (3) orange denotes significantly increased resistance (for gene dependence and for gene mutations). The gene names (column labels) are coloured based on the overall calculated effect that high dependence on the gene has on the efficacy of the drugs given along the rows: green, all the cell lines are significantly more sensitive to all the pathway inhibitors; orange, all the cell lines are significantly more resistant to all the pathway inhibitors; black, a mixed response to pathway inhibitors. The bar graphs represent the total number of drugs whose dose-response is significantly increased (green) or decreased (orange).
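The caption above refers to a 10% false discovery rate. The text shown here does not name the correction procedure, so the following sketch assumes the widely used Benjamini-Hochberg step-up method. The p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return a list of booleans marking which p-values are significant
    under the Benjamini-Hochberg step-up procedure at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * fdr:
            largest_rank = rank
    # Every p-value ranked at or below that k is declared significant.
    significant = [False] * m
    for index in order[:largest_rank]:
        significant[index] = True
    return significant


# Hypothetical p-values for five gene-drug pairs.
p_values = [0.001, 0.04, 0.03, 0.2, 0.9]
flags = benjamini_hochberg(p_values, fdr=0.10)
print(flags)
```

In a heatmap such as the one described, only the gene-drug pairs flagged as significant would be coloured green or orange; the rest would be grey.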
Our findings highlight that CRISPR-derived estimates of the dependency on signalling pathway components can predict the responsiveness of different primary tumour types to pathway inhibitors. This information could be used to develop targeted therapies for cancers with dark gene mutations or a high dependence on dark genes, which may be more sensitive to pathway inhibitors and respond better to treatment.
Dark genes and their impact on cancer patients’ survival
We aimed to evaluate the aggressiveness of tumours in patients with genetic alterations in dark genes. We conducted an analysis of disease-free survival (DFS) and overall survival (OS) in cancer patients, considering the mutation status and mRNA expression levels of dark genes to ascertain if specific patient groups exhibited distinct clinical outcomes. We obtained and analysed a pan-cancer dataset from TCGA, which included mRNA expression levels, mutations, and completely de-identified clinical information (refer to the Methods section).
We categorised patients’ tumours into two groups based on their mutation status to assess the impact of dark gene mutations on OS and DFS. The first group, "tumours with Tdark mutation", comprised tumours with mutations in dark genes (3,887 samples), while the second group, "tumours without Tdark mutation", consisted of tumours lacking mutations in dark genes (5,463 samples). We investigated whether the two groups were associated with different clinical outcomes. Using the Kaplan-Meier method [66], we observed that patients with tumours harbouring dark gene mutations had significantly shorter OS periods (OS = 93.83 months) than those with tumours lacking dark gene mutations (OS = 126.18 months; Fig 8A) (log-rank test; p = 0.002). We also found that DFS periods were significantly shorter (log-rank test; p = 0.008) in patients with tumours containing dark gene mutations than in those without dark gene mutations (Fig 8B).
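For readers unfamiliar with the Kaplan-Meier method, a minimal sketch of the estimator is given below. The follow-up times and event indicators are hypothetical. Reading a single survival figure off the curve as the median is an assumption about how summary values of this kind are typically derived, not a statement about this study's procedure.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times: follow-up durations (e.g. months); events: 1 = death, 0 = censored.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # everyone leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        if deaths[t]:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths[t] / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Hypothetical follow-up data for six patients.
times = [5, 8, 12, 12, 20, 30]
events = [1, 0, 1, 1, 0, 1]

curve = kaplan_meier(times, events)
print(curve, median_survival(curve))
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is why the estimator can use incomplete follow-up.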
Fig 8. Kaplan–Meier survival curves depicting the impact of dark genes on the survival of patients with cancer.
Overall survival periods (a) and disease-free survival periods (b) of TCGA patients with tumours that had Tdark mutations and tumours without Tdark mutations. Kaplan-Meier curve of the overall survival periods (c) and disease-free survival periods (d) of TCGA patients with tumours that expressed high and low ARL6IP6 transcript levels.
To analyse OS and DFS based on mRNA expression levels, we used z-score normalisation to separate patients’ tumours into two groups for each dark gene: those with higher and those with lower transcript levels of that gene (refer to the Methods section). Using the Kaplan-Meier method, we found that, of the 3,347 dark genes whose expression varied across the 8,395 patient tumours analysed, high transcript levels were associated with significantly shorter OS periods for 978 (29.22%) dark genes, whereas low transcript levels were associated with significantly shorter OS periods for 1,258 (37.59%) dark genes (see S4 File). Likewise, high transcript levels were associated with significantly shorter DFS periods for 755 (22.56%) dark genes, whereas low transcript levels were associated with significantly shorter DFS periods for 1,011 (30.21%) dark genes (see S4 File).
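The z-score grouping can be sketched as follows. The cut-off of zero, the use of the population standard deviation, and the sample identifiers are assumptions made for illustration; the study's exact thresholds are described in its Methods section, which is not reproduced here.

```python
from statistics import mean, pstdev


def split_by_zscore(expression, threshold=0.0):
    """Return (high, low) sample IDs based on z-scored expression of one gene."""
    values = list(expression.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # Genes whose expression does not vary cannot be split into groups.
        raise ValueError("expression does not vary across samples")
    high, low = [], []
    for sample, value in expression.items():
        z = (value - mu) / sigma
        (high if z > threshold else low).append(sample)
    return high, low


# Hypothetical transcript levels of one dark gene in four tumours.
expression = {"P1": 2.0, "P2": 4.0, "P3": 6.0, "P4": 12.0}
high, low = split_by_zscore(expression)
print(high, low)
```

The zero-variance check mirrors the restriction in the text to dark genes whose expression varied across the tumours analysed.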
Among the dark genes with a significant association, patients with tumours expressing high levels of ARL6IP6 had the shortest OS duration (OS = 57.07 months) compared with those with low expression of this gene (OS = 127.33 months; Fig 8C). DFS periods were also significantly shorter (log-rank test; p = 9.44 × 10⁻¹³) for patients with tumours expressing high ARL6IP6 levels than for those with low ARL6IP6 expression (Fig 8D). Additionally, patients with tumours expressing low RAI2 levels had the shortest OS duration (OS = 68.48 months) compared with those with high expression of this gene (OS = 174.84 months; see S8A Fig). DFS periods were also significantly shorter (log-rank test; p = 6.75 × 10⁻¹⁹) for patients with tumours expressing low RAI2 levels than for those with high RAI2 expression (see S8B Fig).
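The log-rank test used for these comparisons can be sketched for two groups as follows. This is a plain implementation of the standard two-sample statistic with hypothetical data; it is not the software used in the study.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2 statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = var_a = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)  # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        d = sum(1 for u, e, _ in data if u == t and e)  # deaths at t
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group A death count at time t.
            var_a += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var_a == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / var_a
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical survival times (months); 1 = event observed.
high_expr = ([2, 3, 4, 5, 6], [1, 1, 1, 1, 1])
low_expr = ([10, 12, 14, 16, 18], [1, 1, 1, 1, 1])

chi2, p = logrank_test(*high_expr, *low_expr)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

The statistic compares the observed number of events in one group with the number expected if both groups shared the same survival curve.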
Moreover, we investigated the association between the mRNA expression levels of specific dark genes and the aggressiveness of disease in each cancer type and across all the cancer types. We found that, of the top 50 dark genes that exhibited a significant association with reduced OS in each of the 28 cancer types (p-value < 0.05), many genes affected OS across multiple cancer types. For instance, ABCA11P was associated with reduced OS in both oesophageal carcinoma and glioblastoma multiforme. Similarly, GAGE1 showed this effect in five cancer types: kidney chromophobe, kidney renal papillary cell carcinoma, liver hepatocellular carcinoma, lung squamous cell carcinoma, and prostate adenocarcinoma (see S9 Fig and S4 File). Notably, a substantial proportion of genes (694 [17%]) was associated with reduced OS in three different cancer types (see S10A Fig). Additionally, our analysis revealed that the cancer types with the highest number of significant genes were LGG (Brain lower grade glioma) and KIRC (Kidney renal clear cell carcinoma), with 1,251 and 1,135 genes, respectively (see S10B Fig and S4 File).
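Tallying how many cancer types each gene is associated with amounts to counting gene and cancer type pairs. The sketch below uses only the two examples named in the text (ABCA11P and GAGE1), with cancer types written as their TCGA abbreviations. The function name and the input format are illustrative assumptions.

```python
from collections import Counter


def cancer_types_per_gene(significant_pairs):
    """Count, for each gene, the distinct cancer types in which it is significant."""
    unique_pairs = set(significant_pairs)  # guard against duplicate rows
    return Counter(gene for gene, _ in unique_pairs)


# (gene, cancer type) pairs for the two examples given in the text.
pairs = [
    ("ABCA11P", "ESCA"), ("ABCA11P", "GBM"),
    ("GAGE1", "KICH"), ("GAGE1", "KIRP"), ("GAGE1", "LIHC"),
    ("GAGE1", "LUSC"), ("GAGE1", "PRAD"),
]

per_gene = cancer_types_per_gene(pairs)
# Number of genes associated with exactly k cancer types.
genes_by_k = Counter(per_gene.values())
print(per_gene["ABCA11P"], per_gene["GAGE1"], dict(genes_by_k))
```

Applied to the full list of significant pairs, the second counter gives the kind of distribution summarised in the text, for example the number of genes linked to exactly three cancer types.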
In summary, our survival analyses demonstrated that patients with tumours harbouring mutations in dark genes had significantly shorter overall survival (OS) and disease-free survival (DFS) than those without such mutations. Similarly, for a considerable proportion of dark genes, patients with tumours expressing high transcript levels had significantly shorter OS and DFS periods than those with low transcript levels. These findings indicate that both the mutation status and mRNA expression levels of dark genes may be useful prognostic markers for cancer patients. Moreover, our study identified dark genes whose mRNA expression levels are associated with the aggressiveness of the disease, both in particular cancer types and across multiple cancer types. The identification of these genes provides valuable insights into potential targets for therapeutic interventions and highlights the interconnectedness of specific genes across different cancer types.
Discussion
In this study, we investigated the potential roles of Tdark genes in cellular processes, cancer development, and progression, and their possible use as drug targets. We observed that Tdark genes constituted a substantial proportion (27.9%) of the protein-coding genes analysed, highlighting the need for further research to elucidate their function. Our findings also revealed that only a small percentage of genes (3.4%; Tclin) are currently utilised as drug targets, indicating a significant potential for developing new drugs targeting relatively well-studied Tchem genes [36,37].
We found higher publication, antibody, and monoclonal antibody counts for essential genes than for non-essential genes, suggesting a focus on essential genes in research [67–72]. However, we also found that Tdark proteins had the least associated data, indicating a knowledge deficit [28,30,31,37]. Furthermore, our study revealed that Tdark genes are mutated as frequently as light genes in human cancers, with Tdark genes being mutated in 98.96% of all cancers analysed, suggesting significant roles in cancer development and progression.
Our investigation identified PKHD1L1 as the most frequently mutated Tdark gene across 28 types of cancer, indicating its potential as a broad therapeutic target. Notably, previous research [73–78] has also indicated the substantial involvement of PKHD1L1 in cancer, underscoring its probable significance in the development and progression of this disease. We also reported differences in the mean gene dependency scores among Tclin, Tchem, Tbio, and Tdark genes, suggesting that Tdark genes are also essential for the survival of the cancer cell lines and may be promising therapeutic targets [30,33,79]. Furthermore, our study identified critical genes that may be involved in the development and progression of brain cancer, pancreatic cancer, breast cancer, and leukaemia. Therefore, this study sheds light on the genetic mechanisms underlying these cancers, indicating that the identified genes hold promise as potential therapeutic targets.
We found that cancer cell lines with high dependency on pathway genes showed better responses to pathway inhibitors than those with lower dependency. Additionally, cancer cell lines with mutations in specific pathway genes were more responsive to pathway inhibitors than those without mutations, in agreement with a previous study [48]. These findings have important implications for the development of targeted therapies and personalised medicine approaches for cancer treatment. Moreover, our findings confirm previous reports that cancer cell lines dependent on MDM2 are sensitive to nutlin-3a, whereas cancer cell lines with TP53 mutations are resistant to it [80,81]. These results underscore the significance of considering the molecular characteristics of cancer cells, including MDM2 dependence and TP53 mutation status, when assessing the potential effectiveness of nutlin-3a as a therapeutic intervention.
Furthermore, our study revealed a significant association between mutations in Tdark genes and poorer clinical outcomes in cancer patients: patients with tumours harbouring Tdark gene mutations had significantly shorter overall survival (OS) and disease-free survival (DFS) periods than those without such mutations, suggesting that mutations in Tdark genes may be important predictors of clinical outcomes. Moreover, our analysis revealed that the expression of many Tdark genes was significantly associated with OS and DFS. Notably, high expression of ARL6IP6 and low expression of RAI2 were associated with the shortest OS periods and with significantly shorter DFS periods. It is worth noting that previous studies have shown that decreased RAI2 expression is linked to poor prognosis in colorectal cancer [82,83] and breast cancer [84–86]. These findings underscore the potential of Tdark genes as prognostic markers in cancer.
In conclusion, our findings underscore the importance of incorporating genetic information into cancer treatment and highlight the potential of personalised medicine approaches. Furthermore, the results demonstrated that Tdark genes are important players in cancer development, warranting further research into their biological functions and potential as targets for cancer therapy.
Supporting information
a. Number of genes in each target development level. b. Number of genes in each target development level between Target Central Resource Database (TCRD) versions 4.3.4 and 6.13.4. c. Number of mutated genes at each target development level across 28 cancer types. ACC: Adenoid cystic carcinoma; BRCA: Breast cancer; CESC: Cervical squamous cell carcinoma; CHOL: Cholangiocarcinoma; COADREAD: Colorectal cancer; ESCA: Oesophageal carcinoma; GBM: Glioblastoma multiforme; HNSC: Head and neck squamous cell carcinoma; KICH: Kidney chromophobe; KIRC: Kidney renal clear cell carcinoma; KIRP: Kidney renal papillary cell carcinoma; LAML: Acute myeloid leukaemia; LGG: Brain lower grade glioma; LIHC: Liver hepatocellular carcinoma; LUAD: Lung adenocarcinoma; LUSC: Lung squamous cell carcinoma; OV: Ovarian serous cystadenocarcinoma; PAAD: Pancreatic adenocarcinoma; PCPG: Pheochromocytoma and paraganglioma; PRAD: Prostate adenocarcinoma; SARC: Sarcoma; SKCM: Skin cutaneous melanoma; TGCT: Testicular germ cell tumours; THCA: Thyroid carcinoma; THYM: Thymoma; UCEC: Uterine corpus endometrial carcinoma; UCS: Uterine carcinosarcoma; UVM: Uveal melanoma.
(TIF)
(TIF)
a. The correlation between pathway inhibitors and CRISPR-derived gene dependency of cancer cell lines: analysis of 397 pathway inhibitors and 5,503 genes. b. The correlation between pathway inhibitors and cancer cell lines with and without specific gene mutations: analysis of 397 pathway inhibitors and 14,607 genes. c. The number of instances in which cell lines with varying levels of dependence on the signalling pathway significantly respond to pathway inhibitors at each target development level. d. The number of instances in which cell lines significantly respond to pathway inhibitors in the presence or absence of mutations in the signalling pathway at each target development level. e. Overall comparison of the dose-responses to pathway inhibitors between the cell lines with higher dependence on the signalling pathway and those with lower dependence on the signalling pathway. f. Overall comparison of the dose-responses to pathway inhibitors between the cell lines with mutations and without mutations in the signalling pathway. Boxplots show the logarithm-transformed mean IC50 values of the cancer cell lines of each group. On each box, the central mark indicates the median, and the bottom edge represents the 25th percentile, whereas the top edge of the box represents the 75th percentile. The whiskers extend to the most extreme data points not considered outliers, and the outliers are plotted individually using the ‘+’ symbol.
(TIF)
Comparison of the dose-response profiles to pathway inhibitors (a: Nutlin-3a (-), b: rTRAIL, c: UNC0638, d: Vinblastine, e: UNC0638, f: PRIMA-1MET, g: Afatinib, h: Paclitaxel, i: Paclitaxel) between the cancer cell lines with lower dependence (boxplots coloured blue) on the signalling pathway and those with higher dependence (boxplots coloured red) on the signalling pathway. Boxplots show the logarithm-transformed mean IC50 values of the cancer cell lines of each group. On each box, the central mark indicates the median, and the bottom edge represents the 25th percentile, whereas the top edge of the box represents the 75th percentile. The whiskers extend to the most extreme data points not considered outliers, and the outliers are plotted individually using the ‘+’ symbol. The scatter points within each box plot show the overall distribution of the data points.
(TIF)
Comparison of the dose-response profiles to pathway inhibitors (a: Afatinib, b: Afatinib, c: SNX-2112, d: Daporinad, e: UNC0638, f: GSK429286A, g: Taselisib, h: PLX-4720, i: Linsitinib) between the cancer cell lines with lower dependence (boxplots coloured blue) on the signalling pathway and those with higher dependence (boxplots coloured red) on the signalling pathway. Boxplots show the logarithm-transformed mean IC50 values of the cancer cell lines of each group. On each box, the central mark indicates the median, and the bottom edge represents the 25th percentile, whereas the top edge of the box represents the 75th percentile. The whiskers extend to the most extreme data points not considered outliers, and the outliers are plotted individually using the ‘+’ symbol. The scatter points within each box plot show the overall distribution of the data points.
(TIF)
Comparison of the dose-response profiles to pathway inhibitors (a: Nutlin-3a (-), b: Paclitaxel, c: Uprosertib, d: PLX-4720, e: (5Z)-7-Oxozeaenol, f: Daporinad, g: Docetaxel, h: Afatinib, i: Docetaxel) between the cancer cell lines without mutations (boxplots coloured blue) in the signalling pathway and those with mutations (boxplots coloured red) in the signalling pathway. Boxplots show the logarithm-transformed mean IC50 values of the cancer cell lines of each group. On each box, the central mark indicates the median, and the bottom edge represents the 25th percentile, whereas the top edge of the box represents the 75th percentile. The whiskers extend to the most extreme data points not considered outliers, and the outliers are plotted individually using the ‘+’ symbol. The scatter points within each box plot show the overall distribution of the data points.
(TIF)
The relationship between gene dependencies (a) or mutations (b) and drug responses across cancer cell lines in the RTK pathway. From top to bottom, the panels indicate the following. Dependence: the overall CRISPR-derived gene dependence scores of the gene along each column (a), or the overall mutation frequencies observed for the gene along each column (b). Clustered heatmap: the marks on the heatmap are coloured based on how a high dependence on, or mutations in, the gene along each column affect the efficacy of the drug given along each row: (1) green denotes significantly (10% false discovery rate) increased sensitivity, (2) grey denotes no statistically significant difference between cell lines with a higher and lower dependence on the gene, or between cell lines with and without a mutation in the gene, and (3) orange denotes significantly increased resistance (for gene dependence and for gene mutations). The gene names (column labels) are coloured based on the overall calculated effect that high dependence on the gene has on the efficacy of the drugs given along the rows: green, all the cell lines are significantly more sensitive to all the pathway inhibitors; orange, all the cell lines are significantly more resistant to all the pathway inhibitors; black, a mixed response to pathway inhibitors. The bar graphs represent the total number of drugs whose dose-response is significantly increased (green) or decreased (orange).
(TIF)
Overall survival periods (a) and disease-free survival periods (b) of TCGA patients with tumours that expressed high and low RAI2 transcript levels.
(TIF)
The plot highlights the presence of multiple overlapping genes in the screened dataset. ACC: Adenoid cystic carcinoma; BRCA: Breast cancer; CESC: Cervical squamous cell carcinoma; CHOL: Cholangiocarcinoma; COADREAD: Colorectal cancer; ESCA: Oesophageal carcinoma; GBM: Glioblastoma multiforme; HNSC: Head and neck squamous cell carcinoma; KICH: Kidney chromophobe; KIRC: Kidney renal clear cell carcinoma; KIRP: Kidney renal papillary cell carcinoma; LAML: Acute myeloid leukaemia; LGG: Brain lower grade glioma; LIHC: Liver hepatocellular carcinoma; LUAD: Lung adenocarcinoma; LUSC: Lung squamous cell carcinoma; OV: Ovarian serous cystadenocarcinoma; PAAD: Pancreatic adenocarcinoma; PCPG: Pheochromocytoma and paraganglioma; PRAD: Prostate adenocarcinoma; SARC: Sarcoma; SKCM: Skin cutaneous melanoma; TGCT: Testicular germ cell tumours; THCA: Thyroid carcinoma; THYM: Thymoma; UCEC: Uterine corpus endometrial carcinoma; UCS: Uterine carcinosarcoma; UVM: Uveal melanoma.
(TIF)
a. Number of significant genes with mRNA expression associated with reduced overall survival (OS) in different cancer types. b. Gene-cancer type interaction network, comprised of genes linked to decreased overall survival (OS) across 28 cancer types, based on mRNA expression. Cancer types are coloured green, while genes are represented in grey. ACC: Adenoid cystic carcinoma; BRCA: Breast cancer; CESC: Cervical squamous cell carcinoma; CHOL: Cholangiocarcinoma; COADREAD: Colorectal cancer; ESCA: Oesophageal carcinoma; GBM: Glioblastoma multiforme; HNSC: Head and neck squamous cell carcinoma; KICH: Kidney chromophobe; KIRC: Kidney renal clear cell carcinoma; KIRP: Kidney renal papillary cell carcinoma; LAML: Acute myeloid leukaemia; LGG: Brain lower grade glioma; LIHC: Liver hepatocellular carcinoma; LUAD: Lung adenocarcinoma; LUSC: Lung squamous cell carcinoma; OV: Ovarian serous cystadenocarcinoma; PAAD: Pancreatic adenocarcinoma; PCPG: Pheochromocytoma and paraganglioma; PRAD: Prostate adenocarcinoma; SARC: Sarcoma; SKCM: Skin cutaneous melanoma; TGCT: Testicular germ cell tumours; THCA: Thyroid carcinoma; THYM: Thymoma; UCEC: Uterine corpus endometrial carcinoma; UCS: Uterine carcinosarcoma; UVM: Uveal melanoma.
(TIF)
The spreadsheet contains the following results/datasets according to the sheet name. Pharos Data: The distribution of genes within each development level obtained from Pharos. Cancer studies: List and description of the individual cancer studies on which our analyses are based. Cancer Mutations in each TDL: The number of cancers with mutations in each target development level (TDL) class related to Fig 1b. Specific Gene Mutations: Frequency of dark and light gene mutations across all cancer types for each gene. Percent Dark Gene Mutations: Percentage of dark gene mutations in each cancer type for each gene. Percent Light Gene Mutations: Percentage of light gene mutations in each cancer type for each gene. Common Essential Genes: Publication, antibody, and monoclonal antibody counts for common essential genes (both dark and light genes). Non-Essential Genes: Publication, antibody, and monoclonal antibody counts for non-essential genes (both dark and light genes).
(XLSX)
(XLSX)
The spreadsheet contains the following results/datasets according to the sheet name. Between Cell line Dose Responses: mean difference comparison of the dose-responses to pathway inhibitors between the cancer cell lines that have a higher dependence on dark and light genes and those with a lower dependence on dark and light genes, as defined using the CRISPR-derived gene dependence scores (see Methods section). Mutations Gene Drug Response: mean dose-response comparison between cell lines that have a mutation(s) in a particular gene and those that do not have a mutation(s) in that gene. Pathways: List of the pathways on which our analyses are based. Pathway Inhibitors: List of the pathway inhibitors on which our analyses are based. The rest of the sheets contain drug-sensitive and drug-resistant genes for all 24 pathways, as in the following example for PI3K MTOR. PI3K MTOR CRISPR: genes that are associated with significantly increased sensitivity, and genes that are associated with significant resistance, to different pathway inhibitors in cell lines demonstrating higher dependence on the specific gene(s) for their fitness. PI3K MTOR Mutation: genes that are associated with significantly increased sensitivity, and genes that are associated with significant resistance, to different pathway inhibitors in cell lines that have mutations in the specific gene.
(XLSX)
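The group comparison summarised in the dose-response sheets can be illustrated with a minimal sketch. It assumes log-transformed IC50 values for two groups of cell lines and uses a permutation test on the difference in means. The permutation test, the function names, and the values below are illustrative assumptions, not the statistical procedure or data used in this study.

```python
import random
from statistics import mean


def mean_difference(high, low):
    """Mean log(IC50) of the high-dependence group minus that of the low group."""
    return mean(high) - mean(low)


def permutation_p_value(high, low, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean_difference(high, low))
    pooled = list(high) + list(low)
    n_high = len(high)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_high]) - mean(pooled[n_high:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical log-transformed IC50 values for a single inhibitor.
high_dependence = [-1.2, -0.9, -1.5, -1.1, -1.3]
low_dependence = [0.4, 0.1, 0.6, 0.3, 0.5]

difference = mean_difference(high_dependence, low_dependence)
p_value = permutation_p_value(high_dependence, low_dependence)
print(f"mean difference = {difference:.2f}, p = {p_value:.4f}")
```

A negative mean difference indicates a lower IC50, and therefore greater sensitivity, in the high-dependence group. When many gene and inhibitor pairs are tested in this way, the resulting p-values would additionally need a multiple-testing correction.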
The spreadsheet contains the following results and datasets, organised by sheet name. OS-mRNA Across Cancer Types: Overall survival comparison between patients whose tumours had high and low expression of a particular dark gene, calculated using the log-rank test. DFS-mRNA Across Cancer Types: Disease-free survival comparison between patients whose tumours had high and low expression of a particular dark gene, calculated using the log-rank test. The remaining sheets contain analyses of the association between the mRNA expression levels of specific dark genes and overall survival in each cancer type.
(XLSX)
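The log-rank comparison named in the survival sheets can be sketched as follows for two groups, such as tumours with high and low expression of one gene. This is a minimal standard-library illustration of the standard two-group log-rank statistic. The function name, the grouping, and the follow-up times are hypothetical and do not reproduce the software or the TCGA data used in this study.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times_*  : follow-up times
    events_* : 1 if the event (e.g. death) was observed, 0 if censored
    """
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for x in times_a if x >= t)
        at_risk_b = sum(1 for x in times_b if x >= t)
        at_risk = at_risk_a + at_risk_b
        deaths_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        deaths_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        deaths = deaths_a + deaths_b
        # Observed events in group A minus the number expected under the null.
        observed_minus_expected += deaths_a - deaths * at_risk_a / at_risk
        if at_risk > 1:
            # Hypergeometric variance of the event count in group A.
            variance += (
                deaths * (at_risk_a / at_risk) * (at_risk_b / at_risk)
                * (at_risk - deaths) / (at_risk - 1)
            )
    if variance == 0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical follow-up times (months) and event indicators.
high_times, high_events = [5, 8, 12, 14, 20, 22], [1, 1, 1, 1, 1, 0]
low_times, low_events = [18, 25, 30, 34, 40, 42], [1, 0, 1, 0, 0, 0]

chi_square, p_value = logrank_test(high_times, high_events, low_times, low_events)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

In practice, a maintained survival-analysis package would be preferable to a hand-written routine. The sketch is intended only to show what the test compares: observed and expected event counts at each event time.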
Data Availability
All relevant data are within the paper and its Supporting Information files.
Funding Statement
The author(s) received no specific funding for this work.
The spreadsheet contains the following results/datasets according to the sheet name. Pharos Data: The distribution of genes within each development level obtained from Pharos. Cancer studies: List and description of the individual cancer studies on which our analyses are based. Cancer Mutations in each TDL: The number of cancers with mutations in each target development level (TDL) class, related to Fig 1b. Specific Gene Mutations: Frequency of dark and light gene mutations across all cancer types for each gene. Percent Dark Gene Mutations: Percentage of dark gene mutations in each cancer type for each gene. Percent Light Gene Mutations: Percentage of light gene mutations in each cancer type for each gene. Common Essential Genes: Publication, antibody, and monoclonal antibody counts for common essential genes (both dark and light genes). Non-Essential Genes: Publication, antibody, and monoclonal antibody counts for non-essential genes (both dark and light genes).
(XLSX)
(XLSX)
The spreadsheet contains the following results/datasets according to the sheet name. Between Cell line Dose Responses: Mean difference comparison of the dose-responses to pathway inhibitors between cancer cell lines with a higher dependence on dark and light genes and those with a lower dependence, as defined using the CRISPR-derived gene dependence scores (see Methods section). Mutations Gene Drug Response: Mean dose-response comparison between cell lines that have one or more mutations in a particular gene and those that have none in that gene. Pathways: List of pathways on which our analyses are based. Pathway Inhibitors: List of pathway inhibitors on which our analyses are based. The remaining sheets list drug-sensitive and drug-resistant genes for each of the 24 pathways. For example, for the PI3K MTOR pathway: PI3K MTOR CRISPR: Genes associated with significantly increased sensitivity, and genes associated with significant resistance, to different pathway inhibitors in cell lines demonstrating higher dependence on the specific gene(s) for their fitness. PI3K MTOR Mutation: Genes associated with significantly increased sensitivity, and genes associated with significant resistance, to different pathway inhibitors in cell lines that have mutations in the specific gene.
(XLSX)
The spreadsheet contains the following results/datasets according to the sheet name. OS-mRNA Across Cancer Types: Overall survival comparison between patients whose tumours have high and low expression of a particular dark gene, calculated using the log-rank test. DFS-mRNA Across Cancer Types: Disease-free survival comparison between patients whose tumours have high and low expression of a particular dark gene, calculated using the log-rank test. The remaining sheets contain analyses of the association between the mRNA expression levels of specific dark genes and overall survival in each cancer type.
(XLSX)
Data Availability Statement
All relevant data are within the paper and its Supporting Information files.