Summary
Cancer cell line models are a cornerstone of cancer research, yet our understanding of how well they represent the molecular features of patient tumours remains limited. Our recent work provides a computational approach to systematically compare large gene expression datasets to better understand which cell lines most closely resemble each tumour type, as well as identify potential gaps in our current cancer models.
Subject terms: Cancer genomics, Data integration
Main
Cancer cell line models are a critical laboratory tool to cheaply and reproducibly study and manipulate cancer cells and have become a cornerstone of cancer research. Indeed, cancer cell lines have been used for the discovery and development of the majority of cell-intrinsic targeted therapeutics. Existing cell line models span the range of human cancers, and have been deeply characterised regarding their genomic and molecular features as part of the Cancer Cell Line Encyclopedia (CCLE).1 On top of this, large-scale efforts, such as the Cancer Dependency Map (DepMap),2 are systematically characterising the drug sensitivities and genetic dependencies of these cell lines. These systematic studies are beginning to illuminate the landscape of cancer vulnerabilities. However, cancer cell lines are imperfect models that do not recapitulate all aspects of tumour biology.
Indeed, cell lines likely provide an incomplete and biased representation of patient tumours. Sampling bias and selection pressure favour cancer types and cell states amenable to in vitro growth conditions, and likely leave important ‘gaps’ in cancer model representation. Furthermore, long-term cell culture is likely to introduce additional genetic and epigenetic changes that may create further systematic differences from patient tumours.3 There is thus a critical need to understand these systematic differences, to identify which cell lines best represent particular patient tumour types and cell states, and to reveal the gaps in our preclinical models.
To answer these questions, one can leverage large pan-cancer databases of the genomic and molecular features of patient tumours, such as TCGA and TARGET. However, direct comparisons of cell line and tumour genomic profiles pose a number of technical challenges. Previous efforts have largely focused on comparing genetic features,4 though such comparisons often require specifying particular features to compare up-front, and can be biased by the lack of matched normals for cell lines and by global differences in the rates of mutations and copy number variations (CNVs). Comparisons based on gene expression profiles present a robust and highly general approach to characterise cell state that can be used to infer cancer subtypes and predict cancer vulnerabilities.5–7 However, directly comparing cell line and tumour expression profiles can be strongly biased by systematic differences, and existing methods for ‘batch correction’ generally fail to account for several features of this problem. For example, tumour samples can be contaminated with multiple normal cell types whose composition and signatures can vary across different cancer types. Furthermore, important cancer types or transcriptional states may be present in only one of the two datasets. There has thus been a need for computational methods that can align cell line and tumour RNA-Seq data to allow integrated analyses that connect these datasets.
We thus developed Celligner to align gene expression data between cell lines and tumours without using any separate disease-type annotations.8 Celligner uses multiple recently developed computational tools to remove systematic biases in a flexible and unbiased manner. In the first step, it identifies expression patterns that exist selectively in one dataset, using contrastive PCA.9 This identifies and removes multiple expression patterns likely reflecting contaminating normal cells. Celligner then uses methods developed in the single-cell RNA-Seq literature to perform alignment across multiple datasets based on mutual nearest neighbours.10 This allows for flexible nonlinear correction of systematic differences, without assuming that the ‘subtype’ composition of the datasets is the same. Importantly, it performs alignment in an unbiased and unsupervised fashion, without requiring any annotations of cancer type.
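The two steps described above can be illustrated with a minimal NumPy sketch. This is not the published Celligner implementation: the function names, the contrast weight `alpha`, the choice of tumours as the "target" dataset, and the synthetic data are all assumptions made for illustration. The first function finds directions that have high variance in one dataset but not the other (the core idea of contrastive PCA), the second projects those directions out, and the third finds mutual nearest neighbour pairs between the two datasets.

```python
import numpy as np


def contrastive_pcs(target, background, alpha=1.0, n_components=1):
    """Directions with high variance in `target` but low variance in `background`.

    Returns the top eigenvectors of cov(target) - alpha * cov(background)
    as the columns of an (n_genes, n_components) matrix.
    """
    cov_target = np.cov(target, rowvar=False)
    cov_background = np.cov(background, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov_target - alpha * cov_background)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top]


def remove_components(data, components):
    """Centre `data` and project out the (orthonormal) `components`."""
    centred = data - data.mean(axis=0)
    return centred - centred @ components @ components.T


def mutual_nearest_neighbours(a, b, k=1):
    """Pairs (i, j) where b[j] is among the k nearest rows of b to a[i], and vice versa."""
    dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    nn_of_a = np.argsort(dist, axis=1)[:, :k]    # (len(a), k): indices into b
    nn_of_b = np.argsort(dist, axis=0)[:k, :].T  # (len(b), k): indices into a
    return [(i, int(j)) for i in range(len(a)) for j in nn_of_a[i] if i in nn_of_b[j]]


# Synthetic toy data (assumption): both datasets share the same noise model,
# but the "tumours" carry an extra signal on gene 0, standing in for
# contaminating normal cells.
rng = np.random.default_rng(0)
n_genes = 20
cell_lines = rng.normal(size=(200, n_genes))
tumours = rng.normal(size=(200, n_genes))
tumours[:, 0] += 5.0 * rng.normal(size=200)

# Step 1: find and remove the tumour-specific direction from both datasets.
pcs = contrastive_pcs(tumours, cell_lines, n_components=1)
tumours_clean = remove_components(tumours, pcs)
cell_lines_clean = remove_components(cell_lines, pcs)

# Step 2: find mutual nearest neighbour pairs between the cleaned datasets.
pairs = mutual_nearest_neighbours(cell_lines_clean, tumours_clean, k=5)
```

In a full mutual-nearest-neighbours correction, the differences between paired profiles would then be used to derive local correction vectors; that step is omitted here to keep the sketch short.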
Celligner produces good alignment of tumours and cell lines by cancer type and by more granular subtypes, outperforming several existing methods. Furthermore, Celligner is able to preserve expected differences between the composition of the tumour and cell line datasets, such as the presence of chromophobe renal cell carcinoma exclusively in the tumour data. At the same time, it reveals large differences in how well different cell lines reflect the transcriptional state of corresponding patient tumours. For example, a majority of cell lines from certain lineages clustered with tumours of the same type, while cell lines from others, such as oesophagus, thyroid, and brain, showed poor agreement. Celligner also revealed a group of several hundred cell lines from diverse lineages that did not cluster with their corresponding tumour types, instead exhibiting a more mesenchymal, undifferentiated expression profile largely not seen in the tumour cohort used. These undifferentiated cell lines also exhibited distinct chemical and genetic vulnerabilities, raising the question of whether they might reflect a therapeutically relevant tumour cell state, such as cancer stem cells or an early metastatic state.
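One simple way to quantify this kind of per-lineage agreement, given an aligned expression space, is to assign each cell line the most common tumour type among its nearest tumour neighbours and compare that with its annotated lineage. The sketch below assumes this nearest-neighbour scheme; the function names, the value of `k`, and the toy coordinates are hypothetical and are not taken from the Celligner analysis.

```python
from collections import Counter

import numpy as np


def nearest_tumour_type(cell_line_profiles, tumour_profiles, tumour_types, k=3):
    """Majority tumour type among the k nearest tumours of each cell line."""
    dist = ((cell_line_profiles[:, None, :] - tumour_profiles[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return [Counter(tumour_types[j] for j in row).most_common(1)[0][0] for row in nearest]


def agreement_by_lineage(annotated, predicted):
    """Fraction of cell lines per annotated lineage whose predicted type matches."""
    totals, hits = Counter(), Counter()
    for lineage, guess in zip(annotated, predicted):
        totals[lineage] += 1
        hits[lineage] += int(lineage == guess)
    return {lineage: hits[lineage] / totals[lineage] for lineage in totals}


# Toy aligned coordinates (assumption): two well-separated tumour clusters.
tumour_profiles = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
tumour_types = ["skin"] * 3 + ["lung"] * 3

# Three cell lines; the third is annotated as lung but sits in the skin cluster.
cell_line_profiles = np.array([[0.2, 0.2], [10.5, 10.5], [0.5, 0.5]])
annotated_lineage = ["skin", "lung", "lung"]

predicted = nearest_tumour_type(cell_line_profiles, tumour_profiles, tumour_types, k=3)
agreement = agreement_by_lineage(annotated_lineage, predicted)
```

Lineages with a low agreement fraction under such a scheme would correspond to the poorly matching groups described above.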
An important direction of research going forward will be expanding Celligner to incorporate additional datasets, new model types, and genomic features. For example, expanding the tumour datasets used with metastatic and drug-resistant samples will provide a more comprehensive reference set of cancer types and states. Furthermore, Celligner could be extended to allow integration across multiple datasets simultaneously, for example, integrating patient-derived organoid and xenograft models to assess whether they better recapitulate features of patient tumours. In addition, projects such as the Human Cancer Models Initiative (HCMI) have been initiated to produce public datasets from cancer models with matched patient tumour data, providing a mechanism to directly link in vitro models with clinical data.
Currently, Celligner relies only on transcriptional data. Incorporation of additional genomic features, such as mutations and copy numbers, could provide a more comprehensive assessment of relationships between tumours and cancer models. Finally, the increasing availability of single-cell RNA-Seq data from tumours11 could be utilised to compare cancer models with tumour cells more directly and to address whether cell line models represent specific subpopulations of tumour cells.
Celligner and its extensions present new opportunities to draw a clearer link between cancer cell line models and patient tumour samples. We anticipate that such methods will expedite the path from preclinical studies to clinical trials, and guide future cancer cell model development efforts.
Acknowledgements
Not applicable.
Author contributions
J.N., F.V. and J.M.M. drafted and revised the paper.
Ethics approval and consent to participate
Not applicable.
Consent to publish
Not applicable.
Data availability
Not applicable.
Competing interests
F.V. receives research support from Novo Ventures. All authors were partially funded by the Cancer Dependency Map Consortium, but no consortium member was involved in or influenced the study.
Funding information
This work was supported by the Cancer Dependency Map Consortium.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Ghandi M, Huang FW, Jané-Valbuena J, Kryukov GV, Lo CC, McDonald ER, et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature. 2019;569:503–508. doi: 10.1038/s41586-019-1186-3.
2. Boehm JS, Garnett MJ, Adams DJ, Francies HE, Golub TR, Hahn WC, et al. Cancer research needs a better map. Nature. 2021;589:514–516. doi: 10.1038/d41586-021-00182-0.
3. Tseng Y-Y, Boehm JS. From cell lines to living biosensors: new opportunities to prioritize cancer dependencies using ex vivo tumor cultures. Curr. Opin. Genet. Dev. 2019;54:33–40. doi: 10.1016/j.gde.2019.02.007.
4. Najgebauer H, Yang M, Francies HE, Pacini C, Stronach EA, Garnett MJ, et al. CELLector: genomics-guided selection of cancer in vitro models. Cell Syst. 2020;10:424–432. doi: 10.1016/j.cels.2020.04.007.
5. Tsherniak A, Vazquez F, Montgomery PG, Weir BA, Kryukov G, Cowley GS, et al. Defining a cancer dependency map. Cell. 2017;170:564–576. doi: 10.1016/j.cell.2017.06.010.
6. Corsello SM, Nagari RT, Spangler RD, Rossen J, Kocak M, Bryan JG, et al. Discovering the anti-cancer potential of non-oncology drugs by systematic viability profiling. Nat. Cancer. 2020;1:235–248. doi: 10.1038/s43018-019-0018-6.
7. Marisa L, de Reyniès A, Duval A, Selves J, Gaub MP, Vescovo L, et al. Gene expression classification of colon cancer into molecular subtypes: characterization, validation, and prognostic value. PLoS Med. 2013;10:e1001453. doi: 10.1371/journal.pmed.1001453.
8. Warren A, Chen Y, Jones A, Shibue T, Hahn WC, Boehm JS, et al. Global computational alignment of tumor and cell line transcriptional profiles. Nat. Commun. 2021;12:22. doi: 10.1038/s41467-020-20294-x.
9. Abid A, Zhang MJ, Bagaria VK, Zou J. Exploring patterns enriched in a dataset with contrastive principal component analysis. Nat. Commun. 2018;9:2134. doi: 10.1038/s41467-018-04608-8.
10. Haghverdi L, Lun ATL, Morgan MD, Marioni JC. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 2018;36:421–427. doi: 10.1038/nbt.4091.
11. Rozenblatt-Rosen O, Regev A, Oberdoerffer P, Nawy T, Hupalowska A, Rood JE, et al. The Human Tumor Atlas Network: charting tumor transitions across space and time at single-cell resolution. Cell. 2020;181:236–249. doi: 10.1016/j.cell.2020.03.053.
