Abstract
Single-cell sequencing technologies have undergone rapid development and adoption by the scientific community in the past five years, fueling discoveries about the etiology, pathogenesis, and treatment responsiveness of individual tumor cells within cancer ecosystems. Most of the advancements in our understanding of cancer with these new technologies have focused on basic tumor biology. However, the knowledge produced by these and other studies is beginning to provide biomarkers and drug targets for clinically relevant subpopulations within a tumor, creating opportunities for the development of biologically informed, clone-specific combination treatment strategies. Here we provide an overview of the development of the field of single-cell cancer sequencing and a roadmap for shepherding these technologies from research tools to diagnostic instruments that give clinical oncologists high-resolution, treatment-directing details of tumors.
Introduction
The idea of sequencing an entire human genome in a few days was unthinkable twenty years ago as the Human Genome Project came to a close [1]. Yet, over the next decade, the development of massively parallel sequencing technologies made that vision a reality [2]. Today, a single Illumina NovaSeq instrument can acquire high-quality sequence data from 48 genomes in under two days. This massive expansion in the quantity of sequence data from cancer samples has included large consortia such as The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), and Therapeutically Applicable Research to Generate Effective Treatments (TARGET), which have provided catalogues of sequence variants recurrently detected within and across cancer types [3, 4]. This work has enabled the discovery and cataloguing of novel genomic variants, some of which inform more precise treatment strategies for specific patient populations. Still, the magnitude of change in patient outcomes from this new genomic information has been limited in most patients [5].
Tumor Heterogeneity—The Major Obstacle for Effective Personalized Therapy
The major reason that many malignancies become resistant to current precision treatment strategies is intra-tumoral heterogeneity [6]. Tumors are heterogeneous on many levels, including cellular composition, intercellular genetics and epigenetics, and metabolic requirements [7]. In addition, each of these features can be spatially distinct within a tumor in response to the local microenvironment. That heterogeneity has been captured by a number of studies that compared -omic profiles of biopsies taken from different locations in the same tumor [8, 9]. Those differences between tumor populations can create differential clonal sensitivity to specific drugs, with the most resistant populations surviving therapy and ultimately causing disease recurrence [10]. This phenomenon has been documented by performing serial measurements as patients go from initial treatment to disease recurrence, which have shown that minor clones harbor or acquire mutations in genes that directly alter the activity of specific drugs by changing target binding affinity, metabolism, or the ability of the drug to enter and stay in the target cell [7]. Although tumor evolution in response to therapy is well documented, the magnitude of cellular diversity within tumors remains poorly described, as most genomic studies are done on bulk samples composed of tens of thousands of cells mixed together from a single tumor location.
Single-Cell Genomic Interrogation of Tumor Heterogeneity
To begin to accurately define the therapeutic challenges posed by tumor heterogeneity, we must first define the molecular landscape at the cellular level. Single-cell technologies have several important advantages over standard tissue-level methods [11, 12]. First, as seen in Figure 1, current bulk studies use samples composed of tens of thousands of cells mixed together, making it difficult to estimate what a given cell contributes to the measurement. For example, if a cell is extremely rare within the population, the mRNA expression, epigenomic changes, or genomic variants contributed by that cell to the sequencing signal could be completely lost. In addition, the sensitivity of genotyping tissues at the cellular level scales with the number of cells analyzed, whereas the sensitivity of bulk sequencing is limited by the sequencing error rate.
Figure 1. Single cell genome sequencing has higher resolution than bulk sequencing.
Single cell genomics enables measurement of the biological contribution of each cell to healthy or diseased tissues, whereas bulk sequencing only reflects an average signal of mixed cells.
In theory, single-cell methods could identify variants present in one in a million or fewer cells if enough cells are analyzed. Finally, single-cell studies measure the abundance of macromolecules and other phenotypic features within the same cells, enabling those features to be correlated and more complex biomarkers of disease to be created than is currently possible [13, 14]. Although the concept of sequencing the entire genome or transcriptome of a single cell was unthinkable just ten years ago, we can now sequence selected nucleic acids from millions of cells in a single experiment, an advance that required overcoming a number of challenging technical barriers.
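The sensitivity argument above can be made concrete with a little arithmetic. The sketch below is a minimal illustration under stated assumptions: cells are sampled independently, a heterozygous variant carried by a fraction of diploid cells appears in bulk at half that fraction, and a single-cell assay detects the variant in a carrier cell with a fixed probability. The function names and the example values (a 1% bulk error rate, 90% per-cell sensitivity) are hypothetical and are not taken from the studies cited here.

```python
import math


def bulk_allele_fraction(carrier_fraction: float) -> float:
    """Expected variant allele fraction in a bulk sample for a heterozygous
    variant carried by `carrier_fraction` of diploid cells."""
    return carrier_fraction / 2.0


def prob_detect_in_single_cells(carrier_fraction: float,
                                n_cells: int,
                                per_cell_sensitivity: float = 1.0) -> float:
    """Probability that at least one of `n_cells` independently sampled cells
    carries the variant and the assay detects it in that cell."""
    p_hit = carrier_fraction * per_cell_sensitivity
    return 1.0 - (1.0 - p_hit) ** n_cells


def cells_needed(carrier_fraction: float,
                 target_prob: float = 0.95,
                 per_cell_sensitivity: float = 1.0) -> int:
    """Smallest number of cells giving at least `target_prob` chance of detection."""
    p_hit = carrier_fraction * per_cell_sensitivity
    # log1p keeps the calculation accurate when p_hit is very small.
    return math.ceil(math.log(1.0 - target_prob) / math.log1p(-p_hit))


if __name__ == "__main__":
    assumed_bulk_error_rate = 0.01  # illustrative assumption only
    for f in (1e-2, 1e-4, 1e-6):
        vaf = bulk_allele_fraction(f)
        hidden = vaf < assumed_bulk_error_rate
        n = cells_needed(f, target_prob=0.95, per_cell_sensitivity=0.9)
        print(f"carrier fraction {f:g}: bulk VAF {vaf:g} "
              f"(below assumed error rate: {hidden}), cells needed: {n}")
```

Under this simple model the number of cells required grows roughly in inverse proportion to the carrier fraction, which is why detecting a one-in-a-million clone depends on throughput rather than on the per-read error rate.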
The Development of Single-Cell Technologies
The idea of studying tumors using the fundamental unit of human life, the cell, has been around for many years. In fact, some technologies that are routinely used in clinical oncology, such as fluorescent in situ hybridization, cytogenetics, and flow cytometry, have been making single-cell measurements for decades [15]. However, each of those technologies makes a small number of measurements from each cell, limiting the breadth of information that can be obtained from those tumors. More global studies of macromolecules have become possible over the past twenty years, but those technologies have not been sensitive enough to study those molecules within individual cells.
The development of single-cell genomics has required two general areas of technological development: tools for easily isolating and manipulating single cells and new biochemical methods that faithfully copy the minute nucleic acid content of individual cells [16]. The first area has seen a range of technological innovations, most of which have centered around the micromanipulation of cells using microfluidics [17, 18]. These tools have included the use of thousands of controllable valves, the use of arrays of thousands of miniature wells, and the partitioning of cells into oil or gel droplets. More recently, investigators have been developing methods for bypassing complex microfluidic handlers through the use of creative barcoding schemes paired with standard lab equipment for cell isolation [19, 20]. This is an area of intense interest within the bioengineering field, and new developments are likely to increase the accuracy, throughput, and ease of use of these devices in the coming years.
In parallel with the development of cell manipulation methods, there have been rapid advancements in the biochemical tools required to sequence and quantify the nucleic acid in a single cell [21]. The first methods for amplifying nucleic acid from a single cell relied on PCR. However, the limitations of those strategies quickly became apparent, as only a small fraction of the genome or transcriptome could be interrogated. Methods for transcriptome-wide amplification that were sensitive enough to detect biologically relevant variation between cells started with the creation of the Smart-Seq method. That initial approach has undergone additional improvements that have enabled increases in sensitivity, as well as the incorporation of cell barcodes for parallel analyses of thousands of cells in a single experiment.
Single-cell DNA sequencing has lagged behind the advances achieved by the field of single-cell transcriptomics [13]. This is in part due to the much higher requirements for detecting biological variation between cells. Current scRNA-seq methods capture 5–10% of the mRNA transcripts, which is sufficient for classifying many cell types and states. In contrast, cancer cells may have only a few thousand or fewer genetic changes in a three-billion-base-pair genome, with the vast majority of genomic bases identical even between normal and malignant cells. In addition, only a small number of variants contribute to a given phenotype of a cell. As a result, tumor genomic profiling must capture the majority of the genome with high accuracy to identify the biologically relevant changes that define specific cell types.
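A rough calculation helps show why a 5–10% capture rate is workable for transcriptomes but not for genomes. The sketch below assumes that each mRNA molecule is captured independently with the same probability, a simple binomial model that real protocols only approximate; the function names and example copy numbers are hypothetical. It also computes the fraction of genome positions that differ when a cell carries a few thousand variants in a three-billion-base-pair genome, following the figures in the text.

```python
def prob_gene_detected(copies_per_cell: int, capture_rate: float) -> float:
    """Chance that at least one of a gene's transcripts is captured,
    assuming independent capture of each molecule."""
    return 1.0 - (1.0 - capture_rate) ** copies_per_cell


def variant_fraction(n_variants: int, genome_size: int = 3_000_000_000) -> float:
    """Fraction of genome positions that differ between two cells."""
    return n_variants / genome_size


if __name__ == "__main__":
    for copies in (1, 10, 100):
        for rate in (0.05, 0.10):
            p = prob_gene_detected(copies, rate)
            print(f"{copies:>3} copies, capture rate {rate:.0%}: "
                  f"detection probability {p:.3f}")
    print(f"3,000 variants: {variant_fraction(3000):.1e} of genome positions")
```

In this model a gene expressed at many copies per cell is still likely to be observed at low capture rates, whereas a single-copy genomic variant that is missed leaves no redundant signal to fall back on.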
Multiple Displacement Amplification (MDA) was a major advancement in our ability to amplify minute DNA samples [22, 23]. Despite issues with the unevenness of amplification and the invention of many competing methods, MDA has remained the gold standard method for most single-cell sequencing applications due to its high coverage breadth and low error rate. We recently invented a method that achieves the coverage breadth and low error rate of MDA, but introduces irreversible terminators into the reaction to alter its kinetics. The result is a more uniform process that is able to call over 90% of the genetic variants from a single cell [24]. However, unlike single-cell transcriptomics, there is not yet a viable microfluidic solution for the parallel and accurate detection of genomic variants in thousands of single cells.
Single-Cell Epigenome and Proteome Profiles
Epigenetic states also regulate specific gene expression programs that contribute to cancer formation and treatment resistance, and the changes can be heterogeneous between clonal populations of cells [25, 26]. These marks can be inherited by daughter cells, so understanding the interactions between the methylation of specific regions, gene expression, and genomic variants is critical for determining how the genome interacts with the state of a cell to create healthy or disease-associated phenotypes.
Methods have been developed for sequencing methylated cytosines from single cells [27, 28]. However, the tools developed to date, including bisulfite conversion-based strategies, result in the loss of a significant proportion of the genome. Assay for transposase-accessible chromatin using sequencing (ATAC-Seq) is able to measure open chromatin in single cells, which is an indirect measurement of the epigenetic state of that region [29]. The direct measurement of histone and other relevant epigenetic marks has been more challenging to adapt to single-cell protocols, although methods are currently under development. The creation of oligo-tagged antibodies has enabled the parallel quantification of a large number of proteins in single cells, which could be applied to study single-cell phospho-signaling [30]. Finally, posttranslational modifications, such as glycosylation, can be queried in single cells.
Computational Challenges for Single-Cell Genomics
An important parallel advancement that must occur for the creation of single-cell genomic-based clinical diagnostics is the development of robust computational methods for identifying and quantifying specific biological molecules within cells. For most single-cell genomic technologies, the processing and alignment of sequencing files has utilized existing bulk sequencing methods, as there is not a clear advantage to modifying those tools for single-cell sequencing data [31]. After the sequencing files have been aligned, the major challenge is dealing with the increase in noise, including data loss and decreased uniformity of data representation after amplifying nucleic acid from a single cell. Consequently, a great deal of work on single-cell RNA sequencing has gone into optimizing the normalization of transcript counts to take those technical artifacts into account, as well as into developing visualization tools that enable users to identify patterns of relationships between cells based on their global gene expression profiles [32].
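As a concrete illustration of the normalization step, the sketch below applies library-size scaling followed by a log transform, one widely used baseline for single-cell RNA-seq counts. It is a minimal example and not the method of any specific tool cited here; the function name `normalize_counts`, the scale factor of 10,000, and the toy count matrix are assumptions chosen for illustration.

```python
import math
from typing import List


def normalize_counts(counts: List[List[int]],
                     scale: float = 10_000.0) -> List[List[float]]:
    """Scale each cell (row) to `scale` total counts, then apply log(1 + x).

    Dividing by the per-cell total removes differences in capture efficiency
    and sequencing depth; the log transform compresses the dynamic range.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # A cell with no captured transcripts carries no information.
            normalized.append([0.0] * len(cell))
            continue
        normalized.append([math.log1p(c / total * scale) for c in cell])
    return normalized


if __name__ == "__main__":
    # Toy data (hypothetical): two cells with the same relative expression
    # but a tenfold difference in the number of captured transcripts.
    toy_counts = [
        [10, 0, 90],
        [1, 0, 9],
    ]
    for row in normalize_counts(toy_counts):
        print([round(value, 3) for value in row])
```

After scaling, the two toy cells have matching profiles even though their raw counts differ tenfold, which is the technical artifact this kind of normalization is meant to remove. Zero counts stay at zero, so dropout events are not corrected by this step.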
The computational tools are much less developed for single-cell genomic and epigenomic sequencing. Still, the concepts are similar in that there are missing and noisy data that need to be considered when identifying biological variation between cells. Single-cell-specific variant and methylation callers have been developed to take these artifacts into account [33, 34]. Tools for reconstructing phylogenetic relationships between cells are also an active area of development [35]. Finally, methods are being developed to integrate different types of single-cell data to acquire a higher-dimensional biological view of each cell. Still, a great deal of work remains in developing computational methods that provide reproducible biological insights from single-cell data.
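One place where missing data enters these analyses is in the pairwise distances that often feed clustering or tree building. The sketch below is an illustrative assumption rather than any published caller or phylogenetic tool: genotypes are coded 0 (reference), 1 (variant), or `None` (no call, for example from allelic dropout), and the distance between two cells is the fraction of mismatches over the sites called in both. The names `masked_distance` and `distance_matrix` and the toy genotypes are hypothetical.

```python
from typing import List, Optional, Sequence

Genotype = Optional[int]  # 0 = reference, 1 = variant, None = no call


def masked_distance(a: Sequence[Genotype],
                    b: Sequence[Genotype]) -> Optional[float]:
    """Fraction of mismatching sites among sites called in both cells.

    Returns None when the two cells share no called sites, so that an
    uninformative pair is not mistaken for a pair of identical cells.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    mismatches = sum(1 for x, y in shared if x != y)
    return mismatches / len(shared)


def distance_matrix(cells: List[Sequence[Genotype]]) -> List[List[Optional[float]]]:
    """Symmetric matrix of masked distances between all pairs of cells."""
    return [[masked_distance(a, b) for b in cells] for a in cells]


if __name__ == "__main__":
    # Toy genotypes (hypothetical) for three cells at four sites.
    cells = [
        [0, 1, None, 1],
        [0, 0, 1, None],
        [0, 1, 1, 1],
    ]
    for row in distance_matrix(cells):
        print(row)
```

Ignoring uncalled sites avoids treating dropout as a true difference, but it also means that distances between sparsely covered cells rest on few sites; dedicated single-cell methods model that uncertainty explicitly rather than discarding it.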
A Roadmap for Single-Cell-Based Precision Oncology
The care of oncology patients has become much more complex over the past few decades with the advent of complex combination treatment protocols [36]. More recently, the incorporation of immunotherapy regimens and advanced genomic studies into standard treatment algorithms has further pushed our treatments towards patient-centric interventions based on many data points rather than uniform interventions for large patient populations [37]. Still, the outcomes for many advanced cancer types have not significantly improved, highlighting the significant gaps in our knowledge: our current oncological treatment strategies are more precise than ever, but are still not precise enough to improve the outcomes of many patients.
Single-cell technologies bring cancer diagnostics and treatment to the fundamental unit of disease formation. The integration of genetic, epigenetic, transcriptomic, proteomic, and posttranslational modification data from the same cells will create datasets for annotating features associated with specific phenotypes, such as cells that are able to survive treatment (Figure 2). In addition, localizing these cells within tissues, as well as specific molecules within cells in those tissues, will provide an additional layer of information [38–40]. These rich datasets can then be mined for features that can inform all areas of oncologic care, from making the correct diagnosis to providing better prognostication to informing the development and ongoing modification of disease-specific treatment regimens. The aspiration is for a level of detail that will provide the precision required to cure most patients diagnosed with all types of malignancies.
Figure 2. Integrative single-cell multiomics has potential for precision medicine.
Single-cell multiomics enables accurate identification of clinically relevant cells for developing new diagnostic classifications, higher-resolution prognostic markers, and biologically informed therapy modification as patients undergo treatment.
This vision includes improvements in the control and eradication of metastatic disease, which is still extraordinarily difficult to manage in most patients, even with modern therapies. Some specific questions that can be addressed by studying the single-cell genomics of metastatic disease include how metastatic cells are able to adapt to unique environments outside of the primary tumor, what variants enable cells from specific tumor types to thrive in a limited number of tissues, and how the evolution of metastatic cells differs in distinct metastatic lesions in the same organ [41]. These types of studies are likely to uncover new insights into metastatic tumor biology that can inform prognostic biomarkers, as well as new therapeutic targets. In addition, many of these lesions are sampled using needle aspiration techniques, making single-cell interrogation a practical approach [38]. Finally, sequencing of circulating tumor cells may serve as a complementary, and potentially more sensitive, strategy for noninvasively characterizing the genomics of metastatic disease.
Still, to achieve that vision of creating single-cell based cancer diagnostics, a number of important technical and practical challenges need to be overcome. First, the technologies need to move from research tools to diagnostic tests, which will require effort to standardize and protocolize interrogations between investigators. This includes the development of best practices for computational methods, as well as validated strategies for interpreting results. Clinical trials will then need to be carefully designed to evaluate the capacity of this new information to guide patients to less toxic and more efficacious treatment strategies, including the development of companion diagnostics for specific drugs.
Single-cell cancer diagnostics hold great promise if we can overcome the challenges required to systematically integrate these tools into clinical care. The last two decades of cancer biotechnology and pharmaceutical development have gone farther and faster than most thought was possible [42]. The next several decades aim for an even deeper transformation of our understanding of these deadly diseases. Bringing our knowledge about cancer to true cellular resolution may fill in the knowledge gaps required to transform additional malignant disorders from life-altering and life-ending maladies to preventable and controllable aging disorders.
Abbreviation Table
- TCGA: The Cancer Genome Atlas
- ICGC: International Cancer Genome Consortium
- TARGET: Therapeutically Applicable Research to Generate Effective Treatments
- PCR: Polymerase Chain Reaction
- scRNA-seq: Single-Cell RNA Sequencing
- MDA: Multiple Displacement Amplification
- ATAC-Seq: Assay for Transposase-Accessible Chromatin with high-throughput Sequencing
References
- 1. Lander ES, et al., Initial sequencing and analysis of the human genome. Nature, 2001. 409(6822): p. 860–921.
- 2. Tucker T, Marra M, and Friedman JM, Massively parallel sequencing: the next big thing in genetic medicine. Am J Hum Genet, 2009. 85(2): p. 142–54.
- 3. ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium, Pan-cancer analysis of whole genomes. Nature, 2020. 578(7793): p. 82–93.
- 4. Ma X, et al., Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature, 2018. 555(7696): p. 371–376.
- 5. Presley CJ, et al., Association of Broad-Based Genomic Sequencing With Survival Among Patients With Advanced Non-Small Cell Lung Cancer in the Community Oncology Setting. JAMA, 2018. 320(5): p. 469–477.
- 6. Rosenthal R, et al., Deciphering genetic intratumor heterogeneity and its impact on cancer evolution. Annual Review of Cancer Biology, 2017. 1: p. 223–240.
- 7. Dagogo-Jack I and Shaw AT, Tumour heterogeneity and resistance to cancer therapies. Nat Rev Clin Oncol, 2018. 15(2): p. 81–94.
- 8. Lee D, Park Y, and Kim S, Towards multi-omics characterization of tumor heterogeneity: a comprehensive review of statistical and machine learning approaches. Briefings in bioinformatics, 2021. 22(3): p. bbaa188.
- 9. Zhang Q, et al., Integrated multiomic analysis reveals comprehensive tumour heterogeneity and novel immunophenotypic classification in hepatocellular carcinomas. Gut, 2019. 68(11).
- 10. Liu J, Dang H, and Wang XW, The significance of intertumor and intratumor heterogeneity in liver cancer. Experimental & molecular medicine, 2018. 50(1): p. e416–e416.
- 11. Stuart T and Satija R, Integrative single-cell analysis. Nat Rev Genet, 2019. 20(5): p. 257–272.
- 12. Linnarsson S and Teichmann SA, Single-cell genomics: coming of age. 2016, Springer.
- 13. Gawad C, Koh W, and Quake SR, Single-cell genome sequencing: current state of the science. Nat Rev Genet, 2016. 17(3): p. 175–88.
- 14. Luquette LJ, et al., Identification of somatic mutations in single cell DNA-seq using a spatial model of allelic imbalance. Nature communications, 2019. 10(1): p. 1–14.
- 15. Sokolenko AP and Imyanitov EN, Molecular diagnostics in clinical oncology. Frontiers in molecular biosciences, 2018. 5: p. 76.
- 16. Tang X, et al., The single-cell sequencing: new developments and medical applications. Cell & bioscience, 2019. 9(1): p. 1–9.
- 17. Klein AM, et al., Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell, 2015. 161(5): p. 1187–1201.
- 18. Macosko EZ, et al., Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell, 2015. 161(5): p. 1202–1214.
- 19. Picelli S, et al., Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome research, 2014. 24(12): p. 2033–2040.
- 20. Pellegrino M, et al., High-throughput single-cell DNA sequencing of acute myeloid leukemia tumors with droplet microfluidics. Genome research, 2018. 28(9): p. 1345–1352.
- 21. Huang L, et al., Single-Cell Whole-Genome Amplification and Sequencing: Methodology and Applications. Annu Rev Genomics Hum Genet, 2015. 16: p. 79–102.
- 22. Dean FB, et al., Comprehensive human genome amplification using multiple displacement amplification. Proceedings of the National Academy of Sciences, 2002. 99(8): p. 5261–5266.
- 23. Long N, et al., Recent advances and application in whole-genome multiple displacement amplification. Quantitative Biology, 2020: p. 1–16.
- 24. Gonzalez-Pena V, et al., Accurate genomic variant detection in single cells with primary template-directed amplification. Proceedings of the National Academy of Sciences, 2021. 118(24).
- 25. Allis CD and Jenuwein T, The molecular hallmarks of epigenetic control. Nature Reviews Genetics, 2016. 17(8): p. 487–500.
- 26. Virani S, et al., Cancer epigenetics: a brief review. Ilar Journal, 2012. 53(3–4): p. 359–369.
- 27. Buenrostro JD, et al., ATAC-seq: a method for assaying chromatin accessibility genome-wide. Current protocols in molecular biology, 2015. 109(1): p. 21.29.1–21.29.9.
- 28. Li Y and Tollefsbol TO, DNA methylation detection: bisulfite genomic sequencing analysis, in Epigenetics Protocols. 2011, Springer. p. 11–21.
- 29. Kashima Y, et al., Single-cell sequencing techniques from individual to multiomics analyses. Exp Mol Med, 2020. 52(9): p. 1419–1427.
- 30. Robinson WH, Sequencing the functional antibody repertoire—diagnostic and therapeutic discovery. Nature Reviews Rheumatology, 2015. 11(3): p. 171–182.
- 31. Van der Auwera GA and O’Connor BD, Genomics in the Cloud: Using Docker, GATK, and WDL in Terra. 2020: O’Reilly Media.
- 32. Hwang B, Lee JH, and Bang D, Single-cell RNA sequencing technologies and bioinformatics pipelines. Exp Mol Med, 2018. 50(8): p. 96.
- 33. Laird PW, Principles and challenges of genome-wide DNA methylation analysis. Nature Reviews Genetics, 2010. 11(3): p. 191–203.
- 34. Guo H, et al., Single-cell methylome landscapes of mouse embryonic stem cells and early embryos analyzed using reduced representation bisulfite sequencing. Genome research, 2013. 23(12): p. 2126–2135.
- 35. Melchor L, et al., Single-cell genetic analysis reveals the composition of initiating clones and phylogenetic patterns of branching and parallel evolution in myeloma. Leukemia, 2014. 28(8): p. 1705–1715.
- 36. Zugazagoitia J, et al., Current challenges in cancer treatment. Clinical therapeutics, 2016. 38(7): p. 1551–1566.
- 37. Waldman AD, Fritz JM, and Lenardo MJ, A guide to cancer immunotherapy: from T cell basic science to clinical practice. Nature Reviews Immunology, 2020. 20(11): p. 651–668.
- 38. Casasent AK, et al., Multiclonal Invasion in Breast Tumors Identified by Topographic Single Cell Sequencing. Cell, 2018. 172(1–2): p. 205–217 e12.
- 39. Waylen LN, et al., From whole-mount to single-cell spatial assessment of gene expression in 3D. Communications biology, 2020. 3(1): p. 1–11.
- 40. Gawad C, Koh W, and Quake SR, Dissecting the clonal origins of childhood acute lymphoblastic leukemia by single-cell genomics. Proc Natl Acad Sci U S A, 2014. 111(50): p. 17947–52.
- 41. Lawson DA, et al., Tumour heterogeneity and metastasis at single-cell resolution. Nature cell biology, 2018. 20(12): p. 1349–1360.
- 42. Metzker ML, Sequencing technologies—the next generation. Nature reviews genetics, 2010. 11(1): p. 31–46.


