Abstract
Longitudinal monitoring of patients with advanced cancers is crucial to evaluate both disease burden and treatment response. Current liquid biopsy approaches mostly rely on the detection of DNA-based biomarkers. However, plasma RNA analysis can unleash tremendous opportunities for tumor state interrogation and molecular subtyping. Through the application of deep learning algorithms to the deconvolved transcriptomes of RNA within plasma extracellular vesicles (evRNA), we successfully predicted consensus molecular subtypes in metastatic colorectal cancer patients. Analysis of plasma evRNA also enabled monitoring of changes in transcriptomic subtype under treatment selection pressure and identification of molecular pathways associated with recurrence. This approach also revealed expressed gene fusions and neoepitopes from evRNA. These results demonstrate the feasibility of using transcriptomic-based liquid biopsy platforms for precision oncology approaches, spanning from the longitudinal monitoring of tumor subtype changes to identification of expressed fusions and neoantigens as cancer-specific therapeutic targets, sans the need for tissue-based sampling.
Introduction
Liquid biopsies can detect cancer-derived materials without the need to perform an invasive tissue biopsy. Most liquid biopsy assays designed for therapeutic assignment and response monitoring in cancer are currently based on circulating cell-free DNA (cfDNA), such as the detection of mutations, fragmentation patterns or methylation signatures (1–4). Recent studies have made initial forays into inferring tumor transcriptomes indirectly via the use of fragmented cfDNA profiles in plasma samples, and identifying pertinent biological pathways based on the anatomical location of the primary cancer (5, 6). Nonetheless, the direct interrogation of tumor transcriptomes in plasma could enable significant insights into the cancer molecular landscape beyond what genomic alterations alone would provide. The comprehensive assessment of cancer-derived RNA in plasma has been challenging due to the presence of circulating ribonucleases, which cleave coding RNAs, confounding their assessment with next generation sequencing (NGS) platforms, and restricting most studies of this nature to the non-coding transcriptome (7, 8). A second major challenge of a circulating RNA-based approach is the reliable separation of tumor-derived transcriptomic signatures from the “background” RNA released by non-neoplastic cell types. Among liquid biopsy substrates, extracellular vesicles (EVs) represent a facile compartment for querying tumor transcriptomes as they are actively released from cancer cells and carry RNA cargoes (evRNA) that are shielded from enzymatic degradation (9). While we and others have previously demonstrated the ability to map the DNA landscapes of advanced cancers reliably with EVs (10, 11), the feasible extrapolation of this to cancer-derived evRNA has not been documented.
In this paper, we describe our pipeline for isolating, sequencing, and interrogating the whole transcriptomes of advanced colorectal cancer (CRC) using plasma EVs. Notably, the use of evRNA cargo overcomes the current limitation of sequencing extensively fragmented RNAs in circulation, while the application of machine learning facilitates deconvolution of cancer-specific transcriptome data from the cumulative “pool” of plasma EVs. We demonstrate our ability to assign current widely accepted RNA-based tumor subtype signatures typically obtained via tissue samples – namely, Consensus Molecular Subtypes (CMS) for CRC in baseline plasma samples (12). We further demonstrate the ability of our pipeline to longitudinally monitor changes in evRNA-based CMS subtypes, cancer-associated pathways, immune cell types, and differential gene expression under treatment selection, which was clinically associated with disease progression. While such “subtype switching” as a basis for treatment resistance is well established (13), it is based entirely on serial sampling of tumor tissue, a requirement that can now be precluded with longitudinal monitoring of evRNAs. Finally, our evRNA-based profiling identifies expressed gene fusions and neoantigens in the circulation of CRC patients, which has significant implications for monitoring those on targeted and immunotherapies. The pipeline described herein should be applicable across cancer types and provide an avenue for leveraging the full potential of mapping tumor transcriptomes and their evolution over time, sans the requirement for repeated tissue biopsies.
Materials and Methods
Cell lines and culturing
Ten established CRC cell lines derived from primary or metastatic CRC tissue were used in this study: LOVO (CMS1), COLO201 (CMS1), LS1034 (CMS2), SNUC1 (CMS2), HT29 (CMS3), LS513 (CMS3), WiDr (CMS3), HCT116 (CMS4), LS123 (CMS4), and SW480 (CMS4). A CRC neoplastic cell line with mixed molecular subtypes, HCC2998, and a normal colon cell line, 841-CoN, were used as controls. All cell lines were maintained in either RPMI-1640 or DMEM medium with 10% FBS, 100 units/mL penicillin and 100 μg/mL streptomycin (Gibco, Life Technologies, Carlsbad, CA, USA) at 37°C and 5% CO2 in a humidified incubator. Before EV isolation, cell lines were cultured in HYPERflasks (Corning Inc, Corning, NY, USA) in medium with 10% exosome-free FBS for at least 48 hours, and the conditioned media was then harvested. Next, 500 mL of media per cell line was centrifuged serially at 1000 rpm for 10 minutes and 5000 rpm for 5 minutes at 4°C to remove cellular debris. The media was then passed through a 400-nm filter to remove any remaining debris and transferred to 50-mL tubes. The media was then ultracentrifuged at 36,000 rpm at 4°C overnight. The resulting pellet was resuspended in PBS and ultracentrifuged again for 2–4 hours at 36,000 rpm. The final EV pellet was resuspended in 600 μL PBS. All cell lines except HCT116 and HT29 were obtained from ATCC; these two cell lines were acquired from a collaborator at MDACC. All cell lines were submitted for cell authentication, and mycoplasma testing was routinely performed.
Patient samples
One hundred and fifty-five plasma samples were collected from 71 patients with histologically confirmed CRC who were diagnosed and treated at the University of Texas MD Anderson Cancer Center (UTMDACC) between 8/8/2014 and 8/29/2019: 42 patients in the baseline cohort (cohort 1) and 36 patients in the longitudinal cohorts (cohorts 2 and 3, including 7 patients from the baseline cohort). Written informed consent was obtained following protocol LAB10-0982 in accordance with standard ethical guidelines approved by the institutional review board (IRB) and the Declaration of Helsinki. Blood samples were collected preoperatively on the day of surgery, and tissue samples were collected at the time of surgical resection. Blood samples were also collected from seventy healthy individuals matched on age and gender. Disease responses were measured according to RECIST 1.1 criteria (14).
Next-generation sequencing library construction for tumor tissues
Following RNA extraction, all tissue samples were processed using the KAPA Stranded RNA-Seq kit with RiboErase (HMR) (KAPA, Roche, cat. # 08098140702). Libraries were constructed according to the manufacturer’s protocol and were then quantified using an Agilent 2100 Tapestation. Libraries were then sequenced using 150-bp paired-end runs on a NextSeq 500 sequencer (Illumina, San Diego, CA, USA).
EV Isolation from human plasma samples
Three 8 mL EDTA Vacutainer tubes (BD) of blood were collected from each case. Plasma was separated by centrifuging blood samples at 2500 rpm for 10 minutes at room temperature (RT), followed by an additional 5-minute centrifugation at 5000 rpm and 4°C to remove cellular debris. The plasma was ultracentrifuged overnight at 36,000 rpm. The EV pellet was then washed with PBS and centrifuged at 36,000 rpm for 2–4 hours. The supernatant was then discarded, and the EV pellets were resuspended in 600 μL PBS for long-term storage at −80°C.
EV characterization
EVs isolated using serial ultracentrifugation were characterized with transmission electron microscopy (TEM, JEOL1230) as previously described (15). The morphology of the isolated EVs was assessed by cryogenic transmission electron microscopy (cryo-TEM, JEOL3200), as described before. The size of the isolated EVs was assessed using ZetaView Nanoparticle-tracking analysis (NTA) equipment (Particle Metrix, Meerbusch, Germany), as previously described (16). In summary, the EVs were diluted with PBS to a final volume of 1 mL, with optimal EV concentrations determined through preliminary testing.
RNA extraction from EVs
Around 200 μL of resuspended EV were used for RNA extraction using the total exosome RNA & Protein Isolation kit (Invitrogen, cat. # 4478545) which allowed us to isolate RNA and eliminate DNA contamination, following the manufacturer’s protocol. Small RNA species such as miRNAs were removed during the washing steps. However, our preliminary findings indicated that a percentage of the sequenced reads were mapping to intronic and intergenic regions after sequencing patient plasma samples. This was particularly noticeable in samples with a significant quantity of plasmatic cell-free DNA (cfDNA). Therefore, contaminating DNA was removed using DNase I (Invitrogen, cat. #18047019).
Globin and ribosomal RNA depletion
Plasma evRNA samples often have significant levels of both globin RNA (gRNA) and ribosomal RNA (rRNA) contamination. The elimination of RNA from globin genes and rRNA significantly improves the read coverage mapped to reference transcripts. To select an optimal approach for removing these abundant RNAs from plasma evRNA samples, we first compared the effectiveness of three depletion kits, Ribo-Zero (Illumina), GLOBINclear (Invitrogen), and riboPOOLs (siTOOLs Biotech), each used according to the manufacturer’s protocol. Prior to depletion, all evRNA samples underwent DNase I treatment. Two of these methods, Ribo-Zero and riboPOOLs, were designed to deplete both gRNA and rRNA. The riboPOOLs gRNA/rRNA depletion kit (siTOOLs Biotech, cat. # dp-K096-001002) was chosen as the optimal method for eliminating high-abundance human blood globin and ribosomal transcripts.
cDNA synthesis and next generation sequencing
All depleted evRNAs were mixed with 1.8X Ampure RNA Clean XP beads (Beckman Coulter, IN, USA) for purification and eluted with 25 μL of water. cDNA amplification was performed using the SMART-Seq® v4 Ultra® low input RNA kit (Takara, cat. # 634891) with sixteen cycles of PCR. cDNA samples were then diluted to 0.2–0.3 ng per μL, and 1 ng of cDNA was carried over into library preparation using the Nextera XT DNA Library Preparation Kit (Illumina, cat. # FC-131-1096) following the manufacturer’s protocol. Libraries were quantified with an Agilent 2100 Tapestation and sequenced on a NextSeq 500 sequencer.
Creation of custom gene set for CMS classification of CRC evRNA
EVs were isolated from 10 well-characterized cell lines representing distinct molecular subtypes, and an additional cell line with a mixed CMS status. RNA was extracted from both the cells (cellRNA) and their EVs (evRNA), followed by the construction of libraries and subsequent sequencing. CMScaller was then used to classify molecular subtypes for both cellRNA and evRNA samples. The use of the CMS gene set provided in the ‘CMScaller’ package led to inaccurate classification in two of ten cell line EV samples (data not shown). In order to improve the performance of nearest template prediction for EV, the preexisting gene list was subset by taking the genes that are highly expressed in the cells and EV of each CMS group. For cell lines and EV of each CMS group, differentially expressed genes compared to cell lines and EV of other CMS groups were obtained using the ‘limma’ package (17). Only those genes in the ‘CMScaller’ gene set that were upregulated in cell lines and EV of each CMS group with fold change of 2 or greater and adjusted P-value < 0.05 were included in the new gene set. We conducted additional validation of the custom CMScaller panel using The Cancer Genome Atlas (TCGA) RNA-seq dataset. Subsequently, we used the custom CMScaller panel to assign molecular subtypes to both cellRNA and evRNA from CRC cell lines (Table S1).
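The gene-set filtering rule described above (fold change of 2 or greater and adjusted P-value < 0.05, restricted to genes already in the CMScaller template) can be illustrated with a minimal Python sketch. This is not the authors' code: the analysis in the paper was done with 'limma' and 'CMScaller' in R, and the function name, the input layout, and the toy gene names below are all hypothetical.

```python
import math


def filter_template_genes(de_results, template_genes,
                          min_fold_change=2.0, max_adj_p=0.05):
    """Keep template genes that are upregulated in one CMS group.

    de_results: dict mapping gene -> (log2 fold change, adjusted P-value),
                a hypothetical stand-in for a differential expression table.
    template_genes: genes from the pre-existing template for that CMS group.
    """
    min_log2fc = math.log2(min_fold_change)  # fold change 2 -> log2FC 1
    kept = []
    for gene in template_genes:
        if gene not in de_results:
            continue  # gene was not tested, so it cannot be retained
        log2fc, adj_p = de_results[gene]
        if log2fc >= min_log2fc and adj_p < max_adj_p:
            kept.append(gene)
    return kept


# Toy example with made-up gene names and statistics.
toy_de = {
    "GENE_A": (1.5, 0.01),    # upregulated and significant
    "GENE_B": (0.4, 0.001),   # significant but fold change too small
    "GENE_C": (2.0, 0.20),    # large fold change but not significant
    "GENE_D": (-3.0, 0.001),  # downregulated
}
toy_template = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
custom_gene_set = filter_template_genes(toy_de, toy_template)
print(custom_gene_set)
```

In this sketch the filter is applied once per CMS group, and the union of the retained genes would form the custom panel.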
In vitro cellRNA mixing experiments
We first evaluated the correlation between the observed and expected cancer proportions. We extracted RNA from CRC cell lines, LOVO (CMS1), LS1034 (CMS2), HT29 (CMS3), HCT116 (CMS4), and a normal colon cell line, 841-CoN, as previously described. Cell cultures and sequencing were conducted in triplicate. The extracted RNA from each CRC cell line was quantified and combined with RNA from 841-CoN at various ratios, namely 0.1%, 1%, 10%, and 50%, to create cancer-derived RNA mixtures. RNA mixtures were then sequenced and used for subsequent analyses. CRC cellRNA samples (CRC cell lines, representing 100% cancer) and normal colon cellRNA samples (841-CoN cell line, representing 100% normal) were utilized to create a reference signature matrix using CIBERSORTx. This matrix is composed of genes that are enriched in their respective groups of interest, cancer and normal. Subsequently, the signature matrix was used to deconvolve cellRNA mixtures, generating normal and cancer-specific gene expression profiles using CIBERSORTx and CODEFACS. Finally, the resultant cancer-specific gene expression profiles were classified using CMScaller (Figure S5a).
Artificial mixing experiments
Synthetic mixtures from CRC TCGA RNA sequencing data were generated to assess the performance of the deconvolution methods. To generate synthetic mixtures, the average read counts from normal plasma evRNA samples were computationally spiked in with RNA reads from CRC TCGA data using predefined vectors to obtain mixtures with cancer proportions ranging from 0.01% to 10%. The CIBERSORTx signature matrix was generated by using the reference matrix with CRC TCGA data defined as 100% cancer, and normal plasma evRNA data as 100% normal. Multiple linear regression was used to assess the linear relationship between the observed and expected cancer levels. We then assigned each deconvoluted expression profile generated by CIBERSORTx and by a recently developed tool, COnfident DEconvolution For All Cell Subsets (CODEFACS), to a molecular subtype by using pure TCGA expression data as the reference for CMScaller (18).
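As an illustration of the in silico spike-in, the following Python sketch mixes a cancer profile and an average normal profile at fixed cancer fractions. It assumes both profiles are already on a comparable scale (for example, normalized to the same library size), which the text does not specify; the function name and the toy values are hypothetical and are not taken from the study.

```python
def spike_in(cancer_profile, normal_profile, cancer_fraction):
    """Return a synthetic mixture: f * cancer + (1 - f) * normal, per gene."""
    if not 0.0 <= cancer_fraction <= 1.0:
        raise ValueError("cancer_fraction must be between 0 and 1")
    if len(cancer_profile) != len(normal_profile):
        raise ValueError("profiles must cover the same genes in the same order")
    return [
        cancer_fraction * c + (1.0 - cancer_fraction) * n
        for c, n in zip(cancer_profile, normal_profile)
    ]


# Cancer fractions spanning the range in the text (0.01% to 10%).
fractions = [0.0001, 0.001, 0.01, 0.1]

# Toy three-gene profiles (made-up values).
toy_cancer = [100.0, 0.0, 50.0]   # stands in for one TCGA tumor sample
toy_normal = [0.0, 200.0, 50.0]   # stands in for the average normal plasma evRNA

mixtures = {f: spike_in(toy_cancer, toy_normal, f) for f in fractions}
for f in fractions:
    print(f, mixtures[f])
```

Each synthetic mixture has a known cancer fraction, so the fraction imputed by deconvolution can be compared directly with the expected value.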
RNA-Seq data analysis, deconvolution and molecular classification
QC was performed on FASTQ files using MultiQC. Raw reads were aligned to the human genome reference (hg19) using STAR. Deep cancer subtype classification (DeepCC) was used to classify the transcriptomic profiles of tumor tissues. In summary, we trained a DeepCC classifier using a TCGA CRC data set (n = 456) and a multilayer artificial neural network (ANN) as previously described by Gao et al (19). We then obtained the functional spectra from tumor expression profiles and used them as input to predict molecular subtypes for tumor samples.
To deconvolve liquid biopsy samples, we initially developed a signature matrix by utilizing the expression profiles obtained from tumor tissue RNA sequencing data and normal plasma evRNA samples. This signature matrix was then employed to deconvolve CRC evRNA, enabling the generation of non-tumor and tumor-derived gene expression profiles by CIBERSORTx (20) and CODEFACS (18). We used the cancer-specific compartment of the deconvolved transcriptomic profiles to predict the CMS class by ANN-based classification. CRC TCGA RNA sequencing data was used to train a classifier as described for the tumor tissue CMS classification. We then assessed the correlation between observed cancer proportions and molecular subtypes concordance between liquid biopsy and tissue samples.
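To make the idea of imputing a cancer proportion from a two-compartment signature concrete, the sketch below estimates the cancer fraction of a mixture by ordinary least squares. This is a deliberately simplified stand-in and not the CIBERSORTx or CODEFACS algorithm; the function name, the reference profiles, and the toy values are hypothetical.

```python
def estimate_cancer_fraction(mixture, cancer_ref, normal_ref):
    """Least-squares estimate of f in: mixture = f * cancer + (1 - f) * normal.

    Rearranged as (mixture - normal) = f * (cancer - normal), the closed-form
    solution is the projection of (mixture - normal) onto (cancer - normal).
    The result is clipped to the interval [0, 1].
    """
    diff = [c - n for c, n in zip(cancer_ref, normal_ref)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("reference profiles are identical; fraction is undefined")
    num = sum((m - n) * d for m, n, d in zip(mixture, normal_ref, diff))
    return min(1.0, max(0.0, num / denom))


# Toy signature over three genes (made-up values).
toy_cancer_ref = [100.0, 0.0, 50.0]
toy_normal_ref = [0.0, 200.0, 50.0]

# A mixture built with a true cancer fraction of 10%.
toy_mixture = [10.0, 180.0, 50.0]

estimated = estimate_cancer_fraction(toy_mixture, toy_cancer_ref, toy_normal_ref)
print(round(estimated, 4))
```

Genes with the same expression in both references (the third gene here) contribute nothing to the estimate, which is why a signature matrix is built from genes that discriminate the compartments.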
To assess cancer evRNA abundance and to compare cancer evRNA levels between CRC and normal plasma samples, age/gender-matched samples from healthy individuals were randomly divided into two groups. Half of the normal samples were used as the predefined normal group to create the signature matrix for CIBERSORTx. For the accurate assessment of cancer evRNA levels, all CRC baseline plasma samples were compared with the remaining half of the normal samples (control group).
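The random split of healthy samples into a reference half and a held-out control half could look like the following Python sketch. The sample identifiers and the fixed seed are hypothetical, and the sketch does not attempt to preserve the age/gender matching described in the text.

```python
import random


def split_controls(sample_ids, seed=0):
    """Randomly split samples into two halves (reference, held-out control)."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(sample_ids)    # copy to avoid modifying the caller's list
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Seventy healthy individuals, as in the patient samples section;
# the identifiers themselves are made up.
healthy_ids = [f"HC{i:02d}" for i in range(1, 71)]
reference_group, control_group = split_controls(healthy_ids)
print(len(reference_group), len(control_group))
```

Keeping the two halves disjoint means the normal samples used to build the signature matrix are never the same samples used as the comparison group.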
Immune cell subtyping
We utilized the MCP-counter method to quantify the absolute abundance of various immune cell subsets. DESeq2 log-normalized data (without deconvolution) were used as the input for MCP-counter (21).
Single sample GSEA analysis (ssGSEA)
ssGSEA was used to calculate distinct enrichment scores for each combination of a sample and gene set. For ssGSEA analysis, we utilized DESeq2 log-normalized tumor-derived RNA-seq data (after deconvolution) as the input. Each ssGSEA enrichment score reflects the extent to which genes within a specific gene set are coordinately up- or down-regulated within a given liquid biopsy sample (22).
Droplet Digital PCR Analysis
ddPCR (QX200; BioRad) was used for the sensitive detection of circulating tumor DNA (ctDNA) in liquid biopsy samples with matched tumor DNA positive for one of G12/G13 KRAS mutations (G12V, G12D, G12R, G12C, G12S, G12A, and G13D mutant codons), as previously outlined (16). DNA from SW480 cell line (KRAS G12V mutation) and 841-CoN cell line were used as positive and negative controls, respectively.
Gene fusions and neoantigens prediction
Gene fusions were analyzed in tumor tissue samples and in plasma liquid biopsies from both mCRC patients and healthy controls. Three methods were used to detect gene fusions, with the following filtering criteria: a) Arriba (23), confidence of medium or high, discordant_mates ≥1 and split reads ≥2; b) Pizzly (24), paircount ≥1 and split count ≥2; and c) FusionCatcher (25), spanning_pairs ≥1 and spanning_unique_reads ≥2. Gene fusions present in healthy controls were removed from the analysis. Neoantigens were predicted with NeoFuse (26), and only those with a binding affinity < 100 nM were reported (Table S2).
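The per-tool thresholds and the removal of fusions seen in healthy controls can be summarized in a short Python sketch. The record layout is hypothetical: the dictionary keys only mirror the criteria named in the text and are not the exact column names of the Arriba, Pizzly, or FusionCatcher output files. Whether healthy-control calls were thresholded before removal is not stated; this sketch conservatively removes any fusion called in a healthy control.

```python
def passes_filter(call):
    """Apply the tool-specific support thresholds described in the text."""
    tool = call["tool"]
    if tool == "arriba":
        return (call["confidence"] in {"medium", "high"}
                and call["discordant_mates"] >= 1
                and call["split_reads"] >= 2)
    if tool == "pizzly":
        return call["paircount"] >= 1 and call["splitcount"] >= 2
    if tool == "fusioncatcher":
        return (call["spanning_pairs"] >= 1
                and call["spanning_unique_reads"] >= 2)
    raise ValueError(f"unknown tool: {tool}")


def tumor_specific_fusions(patient_calls, healthy_calls):
    """Keep filtered patient fusions that never appear in healthy controls."""
    healthy = {c["fusion"] for c in healthy_calls}
    return sorted({
        c["fusion"] for c in patient_calls
        if passes_filter(c) and c["fusion"] not in healthy
    })


# Toy calls with made-up fusion names.
toy_patient = [
    {"tool": "arriba", "fusion": "GENE1--GENE2", "confidence": "high",
     "discordant_mates": 3, "split_reads": 4},
    {"tool": "pizzly", "fusion": "GENE3--GENE4",
     "paircount": 0, "splitcount": 5},            # fails paircount >= 1
    {"tool": "fusioncatcher", "fusion": "GENE5--GENE6",
     "spanning_pairs": 2, "spanning_unique_reads": 2},
]
toy_healthy = [
    {"tool": "fusioncatcher", "fusion": "GENE5--GENE6",
     "spanning_pairs": 1, "spanning_unique_reads": 2},
]
reported = tumor_specific_fusions(toy_patient, toy_healthy)
print(reported)
```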
Neoepitopes derived from INDELs and Exitrons
INDELs and exitrons were analyzed from 42 tissue samples, matched liquid biopsies and healthy controls. INDELs were called by TransIndel (27) followed by neoepitopes analysis by ScanNeo (28) and calls present in healthy controls were removed. Exitrons were called by ScanExitron (29) and ScanNeo further predicted neoantigens (Table S3).
Statistical analysis
Statistical analysis was performed using GraphPad Prism software version 9.0 for Mac (GraphPad Software, San Diego, CA, USA). Statistical differences between groups were assessed using a paired Student’s t-test. Statistical significance was defined as a P-value less than 0.05. Data are presented as mean values with 95% confidence intervals (CIs).
Data availability
The data generated in this study are publicly available in Gene Expression Omnibus (GEO) at GSE255775. The data analyzed in this study were obtained from https://gdac.broadinstitute.org. Specifically, COAD (Colon adenocarcinoma) and READ (Rectum adenocarcinoma) were used. All other raw data generated in this study are available upon request from the corresponding author.
Results
Characterization of EVs isolated from CRC cell lines and plasma
The morphology of EVs isolated from CRC cell lines was examined by both TEM and cryo-TEM. Based on TEM results, the majority of EVs were identified as cup-shaped vesicles surrounded by a lipid membrane (Figure S1a). Cryo-TEM results showed that EVs were highly heterogeneous in shape and could be identified as single-membrane (Figure S1b, I), double-membrane (Figure S1b, I–II), multilayer complex vesicles (Figure S1b, III–VI), small vesicles (Figure S1b, VII), and vesicles with broken membrane (Figure S1b, VIII). The size distribution of EVs derived from CRC cell lines and plasma samples was determined using NTA, as depicted in Figure S1c.
Elimination of library preparation contaminants from evRNA samples
To improve the quality of our transcriptomic readout, we removed DNA, globin mRNA, and ribosomal RNA prior to library preparation. We implemented an additional DNase treatment step after RNA isolation, resulting in an improvement in the proportion of coding reads within the RNA sequencing data from CRC plasma evRNA samples (Figure S2a). Furthermore, our early results revealed that a portion of the sequenced RNA originated from highly abundant rRNA and gRNA, with the latter being released by erythrocytes in liquid biopsy samples. The presence of gRNA led to a significant reduction in the sensitivity of detecting cancer-related evRNA transcriptomes, particularly in plasma samples with extended processing times. To enhance transcript detection sensitivity for plasma evRNA, we compared three methods, Ribo-Zero (Illumina), GLOBINclear (Invitrogen), and riboPOOLs (siTOOLs Biotech), for preparing gRNA/rRNA-depleted sequencing libraries from plasma samples of three CRC patients. Ribo-Zero resulted in a significant reduction in cDNA concentrations, rendering it less suitable for low-quantity and low-quality plasma evRNA samples. GLOBINclear, while effective in removing gRNA, was not selected due to its inability to remove rRNA. The preferred method for our subsequent experiments was riboPOOLs, as it efficiently reduced the abundance of both gRNA and rRNA reads from plasma EVs and provided adequate cDNA for library preparation (Figure S2b).
Comparison between evRNA and supernatant RNA in CRC plasma samples
We used ultracentrifugation (UC) to isolate EVs and to eliminate supernatant RNA (non-EV RNA) from liquid biopsy samples. This step was crucial to ensure enrichment of the cancer-associated transcriptome. We further sought to discern the differences between evRNA and supernatant RNA by sequencing RNA from both compartments in five CRC patients at the time of recurrence. Because counts were absent for several genes in the supernatant RNA sequencing (RNA-seq) data, we selected genes with reads greater than zero in at least 4 out of 5 samples in either the evRNA or non-EV RNA groups. This approach allowed us to identify 675 genes highly expressed in evRNA and 362 genes highly expressed in the non-EV RNA group. The top 15 significantly expressed genes from each group are shown in Figure S2c. Subsequently, we focused on genes significantly associated with CRC initiation and progression. A number of these genes exhibited high expression levels in the evRNA samples, yet had zero reads in the supernatant RNA, suggesting that non-EV RNA is not an essential source in our liquid biopsy pipeline (Figure S2c).
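The gene selection rule used here (reads greater than zero in at least 4 of 5 samples in either compartment) is simple to express in code. The following Python sketch is illustrative only; the function names, gene names, and counts are hypothetical.

```python
def detected_in_group(counts, min_samples=4):
    """True if the gene has a nonzero count in at least min_samples samples."""
    return sum(1 for c in counts if c > 0) >= min_samples


def keep_gene(ev_counts, non_ev_counts, min_samples=4):
    """Keep the gene if it is detected in either the evRNA or non-EV RNA group."""
    return (detected_in_group(ev_counts, min_samples)
            or detected_in_group(non_ev_counts, min_samples))


# Toy counts for five patients per compartment (made-up values):
# gene -> (evRNA counts, non-EV RNA counts)
toy_counts = {
    "GENE_A": ([5, 3, 0, 8, 2], [0, 0, 0, 0, 0]),  # detected in 4/5 evRNA samples
    "GENE_B": ([1, 0, 0, 0, 2], [0, 1, 0, 0, 0]),  # too sparse in both groups
    "GENE_C": ([0, 0, 0, 0, 0], [4, 4, 4, 4, 4]),  # detected only in non-EV RNA
}
kept_genes = [g for g, (ev, non_ev) in toy_counts.items() if keep_gene(ev, non_ev)]
print(kept_genes)
```

Using an "either group" rule retains genes that are specific to one compartment, which is what allows compartment-enriched genes to be identified afterwards.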
EVs transcriptomic profiles recapitulate the transcriptomic profiles of the cells of origin in vitro
It is well established that cancer cell-derived EVs contain mRNA cargo (30–32). However, the correlation between cellular RNA and the derivative evRNA with respect to transcriptome-based cancer subtyping has not been explored. To address this question, we performed RNA-seq on total cellular RNA (cellRNA) and evRNA isolated from different CRC cell lines, which were previously classified according to the four-tier CMS classification system (CMS1–4, respectively) (33, 34). We first applied CMScaller, a CMS classifier optimized for classification using both cell line and tumor tissue RNA samples, to determine the correlation between cellRNA and evRNA (35, 36). Using the gene set provided in the CMScaller package, we observed only partial concordance between cellRNA and evRNA (data not shown). Therefore, to improve the performance of molecular subtyping for EVs, a customized panel of CMS template genes was designed (Table S1). This “custom CMScaller” panel was developed by selecting highly expressed CMS genes present in both cells and their corresponding EVs within each CMS group (Figure S3a, Table S1). We further validated the customized gene panel using 647 available CRC TCGA RNA-seq samples. The CMS concordance rate between the original CMScaller gene sets and our customized gene panel was 90% (Figure S3b). Utilizing the custom CMScaller panel, the assignment of CMS subtypes was identical between matched cellRNA and evRNA for all cell lines (Figure S3c). Thus, for further assessment of the deconvolution methods as described below, the custom CMScaller gene set was used for the CMS classification.
evRNA is a viable choice for pathway enrichment analysis
Given the subtle differences observed in transcriptomic profiles between evRNA and cellRNA samples, we conducted ssGSEA to establish whether pathways dysregulated in CRC cellRNA samples, compared to the normal colon cell line (841-CoN), are similarly dysregulated in CRC evRNA samples (22). Correlation studies revealed consistent dysregulation of pathways in both CRC cellRNA and evRNA samples (Figure S4a). Furthermore, the ssGSEA scores demonstrated that the pattern of dysregulation observed in CRC evRNA samples aligns with that of CRC cellRNA samples when compared to normal cellRNA samples (Figure S4b).
The deconvolution pipeline identifies tumor-specific evRNA in vitro
Plasma samples contain both tumor and non-tumor derived EVs mixed in unknown proportions. To overcome this intrinsic limitation of the evRNA analyte and to enrich for tumor-derived evRNA, we designed a series of experiments to bioinformatically deconvolve evRNA into tumor and non-tumor-derived transcripts and then classify the tumor-derived transcripts (19) (Figure S5). As a proof of concept, we applied a computational deconvolution pipeline on in vitro cellRNA mixtures containing known proportions of cancer and normal RNAs to infer cancer-derived RNA proportions and test whether the deconvoluted cancer-specific transcriptome can be used for the downstream analysis (Figure S5a). For in vitro cellRNA mixtures, we mixed extracted RNA from a CRC cell line (one cell line per CMS) with RNA from a normal colon cell line (841-CoN) at different ratios ranging from 0.1% to 50% cancer-derived RNA (Figure S5b). Using CIBERSORTx, we generated a signature matrix consisting of genes that can discriminate each compartment of interest (cancer and non-cancer) to impute cancer proportion and expression signatures (20, 37) (Figure S5a). The signature matrix and imputed proportions generated by CIBERSORTx along with cellRNA mixtures were used as input for CODEFACS. After data deconvolution, we classified the deconvolved cancer-specific gene expression using the custom CMScaller, and we then compared the results from the same cell mixtures that were not subject to deconvolution (Figure S5b). We were able to accurately predict the molecular subtype of cancer cell RNA in the cellRNA mixtures for all samples (Figure S5b). Mixtures containing as low as 1% cancer cell RNA could be confidently classified into the correct CMS after deconvolution (Figure S5b).
The deconvolution pipeline effectively screens out non-cancer pathways
After applying the computational deconvolution pipeline to in vitro cellRNA mixtures and segregating the RNA into tumor and non-tumor derived transcripts, as previously described, we proceeded to evaluate the accuracy of utilizing tumor-derived transcripts in predicting cancer-associated pathways. This assessment was carried out by examining ssGSEA scores across all samples before and after the deconvolution process (Figure S6). To begin, we compared pathway intensities between cancer and normal cellRNA samples, organizing the pathways based on their ssGSEA scores (Figure S6a). Without deconvolution of the cellRNA mixtures, only samples containing 50% cancer-derived RNA displayed enrichment for cancer-associated pathways (Figure S6b). Remarkably, our results demonstrate that the deconvolution process effectively eliminated non-tumor derived transcripts associated with pathways predominantly upregulated in 841-CoN from the mixture, even in samples with as little as 0.1% cancer cell RNA (Figure S6c).
Subsequent validation of the deconvolution pipeline using synthetic mixtures
An in silico mixing experiment was then performed to assess the accuracy of the CMS classifier on a larger deconvolved transcriptomic dataset. First, CRC tumor bulk RNA data from CRC TCGA data were mixed in silico with bulk RNA-seq data derived from plasma EVs of healthy individuals. To generate synthetic mixtures, the average read counts from normal plasma evRNA samples were computationally spiked in with RNA reads from CRC TCGA data using predefined vectors to obtain mixtures with cancer proportions ranging from 0.01% to 10%. Deconvolution was performed with CIBERSORTx and CODEFACS. The performance of CIBERSORTx was high in predicting the proportion of cancer-derived RNA (Figure S7a, P value <0.0001). Next, the custom CMS classifier was applied to the deconvolved cancer-specific expression, which showed that the assigned CMS subtype of the original TCGA sample could be successfully predicted after deconvolution (Figure S7b). Mixture samples with lower proportions of cancer RNA were more likely to have discordant subtypes between the original sample and the deconvolved mixture (Figure S7c). Compared to CIBERSORTx, CODEFACS showed better performance in predicting CMS4, whereas the performance on assigning CMS1–3 was comparable (Figure S7b). In addition, CODEFACS lowered the proportion of discordant predictions of molecular subtypes to 1.2%, compared to 2% for CIBERSORTx (Figure S7c).
The deconvolution pipeline for patient-derived liquid biopsies
After establishing the pipeline based on in vitro and in silico studies, we aimed to analyze the feasibility of our approach in patient-derived liquid biopsies. We first utilized the deconvolution pipeline to determine the tumor-specific evRNA percentage. Moreover, it was used to deconvolve tumor-derived transcripts for CMS classification and ssGSEA-based pathway analysis (Figure 1a). In addition, we utilized MCP-counter (21) on bulk RNA data from liquid biopsy samples to investigate immune subtypes. We divided our samples into three CRC cohorts (Figure 1). First, we utilized 42 baseline liquid biopsies from mCRC patients with paired tissue (Cohort 1, Figures 1b and 2, Table S4). Second, we applied our approach to longitudinal plasma evRNA samples from 28 patients, 16 with recurrence and 12 with no recurrence during the time evaluated (Cohort 2). Four of the sixteen recurrence patients, who had multiple post-surgery draws, were selected for the longitudinal monitoring of molecular subtypes (Cohort 2a, Figure 3a–d). A set of draws from the remaining twelve recurrence patients, taken while they had no evidence of disease (NED) prior to recurrence, were selected and subsequently compared to draws obtained from patients who did not experience recurrence throughout the study (Cohort 2b, Figure 4a–c). In addition, we investigated molecular alterations in 8 patients who initially displayed stable disease after surgery, did not respond to treatment, and experienced early progression (Cohort 3, Figure 4d).
Figure 1. A visual overview of workflow in this study.

a, RNA-seq was performed on tumor samples and plasma EV of cancer patients as well as plasma EV of healthy controls. For deconvolution with CIBERSORTx and CODEFACS, a signature matrix was created using genes that are enriched in the tumor samples and in the plasma EV of healthy controls. Deconvolution could impute the proportion of cancer present in bulk plasma evRNA, which also allowed for generation of ROC curve based on whether cancer RNA was present in the sample. Deconvolved plasma EV profiles were also used in gene set enrichment analysis (GSEA) and artificial neural network to assign the CMS groups, which were then compared to the subtypes of the tumor samples. b, Cohorts of CRC patients analyzed in this study. In the baseline cohort, molecular subtypes of tumor tissues and their matched plasma evRNA samples were compared in addition to relevant transcriptomic pathways. In the longitudinal cohort, molecular subtype switch and emerging changes at the gene and pathway level were evaluated and compared, at each serial point, in patients with recurrence, without recurrence, and those with stable disease.
Figure 2. Transcriptomic subtyping for solid tumor tissues and their paired liquid biopsy samples.

a, Summary of transcriptomic subtyping of tumor tissues along with tumor MCP-counter results. Each column represents a patient, arranged by tumor purity from highest on the left to lowest on the right. b, Summary of molecular subtyping for deconvolved liquid biopsy samples and corresponding MCP-counter results. The top row represents the prediction for bulk plasma EVs and the bottom row represents the prediction for deconvolved plasma EVs. If the predicted subtype of the liquid biopsy is concordant with the tumor sample, the column is marked by a black box. c, Sankey diagram depicting CMS classification for tumor (left) and paired liquid biopsy (right) using CODEFACS. d-h, Single-sample GSEA (ssGSEA) of deconvolved liquid biopsy samples for some of the major pathways elevated in each CMS.
Figure 3. Longitudinal monitoring of CMS changes in mCRC patients.

Timeline showing disease history of four mCRC patients, along with sequential ssGSEA and MCP-counter analyses. The tumor-derived transcripts, obtained post-deconvolution using CODEFACS, were utilized as input for molecular subtyping and ssGSEA analysis. Additionally, the evRNA data, without deconvolution, were employed for MCP-counter. a and b, in patients one and two, EV-based CMS was determined at baseline, before and at the time of radiological recurrence for patient 1 or progression for patient 2. c, in patient three, EV-based CMS was determined at baseline and at four additional time points. d, in patient four, EV-based CMS was predicted at baseline and in six additional longitudinal draws. Disease responses according to RECIST 1.1 criteria are shown on the y axis (14). An upward segment between two time points represents recurrence/progressive disease (Rec/PD), a flat segment represents stable disease (SD) or no evidence of disease (NED) and a downward segment represents complete response/partial response (CR/PR).
Figure 4. Longitudinal monitoring of gene and pathway changes in mCRC patients.

Representation of the longitudinal timeline for 24 patients, 12 with no recurrence (a) and 12 with recurrence (b). Average cancer proportions for all patients at each time point are shown at the right of the timeline and mean ssGSEA scores for all cases are shown below. The imputed cancer fraction was estimated using CIBERSORTx. Subsequently, the liquid biopsy samples were deconvolved using CODEFACS, and the resulting tumor-derived transcripts were utilized as input for ssGSEA analysis. c, Differential gene expression at NED2 for patients with no recurrence (yellow) and recurrence (red). Patients are shown as columns. d, Average ssGSEA scores for pathways that showed significant differences based on a paired t-test between the baseline draw and the progression time point for the stable disease cohort.
Imputed tumor-specific evRNA percentage in plasma
We first evaluated the tumor-specific evRNA percentage for all CRC plasma baseline draws at the time of surgery. In addition, liquid biopsy samples from healthy individuals were used as controls. Following deconvolution with CIBERSORTx, the average imputed proportions of cancer-derived RNA were 2.08% and 0.36% for cancer and normal liquid biopsy samples, respectively. Receiver operating characteristic (ROC) analysis based on the imputed proportion of cancer-derived RNA showed an area under the curve (AUC) of 0.87, with a specificity of 0.83 and a sensitivity of 0.81 at an imputed cancer proportion threshold of 0.88% (Figure S8a–b). These results suggest that our evRNA deconvolution pipeline can detect the presence of cancer even if the proportion of cancer-derived transcripts in the circulating evRNA transcriptome is as low as 1%. To compare the sensitivity of ctDNA detection with evRNA, we performed droplet digital PCR (ddPCR) to identify KRAS G12/13 mutations in DNA extracted from 42 CRC tumor tissues. This approach identified 16 individuals with detectable mutations (Figure S8c). Subsequently, we performed ddPCR on cfDNA obtained from the plasma of these 16 patients with confirmed KRAS G12/13 mutations in their tumor tissues. We then compared the KRAS G12/13 allele frequency results with the estimated proportion of cancer RNA for each patient (Figure S8d). Among these patients, seven tested negative for KRAS G12/13 mutations in cfDNA, while for eight samples, the cancer RNA percentage was below the optimal cancer RNA cutoff (Figure S8d).
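The threshold-based evaluation described above can be illustrated with a minimal sketch. It assumes that each sample is summarized by a single imputed cancer proportion (in percent) and that a sample is called positive when that value reaches the cutoff. The function names and the toy values are hypothetical and are not taken from the study data; Youden's J is used here only as one common way to choose a cutoff, since the text does not state how the 0.88% threshold was selected.

```python
def sensitivity_specificity(cancer, healthy, threshold):
    """Call a sample positive when its imputed cancer proportion (%) is >= threshold."""
    true_pos = sum(1 for x in cancer if x >= threshold)
    true_neg = sum(1 for x in healthy if x < threshold)
    return true_pos / len(cancer), true_neg / len(healthy)


def auc(cancer, healthy):
    """AUC as the probability that a random cancer sample scores above a random
    healthy sample (ties count as 0.5)."""
    wins = 0.0
    for c in cancer:
        for h in healthy:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(cancer) * len(healthy))


def best_threshold(cancer, healthy):
    """Pick the observed value that maximizes Youden's J = sensitivity + specificity - 1."""
    candidates = sorted(set(cancer) | set(healthy))
    return max(candidates,
               key=lambda t: sum(sensitivity_specificity(cancer, healthy, t)) - 1)


# Toy, made-up imputed cancer proportions (%); NOT values from the study.
toy_cancer = [2.5, 1.9, 0.9, 3.1, 0.4]
toy_healthy = [0.2, 0.5, 0.3, 1.0]

toy_auc = auc(toy_cancer, toy_healthy)
toy_sens, toy_spec = sensitivity_specificity(toy_cancer, toy_healthy, 0.88)
```

With real data, the two input lists would hold the CIBERSORTx-imputed proportions for patient and healthy-control plasma, and the cutoff would be fixed on one data set before being applied to another.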
Plasma evRNA can be used to predict tumor transcriptomic subtypes from liquid biopsy samples (Cohort 1)
To test if the deconvolution pipeline can be used to predict tumor molecular subtypes in liquid biopsies, we applied our evRNA deconvolution pipeline to the aforementioned 42 mCRC plasma samples with matching tissue (Figure 2). Samples were arranged according to tumor purity (Figure 2a). We tested DeepCC, a previously described machine learning approach, to classify molecular subtypes. DeepCC offers the advantage of classifying molecular subtypes without the requirement of a predefined gene list (19). In addition, immune subtypes of tumor and liquid biopsy samples were assessed using MCP-counter (21). Overall, in the first cohort, 9.5% (4/42) of the patients’ tumors were classified as CMS1, 45.2% (19/42) as CMS2, 14.3% (6/42) as CMS3, and 30.9% (13/42) as CMS4 (Figure 2a). Of note, most tissue samples with tumor purity less than 10% were classified as CMS4 (Figure 2a). The monocyte/macrophage cell population was the predominant group in the tumor tissue samples. Overall MCP-counter scores exhibited a correlation with tumor purity, with most CMS4 tumor tissues displaying the highest MCP-counter overall scores (Figure 2a).
The bulk evRNA without deconvolution (mixture) was used as a control in CMS prediction (Figure 2b). By applying the deconvolution pipeline to liquid biopsy samples, we accurately predicted the CMS of the tumor of origin in 71% (30/42) of cases (Figure 2b–c). This concordance was significantly higher than the random concordance rate of 22%. Interestingly, the concordance between tumor tissue and liquid biopsy was as high as 81% (26/32) in patients with tumor purity higher than 10% (Figure 2b). In the liquid biopsy samples, neutrophils were the predominant population among all immune subtypes assessed by MCP-counter (Figure 2b). We then assigned the molecular subtypes of tumor tissues to their matched liquid biopsy samples and performed ssGSEA on samples per group. We showed that CMS-specific pathways are differentially regulated in liquid biopsy samples, as indicated by the average ssGSEA scores per group (Figure 2d–h). For instance, we observed high immune estimates in CMS1, elevated Wnt/β-catenin activity in CMS2, increased cancer stem cell (CSC) markers in CMS3, and upregulated transforming growth factor beta (TGFB) along with vascular endothelial growth factor/vascular endothelial growth factor receptor (VEGF/VEGFR) signaling in CMS4 (Figure 2d–h).
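As an illustration of how an observed concordance can be compared with a chance rate, the sketch below computes a one-sided exact binomial tail probability. The counts (30 concordant out of 42) and the 22% chance rate are taken from the text; the choice of an exact binomial test is an assumption made here for illustration, since the statistical test used for this comparison is not specified in this section.

```python
from math import comb


def binomial_tail(successes, trials, p_chance):
    """One-sided exact P(X >= successes) for X ~ Binomial(trials, p_chance)."""
    return sum(
        comb(trials, k) * p_chance ** k * (1 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


concordant, total = 30, 42   # concordant CMS calls reported in the text
chance_rate = 0.22           # random concordance rate reported in the text

observed_rate = concordant / total
# Probability of seeing at least this many concordant calls by chance alone
# (assumed test; the study may have used a different procedure).
p_value = binomial_tail(concordant, total, chance_rate)
```

The same function can be applied to the purity-restricted subgroup (26 of 32), provided an appropriate chance rate for that subgroup is available.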
Plasma evRNA to track subtype evolution in longitudinal samples (Cohort 2a)
We then applied our evRNA deconvolution pipeline to a subset of 15 longitudinal samples collected from four mCRC patients from the recurrence cohort (Figure 3). In patient ID1, an evRNA liquid biopsy performed after neoadjuvant treatment for a single metachronous metastasis assigned the tumor to subtype CMS2. After liver resection, both CT scans and cfDNA liquid biopsies were negative. However, evRNA liquid biopsy was positive 3 months before radiological manifestation of disease recurrence and our algorithm initially assigned the recurrence to CMS2. At the time of radiological progression, however, a switch to CMS4 was observed, while cfDNA liquid biopsies continued to be negative (Figure 3a). At the time of recurrence, the ssGSEA score significantly increased for the VEGF/VEGFR pathway. This finding aligns with the notion that CMS4 is associated with prominent activation of angiogenesis (12) (Figure 3a). Furthermore, the decrease in the ssGSEA score for the P53 pathway at the time of recurrence may suggest TP53 inactivation, potentially leading to significant alterations in transcriptomic profiles and subtype switching (Figure 3a). Additionally, an elevated MCP-counter score for neutrophils coincided with CMS4 in this patient (Figure 3a).
A similar switch from CMS2 to CMS4 was detected at the time of radiological progression in patient ID2. Interestingly, the CMS switch in this patient was associated with the identification of mutant KRAS in cfDNA (Figure 3b). Likewise, ssGSEA scores for the KRAS pathway considerably increased over time (Figure 3b). In line with observations from patient ID1, this patient also demonstrated an elevated ssGSEA score for the VEGF/VEGFR pathway and an increased MCP-Counter score for the neutrophil population (Figure 3b). Remarkably, there was a significant increase in Th17 immune cells in this patient at the time of disease progression. Tumor-infiltrating Th17 cells are linked with the inhibition of CD8+ T cell migration in advanced-stage CRC patients (38) (Figure 3b).
Patient ID3 presented with multiple synchronous liver metastases. After neoadjuvant therapy and partial metastasectomy combined with radiofrequency ablation, evRNA liquid biopsy was initially assigned as CMS2 (Figure 3c, draw 1). The subsequent clinical course continued to demonstrate a relatively stable CMS2 subtype on repeated sampling. Nonetheless, a switch to CMS1 was observed only a few weeks preceding a CT scan that demonstrated a substantial progression in the liver (Figure 3c). At the time of disease progression, not only were the immune response and several immune-associated pathways, such as interferon alpha and gamma response, and PD1 signaling, activated, but the MCP-counter cytotoxicity score was also significantly increased. This patient had been undergoing FOLFIRI treatment during the monitoring period, and interestingly, the ssGSEA score for FOLFIRI response significantly dropped at the time of progression (Figure 3c).
A similar pattern was observed in patient ID4, who was initially classified as CMS2 on two consecutive liquid biopsies after neoadjuvant treatment and partial hepatectomy. However, the patient later switched to CMS1 three months prior to detection of radiological progression in the liver and remained CMS1 on a consecutive draw. After microwave ablation, the patient had a subsequent draw positive for mutant KRAS in cfDNA, which was followed by a second relapse in the liver and a switch to CMS4 (Figure 3d). Once more, we noticed a consistent pattern of changes in the VEGF/VEGFR and P53 pathways and the neutrophil population at the time of transitioning to CMS4 (Figure 3d).
Thus, in all four cases, we observed a “subtype switch” in evRNA CMS assignment which predates the appearance of therapeutic resistance and radiological progression.
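The notion of a “subtype switch” across serial draws can be expressed as a small sketch: given time-ordered CMS calls for one patient, report every change between consecutive draws. The function name, the tuple layout and the day values are hypothetical; the label sequence is only loosely patterned on the course described for patient ID4.

```python
def find_subtype_switches(calls):
    """Return (previous_label, new_label, day_of_new_call) for every change
    in CMS label between consecutive draws, ordered by draw day."""
    ordered = sorted(calls, key=lambda call: call[0])
    switches = []
    for (_, cms_prev), (day_next, cms_next) in zip(ordered, ordered[1:]):
        if cms_prev != cms_next:
            switches.append((cms_prev, cms_next, day_next))
    return switches


def lead_time_days(switch_day, imaging_day):
    """Days by which a subtype switch preceded imaging-based progression
    (negative if the switch was detected afterwards)."""
    return imaging_day - switch_day


# Hypothetical series of (day since baseline, CMS call); days are made up.
example_calls = [(0, "CMS2"), (60, "CMS2"), (150, "CMS1"), (210, "CMS1"), (330, "CMS4")]
example_switches = find_subtype_switches(example_calls)
```

Comparing the day of each detected switch with the date of radiological progression gives the lead time discussed in the text.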
Plasma evRNA to track subtype evolution in longitudinal samples (Cohort 2b)
In addition to CMS classification, we investigated whether our approach was able to capture transcriptomic changes at the gene and pathway levels, which could help elucidate the molecular underpinnings of disease relapse. For this analysis, we used our longitudinal cohort consisting of 24 metastatic patients who underwent hepatectomy (Table S5). Twelve patients showed NED within a median follow-up of 425 days (range 175–1578 days) from the time of hepatectomy, while the other 12 patients recurred within a median follow-up of 435 days after surgery (range 97–1132 days) (the latter were in addition to the four patients with recurrence described in Figure 3). For patients with no recurrence, two serial draws were assessed in the post-operative period, both at the time of clinical NED (Figure 4a). For patients with recurrence, two follow-up draws during the NED interval and a third at clinical recurrence were analyzed (Figure 4b). Proportions of cancer-derived evRNA were analyzed at each time point using our deconvolution pipeline. These proportions were significantly higher at recurrence compared to previous draws and were lower in NED draw 2 (NED2) for patients without recurrence (Figure 4a–b). We then performed Gene Set Enrichment Analysis (GSEA) after deconvolution at each longitudinal draw and compared the median pathway scores across all 12 patients from each group within deconvolved evRNA. Multiple pathways showed opposing trends between these two groups, including Wnt/β-catenin, Hedgehog signaling, Immune Th17, VEGFR/VEGF and IL2_STAT5 signaling (Figure 4a–b, S9). Patients who underwent clinical relapse had an upregulation of these pathways at recurrence, while patients who remained NED showed a decrease from draw 1 to draw 2. Because of the opposite trends observed in NED2 by pathway analysis, we examined differences in gene expression at this draw for both groups. Several cancer-associated genes were differentially expressed in NED2 for patients with recurrence, including transcripts corresponding to the oncogenes MET, MYC, and KRAS (Figure 4c).
Plasma evRNA in longitudinal samples with stable disease (Cohort 3)
We isolated and sequenced 24 evRNA samples from 3 time points obtained from 8 patients who initially had stable disease after surgery, but later experienced disease progression (Figure 4d). Following the deconvolution, we performed ssGSEA analysis on the tumor-derived evRNA data. Although we did not observe CMS switching in these samples, paired t-test analysis between the baseline and progression time points revealed critical pathways that exhibited significant differences between time points: the time of surgery and recurrence (Figure 4d). Notably, we observed an early downregulation of the P53 pathway, which could potentially result from TP53 inactivation, a significant factor contributing to treatment resistance.
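The paired comparison between baseline and progression can be sketched as follows. The code computes the paired t statistic and its degrees of freedom for one pathway from per-patient ssGSEA scores; converting the statistic to a p-value requires a t distribution (for example from a statistics package), which is omitted here. The function name and the toy scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(baseline, progression):
    """Paired t statistic and degrees of freedom for per-patient score changes
    (progression minus baseline)."""
    if len(baseline) != len(progression) or len(baseline) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [p - b for b, p in zip(baseline, progression)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t_stat = mean(diffs) / (sd / sqrt(len(diffs)))
    return t_stat, len(diffs) - 1


# Made-up ssGSEA scores for one pathway in 8 patients; NOT study data.
toy_baseline = [0.42, 0.38, 0.45, 0.40, 0.36, 0.44, 0.41, 0.39]
toy_progression = [0.35, 0.33, 0.41, 0.31, 0.34, 0.37, 0.36, 0.30]
toy_t, toy_df = paired_t(toy_baseline, toy_progression)
```

A negative statistic, as in this toy example, corresponds to a pathway whose score drops between baseline and progression. When many pathways are tested, a multiple-testing correction would normally be applied to the resulting p-values.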
evRNA can identify expressed gene fusions and neoepitopes
Chromosomal rearrangements have been well documented in CRC and can lead to the formation of gene fusions, offering new targets for personalized therapy (39, 40). We evaluated the presence of gene fusions in evRNA from our baseline and longitudinal cohorts by using three bioinformatic pipelines (Arriba, Pizzly pair and FusionCatcher, see Methods). In our baseline cohort, three gene fusions, DOCK4:IMMP2L, SLC6A6:XPC and TFG:ADGRG7, were detected independently in three samples, both in evRNA and in matched tumor tissues (Figure S10a–c). In addition, GATC:COX6A1 was identified in two independent evRNA samples (Figure S10d). The gene fusion DOCK4:IMMP2L is of particular interest, since IMMP2L somatic breakpoints have been reported in inflammatory bowel disease-associated CRC, and somatic rearrangements have also been described for this gene (41, 42). In our longitudinal cohort 2b, we identified a total of 39 gene fusions, each called by at least two of the algorithms, in 25 out of 84 samples (30%). Interestingly, in patients with no recurrence, no gene fusions were detected in NED2 evRNA samples, while in patients with recurrence, fusions were present in NED2 evRNA and at recurrence (Figure 5a, Table S2). Additionally, six gene fusions were detected in MAPK-related genes, five in ITGB3 and four in CHIC2. Interestingly, VCPIP1:MAPK1, PTPRK:FAM120B and KRAS:SENP6 were identified in evRNA samples of patient 7 at baseline, NED2 and at recurrence, respectively (Figure 5b–d).
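The consensus rule mentioned above, in which a fusion is retained when at least two of the callers report it, can be sketched in a few lines. This is an illustration only: it assumes each caller's output has already been reduced to a set of "5'GENE:3'GENE" strings for one sample, and the caller keys and gene names below are placeholders rather than real calls.

```python
from collections import Counter


def consensus_fusions(calls_by_tool, min_tools=2):
    """Return {fusion: number_of_supporting_tools} for fusions reported by at
    least `min_tools` callers. Partner order is kept as reported, so 'A:B' and
    'B:A' are treated as different fusions."""
    support = Counter()
    for fusions in calls_by_tool.values():
        support.update(set(fusions))  # each tool counts once per fusion
    return {fusion: n for fusion, n in support.items() if n >= min_tools}


# Placeholder calls for a single sample; tool keys and genes are hypothetical.
example_calls = {
    "caller_1": {"GENE_A:GENE_B", "GENE_C:GENE_D"},
    "caller_2": {"GENE_A:GENE_B"},
    "caller_3": {"GENE_C:GENE_D", "GENE_E:GENE_F"},
}
example_consensus = consensus_fusions(example_calls)
```

Requiring agreement between independent callers trades some sensitivity for fewer false-positive fusion calls, which matters for low-input material such as evRNA.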
Figure 5. Prediction of gene fusions in plasma-derived evRNA.

a, Quantification of gene fusions in patients with and without recurrence. b-d, Structural representation of gene fusions and circle plot depicting main chromosomal aberrations in a patient with recurrence (patient ID7). Fusions at baseline (b, VCPIP1:MAPK1), NED2 (c, PTPRK:FAM120B) and recurrence (d, KRAS:SENP6) are depicted.
Furthermore, we investigated the presence of neoepitopes from gene fusions, insertions/deletions (INDELs) and exitrons in our initial cohort of 42 baseline samples while comparing liquid biopsy calls to paired tissue samples. Using the NeoFuse pipeline, we were able to predict the peptide HQDQAVSL from the gene fusion GATC:COX6A1 (Table S3). ScanNeo was utilized to predict INDEL-derived neoepitopes. With this pipeline we predicted matched plasma and tissue neoepitopes in 15/42 patients, and a total of 466 paired neoepitopes with a <500 nM mutant binding affinity were found. Among these, 181 neoepitopes from 9 patients were predicted to have <100 nM binding affinity and 158 from 10 patients showed <50 nM binding (Figure S10e, Table S3). We then applied ScanExitron, a computational pipeline to identify exitron splicing events. We were able to identify seven exitrons in both plasma and matched tumor samples. Among these, two exitrons contained one and two predicted neoantigens, respectively (Table S3).
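The tiered affinity cutoffs used above (<500 nM, <100 nM and <50 nM) amount to a simple filter over predicted neoepitopes. The sketch below tallies neoepitopes and distinct patients under each cutoff. The record layout (`patient`, `peptide`, `affinity_nm`) is a hypothetical simplification and does not reflect the actual ScanNeo output format; the peptides and values are made up.

```python
def count_by_affinity(predictions, cutoffs_nm=(500, 100, 50)):
    """For each cutoff (nM), count neoepitopes with predicted mutant binding
    affinity below it and the number of distinct patients they come from."""
    summary = {}
    for cutoff in cutoffs_nm:
        hits = [p for p in predictions if p["affinity_nm"] < cutoff]
        summary[cutoff] = {
            "neoepitopes": len(hits),
            "patients": len({p["patient"] for p in hits}),
        }
    return summary


# Made-up predictions; peptides, patients and affinities are placeholders.
toy_predictions = [
    {"patient": "P1", "peptide": "PEPTIDEAA", "affinity_nm": 420.0},
    {"patient": "P1", "peptide": "PEPTIDEBB", "affinity_nm": 80.0},
    {"patient": "P2", "peptide": "PEPTIDECC", "affinity_nm": 35.0},
    {"patient": "P3", "peptide": "PEPTIDEDD", "affinity_nm": 650.0},
]
toy_summary = count_by_affinity(toy_predictions)
```

Lower predicted affinity values indicate stronger predicted binding, so each stricter cutoff selects a subset of the previous tier.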
Discussion
In this study, we coupled low-input RNA-seq data obtained from circulating EVs with a bioinformatic deconvolution pipeline and deep learning algorithms in order to interrogate multiple tumor-specific transcriptomic features sans the requirement for tissue biopsies. Shi and colleagues are credited with the first demonstration that transcriptomic analyses of circulating EVs can be used to predict the response of advanced melanomas to immune checkpoint inhibitor (ICI) immunotherapy (43). Notably, response to ICIs was associated with an enrichment for transcripts related to immune pathways within post-treatment EV samples, and the latter were principally contributed by non-tumor sources (i.e., immune cells). Although this prior study principally focused on the non-tumor EV transcriptome, it laid the foundational basis for exploring the possibility that evRNA might represent a rich resource for mining cancer-specific information, so long as the cancer-specific compartment could be reliably extracted from the composite pool of circulating EVs.
Our study demonstrates that bioinformatic deconvolution of cancer-specific RNAs in plasma EVs is indeed feasible, down to a fraction of only 1% of cancer-derived transcripts in the circulating evRNA transcriptome. While the current limit of detection of our algorithm is 1% of cancer-derived transcripts within the evRNA pool, ongoing improvements in the pipeline are likely to further reduce this threshold, enabling us to track patients in the minimal residual disease (MRD) setting. Using our deconvolution pipeline, we were able to successfully predict tumor molecular subtypes in the majority of CRC cases analyzed, using tissue-based RNA profiling as the “ground truth”. The discrepancies identified between blood and tissue in the case of CRC are likely the combined result of underlying heterogeneity, where region-specific sampling of the tissue impacts the predicted subtype, and the contribution of non-tumor components (stroma and immune cells) to the subtype annotation. Interestingly, while RNA-based immune subtype profiling of tumor tissues per se revealed monocytes as the expected predominant population within the tumor immune microenvironment, plasma evRNA demonstrated an enrichment of EVs derived from other immune cell subtypes such as neutrophils and cytotoxic T/NK cells. Importantly, our pipeline used evRNA signatures not only for classifying tumors into established molecular subtypes at baseline, but also for longitudinal monitoring of changes in these molecular subtypes over time. For example, longitudinal monitoring of evRNA in a cohort of CRC patients was able to identify changes in molecular subtype prior to the onset of imaging-based progression, underscoring the potential of this approach to act as a sentinel for the emergence of treatment resistance.
While the association between tumor subtype switching and treatment resistance is well recognized (13), such monitoring has mostly been restricted to serial tissue samples, which is neither universally feasible nor cost effective. Liquid biopsies may facilitate the detection of evolution in CMS subtypes without the need for invasive biopsies. For example, a switch from CMS2 to CMS1 was detected at the time of progression in cases 3 and 4, and ssGSEA scores supported the results, thus opening new possibilities for potential immune therapy intervention for these patients, as the CMS1 subtype has been reported to be more responsive to immune therapies (44).
Our approach was able to detect not only changes in CMS but also gene- and pathway-level changes over time. When comparing patients who progressed with those who did not during the time monitored, we were able to detect subtle differences in blood that could indicate residual disease. For instance, KRAS, MYC and MET were overexpressed at NED2 in patients with recurrence. MET amplification has been associated with resistance to anti-EGFR therapy in CRC (45). Similarly, elevated expression of MYC and KRAS has been related to resistance to EGFR therapy in mCRC patients (46, 47). Interestingly, patients with recurrence showed increased expression of various genes involved in DNA damage repair, such as MLH1, MSH2, SHLD2 and BCR, which may suggest an unstable genome. In addition, longitudinal analysis of patients with stable disease clearly showed transcriptomic changes at the time of progression, which included downregulation of KRAS down, TP53, MYC, IL2/STAT5 and IL6/JAK/STAT3 signaling.
Finally, the identification of gene fusions and neoepitopes in blood offers a unique opportunity for precision oncology in patients with advanced cancer. In CRCs, expressed fusions are often actionable, but their identification can be limited in tissue-based assays (48–50). A blood-based assay provides repeated opportunities for assessing fusions, including over the course of time. In addition, our results indicate that the detection of gene fusions could be used as a potential biomarker for recurrence and tumor monitoring in mCRC patients. Moreover, the identification of potential neoepitopes without the prerequisite for tissue biopsies is of considerable translational relevance in patients receiving neoantigen-targeted therapies (such as vaccines or engineered T cell receptors), where both de novo epitopes and newly emerged candidates can be tracked longitudinally and the therapy modulated accordingly.
Our data underpin the utility of plasma evRNA as a multifaceted liquid biopsy analyte in cancer patients, complementing well-established DNA-based profiling. Transcriptomic subtyping of tumors through liquid biopsy can be of particular interest when routine tissue biopsy is not feasible. This approach can be used to predict molecular subtypes of solid tumors at baseline and, more importantly, can serve as a platform for identifying determinants of response and resistance, as well as for potentially anticipating clinical recurrence in serial samples. This study would benefit from longitudinal tumor biopsies, which could help to validate the transcriptomic changes we detect in evRNA. In addition, a larger number of patients with longitudinal follow-up would help confirm the results obtained with the current cohort. While our study successfully demonstrated the potential of classifying cancer through liquid biopsies, focusing on the CMS of colorectal cancer, we acknowledge the inherent limitations linked to CMS. These include tumor heterogeneity, overlapping molecular subtypes, and an incomplete understanding of their biological implications. Nevertheless, the liquid biopsy approach presented in this study holds promise for broader application within future refined and widely accepted cancer classification systems. This study lays the foundation for extending liquid biopsy-based monitoring of cancer patients to direct RNA profiling.
Supplementary Material
Significance.
Development of an approach to interrogate molecular subtypes, cancer-associated pathways, and differentially expressed genes through RNA-sequencing of plasma extracellular vesicles lays the foundation for liquid biopsy-based longitudinal monitoring of patient tumor transcriptomes.
Acknowledgments
We thank our patients and their families.
Funding
A.M. is supported by the MD Anderson Pancreatic Cancer Moon Shot Program and the Sheikh Khalifa Bin Zayed Al-Nahyan Foundation. S.K. and A.M. are supported by NIH (R01CA218230, P50CA221707 and P30CA016672). P.A.G. and A.M. are supported by Break Through Cancer. J.J.L. is supported by NIH (T32CA009599). R.S. is supported by NIH (5R01CA218230 and 5P50CA221707). V.B. is supported by the Rosalie B. Hite Graduate Fellowship, the American Legion Auxiliary Fellowship, and the John J. Kopchick Fellowship. W.W. is supported by NIH (R01CA239342 and R01CA268380).
Footnotes
Conflict of interest
A.M. is listed as an inventor on a patent that has been licensed by Johns Hopkins University to Thrive Earlier Detection, an Exact Sciences Company. A.M. serves as a consultant for Tezcat Biotechnology. S.K. has ownership interest in MolecularMatch, Lutris, Iylon, and is a consultant for Genentech, EMD Serono, Merck, Holy Stone, Novartis, Lilly, Boehringer Ingelheim, Boston Biomedical, AstraZeneca/MedImmune, Bayer Health, Pierre Fabre, Redx Pharma, Ipsen, Daiichi Sankyo, Natera, HalioDx, Lutris, Jacobio, Pfizer, Repare Therapeutics, Inivata, GlaxoSmithKline, Jazz Pharmaceuticals, Iylon, Xilis, Abbvie, Amal Therapeutics, Gilead Sciences, Mirati Therapeutics, Flame Biosciences, Servier, Carina Biotechnology, Bicara Therapeutics, Endeavor BioMedicines, Numab Pharma, Johnson & Johnson/Janssen, and has received research funding from Sanofi, Biocartis, Guardant Health, Array BioPharma, Genentech/Roche, EMD Serono, MedImmune, Novartis, Amgen, Lilly, Daiichi Sankyo. R.S. has received consulting fees from Boehringer Ingelheim. All other authors declare no potential conflicts of interest.
References
- 1. Cohen JD, Li L, Wang Y, Thoburn C, Afsari B, Danilova L, et al. Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science. 2018;359(6378):926–30.
- 2. Shen SY, Singhania R, Fehringer G, Chakravarthy A, Roehrl MHA, Chadwick D, et al. Sensitive tumour detection and classification using plasma cell-free DNA methylomes. Nature. 2018;563(7732):579–83.
- 3. Cristiano S, Leal A, Phallen J, Fiksel J, Adleff V, Bruhm DC, et al. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature. 2019;570(7761):385–9.
- 4. Liu MC, Oxnard GR, Klein EA, Swanton C, Seiden MV, CCGA Consortium. Sensitive and specific multi-cancer detection and localization using methylation signatures in cell-free DNA. Ann Oncol. 2020;31(6):745–59.
- 5. Larson MH, Pan W, Kim HJ, Mauntz RE, Stuart SM, Pimentel M, et al. A comprehensive characterization of the cell-free transcriptome reveals tissue- and subtype-specific biomarkers for cancer detection. Nat Commun. 2021;12(1):2357.
- 6. Esfahani MS, Hamilton EG, Mehrmohamadi M, Nabet BY, Alig SK, King DA, et al. Inferring gene expression from cell-free DNA fragmentation profiles. Nat Biotechnol. 2022;40(4):585–97.
- 7. Roskams-Hieter B, Kim HJ, Anur P, Wagner JT, Callahan R, Spiliotopoulos E, et al. Plasma cell-free RNA profiling distinguishes cancers from pre-malignant conditions in solid and hematologic malignancies. NPJ Precis Oncol. 2022;6(1):28.
- 8. Lin Y, Leng Q, Zhan M, Jiang F. A Plasma Long Noncoding RNA Signature for Early Detection of Lung Cancer. Transl Oncol. 2018;11(5):1225–31.
- 9. O’Brien K, Breyne K, Ughetto S, Laurent LC, Breakefield XO. RNA delivery by extracellular vesicles in mammalian cells and its applications. Nat Rev Mol Cell Biol. 2020;21(10):585–606.
- 10. Bernard V, Kim DU, San Lucas FA, Castillo J, Allenson K, Mulu FC, et al. Circulating Nucleic Acids Are Associated With Outcomes of Patients With Pancreatic Cancer. Gastroenterology. 2019;156(1):108–18 e4.
- 11. Moding EJ, Nabet BY, Alizadeh AA, Diehn M. Detecting Liquid Remnants of Solid Tumors: Circulating Tumor DNA Minimal Residual Disease. Cancer Discov. 2021;11(12):2968–86.
- 12. Guinney J, Dienstmann R, Wang X, de Reynies A, Schlicker A, Soneson C, et al. The consensus molecular subtypes of colorectal cancer. Nat Med. 2015;21(11):1350–6.
- 13. Woolston A, Khan K, Spain G, Barber LJ, Griffiths B, Gonzalez-Exposito R, et al. Genomic and Transcriptomic Determinants of Therapy Resistance and Immune Landscape Evolution during Anti-EGFR Treatment in Colorectal Cancer. Cancer Cell. 2019;36(1):35–50 e9.
- 14. Eisenhauer EA, Therasse P, Bogaerts J, Schwartz LH, Sargent D, Ford R, et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer. 2009;45(2):228–47.
- 15. Yang JE, Rossignol ED, Chang D, Zaia J, Forrester I, Raja K, et al. Complexity and ultrastructure of infectious extracellular vesicles from cells infected by non-enveloped virus. Sci Rep. 2020;10(1):7939.
- 16. San Lucas FA, Allenson K, Bernard V, Castillo J, Kim DU, Ellis K, et al. Minimally invasive genomic and transcriptomic profiling of visceral cancers by next-generation sequencing of circulating exosomes. Ann Oncol. 2016;27(4):635–41.
- 17. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.
- 18. Wang K, Patkar S, Lee JS, Gertz EM, Robinson W, Schischlik F, et al. Deconvolving Clinically Relevant Cellular Immune Cross-talk from Bulk Gene Expression Using CODEFACS and LIRICS Stratifies Patients with Melanoma to Anti-PD-1 Therapy. Cancer Discov. 2022;12(4):1088–105.
- 19. Gao F, Wang W, Tan M, Zhu L, Zhang Y, Fessler E, et al. DeepCC: a novel deep learning-based framework for cancer molecular subtype classification. Oncogenesis. 2019;8(9):44.
- 20. Newman AM, Steen CB, Liu CL, Gentles AJ, Chaudhuri AA, Scherer F, et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol. 2019;37(7):773–82.
- 21. Becht E, Giraldo NA, Lacroix L, Buttard B, Elarouci N, Petitprez F, et al. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. 2016;17(1):218.
- 22. Barbie DA, Tamayo P, Boehm JS, Kim SY, Moody SE, Dunn IF, et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature. 2009;462(7269):108–12.
- 23. Uhrig S, Ellermann J, Walther T, Burkhardt P, Frohlich M, Hutter B, et al. Accurate and efficient detection of gene fusions from RNA sequencing data. Genome Res. 2021;31(3):448–60.
- 24. Mazouji O, Ouhajjou A, Incitti R, Mansour H. Updates on Clinical Use of Liquid Biopsy in Colorectal Cancer Screening, Diagnosis, Follow-Up, and Treatment Guidance. Front Cell Dev Biol. 2021;9:660924.
- 25. Nicorici D, Satalan M, Edgren H, Kangaspeska S, Murumägi A, Kallioniemi O, et al. FusionCatcher – a tool for finding somatic fusion genes in paired-end RNA-sequencing data. BioRxiv. 2014.
- 26. Fotakis G, Rieder D, Haider M, Trajanoski Z, Finotello F. NeoFuse: predicting fusion neoantigens from RNA sequencing data. Bioinformatics. 2020;36(7):2260–1.
- 27. Yang R, Van Etten JL, Dehm SM. Indel detection from DNA and RNA sequencing data with transIndel. BMC Genomics. 2018;19(1):270.
- 28. Wang TY, Wang L, Alam SK, Hoeppner LH, Yang R. ScanNeo: identifying indel-derived neoantigens using RNA-Seq data. Bioinformatics. 2019;35(20):4159–61.
- 29. Wang TY, Yang R. Integrated protocol for exitron and exitron-derived neoantigen identification using human RNA-seq data with ScanExitron and ScanNeo. STAR Protoc. 2021;2(3):100788.
- 30. Kalluri R, LeBleu VS. The biology, function, and biomedical applications of exosomes. Science. 2020;367(6478).
- 31. Skog J, Wurdinger T, van Rijn S, Meijer DH, Gainche L, Sena-Esteves M, et al. Glioblastoma microvesicles transport RNA and proteins that promote tumour growth and provide diagnostic biomarkers. Nat Cell Biol. 2008;10(12):1470–6.
- 32. Valadi H, Ekstrom K, Bossios A, Sjostrand M, Lee JJ, Lotvall JO. Exosome-mediated transfer of mRNAs and microRNAs is a novel mechanism of genetic exchange between cells. Nat Cell Biol. 2007;9(6):654–9.
- 33. Sveen A, Bruun J, Eide PW, Eilertsen IA, Ramirez L, Murumagi A, et al. Colorectal Cancer Consensus Molecular Subtypes Translated to Preclinical Models Uncover Potentially Targetable Cancer Cell Dependencies. Clin Cancer Res. 2018;24(4):794–806.
- 34. Eide PW, Bruun J, Lothe RA, Sveen A. CMScaller: an R package for consensus molecular subtyping of colorectal cancer pre-clinical models. Sci Rep. 2017;7(1):16618.
- 35.Hoshida Y Nearest template prediction: a single-sample-based flexible class prediction with confidence assessment. PLoS One. 2010;5(11):e15543. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Hoshida Y, Villanueva A, Kobayashi M, Peix J, Chiang DY, Camargo A, et al. Gene expression in fixed tissues and outcome in hepatocellular carcinoma. N Engl J Med. 2008;359(19):1995–2004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Wang Z, Cao S, Morris JS, Ahn J, Liu R, Tyekucheva S, et al. Transcriptome Deconvolution of Heterogeneous Tumor Samples with Immune Infiltration. iScience. 2018;9:451–60. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Wang D, Yu W, Lian J, Wu Q, Liu S, Yang L, et al. Th17 cells inhibit CD8(+) T cell migration by systematically downregulating CXCR3 expression via IL-17A/STAT3 in advanced-stage colorectal cancer patients. J Hematol Oncol. 2020;13(1):68. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Kloosterman WP, Coebergh van den Braak RRJ, Pieterse M, van Roosmalen MJ, Sieuwerts AM, Stangl C, et al. A Systematic Analysis of Oncogenic Gene Fusions in Primary Colon Cancer. Cancer Res. 2017;77(14):3814–22. [DOI] [PubMed] [Google Scholar]
- 40.Mertens F, Johansson B, Fioretos T, Mitelman F. The emerging complexity of gene fusions in cancer. Nat Rev Cancer. 2015;15(6):371–81. [DOI] [PubMed] [Google Scholar]
- 41.Rajamaki K, Taira A, Katainen R, Valimaki N, Kuosmanen A, Plaketti RM, et al. Genetic and Epigenetic Characteristics of Inflammatory Bowel Disease-Associated Colorectal Cancer. Gastroenterology. 2021;161(2):592–607. [DOI] [PubMed] [Google Scholar]
- 42.Bass AJ, Lawrence MS, Brace LE, Ramos AH, Drier Y, Cibulskis K, et al. Genomic sequencing of colorectal adenocarcinomas identifies a recurrent VTI1A-TCF7L2 fusion. Nat Genet. 2011;43(10):964–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Shi A, Kasumova GG, Michaud WA, Cintolo-Gonzalez J, Diaz-Martinez M, Ohmura J, et al. Plasma-derived extracellular vesicle analysis and deconvolution enable prediction and tracking of melanoma checkpoint blockade outcome. Sci Adv. 2020;6(46). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.He R, Lao Y, Yu W, Zhang X, Jiang M, Zhu C. Progress in the Application of Immune Checkpoint Inhibitor-Based Immunotherapy for Targeting Different Types of Colorectal Cancer. Front Oncol. 2021;11:764618. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Bardelli A, Corso S, Bertotti A, Hobor S, Valtorta E, Siravegna G, et al. Amplification of the MET receptor drives resistance to anti-EGFR therapies in colorectal cancer. Cancer Discov. 2013;3(6):658–73. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Strippoli A, Cocomazzi A, Basso M, Cenci T, Ricci R, Pierconti F, et al. c-MYC Expression Is a Possible Keystone in the Colorectal Cancer Resistance to EGFR Inhibitors. Cancers (Basel). 2020;12(3). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Favazza LA, Parseghian CM, Kaya C, Nikiforova MN, Roy S, Wald AI, et al. KRAS amplification in metastatic colon cancer is associated with a history of inflammatory bowel disease and may confer resistance to anti-EGFR therapy. Mod Pathol. 2020;33(9):1832–43. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Fusco MJ, Saeed-Vafa D, Carballido EM, Boyle TA, Malafa M, Blue KL, et al. Identification of Targetable Gene Fusions and Structural Rearrangements to Foster Precision Medicine in KRAS Wild-Type Pancreatic Cancer. JCO Precis Oncol. 2021;5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Jones MR, Williamson LM, Topham JT, Lee MKC, Goytain A, Ho J, et al. NRG1 Gene Fusions Are Recurrent, Clinically Actionable Gene Rearrangements in KRAS Wild-Type Pancreatic Ductal Adenocarcinoma. Clin Cancer Res. 2019;25(15):4674–81. [DOI] [PubMed] [Google Scholar]
- 50.Heining C, Horak P, Uhrig S, Codo PL, Klink B, Hutter B, et al. NRG1 Fusions in KRAS Wild-Type Pancreatic Cancer. Cancer Discov. 2018;8(9):1087–95. [DOI] [PubMed] [Google Scholar]
Data Availability Statement
The data generated in this study are publicly available in the Gene Expression Omnibus (GEO) under accession GSE255775. The data analyzed in this study were obtained from https://gdac.broadinstitute.org; specifically, the COAD (colon adenocarcinoma) and READ (rectum adenocarcinoma) datasets were used. All other raw data generated in this study are available upon request from the corresponding author.
