Abstract
Circulating cell-free mRNA (cf-mRNA) holds great promise as a non-invasive diagnostic biomarker. However, cf-mRNA composition and its potential clinical applications remain largely unexplored. Here we show, using Next Generation Sequencing-based profiling, that cf-mRNA is enriched in transcripts derived from the bone marrow compared to circulating cells. Further, longitudinal studies involving bone marrow ablation followed by hematopoietic stem cell transplantation in multiple myeloma and acute myeloid leukemia patients indicate that cf-mRNA levels reflect the transcriptional activity of bone marrow-resident hematopoietic lineages during bone marrow reconstitution. Mechanistically, stimulation of specific bone marrow cell populations in vivo using growth factor pharmacotherapy shows that cf-mRNA reflects dynamic functional changes over time associated with cellular activity. Our results shed light on the biology of the circulating transcriptome and highlight the potential utility of cf-mRNA to non-invasively monitor bone marrow involved pathologies.
Subject terms: Gene expression analysis, Sequencing, RNA sequencing, Molecular medicine
Circulating cell-free mRNA holds great promise as a non-invasive diagnostic biomarker. Here the authors show that cell-free mRNA captures transcripts from the bone marrow and can be used to non-invasively monitor dynamic changes in bone marrow physiology.
Introduction
Blood irrigates all organs, supplying oxygen and nutrients to the cells of the body while collecting byproducts of cell metabolism, including lipids, proteins and nucleic acids. These circulating biomolecules in blood contain information linked to specific organ health. While most research effort has focused on circulating proteins and lipids, circulating cell-free nucleic acids (cf-NA) have recently emerged as a non-invasive tool for diagnosis and monitoring of health and disease1. For example, cell-free DNA (cfDNA), the most well-characterized cf-NA, has been utilized for prenatal diagnostics, transplant rejection prediction, and monitoring of cancer2–6. Despite these advances, the value of cfDNA tests remains mainly restricted to physiologic and disease situations characterized by genetic differences (i.e., pregnancy, transplants, or tumors). In contrast, the cell-free messenger RNA (cf-mRNA) transcriptome can be considered as a compendium of transcripts collected from all organs7. Some of these circulating transcripts correspond to well-characterized tissue-specific genes, supporting interrogation of these biomolecules to dynamically monitor health or disease state of tissues and organs. Indeed, cf-mRNA has been shown to reflect fetal development, predict preterm delivery in pregnant women7–9, and serve as a cancer biomarker10,11.
Strikingly, the biological processes underlying the presence of cf-NA in circulation remain inferred, but largely unknown. In the case of cfDNA, studies have proposed the primary mechanism is passive release into circulation upon cell death12,13. In contrast, RNA molecules can be actively secreted from cells11,14–16. Much work has focused on the secretion of non-coding and smaller RNA molecules into exosomes and other lipid vesicles. However, on a per-molecule basis, mRNA comprises a minor fraction of this phenomenon17, and the origin of cf-mRNA remains unclear.
In this study, we conduct next-generation sequencing-based whole-transcriptomic profiling of cf-mRNA and compare expression levels to those from circulating cells of the blood (CC) to decipher the origin of circulating transcripts and better understand their potential clinical utility. We show that cf-mRNA captures transcripts of non-hematopoietic and hematopoietic origin, and is enriched in transcripts derived from the bone marrow (BM). Longitudinal studies of cancer patients undergoing BM ablation and transplantation show that cf-mRNA profiling non-invasively captures temporal transcriptional activity of the BM. Further, stimulation of specific BM lineages with growth factor therapeutics suggests that cf-mRNA fluctuations reflect active lineage-specific transcriptional activity. Collectively, our data provide insights into the biological origins of cf-mRNA, strongly suggesting that living cells contribute cf-mRNA to circulation, and anticipate the potential of circulating transcripts as non-invasive biomarkers that could eventually alleviate the need for BM biopsies.
Results
cf-mRNA is enriched in hematopoietic progenitor transcripts
To characterize the landscape of the human cell-free RNA transcriptome (cf-mRNA), we isolated and sequenced cf-mRNA from 1 ml of serum of 24 healthy donors. Among this cohort, we identified 10,357 transcripts with >1 TPM (transcripts per million) and 7386 transcripts with >5 TPM in at least 80% of the samples, reflecting the diversity and consistency of cf-mRNA transcriptome among healthy subjects (Supplementary Tables 1 and 2 provide additional information of cf-mRNA sequencing metrics). We used non-negative matrix factorization (NMF) to decompose the cf-mRNA transcriptome in an unsupervised manner18,19 and gene expression reference databases (GTEx and Blueprint) to estimate the relative contributions of the different tissues and cell types (see Methods). The majority of the transcripts detected in cf-mRNA, ~85% on average, are of hematopoietic origin (i.e., derived from circulating cells and BM-resident cells), with the remaining ~15% being of non-hematopoietic origin (i.e., derived from solid tissues, Fig. 1a, b). Specifically, deconvolution analyses estimated that, on average, ~29% of transcripts are of megakaryocyte/platelet origin (first to third quartile range 23–36%), ~28% are of lymphocyte origin (range 18–30%), 12.8% of granulocyte origin (range 6–16%), 3% of neutrophil progenitor origin (range 0.2–3.7%), 11% of erythrocyte origin (range 8–14%), and ~15% derived from solid tissues (range 11–20%) (Fig. 1a, b). To gain insight into the origin of these transcripts, similar deconvolution analysis was performed in whole-blood (WB) samples from 19 healthy individuals from previously reported RNA-sequencing (RNA-Seq) data20. As expected, the WB transcriptome is largely composed of lymphocyte (~69% on average) and granulocyte (~22% on average) transcripts, with an additional ~7% of transcripts of erythrocyte origin and minor contributions from other cell types and tissues (Fig. 1a, b). 
These analyses represent an estimation of the composition of the transcriptome of these biofluids that could be influenced by different factors. Nevertheless, our data show the higher diversity of cf-mRNA transcriptome, which, compared to WB, contains a larger fraction of non-hematopoietic transcripts and of hematopoietic progenitor genes derived from the BM.
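The detection criterion used above (a transcript counted only if it exceeds a TPM threshold in at least 80% of samples) can be illustrated with a minimal sketch. The function name `consistently_detected` and the toy values are hypothetical and are not taken from the study's pipeline or data; the sketch only shows how such a filter might be expressed.

```python
def consistently_detected(tpm_by_transcript, threshold=1.0, min_fraction=0.8):
    """Return transcripts whose TPM exceeds `threshold` in at least
    `min_fraction` of the samples (hypothetical helper, for illustration)."""
    kept = []
    for name, values in tpm_by_transcript.items():
        # Count the samples in which this transcript passes the threshold.
        n_pass = sum(1 for v in values if v > threshold)
        if n_pass >= min_fraction * len(values):
            kept.append(name)
    return kept


# Toy TPM values for five hypothetical samples (not data from the study).
toy = {
    "GENE_A": [12.0, 8.5, 6.1, 9.9, 7.2],
    "GENE_B": [2.0, 0.4, 1.5, 3.1, 1.2],
    "GENE_C": [0.0, 0.2, 5.0, 0.1, 0.0],
}

above_1_tpm = consistently_detected(toy, threshold=1.0)
above_5_tpm = consistently_detected(toy, threshold=5.0)
```

Raising the threshold from 1 to 5 TPM shrinks the retained set, mirroring the drop from 10,357 to 7386 transcripts reported for the healthy cohort.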
To confirm the presence of BM-specific transcripts in circulation, we performed RNA-Seq in three paired WB (which includes all cellular components of blood) and plasma samples from healthy donors (Supplementary Fig. 1A) and compared the levels of the main hematopoietic cell-type-specific transcripts (i.e., neutrophils, erythrocytes, platelets/megakaryocyte, T cells) in these specimens (Fig. 1c and Supplementary Fig. 1B, C). Striking differences were observed among neutrophil-specific transcripts (Fig. 1c). Using the hematopoiesis transcriptomic reference database (Blueprint), we observed that transcripts expressed in mature circulating neutrophils are detected at much lower levels in plasma compared to WB (Fig. 1c). In contrast, transcripts expressed in BM-resident neutrophil progenitors are enriched in cf-mRNA (Fig. 1c). To confirm these findings, we performed RNA-Seq of five paired plasma and buffy coat samples (buffy coat is enriched in white blood cells). Consistently, neutrophil mature and progenitor transcripts were found to form distinct populations (Fig. 1d), in which cf-mRNA shows low levels of mature transcripts such as the chemokine receptors CXCR1 and CXCR2 (Fig. 1e, p < 0.01) compared to buffy coat, but is enriched in progenitor transcripts, such as PRTN3 (myeloblastin precursor), CTSG (cathepsin G), and AZU1 (azurocidin precursor) (p < 0.05, Fig. 1f, Supplementary Fig. 1D, E). These data support the presence of BM transcripts in cf-mRNA; indeed, quadratic programming deconvolution analysis of hematopoietic transcripts from healthy donors indicated that BM transcripts contribute ~9% of cf-mRNA transcriptome, in contrast to ~1% in WB.
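To make the idea of deconvolution concrete, the following is a deliberately simplified sketch, not the analysis used in this study. It assumes only two reference profiles (a hypothetical BM profile and a hypothetical WB profile) and estimates the single mixing fraction f that minimizes the squared error of mixture ≈ f·BM + (1 − f)·WB subject to 0 ≤ f ≤ 1. That is a one-variable quadratic program with a closed-form solution; a full deconvolution would solve the analogous constrained problem over many cell-type profiles at once. The function name and toy profiles are illustrative assumptions.

```python
def bm_fraction(mixture, bm_profile, wb_profile):
    """Constrained least-squares estimate of f in
    mixture ~ f * bm_profile + (1 - f) * wb_profile, with 0 <= f <= 1.
    Two-reference simplification for illustration only."""
    num = 0.0
    den = 0.0
    for y, a, b in zip(mixture, bm_profile, wb_profile):
        d = a - b                # direction separating the two references
        num += (y - b) * d
        den += d * d
    if den == 0.0:
        raise ValueError("reference profiles are identical")
    # Clip the unconstrained optimum to the feasible interval [0, 1].
    return min(1.0, max(0.0, num / den))


# Toy expression profiles over three hypothetical genes (not study data).
bm = [10.0, 0.0, 5.0]
wb = [0.0, 10.0, 5.0]
mixed = [0.25 * a + 0.75 * b for a, b in zip(bm, wb)]

estimated = bm_fraction(mixed, bm, wb)
```

Because the toy mixture is constructed with a BM share of 0.25 and no noise, the estimate recovers that share; with real data the estimate depends on how well the reference profiles represent the contributing cell types.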
To further confirm this result, we performed RNA-Seq on a human BM sample and compared it with the WB transcriptome. We identified 377 genes enriched in the BM transcriptome (>5-fold, BM genes) (Supplementary Table 3), representing hematopoietic progenitors (i.e., neutrophil progenitors and mesenchymal stem cells from the BM). Interestingly, progenitor transcripts such as PRTN3, CTSG, and AZU1 are among the top transcripts enriched in the BM transcriptome. In addition, 374 genes were identified as enriched in WB (>5-fold, WB genes) (Supplementary Table 4), mainly representing mature circulating blood cell genes (i.e., associated with mature granulocytes and lymphocytes). Subsequently, the levels of BM genes and WB genes were compared in three matching WB and plasma samples, which confirmed that these transcripts segregate into two populations (p < 0.001), with cf-mRNA being enriched in hematopoietic progenitor genes (BM genes) and depleted of mature genes (WB genes) compared to WB (Fig. 1g and Supplementary Fig. 1F). In summary, our data indicate that the cf-mRNA transcriptome captures transcripts derived from the BM, potentially providing a window to non-invasively evaluate BM function.
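The >5-fold criterion used to define BM genes and WB genes can be sketched as a simple fold-change filter. This is an illustration under stated assumptions: the function name `enriched_genes` is hypothetical, the pseudocount of 1 TPM (added to avoid division by zero) is an assumption and not a documented parameter of the study, and the expression values are toy numbers attached to gene names mentioned in the text.

```python
def enriched_genes(tpm_a, tpm_b, min_fold=5.0, pseudocount=1.0):
    """Return genes whose expression in A exceeds expression in B by more
    than `min_fold`. The pseudocount is an illustrative assumption."""
    hits = []
    for gene, value_a in tpm_a.items():
        if gene not in tpm_b:
            continue  # only compare genes measured in both samples
        fold = (value_a + pseudocount) / (tpm_b[gene] + pseudocount)
        if fold > min_fold:
            hits.append(gene)
    return sorted(hits)


# Toy TPM values (invented for illustration, not measurements from the study).
bm_tpm = {"PRTN3": 99.0, "CXCR2": 1.0, "ACTB": 500.0}
wb_tpm = {"PRTN3": 1.0, "CXCR2": 59.0, "ACTB": 450.0}

bm_genes = enriched_genes(bm_tpm, wb_tpm)  # enriched in BM relative to WB
wb_genes = enriched_genes(wb_tpm, bm_tpm)  # enriched in WB relative to BM
```

Running the filter in both directions yields the two gene sets whose levels are then compared between matched WB and plasma samples.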
Measurement of BM-specific transcripts by cf-mRNA
As further evidence that BM-specific transcripts can be detected in cf-mRNA and to evaluate their potential utility, we recruited three multiple myeloma (MM) patients. MM is characterized by the clonal expansion and accumulation of malignant plasma cells almost exclusively in the BM. These cells express specific immunoglobulin (Ig) rearrangements, in contrast to plasma cells of healthy individuals, which express multiple Ig combinations. In this study, MM patients underwent melphalan-mediated BM ablation (starting at day −2), followed by autologous hematopoietic stem cell (HSC) infusion (day 0) (Fig. 2b). We isolated and sequenced cf-mRNA from 1 ml of plasma of these patients before BM ablation (day −2). Clonal expansion of Ig heavy (IgH) and Ig light (IgL) chain transcripts was identified for two out of three patients. For instance, in patient 2 we detected IGHG1 and IGKC transcripts as the most prevalent Ig constant regions (Supplementary Fig. 2A–C). For the variable regions, IGHV3–15 and IGKV2–24 transcripts dominated the sample’s transcriptome, while clonal lambda regions were not detected (Fig. 2a, c and Supplementary Fig. 2C). In contrast, clonal transcripts were not observed in plasma of a healthy control individual, as expected (Fig. 2a). Similar analyses in patient 1 revealed a clone composed of the IgH constant chain IGHA1 and variable region IGHV1–69, and IgL lambda constant chain IGLC1 and variable region IGLV1–40 (Supplementary Fig. 2D). In both cases, the malignant clones we identified are consistent with the molecular testing performed from BM aspirates (Supplementary Table 6). However, for patient 3, we did not detect dominant Ig rearrangements (Supplementary Fig. 2E), likely due to the low number of plasma cells in the BM of this patient at the start of this study (Supplementary Table 6).
Malignant plasma cells are rarely found in circulation in MM patients; indeed, RNA-Seq analysis of the matching buffy coat of patient 2 samples before chemotherapy treatment showed only low levels of a repertoire of IgH and IgL transcripts, with no dominant rearrangements (Fig. 2a, c and Supplementary Fig. 2A–C), highlighting the unique ability of cf-mRNA to capture the clonal Ig transcripts generated by plasma cells in the BM.
To test whether cf-mRNA profiling can be used to monitor the levels of the malignant Ig clone, we sequenced the cf-mRNA from plasma of these patients every day for 2 weeks after chemotherapy and transplant. While patient 1 showed no apparent reduction of the malignant clone after therapy (Supplementary Fig. 2D), patient 2 showed decreased levels of the predominant Ig variants in cf-mRNA after melphalan-induced apoptosis of plasma cells (Fig. 2b–d and Supplementary Fig. 2A–C). By day 10, the immune profile was no longer dominated by clonal Ig combinations, indicating successful therapy and BM reconstitution (Fig. 2b–d). In contrast, RNA-Seq performed on the matching buffy coat fraction throughout the study showed very limited information regarding the malignant Ig transcripts (Fig. 2c and Supplementary Fig. 2A–C), supporting the potential of cf-mRNA to non-invasively capture BM activity.
cf-mRNA reflects hematopoietic reconstitution after BM transplant
To gain further insight into the ability of circulating mRNA to reveal BM transcriptional activity, we followed the BM ablation and reconstitution dynamics after autologous HSC transplants in cf-mRNA, using the prototypical MM patient 2. Additionally, we investigated acute myeloid leukemia (AML) patients who underwent submyeloablative treatment followed by allogeneic HSC transplants (see Methods). Unsupervised clustering of transcripts detected in plasma cf-mRNA of MM and AML patients identified temporal patterns of expression for several groups of genes (Fig. 3a, b). Both Gene Ontology enrichment analysis and RNA-Seq data from Blueprint Consortium indicated that many of the identified components correspond to specific hematopoietic lineages (Fig. 3a, b). Therefore, we examined in detail the dynamics of hematopoietic lineage-specific transcripts (i.e., erythrocytes, megakaryocytes, neutrophils) in circulation during BM ablation and reconstitution.
First, to clarify the relationship between erythrocyte circulating transcripts and red blood cells (RBCs), we examined the levels of erythrocyte lineage-specific transcripts in plasma and RBC counts throughout the study. RBCs are the predominant cell type in circulation and are stable for ~120 days in the bloodstream21. Little variation in RBC numbers was noticed in MM and AML patients over the course of these studies (Fig. 3c–d and Supplementary Fig. 4A). In contrast, most erythrocyte-specific transcripts in cf-mRNA were rapidly reduced after chemotherapy-mediated BM ablation in all patients, and recovered at later time points during BM reconstitution (Fig. 3c–d, Supplementary Fig. 3A–B, and Supplementary Fig. 4A). The discrepancy between RBC number and most erythrocyte transcripts in cf-mRNA indicates that these transcripts derive primarily from immature erythrocyte forms either in the BM or in circulation (reticulocytes), rather than from mature RBCs. We performed RNA-Seq analysis of paired buffy coat samples of MM patient 2 to gain further insight into the origin of these transcripts. The levels of erythrocyte-specific genes in CC are reduced after chemotherapy, resembling the dynamics observed in cf-mRNA (Supplementary Fig. 3C), and indicate that reticulocytes may be the source of most erythrocyte transcripts in WB. However, transcripts such as GATA1, a key transcriptional regulator of erythrocyte development, are detectable in cf-mRNA earlier than in buffy coat during BM reconstitution (Supplementary Fig. 3C), suggesting their BM origin. While future experiments will be necessary to discriminate the precise contribution of each compartment, our data show that erythrocyte transcripts derive primarily from immature erythrocyte cells residing in the BM and circulating reticulocytes, rather than from the highly abundant mature RBC.
To test whether discrepancies between complete blood counts (CBCs) and lineage-specific transcripts in circulation extend to other hematopoietic cell types, we next compared the dynamics of platelet counts and megakaryocyte-specific transcripts. In MM patient 2, a consistent increase in the levels of megakaryocyte-specific transcripts is detected in cf-mRNA by days 9 and 10 after transplant, prior to platelet count recovery, which occurs by days 12 and 13 (Fig. 3e). Interestingly, RNA-Seq from matched buffy coat samples showed that megakaryocyte transcript levels in CC mimic the dynamics of platelet counts throughout the study (Supplementary Fig. 3C), and, unlike in cf-mRNA, early recovery of megakaryocyte transcripts is not detectable in CC during BM reconstitution. This disparity suggests that megakaryocyte transcripts detected in cf-mRNA during BM reconstitution, and before the platelet count recovers, are derived from the BM. Supporting this observation, in AML patient 1 megakaryocyte transcripts in circulation decreased after BM ablation and recovered by day ~9, clearly foreshadowing the increase in platelet counts occurring by days 12 and 13 (Fig. 3f). Strikingly, no recovery of this lineage occurred in cf-mRNA of AML patient 2 (Supplementary Fig. 4B). Follow-up BM biopsy confirmed lack of megakaryocyte development in this patient (Supplementary Table 9), demonstrating the specificity of the measured megakaryocyte signal. Thus, our data indicate that a fraction of megakaryocyte/platelet transcripts in circulation derive from the BM and that cf-mRNA reflects megakaryocyte transcriptional activity during BM reconstitution.
Last, we examined the kinetics of neutrophil counts and specific transcripts in circulation of MM and AML patients during the therapy. In MM patient 2, neutrophil counts showed two spikes, one shortly after transplant, likely due to the granulocyte-colony stimulating factor (G-CSF) treatment, which is followed by a rapid decrease due to BM ablation, and a second spike by days 12 and 13 likely indicative of BM reconstitution (Fig. 3g). This resembles the overall dynamics of neutrophil-specific genes in cf-mRNA and in buffy coat during the procedure (Fig. 3g and Supplementary Fig. 3E). However, while neutrophil transcripts in buffy coat and cf-mRNA peaked at a similar time to neutrophil counts during BM reconstitution, neutrophil precursor genes like CTSG increased earlier in cf-mRNA, by days ~8 and 9 after the autologous stem cell transplant. Supporting this observation, the levels of progenitor neutrophil transcripts in plasma of all AML patients decreased after BM ablation, and increased in cf-mRNA during BM reconstitution days earlier than the neutrophil counts (Fig. 3h–j and Supplementary Fig. 4D–F). These data further support that, during BM reconstitution, progenitor neutrophil transcripts in circulation are not derived from CC, but rather reflect BM transcriptional activity of the granulocyte lineage, providing valuable information about transplant engraftment and BM reconstitution.
We also investigated an orthogonal approach to measure transplant engraftment using cf-mRNA from AML patients receiving allogeneic HSC transplants, in which genetic differences exist between host and donor cells. Using a reference database of single-nucleotide polymorphisms (SNPs), we identified host-specific polymorphisms in progenitor neutrophil transcripts before the transplant (e.g., ELANE, AZU1, and PRTN3). After transplantation, these transcripts are substituted by new genetic variants from donor cells (Fig. 4a). Indeed, cf-mRNA profiling enabled monitoring of changes in these transcripts during therapeutic treatment of patients 1 and 2 (Fig. 4b–c). Combined analysis of all detected SNP from the host switching to a different genetic variant after transplant (e.g., from homozygous to heterozygous) indicates that multiple genetic differences can be identified in cf-mRNA to temporally monitor transplant engraftment (Fig. 4d–e). Altogether, our data show that cf-mRNA captures both genetic information and transcriptional activity from the BM, and enables monitoring of transplant engraftment and BM reconstitution from donor cells.
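The SNP-based engraftment readout described above can be illustrated with a minimal sketch. It assumes that, for each informative SNP at each time point, reads supporting the host allele and reads supporting the donor-specific allele have already been counted; the function name `donor_allele_fraction`, the input layout, and the read counts are hypothetical and are not taken from the study.

```python
def donor_allele_fraction(snp_counts):
    """Pooled fraction of reads carrying the donor-specific allele.
    `snp_counts` is a list of (host_allele_reads, donor_allele_reads) pairs,
    one per informative SNP. Returns None when no reads cover the sites."""
    host = sum(h for h, _ in snp_counts)
    donor = sum(d for _, d in snp_counts)
    total = host + donor
    return donor / total if total else None


# Toy read counts at two hypothetical SNPs, keyed by day relative to transplant.
timeline = {
    -2: [(40, 0), (25, 0)],   # before transplant: host alleles only
    9: [(10, 30), (5, 15)],   # during reconstitution: donor alleles appear
}

fractions = {day: donor_allele_fraction(counts) for day, counts in timeline.items()}
```

A rising pooled fraction over time would be the expected signature of donor-derived transcripts replacing host-derived ones; pooling across SNPs is one simple design choice, and per-SNP fractions could be tracked instead.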
cf-mRNA reveals response to stimulation with growth factors
To evaluate the potential of cf-mRNA to monitor the activity of specific BM lineages after stimulation with growth factors, we obtained plasma from nine patients with varying degrees of chronic kidney failure on chronic maintenance erythropoietin (EPO) therapy. EPO is a peptide hormone that specifically increases the rate of maturation and proliferation of erythrocytes in the BM22,23. Samples were obtained prior to administration of EPO (day 0), and at several time points up to 30 days after treatment (see Methods). Average levels of erythrocyte transcripts across nine patients in cf-mRNA increased shortly after EPO treatment (Fig. 5a). The levels of erythrocyte transcripts continued to increase during the initial days after treatment compared to untreated control individuals (Fig. 5a, b). Indeed, key erythropoietic developmental transcripts involved in heme biosynthesis (e.g., ALAS2, HBB, HBA2) were induced in most patients (Supplementary Fig. 5A). Further, analysis of dysregulated genes (p < 0.05) in plasma at days 3 or 4 after treatment with EPO using IPA (Ingenuity Pathway Analysis) showed “Heme biosynthesis II” as the top enriched pathway (p < 0.001), supporting the transcriptional induction of this cell lineage. Thirty days after EPO treatment, erythrocyte transcripts returned to basal expression levels in these patients (Fig. 5b and Supplementary Fig. 5). Thus, our longitudinal studies indicate that cf-mRNA levels reflect specific transient stimulation of the erythroid lineage.
As another approach to study in vivo changes in cf-mRNA upon perturbation of a cell lineage, we collected samples from three healthy patients who received G-CSF treatment, a well-known pro-survival factor for neutrophilic granulocytes. Blood was drawn before the treatment and at 1, 4, and 10 days after G-CSF stimulation (the 10-day time point and CBC could only be obtained for two patients, see Methods). As expected, neutrophil count increased after G-CSF treatment, peaking at day 4, and returned to basal levels by day 10 (Fig. 5c). Neutrophil-specific transcripts in plasma cf-mRNA showed a bimodal increase after G-CSF treatment for all patients (Fig. 5c and Supplementary Fig. 5B, C). Neutrophil progenitor-specific transcripts increased in cf-mRNA coinciding with the peak in neutrophil counts likely as a consequence of G-CSF-mediated mobilization of granulocytes from the BM into circulation (Fig. 5c and Supplementary Fig. 5B). However, mature neutrophil transcripts rapidly increase in cf-mRNA one day after the treatment, foreshadowing the peak of neutrophil counts (Fig. 5c and Supplementary Fig. 5C). This suggests a direct and transient transcriptional response of neutrophils to G-CSF. Indeed, transcripts previously reported both in vivo and in vitro to increase (e.g., IRAK3) or decrease (e.g., IFIT1) in neutrophils in response to G-CSF, followed the expected trend24 (Fig. 5d). Altogether, our results indicate that cf-mRNA reflects dynamic cell-type-specific transcriptional responses to stimulation.
Discussion
The growing interest in identifying non-invasive alternatives to standard tissue biopsies has generated enormous attention for liquid biopsies over the last decade. Initial advances in cfDNA technology have paved the way for the development of clinically applicable cf-NA-based biomarkers25–27. cfDNA offers potential advantages compared to invasive tissue biopsies; however, cfDNA analyses largely rely on mutations, polymorphisms, or structural variations, compromising its use in disease and physiological scenarios not associated with genetic differences. To partially circumvent these limitations, cfDNA methylation analyses have recently been used as a proxy of tissue-specific gene expression, but further work is needed to validate this approach28. For RNA-based non-invasive biomarkers, non-coding RNAs including miRNA and lncRNA have been studied extensively in multiple diseases29. While these developments in cell-free RNA are intriguing, functional annotation of non-coding RNAs remains poorly characterized. In contrast, the cf-mRNA transcriptome provides direct access to both genetic information and information pertaining to the tissue of origin and its physiology. For instance, we have shown that genetic alterations in cf-mRNA provide valuable information for monitoring allografts, and similar approaches have shown their value in diagnosing fetal chromosomal abnormalities30. Given that several studies have identified tumor-derived transcripts in the circulation14, the genetic information captured by cf-mRNA is of particular interest in cancer diagnosis and monitoring. In addition, cf-mRNA provides tissue-specific transcripts that reveal functional information pertaining to the tissue of origin. While further experiments in larger cohorts will be necessary to determine the clinical utility of cf-mRNA, we showed that cf-mRNA captures transcripts that reveal BM physiology.
Similarly, previous studies have reported transcripts in circulation encoding functional information of the liver, brain, immune system, or fetal development7,31,32. Therefore, cf-mRNA has the capability of integrating functional and genetic information of tissues, highlighting this analyte’s unique potential as a non-invasive biomarker.
Another key aspect of non-invasive approaches is that by eliminating the need for surgical tissue acquisition they enable repeated, longitudinal assessment of a patient’s disease state over time. This could be particularly relevant in clinical settings, such as monitoring of treatment in cancer patients, where biopsy of affected tissue remains the accepted reference method. In this regard, our longitudinal cf-mRNA profiling data provide evidence that circulating transcripts offer snapshots of gene expression profiles in tissues such as BM. Longitudinal cf-mRNA monitoring may allow non-invasive temporal delineation of BM ablation efficiency, early detection of transplant engraftment, and monitoring of BM reconstitution. For example, in MM patients, cf-mRNA profiling integrates temporal measurement of clonal Ig transcripts generated by malignant plasma cells in the BM, with detailed BM-lineage transcriptional activity and establishment of a new immune profile. The comprehensive picture revealed by cf-mRNA profiling provides additional relevant information compared to other non-invasive tests commonly used in this malignancy, such as clonal antibody detection in serum of MM patients. Indeed, given the challenging and subjective quantification and characterization of these antibodies, BM biopsies remain a common practice in the therapy management of MM patients33. In addition, unlike antibody detection, cf-mRNA profiling has the potential for early identification of suboptimal BM reconstitution, as shown by the lack of development of megakaryocyte lineage in AML patient 2. While our study is based on a limited number of patients, our data provide promising initial proof of concept of using cf-mRNA profiling to monitor BM activity, which could lead to improved therapeutic management of patients with BM disease, and eventually alleviate the need for invasive BM biopsies.
Finally, understanding the mechanisms underlying the presence of mRNA transcripts in circulation is essential to interpret their clinical value. For example, cfDNA is expected to originate primarily from dying cells13; therefore, the use of this liquid biopsy likely relies on scenarios associated with cell death. While further experiments will be necessary to formally rule out the hypothesis of cell turnover as the exclusive source of cf-mRNA in vivo, our data suggest that changes in cf-mRNA levels are influenced by transcriptional changes in living cells during maturation, proliferation, and response to stimuli, without requiring cell death. For example, we showed that melphalan-induced apoptosis did not significantly increase the levels of cf-mRNA. In contrast, a large increase of transcripts in circulation was observed during BM reconstitution and upon stimulation with well-known pro-survival and antiapoptotic growth factors. Supporting our interpretation, in vitro studies indicate that extracellular mRNA levels and composition change upon cellular stimulation34,35 and that living cells can secrete RNA molecules encapsulated in vesicles. Additionally, our longitudinal clinical studies demonstrate that the circulating transcriptome is a dynamic metric that allows constant measurement of tissue function over time. In contrast, cfDNA methylation and mutation events are less dynamic and likely provide limited information on tissue homeostasis and disruption. In summary, cf-mRNA profiling may provide richer molecular content compared to other non-invasive biomarkers and constitutes a unique non-invasive interrogation of tissue function in scenarios such as monitoring of disease and drug engagement and response in patients.
Methods
Samples and patients
MM patients eligible for autologous marrow transplantation were recruited from the Scripps Bone Marrow Transplant Center. Patients with non-secretory disease or plasma cell leukemia were excluded. Three patients in total were enrolled, with daily blood draws collected throughout the cytoreductive conditioning regimen and subsequent hospital stay. High-dose melphalan was used to ablate the marrow over a 2-day conditioning regimen, followed by transplantation of HSCs. Sequential daily collections were discontinued on the day of hospital discharge. Follow-up BM biopsy occurred between 60 and 90 days. Complete blood counts (CBCs) were collected as a part of the study. Plasma was processed within 2 h of blood collection and stored at −80 °C. Patient characteristics are described in Supplementary Table 6.
EPO-treated patients were recruited for study enrollment provided they were administered erythropoietin as part of routine medical care. Potential patients were excluded if they (1) were currently on any anti-cancer therapy, (2) had active hemolysis from any cause, or (3) were pregnant. Patients were consented and enrolled from the Renal and Hematology/Oncology Clinics at Scripps Clinic Cancer Center. Per standard clinical care, a single dose of EPO was administered per month. Blood was collected at day 0 (before administration of EPO), and at days 1, 4, and 10 after administration of EPO. Days 4 and 10 collections were allowed a ±1 day adjustment to accommodate patients’ schedules. A subset of patients consented to an expanded protocol allowing for blood collections up to day 30. CBCs were performed as well. Plasma was processed within 2 h of blood collection and stored at −80 °C for batch processing. Patient characteristics are shown in Supplementary Table 7.
Specimens from healthy controls were obtained from the San Diego Blood Bank, processed, frozen, and stored at −80 °C for batch processing.
G-CSF patients, normal healthy individuals preparing to donate peripherally harvested stem cells for allo-transplants, were recruited from Scripps and enrolled as part of our G-CSF cohort. In total, three patients were consented and donated blood during their stem cell mobilization. Patient characteristics are shown in Supplementary Table 8. Blood was collected at day 0 (before administration of G-CSF), and at days 1, 4, and 10 after administration of G-CSF. Day 4 and 10 collections were allowed for ±1 day adjustment to accommodate patients’ schedules and additionally, the day 10 collection was optional. CBCs were performed for each sample. Samples were processed within 2 h of blood collection and stored at −80 °C for batch processing.
Patients with known AML, in preparation for submyeloablative treatment and allogeneic stem cell transplantation as part of standard care, were recruited for daily blood draws throughout their treatment and stem cell transplant. Three patients were enrolled in our study (characteristics in Supplementary Table 9), and submyeloablative treatment generally lasted 6 days, using a combination of fludarabine and melphalan to obtain a partial ablation of the marrow, prior to transplantation. HSCs obtained from a single donor were administered on day 0, and blood draws were continued through the hospital stay. Patient 3 was discharged by day ~15 and in-hospital collections were limited to day 45 post transplant. Follow-up routine BM biopsies were performed. CBCs were collected. cf-mRNA was sequenced every 3 days. Plasma was processed within 2 h of blood collection and stored for batch processing.
Patient consent
All studies were approved by the respective institutional review boards (IRBs), and patients consented according to the submitted study protocols. We have complied with all relevant ethical regulations. Molecular Stethoscope maintained approval for blood collection and research through Western IRB Protocol #20162748, under which healthy control samples were collected. In collaboration with the Scripps Cancer Center and the Blood & Marrow Transplant Program at Scripps Green Hospital, G-CSF and EPO studies were conducted under Scripps Institutional Review Board-approved protocol IRB-16-6808. Our studies involving hematopoietic BM transplants, for both MM and AML, were approved by and conducted in accordance with Scripps IRB Protocol IRB-17-6953, in collaboration with the same groups.
Sample processing
Blood samples were collected in EDTA tubes (BD #366643) for plasma processing or in BD Vacutainer red-top clotting tubes (BD #367820) for serum processing. The biofluid used in each experiment is indicated in the main text as well as in the corresponding cohort details in this section. Blood samples were kept at room temperature and processed within 2 h after blood draw. Plasma or serum volumes ranging from 500 μl to 1 ml were used for the extractions. Samples were first centrifuged at 1900 × g for 10 min. Plasma and serum were separated into new tubes. To remove cell debris, we subsequently centrifuged serum/plasma at 16,000 × g. For cancer patient plasma samples (MM and AML), the second centrifugation step was performed at 6000 × g. Plasma/serum samples were immediately frozen and stored at −80 °C. Freeze/thaw cycles were avoided. Buffy coat samples were obtained by isolating the buffy coat layer enriched in white blood cells after initial centrifugation of blood samples. Nucleic acids were isolated from plasma/serum using the Circulating Nucleic Acid Kit (Qiagen). ERCC RNA Spike-In Mix (Thermo Fisher Scientific, Cat. #4456740) was added during the extraction process as an exogenous spike-in control according to the manufacturer’s instructions (Ambion). Nucleic acids from WB and buffy coat samples were extracted with TRIzol LS (Thermo Fisher) following the manufacturer’s instructions. Subsequently, RNA and cf-RNA samples were incubated for 25 min with 3 μl of the inhibitor-resistant rDNase (Turbo DNase, Invitrogen) to eliminate any remnant DNA and concentrated afterwards. RNA was eluted in 15 μl of RNase-free water. The amount, size, and integrity of cf-RNA were estimated by running 1 μl of the sample in an Agilent RNA 6000 Pico chip using a 2100 Bioanalyzer (Agilent Technologies) and confirmed by quantitative PCR (qPCR).
Twenty-five to thirty percent of the cf-RNA eluate was converted to cDNA using random hexamers. NGS libraries were then generated, and the whole exome was captured prior to Illumina sequencing. Libraries were quantified by qPCR with the Kapa Quantification Kit (Kapa) and on a Quantus Fluorometer (Promega) using the QuantiFluor ONE dsDNA Kit (Promega), and library size was checked on the Bioanalyzer (Agilent Technologies) using high-sensitivity DNA chips (Agilent Technologies). Samples were pooled and sequenced on a NextSeq 500 (Illumina) platform according to the manufacturer’s instructions.
Sequence data processing, alignment, and quantification
Base calling was performed on an Illumina BaseSpace platform, using the FASTQ Generation Application. Adaptor sequences were removed and low-quality bases trimmed using cutadapt (v1.11). Reads shorter than 15 base pairs were excluded from subsequent analysis. Read sequences were then aligned to the human reference genome GRCh38 using STAR (v2.5.2b) with GENCODE version 24 gene models. Duplicated reads were removed by invoking the samtools (v1.3.1) rmdup command. Gene expression levels were inferred from de-duplicated BAM files using RSEM (v1.3.0).
Differential expression analysis
Differential expression analysis between conditions was performed using DESeq2 (v1.12.4), with RSEM-estimated read counts as input. Genes with fewer than 20 reads across the samples were excluded from this analysis. Gene Ontology enrichment and the involvement of genes in biological pathways were examined using the R package limma (v3.28.21) and IPA software (Qiagen).
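The read-count filter applied before DESeq2 can be illustrated with a short Python sketch. It assumes that the 20-read cutoff refers to the total summed over all samples, which the text does not state explicitly. The function name, gene labels, and counts below are hypothetical.

```python
def filter_low_count_genes(counts_by_gene, min_total_reads=20):
    """Keep genes whose read counts, summed over all samples, reach the cutoff.

    Assumption: the 20-read threshold applies to the sum across samples.
    """
    return {
        gene: counts
        for gene, counts in counts_by_gene.items()
        if sum(counts) >= min_total_reads
    }


# Toy RSEM-style expected counts (made-up values), one entry per sample.
toy_counts = {
    "GENE_A": [10.0, 5.0, 6.0],   # total 21, kept
    "GENE_B": [3.0, 4.0, 5.0],    # total 12, excluded
    "GENE_C": [0.0, 0.0, 20.0],   # total 20, kept
}
kept = filter_low_count_genes(toy_counts)
print(sorted(kept))
```

If the cutoff were instead meant per sample or as an average, only the condition inside the comprehension would change.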
Cell-type-specific genes
Tissue (cell-type)-specific genes are defined as genes that show much higher expression in a particular tissue (cell type) compared to other tissues (cell types). Information about tissue (cell-type) transcriptome expression levels was obtained from the following two public databases: GTEx36 for gene expression across 51 human tissues and Blueprint Epigenome37 for gene expression across 56 human hematopoietic cell types. For each gene, the tissues (cell types) were ranked by their expression of that particular gene, and if the expression in the top tissue (cell type) is >20-fold higher than all the other tissues (cell types), the gene was considered specific to the top tissue (cell type). For the establishment of BM-enriched transcripts, we performed RNA-Seq on a commercial human BM total RNA sample. Subsequently, BM transcriptome was compared to WB transcriptome to identify genes enriched in BM and WB transcriptomes (fold change >5).
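The >20-fold rule for tissue (cell-type) specificity can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name, tissue labels, and expression values are hypothetical, and the sketch relies on the observation that exceeding every other tissue by more than 20-fold is equivalent to exceeding the second-ranked tissue by more than 20-fold.

```python
def specific_tissue(expression_by_tissue, min_fold=20.0):
    """Return the top-ranked tissue if its expression is more than min_fold
    times that of every other tissue; otherwise return None."""
    ranked = sorted(expression_by_tissue.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        return None
    (top_tissue, top_value), (_, runner_up) = ranked[0], ranked[1]
    # Comparing against the second-highest tissue covers all lower-ranked ones.
    # Multiplying (rather than dividing) avoids division by zero.
    if top_value > 0 and top_value > min_fold * runner_up:
        return top_tissue
    return None


# Made-up expression values for one gene across three tissues.
print(specific_tissue({"liver": 500.0, "kidney": 10.0, "blood": 5.0}))
print(specific_tissue({"liver": 100.0, "kidney": 10.0, "blood": 5.0}))
```

The first call returns the top tissue because 500 exceeds 20 × 10; the second returns `None` because a 10-fold difference does not meet the threshold.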
Immunoglobulin gene repertoire in MM patients
For clone-type assembly, we performed de novo transcriptome assembly using Trinity. Next, the assembled contigs were compared to the Ig gene annotation database IMGT38 using igBLAST (v2.5.1) to identify the V(D)J combinations. To quantify the relative abundance of variable region genes, we collected reads that were either unaligned to the human reference genome or aligned to an annotated Ig gene by STAR and aligned them to the sequences in the IMGT database using igBLAST. Relative abundance was calculated as the ratio of the number of reads mapped to a particular Ig gene over the total number of reads mapped to any Ig gene.
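The relative-abundance calculation described above amounts to a simple ratio, sketched here in Python. The input format (one best-hit Ig gene label per read, or `None` when a read has no Ig hit) is an assumption made for illustration, and the read assignments are made up.

```python
from collections import Counter


def ig_relative_abundance(read_assignments):
    """Fraction of Ig-mapped reads assigned to each Ig gene.

    read_assignments: iterable with one Ig gene label per read, or None
    for reads without an Ig hit (hypothetical input format).
    """
    counts = Counter(gene for gene in read_assignments if gene is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: n / total for gene, n in counts.items()}


# Made-up assignments for four reads; the read without a hit is ignored.
toy_reads = ["IGHV3-23", "IGHV3-23", "IGKV1-5", None]
abundance = ig_relative_abundance(toy_reads)
print(abundance)
```

Because the denominator counts only reads mapped to any Ig gene, the resulting fractions sum to 1 within each sample.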
Unsupervised clustering
Genes that met the following two criteria were selected for clustering: (1) the maximum expression across time points was higher than 50 TPM; (2) the ratio of the highest expression over the lowest was >5. For each of the selected genes, the expression values were normalized by dividing each value by the maximum value across all time points. The purpose of this normalization was to bring all the genes to a comparable scale and focus on their relative changes across time points instead of their absolute expression levels. K-means and hierarchical clustering were then performed to find genes that share similar temporal expression patterns.
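The gene selection and scaling steps that precede clustering can be sketched as below. This is an illustrative Python sketch under stated assumptions: the function name and toy values are hypothetical, and a gene whose lowest value is 0 TPM is treated as passing the ratio criterion, a case the text does not address.

```python
def select_and_scale(tpm_by_gene, min_peak_tpm=50.0, min_fold=5.0):
    """Select dynamic genes and scale each to its own maximum.

    A gene is kept if its peak exceeds min_peak_tpm and its peak/trough
    ratio exceeds min_fold; kept values are divided by the peak.
    """
    scaled = {}
    for gene, values in tpm_by_gene.items():
        peak, trough = max(values), min(values)
        if peak <= min_peak_tpm:
            continue
        # Assumption: a trough of 0 TPM counts as an unbounded ratio and passes.
        if trough > 0 and peak / trough <= min_fold:
            continue
        scaled[gene] = [v / peak for v in values]
    return scaled


# Made-up TPM values at three time points.
toy_tpm = {
    "GENE_A": [10.0, 60.0, 120.0],  # peak 120, ratio 12: kept
    "GENE_B": [40.0, 45.0, 48.0],   # peak below 50: dropped
    "GENE_C": [100.0, 90.0, 80.0],  # ratio 1.25: dropped
    "GENE_D": [0.0, 30.0, 75.0],    # trough 0: kept by assumption
}
scaled = select_and_scale(toy_tpm)
print(scaled)
```

After scaling, every retained gene peaks at 1.0, so a clustering algorithm groups genes by the shape of their time course rather than by absolute abundance.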
Non-negative matrix factorization
Genes whose expression was lower than 20 TPM in all samples were excluded from the decomposition analysis. For each of the remaining genes, the expression values were normalized by dividing each value by the maximum value across all samples. The purpose of this normalization step was to bring all the genes to a comparable scale. NMF was then performed on the normalized values to decompose the genes into 8–12 components. NMF decomposition was implemented by invoking the “decomposition.NMF” class in the scikit-learn Python library. NMF decomposition creates groups of genes (components) sharing similar expression patterns (correlated across samples) in an unsupervised manner, thereby revealing underlying structures within the data. In order to better annotate the discovered components, we selected genes enriched in a particular component (i.e., those genes that have the highest loadings within the component) and examined (1) their expression levels across 51 human tissues in GTEx; (2) their expression levels across 56 human hematopoietic cell types from the Blueprint Epigenome consortium; (3) their Gene Ontology functional enrichment. If most of these genes showed high expression in a certain cell type (e.g., platelet) or were enriched in certain biological processes (e.g., platelet activation and coagulation), we designated the component accordingly (e.g., megakaryocyte component). By integrating those three sources of information, we were able to ascertain the tissue/cell-type origin for most components.
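To make the decomposition concrete, the sketch below filters and scales a toy expression table and then factorizes it. It is not the scikit-learn implementation named above: to stay dependency-free it substitutes Lee-Seung multiplicative updates, a standard NMF algorithm, and it uses 2 components on made-up values instead of the 8–12 components used in the study. All function names and the gene label "LOWX" are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def preprocess(tpm_by_gene, min_tpm=20.0):
    """Drop genes below min_tpm in all samples; scale each gene by its maximum.

    Returns the kept gene names and a samples x genes matrix.
    """
    genes = sorted(g for g, v in tpm_by_gene.items() if max(v) >= min_tpm)
    n_samples = len(next(iter(tpm_by_gene.values())))
    matrix = [
        [tpm_by_gene[g][s] / max(tpm_by_gene[g]) for g in genes]
        for s in range(n_samples)
    ]
    return genes, matrix


def nmf(v, n_components, n_iter=200, seed=0, eps=1e-9):
    """Factor v (samples x genes) into non-negative w (samples x k), h (k x genes).

    Stand-in solver: multiplicative updates for the Frobenius objective.
    """
    rng = random.Random(seed)
    n_samples, n_genes = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(n_components)] for _ in range(n_samples)]
    h = [[rng.random() + eps for _ in range(n_genes)] for _ in range(n_components)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_genes)]
             for i in range(n_components)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_components)]
             for i in range(n_samples)]
    return w, h


def reconstruction_error(v, w, h):
    """Frobenius norm of v minus the product w h."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw)) ** 0.5


def top_genes(h, genes, n=2):
    """For each component, list the n genes with the highest loadings."""
    return [[g for _, g in sorted(zip(row, genes), reverse=True)[:n]] for row in h]


# Made-up TPM values for four samples; "LOWX" never reaches 20 TPM.
toy_tpm = {
    "HBB": [100.0, 200.0, 10.0, 5.0],
    "HBA1": [50.0, 100.0, 5.0, 2.0],
    "PF4": [5.0, 10.0, 300.0, 150.0],
    "PPBP": [2.0, 4.0, 80.0, 40.0],
    "LOWX": [1.0, 2.0, 3.0, 4.0],
}
genes, v = preprocess(toy_tpm)
w, h = nmf(v, n_components=2)
print(genes)
print(top_genes(h, genes))
```

Here the rows of `h` hold the per-gene loadings of each component, so the highest-loading genes returned by `top_genes` are the ones that would be checked against tissue and cell-type references for annotation. With scikit-learn available, `sklearn.decomposition.NMF` would take the place of the hand-written `nmf` function.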
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
We thank Guillermo Elias, Richard Rava, Teresa Wright, and John Sninsky for conceptual discussions and critical reading of the manuscript. We also thank Ruben Rodrigues and Brian Read at the Development Research Services of the San Diego Blood Bank for assistance with sample sourcing.
Author contributions
Conceptualization: M.N., A.I.; methodology: A.I., N.S.S., J.Z., Y.Z.; investigation: A.I., V.H., A.D.A., A.P.K., L.G., J.R.P., and I.P.; writing: A.I., J.Z., S.T.; samples and resources: J.A., T.S.N., M.N., J.M., D.S.; funding acquisition: T.S.N., M.N., J.A., S.R.Q.; visualization: J.Z., S.T., Y.Z., A.D.A., A.I.; formal analysis: J.Z., Y.Z.; project administration: A.I., J.A.; supervision: A.I., M.N.
Data availability
RNA-Seq datasets have been deposited online in the Sequence Read Archive (SRA) under accession number PRJNA517339. Source data underlying Figs. 1B–F, 4A, 5A, C and Supplementary Figs. 1D, E are provided as a Source Data file.
Code availability
Custom code used during the current study is available on Bitbucket at https://MS_JialiZhuang@bitbucket.org/MS_JialiZhuang/naturecomm2019-related-codes.git.
Competing interests
A.I., Y.Z., N.S.S., J.Z., J.R.P., V.H., S.T., L.G., I.P., A.D.A., A.P.K., J.A., D.S., T.S.N., S.R.Q., and M.N. declare a competing interest as stakeholders, past or current employees at Molecular Stethoscope, Inc., or members of its scientific advisory board. J.M. declares no competing interests.
Footnotes
Peer review information Nature Communications thanks Irene Ghobrial and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Arkaitz Ibarra, Jiali Zhuang, Yue Zhao.
Contributor Information
Arkaitz Ibarra, Email: aibarra@molecularstethoscope.com.
Michael Nerenberg, Email: mnerenberg@molecularstethoscope.com.
Supplementary information
Supplementary information is available for this paper at 10.1038/s41467-019-14253-4.
References
- 1. Pös O, Biró O, Szemes T, Nagy B. Circulating cell-free nucleic acids: characteristics and applications. Eur. J. Hum. Genet. 2018;26:937–945. doi: 10.1038/s41431-018-0132-4.
- 2. Bianchi DW, Chiu RWK. Sequencing of circulating cell-free DNA during pregnancy. N. Engl. J. Med. 2018;379:464–473. doi: 10.1056/NEJMra1705345.
- 3. Fan HC, Blumenfeld YJ, Chitkara U, Hudgins L, Quake SR. Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood. Proc. Natl Acad. Sci. USA. 2008;105:16266–16271. doi: 10.1073/pnas.0808319105.
- 4. De Vlaminck I, et al. Circulating cell-free DNA enables noninvasive diagnosis of heart transplant rejection. Sci. Transl. Med. 2014;6:241ra277. doi: 10.1126/scitranslmed.3007803.
- 5. Fan HC, et al. Non-invasive prenatal measurement of the fetal genome. Nature. 2012;487:320–324. doi: 10.1038/nature11251.
- 6. Snyder TM, Khush KK, Valantine HA, Quake SR. Universal noninvasive detection of solid organ transplant rejection. Proc. Natl Acad. Sci. USA. 2011;108:6229–6234. doi: 10.1073/pnas.1013924108.
- 7. Koh W, et al. Noninvasive in vivo monitoring of tissue-specific global gene expression in humans. Proc. Natl Acad. Sci. USA. 2014;111:7361–7366. doi: 10.1073/pnas.1405528111.
- 8. Camunas-Soler J, et al. Noninvasive prenatal diagnosis of single-gene disorders by use of droplet digital PCR. Clin. Chem. 2018;64:336–345. doi: 10.1373/clinchem.2017.278101.
- 9. Ngo TTM, et al. Noninvasive blood tests for fetal development predict gestational age and preterm delivery. Science. 2018;360:1133–1136. doi: 10.1126/science.aar3819.
- 10. Fernandez-Mercado M, et al. The circulating transcriptome as a source of non-invasive cancer biomarkers: concepts and controversies of non-coding and coding RNA in body fluids. J. Cell. Mol. Med. 2015;19:2307–2323. doi: 10.1111/jcmm.12625.
- 11. Zhou R, et al. The decade of exosomal long RNA species: an emerging cancer antagonist. Mol. Cancer. 2018;17:75. doi: 10.1186/s12943-018-0823-z.
- 12. Wan JCM, et al. Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nat. Rev. Cancer. 2017;17:223–238. doi: 10.1038/nrc.2017.7.
- 13. Lui YY, et al. Predominant hematopoietic origin of cell-free DNA in plasma and serum after sex-mismatched bone marrow transplantation. Clin. Chem. 2002;48:421–427.
- 14. Zaporozhchenko IA, Ponomaryova AA, Rykova EY, Laktionov PP. The potential of circulating cell-free RNA as a cancer biomarker: challenges and opportunities. Expert Rev. Mol. Diagn. 2018;18:133–145. doi: 10.1080/14737159.2018.1425143.
- 15. Shah R, Patel T, Freedman JE. Circulating extracellular vesicles in human disease. N. Engl. J. Med. 2018;379:958–966. doi: 10.1056/NEJMra1704286.
- 16. Kim KM, Abdelmohsen K, Mustapic M, Kapogiannis D, Gorospe M. RNA in extracellular vesicles. Wiley Interdisciplinary Reviews: RNA. 2017;8(4):e1413. doi: 10.1002/wrna.1413.
- 17. Li M, Zeringer E, Barta T, Schageman J, Cheng A, Vlassov AV. Analysis of the RNA content of the exosomes derived from blood serum and urine and its potential as biomarkers. Philosophical Transactions of the Royal Society B: Biological Sciences. 2014;369(1652):20130502. doi: 10.1098/rstb.2013.0502.
- 18. Zhu X, Ching T, Pan X, Weissman SM, Garmire L. Detecting heterogeneity in single-cell RNA-Seq data by non-negative matrix factorization. PeerJ. 2017;5:e2888. doi: 10.7717/peerj.2888.
- 19. Gaujoux R, Seoighe C. Semi-supervised nonnegative matrix factorization for gene expression deconvolution: a case study. Infect. Genet. Evol. 2012;12:913–921. doi: 10.1016/j.meegid.2011.08.014.
- 20. Nguyen CB, et al. Whole blood gene expression in adolescent chronic fatigue syndrome: an exploratory cross-sectional study suggesting altered B cell differentiation and survival. J. Transl. Med. 2017;15:102. doi: 10.1186/s12967-017-1201-0.
- 21. Smith JA. Exercise, training and red blood cell turnover. Sports Med. 1995;19:9–31. doi: 10.2165/00007256-199519010-00002.
- 22. Elliott S, Pham E, Macdougall IC. Erythropoietins: a common mechanism of action. Exp. Hematol. 2008;36:1573–1584. doi: 10.1016/j.exphem.2008.08.003.
- 23. Cooper MC, Levy J, Cantor LN, Marks PA, Rifkind RA. The effect of erythropoietin on colonial growth of erythroid precursor cells in vitro. Proc. Natl Acad. Sci. USA. 1974;71:1677–1680. doi: 10.1073/pnas.71.5.1677.
- 24. Drewniak A, et al. Invasive fungal infection and impaired neutrophil killing in human CARD9 deficiency. Blood. 2013;121:2385–2392. doi: 10.1182/blood-2012-08-450551.
- 25. Bianchi DW. Circulating fetal DNA: its origin and diagnostic potential—a review. Placenta. 2004;25(Suppl. A):S93–S101. doi: 10.1016/j.placenta.2004.01.005.
- 26. Jiang P, Lo YMD. The long and short of circulating cell-free DNA and the Ins and Outs of molecular diagnostics. Trends Genet. 2016;32:360–371. doi: 10.1016/j.tig.2016.03.009.
- 27. Chang Y, et al. Review of the clinical applications and technological advances of circulating tumor DNA in cancer monitoring. Ther. Clin. Risk Manag. 2017;13:1363–1374. doi: 10.2147/TCRM.S141991.
- 28. Tanić M, Beck S. Epigenome-wide association studies for cancer biomarker discovery in circulating cell-free DNA: technical advances and challenges. Curr. Opin. Genet. Dev. 2017;42:48–55. doi: 10.1016/j.gde.2017.01.017.
- 29. Anfossi S, Babayan A, Pantel K, Calin GA. Clinical utility of circulating non-coding RNAs - an update. Nat. Rev. Clin. Oncol. 2018;15:541–563. doi: 10.1038/s41571-018-0035-x.
- 30. Lo YM, et al. Plasma placental RNA allelic ratio permits noninvasive prenatal chromosomal aneuploidy detection. Nat. Med. 2007;13:218–223. doi: 10.1038/nm1530.
- 31. Pan W, et al. Simultaneously monitoring immune response and microbial infections during pregnancy through plasma cfRNA sequencing. Clin. Chem. 2017;63:1695–1704. doi: 10.1373/clinchem.2017.273888.
- 32. Enache LS, et al. Circulating RNA molecules as biomarkers in liver disease. Int. J. Mol. Sci. 2014;15:17644–17666. doi: 10.3390/ijms151017644.
- 33. Rajkumar SV, et al. International Myeloma Working Group updated criteria for the diagnosis of multiple myeloma. Lancet Oncol. 2014;15:e538–e548. doi: 10.1016/S1470-2045(14)70442-5.
- 34. Eldh M, et al. Exosomes communicate protective messages during oxidative stress; possible role of exosomal shuttle RNA. PLoS ONE. 2010;5:e15353. doi: 10.1371/journal.pone.0015353.
- 35. Gray WD, et al. Identification of therapeutic covariant microRNA clusters in hypoxia-treated cardiac progenitor cell exosomes using systems biology. Circ. Res. 2015;116:255–263. doi: 10.1161/CIRCRESAHA.116.304360.
- 36. Melé M, et al. Human genomics. The human transcriptome across tissues and individuals. Science. 2015;348:660–665. doi: 10.1126/science.aaa0355.
- 37. Stunnenberg HG, Hirst M, International Human Epigenome Consortium. The International Human Epigenome Consortium: a Blueprint for Scientific Collaboration and Discovery. Cell. 2016;167:1145–1149. doi: 10.1016/j.cell.2016.11.007.
- 38. Lefranc MP, et al. IMGT, the international ImMunoGeneTics information system. Nucleic Acids Res. 2009;37:D1006–D1012. doi: 10.1093/nar/gkn838.