PLOS One. 2022 Feb 28;17(2):e0264631. doi: 10.1371/journal.pone.0264631

Machine learning predicts cancer subtypes and progression from blood immune signatures

David A Simon Davis 1, Sahngeun Mun 1, Julianne M Smith 1, Dillon Hammill 2, Jessica Garrett 2, Katharine Gosling 2, Jason Price 2, Hany Elsaleh 3, Farhan M Syed 1,4, Ines I Atmosukarto 1,2, Benjamin J C Quah 1,4,*
Editor: Afsheen Raza 5
PMCID: PMC8884497  PMID: 35226704

Abstract

Clinical adoption of immune checkpoint inhibitors in cancer management has highlighted the interconnection between carcinogenesis and the immune system. Immune cells are integral to the tumour microenvironment and can influence the outcome of therapies. Better understanding of an individual’s immune landscape may play an important role in treatment personalisation. Peripheral blood is a more readily accessible source of information on an individual’s immune landscape than complex and invasive tumour biopsies, and may hold immense diagnostic and prognostic potential. Identifying the critical components of these immune signatures in peripheral blood presents an attractive alternative to tumour biopsy-based immune phenotyping strategies. We used two syngeneic solid tumour models, a 4T1 breast cancer model and a CT26 colorectal cancer model, in a longitudinal study of the peripheral blood immune landscape. Our strategy combined two highly accessible approaches, blood leukocyte immune phenotyping and plasma soluble immune factor characterisation, to identify distinguishing immune signatures of the CT26 and 4T1 tumour models using machine learning. Myeloid cells, specifically neutrophils and PD-L1-expressing myeloid cells, were found to correlate with tumour size in both models. Elevated levels of G-CSF, IL-6 and CXCL13, and B cell counts were associated with 4T1 growth, whereas CCL17, CXCL10, total myeloid cells, CCL2, IL-10, CXCL1, and Ly6C-intermediate monocytes were associated with CT26 tumour development. Peripheral blood appears to be an accessible means to interrogate tumour-dependent changes to the host immune landscape, and to identify blood immune phenotypes for future treatment stratification.

Introduction

Carcinogenesis is a complex and multi-layered process involving various cellular and tissue networks. Although tumours can be recognised by the immune system, resulting in their growth suppression or elimination, they can also evolve to escape and/or suppress immune responses resulting in tumour growth and metastasis [1]. This interplay between tumour growth and the immune system is accompanied by specific perturbations to the immune landscape, manifested as changes to leukocyte frequencies and to the concentrations of immune soluble factors, both locally at the tumour site and systemically [2, 3]. Understanding the relationship between immune landscape changes and cancer subtype, disease progression and response to treatment has the potential to advance the development of new treatments, personalise therapies and improve outcomes.

Conventional cancer treatment strategies comprise the local modalities of surgery and radiotherapy, and the systemic approaches of endocrine treatment, cytotoxic chemotherapy, molecular targeted therapy and immunotherapy [4]. Whilst the latter two systemic therapies have, in recent years, brought the promise of dramatically improving cancer outcomes, their benefit remains limited to a subset of cancer patients [5]. Generic approaches to cancer treatment based only on tumour histology do not take into consideration the complexity of the cellular and tissue networks involved, and as such often result in variable outcomes. Moving beyond generic approaches and improving the outcome of novel immunotherapies requires robust preclinical investigation to navigate the increasingly crowded arsenal of treatments. Preclinical models enable tumour-immune system networks to be studied in a controlled manner to establish clinically translatable workflows. One avenue within this approach is to study the peripheral immune system to identify tumour subtype-specific immune dysregulation signatures that could be associated with tumour growth and treatment outcome.

The immunome, comprising the genes and proteins that constitute the immune system and an ever-increasing number of immune cell types, is defined through complex patterns of antigen expression that represent quantifiable metrics of an individual’s immune landscape. Over a hundred different immune cell phenotypes, representing the cellular immunome, have been identified. These include parental cell types such as T cells, B cells, natural killer (NK) cells, NK T cells, dendritic cells (DCs), granulocytes and monocytes, each of which can give rise to multiple subtypes defined by activating and/or suppressive functions. The soluble immunome consists of the chemokine system, which comprises nearly 50 chemokine ligands, and the cytokine system, which includes over 30 glycoproteins regulating the functions of the immune system [6, 7]. Analysis of the complex interactions among these parameters is now possible, with machine learning (ML) techniques offering a means to model multivariate data and to predict disease progression and treatment outcomes. ML techniques can also help infer underlying biological mechanisms through explanatory algorithms [8].

This study examines a multiparameter immune phenotyping approach, using blood cellular and soluble immune signatures, in a tightly controlled preclinical environment of two well-established syngeneic tumour models: a triple-negative breast cancer, 4T1, and a colorectal cancer, CT26, both implanted subcutaneously in BALB/c mice. These tumour models were selected to allow the assessment of different immune signatures, as the 4T1 line is known to generate highly aberrant immune signatures [9, 10], whereas the CT26 line generates more subtle immune phenotype changes [11, 12]. To monitor changes to the systemic cellular and soluble immune signatures of tumour-bearing animals, a small volume of blood was obtained from the animals’ tail veins, as part of a minimally invasive, feature-rich and high-throughput strategy suited to clinical translation. Multiparameter flow cytometry was used to generate cell-surface immune signatures, while soluble immune profiles were readily obtained from the plasma using a bead-based immunoassay that works on the same basic principles as sandwich immunoassays. This approach allows relatively high-throughput data generation and was coupled with statistical modelling to make predictions and inferences about tumour outcomes and biology. Predictive modelling and feature ranking were performed using Random Forest models, in conjunction with SHapley Additive exPlanations and correlation matrices, to make inferences about the underlying immune biology of the tumour models. This relatively simple strategy generated reasonably accurate models that were able to (i) confirm the presence of a tumour, (ii) differentiate between tumour types and (iii) predict current and future tumour burden, and highlighted that each tumour model generates a unique blood immune signature.

This study aims to assess the utility of blood cellular and soluble immune signatures coupled with ML to predict cancer subtype and tumour progression in a tightly controlled preclinical environment. It provides evidence of potential clinical application of immune signature-based systemic immune phenotyping to improve overall cancer diagnosis and surveillance. The study also identifies key immune features for predictive modelling and possible candidate parameters for therapeutic intervention based on those models.

Methods


Animals

Female BALB/c mice aged 6–10 weeks, sourced from the Australian Phenomics Facility (ANU), were used throughout the study. Animals were fed ad libitum, housed in a specific-pathogen-free environment and used in strict adherence to protocols approved by the institutional Animal Experimentation Ethics Committee (AEEC), ANU, under protocols A2017/43 and A2020/39. At experimental end points, animals were euthanised by cervical dislocation according to AEEC-approved procedures.

Cell lines

The 4T1-Luc2 (4T1) mammary carcinoma and CT26 colorectal carcinoma (kindly gifted by Dr. Aude Fahrer, ANU) cell lines were originally sourced from the American Type Culture Collection (ATCC) and confirmed free of pathogens by Cerberus Sciences (ISO 9001 Licence No. AU843-QC) before use in animals. Both adherent cell lines were cultured in RPMI-1640 (11875093, ThermoFisher Scientific) supplemented with 10% (v/v) fetal bovine serum (F8192, Sigma), 2 mM L-glutamine (250300810, ThermoFisher Scientific), 10 mM HEPES (15630080, ThermoFisher Scientific), 1 mM sodium pyruvate (11360070, ThermoFisher Scientific) and 55 nM 2-mercaptoethanol (21985023, ThermoFisher Scientific), detached using warmed trypsin/0.05% (v/v) EDTA solution (15400054, ThermoFisher Scientific), then passaged and maintained at up to 70–80% confluency.

Tumour establishment

Tumour cells (1 × 10⁵ cells in 50 μL of sterile normal saline) were injected subcutaneously into the right-hind flank of mice randomised across several housing cages. Fur around the injection site was removed with clippers prior to tumour inoculation. Tumours were left to grow for up to 14 days and monitored daily to ensure wellbeing was maintained. In 21 of the 98 4T1-bearing mice, a single dose of the Src inhibitor eCF506 (1914078-41-3, Sun-shine Chemical) at 0.1 (eC100), 1 (eC1000) or 10 (eC10000) mg/kg was administered i.p. 7 days post tumour establishment; as this appeared to have little, if any, impact on the parameters assessed in the study, these mice were included to increase sample size (S1 File). At end-point, the mice were humanely sacrificed by cervical dislocation, and their tumours excised and weighed.

Blood collection

At 7 or 8 days (referred to as day 7) and 14 days post tumour establishment, mice were briefly warmed (~4 minutes) under a lamp to promote vasodilation and placed in a restraint; the tail vein was punctured with a 29G needle, and a 20 μL blood sample was collected into 4 μL of acid citrate-dextrose solution (ACD, Sigma) anticoagulant. A 5 μL aliquot of this blood was immediately used for antibody labelling and flow cytometry. The remaining blood was centrifuged at 16,000 x g for 10 minutes, and 7 μL of plasma was collected and stored in a sealed 96-well polypropylene microplate (249943, ThermoFisher Scientific) at -20°C for later cytokine and chemokine measurement using the LEGENDplex assays.

Immunophenotyping of blood leukocytes by flow cytometry

The 5 μL blood samples for cellular analysis were first incubated for 10 minutes on ice in wells of a v-bottom 96-well microplate with 25 μL of 5 mg/mL TruStain FcX™ (anti-mouse CD16/32) antibody (101320, Biolegend) diluted in 1x BD Pharm Lyse RBC lysing buffer (555899, BD Bioscience). Samples were then incubated with 25 μL of 1x BD Pharm Lyse containing the fluorescent antibodies listed in S1 Table for 30 minutes on ice in the dark. In addition, 5000 Flow-Count Fluorospheres (7547053, Beckman Coulter) were spiked into each sample together with the fluorescent antibodies to allow enumeration of total cells per sample. Cells were then washed twice by resuspension in a total of 200 μL of PBS containing 5 mM EDTA, sedimentation by centrifugation at 300 x g for 5 minutes, and removal of the supernatant by flicking. Samples were then resuspended in 50 μL of PBS containing 5 mM EDTA, 1% (w/v) BSA and 1 μg/mL of the dead cell dye Hoechst 33258, ready for flow cytometry.

LEGENDplex assay

Frozen plasma samples were thawed on ice, then assayed using the Macrophage/microglial (Mac/Mic) 13-plex LEGENDplex kit and the Proinflammatory (Proinflam) 13-plex LEGENDplex Kit (740451 and 780846, Biolegend). Assay methods were as described by the manufacturer, except the assay was scaled down to use 6 μL of sample/standards for each kit as follows. Seven μL of each plasma sample was diluted in 7 μL of kit assay buffer and 6 μL of this mix (or 6 μL of pre-titrated kit standard) was added to 12 μL of kit capture beads (pre-diluted 1:1 (v/v) with assay buffer) in a v-bottom 96-well microplate, and incubated with shaking for 2 hours. Beads were then pelleted at 250 x g for 5 minutes and the supernatant flicked off. Beads were washed with 50 μL of kit wash buffer and pelleted and supernatant removed as above. Twelve μL of kit biotinylated detection antibodies (pre-diluted 1:1 (v/v) in assay buffer) were then added to beads, then beads resuspended by pipetting and the mixture incubated with shaking for 1 hour at room temperature. Six μL of kit streptavidin-PE was then added to the mixture, which was incubated with shaking for a further 30 minutes. Beads were then pelleted and washed as described above and resuspended in 40 μL of kit 1x wash buffer ready for flow cytometry.

Flow cytometry

Flow cytometry was performed on a BD LSRII flow cytometer (BD Bioscience) with FACSDiva software, with quality assurance performed before each experimental run using BD FACSDiva Cytometer Setup and Tracking (CS&T) beads (655051, BD Bioscience). Application Settings were applied to standardise fluorescence intensity readings between experiments, and these were monitored using Sphero™ 8-peak Rainbow Beads (110620, BD Bioscience). Voltages were initially set up using unlabelled RBC-lysed blood leukocytes for cellular analysis and LEGENDplex Raw Setup beads (as described by the manufacturer). BD CompBeads (552843, BD Bioscience) were labelled with selected antibodies (S1 Table) as described by the manufacturer and used as compensation controls for cellular analysis. Cell samples were acquired until a total of 2000 Flow-Count Fluorosphere beads had been collected, based on gating on a side scatter (log) versus forward scatter (linear) plot. LEGENDplex beads were acquired to a total of 4000 beads. Raw Flow Cytometry Standard (FCS) files are available upon request at the ANU DATA COMMONS repository (https://dx.doi.org/10.25911/6153a8ab5747c).

Flow cytometry analysis

Blood cells and LEGENDplex beads were analysed using FlowJo v10 software (BD Bioscience). A combination of manual gating and unsupervised Fast Interpolation-based t-distributed Stochastic Neighbour Embedding (FIt-SNE) analysis was used to delineate leukocyte populations, which were then named on the basis of this analysis (S1 Fig). LEGENDplex beads were gated for each analyte as described in S2 Fig, and the median PE fluorescence intensity was obtained for each bead analyte. Data were then normalised for analysis as described below.

Data normalisation and processing

To reduce the influence of inter-experimental variability on conclusions, data were normalised at several levels. Firstly, cell numbers in each flow cytometry acquisition were normalised to the counting beads spiked into the sample, with each sample scaled to 2000 counting beads (two-fifths of the 5000-bead spike load added to 5 μL of blood), giving the number of cells in ~2 μL of blood (“counting bead normalised” values). Secondly, these values were normalised to the mean counts from the blood of non-tumour-bearing control animals within each experiment. These “nil normalised” values were used in the machine learning pipelines. To obtain “normalised cell counts” per ~2 μL of blood, as an estimate of overall cell numbers across the groups, the “nil normalised” values were multiplied by the overall mean of the “counting bead normalised” cell counts from all non-tumour-bearing animals for each feature.
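The three normalisation levels for cell counts can be sketched for a single feature as follows. This is a minimal illustration under stated assumptions, not the authors' code: the record keys (`experiment`, `group`, `events`, `beads`) and the function names are hypothetical, and the sketch assumes that cells and counting beads are acquired in proportion to their abundance in the tube, which is what lets 2000 of 5000 beads stand for 2 of 5 μL of blood.

```python
from statistics import mean

BEADS_SPIKED = 5000   # Flow-Count Fluorospheres added per sample
BLOOD_UL = 5.0        # blood volume labelled per sample (uL)
TARGET_BEADS = 2000   # every acquisition is scaled to this bead count


def blood_volume_per_target() -> float:
    """Blood volume (uL) represented by TARGET_BEADS counting beads."""
    return BLOOD_UL * TARGET_BEADS / BEADS_SPIKED


def bead_normalise(events: float, beads: float) -> float:
    """Level 1: scale a population's event count to TARGET_BEADS beads."""
    return events * TARGET_BEADS / beads


def nil_normalise(samples: list[dict]) -> list[float]:
    """Level 2: divide by the mean of Nil animals in the same experiment."""
    nil_means = {}
    for exp in {s["experiment"] for s in samples}:
        nil_means[exp] = mean(
            bead_normalise(s["events"], s["beads"])
            for s in samples
            if s["experiment"] == exp and s["group"] == "Nil"
        )
    return [
        bead_normalise(s["events"], s["beads"]) / nil_means[s["experiment"]]
        for s in samples
    ]


def normalised_counts(samples: list[dict]) -> list[float]:
    """Level 3: rescale fold-changes by the Nil mean pooled over all experiments."""
    overall_nil = mean(
        bead_normalise(s["events"], s["beads"])
        for s in samples
        if s["group"] == "Nil"
    )
    return [ratio * overall_nil for ratio in nil_normalise(samples)]
```

Dividing within each experiment before rescaling by one pooled Nil mean removes batch-to-batch offsets while keeping the values on an interpretable cells-per-~2 μL scale.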

For the LEGENDplex assays, the raw median PE value for each analyte was normalised as a ratio to the mean median PE value from the blood of non-tumour-bearing control animals within each experiment. These “nil normalised” values were used in the machine learning pipelines. To obtain “normalised plasma concentrations”, the “nil normalised” values were multiplied by the overall mean median PE value from the blood of non-tumour-bearing control animals, and concentrations were interpolated from mean standard curves pooled from all experiments. Hyperbola, five-parameter logistic (5PL) and Random Forest models were evaluated for the standard curves; because 5PL models failed for many data points and Random Forest produced non-Gaussian, multi-cluster distributions, hyperbola models were used, as they overcame both issues. t-distributed stochastic neighbour embedding (t-SNE) was used to monitor experiment-specific clusters within the pooled data and helped to confirm that the normalisation approach minimised such clustering. All raw and calculated data are in S1 File.
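Interpolating plasma concentrations from a hyperbolic standard curve can be sketched as below. The section does not state which hyperbola was fitted, so this sketch assumes a one-site rectangular hyperbola, MFI = Bmax·C / (Kd + C); the function names are hypothetical. To stay dependency-free it fits the curve by linear regression on the double-reciprocal form, which amplifies noise at low signal, so a nonlinear least-squares fit would normally be preferable on real standards.

```python
def fit_hyperbola(conc: list[float], mfi: list[float]) -> tuple[float, float]:
    """Fit MFI = bmax * C / (kd + C) using the double-reciprocal line
    1/MFI = (kd / bmax) * (1/C) + 1/bmax (ordinary least squares)."""
    xs = [1.0 / c for c in conc]
    ys = [1.0 / m for m in mfi]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    bmax = 1.0 / intercept     # plateau MFI
    kd = slope * bmax          # concentration at half-maximal MFI
    return bmax, kd


def interpolate_conc(mfi: float, bmax: float, kd: float) -> float:
    """Invert the hyperbola: C = kd * MFI / (bmax - MFI)."""
    if not 0 < mfi < bmax:
        raise ValueError("MFI outside the invertible range (0, bmax)")
    return kd * mfi / (bmax - mfi)


def rescale_nil_normalised(ratio: float, overall_nil_mfi: float) -> float:
    """Turn a nil-normalised ratio back into an MFI before interpolation."""
    return ratio * overall_nil_mfi
```

For example, a sample with a nil-normalised ratio of 2.0 and an overall Nil median PE of 1500 would be interpolated from an MFI of 3000.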

Supervised machine learning

Supervised machine learning was performed using Orange 3 software. Random Forest models used 500 trees, a maximum tree depth of 3 and a maximum of 4 features considered at each node (reduced accordingly when fewer features were modelled); subsets smaller than 5 were not split, and balanced class distribution was enabled for classification because the data groups were unbalanced. Missing data (including one sample lacking the 13-plex Proinflammatory LEGENDplex panel) were imputed using the “hot deck” 1-NN learner, which replaces missing values with those of the most similar example (as implemented in Orange 3 software). A learning curve was first generated by plotting modelling skill (classification of tumour subtype: 4T1, CT26 and Nil) against progressively smaller data set sizes (randomly sampled from the entire data set) to evaluate whether the data set was large enough for the targeted outcomes (S3 Fig). Modelling skill appeared to plateau at around 20% of the full data set size, suggesting the data set was sufficient for the outcomes targeted. For the rest of the study, Random Forest models were trained and cross-validated on 100%, 80% and/or 60% of randomised sample data and tested on any remaining data; training data were validated using leave-one-out cross-validation. Features were ranked using the importance scores built into the Orange 3 Random Forest model and the explain model function of Orange 3, which uses SHapley Additive exPlanations (SHAP) to explain feature importance.
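The “hot deck” 1-NN idea, filling each missing value from the single most similar sample, can be sketched as follows. This is a generic illustration, not Orange 3's implementation: the distance (root-mean-square difference over features observed in both samples, unscaled) and the rule that a donor must have every value the recipient lacks are assumptions, and the function name is hypothetical.

```python
import math
from typing import Optional

Row = list[Optional[float]]  # None marks a missing value


def hot_deck_1nn(rows: list[Row]) -> list[Row]:
    """Copy each missing value from the nearest eligible donor row."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        donor, best = None, math.inf
        for k, other in enumerate(rows):
            # A donor must be another row that observed every value this row lacks.
            if k == i or any(other[j] is None for j in missing):
                continue
            shared = [
                j for j, v in enumerate(row) if v is not None and other[j] is not None
            ]
            if not shared:
                continue
            # Root-mean-square difference over features observed in both rows.
            dist = math.sqrt(sum((row[j] - other[j]) ** 2 for j in shared) / len(shared))
            if dist < best:
                donor, best = other, dist
        if donor is None:
            raise ValueError(f"no donor available for row {i}")
        for j in missing:
            imputed[i][j] = donor[j]
    return imputed
```

Donors are drawn from the original rows rather than already-imputed ones, so imputed values never propagate to other samples.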
Feature number and model fitting were optimised for classification predictions using area under the receiver operating characteristic curve (AUC; to assess separability of the classes), classification accuracy (CA; proportion of correct classifications), precision (ratio of correct positive predictions to all positive predictions), recall (ratio of correct positive predictions to actual positives) and F1 score (harmonic mean of precision and recall), and for regression predictions using mean squared error (MSE), mean absolute error (MAE) and root mean squared error (RMSE). Each train-and-test modelling run was performed a minimum of 3 times to assess variability. Once optimal features had been assigned on this basis, the final predictions were modelled on all the data and results displayed using leave-one-out cross-validation on the entire data set, either as a confusion matrix for classification analysis or as a bivariate plot against actual values for regression analysis (with the Pearson correlation coefficient reported and associated p values calculated using GraphPad Prism software). Orange 3 workflows are provided in S2 File (classification workflow) and S3 File (regression workflow).
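The classification and regression scores listed above, plus the Pearson coefficient used for the bivariate plots, can be computed as in this sketch. It is illustrative only (the function names are hypothetical): precision, recall and F1 are returned per class, how to average them across the Nil, CT26 and 4T1 classes is left to the caller, and AUC is not covered.

```python
import math
from typing import Hashable, Sequence


def classification_scores(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], labels: Sequence[Hashable]
):
    """Return classification accuracy and per-class (precision, recall, F1)."""
    pairs = list(zip(y_true, y_pred))
    ca = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)
    return ca, per_class


def regression_scores(y_true: Sequence[float], y_pred: Sequence[float]):
    """Return (MSE, MAE, RMSE) for predicted versus actual values."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return mse, mae, math.sqrt(mse)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and actual values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```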

Statistical analysis

For comparisons of means between the Nil, CT26 and 4T1 cohorts, data were transformed using the formula Y = Log(Y + 0.0001) to help normalise distributions and equalise variance, and then assessed by 2-way ANOVA using GraphPad Prism software. Multiple comparisons between cohorts were performed for each feature using Tukey correction, with p values reported to test the null hypothesis that the means are equal. To analyse features important for tumour size, a bivariate correlation matrix was constructed from the top-ranked features from the machine learning pipeline described above, and Spearman’s correlation coefficients and associated p values were determined using R. To determine interactions between top-ranked features, a distance matrix was constructed from the absolute Spearman’s coefficients, and global absolute Spearman distances were summarised by multidimensional scaling with network lines (at maximum levels) using Orange 3 software (see S4 File for the Orange 3 workflow).
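The log transform and the absolute-Spearman distance matrix can be sketched as follows. Assumptions: `Log` is taken as base 10, absolute-Spearman distance is taken as 1 − |ρ|, and tied values receive average ranks; the function names and the toy feature dictionary in the usage are hypothetical. The multidimensional scaling and network display done in Orange 3 are not reproduced.

```python
import math


def log_transform(y: float) -> float:
    """Y = Log(Y + 0.0001); base 10 is assumed here."""
    return math.log10(y + 0.0001)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho computed as Pearson's r on the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined for a constant feature")
    return sxy / math.sqrt(sxx * syy)


def abs_spearman_distance(features: dict[str, list[float]]):
    """Square matrix of 1 - |rho| between every pair of features."""
    names = list(features)
    matrix = [
        [1.0 - abs(spearman_rho(features[a], features[b])) for b in names]
        for a in names
    ]
    return names, matrix
```

Taking the absolute value means strongly anti-correlated features sit as close together as strongly correlated ones, which suits a question about which features move together with tumour size regardless of direction.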

Results

Composition of blood immune features in cancer models reveals unique tumour immune phenotypes

To characterise the blood immune profiles of tumour-bearing animals, a 4T1 breast cancer cell line tumour or a CT26 colorectal cancer cell line tumour was established subcutaneously in the right-hind flank of BALB/c mice. Animals with no tumours were used as controls (Nil), benchmarking the ‘normal’ immune landscape. Tumours were left to establish and grow for 14 days, and immune features were assessed from a single drop of blood taken at 7 (D7) and 14 (D14) days post tumour establishment (Fig 1A). A total of 180 animals were included in the study, with animal cohorts described in Fig 1B. Absolute leukocyte counts per unit volume of blood were determined by flow cytometry (S1 Fig). Cell populations were delineated using 17 leukocyte-reactive mAbs and identified by manual gating cross-checked with unsupervised dimensionality reduction. The strategy also included simple light-scatter profiles to delineate lymphocytes and myeloid cells, to determine whether this simple approach would be useful for the study aims. Plasma cytokine and chemokine concentrations were also assessed using two 13-plex LEGENDplex kits (S2 Fig). At the end-point (D14), solid tumours were excised and weighed, revealing highly variable tumour mass in both models, ranging from 10 mg to >800 mg (Fig 1C).

Fig 1. Blood immune phenotyping in animal tumour models.


CT26 or 4T1 tumours were established and grown in female BALB/c mice for 14 days, with blood immune phenotype determined by flow cytometry 7 (D7) and 14 (D14) days post tumour implantation, and tumours excised and weighed at end-point (D14). Animals with no tumours (Nil) were used as controls to provide the normal blood immune phenotype (a). A total of 180 animals were included in the study, and animals were randomly allocated to groups at D0, as indicated in (b). End-point (D14) CT26 and 4T1 tumour masses are shown in (c), with mean and SEM overlaid in black. A 20 μL blood sample from each animal was phenotyped at D7 and D14 by flow cytometry for immune cell populations (using cell surface marker labelling and reported as normalised cells per ~2 μL of blood) and plasma analytes (using two LEGENDplex screening kits and reported as approximations of blood concentrations). Blood cell compositions at D7 (top left panel) and D14 (bottom left panel), and plasma analyte compositions at D7 (top right panel) and D14 (bottom right panel), in Nil, CT26- and 4T1-tumour-bearing animals are shown in (d). Cell data in (d) are reported as the total absolute mean cell count of each population, shown as a subset of its upstream lineages. Plasma analytes are reported as a subset of the total mean of analytes in the two LEGENDplex screening kits: the macrophage/microglial 13-plex kit (Mac/Mic) and the proinflammatory 13-plex kit (Proinflam). Three analytes, CCL22, CXCL1 and CCL17, were present in both kits and are labelled (1) if from the Mac/Mic panel or (2) if from the Proinflam panel.

To gain an overall impression of the blood immune landscape, the mean composition of blood leukocytes and plasma factors was quantified across the 3 groups from the 180 animals at both the D7 and D14 time points (Fig 1D), and differences were further highlighted by normalising the underlying data to the mean values from Nil animals to give the fold-change above normal levels (S4 Fig). This revealed a large increase in leukocytes in the blood of 4T1-bearing mice compared with Nil mice, a difference that increased further over time (Fig 1D). This was largely driven by expansion of myeloid cells, with a subtler trend of lymphocyte increase. In contrast, CT26-bearing animals showed only a slight trend of myeloid cell increase and a concomitant trend of lymphocyte decrease, both of which became more pronounced over time. The changes in myeloid cells in both models were largely attributable to an expansion of neutrophils and monocytes, and expansion of other minor myeloid cell populations was also apparent (Fig 1D). The initial increase in lymphocytes in 4T1-bearing mice at D7 was mostly due to an increase in B cell count, which reversed to fall below normal at D14 and was offset by slight increases in CD4 T cells, CD8 T cells and NK cells at this later time point. The decrease in blood lymphocytes in CT26-bearing mice was mainly attributable to diminishing circulating B cells. There were also changes in minor leukocyte subpopulations in tumour-bearing animals that were not obvious in the compositional data because of their small numbers; these included changes to CD4 T regulatory cells, DCs, macrophages and PD-L1-expressing myeloid cell populations (S4 Fig). In terms of plasma factor composition, there was a notable increase in macrophage/microglial factors in 4T1-bearing mice at D7, mainly ascribed to a large increase in G-CSF, which decreased at D14 although it remained several-fold above normal levels (Fig 1D).
Mice with 4T1 tumours also had a subtler increase in CXCL13 relative to normal levels, and a subtle increase in IL-6 and a subtle decrease in CXCL1 compared with CT26-bearing animals (Fig 1D and S4 Fig). In CT26-bearing mice, there was a slight rise in plasma proinflammatory factors, which increased marginally over time and appeared to be due to subtle changes in a number of factors such as CCL11, CXCL1, CXCL9 and CXCL10 (Fig 1D). However, these changes were not statistically significant compared with control animals (S4 Fig).

Classification of cancer models using blood immune signatures

From these initial results, it was clear that 4T1 and CT26 tumour growth results in aberrant blood immune parameters in mice, with some common changes (such as neutrophil and monocyte expansion), but also tumour-specific changes (such as the plasma factor changes), while overall changes appear to be subtler in CT26-bearing mice. To investigate how these changes might be used to predict tumour outcomes, supervised ML was used on the normalised data (S4 Fig). Random Forest was chosen as our learner, since it could be used for prediction of both our classification (tumour subtype) and regression (mass of current and future tumours) questions and has in-built feature ranking of importance in predictions allowing feature reduction and biological inference [13].

After hyperparameter tuning, Random Forest was initially used to investigate whether blood immune phenotypes were distinctive enough to classify animals as having no tumour (Nil), a CT26 tumour or a 4T1 tumour. Our approach was to train and test the model using progressively reduced numbers of blood immune features, sorted by importance rank. We scored the model using several classification performance indicators (S5 Fig and Fig 2A). To train and test the model, we used data from both the D7 and D14 time points, to see whether there were features that could be used across time to classify tumour subtype. We found that modelling was stable and had congruent scores in both the training and test data sets across a range of feature numbers; however, the minimum feature number needed to maintain this was 5, suggesting that 5 key features could give accurate predictions (Fig 2A). Among the top 21 Random Forest-ranked features, several were highly ranked at both the D7 and D14 time points (Fig 2B). Overall, the 5 highest-ranked immune features, in descending order, were G-CSF, neutrophils, total myeloid cells, monocytes and total leukocytes. To examine how these features contributed to the model in more detail, we used the SHapley Additive exPlanations (SHAP) algorithm [8] (S5 Fig). SHAP highlighted the contribution of these 5 features: generally, higher values tended to indicate a 4T1 phenotype, while the relationship distinguishing Nil from CT26-bearing animals was more complex. We performed dimensionality reduction using t-distributed Stochastic Neighbour Embedding (t-SNE) to see whether these 5 features could cluster tumour classes better than all features combined (Fig 2C). This unsupervised approach showed that the 5 top-ranked features appeared to separate the tumour classes better than all features combined, particularly the 4T1 class.
Therefore, we generated the final model incorporating these features from both time points (Fig 2D). This resulted in successful classification of all 4T1-bearing animals and most of the CT26-bearing (CA ~80%) and Nil (CA ~75%) animals (2 of each being misclassified as the other, out of 72 individuals in these classes). The 5 features could predict class at each time point alone, but generally predicted and separated classes best at the later time point (Fig 2E and 2F). Finally, their quantities in the blood of all animals showed that, while all these features were significantly higher in 4T1-bearing animals than in CT26-bearing and Nil animals, only neutrophils and monocytes showed a significant increase in CT26-bearing mice compared with Nil (while still being lower than in 4T1-bearing mice) (Fig 2G). This highlights the association of myeloid factors with tumour presence and their potential use in tumour classification, and may also suggest an underlying association of G-CSF, neutrophils and monocytes with the development of some tumours.
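The feature-reduction procedure described above, retraining on progressively fewer top-ranked features and keeping the smallest set that preserves performance, can be sketched as follows. The `score` callback is hypothetical and stands in for a cross-validated Random Forest fit; the tolerance rule is an assumption replacing the authors' comparison of congruent training and test scores.

```python
from typing import Callable, Sequence


def smallest_stable_subset(
    ranked: Sequence[str],
    score: Callable[[Sequence[str]], float],
    tolerance: float = 0.02,
) -> tuple[list[str], dict[int, float]]:
    """Score the top-k features for k = n..1 and return the smallest top-k
    set whose score is within `tolerance` of the best score seen."""
    # `ranked` runs from most to least important, so ranked[:k] drops the
    # lowest-ranked features first as k shrinks.
    scores = {k: score(ranked[:k]) for k in range(len(ranked), 0, -1)}
    best = max(scores.values())
    k_min = min(k for k, s in scores.items() if s >= best - tolerance)
    return list(ranked[:k_min]), scores
```

For example, with the five top-ranked features reported above followed by lower-ranked ones, a score that stops improving beyond five features would return those five.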

Fig 2. Tumour classification using blood immune phenotype.


Normalised blood immune features (S4 Fig) from the 130 animals that had both D7 and D14 blood samples (Fig 1B) were used in Random Forest modelling to predict the presence of a tumour and tumour subtype (target classes: Nil, 4T1 and CT26). The model was initially trained on 80% (S5 Fig) and 60% of the data, cross-validated using leave-one-out and tested on the remaining data. Modelling was done on a progressively smaller number of features, removing features from lowest to highest rank based on in-built Random Forest importance for class determination, and the process was repeated 3 times. Model performance was assessed by several classification indicators, including area under the receiver operating characteristic curve (AUC; to assess separability of the classes), classification accuracy (CA; proportion of correct classifications), precision (ratio of correct positive predictions to all positive predictions), recall (ratio of correct positive predictions to actual positives) and F1 score (harmonic mean of precision and recall), with values ranging from 0 to 1 (higher being better) (a). The Random Forest feature importance scores for classification of the top 21 features (ranked from lowest to highest) from the modelling are shown in (b), from n = 6 modelling repeats. Based on peak modelling performance (S5 Fig), the top 5 features from both time points were compared with all features in t-distributed stochastic neighbour embedding (t-SNE) unsupervised clustering to highlight the capacity of the reduced feature set to maximise class distinction based on the overlap of groups, with dot size representing relative end-point tumour mass to assess how this relates to clusters (c). These top 5 features from both time points were used to generate the final classification model, which was built on the entire data set and assessed using leave-one-out cross-validation, with results shown as a confusion matrix of all animals (d).
The top 5 features from D7 (e) or D14 (f) samples were also used separately in modelling (presented as confusion matrices) and t-SNE analysis, to highlight differences between time points. The top 5 features are plotted as estimated quantities in blood for all animals at each time point (Fig 1B), with means and SEM displayed in yellow; equality of means was tested by 2-way ANOVA on Log(y + 0.0001)-transformed data with Tukey's multiple comparisons correction (g).
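To illustrate how the classification indicators named in this legend relate to one another, the minimal Python sketch below computes classification accuracy and per-class precision, recall and F1 from true and predicted labels. It is not the authors' pipeline: the function name and the six toy labels are hypothetical, and the support-weighted average across the three classes is an assumption, since the averaging scheme is not stated. AUC is omitted because it needs predicted class probabilities rather than hard labels.

```python
from collections import Counter


def classification_scores(y_true, y_pred, classes):
    """Return accuracy (CA), per-class precision/recall/F1 and a support-weighted average.

    Hypothetical helper; the support-weighted averaging is an assumption.
    """
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n
    support = Counter(y_true)
    per_class = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # c predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    weighted = {
        m: sum(per_class[c][m] * support[c] for c in classes) / n
        for m in ("precision", "recall", "f1")
    }
    return accuracy, per_class, weighted


# Toy example: one Nil animal misclassified as CT26
y_true = ["Nil", "Nil", "4T1", "4T1", "CT26", "CT26"]
y_pred = ["Nil", "CT26", "4T1", "4T1", "CT26", "CT26"]
acc, per_class, weighted = classification_scores(y_true, y_pred, ["Nil", "4T1", "CT26"])
```

In this toy case the single error lowers Nil recall (0.5) and CT26 precision (2/3), while 4T1 scores 1.0 throughout, which is why several indicators are reported together rather than accuracy alone.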

Model fitting of CT26 tumour size using blood immune signatures resulted in moderate predictability

We next asked whether underlying blood immune signatures could predict tumour size and growth, which are often fundamental to prognosis. To do this, we used D14 end-point tumour mass as the target outcome. We first assessed whether blood immune signatures could predict current and future CT26 tumour mass, using D14 and D7 blood data, respectively. As with the classification approach, we trained and tested the model on progressively fewer blood immune features, sorted by importance rank, but scored it using several regression performance indicators (S6 Fig and Fig 3). When testing whether D14 blood could predict current tumour mass, Random Forest modelling was stable, with similar scores in the training and test data sets across a range of feature numbers; however, 3 was the minimum number of features that maintained this performance, suggesting that 3 key features could yield optimal predictions of current tumour mass (S6 Fig and Fig 3A). Myeloid cell populations ranked high in modelling (Fig 3B), with Ly6Cintermediate monocytes, total myeloid cells and PD-L1-expressing Ly6C-Ly6G- (PD-L1+) myeloid cells contributing prominently to the model based on SHAP values (Fig 3C). Mice with higher numbers of these cells in the circulation typically had bigger tumours. We therefore generated the final Random Forest model with these 3 features to predict current CT26 tumour mass; its predictions showed a significant, moderate linear correlation with the actual mass (Fig 3D).
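The feature-reduction procedure described above (refit on progressively fewer features, then keep the smallest set that preserves performance) can be sketched as follows. This is a minimal, hypothetical Python illustration rather than the authors' code: `score_fn` stands in for training and testing a Random Forest on a given feature subset, the ranking is assumed to be fixed up front, and both the "within `tolerance` of the best score" rule for choosing the minimum feature number and the toy scorer are assumptions.

```python
def backward_elimination(ranked_features, score_fn, tolerance=0.02):
    """Score top-k feature subsets for k = n, n-1, ..., 1.

    ranked_features: feature names ordered from most to least important.
    score_fn: hypothetical callable(features) -> score (higher is better),
              e.g. test-set R2 of a model trained on those features.
    Returns the score for each k and the smallest k whose score is
    within `tolerance` of the best score.
    """
    scores = {}
    for k in range(len(ranked_features), 0, -1):
        # each step drops the lowest-ranked remaining feature
        scores[k] = score_fn(ranked_features[:k])
    best = max(scores.values())
    k_min = min(k for k, s in scores.items() if s >= best - tolerance)
    return scores, k_min


# Toy scorer: each feature contributes a fixed hypothetical "signal",
# with a small per-feature penalty standing in for added noise.
signal = {"Ly6Cint_mono": 0.30, "myeloid": 0.25, "PDL1_myeloid": 0.20,
          "CCL2": 0.01, "IL10": 0.0}
ranked = list(signal)  # already ordered from most to least important


def toy_score(features):
    return sum(signal[f] for f in features) - 0.005 * len(features)


scores, k_min = backward_elimination(ranked, toy_score)
```

With these invented numbers the best score occurs at 4 features, but 3 features score within the tolerance, so `k_min` is 3; a plateau-then-drop pattern of this kind is what "minimum feature number to maintain performance" refers to.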

Fig 3. Predicting CT26 tumour size and growth using blood immune phenotypes.


Normalised blood immune features (S4 Fig) from the 48 CT26-bearing animals that had both D7 and D14 blood samples (Fig 1B) were used in Random Forest modelling to predict CT26 tumour size at D14. The model was initially trained on 100%, 80% and 60% of the data (S6 Fig), cross-validated using leave-one-out, and tested on the remaining data. Modelling was repeated on progressively smaller sets of features, removing features in order from lowest to highest rank based on in-built Random Forest importance, and the process was repeated 3 times (mean and standard error of the mean shown). Model performance was assessed using several regression indicators: the error scores mean squared error (MSE), mean absolute error (MAE) and root mean squared error (RMSE), which should be minimised, and the coefficient of determination R2, which should be maximised. Initially, D14 tumour size was the target and D14 blood samples were used, to assess whether blood immune features could predict current tumour size (a, b, c and d). Then D14 tumour size was the target and D7 blood samples were used, to assess whether blood immune features could predict future tumour size (e, f, g and h). Model performance is summarised for the 60%:40% training:testing split, and equality of test and train performance score means (using the top assigned features) was assessed using ANOVA (a and e). The Random Forest feature importance scores for regression of the top-10 features are shown in (b) and (f), and their SHAP values in (c) and (g). Based on peak modelling performance, the top-3 features from D14 blood data (d) or the top-10 features from D7 blood data (h) were used to generate the final regression models to predict current and future tumour mass, respectively.
Final modelling was performed on the entire data set and assessed using leave-one-out cross-validation; predicted tumour mass was plotted against actual tumour mass (y-axis) in scatter plots, with dot size representing actual end-point tumour mass to show how this relates to any clusters, and the linear relationship was assessed using Pearson's correlation coefficient (r) with associated two-tailed p-values (d and h). Using the top-5 ranked features at each time point, a correlation matrix was constructed, displaying all pairwise bivariate plots with loess curve fitting (lower-left half), feature names and distributions (diagonal), and Spearman's correlation coefficient (rs) with associated p-values to test for monotonic relationships (upper-right half, colour-scaled by rs for values with p < 0.05) (i). A distance matrix of the absolute rs values (|rs|) from the correlation matrix was calculated, the distances were plotted in 2D using multidimensional scaling (j), and a model of the interactions is summarised in (k).
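For reference, the regression indicators and the Pearson coefficient named in this legend can be computed as below. This minimal Python sketch uses hypothetical tumour masses and predictions; it omits the two-tailed p-value, which in practice would come from a statistics library (for example `scipy.stats.pearsonr`).

```python
import math


def regression_scores(actual, predicted):
    """MSE, MAE and RMSE (lower is better) and R2 (higher is better)."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_a = sum(actual) / n
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    mse = ss_res / n
    return {
        "MSE": mse,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(mse),
        "R2": 1 - ss_res / ss_tot,  # coefficient of determination
    }


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


actual = [0.2, 0.4, 0.6, 0.8]         # hypothetical end-point tumour masses (g)
predicted = [0.25, 0.35, 0.65, 0.75]  # hypothetical leave-one-out predictions
scores = regression_scores(actual, predicted)
r = pearson_r(predicted, actual)
```

Here every prediction is off by 0.05 g, so MAE and RMSE are both 0.05 and R2 is 0.95; r (about 0.98) measures only how linearly predicted and actual masses track each other, not the size of the errors, which is why both kinds of indicator are reported.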

When testing whether D7 blood immune features could predict future (D14) CT26 tumour mass, modelling performance peaked with a minimum of 10 features (S6 Fig and Fig 3E). While myeloid cells were an important feature, several plasma immune factors, notably CCL17, CXCL10, CXCL1 and CXCL13, also had high importance (Fig 3F and 3G). However, the SHAP explanations showed that extreme values of many of these features in only a few animals influenced the model, suggesting a poor general association with tumour size (Fig 3G). The final Random Forest model built with these 10 features to predict future CT26 tumour mass gave predictions with a significant, moderate linear correlation with the actual mass (Fig 3H).
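SHAP values build on Shapley values from cooperative game theory: each feature's contribution to one prediction is its average marginal effect over all subsets of the other features. The minimal Python sketch below computes exact Shapley values by brute-force enumeration for a tiny hypothetical model, to illustrate the concept only. Replacing absent features with a single baseline value is an assumption, and this is not the tree-specific algorithm that SHAP implementations typically use for Random Forests, which avoids this exponential-time enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(coalition):
        # features outside the coalition are replaced by their baseline value
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # hypothetical model with an interaction between features 0 and 2
    return 2 * z[0] + z[1] + 0.5 * z[0] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The interaction term contributes 1.5 to this prediction and is split equally between features 0 and 2, giving phi of approximately [2.75, 2.0, 0.75]; these sum to f(x) minus f(baseline) = 5.5, the additivity property that lets SHAP values explain an individual animal's prediction. It also shows how a single extreme feature value can dominate one animal's explanation.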

CT26 tumour mass prediction modelling suggests several key blood immune features associate with tumour development

SHAP values of immune features predicting CT26 tumour mass suggest that several features have a relationship with tumour size that together allow moderately strong tumour mass predictions. To investigate this in more detail, and possibly infer some immune mechanisms supporting tumour growth, we plotted a correlation matrix of the 5 key predictive features from both D7 and D14 blood samples and reported their monotonic relationships using Spearman's correlation coefficient (Fig 3I). While there were significant weak-to-strong relationships among the 5 key D7 features, only CXCL10 had a significant, but weak, relationship with end-point CT26 tumour mass. In contrast, 4 of the 5 key D14 features had significant direct associations with tumour mass and with one another. The relationship between the key D14 and D7 features was complex, with both significant negative and significant positive relationships (Fig 3I). Generally, CCL17 correlated weakly and positively with early factors of tumour growth (CXCL10), but weakly and negatively with late factors of tumour growth (myeloid cell populations); CXCL1 and CCL2 behaved like CCL17 in this respect. To summarise these interactions, distances based on the absolute values of the Spearman's correlations were plotted using multidimensional scaling, which shows the global relationships among the features and tumour size in 2 dimensions (Fig 3J). This emphasised the close association of D14 neutrophils, PD-L1+ myeloid cells and Ly6Cintermediate monocytes with CT26 tumour size, and the more distant relationship of D7 myeloid cells, CCL17, CXCL1, CCL2 and CXCL10 levels, and D14 IL-10 level. From this we postulate that CCL17, CXCL1 and CCL2 act early, and in a similar way, to help tumour growth indirectly, possibly by upregulating CXCL10 production and myeloid cell expansion, which act more directly on tumour growth.
These correlations change over time, with early low expression of CCL17, CXCL1 and CCL2 eventually promoting myeloid cell development that maintains or promotes larger tumours. A possible model of blood immune features associated with CT26 tumour growth is depicted in Fig 3K.
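The Spearman-based distance matrix summarised in Fig 3J can be sketched as follows. This is a hypothetical Python illustration: Spearman's rs is computed as Pearson's r on average ranks (so tied values are handled), and the distance 1 - |rs| is an assumption, since the text states only that distances were derived from |rs|. Feature names and values are invented.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)  # undefined for a constant feature


def abs_rs_distances(features):
    """Distance matrix 1 - |rs| between every pair of features (assumed form)."""
    names = list(features)
    dist = [[1 - abs(spearman_rs(features[a], features[b])) for b in names]
            for a in names]
    return names, dist


features = {  # hypothetical values for five animals
    "CCL17": [1.0, 2.0, 3.0, 4.0, 5.0],
    "CXCL10": [2.0, 4.0, 6.0, 8.0, 10.0],
    "TumourMass": [0.1, 0.3, 0.2, 0.5, 0.4],
}
names, dist = abs_rs_distances(features)
```

Taking the absolute value places strongly negative and strongly positive correlates equally close together, so such a plot shows how tightly features are linked but not the direction of the link; direction has to be read from the correlation matrix itself.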

Model fitting of 4T1 tumour size using blood immune signatures resulted in strong predictability

To determine whether 4T1 tumour growth could also be predicted from blood immune phenotype, a workflow similar to that above was used. First, we tested whether D14 blood features could predict current 4T1 tumour mass. Random Forest modelling was stable, with similar scores in the training and test data sets across a range of feature numbers, and scores peaked with 3–5 features (S7 Fig and Fig 4A). Myeloid cells and neutrophils ranked highest in modelling, and high values of these were associated with larger tumours (Fig 4B and 4C). B cell count was also among the 3 top-ranked features and, generally, lower B cell numbers correlated with higher 4T1 tumour mass (Fig 4B and 4C). The next highest-ranked feature, PD-L1+ myeloid cells, had a more complex relationship with the model, with low numbers of these cells associated with both high and low tumour size. Using the top 3 features in the final model gave predictions with a significant, strong linear relationship with actual current 4T1 tumour mass (Fig 4D).

Fig 4. Predicting 4T1 tumour size and growth using blood immune phenotypes.


Normalised blood immune features (S4 Fig) from 58 4T1-bearing animals that had both D7 and D14 blood samples (Fig 1B) were used in Random Forest modelling to predict D14 4T1 tumour size. Modelling and assessment were performed as described in Fig 3. Initially, D14 tumour size was the target and D14 blood samples were used, to assess whether blood immune features could predict current tumour size (a, b, c and d). Then D14 tumour size was the target and D7 blood samples were used, to assess whether blood immune features could predict future tumour size (e, f, g and h). Model performance is summarised for the 60%:40% training:testing split, and equality of test and train performance score means (using the top assigned features) was assessed using ANOVA (a and e). The Random Forest feature importance scores for regression of the top-10 features are shown in (b) and (f), and their SHAP values in (c) and (g). Based on peak modelling performance, the top-3 features from D14 blood data (d) and D7 blood data (h) were used to generate the final regression models to predict current and future tumour mass, respectively. Final modelling was performed on the entire data set and assessed using leave-one-out cross-validation; predicted tumour mass was plotted against actual tumour mass (y-axis) in scatter plots, with dot size representing actual end-point tumour mass to show how this relates to any clusters, and the linear relationship was assessed using Pearson's correlation coefficient (r) with associated two-tailed p-values (d and h).
Using the top-ranked features at each time point, a correlation matrix was constructed, displaying all pairwise bivariate plots with loess curve fitting (lower-left half), feature names and distributions (diagonal), and Spearman's correlation coefficient (rs) with associated p-values to test for monotonic relationships (upper-right half, colour-scaled by rs for values with p < 0.05) (i). A distance matrix of the absolute rs values (|rs|) from the correlation matrix was calculated, the distances were plotted in 2D using multidimensional scaling (j), and a model of the interactions is summarised in (k).

When testing whether D7 features could predict future 4T1 tumour mass at D14, Random Forest modelling performance peaked with ~3 key features (S7 Fig and Fig 4E). The 3 main model drivers were plasma G-CSF, CXCL13 and IL-6 levels, with higher plasma amounts of these factors generally associated with larger 4T1 tumours (Fig 4F and 4G). Using these 3 features in the final model gave predictions with a significant, strong linear relationship with actual future 4T1 tumour mass (Fig 4H).
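The leave-one-out scheme used to assess the final models (each animal's tumour mass is predicted by a model trained on all the other animals) can be sketched generically. In this minimal Python sketch the fit/predict pair is a one-feature least-squares line, a deliberately simple stand-in for the Random Forest; all function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


def leave_one_out_predictions(xs, ys, fit, predict):
    """Predict each sample with a model fitted to all the other samples."""
    preds = []
    for i in range(len(xs)):
        # hold out sample i, train on the rest, then predict the held-out sample
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(predict(model, xs[i]))
    return preds


# hypothetical D7 plasma G-CSF fold-changes and D14 tumour masses (g)
gcsf = [1.0, 2.0, 3.0, 4.0, 5.0]
mass = [0.1, 0.3, 0.5, 0.7, 0.9]
loo_preds = leave_one_out_predictions(gcsf, mass, fit_line, predict_line)
```

Because every prediction comes from a model that never saw that animal, the resulting predicted-versus-actual scatter (scored with Pearson's r, as in Fig 3 and 4) is a fairer estimate of performance on new animals than refitting on all data; with these perfectly linear toy values the held-out predictions simply recover the actual masses.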

4T1 tumour mass prediction modelling suggests a few key blood immune features associate with tumour development

SHAP values of immune features predicting 4T1 tumour mass suggest that 6–7 features have a relationship with tumour size and together have strong predictive value for tumour mass. A correlation matrix was plotted of the 7 key features collectively from D7 and D14 blood samples, and their monotonic relationships were reported using Spearman's correlation coefficient (Fig 4I). These interactions were also summarised using multidimensional scaling to plot the distance matrix of the absolute values of the Spearman's correlations (Fig 4J). From this, plasma G-CSF level appeared to be directly associated with 4T1 tumour mass and with blood neutrophil count; the latter was also directly associated with 4T1 tumour growth. Plasma CXCL13 level also had a direct positive association with 4T1 tumour growth, but did not appear to correlate with plasma G-CSF level or myeloid cell counts. In contrast, plasma IL-6 level had no direct association with 4T1 tumour size, but correlated positively with factors that did, namely plasma G-CSF and CXCL13 levels. The role of B cells and PD-L1+ myeloid cells is unclear from monotonic measures, suggesting that if they do have a role, it is more complex. From this, we postulate a model (Fig 4K) in which IL-6 promotes CXCL13 and G-CSF production, which may act independently to aid 4T1 growth, and G-CSF also promotes the neutrophil expansion that supports 4T1 tumour growth.
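The multidimensional scaling step (Fig 3J and 4J) turns a feature-by-feature distance matrix into 2D coordinates. The text does not say which MDS variant was used; the minimal Python sketch below assumes classical (Torgerson) MDS, which double-centres the squared distances and takes the top eigenvectors, found here by shifted power iteration with deflation so that no external libraries are needed. The function name and parameters are hypothetical. For distances that come from points in a plane, the coordinates reproduce them up to rotation and reflection; for correlation-based distances the 2D layout is an approximation.

```python
import math


def classical_mds(dist, dims=2, iters=1000):
    """Classical (Torgerson) MDS of a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    means = [sum(row) / n for row in sq]  # row means equal column means (symmetric)
    grand = sum(means) / n
    # double-centred Gram matrix B = -1/2 * J D^2 J
    b = [[-0.5 * (sq[i][j] - means[i] - means[j] + grand) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Gershgorin bound: adding it makes every eigenvalue non-negative, so
        # power iteration converges to the largest (not most negative) one
        shift = max(sum(abs(v) for v in row) for row in b)
        if shift == 0:
            break
        vec = [1.0 + 0.1 * i for i in range(n)]  # arbitrary deterministic start
        for _ in range(iters):
            w = [sum(b[i][j] * vec[j] for j in range(n)) + shift * vec[i]
                 for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                return coords
            vec = [x / norm for x in w]
        lam = sum(vec[i] * b[i][j] * vec[j] for i in range(n) for j in range(n))
        if lam <= 1e-9:
            break  # no further positive dimension to embed
        for i in range(n):
            coords[i][d] = vec[i] * math.sqrt(lam)
        # deflate so the next pass finds the next-largest eigenvector
        b = [[b[i][j] - lam * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return coords


# Points at (0, 0), (3, 0) and (0, 4) give a 3-4-5 triangle of distances
dist = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
coords = classical_mds(dist)
```

Applied to a correlation-derived distance such as 1 - |rs| (itself an assumed form), strongly linked features land close together in the plot, which is the sense in which features sit "near" or "distant from" tumour size in Fig 3J and 4J.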

Summary of key tumour mass associated features

From the above analysis, there were a number of features that were important in modelling predictions for tumour growth and that associated directly or indirectly with specific tumour size. The estimated quantities of these features in blood and their comparisons between the models are summarised in Fig 5. From these pairwise comparisons it is apparent that most of the blood features that are important for modelling and correlating with CT26 growth, namely CCL17, CXCL10, total myeloid cells, CCL2, IL-10 and PD-L1+ myeloid cells, were not significantly different from the healthy levels in Nil mice (Fig 5). Indeed, of the identified important features for CT26 growth, only CXCL1, Ly6Cintermediate monocytes and neutrophils had quantities in CT26-bearing animals significantly different from normal blood of Nil animals, and in all cases higher than normal.

Fig 5. Blood immune features associating with tumour growth.


The blood quantities of key immune features associated with CT26 and 4T1 tumour growth in D7 (early) and D14 (late) samples are plotted for animals with no tumours (Nil), CT26 tumours and 4T1 tumours, showing all values as well as box plots with min-to-max whiskers and means as '+' symbols. Features are divided into tumour-specific features and features common to both tumour subtypes. Sample numbers are as described in Fig 1B. Significance of differences between cohort means was assessed by 2-way ANOVA on Log(y + 0.0001)-transformed data with Tukey's multiple comparisons: *, p ≤ 0.05; **, p ≤ 0.01; ***, p ≤ 0.001; ****, p ≤ 0.0001.

In contrast, most features associated with 4T1 tumour growth differed significantly from normal levels in Nil animals: G-CSF level and neutrophil count were >10-fold higher, PD-L1+ myeloid cell count was ~2-fold higher, and CXCL13 and IL-6 levels were both somewhat less than 2-fold higher than normal (Fig 5). B cell number, an important early feature for 4T1 growth modelling, was the only key feature not significantly different from normal levels, although at D14 B cell numbers in these mice tended to be lower than normal.

Based on the modelling, only 2 main features contributed to growth in both tumour models: D14 blood neutrophil count and D14 PD-L1+ myeloid cell count (Fig 5). The unique features associated with each tumour were mostly plasma factors. Overall, the blood immune phenotype of 4T1-bearing mice was clearly abnormal, with a few obviously aberrant immune parameters, whereas CT26-bearing mice had less drastic changes, making inference of key immune factors more difficult without further study.

Discussion

In this study we aimed to investigate the utility of a high-throughput multiparameter flow cytometry method, coupled with a machine learning (ML)-based statistical analysis, to screen blood for immune features capable of predicting cancer presence and growth, and also make inferences about underlying cancer-immune biology. Using two syngeneic solid tumour models, a 4T1 breast cancer model and a CT26 colorectal cancer model, our workflow revealed that myeloid factors in the blood, such as neutrophils, monocytes and the levels of the myeloid cell-propagator G-CSF, feature prominently as key determinants of tumour classification (Fig 2). Myeloid cells, specifically neutrophils and PD-L1-expressing myeloid cells, were also common associates of tumour size in both models (Fig 5). Tumour-specific blood immune features were also identified, with elevated levels of G-CSF, IL-6 and CXCL13, and B cell counts associating with prediction of 4T1 growth, while blood CCL17, CXCL10, CXCL1, total myeloid cells, CCL2, Ly6Cintermediate monocytes, and IL-10 levels were involved with predicting CT26 tumour growth. Many of these factors have been implicated in cancer progression showing the potential utility of our approach.

With a growing appreciation of immune responses as a hallmark of cancer development, immune phenotyping is an area of increasing interest in cancer management research [2]. ML is recognised as an important approach to optimising future cancer diagnosis, prognosis and treatment personalisation, and is well suited to interpreting the abundant and complex immune parameters involved [14]. With the development of model explanation algorithms, ML approaches can also help in making inferences about the biological mechanisms underlying the modelled outcomes [8]. In this study, we chose the Random Forest model [13] as our learner because it is flexible (it can be used for both classification and regression), has in-built feature ranking (which helps with feature selection), has fewer overfitting issues than some other models, is relatively interpretable, and performs well in real-life clinical applications compared with other shallow models and with more extensive deep learning approaches [15, 16]. The Random Forest modelling presented here identified several key blood immune features that, in combination, predicted tumour class (misclassifying only 4 of 130 animals) and size (with moderate to strong linear correlation between predicted and actual current and future tumour sizes). In addition, we combined Random Forest feature ranking [13, 17], SHAP explanatory values [8] and Spearman's-based bivariate correlations [18] to make inferences about the features underlying these outcomes. Intriguingly, while these factors ranked highly in predictive modelling, and several correlated significantly, directly or indirectly, with tumour growth, many did not differ significantly from levels measured in non-tumour-bearing animals.
This raises the possibility of additive or even synergistic roles for these factors in tumour development, although chance association cannot be discounted. The latter possibility can only be probed with further experiments, such as in vivo blocking and/or knockdown or knockout studies of the identified key features. While this is beyond the scope of the current study, we note that independent reports support a role for these factors in cancer development, as discussed below.

One of the most upregulated factors we identified as a potential early driver of 4T1 growth was G-CSF. Previous observations have shown that 4T1 tumour cells are potent producers of G-CSF [19, 20] and that abrogating G-CSF production significantly diminishes tumour growth in preclinical breast cancer models [19]. We also showed that elevated neutrophil counts (annotated as CD11b+Ly6G+ cells) strongly correlated with advancing tumours (Fig 5). Previous reports show that 4T1 tumour cells induce profound granulocytosis in vivo [9, 21], and separate reports reveal a critical role for G-CSF in 4T1 growth and metastasis through changes in granulocyte frequencies (referred to in those reports as myeloid-derived suppressor cells, MDSCs, which can have a CD11b+Ly6G+ phenotype) [22]. Clinically, G-CSF can be significantly higher in the plasma of breast cancer patients, and plasma levels correlate with more advanced disease [23], as do blood neutrophil levels [24]. Intriguingly, IL-6, another early signature of 4T1 growth that we identified, cooperates with G-CSF to promote the pro-tumour function of neutrophils [25]. IL-6 is often associated with the tumour microenvironment [26] and, clinically, circulating IL-6 level is associated with poor prognosis and low survival in patients with breast cancer [27], while IL-6 polymorphisms are linked to increased breast cancer risk [28]. Thus, IL-6 and G-CSF may act in concert on neutrophil function to promote breast cancer growth.

We also identified CXCL13 as another early factor correlating with 4T1 tumour growth, and its role in breast cancer has been widely reported [29–31]. However, published studies conflict regarding its role in the 4T1 model, with support for both pro-tumour [32] and anti-tumour activity [33]. Indeed, CXCL13 has generally been shown to drive growth and invasive signals in many tumours, but it also correlates with improved survival in other tumours [34], suggesting a context-dependent role for this cytokine in cancer progression. A further intriguing aspect of CXCL13 biology is that it acts as a chemoattractant for B cells [34], which were also identified as an important feature of 4T1 tumour growth in our analysis. The contribution of B cells to antitumour immunity remains controversial [35], with both pro- and anti-tumour effects reported. In addition, bone marrow endothelial cells produce CXCL13 in response to IL-6 [36], which is also known to be a B cell differentiation and activation factor [34]. Based on these reports and our data, we can formulate a model in which these factors potentiate breast cancer growth (Fig 4K). Here, IL-6, promoted by the tumour microenvironment, may act in concert with G-CSF to drive neutrophil protumour activity and also CXCL13 production. CXCL13 may then act as a protumour factor and, together with IL-6, promote B cell responses that also affect tumour growth. Finally, we identified a PD-L1-expressing myeloid population, a third top feature of our 4T1 ML model (Fig 4I), which correlates with circulating B cell number and thus may also act indirectly to support tumour growth.
While circulating PD-L1-expressing myeloid populations are less well documented than the factors described above, it has been reported in lung cancer that response to PD-1/PD-L1 blockade correlated with systemic PD-L1+ CD11b+ myeloid cell frequency, suggesting potential for stratification based on systemic PD-L1+ myeloid cell subsets [37]. Further study of these cells is warranted.

In the CT26 tumour model, we identified early levels of CXCL1, CCL2 and CCL17 as important predictors of tumour growth; these factors also had similar pairwise correlations with one another and with factors directly correlating with tumour growth (Fig 3I), suggesting they play similar roles in this context. CXCL1 is known to promote recruitment and activation of neutrophils [38], premetastatic niche formation [39], tumour invasive potential [40] and tumorigenicity in metastatic colorectal cancer patients [41], and therefore, not surprisingly, serves as a biomarker for poor prognosis. Similarly, CCL2 promotes the recruitment of immunosuppressive tumour-associated macrophages [42], promotes CT26 tumour growth [43] and is associated with poor outcomes in metastatic human colorectal cancer [42]. In contrast, CCL17 has been reported to play a complex and somewhat contradictory role in cancer development and progression [44]. CCL17 can promote an anti-CT26 tumour immune response [45], and high serum levels are associated with improved survival in advanced melanoma patients [46]. On the other hand, tumour-associated neutrophils can produce CCL17, recruiting CD4 T regulatory cells that promote immune evasion and cancer development in non-small cell lung cancer [47, 48]. It is possible that the location, timing and context of CCL17 expression determine its impact on cancer establishment and progression. Indeed, this may also be the case for CXCL1 and CCL2, since all three factors were positively associated with early correlates of CT26 growth, such as blood myeloid cells and plasma CXCL10 levels, but negatively associated with late correlates of CT26 growth, such as monocytes with a Ly6G-Ly6Cintermediate phenotype and neutrophils (Fig 3).

CXCL10 was identified as an early, weak correlate of CT26 growth in our analysis. Clinically, CXCL10 has been associated with both pro- and anti-tumour responses in colorectal cancer patients [49, 50]. A recent study of 3,763 colorectal cancer patients suggested that lower CXCL10 expression was significantly associated with disease spread, recurrence and overall survival, and that this association depended on other factors such as age and population-based genetic differences [50]. This suggests that CXCL10 expression may have potential as a predictive biomarker in colorectal cancer management once these variables are taken into account. Similarly, IL-10, a feature involved in predicting CT26 growth in our analysis, is also associated with colorectal cancer patient prognosis in a context-dependent manner: it is generally lower in patients than in controls, but higher in patients with poor prognosis [51].

While several myeloid cell populations were identified as late correlates of CT26 growth, the late appearance of monocytes with a Ly6G-Ly6Cintermediate phenotype had the strongest association with tumour size (Fig 3). Tumour monocyte subsets are known to have diverse roles in tumour progression [52]. Related to this, CCL2 is a primary recruiter of tumour monocyte subsets [52] and CXCL10 is known to be a monocyte recruitment factor [53]. In our study, early levels of CCL2 and CXCL10 were associated with one another, and early CXCL10 levels had a weak correlation with the later appearance of Ly6G-Ly6Cintermediate monocytes. Based on these observations and reports by others, a potential model for the role of the identified key blood immune factors in colorectal cancer development can be postulated (Fig 3K). Here, early production of CCL2 and CXCL1 may help shape the initial myeloid cell compartment in cancer-bearing individuals, which promotes tumour development and the production of CCL17 and CXCL10 [54], which in turn modulate leukocyte recruitment. The early soluble factors may then help shape later tumour-associated factors, such as IL-10, neutrophils, PD-L1-expressing myeloid cells and Ly6G-Ly6Cintermediate monocytes, which play roles in tumour development.

Undoubtedly, our work is limited by the choice of models used to develop the workflow. While murine syngeneic cell line models are among the most widely used tools for studying cancer [55, 56] and have been involved in landmark discoveries [57, 58], this approach has several limitations. Cell line-derived models are non-autochthonous, and thus may lack the normal architecture or development of tumours evolving de novo. Indeed, injection of the cell lines may itself alter the inflammatory environment in a way that would not be seen in de novo tumour growth [59]. The loss of genetic heterogeneity and the irreversible changes in gene expression resulting from long-term in vitro propagation of tumour cell lines may also mean that we do not observe the level of intra-individual heterogeneity that is common in human tumours [56, 60]. Furthermore, the use of inbred mouse strains does not reflect the vast inter-individual heterogeneity present in the clinic [56]. While we have attempted to overcome some of these issues by using two distinct and diverse cell lines, increasing this diversity with additional models would clearly be beneficial, resources permitting. Nevertheless, there is clinical evidence to support our findings (as discussed above), and thus our study provides an approach that may work clinically, which is the ultimate goal. Although beyond the scope of the current study, the workflow developed here is now being modified for clinical implementation in cancer patients. This will involve initial high-dimensional screens (using protein arrays and LEGENDScreen™ technologies) to identify blood cell and plasma features that may be associated with cancer-specific progression.
Key features will then be rationalised in a high-throughput assay/machine learning pipeline analogous to that reported here and used to phenotype blood of cancer patients and closely matched healthy controls to assess capacity to predict patient outcomes over time.

In summary, our work demonstrates the benefit of a high-dimensional data pipeline for the identification of key immune features that interact with tumour development. Our analysis has highlighted the great complexity in the relationship between the immune response and tumour development, where expression of a single molecule may well be insufficient to predict or explain tumour progression. Indeed, it is clear that many immune factors have context-dependent roles in cancer development [34, 44, 61]. With this in mind, we believe a multivariate approach to “biomarker” identification for use in the prognostication and treatment personalisation of cancer is well warranted. Furthermore, we are confident that this work demonstrates the utility of an immune-based workflow in combination with ML to enable identification of context-dependent predictive immune features for the study of tumour outcome. It will be of further interest if such an approach can be utilised to predict treatment outcomes, justifying a role for assessing multivariate immune biomarkers for cancer treatment personalisation.

Supporting information

S1 Fig. Gating and population names for leukocyte subsets.

FlowJo software was used to delineate leukocyte populations using manual and Boolean gates on concatenated samples, with the scheme shown in (a) acting as a template for the entire study. FIt-SNE plots of concatenated live CD45+ samples, generated with default FlowJo settings, were overlaid with each manually gated population to ensure that the gating scheme produced populations similar to those identified by the unsupervised approach (b), with each manually gated population identified by colour and name (c). The process was refined until the two approaches closely approximated each other, resulting in the manual gates displayed in (a). Generic and short-form names of each population were then assigned based on marker expression and used throughout the manuscript (d).

(TIFF)

S2 Fig. Gating and names for LEGENDplex beads.

FlowJo software was used to delineate LEGENDplex bead populations using manual gates for both the Macrophage/microglial (Mac/Mic) 13-plex LEGENDplex kit (a) and the Proinflammatory (Proinflam) 13-plex LEGENDplex Kit (Biolegend) (b), which acted as a template for the entire study.

(TIFF)

S3 Fig. Random Forest learning curve for data set size to model performance evaluation.

Normalised blood immune features (S4 Fig) from the 130 animals that had both day 7 and day 14 blood samples (Fig 1B) were used in Random Forest modelling to predict the presence of tumour and tumour subtype (target classes: Nil, 4T1 and CT26). Modelling was performed on progressively smaller numbers of randomly selected samples, and model performance was assessed using cross-validation, training on a random 80% of the data and testing on the remaining data, repeated 100 times. Model performance was assessed using several classification indicators: area under the receiver operating characteristic curve (AUC; separability of the classes), classification accuracy (CA; proportion of correct classifications), precision (correct positive predictions as a proportion of all positive predictions), recall (correct positive predictions as a proportion of actual positives) and F1 score (harmonic mean of precision and recall), all ranging from 0 to 1, with higher values indicating better performance.

(TIFF)

S4 Fig. Normalised blood immune phenotypes in animal tumour models.

CT26 or 4T1 tumours were grown in female BALB/c mice and the blood immune phenotype was determined at D7 and D14, as described in Fig 1. Animals with no tumours (Nil) served as normal immune phenotype controls. A total of 180 animals were included in the study, divided into the groups indicated in Fig 1B. A 20 μl blood sample from each animal at each time point was phenotyped for leukocyte populations and plasma analytes (Fig 1). Cell and plasma analyte values were reported as fold changes relative to the mean of the Nil mice ("Nil-normalised"), as described in the methods, for both the D7 and D14 time points, and are presented on a log scale (with the numbers of 0-value data points indicated on the axis). Means and SEM are indicated (in yellow). Equality of means was tested by 2-way ANOVA on log(y + 0.0001)-transformed data with Tukey's multiple comparisons correction, and both the 2-way ANOVA and multiple comparison p-values are indicated (*, p ≤ 0.05; **, p ≤ 0.01; ***, p ≤ 0.001; ****, p ≤ 0.0001). Heatmap summaries highlighting the changes are also shown. Three analytes, CCL22, CXCL1 and CCL17, were measured in both LEGENDplex kits and are labelled (1) if from the Mac/Mic panel or (2) if from the Proinflam panel.
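The Nil normalisation and log transform described above can be sketched as follows. This sketch assumes that each analyte is normalised separately at each time point against the Nil mice from the same time point, and that "Log" denotes the base-10 logarithm (the base is not stated); the function names are hypothetical.

```python
import math
from statistics import mean


def nil_normalise(values, nil_values):
    """Fold change of each value relative to the mean of the Nil (tumour-free) group."""
    nil_mean = mean(nil_values)
    if nil_mean == 0:
        raise ValueError("Nil mean is zero, so fold change is undefined")
    return [v / nil_mean for v in values]


def log_transform(values, offset=0.0001):
    """log10(y + offset): the small offset keeps zero-valued data points finite."""
    return [math.log10(v + offset) for v in values]
```

Under these assumptions, an animal whose analyte level equals the Nil mean has a fold change of 1, and a 0-value data point maps to log10(0.0001) = -4 rather than to negative infinity, which is why such points can still be shown on the log-scale axis.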

(TIFF)

S5 Fig. Tumour classification Random Forest modelling using blood immune phenotype.

Normalised blood immune features (S4 Fig) from the 130 animals that had both D7 and D14 blood samples (Fig 1B) were used in Random Forest modelling to predict the presence of a tumour and the tumour subtype (target classes: Nil, 4T1 and CT26). The model was trained on 80% (a) or 60% (b) of randomly selected data, cross-validated using leave-one-out, and tested on the remaining data. Modelling was repeated on progressively smaller feature sets, removing features in order from lowest to highest rank according to the in-built Random Forest importance for class determination, and the process was repeated 3 times. Model performance was assessed using several classification metrics: area under the receiver operating characteristic curve (AUC; separability of the classes), classification accuracy (CA; proportion of correct classifications), precision (correct positive predictions as a proportion of all positive predictions), recall (correct positive predictions as a proportion of all actual positives), and F1 score (harmonic mean of precision and recall). All metrics range from 0 to 1, with higher values indicating better performance. SHapley Additive exPlanations (SHAP) feature importance scores for classification using the top-15 features (ranked from highest to lowest SHAP value) are shown in (c); they show how feature values affect the classification of each animal cohort, namely the healthy control (Nil), CT26-bearing and 4T1-bearing cohorts.
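The precision, recall, F1 and accuracy indicators defined above can be computed per class in a one-vs-rest fashion, as in the sketch below. It is illustrative only: the function names are hypothetical, AUC is not computed, and how per-class values were averaged across the Nil, 4T1 and CT26 classes is not stated here, so no averaging is applied.

```python
def per_class_scores(y_true, y_pred, classes):
    """One-vs-rest precision, recall and F1 for each class label."""
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))  # true positives
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))  # false positives
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def classification_accuracy(y_true, y_pred):
    """Proportion of animals assigned to their true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For instance, a class predicted with precision 2/3 and recall 1 has F1 = 2 × (2/3) × 1 / (2/3 + 1) = 0.8, below the arithmetic mean of 0.83, because the harmonic mean penalises imbalance between the two.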

(TIFF)

S6 Fig. Random Forest modelling to predict CT26 tumour size and growth using blood immune phenotypes.

Normalised blood immune features (S4 Fig) from 48 CT26-bearing animals that had both D7 and D14 blood samples (Fig 1B) were used in Random Forest modelling to predict CT26 tumour size at D14. The model was initially trained on 100%, 80% and 60% of randomly selected data, cross-validated using leave-one-out (Train panels), and tested on the remaining data (Test panels). Modelling was repeated on progressively smaller feature sets, removing features in order from lowest to highest rank according to the in-built Random Forest importance, and the process was repeated 3 times (mean and standard error of the mean shown). Model performance is summarised for the 60%:40% training:testing split, and the equality of the test and train performance score means (using the top-ranked features) was assessed by ANOVA, as shown in the main figure (Fig 3). The Random Forest rank (RF rank) scores for the top-10 features are shown. Model performance was assessed using several regression metrics: the error scores Mean Squared Error (MSE), Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), for which lower values are better, and the coefficient of determination (R²). D14 tumour size was used as the target with D14 blood samples to assess whether blood immune features could predict current tumour size (a), and with D7 blood samples to assess whether they could predict future tumour size (b).
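The regression indicators listed above follow standard definitions, sketched below in plain Python with a hypothetical helper name. Two properties are worth keeping in mind when reading the Test panels: RMSE is expressed in the same units as tumour size, and R² computed on held-out data can be negative when predictions are worse than simply predicting the mean tumour size.

```python
import math


def regression_scores(y_true, y_pred):
    """MSE, MAE, RMSE and R² for observed vs predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    mse = ss_res / n
    return {
        "MSE": mse,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(mse),
        "R2": 1 - ss_res / ss_tot,
    }
```

For example, observed sizes [1, 2, 3, 4] predicted as [1, 2, 3, 5] give MSE = MAE = 0.25, RMSE = 0.5 and R² = 0.8.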

(TIFF)

S7 Fig. Random Forest modelling to predict 4T1 tumour size and growth using blood immune phenotypes.

Normalised blood immune features (S4 Fig) from 58 4T1-bearing animals that had both D7 and D14 blood samples (Fig 1B) were used in Random Forest modelling to predict 4T1 tumour size at D14. The model was initially trained on 100%, 80% and 60% of randomly selected data, cross-validated using leave-one-out (Train panels), and tested on the remaining data (Test panels). Modelling was repeated on progressively smaller feature sets, removing features in order from lowest to highest rank according to the in-built Random Forest importance, and the process was repeated 3 times (mean and standard error of the mean shown). Model performance is summarised for the 60%:40% training:testing split, and the equality of the test and train performance score means (using the top-ranked features) was assessed by ANOVA, as shown in the main figure (Fig 4). The Random Forest rank (RF rank) scores for the top-10 features are shown. Model performance was assessed using several regression metrics: the error scores MSE, MAE and RMSE, for which lower values are better, and the coefficient of determination (R²). D14 tumour size was used as the target with D14 blood samples to assess whether blood immune features could predict current tumour size (a), and with D7 blood samples to assess whether they could predict future tumour size (b).
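The feature-reduction procedure described above can be sketched as follows, assuming the importance ranking is computed once and features are then dropped one at a time from the bottom of that ranking; whether features were re-ranked after each removal is not stated here. A 1-nearest-neighbour regressor scored by leave-one-out mean absolute error stands in for the Random Forest, and all names are hypothetical.

```python
from statistics import mean


def loo_mae(rows, target, features):
    """Leave-one-out MAE of a 1-nearest-neighbour regressor (stand-in for Random Forest).

    rows: list of dicts mapping feature and target names to numbers (one per animal).
    """
    errors = []
    for i, held_out in enumerate(rows):
        train = [r for j, r in enumerate(rows) if j != i]  # every animal except one
        nearest = min(
            train,
            key=lambda r: sum((r[f] - held_out[f]) ** 2 for f in features),
        )
        errors.append(abs(nearest[target] - held_out[target]))
    return mean(errors)


def backward_elimination(ranked_features, score_fn, min_features=1):
    """Score progressively smaller feature sets, dropping the lowest-ranked feature each step.

    ranked_features: names ordered from highest to lowest importance (ranked once).
    score_fn: callable that takes a list of feature names and returns a score.
    Returns a list of (number_of_features, score) pairs.
    """
    if min_features < 1:
        raise ValueError("min_features must be at least 1")
    results = []
    current = list(ranked_features)
    while len(current) >= min_features:
        results.append((len(current), score_fn(current)))
        current = current[:-1]  # remove the lowest-ranked remaining feature
    return results
```

A call such as `backward_elimination(ranking, lambda fs: loo_mae(rows, "size", fs))` traces error against feature count. As a design note, computing the ranking inside each training fold, rather than once on all animals, avoids optimistically biased performance estimates.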

(TIFF)

S1 Table. List of antibodies for cell surface labelling.

(DOCX)

S1 File. All raw data.

(XLSX)

S2 File. RF pipeline classification.

(OWS)

S3 File. RF pipeline regression.

(OWS)

S4 File. MDS pipeline for regression.

(OWS)

Acknowledgments

We wish to acknowledge Mick Devoy and Dr Harpreet Vohra for their expert help with flow cytometry.

Data Availability

All relevant data are within the manuscript, its Supporting Information files and/or held in the Australian National University (ANU) DATA COMMONS repository at https://dx.doi.org/10.25911/6153a8ab5747c (which has the raw Flow Cytometry Standard (FCS) files).

Funding Statement

This work was partially supported by the Radiation Oncology Private Practice Trust Fund, Canberra Health Services, Canberra, Australia. The funder provided support in the form of salaries and/or research materials for authors B.J.C.Q., D.A.S.D., S.M., J.S., F.M.S., I.I.A. but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Parsons B. L., “Multiclonal tumor origin: Evidence and implications,” Mutat. Res. Rev. Mutat. Res., vol. 777, pp. 1–18, Sep. 2018, doi: 10.1016/j.mrrev.2018.05.001
  • 2. Hiam-Galvez K. J., Allen B. M., and Spitzer M. H., “Systemic immunity in cancer,” Nat. Rev. Cancer, vol. 21, no. 6, pp. 345–359, Jun. 2021, doi: 10.1038/s41568-021-00347-z
  • 3. Allen B. M. et al., “Systemic dysfunction and plasticity of the immune macroenvironment in cancer models,” Nat. Med., vol. 26, no. 7, pp. 1125–1134, Jul. 2020, doi: 10.1038/s41591-020-0892-6
  • 4. Murciano-Goroff Y. R., Warner A. B., and Wolchok J. D., “The future of cancer immunotherapy: microenvironment-targeting combinations,” Cell Res., vol. 30, no. 6, pp. 507–519, Jun. 2020, doi: 10.1038/s41422-020-0337-2
  • 5. Hegde P. S. and Chen D. S., “Top 10 Challenges in Cancer Immunotherapy,” Immunity, vol. 52, no. 1, pp. 17–35, Jan. 2020, doi: 10.1016/j.immuni.2019.12.011
  • 6. Dranoff G., “Cytokines in cancer pathogenesis and cancer therapy,” Nat. Rev. Cancer, vol. 4, no. 1, Art. no. 1, Jan. 2004, doi: 10.1038/nrc1252
  • 7. Ozga A. J., Chow M. T., and Luster A. D., “Chemokines and the immune response to cancer,” Immunity, vol. 54, no. 5, pp. 859–874, May 2021, doi: 10.1016/j.immuni.2021.01.012
  • 8. Lundberg S. M. et al., “Explainable machine-learning predictions for the prevention of hypoxaemia during surgery,” Nat. Biomed. Eng., vol. 2, no. 10, pp. 749–760, Oct. 2018, doi: 10.1038/s41551-018-0304-0
  • 9. DuPré S. A., Redelman D., and Hunter K. W., “The mouse mammary carcinoma 4T1: characterization of the cellular landscape of primary tumours and metastatic tumour foci,” Int. J. Exp. Pathol., vol. 88, no. 5, pp. 351–360, Oct. 2007, doi: 10.1111/j.1365-2613.2007.00539.x
  • 10. Schrörs B. et al., “Multi-Omics Characterization of the 4T1 Murine Mammary Gland Tumor Model,” Front. Oncol., vol. 10, p. 1195, Jul. 2020, doi: 10.3389/fonc.2020.01195
  • 11. Yu J. W. et al., “Tumor-immune profiling of murine syngeneic tumor models as a framework to guide mechanistic studies and predict therapy response in distinct tumor microenvironments,” PLoS ONE, vol. 13, no. 11, Nov. 2018, doi: 10.1371/journal.pone.0206223
  • 12. Zhong W. et al., “Comparison of the molecular and cellular phenotypes of common mouse syngeneic models with human tumors,” BMC Genomics, vol. 21, no. 1, Art. no. 1, Jan. 2020, doi: 10.1186/s12864-019-6344-3
  • 13. Breiman L., “Random Forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, Oct. 2001, doi: 10.1023/A:1010933404324
  • 14. Iqbal M. J. et al., “Clinical applications of artificial intelligence and machine learning in cancer diagnosis: looking into the future,” Cancer Cell Int., vol. 21, no. 1, p. 270, May 2021, doi: 10.1186/s12935-021-01981-1
  • 15. Misra D. et al., “Early Detection of Septic Shock Onset Using Interpretable Machine Learners,” J. Clin. Med., vol. 10, no. 2, p. 301, Jan. 2021, doi: 10.3390/jcm10020301
  • 16. Venugopalan J., Tong L., Hassanzadeh H. R., and Wang M. D., “Multimodal deep learning models for early detection of Alzheimer’s disease stage,” Sci. Rep., vol. 11, no. 1, p. 3254, Feb. 2021, doi: 10.1038/s41598-020-74399-w
  • 17. Fawagreh K., Gaber M. M., and Elyan E., “Random forests: from early developments to recent advancements,” Syst. Sci. Control Eng., vol. 2, no. 1, pp. 602–609, Dec. 2014, doi: 10.1080/21642583.2014.956265
  • 18. Mukaka M., “A guide to appropriate use of Correlation coefficient in medical research,” Malawi Med. J. J. Med. Assoc. Malawi, vol. 24, no. 3, pp. 69–71, Sep. 2012.
  • 19. Waight J. D., Hu Q., Miller A., Liu S., and Abrams S. I., “Tumor-Derived G-CSF Facilitates Neoplastic Growth through a Granulocytic Myeloid-Derived Suppressor Cell-Dependent Mechanism,” PLOS ONE, vol. 6, no. 11, p. e27690, Nov. 2011, doi: 10.1371/journal.pone.0027690
  • 20. Kowanetz M. et al., “Granulocyte-colony stimulating factor promotes lung metastasis through mobilization of Ly6G+Ly6C+ granulocytes,” Proc. Natl. Acad. Sci. U. S. A., vol. 107, no. 50, pp. 21248–21255, Dec. 2010, doi: 10.1073/pnas.1015855107
  • 21. Ouzounova M. et al., “Monocytic and granulocytic myeloid derived suppressor cells differentially regulate spatiotemporal tumour plasticity during metastatic cascade,” Nat. Commun., vol. 8, p. 14979, Apr. 2017, doi: 10.1038/ncomms14979
  • 22. Welte T. et al., “Oncogenic mTOR signaling recruits myeloid-derived suppressor cells to promote tumor initiation,” Nat. Cell Biol., vol. 18, no. 6, pp. 632–644, Jun. 2016, doi: 10.1038/ncb3355
  • 23. Liu L., Liu Y., Yan X., Zhou C., and Xiong X., “The role of granulocyte colony-stimulating factor in breast cancer development: A review,” Mol. Med. Rep., vol. 21, no. 5, pp. 2019–2029, May 2020, doi: 10.3892/mmr.2020.11017
  • 24. Koh C.-H. et al., “Utility of pre-treatment neutrophil–lymphocyte ratio and platelet–lymphocyte ratio as prognostic factors in breast cancer,” Br. J. Cancer, vol. 113, no. 1, pp. 150–158, Jun. 2015, doi: 10.1038/bjc.2015.183
  • 25. Yan B. et al., “IL-6 Cooperates with G-CSF To Induce Protumor Function of Neutrophils in Bone Marrow by Enhancing STAT3 Activation,” J. Immunol., vol. 190, no. 11, pp. 5882–5893, Jun. 2013, doi: 10.4049/jimmunol.1201881
  • 26. Kumari N., Dwarakanath B. S., Das A., and Bhatt A. N., “Role of interleukin-6 in cancer progression and therapeutic resistance,” Tumour Biol. J. Int. Soc. Oncodevelopmental Biol. Med., vol. 37, no. 9, pp. 11553–11572, Sep. 2016, doi: 10.1007/s13277-016-5098-7
  • 27. Bachelot T., Ray-Coquard I., Menetrier-Caux C., Rastkha M., Duc A., and Blay J.-Y., “Prognostic value of serum levels of interleukin 6 and of serum and plasma levels of vascular endothelial growth factor in hormone-refractory metastatic breast cancer patients,” Br. J. Cancer, vol. 88, no. 11, pp. 1721–1726, Jun. 2003, doi: 10.1038/sj.bjc.6600956
  • 28. Hefler L. A. et al., “Interleukin-1 and interleukin-6 gene polymorphisms and the risk of breast cancer in caucasian women,” Clin. Cancer Res. Off. J. Am. Assoc. Cancer Res., vol. 11, no. 16, pp. 5718–5721, Aug. 2005, doi: 10.1158/1078-0432.CCR-05-0001
  • 29. Biswas S., Roy Chowdhury S., Mandal G., Purohit S., Gupta A., and Bhattacharyya A., “RelA driven co-expression of CXCL13 and CXCR5 is governed by a multifaceted transcriptional program regulating breast cancer progression,” Biochim. Biophys. Acta Mol. Basis Dis., vol. 1865, no. 2, pp. 502–511, Feb. 2019, doi: 10.1016/j.bbadis.2018.12.002
  • 30. Chen L. et al., “The expression of CXCL13 and its relation to unfavorable clinical characteristics in young breast cancer,” J. Transl. Med., vol. 13, no. 1, p. 168, May 2015, doi: 10.1186/s12967-015-0521-1
  • 31. Panse J. et al., “Chemokine CXCL13 is overexpressed in the tumour tissue and in the peripheral blood of breast cancer patients,” Br. J. Cancer, vol. 99, no. 6, pp. 930–938, Sep. 2008, doi: 10.1038/sj.bjc.6604621
  • 32. Xu L., Liang Z., Li S., and Ma J., “Signaling via the CXCR5/ERK pathway is mediated by CXCL13 in mice with breast cancer,” Oncol. Lett., vol. 15, no. 6, pp. 9293–9298, Jun. 2018, doi: 10.3892/ol.2018.8510
  • 33. Ma Q., Chen Y., Qin Q., Guo F., Wang Y., and Li D., “CXCL13 expression in mouse 4T1 breast cancer microenvironment elicits antitumor immune response by regulating immune cell infiltration,” Precis. Clin. Med., vol. 4, no. 3, pp. 155–167, Sep. 2021, doi: 10.1093/pcmedi/pbab020
  • 34. Rubio A. J., Porter T., and Zhong X., “Duality of B Cell-CXCL13 Axis in Tumor Immunology,” Front. Immunol., vol. 11, p. 2283, 2020, doi: 10.3389/fimmu.2020.521110
  • 35. Yuen G. J., Demissie E., and Pillai S., “B lymphocytes and cancer: a love-hate relationship,” Trends Cancer, vol. 2, no. 12, pp. 747–757, Dec. 2016, doi: 10.1016/j.trecan.2016.10.010
  • 36. Singh S. et al., “Serum CXCL13 positively correlates with prostatic disease, prostate-specific antigen and mediates prostate cancer cell invasion, integrin clustering and cell adhesion,” Cancer Lett., vol. 283, no. 1, pp. 29–35, Sep. 2009, doi: 10.1016/j.canlet.2009.03.022
  • 37. Bocanegra A. et al., “PD-L1 Expression in Systemic Immune Cell Populations as a Potential Predictive Biomarker of Responses to PD-L1/PD-1 Blockade Therapy in Lung Cancer,” Int. J. Mol. Sci., vol. 20, no. 7, p. 1631, Apr. 2019, doi: 10.3390/ijms20071631
  • 38. Sawant K. V. et al., “Chemokine CXCL1 mediated neutrophil recruitment: Role of glycosaminoglycan interactions,” Sci. Rep., vol. 6, no. 1, p. 33123, Dec. 2016, doi: 10.1038/srep33123
  • 39. Wang D., Sun H., Wei J., Cen B., and DuBois R. N., “CXCL1 Is Critical for Premetastatic Niche Formation and Metastasis in Colorectal Cancer,” Cancer Res., vol. 77, no. 13, pp. 3655–3665, Jul. 2017, doi: 10.1158/0008-5472.CAN-16-3199
  • 40. Bandapalli O. R. et al., “Down-regulation of CXCL1 inhibits tumor growth in colorectal liver metastasis,” Cytokine, vol. 57, no. 1, pp. 46–53, Jan. 2012, doi: 10.1016/j.cyto.2011.10.019
  • 41. le Rolle A.-F. et al., “The prognostic significance of CXCL1 hypersecretion by human colorectal cancer epithelia and myofibroblasts,” J. Transl. Med., vol. 13, no. 1, p. 199, Jun. 2015, doi: 10.1186/s12967-015-0555-4
  • 42. Grossman J. G. et al., “Recruitment of CCR2+ tumor associated macrophage to sites of liver metastasis confers a poor prognosis in human colorectal cancer,” Oncoimmunology, vol. 7, no. 9, p. e1470729, 2018, doi: 10.1080/2162402X.2018.1470729
  • 43. Chun E. et al., “CCL2 Promotes Colorectal Carcinogenesis by Enhancing Polymorphonuclear Myeloid-Derived Suppressor Cell Population and Function,” Cell Rep., vol. 12, no. 2, pp. 244–257, Jul. 2015, doi: 10.1016/j.celrep.2015.06.024
  • 44. Korbecki J., Grochans S., Gutowska I., Barczak K., and Baranowska-Bosiacka I., “CC Chemokines in a Tumor: A Review of Pro-Cancer and Anti-Cancer Properties of Receptors CCR5, CCR6, CCR7, CCR8, CCR9, and CCR10 Ligands,” Int. J. Mol. Sci., vol. 21, no. 20, p. E7619, Oct. 2020, doi: 10.3390/ijms21207619
  • 45. Kanagawa N. et al., “CC-chemokine ligand 17 gene therapy induces tumor regression through augmentation of tumor-infiltrating immune cells in a murine model of preexisting CT26 colon carcinoma,” Int. J. Cancer, vol. 121, no. 9, pp. 2013–2022, Nov. 2007, doi: 10.1002/ijc.22908
  • 46. Weide B. et al., “Increased CCL17 serum levels are associated with improved survival in advanced melanoma,” Cancer Immunol. Immunother. CII, vol. 64, no. 9, pp. 1075–1082, Sep. 2015, doi: 10.1007/s00262-015-1714-4
  • 47. Mishalian I. et al., “Neutrophils recruit regulatory T-cells into tumors via secretion of CCL17—a new mechanism of impaired antitumor immunity,” Int. J. Cancer, vol. 135, no. 5, pp. 1178–1186, Sep. 2014, doi: 10.1002/ijc.28770
  • 48. Mizukami Y. et al., “CCL17 and CCL22 chemokines within tumor microenvironment are related to accumulation of Foxp3+ regulatory T cells in gastric cancer,” Int. J. Cancer, vol. 122, no. 10, pp. 2286–2293, May 2008, doi: 10.1002/ijc.23392
  • 49. Karin N. and Razon H., “Chemokines beyond chemo-attraction: CXCL10 and its significant role in cancer and autoimmunity,” Cytokine, vol. 109, pp. 24–28, Sep. 2018, doi: 10.1016/j.cyto.2018.02.012
  • 50. Chen J. et al., “Prognostic and predictive values of CXCL10 in colorectal cancer,” Clin. Transl. Oncol., vol. 22, no. 9, pp. 1548–1564, Sep. 2020, doi: 10.1007/s12094-020-02299-6
  • 51. Abtahi S., Davani F., Mojtahedi Z., Hosseini S. V., Bananzadeh A., and Ghaderi A., “Dual association of serum interleukin-10 levels with colorectal cancer,” J. Cancer Res. Ther., vol. 13, no. 2, pp. 252–256, Jun. 2017, doi: 10.4103/0973-1482.199448
  • 52. Olingy C. E., Dinh H. Q., and Hedrick C. C., “Monocyte heterogeneity and functions in cancer,” J. Leukoc. Biol., vol. 106, no. 2, pp. 309–322, 2019, doi: 10.1002/JLB.4RI0818-311R
  • 53. Petrovic-Djergovic D., Popovic M., Chittiprol S., Cortado H., Ransom R. F., and Partida-Sánchez S., “CXCL10 induces the recruitment of monocyte-derived macrophages into kidney, which aggravate puromycin aminonucleoside nephrosis,” Clin. Exp. Immunol., vol. 180, no. 2, pp. 305–315, 2015, doi: 10.1111/cei.12579
  • 54. House I. G. et al., “Macrophage-Derived CXCL9 and CXCL10 Are Required for Antitumor Immune Responses Following Immune Checkpoint Blockade,” Clin. Cancer Res. Off. J. Am. Assoc. Cancer Res., vol. 26, no. 2, pp. 487–504, Jan. 2020, doi: 10.1158/1078-0432.CCR-19-1868
  • 55. Olson B., Li Y., Lin Y., Liu E. T., and Patnaik A., “Mouse Models for Cancer Immunotherapy Research,” Cancer Discov., vol. 8, no. 11, pp. 1358–1365, Nov. 2018, doi: 10.1158/2159-8290.CD-18-0044
  • 56. Gengenbacher N., Singhal M., and Augustin H. G., “Preclinical mouse solid tumour models: status quo, challenges and perspectives,” Nat. Rev. Cancer, vol. 17, no. 12, Art. no. 12, Dec. 2017, doi: 10.1038/nrc.2017.92
  • 57. Schreiber R. D., Old L. J., and Smyth M. J., “Cancer Immunoediting: Integrating Immunity’s Roles in Cancer Suppression and Promotion,” Science, vol. 331, no. 6024, Art. no. 6024, Mar. 2011, doi: 10.1126/science.1203486
  • 58. Uno T. et al., “Eradication of established tumors in mice by a combination antibody-based therapy,” Nat. Med., vol. 12, no. 6, pp. 693–698, Jun. 2006, doi: 10.1038/nm1405
  • 59. Bonnotte B. et al., “Intradermal injection, as opposed to subcutaneous injection, enhances immunogenicity and suppresses tumorigenicity of tumor cells,” Cancer Res., vol. 63, no. 9, pp. 2145–2149, May 2003.
  • 60. Calbo J. et al., “A functional role for tumor cell heterogeneity in a mouse model of small cell lung cancer,” Cancer Cell, vol. 19, no. 2, pp. 244–256, Feb. 2011, doi: 10.1016/j.ccr.2010.12.021
  • 61. Zamarron B. F. and Chen W., “Dual roles of immune cells and their factors in cancer development and progression,” Int. J. Biol. Sci., vol. 7, no. 5, pp. 651–658, 2011, doi: 10.7150/ijbs.7.651

Decision Letter 0

Afsheen Raza

5 Jan 2022

PONE-D-21-35248

Machine learning predicts cancer outcomes from blood immune signatures

PLOS ONE

Dear Dr. Quah,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. The reviewers have asked for the limitations of the study to be added, as well as some clarifications. Kindly address the reviewer comments and resubmit the manuscript by Feb 19 2022 11:59PM.

 

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Afsheen Raza, PhD

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf  and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following in the Funding Section of your manuscript:

“This work was partially supported by the Radiation Oncology Private Practice Trust Fund, Canberra Health Services.”

Please note that funding information should not appear in the Funding section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“This work was partially supported by the Radiation Oncology Private Practice Trust Fund, Canberra Health Services, Canberra, Australia. The funder provided support in the form of salaries and/or research materials for authors B.J.C.Q, D.A.S.D., S.M., J.S., F.M.S., I.I.A. but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

3. Thank you for stating the following in the Competing Interests section:

“We have read the journal's policy and the authors of this manuscript have the following competing interests: I.I.A., J.P., and K.G. declare that they are employees of the biotechnology company Lipotek Pty Ltd.  The remaining authors have declared that no competing interests exist.”

Please confirm that this does not alter your adherence to all PLOS ONE policies on sharing data and materials, by including the following statement: "This does not alter our adherence to  PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests).  If there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include your updated Competing Interests statement in your cover letter; we will change the online submission form on your behalf.

4. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This study is well designed and described. The major limitations include using (1) cell lines, (2) using only two cell lines, and (3) using mouse xenograft models. The authors allude to (1) in the Discussion (last paragraph), but all of these limitations should be further discussed. Additionally, the potential translation of this approach to clinical practice should be discussed in greater detail. What are the steps to get there and how would this workflow be applied to actual patients?

Reviewer #2: This is an excellent study that establishes a blood immune signature predicting cancer outcomes by machine learning. But I still have some advice about this study:

1. It would be better to validate the application of this signature in patients.

2. You could add some experiments to verify the results of this study in vivo and in vitro.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Feb 28;17(2):e0264631. doi: 10.1371/journal.pone.0264631.r002

Author response to Decision Letter 0


23 Jan 2022

Dear Editor,

Please find below our responses to the specific queries raised by the reviewers. We have modified the Discussion of the manuscript to address all of these points, with track changes.

Reviewer #1: This study is well designed and described.

The major limitations include (1) the use of cell lines, (2) the use of only two cell lines, and (3) the use of mouse xenograft models. The authors allude to (1) in the Discussion (last paragraph), but all of these limitations should be further discussed.

Response: We have now extensively broadened the discussion to include a section (from pg 30) on the use of cell lines and murine cancer models, highlighting their limitations and benefits in cancer research. We have also highlighted the benefits of using more cell line models in this context. It should also be noted that, contrary to the reviewer's comment, no xenografts were used in this study, only syngeneic models, and so only syngeneic models are addressed in the new discussion section.

Additionally, the potential translation of this approach to clinical practice should be discussed in greater detail. What are the steps to get there and how would this workflow be applied to actual patients?

Response: We have included details of our clinical approach in the above-mentioned new section of the discussion (from pg 30).

Reviewer #2: This is an excellent study that establishes a blood immune signature predicting cancer outcomes by machine learning.

However, I still have some advice about this study:

1. It would be better to validate the application of this signature in patients.

Response: We agree very much with this statement, and it is indeed what we are currently working towards. The current report was intended to provide evidence to help gain funding and resources to pursue such a human study. Therefore, we feel that including a human study in this particular report is beyond the scope of the current work. However, we have now included an extensive additional discussion section (from pg 30, as mentioned above) highlighting the limitations of our current study and the need for validation in the clinic, and suggesting steps to pursue this.

2. You could add some in vivo and in vitro experiments to confirm the results of this study.

Response: While such a general statement is difficult to address specifically, we have modified our discussion section (on pg 26), in which we previously addressed the need for further experimentation to support our findings, to briefly describe experiments that could be done to support the numerous features we identified as potentially involved in cancer development. We note that these experiments would be vast, given the number of features involved, and are beyond the scope of the current study. Instead, we support our conclusions by reference to independent peer-reviewed studies that support our findings, which we have summarised in the discussion.

Response: As requested, please note that five new references have been included in the revised manuscript to support the changes to the discussion.

Finally, we have taken this opportunity to make some minor amendments to the title, abstract, introduction and discussion, as well as some minor grammatical changes, which we believe improve the readability of the manuscript without changing the essence of these sections. If this goes beyond what is appropriate, we are happy to exclude these changes and send you a version without them instead; please let us know if this is preferred.

Thank you for considering our work for publication in PLOS ONE.

Yours sincerely

Dr Ben Quah

Principal Investigator, I-Cube lab

ACRF Department of Cancer Biology & Therapeutics

The John Curtin School of Medical Research

The Australian National University

Attachment

Submitted filename: Response to Reviewers_edits.docx

Decision Letter 1

Afsheen Raza

15 Feb 2022

Machine learning predicts cancer subtypes and progression from blood immune signatures

PONE-D-21-35248R1

Dear Dr. Quah,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Afsheen Raza, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Afsheen Raza

17 Feb 2022

PONE-D-21-35248R1

Machine learning predicts cancer subtypes and progression from blood immune signatures

Dear Dr. Quah:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Afsheen Raza

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Gating and population names for leukocyte subsets.

    FlowJo software was used to delineate leukocyte populations using manual and Boolean gates on concatenated samples, with the scheme shown in (a) acting as a template for the entire study. FIt-SNE plots of concatenated live CD45+ samples, generated with default FlowJo settings, were overlaid with each manually gated population to ensure that the gating scheme produced populations similar to those generated by the unsupervised approach (b), with each manually gated population identified by colour and name (c). The process was refined until the two approaches were good approximations of each other, resulting in the manual gates displayed in (a). Generic and short-form names were then assigned to each population based on marker expression and used throughout the manuscript (d).

    (TIFF)

    S2 Fig. Gating and names for LEGENDplex beads.

    FlowJo software was used to delineate LEGENDplex bead populations using manual gates for both the Macrophage/microglial (Mac/Mic) 13-plex LEGENDplex kit (a) and the Proinflammatory (Proinflam) 13-plex LEGENDplex kit (BioLegend) (b); these gates acted as a template for the entire study.

    (TIFF)

    S3 Fig. Random Forest learning curve evaluating model performance against data set size.

    Normalised blood immune features (S4 Fig) from the 130 animals that had both D7 and D14 blood samples (Fig 1B) were used in Random Forest modelling to predict the presence of a tumour and the tumour subtype (target classes: Nil, 4T1 and CT26). Modelling was performed on progressively smaller numbers of randomly selected samples, and model performance was assessed by cross-validation in which the model was trained on a random 80% of the data and tested on the remaining 20%, with this process repeated 100 times. Model performance was assessed using several classification metrics: area under the receiver operating characteristic curve (AUC; separability of the classes), classification accuracy (CA; proportion of correct classifications), precision (ratio of correct positive predictions to all positive predictions), recall (ratio of correct positive predictions to all actual positives), and F1 score (harmonic mean of precision and recall), all of which range from 0 to 1, with higher values being better.

    (TIFF)
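The repeated random-split learning curve described in the S3 Fig legend can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' pipeline: to stay self-contained it swaps in a simple nearest-centroid classifier in place of Random Forest and reports only classification accuracy (CA). The function names (`fit_centroids`, `predict`, `learning_curve`) and the toy data are hypothetical.

```python
import random
import statistics


def fit_centroids(X, y):
    """Stand-in for Random Forest (NOT the authors' model): mean feature vector per class."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {label: [statistics.fmean(col) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict(centroids, row):
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))


def learning_curve(X, y, sizes, n_repeats=100, train_frac=0.8, seed=0):
    """Mean test accuracy over repeated random train/test splits for each sample size."""
    rng = random.Random(seed)
    indices = list(range(len(y)))
    curve = {}
    for size in sizes:
        scores = []
        for _ in range(n_repeats):
            subset = rng.sample(indices, size)                    # random subsample of animals
            n_train = min(round(train_frac * size), size - 1)     # keep >= 1 test animal
            train, test = subset[:n_train], subset[n_train:]
            model = fit_centroids([X[i] for i in train], [y[i] for i in train])
            correct = sum(predict(model, X[i]) == y[i] for i in test)
            scores.append(correct / len(test))                    # classification accuracy
        curve[size] = statistics.fmean(scores)
    return curve


# Hypothetical toy data: three well-separated groups of 10 "animals", 2 features each.
X = ([[0.1 * i, 0.0] for i in range(10)]
     + [[10 + 0.1 * i, 0.0] for i in range(10)]
     + [[0.0, 10 + 0.1 * i] for i in range(10)])
y = ["Nil"] * 10 + ["4T1"] * 10 + ["CT26"] * 10
print(learning_curve(X, y, sizes=[15, 30]))
```

Plotting mean performance against subsample size in this way shows whether adding more animals is still improving the model or whether performance has plateaued.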

    S4 Fig. Normalised blood immune phenotypes in animal tumour models.

    CT26 or 4T1 tumours were grown in female BALB/c mice, and the blood immune phenotype was determined at D7 and D14, as described in Fig 1. Animals with no tumours (Nil) were used as normal immune phenotype controls. A total of 180 animals were included in the study and were divided into the groups indicated in Fig 1B. A 20 μl blood sample from each animal at each time point was phenotyped for leukocyte populations and plasma analytes (Fig 1). Cell and plasma analyte values were reported as fold changes relative to the mean of Nil mice ("Nil-normalised"), as described in the Methods, for both the D7 and D14 time points, and are presented on a log scale (with the number of 0-value data points indicated on the axis). Means and SEM are indicated (shown in yellow), and equality of means was tested by ANOVA on log(y + 0.0001)-transformed data with Tukey's multiple comparisons correction; 2-way ANOVA and multiple comparison p-values are indicated (*, p ≤ 0.05; **, p ≤ 0.01; ***, p ≤ 0.001; ****, p ≤ 0.0001). Heatmap summaries highlighting the changes are also shown. Three analytes, namely CCL22, CXCL1 and CCL17, were measured in both LEGENDplex kits and are labelled (1) if from the Mac/Mic panel or (2) if from the Proinflam panel.

    (TIFF)
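The Nil normalisation and log(y + 0.0001) transform described in the S4 Fig legend can be sketched as below. This is a minimal sketch assuming one analyte's values at one time point are held in a list alongside each animal's group label, and that normalisation is applied separately per analyte and time point. The base-10 logarithm is an assumption (the legend writes only "Log"); changing the base rescales every value by the same constant, so it does not alter ANOVA test statistics. Function names and values are hypothetical.

```python
import math
import statistics


def nil_normalise(values, groups, control="Nil"):
    """Fold change of each value relative to the mean of the control (Nil) animals."""
    control_mean = statistics.fmean(v for v, g in zip(values, groups) if g == control)
    if control_mean == 0:
        raise ValueError("control mean is zero; fold changes are undefined")
    return [v / control_mean for v in values]


def log_transform(values, offset=0.0001):
    """log10(y + offset): the small offset keeps 0-valued measurements finite."""
    return [math.log10(v + offset) for v in values]


# Hypothetical values for one analyte at one time point (arbitrary units).
values = [2.0, 4.0, 6.0, 0.0]
groups = ["Nil", "Nil", "CT26", "4T1"]
normalised = nil_normalise(values, groups)   # Nil mean is 3.0
print(normalised)
print(log_transform(normalised))
```

The offset matters because undetectable analytes are recorded as 0, and log(0) is undefined; adding 0.0001 maps them to a finite floor of about -4 on the log10 scale.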

    S5 Fig. Tumour classification Random Forest modelling using blood immune phenotype.

    Normalised blood immune features (S4 Fig) from the 130 animals that had both D7 and D14 blood samples (Fig 1B) were used in Random Forest modelling to predict the presence of a tumour and the tumour subtype (target classes: Nil, 4T1 and CT26). The model was trained on 80% (a) or 60% (b) of randomly selected data, cross-validated using leave-one-out, and tested on the remaining data. Modelling was performed on progressively smaller numbers of features, removing features from the lowest to the highest ranked according to the in-built Random Forest importance for class determination, and the process was repeated 3 times. Model performance was assessed using several classification metrics: area under the receiver operating characteristic curve (AUC; separability of the classes), classification accuracy (CA; proportion of correct classifications), precision (ratio of correct positive predictions to all positive predictions), recall (ratio of correct positive predictions to all actual positives), and F1 score (harmonic mean of precision and recall), all of which range from 0 to 1, with higher values being better. SHapley Additive exPlanations (SHAP) feature importance scores for classification using the top 15 features (ranked from highest to lowest SHAP value) are shown in (c); these show how feature values influence the classification of each animal cohort, namely healthy controls (Nil), CT26-bearing and 4T1-bearing cohorts.

    (TIFF)
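The feature-reduction procedure in the S5 Fig legend (progressively dropping the lowest-ranked features and re-evaluating the model) can be outlined as a simple loop. In this sketch the importance ranking is computed once up front, and `evaluate` is a hypothetical callback standing in for training and testing a Random Forest on the retained features with a given random seed; neither detail is confirmed by the legend. Feature names and importance values in the usage lines are made up and are not results from this study.

```python
import statistics


def backward_elimination(importance, evaluate, n_repeats=3, min_features=1):
    """Re-score the model while dropping the lowest-ranked feature at each step.

    importance: {feature_name: score}; ranking is computed once, highest first.
    evaluate:   callable(kept_features, seed) -> performance score (hypothetical;
                in practice this would train and test a Random Forest).
    Returns a list of (number_of_features, mean_score), from all features down.
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    curve = []
    for k in range(len(ranked), min_features - 1, -1):
        kept = ranked[:k]                                    # keep the k highest-ranked
        scores = [evaluate(kept, seed) for seed in range(n_repeats)]
        curve.append((k, statistics.fmean(scores)))          # mean over repeats
    return curve


# Hypothetical importances and a toy scorer that rewards keeping important features.
importance = {"feat_A": 0.4, "feat_B": 0.3, "feat_C": 0.2, "feat_D": 0.1}


def toy_evaluate(kept, seed):
    return sum(importance[f] for f in kept)


for k, score in backward_elimination(importance, toy_evaluate):
    print(k, round(score, 2))
```

Re-ranking features after every removal is a common alternative design (as in recursive feature elimination); which variant was used here is not stated in the legend.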

    S6 Fig. Random Forest modelling to predict CT26 tumour size and growth using blood immune phenotypes.

    Normalised blood immune features (S4 Fig) from 48 CT26-bearing animals that had both D7 and D14 blood samples (Fig 1B) were used in Random Forest modelling to predict CT26 tumour size at D14. The model was initially trained on 100%, 80% and 60% of randomised data, cross-validated using leave-one-out (Train panels), and tested on the remaining data (Test panels). Modelling was performed on progressively smaller numbers of features, removing features from the lowest to the highest ranked according to the in-built Random Forest importance, and the process was repeated 3 times (mean and standard error of the mean shown). Model performance is summarised for the 60%:40% training:testing split, and the equality of test and training performance score means (using the top-ranked features) was assessed by ANOVA and is shown in the main figure (Fig 3). The Random Forest rank (RF rank) scores for the top 10 features are shown. Model performance was assessed using several regression metrics: the error scores Mean Squared Error (MSE), Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), for which lower values are better, and the coefficient of determination, R². D14 blood samples were used with D14 tumour size as the target to assess whether blood immune features could predict current tumour size (a). D7 blood samples were used with D14 tumour size as the target to assess whether blood immune features could predict future tumour size (b).

    (TIFF)
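For reference, the regression metrics named in the S6 Fig legend can be computed from paired observed and predicted tumour sizes as below. R² is computed here as 1 − SS_res/SS_tot, the common definition (also used by scikit-learn's `r2_score`); whether the authors' software used exactly this form is an assumption. The function name and the numbers in the usage line are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """MSE, MAE, RMSE and R^2 for paired observed/predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    mse = ss_res / n
    return {
        "MSE": mse,
        "MAE": sum(abs(r) for r in residuals) / n,
        "RMSE": math.sqrt(mse),
        "R2": 1 - ss_res / ss_tot,   # undefined if all observed values are equal
    }


# Arbitrary observed vs predicted D14 tumour sizes (hypothetical units).
# Expected (up to float rounding): MSE 0.25, MAE 0.25, RMSE 0.5, R2 0.8
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

Unlike the three error scores, R² is unitless and can be negative when a model predicts worse than simply using the mean observed tumour size.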

    S7 Fig. Random Forest modelling to predict 4T1 tumour size and growth using blood immune phenotypes.

    Normalised blood immune features (S4 Fig) from 58 4T1-bearing animals that had both D7 and D14 blood samples (Fig 1B) were used in Random Forest modelling to predict 4T1 tumour size at D14. The model was initially trained on 100%, 80% and 60% of randomised data, cross-validated using leave-one-out (Train panels), and tested on the remaining data (Test panels). Modelling was performed on progressively smaller numbers of features, removing features from the lowest to the highest ranked according to the in-built Random Forest importance, and the process was repeated 3 times (mean and standard error of the mean shown). Model performance is summarised for the 60%:40% training:testing split, and the equality of test and training performance score means (using the top-ranked features) was assessed by ANOVA and is shown in the main figure (Fig 4). The Random Forest rank (RF rank) scores for the top 10 features are shown. Model performance was assessed using several regression metrics: the error scores MSE, MAE and RMSE, for which lower values are better, and the coefficient of determination, R². D14 blood samples were used with D14 tumour size as the target to assess whether blood immune features could predict current tumour size (a). D7 blood samples were used with D14 tumour size as the target to assess whether blood immune features could predict future tumour size (b).

    (TIFF)
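The "current" versus "future" tumour-size designs in the S6 and S7 Fig legends differ only in which day's blood sample is paired with the D14 tumour size, and both are restricted to animals with D7 and D14 blood samples. A minimal sketch of that pairing follows, assuming per-animal dictionaries keyed by hypothetical animal IDs; the data layout and the function name `build_dataset` are illustrative and are not taken from S1 File.

```python
def build_dataset(blood, tumour_size_d14, feature_day):
    """Pair one day's blood features with each animal's D14 tumour size.

    blood:           {animal_id: {"D7": [features...], "D14": [features...]}}
    tumour_size_d14: {animal_id: tumour size at D14}
    feature_day:     "D14" -> predict current size; "D7" -> predict future size.
    Animals lacking either blood sample or a D14 size are skipped.
    """
    if feature_day not in ("D7", "D14"):
        raise ValueError("feature_day must be 'D7' or 'D14'")
    X, y, ids = [], [], []
    for animal_id in sorted(blood):
        samples = blood[animal_id]
        if "D7" in samples and "D14" in samples and animal_id in tumour_size_d14:
            X.append(samples[feature_day])
            y.append(tumour_size_d14[animal_id])
            ids.append(animal_id)
    return X, y, ids


# Hypothetical animals: m2 has no D14 blood sample, so it is excluded.
blood = {"m1": {"D7": [1.0], "D14": [2.0]},
         "m2": {"D7": [3.0]},
         "m3": {"D7": [5.0], "D14": [6.0]}}
sizes = {"m1": 100.0, "m2": 50.0, "m3": 200.0}
X_future, y_future, _ = build_dataset(blood, sizes, "D7")
X_current, y_current, _ = build_dataset(blood, sizes, "D14")
```

Keeping the same set of animals for both designs means any difference in performance reflects the sampling day rather than a change in the cohort.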

    S1 Table. List of antibodies for cell surface labelling.

    (DOCX)

    S1 File. All raw data.

    (XLSX)

    S2 File. RF pipeline classification.

    (OWS)

    S3 File. RF pipeline regression.

    (OWS)

    S4 File. MDS pipeline for regression.

    (OWS)

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting Information files and/or are held in the Australian National University (ANU) DATA COMMONS repository at https://dx.doi.org/10.25911/6153a8ab5747c (which contains the raw Flow Cytometry Standard (FCS) files).


    Articles from PLoS ONE are provided here courtesy of PLOS
