Abstract
Remarkable progress in molecular analyses has improved our understanding of the evolution of cancer cells towards immune escape1–5. However, the spatial configurations of immune and stromal cells, which may shed light on the evolution of immune escape across tumor geographical locations, remain unaddressed. We integrated multi-region exome and RNA-seq data with spatial histology mapped by deep learning in 100 non-small cell lung cancer (NSCLC) patients from the TRAcking Cancer Evolution through Therapy (Rx) (TRACERx) cohort6. Cancer subclones derived from immune cold regions were more closely related in mutation space, diversifying more recently than subclones from immune hot regions. In TRACERx and in an independent multi-sample cohort of 970 lung adenocarcinoma (LUAD) patients, the number of immune cold regions significantly correlated with risk of relapse, independently of tumor size, stage and number of samples per patient. In LUAD, but not lung squamous cell carcinoma (LUSC), geometrical irregularity and complexity of the cancer-stromal cell interface significantly increased in tumor regions without disruption of antigen presentation. Decreased lymphocyte accumulation in adjacent stroma was observed in tumors with low clonal neoantigen burden. Collectively, immune geospatial variability elucidates tumor ecological constraints that may shape the emergence of immune evading subclones and aggressive clinical phenotypes.
Using an artificial intelligence framework, we developed a generalizable deep learning pipeline to spatially profile immune infiltration and discover tumor topological determinants of immunosuppression in digital pathology. Convolutional neural networks were tailored for the analysis of NSCLC morphology using diverse histology samples in the multi-region TRACERx 100 cohort6 to avoid overfitting (Methods). This approach enabled the spatial mapping of cancer cells, lymphocytes, stromal cells (fibroblasts and endothelial cells), and an “other” cell class (macrophages, pneumocytes and non-identifiable cells) in hematoxylin & eosin (H&E)-stained images (275 tumor regions from 85 patients and 100 diagnostic slides from all patients, Fig. 1a-c, CONSORT diagram Extended Data Fig. 1a-b, Supplementary Table 1). T cell subsets were also identified in CD4/CD8/FOXP3 immunohistochemistry (IHC) images for all 100 diagnostic samples (Fig. 1d).
This pipeline for H&E analysis exhibited high accuracy and consistency compared with five orthogonal data types within TRACERx, including DNA-seq, RNA-seq, IHC, 5,951 single-cell annotations by pathologists (balanced accuracy, as an average of specificity and sensitivity = 0.932), and pathology tumor-infiltrating lymphocyte (TIL) estimates following the guidelines developed by the International Immuno-Oncology Biomarker Working Group7 (Extended Data Fig. 2, Supplementary Table 2). The Leicester Archival Thoracic Tumor Investigatory Cohort8 (LATTICe-A, Extended Data Fig. 1c-d), a retrospective study of 970 resected LUAD patients that included H&E sections from all diagnostic tumor blocks with a median of four samples per tumor, was used for independent validation. The pipeline’s generalizability was supported using 5,082 pathologists’ single-cell annotations (balanced accuracy = 0.913), and virtual integration of IHC and H&E images generated from the same slides (Fig. 1e-h, Extended Data Fig. 2e-g, Supplementary Table 3). Using this unbiased scalable approach, immune infiltration was quantified as the percentage of all cells that were lymphocytes in each H&E image.
High geospatial immune variability between tumor regions within the same patients was revealed (Fig. 2a-b), which did not reflect associations with pathological stage (Extended Data Fig. 3). To differentiate highly from poorly immune infiltrated tumor regions, regions containing a lymphocyte percentage more than a quarter standard deviation above the median lymphocyte percentage were classified as immune hot, and regions containing a lymphocyte percentage more than a quarter standard deviation below the median were classified as immune cold. The remaining 20% were classified as intermediate (Fig. 2b). Subsequent results were tested on four more classification schemes based on the standard deviation to ensure that results derived from this classification were not contingent upon the choice of thresholds (Extended Data Fig. 4). A significant difference in pathology TIL estimates was observed between immune hot and cold regions (P = 4.6 × 10⁻⁸, Extended Data Fig. 5a). Significantly higher levels of RNA-seq estimated immune infiltrate1, particularly for immune activation subsets, were consistently observed in immune hot compared to cold regions, supporting the validity of histology-based immune classification (Fig. 2c-d). We next directly compared our immune hot and cold regional classification (excluding intermediate regions) against RNA-seq-based1 classifications (n = 109 regions with histology and RNA-seq data). 78 out of 109 regions were in agreement (Fisher’s exact test for overlap: P = 7.8 × 10⁻⁶, Extended Data Fig. 5b). Regions with discrepant classification (n = 31) had significantly higher spatial heterogeneity of lymphocyte distribution compared to regions concordant between the two methods (P = 0.01, Extended Data Fig. 5c), suggesting spatial intratumor heterogeneity could contribute towards the discrepancy, since the different data types were derived from adjacent sections of the same tumor blocks.
Ecological selection pressures drive genetic divergence9,10. To determine if cancer genetic divergence differs according to immune context, we calculated the genomic distance as the Euclidean distance of subclonal mutations for each pair of tumor regions with the same immune phenotype in a patient. We observed significantly lower genomic distance, indicating more shared subclonal mutations, for pairs of immune cold regions than for pairs of immune hot regions in LUAD (Fig. 3a, Extended Data Fig. 4b, P < 0.005 for all immune classification schemes), but not in LUSC (Extended Data Fig. 6a). In LUAD but not LUSC, analysis of immune phenotypes mapped onto the phylogenetic trees6 revealed that dominant clones (cancer cell fraction ≥ 75%, see Methods) in pairs of cold regions were more closely related on the phylogenetic tree, compared to dominant clones in pairs of immune hot regions (Fig. 3b). Moreover, dominant clones in hot regions almost always diversified at the most recent common ancestor of the tree (13/15, 87%, Fig. 3c); in contrast, no such preference was observed in immune cold regions (11/23, 48%).
We investigated the impact of immune context on disease-free survival. Tumors with a high number of immune cold regions were at significantly increased risk of relapse that was independent of the total number of regions sampled, tumor size and stage in both histology types in TRACERx (Fig. 3d-e, Extended Data Fig. 6c-h). This association with disease-free survival was also significant using the number of immune low regions as estimated by RNA-seq1 in 64 TRACERx tumors with available RNA-seq data (P = 0.002, Extended Data Fig. 6b). Following the genomic findings in LUAD, we sought to validate this in 970 LUAD patients in the multi-sample LATTICe-A cohort, confirming the prognostic value of immune cold sample count, which was also independent of the number of samples per patient, tumor size and stage (Fig. 3f-g, Extended Data Fig. 6c-e). In both cohorts, the number of immune cold samples per patient correlated with relapse more significantly than any other immune feature generated using deep learning, including the average and variability of lymphocyte percentage per tumor, number of immune hot regions, proportion of immune cold regions to the number of regions sampled, as well as CD8+ cell percentage or CD8+ to CD4+FOXP3+ ratio in TRACERx diagnostic slides (Extended Data Fig. 6e).
Studies have revealed that immunosuppressive fibroblast subsets localizing to the boundary of tumor nests possibly contribute to T cell exclusion11–13. Therefore, we hypothesized that increased cancer-stroma physical contact may reflect stroma-modulated inhibition of antitumor immune responses14–17. To measure the physical contact between cancer and stromal cells (the majority being fibroblasts) identified by image analysis, we developed a spatial measure, using fractal dimension to quantify the geographical irregularity and complexity of the cancer-stromal cell interface (Methods, Fig. 4a, Extended Data Fig. 7a,b,e). Within the same tissue space, a higher fractal dimension of the cancer-stromal cell interface suggests increased geometric irregularity and more extensive physical contact between tumor and stromal cells than in samples with a smooth interface. For both histology types, fractal dimension was significantly higher in immune cold regions compared to immune hot regions (Fig. 4b, Extended Data Fig. 7c). Moreover, the difference in fractal dimension between immune cold and hot regions was more significant compared to the difference in stromal cell percentage (both histology types combined: P = 0.00036, effect size 0.49 for fractal dimension versus P = 0.018, effect size 0.38 for stromal cell percentage, Extended Data Fig. 7d), suggesting the importance of stromal cell geographical location rather than their quantity. This supports the hypothesis that the stroma-based inhibition of immune infiltration17 may result from a specific topological pattern in the form of cancer-stroma engagement.
To understand the associations of stromal-mediated immunosuppression in the context of the genetic mechanisms of immune evasion, we related fractal dimension to dysfunction in antigen presentation through loss of heterozygosity at the human leukocyte antigen locus (HLA LOH), which has been identified as a potent immune escape mechanism1,18. A significantly higher fractal dimension was found in LUAD tumor regions with intact HLA alleles compared with regions harboring HLA LOH (Fig. 4c, Extended Data Fig. 7f). This was observed at the tumor level (see Methods for definition), independent of clonal neoantigen burden (P = 0.04, multivariate regression, Extended Data Fig. 7h), but was not observed in LUSC (Extended Data Fig. 7g, i).
Although clonal neoantigens have been associated with a cytotoxic immune response19, the spatial distribution of lymphocytes in relation to clonal neoantigens remained unclear. To provide sufficient spatial context for analysis of cell distribution, whole-section TRACERx diagnostic H&E images, typically 10x larger than the regional samples, were used. To test the relationship between lymphocyte spatial distribution and clonal neoantigens, we leveraged an established method for lymphocyte spatial modeling20. Each lymphocyte was classified into one of three distinct spatial compartments: intra-tumor, adjacent-tumor or distal-tumor, based on unsupervised modeling of cancer-lymphocyte proximity (Fig. 4d). In LUAD, but not LUSC, clonal neoantigens19 were found to be associated with a specific immune spatial score to approximate pathology TIL estimates7, defined as the ratio of adjacent-tumor lymphocytes to stromal cells in the diagnostic H&E samples (P = 0.0074, high clonal neoantigen defined as above median in LUAD, Fig. 4e; correlation as continuous variables Rho = 0.37, P = 0.035 after multiple testing correction, Extended Data Fig. 8a). By contrast, subclonal neoantigen burden did not correlate with any immune score (Extended Data Fig. 8a), supporting the notion that clonal but not subclonal neoantigens are associated with infiltration of cytotoxic T cells19 adjacent to tumor nests.
To determine if there was an enrichment of a specific lymphocyte subpopulation within the adjacent-tumor compartment in LUAD, we spatially aligned IHC to H&E in 10 samples with the highest adjacent-tumor lymphocytes to stromal cell ratio, and projected IHC-derived T cell subsets onto H&E images, thereby creating virtual staining of cells in the H&E sections (Methods, Fig. 4f, Extended Data Fig. 8b-c). CD4+FOXP3-, CD8+, and CD4+FOXP3+ cells classified in IHC were projected onto a density map of cancer cell distribution inferred from H&E, and were classified into adjacent-tumor, intra-tumor, and distal-tumor compartments. In this limited dataset, a significant increase of the effector-regulator balance defined by CD8+/CD4+FOXP3+ cell ratio was observed in adjacent-tumor stroma compared to the distal tumor compartment (Fig. 4g).
In summary, by training deep learning algorithms in diverse histology samples, we demonstrated that digital pathology can provide accurate tools for defining the ecological spatial context that may improve our understanding of cancer evolution and the immune response. In TRACERx and LATTICe-A cohorts, LUAD tumors with increased immune cold regions were at a significantly higher risk of cancer relapse, independent of total regions sampled and immune phenotypes of other regions. Thus, even within a tumor that has on average increased immune infiltration, if it contains regions classified as immune cold, prognosis appears to be associated with the number of cold regions. Analysis of cancer branched evolution within the ecological context of immune hot and cold regions revealed a difference in the evolution history of cancer subclones in these regions, possibly as a result of immunoediting. Based on this finding, we speculate that by identifying the subclone where immunoediting is likely to have occurred, new drivers of immune evasion may be elucidated.
Spatial histology data can extend our knowledge of the tumor microenvironment topological configuration in relation to genetic alterations relevant to immune surveillance, including HLA LOH and clonal neoantigens in LUAD (Extended Data Fig. 9). Increased cancer-stromal engagement as measured by fractal dimension may signal physical constraints against T cell ingress. This is supported by previous studies in lung cancer showing restriction of CD8+ and CD4+ T cell motility in dense stromal extracellular matrix areas around tumor epithelial cell regions, which prevents them from entering tumor islets13. Additionally, the association between specific spatial localization of lymphocytes in tumor-adjacent stroma and clonal neoantigens further supports exploration of the role of stromal cells in limiting tumor infiltration by T cells14–17.
It will be imperative to validate our findings on a larger multi-region cohort of untreated NSCLC tumors. Differences in our findings pertaining to LUAD and LUSC may reflect differences in biology21–23 and immune evasion mechanisms, including increased prevalence of antigen presentation dysfunction (HLA transcriptional repression and HLA LOH1) in LUSC. Other limitations include the lack of detailed staining using multiplexing technologies24–26 that could provide further insights into immune composition. However, with advanced deep learning developments and detailed tumor phylogenetic data, histology can be used to highlight fundamental immune contexture such as immune exclusion and its topological determinants. These data illuminate the clinical significance of immune cold regions that may reflect immune evading subclones, warranting further investigation into mechanisms that could contribute to the spatial variability of immune cells.
Methods
Tissues and digital images
The main cohort evaluated comes from the first 100 patients prospectively analyzed by the lung TRACERx study6 (Extended Data Fig. 1, Supplementary Tables 1, 4, https://clinicaltrials.gov/ct2/show/NCT01888601, approved by an independent Research Ethics Committee, 13/LO/1546). 62 were men and 38 were women, with a median age of 68. 61 were LUAD, 32 were LUSC and the remaining 7 had ‘other’ histology subtypes (including adenosquamous carcinoma, large cell carcinoma, large cell neuroendocrine carcinoma, pleomorphic carcinoma and pleomorphic carcinoma arising from adenocarcinoma).
The 85-case subcohort with regional histology consisted of 55 male and 30 female patients and of those 49 were LUAD, 32 were LUSC and 6 were ‘other’ types. 10 of these patients had a single region while the rest ranged between 2-8 regions (n = 275 total regional histology samples). Snap-frozen regional samples were processed to FFPE blocks after dissecting fresh-frozen tissues for DNA-seq and RNA-seq analyses. Tissue microarrays (TMAs) were created containing 133 × 2 mm regional tissue cores from 75 patients in 7 blocks.
In addition to the regional samples, full-sized diagnostic blocks were obtained for all 100 cases precisely mirroring the Jamal-Hanjani et al. 2017 prospective 100 patient cohort6. 4μm thick sections were cut and subjected to H&E staining and multiplex IHC for CD8/CD4/FOXP3: anti-CD8 (type: Rabbit Monoclonal, clone: SP239, cat. no.: ab178089, source: Abcam Plc, Cambridge, UK, used at 1:100); anti-CD4 (type: Rabbit Monoclonal, clone: SP35, cat. no.: ab213215, source: Abcam Plc, Cambridge, UK, used at 1:50); anti-FOXP3 (type: Mouse, clone: 236A/E7, source: kind gift from Dr G Roncador, CNIO, Madrid, Spain, used at: 1:100). All regional and diagnostic slides were scanned using a NanoZoomer S210 digital slide scanner (C13239-01) and NanoZoomer digital pathology system version 3.1.7 (Hamamatsu, Japan) at 40x (228 nm/pixel resolution).
The external validation cohort was obtained from the Leicester Archival Thoracic Tumor Investigatory Cohort – Adenocarcinoma (LATTICe-A) study8, a continuous retrospective series of resected primary LUAD tumors from a single surgical center between 1998 and 2014 (Extended Data Fig. 1, Supplementary Table 5). It consists of 4,324 whole-tumor diagnostic blocks from 970 LUAD patients (ranging from 1 to 16 blocks per case with a median of 4). 455 were men and 515 were women with a median age of 69. Most clinical data (age, sex, adjuvant therapy status and time to recurrence or death) were available for all patients, with complete pathological stage for 827 and smoking history for 651. All archival slides containing tumor material were used in order to capture the full diversity of each lesion. Slides were dearchived and scanned using a Hamamatsu NanoZoomer XR at 40x (226 nm/pixel resolution), yielding 15 TB of image data. Images containing incidental lymph node tissue were excluded to avoid confounding immune infiltration analysis. For the biological validation assay, a subset of 49 paraffin blocks from 49 patients was obtained from the same study, and from these a validation TMA was prepared, containing a single 1 mm core from each case. The work was ethically approved by an NHS research ethics committee (ref. 14/EM/1159). This study complies with the STROBE guidelines.
The deep learning pipeline for cell detection and classification
The deep learning pipeline consists of three parts. First, the pipeline segments tissue regions utilizing multi-resolution input/output image features (Micro-Net27). It was designed to capture global tissue context and learn weak features that could be important for identifying tissue boundary, but are often not achieved by other machine learning methods such as thresholding of the grey-scale image, active contours, watershed segmentation or Support Vector Machine-based training on local binary pattern features27. Tissue segmentation removes background noise and artefacts and subsequently allows for more computationally efficient cell detection and accurate classification. Secondly, a cell detection model modified from SCCNN28 predicts for each pixel the probability that it belongs to the center of a nucleus within tissue regions identified by Micro-Net. Nuclei are detected from the probability map obtained from the deep network. Lastly, a cell classification framework utilizes a neighboring ensemble predictor classifier coupled with SCCNN to classify each cell by type.
For tissue segmentation, each whole slide image was reduced to 1.25x resolution and segmented for tissue regions using the Micro-Net-51227 architecture. This architecture visualizes the image at multiple resolutions, captures context information by connecting intermediate deep layers and adds bypass connections to max-pooling to maintain weak features (Fig. 1b). 10 whole slide images were used to train the tissue segmentation network using Micro-Net. The segmented images from the network were inspected visually and quantitatively (Supplementary Table 6, Supplementary Figures 1-20) to evaluate performance using an independent set of images.
The SCCNN adds two layers to conventional deep learning architecture for cell detection within the segmented tissue. SC1 estimates the location and probability of each pixel belonging to the center of a cell, and these probabilities are then mapped by SC2 to the image. A customized implementation of SCCNN was coded in Python (version 3.5) using TensorFlow29 library (version 1.3) which makes it computationally more efficient compared to the original MATLAB implementation28. To process an image of size 1000×1000 pixels, the Python implementation takes 4.8 seconds for nucleus detection compared to 41.0 seconds using the original implementation28, excluding preprocessing which remained the same in both implementations (using MATLAB (version 2018b)). In addition, through empirical experimentation, we optimized the patch size to 31x31 instead of 27x27 in the original implementation for increased cell detection accuracy. To generate nuclear locations from the SC2 probability map, peak detection was applied where thresholds for intensity and minimum grouping distance were also optimized to 0.15 and 12 pixels through experimentation using validation data.
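The peak-detection step described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a greedy non-maximum suppression in which pixels above the intensity threshold (0.15) are visited in order of decreasing probability and accepted only if no previously accepted peak lies within the minimum grouping distance (12 pixels). The function name `detect_peaks` and the toy probability map are hypothetical.

```python
import math

def detect_peaks(prob_map, threshold=0.15, min_distance=12):
    """Greedy non-maximum suppression on a 2D probability map (list of lists).

    Assumption: a pixel is a candidate if its probability exceeds `threshold`;
    candidates closer than `min_distance` pixels to a stronger accepted peak
    are grouped with that peak and discarded.
    Returns (row, col) nucleus centers, strongest first.
    """
    candidates = [
        (p, r, c)
        for r, row in enumerate(prob_map)
        for c, p in enumerate(row)
        if p > threshold
    ]
    # Strongest first; ties broken by position so the result is deterministic.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    peaks = []
    for _, r, c in candidates:
        if all(math.hypot(r - pr, c - pc) >= min_distance for pr, pc in peaks):
            peaks.append((r, c))
    return peaks

# Toy 40x40 map (hypothetical values, not study data).
toy_map = [[0.0] * 40 for _ in range(40)]
toy_map[5][5] = 0.9    # strong nucleus center
toy_map[5][8] = 0.6    # 3 px from the stronger peak, so it is suppressed
toy_map[30][30] = 0.5  # separate nucleus
toy_map[20][2] = 0.1   # below the intensity threshold
peaks = detect_peaks(toy_map)
```

On the toy map, only the two well-separated maxima survive; the weaker maximum three pixels from the strongest one is merged into it.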
For cell classification, a neighboring ensemble predictor was used. This predictor utilizes SCCNN to classify cells in neighboring locations to the detected center of the cell. In our implementation, the ensemble classifier required votes from SCCNN classification of nine different neighborhood locations near to the center of the cell compared to five votes in original implementation. Through experimentation, the patch size was optimized to 51x51 for classification instead of 27x27 as originally proposed. This permitted incorporation of greater tissue spatial context while maintaining the accuracy of classifying small cells.
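The voting idea behind the neighboring ensemble predictor can be sketched as follows. The sketch makes several assumptions that the text does not specify: the nine neighborhood locations are taken as a 3×3 grid of offsets around the detected center, the per-location predictions are combined by simple majority vote, and `classify_patch` is a hypothetical stand-in for the SCCNN patch classifier.

```python
from collections import Counter

def neighbor_offsets(step=2):
    """Nine offsets on a 3x3 grid around the cell center (assumed layout)."""
    return [(dr, dc) for dr in (-step, 0, step) for dc in (-step, 0, step)]

def ensemble_classify(center, classify_patch, step=2):
    """Classify a cell from the votes of nine neighboring patch predictions.

    `classify_patch(row, col)` is a hypothetical callable returning a class
    label for the patch centered at (row, col).
    """
    r, c = center
    votes = Counter(
        classify_patch(r + dr, c + dc) for dr, dc in neighbor_offsets(step)
    )
    label, _ = votes.most_common(1)[0]
    return label, votes

def toy_classifier(row, col):
    # Hypothetical rule standing in for the trained network.
    return "lymphocyte" if col >= 10 else "cancer"

label, votes = ensemble_classify((10, 11), toy_classifier)
```

In the toy call, six of the nine locations vote for lymphocyte and three for cancer, so the cell is labeled a lymphocyte. With four classes, ties are possible under majority voting; averaging class probabilities instead would avoid them.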
Altogether, this pipeline enabled the spatial mapping of four cell types from H&E images: cancer (malignant epithelial) cells, lymphocytes (including plasma cells), non-inflammatory stromal cells (fibroblasts and endothelial cells), and an “other” cell type that included non-identifiable cells, less abundant cells such as macrophages and chondrocytes, and ‘normal’ pneumocytes and bronchial epithelial cells.
Training the deep learning pipeline
To improve neural network generalizability and to avoid overfitting for cell detection and classification, we trained and tested our pipeline on a variety of sample types, including diagnostic (n = 100), regional (n = 275) and 133 cores corresponding to 75 TRACERx patients from TMA slides (63 patients had two cores and 12 patients had a single core). Both cell detection and classification were trained based on single-cell annotations from pathologists. Two thoracic pathologists annotated 26,960 cells on 53 whole slide images (3 TMAs, 35 regional slides and 15 diagnostic slides) to incorporate morphological variations in appearance of various cell types and stain variability. Several hundred examples of each cell class were marked on 76 cores selected at random from TMA images. In total, 4,056, 5,310, 15,007 and 2,587 annotations were collected for stromal cells, lymphocytes, cancer cells and “other” cell types, respectively. These whole slide images were divided into small tile images of size 2000×2000 pixels (each pixel = 0.5μm), which were then divided into three sample sets maintaining the class distribution of cells. These included: 13 diagnostic, 58 regional and 134 TMA tile images for training; 4 diagnostic, 21 regional and 72 TMA tile images for validation; and 3 diagnostic, 22 regional and 61 TMA tile images for testing. As a result, the annotations were divided between the three groups: approximately 2/3 for training, 1/6 for validation and 1/6 for testing. The training set included annotations for 2,147 stromal cells, 3,183 lymphocytes, 10,103 cancer and 1,357 other cell types. The validation set had annotations for 473 stromal cells, 825 lymphocytes, 2,562 tumor and 359 other cell types. The breakdown for the test set is provided in Supplementary Table 2.
For IHC cell classification, we used a pretrained SCCNN network on samples stained for CD4/CD8/FOXP3. The training set consisted of 1,657 CD4+FOXP3-, 3,187 CD8+, 1,001 CD4+FOXP3+, and 3,488 other (negative) cells. The trained network was tested on 5,028 cell annotations collected on 6 lung diagnostic whole slide images, including 251 CD4+FOXP3-, 406 CD8+, 123 CD4+FOXP3+ and 4,248 other cells to test the ability of the algorithm in correctly detecting and classifying negative cells. See Supplementary Table 7 for the total number of identified cells in the H&E diagnostic, H&E multi-region and IHC diagnostic datasets.
Validation of the H&E deep learning pipeline with orthogonal data types
The algorithms’ performance in detecting and classifying single cells in H&E was first evaluated against the test set of 5,951 cells. Individual class accuracy statistics were calculated using the R function ‘confusionMatrix’ from the R package ‘caret’.
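For readers who want to reproduce the per-class statistic without R, the following sketch computes one-versus-rest balanced accuracy, taken here as the mean of sensitivity and specificity as defined in the main text. It is an illustrative Python equivalent, not the ‘caret’ code used in the study; the function name and the toy labels are hypothetical.

```python
def per_class_balanced_accuracy(true_labels, predicted_labels):
    """One-vs-rest balanced accuracy, (sensitivity + specificity) / 2, per class."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    pairs = list(zip(true_labels, predicted_labels))
    scores = {}
    for cls in classes:
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        tn = sum(1 for t, p in pairs if t != cls and p != cls)
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        specificity = tn / (tn + fp) if tn + fp else float("nan")
        scores[cls] = (sensitivity + specificity) / 2
    return scores

# Toy annotations (hypothetical, not the study's test set).
truth = ["cancer"] * 4 + ["lymphocyte"] * 4 + ["stromal"] * 2
predicted = ["cancer", "cancer", "cancer", "stromal",
             "lymphocyte", "lymphocyte", "lymphocyte", "lymphocyte",
             "stromal", "cancer"]
scores = per_class_balanced_accuracy(truth, predicted)
```

In the toy data, lymphocytes are classified perfectly (balanced accuracy 1.0), whereas the stromal class has sensitivity 0.5 and specificity 0.875, giving 0.6875.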
Pathology TIL estimates were scored following the international guidelines developed by the International Immuno-Oncology Biomarker Working Group7. Briefly, by inspection of H&E slide of a given tumor region, the fraction of the stromal area infiltrated by TILs was assessed.
For regional samples, tumor cellularity, estimated as the computed percentage of cancer cells, was correlated with tumor purity estimated by ASCAT based on DNA-seq copy number and VAF purity (both available from Jamal-Hanjani et al.6, n = 239 regional tumor samples). The RNA-seq-based CD8+ T cell signature (available from Rosenthal et al.1, computed using the Danaher et al. method30) was correlated with the deep learning-based lymphocyte percentage for 142 regional tumor samples. For diagnostic samples, deep learning-based lymphocyte percentage from H&E was correlated with deep learning-based CD8+ cell percentage from IHC (n = 100 diagnostic samples, Extended Data Fig. 2a-d).
Discordance rate between RNA-seq based1 and histology/deep learning-based immune hot and cold regional classification was calculated by cross-tabulation of immune hot and cold (from histology) versus high and low (from RNA-seq), disregarding any regions without one of these two types of data. The RNA-seq method used 15 immune cell signatures presenting different T- and B-cell subsets, as well as neutrophils, macrophages, mast and dendritic cells, to classify tumor regions into high and low categories. A Fisher’s exact test was used to compute the overlap between the two immune classifications. Distributions of multiple immune scores (lymphocyte percentage, intra-tumor lymphocytes and adjacent-tumor lymphocytes/stroma) as well as ASCAT tumor purity were compared between hot versus cold (deep learning) and high versus low (RNA-seq) classifications (Extended Data Fig. 5).
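The cross-tabulation and overlap test can be sketched in plain Python. The two-sided Fisher’s exact test below sums the hypergeometric probabilities of all 2×2 tables that are no more likely than the observed one, which is the usual two-sided convention; whether the study used a one- or two-sided test is not stated, so this is an assumption. The cell counts in `p_toy` are hypothetical: they were chosen only to match the reported totals (78 concordant and 31 discordant of 109 regions), since the actual table is in Extended Data Fig. 5b.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)  # tolerance for float ties
    )
    return min(1.0, total)

# Sanity check on a small table with a known answer (34/70).
p_small = fisher_exact_two_sided(3, 1, 1, 3)

# Hypothetical counts: rows = histology hot/cold, columns = RNA-seq high/low.
# 40 + 38 = 78 concordant and 15 + 16 = 31 discordant regions.
p_toy = fisher_exact_two_sided(40, 15, 16, 38)
discordance_rate = (15 + 16) / 109
```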
Validation of the deep learning pipeline with the independent LATTICe-A cohort
The external validity of the proposed deep learning pipeline was assessed on 100 randomly selected patients from the LATTICe-A cohort8. This validation ensures that the cell detection and cell classification models trained on the TRACERx tumor blocks are generalizable to a distinct dataset that was processed, stained and scanned in another center (the LATTICe-A study, University of Leicester).
All 100 whole-tumor H&E sections were processed using the same TRACERx trained model. The validation was then performed using two data types. First, a pathologist provided 5,082 single-cell annotations following the same protocol for TRACERx in 20 randomly selected LATTICe-A sections. The breakdown for single-cell annotations was 1,997 stromal cells, 787 lymphocyte cells, 1,839 cancer cells and 459 other cells (see Supplementary Table 3). Second, two independent pathologists jointly scored the remaining 80 sections for overall fraction of lymphocytic infiltration and pathology TIL estimates7. These manual scores were correlated with the deep learning-based lymphocyte percentage and adjacent-tumor lymphocytes/total stroma (Extended Data Fig. 2e).
Validation of the deep learning pipeline with biological assays
A new biological validation method was developed to overcome the challenge of obtaining large quantities of cell-specific validation data (Fig. 1f-h, Extended Data Fig. 2f-g). 48 cores were available for the TTF1-H&E image pairs, 38 for the CD45-H&E pairs, and 33 for the SMA-H&E pairs. Stains were performed using a Ventana BenchMark ULTRA instrument (H&E, TTF-1) or a Dako Link 48 (CD-45, SMA). Digital images were acquired using a Hamamatsu Nanozoomer slide scanner. First, H&E staining was performed using a Leica Infinity kit, and a digital image was collected. The slide was subsequently de-coverslipped, the H&E stain removed by acid alcohol washing, and then an immunohistochemical stain with haematoxylin counterstain was applied using a standard diagnostic antigen retrieval and antibody protocol. A second digital image was acquired after mounting and coverslipping. Through experimentation, no difference in the staining was observed when the procedure was reversed.
TTF-1 (type: Novocastra Liquid Mouse Monoclonal antibody thyroid transcription factor 1, clone: SPT24, cat. no.: NCL-L-TTF-1, source: Leica Biosystems, Germany, used at 1:100) was selected as the cancer cell marker in these LUAD samples because it is the most robust and widely used immunohistochemical marker of LUAD cells31. It is very specific, both in that only epithelial cells are stained in the lung, and in that very few tumors of non-lung or thyroid origin are stained32. The sensitivity of the antibody clone used (SPT24) is also high, staining >75% of tumor cells in 76% of LUAD tumors in one published series33. However, as this implies, there are many tumors in which tumor cell staining is incomplete (i.e. <100%). Therefore, only cores showing near-universal TTF-1 positivity of tumor cells were used for validation, in order to provide the best possible ‘gold standard’ comparator for the deep learning algorithm. The same procedure was followed for pairs of H&E-CD45 (anti-human CD45, type: Mouse Monoclonal, clone: 2B11 + PD7/26, cat. no.: M0701, source: Agilent DAKO, USA, used at 1:200) and H&E-SMA (myofibroblast marker, type: Mouse Monoclonal antibody Smooth Muscle Actin (1A4), cat. no.: 760-2833, source: Roche, Switzerland, a ready to use antibody) to biologically validate the accuracy of single cell classification.
In total, 64,976 TTF1+ cells, 26,284 CD45+ cells and 46,343 SMA+ cells were detected from the IHC images, denoting the advantage of this method in acquiring a large amount of validation data at single-cell resolution. The correlation measured (Fig. 1f-h, Extended Data Fig. 2g) was that between the fraction of classified cells in the H&E versus the fraction of positively stained IHC cells per 100 μm².
Immune phenotype classification
To classify tumor regions into different immune phenotypes, we assigned each region to an immune hot, cold or intermediate category based on lymphocyte percentage. The dependency of our subsequent results on the thresholds chosen for this classification scheme was tested after applying perturbations to the thresholds used. Four new classification schemes were tested: no intermediate zone (i.e. using median lymphocyte percentage for separating hot and cold regions); regions with lymphocyte percentage more than standard deviation/2 above/below the median lymphocyte percentage classified as immune hot/cold; and similarly for standard deviation/3 and standard deviation/6 (Extended Data Fig. 4a-b). For every new classification, we repeated the multivariate survival analysis to confirm the significance of the number of immune cold regions in predicting disease-free survival as well as the genomic distance test for pairs of immune hot versus immune cold regions in LUAD patients (Extended Data Fig. 4b). In addition, the CD8+ RNA-seq signature was used to test the difference in CD8+ levels between immune hot and immune cold phenotypes across all classification schemes (Extended Data Fig. 4c).
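The threshold rule and its perturbations can be expressed compactly. The sketch below is illustrative only: it assumes that the median and standard deviation are computed across all regions supplied, and that the sample standard deviation is used; neither detail is stated in the text. The function name, the `sd_divisor` parameter and the toy percentages are hypothetical.

```python
from statistics import median, stdev

def classify_regions(lymphocyte_pct, sd_divisor=4):
    """Label each region immune hot, cold or intermediate.

    hot:  percentage > median + SD / sd_divisor
    cold: percentage < median - SD / sd_divisor
    sd_divisor=None gives the scheme with no intermediate zone
    (a value exactly equal to the median is then left as intermediate).
    """
    values = list(lymphocyte_pct.values())
    center = median(values)
    margin = 0.0 if sd_divisor is None else stdev(values) / sd_divisor
    labels = {}
    for region, pct in lymphocyte_pct.items():
        if pct > center + margin:
            labels[region] = "hot"
        elif pct < center - margin:
            labels[region] = "cold"
        else:
            labels[region] = "intermediate"
    return labels

# Toy lymphocyte percentages (hypothetical values, not study data).
toy = {"R1": 10.0, "R2": 20.0, "R3": 30.0, "R4": 33.0, "R5": 50.0}
default_labels = classify_regions(toy)                # SD/4 (main scheme)
narrow_labels = classify_regions(toy, sd_divisor=6)   # SD/6 perturbation
n_cold = sum(1 for label in default_labels.values() if label == "cold")
```

In the toy data, region R4 sits just above the median: it is intermediate under the SD/4 scheme but becomes hot under SD/6, which is the kind of boundary case the perturbation analysis is designed to probe.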
Genomic distance measure
Genomic distance was calculated as described previously1, by taking the Euclidean distance of the mutations present for every pair of immune hot and immune cold regions from the same patient. All mutations present in a region from a tumor were turned into a binary matrix of which the rows were mutations and columns were the tumor regions. From this matrix, the pairwise distance was determined.
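As an illustration of this distance, the following Python sketch builds the binary mutation-by-region matrix and computes the Euclidean distance between two regions. The function names and the toy mutation labels are hypothetical, and the sketch is not the implementation used in the cited work.

```python
import math


def binary_matrix(mutations_by_region):
    """Rows are mutations, columns are regions; 1 means the mutation is present."""
    regions = sorted(mutations_by_region)
    mutations = sorted(set().union(*mutations_by_region.values()))
    matrix = [
        [1 if m in mutations_by_region[r] else 0 for r in regions]
        for m in mutations
    ]
    return mutations, regions, matrix


def genomic_distance(matrix, regions, region_a, region_b):
    """Euclidean distance between two columns of the binary matrix."""
    i, j = regions.index(region_a), regions.index(region_b)
    return math.sqrt(sum((row[i] - row[j]) ** 2 for row in matrix))


# Toy tumor with three regions (mutation names are placeholders).
tumor = {
    "R1": {"mutA", "mutB", "mutC"},
    "R2": {"mutA", "mutD"},
    "R3": {"mutA", "mutB", "mutC"},
}
_, regions, matrix = binary_matrix(tumor)
print(genomic_distance(matrix, regions, "R1", "R2"))
```

For a binary matrix, the squared distance equals the number of mutations found in only one of the two regions.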
Distance between dominant clones to the last common ancestor of region pair
Deep learning-based immune phenotypes were integrated with the TRACERx phylogenetics data6. Dominant clones (using the upper quartile of cancer cell fraction, CCF ≥ 75%) were labelled on the phylogenetic trees of all tumor regions with an available H&E sample in LUAD patients (n = 76 regions, 15 immune hot pairs and 23 immune cold pairs). For every pair of immune hot or immune cold regions within a tumor, the distance between the dominant clones (measured by branch length, i.e. number of mutations) via their last common ancestor was computed. The most recent ancestral clone shared by the two dominant clones was labelled as the ‘last common ancestor of region pair’ (annotated with arrows in Fig. 3c). To ensure that this analysis did not depend on a particular cancer cell fraction threshold, the same analysis was repeated with multiple thresholds (CCF ≥ 80%, 85%). Next, by identifying the last common ancestral subclone for pairs of the same phenotype, each pair was categorized into one of two diversification patterns: ‘diversifying at the most recent common ancestor (MRCA) of the tree’ or ‘diversifying at a descendant subclone of the MRCA of the tree’. The latter category included a pattern exclusive to immune cold pairs, where the two regions shared the same dominant subclone that was the direct descendant of the MRCA of the tree.
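The tree computation can be sketched as follows, assuming a phylogeny stored as a parent map with the number of mutations on the branch leading to each clone. The data structures, function names and toy tree are hypothetical and are not taken from the TRACERx pipeline.

```python
def path_to_root(clone, parent):
    """Return the clone followed by all of its ancestors up to the root."""
    path = [clone]
    while parent[clone] is not None:
        clone = parent[clone]
        path.append(clone)
    return path


def last_common_ancestor(clone_a, clone_b, parent):
    """Most recent clone that lies on both root paths."""
    ancestors_a = set(path_to_root(clone_a, parent))
    for clone in path_to_root(clone_b, parent):
        if clone in ancestors_a:
            return clone
    raise ValueError("clones do not share a root")


def distance_via_lca(clone_a, clone_b, parent, branch_length):
    """Sum of branch lengths (mutation counts) from each clone up to their LCA."""
    lca = last_common_ancestor(clone_a, clone_b, parent)

    def climb(clone):
        total = 0
        while clone != lca:
            total += branch_length[clone]
            clone = parent[clone]
        return total

    # Diversification pattern: at the tree MRCA (root) or at a descendant subclone.
    pattern = "MRCA" if parent[lca] is None else "descendant"
    return lca, climb(clone_a) + climb(clone_b), pattern


# Toy tree: branch_length[c] is the number of mutations on the branch into c.
parent = {"MRCA": None, "C1": "MRCA", "C2": "C1", "C3": "C1", "C4": "MRCA"}
branch_length = {"MRCA": 40, "C1": 10, "C2": 5, "C3": 7, "C4": 20}
print(distance_via_lca("C2", "C3", parent, branch_length))
print(distance_via_lca("C2", "C4", parent, branch_length))
```

Two regions sharing the same dominant clone give a distance of zero, which corresponds to the pattern described for some immune cold pairs.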
Tumor spatial modelling
H&E and IHC cell abundance scores (e.g. lymphocyte percentage, CD8+ percentage) were computed as the percentage of a cell type in the total sample cell count. Stromal TILs were identified using spatial modelling20,34,35, where lymphocytes were classified (using unsupervised clustering) into intra-tumor lymphocytes, adjacent-tumor lymphocytes and distal-tumor lymphocytes based on their spatial proximity to epithelial cell nests in H&Es. The immune hotspot score was calculated using the Getis-Ord algorithm as previously described36. To capture the emergence of complex morphological patterns that dictate cancer-stromal cell spatial contact preserved over varying spatial scales, a fractal dimension calculation (Minkowski-Bouligand dimension) was performed using the box-counting algorithm37. This algorithm counts the number of boxes of a given size needed to cover a geometric pattern. We modified a MATLAB-based algorithm38 to include spatial information on both cancer and stromal cells, as opposed to its conventional use on one variable (i.e. pixel information of an image). The analysis was carried out on spatial maps generated using the coordinates of classified stromal and cancer cells, using the segmented tissue image as a boundary mask to exclude all empty tissue areas. Choices of box size were informed by the distribution of minimum and maximum Euclidean distances from each stromal cell to its nearest cancer cell in all 275 tumor regions (Extended Data Fig. 7a). The mean minimum distance was 21.43μm. We capped the upper box size at 300μm, which is just above a previously proposed cell-cell communication distance of 250μm39 and was chosen to be more inclusive. For statistical tests where fractal dimension was represented at tumor level, the maximum regional score was used.
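For readers unfamiliar with box counting, the following Python sketch shows the conventional single-pattern form of the algorithm on a set of point coordinates. It does not reproduce the two-class (cancer and stromal) modification or the tissue mask described above. The function names, box sizes and test patterns are assumptions chosen for illustration.

```python
import math


def box_count(points, box_size):
    """Number of grid boxes of side box_size that contain at least one point."""
    return len({
        (math.floor(x / box_size), math.floor(y / box_size))
        for x, y in points
    })


def box_counting_dimension(points, box_sizes):
    """Least-squares slope of log N(s) against log(1/s)."""
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(box_count(points, s)) for s in box_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Sanity checks on patterns of known dimension (arbitrary units).
sizes = [1, 2, 4, 8, 16]
filled_square = [(i + 0.5, j + 0.5) for i in range(64) for j in range(64)]
straight_line = [(i + 0.5, 0.5) for i in range(64)]
print(box_counting_dimension(filled_square, sizes))  # close to 2
print(box_counting_dimension(straight_line, sizes))  # close to 1
```

An irregular interface between two cell populations would be expected to give a value between these two extremes, with higher values indicating greater geometric complexity.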
H&E-IHC spatial alignment/immune subset projection
For an H&E diagnostic slide, we determined the number of intra-tumor lymphocytes, adjacent-tumor lymphocytes and distal-tumor lymphocytes ($n_I$, $n_A$, $n_D$) based on spatial modelling of the H&Es. After spatially aligning the IHC and projecting IHC-derived cells onto the H&E, the number of CD8+ cells that were also intra-tumor lymphocytes was determined ($n_{CD8,I}$), and similarly for other cell types. As a result, intra-tumor lymphocytes were deconvolved as $n_I = n_{CD8,I} + n_{CD4,I} + n_{FOXP3,I} + n_{other,I}$. A two-sided paired Wilcoxon test was used to test the difference in the percentage of CD8+ cells among intra-tumor lymphocytes, adjacent-tumor lymphocytes and distal-tumor lymphocytes ($n_{CD8,I}$, $n_{CD8,A}$, $n_{CD8,D}$). The same test was performed for CD4+FOXP3- and CD4+FOXP3+ cells.
The 10 LUAD patients with the highest adjacent-tumor lymphocyte to stromal cell ratio were selected for this immune subset spatial projection. All samples had above-median CD8+%. One sample was excluded due to poor H&E-IHC alignment quality and the subsequent analysis was performed on the remaining nine samples. The quality of alignment was evaluated by manually identifying 238 visible landmarks and placing them at corresponding positions in H&E and IHC tiles (total number of tiles = 249, maximum landmarks per tile = 5), as shown in Extended Data Fig. 8b. The Euclidean distance (difference in coordinates) between each pair of marked points was computed to obtain a quantitative measurement of alignment accuracy. The average distance between matching landmarks was 9.57μm, whereas the maximum distance between the H&E and CD4/CD8/FOXP3 sections was 16μm.
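The landmark-based accuracy measure amounts to a mean and a maximum of pairwise Euclidean distances. A minimal Python sketch is given below. The function name, the `microns_per_pixel` conversion factor and the toy coordinates are assumptions and do not come from the study.

```python
import math


def alignment_error(he_landmarks, ihc_landmarks, microns_per_pixel=1.0):
    """Mean and maximum distance between matched H&E and IHC landmarks.

    Landmarks are (x, y) pixel coordinates listed in matching order;
    microns_per_pixel is a hypothetical scale factor for the scanner resolution.
    """
    if len(he_landmarks) != len(ihc_landmarks):
        raise ValueError("landmark lists must have the same length")
    distances = [
        math.dist(p, q) * microns_per_pixel
        for p, q in zip(he_landmarks, ihc_landmarks)
    ]
    return sum(distances) / len(distances), max(distances)


# Toy example with three matched landmarks.
he_points = [(0, 0), (10, 10), (5, 5)]
ihc_points = [(3, 4), (10, 10), (5, 11)]
mean_error, max_error = alignment_error(he_points, ihc_points)
print(mean_error, max_error)
```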
Survival analysis and other statistical methods
Survival tests were conducted using the Kaplan-Meier estimator (‘ggsurvplot’ R function from the ‘survminer’ and ‘survival’ R packages) as well as the Cox model (‘coxph’ R function, displayed using the ‘ggforest’ R function). Forest plots show the hazard ratio on the x-axis; each variable’s hazard ratio is plotted and annotated with a 95% confidence interval. The clinical parameters included in the multivariate model were age, sex, smoking pack years, histology (whether LUAD, LUSC or otherwise), tumor stage and adjuvant therapy (whether received or not). Because of its prognostic importance in TRACERx, the upper quartile of clonal neoantigens in each histology cohort was also incorporated in the multivariate model. The range of available disease-free survival data was 34-1364 days (median = 915 days) in TRACERx, and 1-6139 days (median = 684 days) in LATTICe-A. All hazard ratios were computed on all time points (i.e. the whole survival curve, not at a specific time point). Correlation tests used Spearman’s method and were generated using the function ‘ggscatter’ from the ‘ggpubr’ R package. All correlation plots show the Rho (ρ) coefficient and the significance P-value. For statistical comparisons among groups, a two-sided, non-parametric, unpaired Wilcoxon rank-sum test was used, unless stated otherwise. All box plots were generated using the function ‘ggboxplot’ from the ‘ggpubr’ R package (all data points are plotted with the ‘jitter’ option; the median value is indicated by a thick horizontal line; minimum and maximum values are indicated by the extreme points; the first and third quartiles are represented by the box edges; and vertical lines indicate the error range) or the function ‘ggbetweenstats’ from the ‘ggstatsplot’ R package for more than two groups. Tests for concordance between two data classes were analyzed using Fisher’s exact test. All statistical tests were two-sided, and a P-value of less than 0.05 was considered statistically significant.
To adjust P-values for multiple comparisons, the Benjamini & Hochberg method was used. To measure effect size, Cohen’s d method was used. All statistical analyses were conducted in R (version 3.5.1).
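The analyses in this study were run in R. Purely as an illustration of the two calculations named above, the following Python sketch implements the Benjamini & Hochberg adjustment and Cohen’s d with a pooled standard deviation. The function names and example values are hypothetical, and the pooled-variance form of Cohen’s d is an assumption, since the exact variant is not stated.

```python
import math
from statistics import mean, variance


def benjamini_hochberg(p_values):
    """Benjamini & Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
print(cohens_d([2, 4, 6], [1, 3, 5]))
```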
Reporting summary.
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
Acknowledgements
This study is funded by a Cancer Research UK Career Establishment Award to Y.Y. (C45982/A21808). The TRACERx study (Clinicaltrials.gov no: NCT01888601) is sponsored by University College London (UCL/12/0279) and has been approved by an independent Research Ethics Committee (13/LO/1546). TRACERx is funded by Cancer Research UK (C11496/A17786) and coordinated through the Cancer Research UK and UCL Cancer Trials Centre. Y.Y. acknowledges additional support from Breast Cancer Now (2015NovPR638), Children’s Cancer and Leukaemia Group (CCLGA201906), NIH U54 CA217376 and R01 CA185138, CDMRP Breast Cancer Research Program Award BC132057, European Commission ITN (H2020-MSCA-ITN-2019), Wellcome Trust (105104/Z/14/Z), and The Royal Marsden/ICR National Institute of Health Research Biomedical Research Centre. C.S. is Royal Society Napier Research Professor. This work was supported by the Francis Crick Institute that receives its core funding from Cancer Research UK (FC001169, FC001202), the UK Medical Research Council (FC001169, FC001202), and the Wellcome Trust (FC001169, FC001202). C.S. is funded by Cancer Research UK (TRACERx, PEACE and CRUK Cancer Immunotherapy Catalyst Network), the CRUK Lung Cancer Centre of Excellence, the Rosetrees Trust, NovoNordisk Foundation (ID16584) and the Breast Cancer Research Foundation (BCRF). This research is supported by a Stand Up To Cancer-LUNGevity-American Lung Association Lung Cancer Interception Dream Team Translational Research Grant (Grant Number: SU2C-AACR-DT23-17). Stand Up To Cancer is a program of the Entertainment Industry Foundation. Research grants are administered by the American Association for Cancer Research, the Scientific Partner of SU2C.
C.S. receives funding from the European Research Council (ERC) under the European Union’s Seventh Framework Programme (FP7/2007-2013) Consolidator Grant (FP7-THESEUS-617844), European Commission ITN (FP7-PloidyNet 607722), an ERC Advanced Grant (PROTEUS) from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 835297), and Chromavision from the European Union’s Horizon 2020 research and innovation programme (grant agreement 665233). S.A.Q. is funded by a Cancer Research UK Senior Cancer Research Fellowship (C36463/A22246) and a Cancer Research UK Biotherapeutic Program Grant (C36463/A20764). S.L. is supported by the National Breast Cancer Foundation of Australia Endowed Chair and the Breast Cancer Research Foundation, New York. L.Z. has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 846614. C.T.H. is funded by the UCL Biomedical Research Council. M.J.H. has received funding from Cancer Research UK, National Institute for Health Research, Rosetrees Trust and UKI NETs. We thank the members of the TRACERx and PEACE consortia for participating in this study. We thank the Tissue Image Analytics lab at the University of Warwick, Coventry, UK for their help in method implementation. We thank the Scientific Computing team at The Institute of Cancer Research, London for technical support. We also thank Ana Teodósio and Catherine Ficken from the MRC Toxicology Unit core histology facility for their expert technical assistance.
Footnotes
Author Contributions
K.A. and S.E.A.R. contributed equally to this work. S.E.A.R. and K.A. developed the image processing and deep learning pipeline and performed the geospatial analysis. K.A. performed the bioinformatics and statistical analyses. J.L.Q., R.S. and D.A.M. provided pathological expertise. M.J.-H. provided clinical expertise and patient characterization. S.V. performed histology sample generation and digitized H&E slides. A.A. generated and digitized IHC slides under the supervision of T.M. T.L. provided annotations for training and validating IHC analysis. N.M., R.R. and L.Z. assisted with genomic data integration. J.L.Q., R.S., S.L., M.A.B., D.A.M., C.T.H., and T.L. analyzed pathology TIL estimates. J.L.Q., L.O., M.S., and C.R.S. provided data and advice for LATTICe-A. Y.Y., N.M., J.L.Q., C.S., A.H. and S.A.Q. provided data analysis support and supervision. K.A., R.R., N.M., C.S. and Y.Y. wrote the manuscript with input from all authors. Y.Y. and C.S. jointly conceived and supervised the study.
Competing Interests
Y.Y. has received speakers bureau honoraria from Roche and is a consultant for Merck and Co Inc. C.S. receives grant support from Pfizer, AstraZeneca, BMS, Roche-Ventana, Boehringer-Ingelheim and Ono Pharmaceutical. C.S. has consulted for Pfizer, Novartis, GlaxoSmithKline, MSD, BMS, Celgene, AstraZeneca, Illumina, Genentech, Roche-Ventana, GRAIL, Medicxi, and the Sarah Cannon Research Institute. C.S. is a shareholder of Apogen Biotechnologies, Epic Bioscience, GRAIL, and has stock options in and is co-founder of Achilles Therapeutics. M.A.B. is a consultant for Achilles Therapeutics. S.L. receives research funding to her institution from Novartis, Bristol Myers Squibb, Merck, Roche-Genentech, Puma Biotechnology, Pfizer, Eli Lilly and Seattle Genetics. S.L. has acted as consultant (not compensated) to Seattle Genetics, Pfizer, Novartis, BMS, Merck, AstraZeneca and Roche-Genentech. S.L. has acted as consultant (paid to her institution) to Aduro Biotech, Novartis, and G1 Therapeutics. D.A.M. has received speaker’s fees from AstraZeneca. M.J.H. is a member of the Advisory Board for Achilles Therapeutics.
Contributor Information
TRACERx consortium:
Charles Swanton, Mariam Jamal-Hanjani, John Le Quesne, Allan Hackshaw, Sergio A Quezada, Nicholas McGranahan, Rachel Rosenthal, Crispin T Hiley, Selvaraju Veeriah, David A Moore, Maise Al Bakir, Teresa Marafioti, Roberto Salgado, Yenting Ngai, Abigail Sharp, Cristina Rodrigues, Oliver Pressey, Sean Smith, Nicole Gower, Harjot Dhanda, Joan Riley, Lindsay Primrose, Luke Martinson, Nicolas Carey, Jacqui A Shaw, Dean Fennell, Gareth A Wilson, Nicolai J Birkbak, Thomas BK Watkins, Mickael Escudero, Aengus Stewart, Andrew Rowan, Jacki Goldman, Peter Van Loo, Richard Kevin Stone, Tamara Denner, Emma Nye, Sophia Ward, Emilia L Lim, Stefan Boeing, Maria Greco, Kevin Litchfield, Jerome Nicod, Clare Puttick, Katey Enfield, Emma Colliver, Brittany Campbell, Christopher Abbosh, Yin Wu, Marcin Skrzypski, Robert E Hynds, Andrew Georgiou, Mariana Werner Sunderland, James L Reading, Karl S Peggs, John A Hartley, Pat Gorman, Helen L Lowe, Leah Ensell, Victoria Spanswick, Angeliki Karamani, Dhruva Biswas, Maryam Razaq, Stephan Beck, Ariana Huebner, Michelle Dietzen, Cristina Naceur-Lombardelli, Mita Afroza Akther, Haoran Zhai, Nnennaya Kannu, Elizabeth Manzano, Supreet Kaur Bola, Ehsan Ghorani, Marc Robert de Massy, Elena Hoxha, Emine Hatipoglu, Stephanie Ogwuru, Benny Chain, Gillian Price, Sylvie Dubois-Marshall, Keith Kerr, Shirley Palmer, Heather Cheyne, Joy Miller, Keith Buchan, Mahendran Chetty, Mohammed Khalil, Veni Ezhil, Vineet Prakash, Girija Anand, Sajid Khan, Kelvin Lau, Michael Sheaff, Peter Schmid, Louise Lim, John Conibear, Roland Schwarz, Jonathan Tugwood, Jackie Pierce, Caroline Dive, Ged Brady, Dominic G Rothwell, Francesca Chemi, Elaine Kilgour, Fiona Blackhall, Lynsey Priest, Matthew G Krebs, Philip Crosbie, Apostolos Nakas, Sridhar Rathinam, Louise Nelson, Kim Ryanna, Mohamad Tuffail, Amrita Bajaj, Jan Brozik, Fiona Morgan, Malgorzata Kornaszewska, Richard Attanoos, Haydn Adams, Helen Davies, Mathew Carter, Lindsay CR, Fabio Gomes, Zoltán Szallasi, Istvan Csabai, Miklos Diossy, Hugo Aerts, Alan Kirk, Mo Asif, John Butler, Rocco Bilanca, Nikos Kostoulas, Mairead Mackenzie, Maggie Wilcox, Sara Busacca, Alan Dawson, Mark R Lovett, Michael Shackcloth, Sarah Feeney, Julius Asante-Siaw, John Gosney, Angela Leek, Nicola Totten, Jack Davies Hodgkinson, Rachael Waddington, Jane Rogan, Katrina Moore, William Monteiro, Hilary Marshall, Kevin G Blyth, Craig Dick, Andrew Kidd, Eric Lim, Paulo De Sousa, Simon Jordan, Alexandra Rice, Hilgardt Raubenheimer, Harshil Bhayani, Morag Hamilton, Lyn Ambrose, Anand Devaraj, Hema Chavan, Sofina Begum, Aleksander Mani, Daniel Kaniu, Mpho Malima, Sarah Booth, Andrew G Nicholson, Nadia Fernandes, Jessica E Wallen, Pratibha Shah, Sarah Danson, Jonathan Bury, John Edwards, Jennifer Hill, Sue Matthews, Yota Kitsanta, Jagan Rao, Sara Tenconi, Laura Socci, Kim Suvarna, Faith Kibutu, Patricia Fisher, Robin Young, Joann Barker, Fiona Taylor, Kirsty Lloyd, Teresa Light, Tracey Horey, Dionysis Papadatos-Pastos, Peter Russell, Sara Lock, Kayleigh Gilbert, David Lawrence, Martin Hayward, Nikolaos Panagiotopoulos, Robert George, Davide Patrini, Mary Falzon, Elaine Borg, Reena Khiroya, Asia Ahmed, Magali Taylor, Junaid Choudhary, Penny Shaw, Sam M Janes, Martin Forster, Tanya Ahmad, Siow Ming Lee, Javier Herrero, Dawn Carnell, Ruheena Mendes, Jeremy George, Neal Navani, Marco Scarci, Elisa Bertoja, Robert CM Stephens, Emilie Martinoni Hoogenboom, James W Holding, Steve Bandula, Babu Naidu, Gerald Langman, Andrew Robinson, Hollie Bancroft, Amy Kerr, Salma Kadiri, Charlotte Ferris, Gary Middleton, Madava Djearaman, Akshay Patel, Christian Ottensmeier, Serena Chee, Benjamin Johnson, Aiman Alzetani, Emily Shaw, Jason Lester, Yvonne Summers, Raffaele Califano, Paul Taylor, Rajesh Shah, Piotr Krysiak, Kendadai Rammohan, Eustace Fontaine, Richard Booton, Matthew Evison, Stuart Moss, Juliette Novasio, Leena Joseph, Paul Bishop, Anshuman Chaturvedi, Helen Doran, Felice Granato, Vijay Joshi, Elaine Smith, and Angeles Montero
Data availability
The digital pathology images from the TRACERx study generated or analysed during this study are not publicly available and restrictions apply to their use. A test subset of these digital pathology images is available through the Cancer Research UK & University College London Cancer Trials Centre (ctc.tracerx@ucl.ac.uk) for non-commercial research purposes. Access will be granted upon review of a project proposal by a TRACERx data access committee and on entering into an appropriate data access agreement, subject to any applicable ethical approvals. Digital pathology images for LATTICe-A samples with expert pathologist’s annotations used for validation are available: https://github.com/qalid7/compath. Requests for data access for the remaining LATTICe-A samples can be submitted to J.L.Q.
Code availability
The deep learning pipeline for digital pathology image analysis is available for non-commercial research purposes: https://github.com/qalid7/compath. All code used for statistical analyses of image data was developed in R (version 3.5.1) and is available: https://github.com/qalid7/tx100_compath.
References
- 1. Rosenthal R, et al. Neoantigen-directed immune escape in lung cancer evolution. Nature. 2019;1 doi: 10.1038/s41586-019-1032-7.
- 2. Morris LGT, Chan TA. Lung Cancer Evolution: What’s Immunity Got to Do with It? Cancer Cell. 2019;35:711–713. doi: 10.1016/j.ccell.2019.04.009.
- 3. Morris LGT, et al. Pan-cancer analysis of intratumor heterogeneity as a prognostic determinant of survival. Oncotarget. 2016;7:10051–10063. doi: 10.18632/oncotarget.7067.
- 4. Milo I, et al. The immune system profoundly restricts intratumor genetic heterogeneity. Sci Immunol. 2018;3 doi: 10.1126/sciimmunol.aat1435.
- 5. Jia Q, et al. Local mutational diversity drives intratumoral immune heterogeneity in non-small cell lung cancer. Nat Commun. 2018;9 doi: 10.1038/s41467-018-07767-w.
- 6. Jamal-Hanjani M, et al. Tracking the Evolution of Non-Small-Cell Lung Cancer. N Engl J Med. 2017;376:2109–2121. doi: 10.1056/NEJMoa1616288.
- 7. Hendry S, et al. Assessing Tumor-Infiltrating Lymphocytes in Solid Tumors. Adv Anat Pathol. 2017;24:311–335. doi: 10.1097/PAP.0000000000000161.
- 8. Moore DA, et al. In situ growth in early lung adenocarcinoma may represent precursor growth or invasive clone outgrowth—a clinically relevant distinction. Mod Pathol. 2019;1 doi: 10.1038/s41379-019-0257-1.
- 9. Whittaker KA, Rynearson TA. Evidence for environmental and ecological selection in a microbe with no geographic limits to gene flow. Proc Natl Acad Sci U S A. 2017;114:2651–2656. doi: 10.1073/pnas.1612346114.
- 10. Shafer ABA, Wolf JBW. Widespread evidence for incipient ecological speciation: A meta-analysis of isolation-by-ecology. Ecology Letters. 2013;16:940–950. doi: 10.1111/ele.12120.
- 11. Costa A, et al. Fibroblast Heterogeneity and Immunosuppressive Environment in Human Breast Cancer. Cancer Cell. 2018;33:463–479.e10. doi: 10.1016/j.ccell.2018.01.011.
- 12. Öhlund D, et al. Distinct populations of inflammatory fibroblasts and myofibroblasts in pancreatic cancer. J Exp Med. 2017;214:579–596. doi: 10.1084/jem.20162024.
- 13. Salmon H, et al. Matrix architecture defines the preferential localization and migration of T cells into the stroma of human lung tumors. J Clin Invest. 2012;122:899–910. doi: 10.1172/JCI45817.
- 14. Thomas DA, Massague J. TGF-β directly targets cytotoxic T cell functions during tumor evasion of immune surveillance. Cancer Cell. 2005;8:369–380. doi: 10.1016/j.ccr.2005.10.012.
- 15. Joyce JA, Fearon DT. T cell exclusion, immune privilege, and the tumor microenvironment. Science. 2015;348:74–80. doi: 10.1126/science.aaa6204.
- 16. Sorokin L. The impact of the extracellular matrix on inflammation. Nat Rev Immunol. 2010;10:712–723. doi: 10.1038/nri2852.
- 17. Chen DS, Mellman I. Elements of cancer immunity and the cancer-immune set point. Nature. 2017;541:321–330. doi: 10.1038/nature21349.
- 18. McGranahan N, et al. Allele-Specific HLA Loss and Immune Escape in Lung Cancer Evolution. Cell. 2017;171:1259–1271.e11. doi: 10.1016/j.cell.2017.10.001.
- 19. McGranahan N, et al. Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade. Science. 2016;351 doi: 10.1126/science.aaf1490.
- 20. Yuan Y. Modelling the spatial heterogeneity and molecular correlates of lymphocytic infiltration in triple-negative breast cancer. J R Soc Interface. 2015;12:20141153. doi: 10.1098/rsif.2014.1153.
- 21. Thomas A, Liu SV, Subramaniam DS, Giaccone G. Refining the treatment of NSCLC according to histological and molecular subtypes. Nat Rev Clin Oncol. 2015;12:511–526. doi: 10.1038/nrclinonc.2015.90.
- 22. Hammerman PS, et al. Comprehensive genomic characterization of squamous cell lung cancers. Nature. 2012;489:519–525. doi: 10.1038/nature11404.
- 23. Collisson EA, et al. Comprehensive molecular profiling of lung adenocarcinoma: The cancer genome atlas research network. Nature. 2014;511:543–550. doi: 10.1038/nature13385.
- 24. Keren L, et al. A Structured Tumor-Immune Microenvironment in Triple Negative Breast Cancer Revealed by Multiplexed Ion Beam Imaging. Cell. 2018;174:1373–1387.e19. doi: 10.1016/j.cell.2018.08.039.
- 25. Giesen C, et al. Highly multiplexed imaging of tumor tissues with subcellular resolution by mass cytometry. Nat Methods. 2014;11:417–422. doi: 10.1038/nmeth.2869.
- 26. Goltsev Y, et al. Deep Profiling of Mouse Splenic Architecture with CODEX Multiplexed Imaging. Cell. 2018;174:968–981.e15. doi: 10.1016/j.cell.2018.07.010.
- 27. Raza SEA, et al. Micro-Net: A unified model for segmentation of various objects in microscopy images. Med Image Anal. 2019;52:160–173. doi: 10.1016/j.media.2018.12.003.
- 28. Sirinukunwattana K, et al. Locality Sensitive Deep Learning for Detection and Classification of Nuclei in Routine Colon Cancer Histology Images. IEEE Trans Med Imaging. 2016;35:1196–1206. doi: 10.1109/TMI.2016.2525803.
- 29. Abadi M, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. 2016.
- 30. Danaher P, et al. Gene expression markers of Tumor Infiltrating Leukocytes. J Immunother Cancer. 2017;5:18. doi: 10.1186/s40425-017-0215-8.
- 31. Holzinger A, et al. Monoclonal Antibody to Thyroid Transcription Factor-1: Production, Characterization, and Usefulness in Tumor Diagnosis. Hybridoma. 1996;15:49–53. doi: 10.1089/hyb.1996.15.49.
- 32. Matoso A, et al. Comparison of thyroid transcription factor-1 expression by 2 monoclonal antibodies in pulmonary and nonpulmonary primary tumors. Appl Immunohistochem Mol Morphol AIMM. 2010;18:142–9. doi: 10.1097/PAI.0b013e3181bdf4e7.
- 33. Pelosi G, et al. ΔNp63 (p40) and Thyroid Transcription Factor-1 Immunoreactivity on Small Biopsies or Cellblocks for Typing Non-small Cell Lung Cancer: A Novel Two-Hit, Sparing-Material Approach. J Thorac Oncol. 2012;7:281–290. doi: 10.1097/JTO.0b013e31823815d3.
- 34. Heindl A, et al. Relevance of Spatial Heterogeneity of Immune Infiltration for Predicting Risk of Recurrence After Endocrine Therapy of ER+ Breast Cancer. JNCI J Natl Cancer Inst. 2018;110 doi: 10.1093/jnci/djx137.
- 35. Heindl A, et al. Microenvironmental niche divergence shapes BRCA1-dysregulated ovarian cancer morphological plasticity. Nat Commun. 2018;9:3917. doi: 10.1038/s41467-018-06130-3.
- 36. Nawaz S, Heindl A, Koelble K, Yuan Y. Beyond immune density: critical role of spatial heterogeneity in estrogen receptor-negative breast cancer. Mod Pathol. 2015;28:766–777. doi: 10.1038/modpathol.2015.37.
- 37. Dubuc, Quiniou, Roques-Carmes, Tricot, Zucker. Evaluating the fractal dimension of profiles. Phys Rev A Gen Phys. 1989;39:1500–1512. doi: 10.1103/physreva.39.1500.
- 38. Moisy F, Jimenez J. Geometry and clustering of intense structures in isotropic turbulence. J Fluid Mech. 2004;513:111–133.
- 39. Francis K, Palsson BO. Effective intercellular communication distances are determined by the relative time constants for cyto/chemokine secretion and diffusion. Proc Natl Acad Sci U S A. 1997;94:12258–62. doi: 10.1073/pnas.94.23.12258.