Table 1.
Stain | Approach | Ref | Data set | Method | Ground truth | Notes |
---|---|---|---|---|---|---|
H&E | Patch classification | 24 | Multiple sites; TCGA data set | CNN | Labeled patches (yes/no TILs); annotations are open-access | Strengths: large-scale study with investigation of spatial TIL maps. AV includes molecular correlates. Limitations: does not distinguish sTIL and iTIL; does not classify individual TILs*. Other: we defined CTA TIL score as fraction of patches that contain TILs, and found this to be correlated with VTA (R = 0.659, p = 2e-35). |
H&E | Semantic segmentation | 16 | Breast; TCGA data set | FCN | Traced region boundaries (exhaustive); annotations are open-access | Strengths: large sample size and regions; investigates inter-rater variability at different experience levels; delineation of tumor, stroma and necrosis regions. Limitations: only detects dense TIL infiltrates*; does not classify individual TILs*. |
H&E | Semantic segmentation + Object detection | 25 | Breast; private data set | Seeding + FCN | Traced region boundaries (exhaustive); labeled & segmented nuclei within labeled region | Strengths: mostly follows TIL-WG VTA guidelines. AV includes correlation with consensus VTA scores and inter-pathologist variability. Limitations: heavy ground truth requirement*; underpowered CV; and limited manually annotated slides. |
H&E | Object detection | 26 | Breast; METABRIC data set | SVM using morphology features | Labeled nuclei; qualitative density scores | Strengths: robust analysis and exploration of molecular TIL correlates. Limitations: individual labeled nuclei are limited; does not distinguish TILs in different histologic regions*. |
H&E | Object detection | 27 | Breast; private data set | RG and MRF | Labeled patches (low-medium-high density) | Strengths: explainable model and modular pipeline. Limitations: does not distinguish sTIL and iTIL; does not classify individual TILs. Limited AV sample size. |
H&E | Object detection | 28 | NSCLC; private data sets | Watershed + SVM classifier | Labeled nuclei | Strengths: explainable model; robust CV; captures spatial TIL clustering. Limitations: limited AV; does not distinguish sTIL and iTIL. |
H&E | Object detection + inferred TIL localization | 31 | Breast; METABRIC + private data sets | SVM classifier using morphology features | Labeled nuclei; qualitative density scores | Strengths: infers TIL localization using spatial localization. Robust CV. Investigation of spatial TIL patterns. Limitations: individual labeled nuclei are limited. Not clear if spatial clustering has 1:1 correspondence with regions. |
IHC | Object detection + manual regions | 29 | Colon; private data set | Complex pipeline (non-DL) | Overall density estimates | Strengths: CTA within manual regions, including invasive margin. Limitations: unpublished AV. |
IHC | Object detection | 30 | Multiple; private data set | Multiple DL pipelines | Labeled nuclei within FOV (exhaustive) | Strengths: large-scale, robust AV. Systematic benchmarking. Limitations: no CV; does not distinguish TILs in different regions*. |
This non-exhaustive list is restricted to H&E and chromogenic IHC, although excellent works have demonstrated CTA using other approaches, such as multiplexed immunofluorescence (refs. 21–23). Published CTA algorithms vary markedly in their approach to TIL scoring, the robustness of their validation, their interpretability, and their consistency with published VTA guidelines. The strengths and limitations of each publication are highlighted; general limitations (related to the broad approach used, not the specific paper) are marked with an asterisk (*). Going forward, nuanced approaches are needed, ideally incorporating workflows for robust quantification and validation as presented in this paper. Different approaches have different ground truth requirements (illustrated in Fig. 1, panel f), hence the need for large-scale ground truth data sets. We encourage all future CTA publications to make their data sets open-access whenever possible. Two major efforts are of note: (1) a group of scientists, including the US FDA and the TIL-WG, is collaborating to crowdsource pathologists and to collect images and pathologist annotations that can be qualified by the FDA medical device development tool program; (2) the TIL-WG is organizing a challenge to validate CTA algorithms against clinical trial outcome data (CV).
AV analytical validation, CNN convolutional neural network, CV clinical validation, DL deep learning, FCN fully convolutional network, FOV field of view, MRF Markov random field, RG region growing, NSCLC non-small cell lung cancer, SVM support vector machine.
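
The patch-based CTA TIL score noted for ref. 24 (fraction of patches that contain TILs, compared against VTA) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the slide identifiers, and the toy values are hypothetical, and the table does not state which correlation coefficient was used, so Pearson's r is assumed here.

```python
from math import sqrt
from typing import Sequence


def cta_til_score(patch_has_til: Sequence[bool]) -> float:
    """Fraction of one slide's patches that were classified as containing TILs."""
    if not patch_has_til:
        raise ValueError("slide has no patches")
    return sum(1 for flag in patch_has_til if flag) / len(patch_has_til)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumed; the source only reports 'R')."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: per-slide patch labels (True = patch contains TILs)
# and made-up pathologist VTA scores in percent. Not taken from any study.
slides = {
    "slide_a": [True, False, False, False],
    "slide_b": [True, True, False, False],
    "slide_c": [True, True, True, False],
}
vta_scores = {"slide_a": 10.0, "slide_b": 30.0, "slide_c": 60.0}

names = sorted(slides)
cta = [cta_til_score(slides[n]) for n in names]  # one CTA score per slide
vta = [vta_scores[n] for n in names]             # matching VTA score per slide
r = pearson_r(cta, vta)
print(f"CTA scores: {cta}, Pearson r vs. VTA: {r:.3f}")
```

Because this score only counts TIL-positive patches, it inherits the limitations listed for the patch classification approach: it neither separates sTIL from iTIL nor counts individual cells.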