2020 May 12;6:16. doi: 10.1038/s41523-020-0154-2

Table 1.

Sample CTA algorithms from the published literature.

Stain	Approach	Ref.	Data set	Method	Ground truth	Notes
H&E	Patch classification	²⁴	Multiple sites; TCGA data set	CNN	Labeled patches (yes/no TILs); annotations are open-access	Strengths: large-scale study with investigation of spatial TIL maps; AV includes molecular correlates. Limitations: does not distinguish sTIL and iTIL; does not classify individual TILs*. Other: we defined the CTA TIL score as the fraction of patches that contain TILs and found it to be correlated with VTA (R = 0.659, p = 2 × 10⁻³⁵).
	Semantic segmentation	¹⁶	Breast; TCGA data set	FCN	Traced region boundaries (exhaustive); annotations are open-access	Strengths: large sample size and number of regions; investigates inter-rater variability at different experience levels; delineates tumor, stroma, and necrosis regions. Limitations: only detects dense TIL infiltrates; does not classify individual TILs.
	Semantic segmentation + object detection	²⁵	Breast; private data set	Seeding + FCN	Traced region boundaries (exhaustive); labeled & segmented nuclei within labeled region	Strengths: mostly follows TIL-WG VTA guidelines; AV includes correlation with consensus VTA scores and inter-pathologist variability. Limitations: heavy ground truth requirement*; underpowered CV; limited number of manually annotated slides.
	Object detection	²⁶	Breast; METABRIC data set	SVM using morphology features	Labeled nuclei; qualitative density scores	Strengths: robust analysis and exploration of molecular TIL correlates. Limitations: the number of individually labeled nuclei is limited; does not distinguish TILs in different histologic regions*.
		²⁷	Breast; private data set	RG and MRF	Labeled patches (low/medium/high density)	Strengths: explainable model and modular pipeline. Limitations: does not distinguish sTIL and iTIL; does not classify individual TILs; limited AV sample size.
		²⁸	NSCLC; private data sets	Watershed + SVM classifier	Labeled nuclei	Strengths: explainable model; robust CV; captures spatial TIL clustering. Limitations: limited AV; does not distinguish sTIL and iTIL.
	Object detection + inferred TIL localization	³¹	Breast; METABRIC + private data sets	SVM classifier using morphology features	Labeled nuclei; qualitative density scores	Strengths: infers TIL localization using spatial clustering; robust CV; investigation of spatial TIL patterns. Limitations: the number of individually labeled nuclei is limited; not clear whether spatial clustering has a 1:1 correspondence with regions.
IHC	Object detection + manual regions	²⁹	Colon; private data set	Complex pipeline (non-DL)	Overall density estimates	Strengths: CTA within manual regions, including the invasive margin. Limitations: unpublished AV.
	Object detection	³⁰	Multiple; private data set	Multiple DL pipelines	Labeled nuclei within FOV (exhaustive)	Strengths: large-scale, robust AV; systematic benchmarking. Limitations: no CV; does not distinguish TILs in different regions*.

This non-exhaustive list is restricted to H&E and chromogenic IHC, although excellent works exist that perform CTA using other approaches, such as multiplexed immunofluorescence²¹⁻²³. Published CTA algorithms vary markedly in their approach to TIL scoring, the robustness of their validation, their interpretability, and their consistency with published VTA guidelines. The strengths and limitations of each publication are highlighted; general limitations (related to the broad approach used, not the specific paper) are marked with an asterisk (*). Going forward, nuanced approaches are needed, ideally incorporating workflows for robust quantification and validation such as those presented in this paper. Different approaches have different ground truth requirements (illustrated in Fig. 1, panel f), hence the need for large-scale ground truth data sets. We encourage all future CTA publications to make their data sets openly accessible whenever possible. Two major efforts are of note: (1) a group of scientists, including the US FDA and the TIL-WG, is collaborating to crowdsource pathologists and collect images and pathologist annotations that can be qualified through the FDA Medical Device Development Tools program; (2) the TIL-WG is organizing a challenge to validate CTA algorithms against clinical trial outcome data (CV).

AV analytical validation, CNN convolutional neural network, CV clinical validation, DL deep learning, FCN fully convolutional network, FOV field of view, MRF Markov random field, NSCLC non-small cell lung cancer, RG region growing, SVM support vector machine.
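The patch-based CTA TIL score noted for ref. 24 in the table (the fraction of patches that contain TILs, then correlated with VTA) can be sketched as follows. This is a minimal illustration only. It assumes that an upstream patch classifier, which is not shown and is hypothetical here, has already produced one True/False label per tissue patch. It also assumes the reported R is a Pearson coefficient. The function names and toy values are hypothetical and are not taken from the study.

```python
from math import sqrt
from typing import Sequence


def cta_til_score(patch_has_tils: Sequence[bool]) -> float:
    """Slide-level CTA TIL score: fraction of tissue patches labeled TIL-positive.

    Assumes a (hypothetical) upstream patch classifier has already produced
    one True/False label per tissue patch on the slide.
    """
    if len(patch_has_tils) == 0:
        raise ValueError("slide has no tissue patches")
    # True counts as 1 and False as 0, so the sum is the number of TIL-positive patches.
    return sum(patch_has_tils) / len(patch_has_tils)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired score lists.

    Using Pearson's R is an assumption; the table does not state the coefficient type.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired sequences of length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    ss_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (ss_x * ss_y)


if __name__ == "__main__":
    # Toy, made-up inputs for illustration only (not data from the study).
    slides = {
        "slide_a": [True, False, False, False],
        "slide_b": [True, True, False, False],
        "slide_c": [True, True, True, False],
    }
    vta_scores = [10.0, 20.0, 30.0]  # hypothetical pathologist VTA scores (%)
    cta_scores = [cta_til_score(p) for p in slides.values()]
    print(cta_scores, pearson_r(cta_scores, vta_scores))
```

This sketch does not compute the p-value reported alongside R. That would require an additional significance test, for example `scipy.stats.pearsonr`, which returns both the coefficient and a p-value. Note also that a patch-fraction score of this kind inherits the limitation flagged in the table: it does not separate sTILs from iTILs.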