. 2022 Dec 30;14:100177. doi: 10.1016/j.jpi.2022.100177

Table 1.

Limitations to diagnostic AI implementation.

Limitations	Discussion	Current and future solutions
Data
Data standardization	A considerable volume of data is generated for algorithm development, often derived from multiple sources, presented via multiple file formats, and analyzed through multiple AI models. Current analytical methods are non-standardized, consequently predisposing to variations in classification and poor predictive capacity.	Fostering of a single open-source file format, similar to DICOM in radiology, to facilitate expeditious access and interrogation.
Data availability and cost	• Paucity of WSI datasets with pathologist annotations for ground-truth determination limits employment of supervised learning techniques. In GUP, this shortage of WSI datasets, which impedes the analytic capacity of deep learning techniques, is most pronounced for immunohistochemical (IHC) staining. • Labeled WSI data is expensive, difficult to acquire, and time-consuming to produce. WSI data storage costs have posed barriers to digital implementation in many laboratories.	• “Transfer learning,” e.g., pretrained networks, and data augmentation techniques may be utilized to mitigate the cumbersome nature of network training and data shortages, though they cannot currently substitute for pathologist-annotated data • Increased utilization of unsupervised learning techniques, which do not require labeled data • Circumvention of restrictions brought forth by data privacy and proprietary techniques through open-source accessibility in conjunction with capital leverage for pathologist annotations
Data size	Workflow (hardware and infrastructure) limitations, e.g., large network bandwidth requirements to handle large WSI file sizes	Advances in WSI scanning technology and digital data transmission, coupled with decreasing implementation costs, are on the horizon
Data quality	• High-resolution image reduction techniques, e.g., patch extraction, may compromise data quality. Higher-level structural information, e.g., tumor extent or shape, may only be captured through analysis of larger regions. • Clinical translation of algorithms requires generalizability across a wide breadth of patient populations and clinical institutions. IHC / H&E staining of tissue sections can vary significantly across laboratories and at the intra-laboratory level. Analysis performed on low-quality tissue, histology slides, or staining will ultimately compromise the validity of the data.	• Focused spatial correlation amongst patches, multi-level magnification patch extraction, utilization of larger patch sizes. • Normalization techniques, e.g., scale normalization for multiple image acquisition devices with varying pixel sizes, stain normalization, pixel-wise, patch-wise, and semantic segmentation CNN training for enhanced region of interest detection, flexible thresholding techniques which compensate for variations in input data luminance.
Data utilization capacity	Deep learning systems are currently only able to classify WSI specimens with a single diagnosis.	• Removal of biological restrictions during algorithmic training • “Artificial General Intelligence (AGI)” of the future will consist of advanced algorithms employing multiple levels of classification and segmentation in conjunction with a litany of diagnostic deductive variables, mimicking the process of human consciousness.
Regulation / medico-legal / accountability and/or liability	• Demonstration of algorithm reproducibility on large patient populations containing outliers and non-representative individuals has caused difficulties for AI development. • “Black Box” transparency concerns surround the uninterpretable pathway of algorithmic classification deduction. Segmentation, e.g., extraction, of image objects correlated with clinical endpoints is hidden from pathologist interpretation.	• AI models of the future may be used to develop “universal” tumor grading systems applicable to the entire GU system through combination of prognostic, morphologic, tumor marker, and clinical course data. • “Rule extraction,” through which information about the histopathologic features used by an algorithm during its previously hidden segmentation process is revealed, may mitigate such concerns.
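
One way to picture the patch extraction and flexible thresholding entries in the Data quality row is the minimal Python sketch below. It is an illustration under stated assumptions, not a method taken from the reviewed literature. Otsu's method is swapped in as one example of a threshold that adapts to input luminance. The slide region is assumed to be an 8-bit grayscale brightfield image already loaded as a NumPy array, with tissue darker than background. The names `otsu_threshold`, `extract_tissue_patches`, `patch_size`, `stride`, and `min_tissue` are hypothetical.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return an Otsu threshold (0-255) for an 8-bit grayscale image.

    The threshold is derived from the image's own histogram, so it shifts
    with overall brightness instead of being a fixed cut-off.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # probability of the darker class
    mu = np.cumsum(prob * np.arange(256))    # cumulative intensity mean
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        between_var = (mu_total * omega - mu) ** 2 / denom
    between_var[denom == 0] = 0.0            # undefined where one class is empty
    return int(np.argmax(between_var))


def extract_tissue_patches(gray, patch_size=64, stride=64, min_tissue=0.5):
    """Tile the image and keep patches whose tissue fraction is high enough.

    Assumption: pixels at or below the Otsu threshold are tissue.
    Returns a list of (row, col, patch) tuples.
    """
    tissue_mask = gray <= otsu_threshold(gray)
    height, width = gray.shape
    patches = []
    for row in range(0, height - patch_size + 1, stride):
        for col in range(0, width - patch_size + 1, stride):
            window = tissue_mask[row:row + patch_size, col:col + patch_size]
            if window.mean() >= min_tissue:
                patch = gray[row:row + patch_size, col:col + patch_size]
                patches.append((row, col, patch))
    return patches
```

Under these assumptions, a larger `patch_size` keeps more surrounding context per patch (for example, tumor extent or shape) at the cost of fewer, heavier patches, and a `stride` smaller than `patch_size` yields overlapping patches that preserve some spatial correlation between neighbours. Stain normalization and multi-level magnification are not covered by this sketch.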