Abstract
Digital pathology applications present several challenges, including the processing, storage, and distribution of gigapixel images across distributed computational resources and viewing stations. Individual slides must be available for interactive review, and large repositories must be programmatically accessible for dataset and model building. We present a platform to manage and process multi-modal pathology data (images and case information) across multiple locations. An agent-based system coupled with open-source automated machine learning and review tools enables not only dynamic load-balancing and cross-network operation but also the development of research and clinical AI models using the data managed by the platform. The presented platform covers the end-to-end AI workflow, from data acquisition and curation through model training and evaluation, and supports sharing and review. We conclude with a case study of colon and prostate cancer model development utilizing the presented system.
Introduction
Computer vision and artificial intelligence (AI) applications in medicine are becoming increasingly important and clinically relevant. An active area of research and AI practice is in medical imaging, where machine learning (ML) models can be applied to recognize patterns, discriminate between data classes, and synthetically generate data. While pathology produces a large percentage of actionable medical data, this data is not always produced in a way that is convenient for the application of AI. FDA-approved digital pathology scanning and interpretation platforms used in primary diagnosis do not typically generate AI-ready datasets or manage AI pipelines. Unlike radiology, which made a digital transition in the 1990s, digital pathology is a less established practice with many competing whole slide image (WSI) formats and standards. There has yet to emerge a dominant image format like DICOM or an image management system like PACS, which are the standard in radiology. Vendor systems and formats, including case reporting, must be navigated to link case and image data to construct training and evaluation datasets. Harnessing digital pathology data generated as part of clinical workflows for AI typically involves numerous manual steps. Slides must be digitized, associated with case information, potentially pre-processed, and deidentified before being curated into image repositories or submitted for AI inferencing. Even fully digital departments with tightly integrated case information and direct access to data must often rely on manual efforts for slide cohort selection, labeling, and dataset generation. Once datasets are generated, additional manual steps are required to train and evaluate AI workflows.
Several approaches have emerged to integrate AI into clinical pathology workflows. One approach, which is common with commercial vendors, is to provide cloud-based portals for pathologists and researchers to interactively annotate images, apply platform-specific AI, and view results. A limited number of vendors provide FDA-approved AI using these methods. The downside of this approach is that unless these platforms curate all incoming cases, methods must still be developed to identify datasets to be directed toward these systems. Alternatively, organizations looking to develop their own AI and research programs typically deploy their own independent data repositories, ML training, and inferencing services. The benefit of this approach is that the data management system is agnostic to the AI pipelines used, allowing both commercial and open-source systems to be implemented. The drawback of this approach is that there is a high degree of operational complexity compared to vendor solutions. For dataset curation and web-based annotation, several open-source tools have emerged, such as Omero1 and Cytomine2. Likewise, Fiji3 and ASAP4 are commonly used for medical image analysis. We present our efforts to automate and integrate a clinical digital pathology AI platform from data collection to training, evaluation, and operation.
Methods
The presented system is designed to harness data generated as part of clinical workflows to construct user-defined datasets, train ML models, and evaluate models against current and future data. The platform operates as a federated system, providing methods to acquire, integrate, and process data across a wide range of heterogeneous infrastructure, including standalone servers, public clouds, and local high-performance computing (HPC) clusters. A central web-based interface is provided to manage dataset construction, annotation, AI pipeline description, and evaluation. In addition, an independent public-facing web interface, called “Project Hub,” exposes reproducibility and model performance information. The platform provides the ability to define and execute automated software-defined pipelines from the data curation stage through data and model evaluation. These user-defined AI pipelines support both one-time and continuous training and evaluation.
To accomplish these functions, we have implemented four primary modules: 1) Dataset query and description services; 2) Slide and dataset management; 3) Automated ML management; and 4) Quality control and review tools. Figure 1 shows a high-level view of the platform and module interaction. In the following subsections, we describe module methods in detail:
- Dataset query and description services provide tools to acquire case data and slide metadata from clinical and research sources, explore available data, and describe datasets interactively and programmatically. Given the lack of standardization across digital pathology, we must support dynamic schemas while maintaining common ontology references across datasets. For example, common elements like condition ontologies (ICD, SNOMED), staining, and tissue type are harmonized across vendors and source systems, while vendor-specific data is maintained in dynamic schemas (Figure 1.1a). Case information is curated into a single JSON document (Figure 1.1b), preserving standard case information while allowing for optional data where it exists. Given the one-to-many relationship between cases and affiliated slides, we embed slide identifiers within the case document. We make use of MongoDB5, a highly scalable open-source database, to manage our JSON document collections. MongoDB aggregation queries are used to query and transform datasets across data collections with dynamic schemas. A web-based interface is provided (Figure 1.1c) to manage dataset definition. We define the description of a set of cases and associated slides within the same identified group as a cohort. Figure 2 shows the interactive cohort-building interface. This interface is used to explore available datasets and define cohorts.
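As a concrete illustration, the sketch below shows what a curated case document might look like and how it could be stored with pymongo. The field, collection, and database names are hypothetical examples, not the platform's actual schema.

```python
from pymongo import MongoClient

# Hypothetical case document: standard fields are harmonized across sources,
# vendor-specific fields live in an optional sub-document with a dynamic
# schema, and affiliated slide identifiers are embedded within the case.
case_doc = {
    "case_id": "CASE-000123",
    "icd10": ["C61"],                            # harmonized condition codes
    "snomed": ["399068003"],
    "stain": "H&E",
    "tissue_type": "prostate",
    "demographics": {"age": 67, "sex": "M"},
    "slide_ids": ["SLIDE-0001", "SLIDE-0002"],   # one case, many slides
    "vendor": {                                  # source-specific, optional
        "scanner": "Philips IMS",
        "isyntax_version": "2.0",
    },
}

client = MongoClient("mongodb://localhost:27017")
client["pathology"]["cases"].insert_one(case_doc)
```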
The format of schemas maintained within MongoDB is discoverable, so as elements and features are added to case collections, they are made available as Search Criteria to the interactive interface.
The interactive cohort interface functions as a complex query constructor, composing the aggregation query, which can be saved and shared across the system to define specific case cohorts.
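A saved cohort definition is essentially a stored aggregation pipeline. The following minimal sketch, reusing the hypothetical schema from the previous example, shows what such a query might look like when run directly with pymongo.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")

# Hypothetical cohort query: malignant prostate cases, projected down to the
# identifiers and slide counts needed for dataset construction.
pipeline = [
    {"$match": {"tissue_type": "prostate", "icd10": {"$in": ["C61"]}}},
    {"$project": {
        "_id": 0,
        "case_id": 1,
        "slide_ids": 1,
        "slide_count": {"$size": "$slide_ids"},
    }},
    {"$sort": {"slide_count": -1}},
]

cohort = list(client["pathology"]["cases"].aggregate(pipeline))
```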
For many ML tasks, we are interested in differentiating between two or more groups. We define a collection of cohorts designated by a cohort label to be a dataset. Figure 3 shows the interactive dataset-building interface. As shown in the figure, labels are applied to cohort definitions, which include the underlying queries used to identify slide candidates for the described dataset.
The described services allow users to interactively explore federated data and define cohorts and datasets. The description of complex cohorts and datasets can be tedious; for example, there are nearly three thousand ICD codes associated with cancer alone. To support large and complex cohorts and datasets, we provide platform APIs to programmatically interact with the backend system.
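For instance, a cohort spanning thousands of cancer-related ICD-10 codes is more practical to define programmatically than through the interactive builder. The snippet below illustrates the idea against a hypothetical REST endpoint; the URL, payload fields, and code list are placeholders rather than the platform's published API.

```python
import requests

# Illustrative subset of malignant ICD-10 codes; in practice thousands of
# codes would be loaded from a terminology service or reference file.
malignant_codes = ["C18.0", "C18.2", "C18.7", "C19", "C20"]

cohort_definition = {
    "name": "colon_malignant",
    "query": {"tissue_type": "colon", "icd10": {"$in": malignant_codes}},
}

# Placeholder endpoint; the platform API itself is not described in this paper.
resp = requests.post("https://platform.example.org/api/cohorts",
                     json=cohort_definition, timeout=30)
resp.raise_for_status()
print(resp.json())
```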
- Slide and dataset management provides services to process and transfer slide data. Integrating federated resources into a single system is not trivial. In many institutions, outgoing network traffic is permitted, but incoming traffic is blocked by default. In addition to firewalls, computational resources and storage might be located on private, isolated networks, making access, processing, and data transfer difficult. Communication issues aside, resources might operate on differing operating systems and application stacks. To address these challenges, we make use of Cresco6, an agent-based edge computing platform previously developed by the authors, which serves as the foundation for this effort. We provide methods for distributed data collection, transmission, and processing across heterogeneous environments, allowing for federated processing, storage, and evaluation. Within this computational network, modules for slide acquisition (Figure 1.2a), conversion, pre-processing (tissue masking, tile and cell metrics, etc.), and deidentification (Figure 1.2b) have been implemented. Figure 4 illustrates the use of a novel agent-based system across embedded systems, HPC, and public cloud providers. If each agent can communicate with at least one other agent, communication between all agents is possible. While it is beyond the scope of this paper, Cresco provides methods to organize hierarchies of agents, providing resilient, secure, and high-performance networks. For dynamic resources like HPC (batch processing) and cloud (dynamically allocated), Interface Agents are used to schedule and provision infrastructure. Newly provisioned resources serve as infrastructure for Worker Agents.
Repository Agents are used to inventory and transmit data between workers and other repositories. One or more Control Agents are deployed strategically, typically on a network that is accessible to participating networks, to provide connectivity between otherwise isolated agents. Control Agents are often used to host WebSocket-based7 API services. Each agent contains an embedded ActiveMQ message broker8, which can be used to establish multi-homed connections between agents, route messages between agents, and support subscribable data streams, which we refer to as the dataplane. The dataplane is used to broadcast messages to agents throughout the network that are subscribed to specific channels, identified by key-value pairs. For example, a Repository Agent might broadcast a status change on the dataplane, which is then routed to one or more remote Repository Agents, triggering a data synchronization task. The Application Controller makes use of the provided API, allowing the controller application to participate in the computational network masquerading as the Control Agent. Advanced agent functions such as dataplanes, complex event processors, semantic queries, and remote task execution are extended to the Application Controller. The Application Controller is responsible for managing tasks and policies throughout the network, from data curation through results evaluation.
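The dataplane pattern can be illustrated schematically. The sketch below is not the Cresco API; it models subscriptions keyed by key-value pairs and a broadcast that triggers a hypothetical synchronization task on matching subscribers.

```python
# Schematic illustration only; not the Cresco API.
subscriptions = []  # list of (filter_kv, callback) pairs

def subscribe(filter_kv, callback):
    """Register interest in dataplane messages matching the key-value filter."""
    subscriptions.append((filter_kv, callback))

def broadcast(message):
    """Route a dataplane message to every subscriber whose filter matches."""
    for filter_kv, callback in subscriptions:
        if all(message.get(k) == v for k, v in filter_kv.items()):
            callback(message)

def start_sync(message):
    # Hypothetical handler on a remote Repository Agent.
    print(f"synchronizing with repository {message['repo_id']}")

subscribe({"type": "repo-status", "status": "updated"}, start_sync)
broadcast({"type": "repo-status", "status": "updated", "repo_id": "path-store-1"})
```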
- Automated ML management provides services to conduct ML training and evaluation using defined datasets and AI pipelines. There exist numerous ML technologies, ranging from libraries9 to cloud-based AutoML solutions10,11. Fortunately, general-purpose vision models work well in medical imaging, and pathology-specific model architectures are not necessary. However, medical data, especially WSIs, come with their own challenges. WSIs are too large (hundreds of millions of pixels) to fit into models as a single image. In addition, a single case associated with a single person might contain multiple slides. Care must be taken to keep slides and slide sub-images isolated between training and testing sets. These complications require that one must either adapt the model to the problem using low-level libraries or conform the data to the ML platform. Given the rapid development of new model architectures and technologies, we chose to adapt our data to support common AI input formats. Given the active development of modeling tools, we augmented the data building system with the ability to define custom dataset extraction and transformation tasks (Figure 1.3b). An example JSON task definition is shown in Figure 5 and the interpreted interface in Figure 6. This approach allows new dataset generation requirements to be added without the need to modify the web-based user portal. Dataset extraction tasks are intended not only to provide data but also to provide a registered instance of a dataset extraction, including a dataset provenance label. Provenance labels report the demographic composition (race, age, location site, etc.) of the derived dataset.
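Keeping all slides (and their tiles) from a case on the same side of the split is a grouped split. A minimal sketch of this idea using scikit-learn's GroupShuffleSplit is shown below; the data structures are hypothetical, and this is offered as an illustration rather than the platform's exact implementation.

```python
from sklearn.model_selection import GroupShuffleSplit

# Each record is one slide; slides sharing a case_id must never be divided
# across training and testing sets.
slides = [
    {"slide_id": "S1", "case_id": "C1", "label": 1},
    {"slide_id": "S2", "case_id": "C1", "label": 1},
    {"slide_id": "S3", "case_id": "C2", "label": 0},
    {"slide_id": "S4", "case_id": "C3", "label": 0},
]
groups = [s["case_id"] for s in slides]
labels = [s["label"] for s in slides]

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(slides, labels, groups=groups))

train = [slides[i] for i in train_idx]
test = [slides[i] for i in test_idx]
```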
Currently, we support three types of AI-ready dataset extractions: WSI selection, metric-based tile (image) selection, and annotation-based tile selection. WSI selection queries the dataset definition for candidate slides and, based on the extraction criteria, provides a collection of deidentified WSIs and the corresponding provenance label. Metric-based tile selection makes use of over 31 tile- and cell-based metrics generated during data curation (Figure 1.2b) to select the top N tiles per slide to include. Unlike WSI extraction, this method returns selected tiles as individual images, not the entire slide. This type of extraction is useful for isolating tiles and regions of interest (ROI) where pixel-level annotations do not exist. As the name would suggest, annotation-based tile selection extracts tiles that fall within pixel-level ROI annotations and generates the same output format as metric-based selection. Metric- and annotation-based selection can be combined to further refine datasets. We make use of MONAI12, an open-source framework for healthcare imaging workflows, which provides implementations for Multiple Instance Learning (MIL)13 and image classification, the two most common digital pathology AI tasks. MIL makes use of the WSI extraction service, and image classification pipelines make use of metric- and annotation-based extractions. We make use of Digital Slide Archive (DSA)14 and Open Microscopy Environment (OME)15 for dataset viewing and annotation. In addition, we use MONAI Label16, an AI-assisted annotation modeling and inferencing tool that is integrated with DSA (Figure 1.3a). MONAI Label allows multiple users to train and use central annotation models across datasets. While not currently implemented, an extraction task could be defined to trigger based on changes in annotation models, automatically regenerating new and improved datasets.
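Metric-based selection amounts to ranking a slide's tiles by one or more stored metrics and keeping the top N. A small sketch with hypothetical metric names follows; the actual extraction tasks combine more metrics and filters than shown here.

```python
def select_top_tiles(tiles, metric="tissue_percentage", n=100):
    """Return the top-n tiles of a slide ranked by a single stored metric.

    `tiles` is a list of per-tile metric records produced during curation;
    the metric names used here are illustrative.
    """
    ranked = sorted(tiles, key=lambda t: t.get(metric, 0.0), reverse=True)
    return ranked[:n]

tiles = [
    {"tile_id": "T1", "tissue_percentage": 0.91, "blur": 0.05},
    {"tile_id": "T2", "tissue_percentage": 0.12, "blur": 0.40},
    {"tile_id": "T3", "tissue_percentage": 0.78, "blur": 0.10},
]
selected = select_top_tiles(tiles, metric="tissue_percentage", n=2)
```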
In the previous sections, we have described methods to define cohorts, datasets, and extraction tasks. We have described how, through this programmatically defined data generation process, data provenance is tracked and datasets are registered. We have also explained how agent-based networks are used to enable data transfer, processing, and platform control. As with data curation and extraction tasks, ML training and evaluation can be executed anywhere in the network where dependencies can be satisfied. Agents contain built-in methods to conduct device and network discovery on their host system, including resource utilization. While the underlying system can identify ML resources and execute ML tasks, we intend to develop clinically verifiable models, which require the same level of rigorous auditing and tracking we have implemented in dataset generation. A number of standalone and cloud platforms have emerged to manage so-called AutoML or MLOps operations. These systems typically track the code and data used in training, orchestrate the generation of repeatable training environments, and automate training functions such as hyperparameter optimization. We make use of ClearML17, an open-source ML development and production suite (Figure 1.3c). Our MONAI and custom AI pipelines are instrumented with ClearML libraries, which track all aspects of the training and evaluation cycle, including the execution environment, code attributes and settings, and reported training and inference statistics. Template pipelines and datasets are executed in a controlled, reproducible environment, commonly a registered Docker18 container. Once the template pipeline has been executed and associated with a project, ClearML can re-execute the process with new variables. Execution variables can include the definition of a new dataset, which is provided by the platform, or a range of variables supplied by ClearML during the hyperparameter tuning process. The description of the ML template, extracted dataset, and template variables is used to inform ML processing tasks. As shown in Figure 4, agents can be statically deployed on dedicated resources, or Interface Agents can be used to dynamically provision Worker Agents in distributed environments. Worker Agents communicate their availability to the agent network. When new ML processing tasks are generated, the Application Controller assigns a Worker Agent, which is responsible for obtaining and verifying the registered dataset. Once the dataset has been validated, the Worker Agent configures and executes the ClearML client for use within a specific task group, associated by ClearML queue name. This process can be repeated multiple times, allowing for simultaneous model training using different models and model parameters. Once resources are available for a specific task group, the Application Controller submits the job to the ClearML server, which prepares the execution environment, executes the defined ML tasks, captures job information, and produces results and models. The ClearML server provides an API where additional project and job information can be queried.
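The instrumentation pattern is conventional ClearML usage: a template task registers its environment and parameters, and the controller can later clone it with new values (for example, a new registered dataset extraction) and enqueue it for a Worker Agent. A minimal sketch follows; the project, queue, and parameter names are placeholders.

```python
from clearml import Task

# Template training task: ClearML records the execution environment, code
# state, and these parameters so the run can be reproduced or re-executed.
task = Task.init(project_name="prostate-mil", task_name="mil-template")
params = {
    "dataset_id": "extract-2023-01-001",   # registered dataset extraction
    "learning_rate": 1e-4,
    "epochs": 40,
}
params = task.connect(params)  # values may be overridden on re-execution

# ... the MONAI/MIL training loop would run here, reporting scalars ...

# Later, a controller process can clone the completed template, point it at a
# new dataset extraction, and enqueue it for a worker bound to a named queue.
cloned = Task.clone(source_task=task, name="mil-run-new-dataset")
cloned.set_parameters({"General/dataset_id": "extract-2023-02-007"})
Task.enqueue(cloned, queue_name="gpu-workers")
```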
- Quality control and review tools provide services used in the tracking and evaluation of project configurations, performance, ethical data use, and reproducibility. These services are broken into the ML Quality Control (QC) review portal (Figure 1.4a) and a public-facing AI Project Hub portal (Figure 1.4b). The purpose of the QC portal is to review the current and historical state of projects and models over time. As previously described, dataset extraction and associated ML processing tasks can be triggered automatically based on pre-defined thresholds. Based on the integration of automated data generation and the use of ClearML to track model training, we can track model and data parameters over time. We have found the project dashboard provided by ClearML to be sufficient for QC review, including multi-model comparisons, as shown in Figure 7.
The AI Project Hub is based on the WordPress19 content management system and the IKNOW20 WordPress theme. The portal provides searchable access to AI projects and curated datasets. Projects and datasets are organized into thematic groups, as shown in Figure 8. Individual projects are intended to be maintained by researchers. A tagging system is provided in the project description to designate project sections, such as summaries, datasets, performance, code, and models. The researcher-maintained information is intended to provide sufficient detail to evaluate the maturity of the project and, where appropriate, recreate results or make use of available models. In addition to user-provided data, the system provides optional platform integrations using key tags in project descriptions. Platform integrations include reporting related to the registered dataset used to generate the model, the current dataset available based on the dataset definition, and a model performance report including current, best, and historical model statistics.
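A sketch of how such a performance report could be assembled from the ClearML server with its Python client is shown below; the project name and metric and series names are placeholders.

```python
from clearml import Task

# Fetch completed training tasks for a project and collect their last
# reported validation AUC (metric and series names are illustrative).
tasks = Task.get_tasks(project_name="prostate-mil",
                       task_filter={"status": ["completed"]})

report = []
for t in tasks:
    scalars = t.get_last_scalar_metrics()
    auc = scalars.get("validation", {}).get("auc", {}).get("last")
    report.append({"task": t.name, "auc": auc})

best = max((r for r in report if r["auc"] is not None),
           key=lambda r: r["auc"], default=None)
print(best)
```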
Figure 1. Platform view
Figure 2. Build cohorts portal interface
Figure 3. Build datasets portal interface
Figure 4. Agent-based data and process management
Figure 5. Extract task description
Figure 6. Extract task web rendering
Figure 7. Multi-model validation AUC
Figure 8. Hub thematic view
Results
Our platform allows users to define the end-to-end data selection, training, and evaluation parameters, conduct processing, and report results without manual intervention or dependence on technology professionals. In this section we describe the challenges and results of the deployment of this system at the University of Kentucky (UK).
Our slide information is derived from two sources: a Leica Aperio21 scanner for research and a Philips IMS22 for clinical cases. The Aperio scanner produces WSIs in the commonly supported SVS23 format, while the Philips system uses the iSyntax24 format. The iSyntax format is not currently supported by any open-source libraries and requires the use of the Philips iSyntax SDK. To address the technical and licensing restrictions of the iSyntax SDK, we implemented our own native libraries, the details of which have been accepted for a separate publication but will be briefly discussed in this section. Our research cases (SVS) are typically manually curated for a specific purpose, including case and other metadata. Given the common use of SVS and manual curation in research repositories, we will focus on how our platform passively harnesses data generated as part of existing clinical workflows.
Our first challenge is slide and case data acquisition. Slides scanned for clinical use are stored on a shared network volume, which we can access with a Repository Agent (Figure 5). The Repository Agent determines if new files are available and if the slide barcodes conform to our clinical standard. A custom barcode reader has been implemented within this system to read iSyntax headers, including barcode information. New images are transferred over the agent network to protected research storage, where they are deidentified. Our deidentification system maintains the structure of the iSyntax file by replacing and padding identifiable header information within the same bounds as the original file. This process allows storage deduplication to take place, allowing us to store both original and deidentified slides with limited additional realized storage. Prior to the development of our own iSyntax image libraries, a conversion to generic TIFF25 would take place. This conversion using the iSyntax SDK was very computationally intensive and, in practice, prone to conversion failure. Once deidentified, slides are pre-processed by dividing large images into tiles and extracting tile-based metrics such as tissue percentage, color factor, saturation value, quantity factor, and blur26. In addition, we have developed a script that uses QuPath Cell Detection27 to extract 31 cell metrics per tile. Once slides have been deidentified and pre-processed, case information is obtained from our enterprise data warehouse, where it is associated with slides and deidentified. The resulting deidentified case and slide information is uploaded to the respective repository databases.
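As an illustration of the tiling and metric step, the sketch below divides a slide into tiles with OpenSlide and computes a crude tissue-percentage metric per tile. The tile size and threshold are illustrative, and this is not the platform's exact implementation; our clinical slides additionally rely on the native iSyntax reader described above.

```python
import numpy as np
import openslide

TILE = 512  # illustrative tile size in pixels

def tile_metrics(slide_path):
    """Yield a simple tissue-percentage metric for each full tile of a slide."""
    slide = openslide.OpenSlide(slide_path)
    width, height = slide.dimensions
    for y in range(0, height - TILE + 1, TILE):
        for x in range(0, width - TILE + 1, TILE):
            region = slide.read_region((x, y), 0, (TILE, TILE)).convert("RGB")
            pixels = np.asarray(region)
            # Crude tissue detection: treat non-white pixels as tissue.
            tissue = float((pixels.mean(axis=2) < 220).mean())
            yield {"x": x, "y": y, "tissue_percentage": tissue}
```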
In our limited deployment, our platform has automatically acquired over ten thousand slides across 4,500 clinical cases. Information such as condition codes and demographics is available for cohort identification. Identified cohorts are available for dataset definition. Annotations, tile metrics, and cell metrics are available for dataset extraction tasks. To demonstrate the platform, we will walk through the end-to-end automated dataset generation and training of colon and prostate cancer models. These models function as binary benign-versus-malignant classifiers for the two tissue types. We make use of the previously described MIL architecture for model training, which is especially well-suited for WSI binary classification where pixel-level annotations are not present. The first step in the process is to define the cohorts, which in this case are benign and malignant cases for colon and prostate. Cohorts are defined based on ICD-10 codes, which are sufficient to determine tissue type and classification. While it is possible to interactively define the cohorts using the self-service portal, we made use of the API to programmatically define the malignant cohorts with known malignant condition codes and the benign cohorts with both known benign codes and non-cancerous condition codes matching the tissue type. The interactive portal was used to construct the two datasets, where malignant cohorts were assigned the label “1” and benign cohorts the label “0”. Cases containing both labels were discarded, with the remaining cases being divided between training (80%) and testing (20%) sets. Splitting at the case level ensures that slide data originating from the same case is not distributed between training and testing sets. While cases can be balanced using this approach, it does not guarantee that the associated slides are balanced across sets. We have observed considerably larger numbers of slides in cases containing malignancy, which must be balanced against benign cases. To address this imbalance, downsampling was used to reduce the number of slides in dominant cases, based on the average slide count of the least prevalent class. A one-time WSI data extraction task was created for each tissue type using the described data extraction task definition. ML processing tasks using MIL modeling were defined for the respective dataset extractions. The platform executed the ML tasks, recording training and tuning parameters, reported results, and stored the resulting models. For the prostate model, a set of over 350 slides was extracted after filtering and downsampling. The prostate MIL model achieved a maximum AUC of 0.91. Likewise, the colon model was trained on over 250 slides and achieved a maximum AUC of 0.85. While the training results are not remarkable, we have demonstrated the ability to generate ML models programmatically, continuously, and repeatably.
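The slide-count imbalance between malignant and benign cases was handled by downsampling slides within over-represented cases. A simplified sketch of that balancing step, with hypothetical data structures, is shown below.

```python
import random
from statistics import mean

def downsample_slides(cases, target_per_case, seed=0):
    """Cap the number of slides kept per case at target_per_case."""
    rng = random.Random(seed)
    balanced = []
    for case in cases:
        slides = list(case["slide_ids"])
        if len(slides) > target_per_case:
            slides = rng.sample(slides, target_per_case)
        balanced.append({**case, "slide_ids": slides})
    return balanced

# Hypothetical cohorts: malignant cases tend to carry more slides per case.
benign = [{"case_id": "B1", "slide_ids": ["s1"]},
          {"case_id": "B2", "slide_ids": ["s2", "s3"]}]
malignant = [{"case_id": "M1", "slide_ids": ["s4", "s5", "s6", "s7"]}]

# Cap malignant cases at the average slide count of the less prevalent class.
target = round(mean(len(c["slide_ids"]) for c in benign))
malignant_balanced = downsample_slides(malignant, target)
```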
Discussion
Numerous software packages exist to manage digital pathology efforts. Platforms intended for clinical interpretation focus on slide and case viewing28,29. Research repository platforms focus on data curation and dataset identification30,31. Digital pathology AI platforms augment slide viewing systems with annotation and platform-specific AI modules1,2,4. To our knowledge, no widely accepted platform has been developed with the intended purpose of systematically generating, testing, and operationally deploying clinically validatable models derived from existing clinical sources. Indeed, the existing platforms we compared against were either limited to monolithic applications residing on a single machine or did not cover the entire pipeline from data acquisition to a functioning AI implementation. The automated curation of digital pathology assets, generation of AI-actionable data, and tracking of model and data provenance and performance over time, while critical, are not commonly available in most platforms. Additionally, by leveraging a worker distribution system for task dissemination, the system scales to larger datasets and AI models in both size and number.
In future work, we hope to address a few known platform limitations. Currently, implementing models developed within the platform is a manual process. We are actively exploring new methods to advance operational inferencing capabilities, including the integration of model inferencing servers32 as part of the automated platform. While the platform can operate using local passwords, we have implemented Single Sign-On (SSO) using OAuth233 across the platform web interfaces. However, this implementation only works with our institutional identity provider. As part of a future effort, we will implement an identity provider allowing for federated identity management and authentication.
In this paper we have discussed the motivations and approach we have taken to address what we consider gaps in existing digital pathology platforms. We have described novel approaches to distributed data and process management using agent-based networks. Finally, we demonstrated the ability of our platform to programmatically generate two classification models using a described pipeline. While the platform has been described in the context of model training, the same processes are used for model inferencing. For example, to prospectively evaluate our prostate and colon models, we would describe incoming datasets in terms of tissue type and define an inference pipeline with associated models.
Acknowledgements
This work has been supported by the University of Kentucky Institute for Biomedical Informatics, Center for Clinical and Translational Sciences, and the College of Medicine AI in Medicine Alliance.
The project described was supported by the NIH National Center for Advancing Translational Sciences through grant number UL1TR001998. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Figures & Table
Figure 9. Hub project view
References
- 1. Allan C, Burel JM, Moore J, Blackburn C, Linkert M, Loynton S, et al. OMERO: flexible, model-driven data management for experimental biology. Nat Methods. 2012 Mar;9(3):245–53. doi: 10.1038/nmeth.1896.
- 2. Collaborative analysis of multi-gigapixel imaging data using Cytomine | Bioinformatics | Oxford Academic [Internet]. [cited 2023 Jan 24]. Available from: https://academic.oup.com/bioinformatics/article/32/9/1395/1744553.
- 3. Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, et al. Fiji: an open-source platform for biological-image analysis. Nat Methods. 2012 Jul;9(7):676–82. doi: 10.1038/nmeth.2019.
- 4. ASAP - Automated Slide Analysis Platform [Internet]. [cited 2023 Jan 24]. Available from: https://computationalpathologygroup.github.io/ASAP/
- 5. Banker K, Garrett D, Bakkum P, Verch S. MongoDB in Action: Covers MongoDB version 3.0. Simon and Schuster; 2016. p. 680.
- 6. Bumgardner VKC, Marek VW, Hickey CD. Cresco: A distributed agent-based edge computing framework. 2016 12th International Conference on Network and Service Management (CNSM). 2016:400–5.
- 7. Melnikov A, Fette I. The WebSocket Protocol [Internet]. Internet Engineering Task Force. 2011 Dec [cited 2023 Jan 24]. Report No.: RFC 6455. Available from: https://datatracker.ietf.org/doc/rfc6455.
- 8. Snyder B, Bosnanac D, Davies R. ActiveMQ in Action. Greenwich, Conn.: Manning; 2011.
- 9. Goode A, Gilbert B, Harkes J, Jukic D, Satyanarayanan M. OpenSlide: A vendor-neutral software foundation for digital pathology. Journal of Pathology Informatics. 2013 Jan 1;4(1):27. doi: 10.4103/2153-3539.119005.
- 10. Amazon SageMaker Autopilot | Proceedings of the Fourth International Workshop on Data Management for End-to-End Machine Learning [Internet]. [cited 2023 Jan 24]. Available from: https://dl.acm.org/doi/abs/10.1145/3399579.3399870.
- 11. Pachmann T. An evaluation and comparison of AutoML solutions: Azure AutoML and EvalML. 2022 [cited 2023 Jan 24]. Available from: https://opus4.kobv.de/opus4-h-da/frontdoor/index/index/docId/271.
- 12. MONAI Consortium. MONAI: Medical Open Network for AI.
- 13. Ilse M, Tomczak J, Welling M. Attention-based Deep Multiple Instance Learning. Proceedings of the 35th International Conference on Machine Learning [Internet]. PMLR. 2018 [cited 2023 Jan 24]. p. 2127–36. Available from: https://proceedings.mlr.press/v80/ilse18a.html.
- 14. Gutman DA, Khalilia M, Lee S, Nalisnik M, Mullen Z, Beezley J, et al. The Digital Slide Archive: A Software Platform for Management, Integration, and Analysis of Histology for Cancer Research. Cancer Research. 2017 Oct 31;77(21):e75–8. doi: 10.1158/0008-5472.CAN-17-0629.
- 15. Goldberg IG, Allan C, Burel JM, Creager D, Falconi A, Hochheiser H, et al. The Open Microscopy Environment (OME) Data Model and XML file: open tools for informatics and quantitative analysis in biological imaging. Genome Biol. 2005 May 3;6(5):R47. doi: 10.1186/gb-2005-6-5-r47.
- 16. Diaz-Pinto A, Alle S, Ihsani A, Asad M, Nath V, Pérez-García F, et al. MONAI Label: A framework for AI-assisted Interactive Labeling of 3D Medical Images [Internet]. arXiv. 2022 [cited 2023 Jan 24]. Available from: http://arxiv.org/abs/2203.12362.
- 17. ClearML: MLOps for Data Scientists, ML Engineers, and DevOps [Internet]. ClearML. [cited 2023 Jan 24]. Available from: https://clear.ml/
- 18. Anderson C. Docker [Software engineering]. IEEE Software. 2015 May;32(3):102–c3.
- 19. Williams B, Damstra D, Stern H. Professional WordPress: Design and Development. John Wiley & Sons; 2015. p. 504.
- 20. Free WordPress Theme IKnow by Wow-Company [Internet]. Wow-Company. 2020 [cited 2023 Jan 24]. Available from: https://wow-company.com/iknow-theme/
- 21. Digital Pathology Microscope Slide Scanners - Whole Slide Imaging [Internet]. [cited 2023 Jan 24]. Available from: https://www.leicabiosystems.com/us/digital-pathology/scan/
- 22. Philips IntelliSite Pathology Solution [Internet]. Philips. [cited 2023 Jan 24]. Available from: https://www.usa.philips.com/healthcare/resources/landing/philips-intellisite-pathology-solution.
- 23. Helin H, Tolonen T, Ylinen O, Tolonen P, Napankangas J, Isola J. Optimized JPEG 2000 Compression for Efficient Storage of Histopathological Whole-Slide Images. Journal of Pathology Informatics. 2018 Jan 1;9(1):20. doi: 10.4103/jpi.jpi_69_17.
- 24. Hulsken B. Fast Compression Method for Medical Images on the Web [Internet]. arXiv. 2020 [cited 2023 Jan 24]. Available from: http://arxiv.org/abs/2005.08713.
- 25. Leigh R, Gault D, Linkert M, Burel JM, Moore J, Besson S, et al. OME Files - An open source reference library for the OME-XML metadata model and the OME-TIFF file format [Internet]. bioRxiv. 2017 [cited 2023 Jan 24]. p. 088740. Available from: https://www.biorxiv.org/content/10.1101/088740v2.
- 26. Marcolini A, Bussola N, Arbitrio E, Amgad M, Jurman G, Furlanello C. histolab: A Python library for reproducible Digital Pathology preprocessing with automated testing. SoftwareX. 2022 Dec 1;20:101237.
- 27. Bankhead P, Loughrey MB, Fernández JA, Dombrowski Y, McArt DG, Dunne PD, et al. QuPath: Open source software for digital pathology image analysis. Sci Rep. 2017 Dec 4;7(1):16878. doi: 10.1038/s41598-017-17204-5.
- 28. Campanella G, Hanna MG, Geneslaw L, Miraflor A, Werneck Krauss Silva V, Busam KJ, et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med. 2019 Aug;25(8):1301–9. doi: 10.1038/s41591-019-0508-1.
- 29. Modern Digital Pathology Software to Accelerate R&D [Internet]. Proscia. [cited 2023 Jan 24]. Available from: https://proscia.com/concentriq-for-research/
- 30. Guiter GE, Sapia S, Wright AI, Hutchins GGA, Arayssi T. Development of a Remote Online Collaborative Medical School Pathology Curriculum with Clinical Correlations, across Several International Sites, through the Covid-19 Pandemic. MedSciEduc. 2021 Apr 1;31(2):549–56. doi: 10.1007/s40670-021-01212-2.
- 31. Virtual Slide Box - Home [Internet]. [cited 2023 Jan 24]. Available from: https://www.pathology.med.umich.edu/slides/index.php.
- 32. Jahanshahi A, Sabzi HZ, Lau C, Wong D. GPU-NEST: Characterizing Energy Efficiency of Multi-GPU Inference Servers. IEEE Computer Architecture Letters. 2020 Jul;19(2):139–42.
- 33. Boyd R. Getting Started with OAuth 2.0. O'Reilly Media, Inc.; 2012. p. 81.