Abstract
Adopting a computational approach for the assessment of urine cytology specimens has the potential to improve the efficiency, accuracy and reliability of bladder cancer screening, which has heretofore relied on semi-subjective manual assessment methods. As rigorous, quantitative criteria and guidelines have been introduced for improving screening practices, e.g., The Paris System for Reporting Urinary Cytology (TPS), algorithms to emulate semi-autonomous diagnostic decision-making have lagged behind, in part due to the complex and nuanced nature of urine cytology reporting. In this study, we report on a deep learning tool, AutoParis-X, which can facilitate rapid semi-autonomous examination of urine cytology specimens. Through a large-scale retrospective validation study, results indicate that AutoParis-X can accurately determine urothelial cell atypia and aggregate a wide-variety of cell and cluster-related information across a slide to yield an Atypia Burden Score (ABS) that correlates closely with overall specimen atypia, predictive of TPS diagnostic categories. Importantly, this approach accounts for challenges associated with assessment of overlapping cell cluster borders, which improved the ability to predict specimen atypia and accurately estimate the nuclear-to-cytoplasm (NC) ratio for cells in these clusters. We developed an interactive web application that is publicly available and open-source, which features a simple, easy-to-use display for examining urine cytology whole-slide images (WSI) and determining the atypia level of specific cells, flagging the most abnormal cells for pathologist review. The accuracy of AutoParis-X (and other semi-automated digital pathology systems) indicates that these technologies are approaching clinical readiness and necessitates full evaluation of these algorithms via head-to-head clinical trials.
Introduction
Urothelial carcinoma is highly prevalent (9th most common worldwide) and has the highest recurrence rate among all forms of cancer (74%) 1,2. The treatment and management of urothelial carcinoma requires follow-up urine cytology (UC), expensive, painful chemotherapy, and/or invasive cystoscopy procedures for long periods of time (typically the remainder of the patient’s life), necessitating the development and implementation of less invasive screening and follow up measures 3.
The detection and screening for bladder cancer has greatly improved since the earliest recorded evaluation of hematuria was recorded in the papyrus of Kahun, circa 1900 B.C.. In 1550 B.C., it was suggested that hematuria originated from “worms in the belly” 4. A causative agent, S. haematobium, was identified in 1854 by Theodor Bilharz 5,6. In 1947, Dr. George Papanicolaou, widely considered the father of modern cytopathology, proposed a formal system for evaluation of malignant cells exfoliated from the bladder’s epithelium, which has largely remained intact 7,8. Over the past half-century, efforts to rigorously define quantitative assessment criteria (e.g., nuclear-to-cytoplasm (NC) ratio, chromatin structure, etc.) and improve specimen preparation methods have sought to resolve remaining ambiguity. Yet, traditional cytological approaches are still hampered by inter-rater variability, specimen quality issues, and the tendency towards ‘hedging’ to the atypical category 9-12.
In recent years, The Paris System for Reporting Urinary Cytology (TPS), formulated in 2013, published in 2016, and updated in 2022, has emerged as a more quantitative and reproducible reporting system bladder cancer 13-17. TPS criteria are applied to assign one of four main ordered categories (negative, atypical urothelial cells, suspicious for high-grade urothelial carcinoma, positive for high-grade urothelial carcinoma) based on the following criteria for a positive diagnosis: (1) at least five malignant urothelial cells (updated to ten in 2022), (2) an NC ratio at or above 0.7, (3) nuclear hyperchromasia, (4) markedly irregular nuclear membrane, and (5) coarse/clumped chromatin 2. It is often easier to evaluate specimens that have clear-cut diagnoses, either negative or positive, than those that are atypical or suspicious. Atypical specimens are those that are hedged against a negative diagnosis, while suspicious specimens are those that are hedged against a positive diagnosis, but allow fewer than five malignant cells to be detected. Unsurprisingly, the two indeterminate designations suffer from poor inter-rater variability 12,17.
There are a number of drawbacks to cytological assessments, despite improvements in screening criteria: cytology slides are far less structured than traditional histological specimens (as they are a random dispersion of cells); there is high inter-rater variability; and the workload involved often leads to cytologist exhaustion– all of these factors increase the likelihood of misclassification. Furthermore, TPS does not introduce rigorous screening criteria for urothelial cell clusters, instead mainly relying on aggregates of individual cellular estimates. Systems to automate the assessment of cytology specimens can provide more quantitative assessments of atypia, while improving reliability and reproducibility.
Advances in cytopathology vis-à-vis increased automation can bring several benefits to all stakeholders in the healthcare space 18-22. The adoption of computer assisted Papanicolaou (‘Pap’) test screening helped laboratories address overwhelming numbers of tests that formerly required manual screening, leading to inevitable workflow backlogs and diagnostic errors resulting from overwork. The end result of this practice was the drafting of the CLIA-88 regulations concerning cytotechnologist workload limits and the development of semi-automated Pap screening devices such as the FocalPoint™ GS and the ThinPrep ® Imaging System (TIS) 23,24. The commercial success of these automated systems in the gynecologic cytology market provides a window into the possibilities of future computational applications in urine cytology 25-33. The factors which drove the creation of automated gynecologic cytology systems are similarly present in urine cytology: to improve clinical outcomes and integrate smoothly within the daily workflows of cytopathology laboratories. Outside of gynecologic cytology, several computational methods have been developed for cytological applications in screening cancers of varying types of specimens 18,34-36. For instance, efforts have been made to screen potential malignancies in thyroid fine-needle aspirations (FNA), liquid-based lung cancer specimens, pancreaticobiliary FNA, breast lesions, and urine specimens 37-42.
Systems to automate cytology screening can provide more quantitative assessments of atypia while improving reliability, precision and reproducibility of findings. State-of-the-art approaches leverage deep learning, which relies on the use of artificial neural networks (ANN– inspired by the central nervous system), to construct indicators of atypia that can be formulated into diagnostic tests. For instance, Sanghvi et al. developed a semi-autonomous diagnostic decision aid for bladder cancer using a deep learning algorithm to quantify abnormal cytomorphological features 43. The algorithm detected urothelial cells using QuPath, urothelial clusters using density-based clustering and used convolutional neural networks for scoring cells for atypia (e.g., NC ratio, hyperchromasia, etc.). Although the effectiveness of QuPath, the scoring algorithms, and density-based clustering was not fully discussed, the study showed promising results in estimating overall atypia and could potentially improve bladder cancer screening. However, it should be noted that other studies have highlighted the limitations of QuPath in disaggregation of cells within clusters in favor of detection-based approaches, indicating a need for further refinement of the algorithm 44-47.
We previously developed the AutoParis system to automatically report the presence of malignant cells across cytology specimens through cross-tabulation of the degree of atypia and NC ratio for all urothelial cells in the preparation 48. Cross-tabulation is used to generate an Atypia Burden Score (ABS) to directly classify the specimen. The current AutoParis system operates by: 1) using connected component analysis (morphometry) and watershedding to separate individual cells from cell clusters within the specimen; 2) estimating the NC ratio of the cell using a segmentation neural network to separate the nucleus and cytoplasmic components on a pixel-by-pixel basis; 3) simultaneously assigning the cell as urothelial and recording whether the cell is atypical (atypia score) from a classifier which separates negative urothelial cells, positive urothelial cells, leukocytes, red blood cells (RBCs), debris, squamous, and crystals; and 4) generating digital images in which the cells are arranged in order of atypia, which could be helpful to pathologists. Limitations in current classification systems for urine cytology include 20: 1) confounding by the presence of blood, high cellularity, neobladders (abundant degenerated enterocytes) and scanning artifacts. Other previously unaccounted for cell types may also confound classifiers (e.g., polyomavirus encrusted cells conflated with positive urothelial cells, leukocytes vs. clusters of leukocytes, urothelial cells with no nucleus present, renal tubule cells) 49; 2) morphometry algorithms may not scale to hundreds of thousands of cells at maximal resolution; 3) density-based clustering / watershedding is likely insufficient to separate overlapping cells; 4) using a single classifier does not adequately separate the tasks of determining whether a cell is both urothelial and atypical; 6) orientation and size of cell could confound the classifier; and 7) existing graphical displays for communicating the burden of atypia are static rather than dynamic.
We set out to improve on the AutoParis classification tool by addressing the above limitations and additionally trained the models using a more expansive dataset– we dub the new tool AutoParis-X (AP-X). In AutoParis-X, we addressed challenges associated with cell cluster assessment by developing an artificial intelligence tool that uses detection models to localize urothelial cells, overlapping cell boundaries, dense regions of significant overlap, and identify visual markers of urothelial atypia. By breaking clusters into their constituent architectural components, this preprocessing tool facilitates downstream association studies and predictive algorithms that incorporate quantitative cluster-level features. The cell border identification tool helped develop a more comprehensive understanding of urothelial cell cluster atypia as it pertains to bladder cancer screening. In comparison to the previous AutoParis study, which was validated on a small well-curated test set, we performed a large-scale retrospective validation of AutoParis-X on nearly 1,300 real-world specimens from internal cohorts. In this manuscript, we discuss improvements to the previous approach and its potential for real-time assessment as a mature diagnostic decision aid.
Methods
Specimen Collection and Slide Processing
A total of 1,303 urine specimens were collected across 140 bladder cancer patients (median of 8 specimens per patient; IQR: [8-13]) from 2008 to 2019 at Dartmouth-Hitchcock Medical Center. Forty-seven of these specimens were used to curate data for training the cell and cluster-level machine learning models (cell and cluster-level training and validation cohort). Four specimens were removed due to equivocal findings and/or excessive confluent cellularity. AutoParis-X was further trained and validated on 1,252 specimens after curating slide-level cell/cluster predictors (slide-level training and validation cohorts; see Calculation of Cell and Cluster Slide-Level Scores). The specimens were prepared using ThinPrep® and Papanicolaou staining before being examined microscopically 24. The urine slides were scanned using a Leica Aperio-AT2 scanner at 40× resolution and were stored as 70% quality SVS files representing whole slide images. The slides were manually focused (by a trained technician) on a single plane during scanning, and z-stacking was not used. Patient and slide-level characteristics from the slide-level training and validation cohorts can be found in Table 1. All slides were assessed by a group of five cytopathologists using TPS criteria (negative for high grade urothelial carcinoma, atypical urothelial cells, suspicious for high grade urothelial carcinoma, positive for high grade urothelial carcinoma) 12.
Table 1:
Overall | |
---|---|
Number Specimens | 1252 |
Voided (%) | 1103 (88.1) |
Prior History Hematuria (%) | 171 (13.7) |
Diagnosis (%) | |
Negative for High Grade Urothelial Carcinoma | 810 (64.7) |
Atypical Urothelial Cells | 296 (23.6) |
Suspicious for High Grade Urothelial Carcinoma | 98 (7.8) |
Positive for High Grade Urothelial Carcinoma | 48 (3.8) |
Contains Artifact (%) | 265 (21.2) |
Number Patients | 140 |
Age (mean (SD)) | 71.19 (12.37) |
Sex = M (%) | 106 (75.7) |
Methods Overview
In this section, we summarize improvements introduced in AutoParis-X, which will be elaborated on in following sections. AutoParis-X was written using the Python programming language and neural networks were implemented using the PyTorch and Detectron2 frameworks 50,51. Statistical and machine learning models were implemented in Python and R 52-54. A graphical overview is provided in Figure 1:
Slide processing– Connected components analysis to isolate individual cells and cell clusters, sped up through parallel processing 55.
Cell border detection (BorderDet)– Isolates cells within urothelial clusters with overlapping cytoplasmic borders through neural network detection model 44.
- Cell-Level Measures:
- Morphometric measures– Additional morphological features to improve cell-type classification and atypia estimation (e.g., size / area).
- Urothelial Classifier (UroNet)– Used to filter urothelial cells from potentially conflated cell types through a convolutional neural network, which operates on images of cells and their morphometric measures 56– trained on an expanded dataset with more cell classes.
- NC ratio estimation (UroSeg)– Estimates the NC ratio by neural network pixelwise segmentation of background, nucleus and cytoplasm. Used as objective marker of atypia.
- Atypia score (AtyNet)– For predicted urothelial cells at a particular cutoff threshold, a subjective score which incorporates multiple screening criteria (e.g., hyperchromasia, etc.) is determined using another convolutional neural network which operates on images of cells and their morphometric measures and outputs an atypia score 48.
Cell- and Cluster- Slide-level scores– Established through a combination of the above scoring methods, counting the number of cells/clusters in the slide with atypical morphology / cluster architecture as defined by previous works 43,48. Optimal decision cutoffs for determining cellular/cluster atypia were decided using Bayesian Optimization techniques 57.
Classifier development– Machine learning classifier which integrates cell and cluster level scores and other demographic/specimen characteristics into an Atypia Burden Score (ABS), accounting for repeat measures by patient 58-64.
Model interpretation– A hierarchical logistic regression model was constructed from the machine learning model to identify important indicators of atypia, in addition to analogous univariable models. Helpful graphical displays were generated through an interactive web application 65.
Demo– A demo was deployed to an Amazon Web Services (AWS) server and software released through GitHub and PyPI.
Slide Preprocessing
As detailed in a previous work, individual objects in the image were identified through a connected component analysis 48. In brief, WSI were converted into grey scale images using opencv2 in Python (version 3.8) 66. The background of WSIs were converted to white through intensity thresholding of the grey scale image to form an object mask. Small objects, defined as a pixelwise area of 50 or below, were filtered using the remove_small_objects (scipy, Python v3.8) morphological operation 67. Large objects (e.g., ink markings) were similarly filtered as defined by a minimal area of 500,000 pixels. After small and large object removal, holes within the object mask were filled through the fill_voids function (which is faster than offerings from the scipy package) 68. We leveraged the cupy package (Python v3.8) to reduce compute time through usage of Graphics Processing Units (GPU) where appropriate after extensive timing tests 69.
Subimages of slide objects (e.g., candidate urothelial cells and clusters) were returned using the scipy regionprops function, which also returned various other morphometric measures and bounding boxes. Inference time and memory usage for the connected component analysis for object identification was reduced through distributed computing procedures (e.g., Dask), which use optimized parallelization to operate on larger-than-memory arrays. Using multiprocessing through dask, operations were also parallelized across subregions within the slide 55.
Cell Border Identification for Cell Cluster Analysis
To improve detection of individual cells within clusters, we previously developed a cell detection neural network, BorderDet, (using the state-of-the-art Detectron2 framework) to identify: 1) location of cells through estimation of bounding boxes (one box per cell) and 2) identify cell boundaries by separating overlapping cytoplasm from adjacent cells . BorderDet was developed using cell clusters identified from the cluster-level training cohort. In brief, two cytopathologists (LJV and XL) annotated 800 cell cytoplasmic boundaries for squamous cells, inflammatory cells, negative/atypical urothelial cells, and dense regions of overlapping/indistinguishable cell borders (dense region). BorderDet is an object detection neural network that can detect multiple objects/instances (i.e., cells) in a cell cluster image 44. It looks for areas in the image that may contain an object and then assigns a score that indicates how likely it is that the region contains an object. The program labels identified objects with the appropriate label (e.g., squamous cell, dense region) and draws a line around the edges of the object (i.e., segmentation mask) to portray the exact boundary, which can overlap with adjacent cells. This allows the program to accurately identify and locate multiple objects in a single cluster. Objects were then filtered using non-max suppression, a technique which ranks overlapping objects, as defined through their intersection over union (IoU), based on their “objectness” score and removes objects with a lower score 70.
To reduce the number of objects assessed using BorderDet, a size filter was enforced, assessing candidate cell clusters with a pixelwise area of at least 1800 pixels, determined through a sensitivity analysis and visual inspection. Parallel processing through multithreading and multiprocessing was integrated using dask for rapid evaluation 55. Individual cells extracted through the connected component analysis (area between 256 and 1800 pixels) and objects extracted from clusters using their instance segmentation masks were further assessed using single-cell algorithms which report quantitative metrics of atypia (cell-level measures).
In comparison to the density-based clustering approach that validated urothelial clusters using a CNN (Sanghvi et al.), which could lead to many false negative findings (i.e., approach only “screens out” candidate cell clusters), urothelial cell clusters were identified by BorderDet if they contained urothelial cells 43. This approach improves on watershedding (AutoParis v1) and density-clustering (Sanghvi et al.) techniques as these two methods do not precisely identify cells within larger candidate clusters 20,43,44,48. BorderDet also improves upon previous methods by locating dense urothelial cell architectures with overlapping indistinguishable cytoplasmic borders which are challenging to assess for individual cells. Furthermore, while presence of a dense architectural region in a cluster as defined by an area cutoff was used as an atypia predictor, dense architectures themselves were further subclassified as atypical if surrounding urothelial cells were labeled as atypical (as defined by morphology).
Cellular Morphometric Measures
Various morphometric features were estimated from individual candidate cells, including: 1) area; 2) convex area; 3) eccentricity; 4) equivalent diameter; 5) extent; 6) Feret’s diameter; 7) maximum diameter; 8) filled area; 9) major axis length; 10) minor axis length; 11) perimeter; and 12) solidity, extracted using scikit-image (Python v3.8) 56,71. These morphometric features were primarily used to help demarcate urothelial cells. As an example, urothelial cells are significantly larger than leukocytes, so cell area is an important criterion for separating the two cell types. Morphometric features were standardized using quantile transformation (implemented in scikit-learn, Python v3.8) within the training set to reduce the influence of any given cell on specifically which morphometric features were important for the assessment 72. This places greater emphasis on the imaging findings as means to delineate between different cell types.
Urothelial Cell Classification
Urothelial cell classification was accomplished using UroNet, which was modified significantly from its original incarnation. While AutoParis estimated both the presence and atypia of the urothelial cell simultaneously 48, as differentiated from several other specimen constituents, AutoParis-X is chiefly focused on delineating urothelial cells from potentially conflated cell types and slide objects prior to estimating atypia. When aiming to validate the AutoParis algorithm, we noticed that a nontrivial number of urothelial cells lacked a nucleus, potentially related to being out of focus (no Z-stacking) 73, but were not included in our original training set and thus were often confused with other cell types with a smaller nuclear area (e.g., squamous cells). We also identified rare urothelial cells with changes consistent with a Polyomavirus cytopathic effect 49,74. These cells are benign but assessment can often mimic HGUC and would certainly mislead any attempt to accurately predict the NC ratio and are thus removed by UroNet.
A total of 108,388 and 27,097 cells were manually labeled by two cytopathologists (LJV and XL) and used to train and validate the cell level model respectively from the cell-level training and validation cohort. A breakdown of cell types present in this training and validation cohort is listed in Table 2. These cell images were combined into the following classes: 1) urothelial cells (benign/atypical), 2) urothelial cells with polyomavirus cytopathic effect, 3) debris, crystals and red blood cells (RBC), 4) leukocytes, 5) clusters of leukocytes, and 6) squamous cells. UroNet was developed using a residual neural network (ResNet18), augmented with an auxiliary layer which combines the morphometric information (e.g., area/size, eccentricity, etc.) with features extracted from ResNet18 by fusing the penultimate layer of the network with this information. The auxiliary neural network first maps the number of morphometric features, , to the number of ResNet18 features using a multi-layer perceptron, . Then the morphometric information (same dimensionality as the ResNet features) is fused with the deep learning features using a gated attention operation, which decides dynamically on a cell-by-cell basis which set of features (deep learning, , vs morphometric, ) to weight more. The weight is dynamically determined using the gating neural network, 75.
Table 2:
Benign Urothelial Cells |
Atypical Urothelial Cells |
Polyomavirus Infected Cells |
RBC | Crystals | Debris | Leukocyte | Leukocyte Cluster |
Squamous Cells |
|
---|---|---|---|---|---|---|---|---|---|
Training | 3522 | 3795 | 3606 | 11199 | 220 | 63317 | 8037 | 3425 | 11267 |
Validation | 880 | 949 | 901 | 2800 | 55 | 15830 | 2009 | 856 | 2817 |
This operation permits UroNet to filter out cells with significant size differences (e.g., leukocytes are much smaller than urothelial cells). After model training using the PathflowAI package 76, the performance of UroNet was assessed using the cell-level validation set through the area under the receiver operating characteristic curve (AUC), reported for each class. To assess how much weight was placed on the morphometric features for prediction, we investigated the attention weights, , across the validation set. We used Integrated Gradients 77,78, a deep learning interpretation method, to assess which specific image/deep learning and morphometric features were important for each cell type.
NC Ratio Estimation
For cells classified as urothelial, the NC ratio was calculated for both isolated and cluster cells using a segmentation neural network, UroSeg, which employed a U-Net architecture to assign on a pixelwise basis the presence of nucleus, cytoplasm, or background 48,79,80. These areas were annotated/outlined by cytopathologists and UroSeg was trained and validated on 3,690 and 1,231 urothelial cells respectively. Performance was reported using the area under the receiver operating characteristic curve (AUC), reported on a pixelwise basis. For select cell clusters, we compared the impact of running BorderDet, followed by UroNet and UroSeg to calculate the NC ratio as compared to running UroSeg then watershedding, as was originally done by the previous AutoParis algorithm.
Atypia Score
Several cytopathologists determined whether every urothelial cell extracted from the cell-level training and validation cohort (Table 2) was benign or atypical, based on existing markers of atypia (e.g., presence of nuclear membrane irregularity, abnormal chromatin, hyperchromasia, etc.). From this information, AtyNet, a CNN based on ResNet18 with a similar architecture as UroNet, was trained to recapitulate these subjective findings 81. For every urothelial cell, AtyNet calculates a subjective marker of atypia– the atypia score– which is a value from 0-1 that reflects the probability that a cell is atypical. We used IntegratedGradients, a deep learning interpretation method, to assess which specific image/deep learning and morphometric features were important for atypia assignment.
Calculation of Cell and Cluster Slide-Level Scores
All extracted individual cell and cluster level statistics are placed into Rich Information Frames (RIF), which are data frame/tabular data structures 48. For any given WSI, there are three RIFs (see Table 3 for description of features):
Table 3:
Level | Predictor | Algorithm | Description |
---|---|---|---|
Cell | Urothelial cell score | UroNet | Predicted probability of urothelial cell from convolutional neural network, used to dynamically isolate urothelial cells in specimen |
Atypia score | AtyNet | Predicted probability of presence of atypical features in urothelial cell (e.g., hyperchromasia, irregular nuclear membrane, etc.), determined using convolutional neural network | |
NC Ratio | UroSeg | Nuclear to cytoplasm area ratio derived from pixelwise segmentation of nucleus and cytoplasm using segmentatio neural network | |
Morphometric measures | Custom | Complements binning of urothelial cells and assignment of atypia score, features: 1) area; 2) convex area; 3) eccentricity; 4) equivalent diameter; 5) extent; 6) Feret’s diameter; 7) maximum diameter; 8) filled area; 9) major axis length; 10) minor axis length; 11) perimeter; and 12) solidity | |
Cluster | Dense Area | BorderDet | Whether cluster contains dense architecture of overlapping and indistinguishable cytoplasmic borders |
Number urothelial cells | BorderDet/UroNet | Whether cluster contained urothelial cells, determined by counting cells with high urothelial cell score | |
Number atypical urothelial cells (atypia score) | BorderDet/UroNet/AtyNet | Whether cluster contained abnormal urothelial cells, determined by counting cells with high atypia score | |
Number atypical urothelial cells (NC ratio) | BorderDet/UroNet/UroSeg | Whether cluster contained abnormal urothelial cells, determined by counting cells with high NC ratio | |
Dense & Atypical | BorderDet/UroNet/AtyNet/UroSeg | Whether cluster contained both dense architecture and atypical cellular features | |
Slide | Patient characteristics | Supplied | Includes age, sex, history of hematuria, specimen source (e.g., voided), presence of specimen artifact |
Isolated Cell-SIF Scores | Bayesian Optimization | Counting the number of cells with the following features from cells not associated with clusters: 1) cellularity (urothelial score), 2) atypia (atypia score), 3) atypia (NC ratio), 4) other morphometric measures | |
Cluster Cell-SIF Scores | Bayesian Optimization | Counting the number of cells with the following features from cells associated with clusters: 1) cellularity (urothelial score), 2) atypia (atypia score), 3) atypia (NC ratio), 4) other morphometric measures | |
All Cell-SIF Scores | Bayesian Optimization | Combines Isolated Cell-SIF Scores and Cluster Cell-SIF Scores | |
Cluster-SIF | Bayesian Optimization | Counting the number of clusters with the following features: 1) number of urothelial clusters, 2) atypical urothelial clusters (atypia score), 3) atypical clusters (NC ratio), 4) dense clusters, 5) dense and atypical clusters | |
Atypia Burden Score | Mixed effects machine learning | Integrates all slide-level predictors using machine learning model to calculate a score between 0-1 reflecting overall specimen atypia, correlated with UC diagnostic category |
Isolated-Cell-RIF: Stores morphometric measures; bounding box locations within specimens, cell type assignment probabilities; NC ratios; and atypia scores for each cell not associated with clusters (isolated urothelial cells).
Cluster-Cell-RIF: Stores morphometric measures; bounding box locations within specimens; cell type assignment probabilities; NC ratios; and atypia scores for each cell associated with clusters, in addition to their cluster assignment label (cluster urothelial cells).
Cluster-RIF: Stores bounding box locations within WSI; cluster size; cytoplasmic borders; area of dense regions in cluster; and associated cluster label/identifier. Information on cellular atypia (e.g., number of atypical cells), number of urothelial cells, amongst other cluster-level measures, were added to this RIF from the Cluster-Cell-RIFs.
All RIFs are cross-tabulated to form a Slide Inference Frame (SIF), which represents slide-level statistics, aggregated across all urothelial cells and urothelial cell clusters. This is accomplished by thresholding the cutoff probabilities for the cell and cluster-level scores and counting the number of cells and clusters which meet these criteria. For instance, given an atypia score cutoff of 0.7 (i.e., cell is atypical if AtyNet assigns a 70% probability), a cluster is deemed to exhibit cellular atypia if, for instance, more than 20% of the cells within the cluster are atypical under this definition. Based on the definition of a urothelial cluster (e.g., number of urothelial cells), the number of atypical clusters within the WSI can be estimated. All urothelial cells with an NC ratio of 0 were removed prior to calculating these scores. SIF contains the following statistics:
Isolated cell subscores: Derived from Isolated-Cell-RIF, for cells which were not associated with clusters, including the following statistics: 1) number of urothelial cells; 2) number of atypical urothelial cells as determined using the atypia score; 3) number of atypical urothelial cells as determined using the NC ratio; 4) number of urothelial cells; and 5) center and spread of various morphometric measures.
Cluster cell subscores: Derived from Cluster-Cell-RIF. Similar to isolated cell subscores, only considering cells which were associated with / identified within clusters.
All cell subscores: Combines isolated and cluster cell subscores, considering all cells, irrespective of whether there was a cluster assignment.
Cluster subscores, representing aggregate Cluster-RIF statistics, including: 1) number of urothelial clusters (defined by a minimum threshold of urothelial cells); 2) number of atypical urothelial clusters (defined by either NC ratio or atypia score); 3) number of dense clusters; and 4) number urothelial clusters that are both atypical and dense. Unlike the previous three scores which focus on individual urothelial cells, identified urothelial cell clusters represent the principal unit of analysis.
Using AutoParis-X, RIF-SIF scores were calculated across the slide-level training and validation cohorts. We added the following patient-level characteristics to the RIF-SIF scores: 1) age; 2) sex; 3) history of hematuria; and 4) specimen source 82,83. We also noted where slides contained significant blood, high cellularity, acellularity, neobladders (abundant degenerated enterocytes) and scanning artifacts.
Estimating Specimen Atypia with Machine Learning
Specimen atypia was reported through dichotomization of TPS categories into the following classes: 1) negative, atypical and 2) suspicious, positive. The Atypia Burden Score (ABS) reflects the predicted probability of a specimen being atypical as assessed by AutoParis-X. We implemented several machine learning and statistical modeling approaches to predict specimen atypia, including: 1) generalized linear mixed effects modeling (hierarchical logistic regression; GLMM; brms package, R v4.1), accounting for patient- and pathologist-level random intercepts,
2) Random Forest, which does not account for clustering by patient, 3) Gaussian Process Tree Boosting (GPBoost), and 4) Bayesian Additive Regression Trees (BART) 58-61,64. GPBoost and BART account for clustering by patient by fitting patient- and pathologist-level random intercepts while capturing interactions and nonlinear associations between SIF predictors using ensemble tree models, :
Overall model performance was communicated using fivefold cross-validation, which randomly partitions the data into a training and validation set and reports the overall performance (using the AUC) over the validation folds. Specimens belonging to the same patient were partitioned into the same training/validation fold for each cross-validation split to avoid potential inflation of test statistics. Confidence intervals (CI) were reported using 1000-sample nonparametric bootstrapping of each fold to yield 1000 samples of cross-validation statistics. Cell and cluster-level thresholds (e.g., atypical cell if NC>0.7; atypical cluster if at least 3 urothelial cells are atypical), which are used to generate RIF-SIF scores, were optimally aligned with specimen atypia through a Bayesian Optimization routine 57.
Interpretation
We identified significant ABS predictors by extracting salient interactions from the tree ensemble models and reporting odds ratios (OR) from univariable and multivariable Bayesian GLMM models: . As many of the ABS predictors were highly multicollinear, variance inflation factors and horseshoe lasso priors were used to select predictors 84,85. Univariable associations adjusting for age, sex and hematuria were reported to give credence to omitted collinear predictors in the multivariable statistical modeling. Hierarchical Bayesian cumulative link models (i.e., ordinal regression) in a similar specification were also used to report associations between the predictors and specimen atypia, treating the urine cytology assignment as an ordinal variable 86,87. Statistical significance was reported using the p-value, as derived from the probability of direction (pd): p ≈ 2 * (1 − pd). A p-value less than 0.05 indicates a significant atypia predictor. Credible intervals, similar to confidence intervals, communicated uncertainty in the effect estimates.
Web Application and Software Availability
We also developed an interactive web application which allows for rapid assessment of cytology slides. In brief, users first select a slide to examine. An ABS score is returned for the specimen as assessed using AutoParis-X. The Cell-RIF is converted into a 2D scatter plot of the NC ratio and atypia score– each point represents a cell. Using a “lasso tool”, users select cells within this scatterplot. The urothelial cells are highlighted on a zoomable WSI viewer (openseadragon) and additionally made available through an image gallery for additional examination (Figure 2) 88. The WSI viewer will highlight cells based on their relative degree of atypia as assessed algorithmically, focusing the end-user on a small subset of potentially malignant cells. A demo of this interactive web application can be found at the following URL: http://edit.autoparis.demo.levylab.host.dartmouth.edu/ (user: edit_user, password: qdp_2022; full-screen display is encouraged for optimal viewing experience). The web application also features a tutorial video for operating the application. The AutoParis-X software is also open-source, available to download on GitHub (https://github.com/jlevy44/AutoParisX) and installable using the following PyPI package: autoparis. Users aiming to run AutoParis-X will need to train compatible neural networks as neural networks were only trained on data from a single institution and would need additional finetuning to generalize.
Results
Performance of UroNet
UroNet demonstrated remarkable performance in the task of delineating among 6 different classes of cell types / objects to determine which cells are urothelial (Figure 2; Table 4). Figure 3A demonstrates a nearly perfect ROC curve (AUC=0.997 macro-averaged) for all 6 cell types across the validation set, indicating high classification accuracy. In addition, raw imaging features interpreted using IntegratedGradients corroborated with known histomorphology for specific cell types (e.g., highlighting dense chromatin to depict urothelial cells, surrounding membrane for squamous cells, etc.; Figure 3B). Many morphometric features were found to be important– for instance: 1) eccentricity as a defining feature of urothelial cells versus other cell types, 2) solidity for RBCs, 3) convex area as an important predictor for leukocyte clusters which have highly irregular formations, and 4) both convex area and solidity for squamous cells, which are larger than the other cell types and typically solid shapes without any notable deformations (Supplementary Figure 1). These findings suggest that UroNet can accurately identifying urothelial cells, important for establishing assessment of urothelial cells as the basis for AutoParis-X’s automated assessment.
Table 4: Performance Statistics for UroNet, UroSeg, and AtyNet;
Algorithm | Quantity | Measure | Estimate | 2.5% CI | 97.5% CI |
---|---|---|---|---|---|
UroNet | Urothelial | AUC | 0.994 | 0.993 | 0.995 |
Polyomavirus | AUC | 0.996 | 0.995 | 0.996 | |
Debris, Crystals, RBCs | AUC | 0.998 | 0.998 | 0.998 | |
Leukocytes | AUC | 0.997 | 0.996 | 0.998 | |
Leukocyte Clusters | AUC | 0.997 | 0.996 | 0.997 | |
Squamous Cells | AUC | 0.998 | 0.998 | 0.999 | |
AtyNet | Atypia Score | AUC | 0.917 | 0.905 | 0.929 |
UroSeg | Background | AUC | 0.993 | 0.993 | 0.993 |
Cytoplasm | AUC | 0.977 | 0.977 | 0.977 | |
Nucleus | AUC | 0.944 | 0.944 | 0.944 | |
NC Ratio | Spearman | 0.965 | 0.954 | 0.973 | |
Mean Absolute Error | 0.015 | 0.014 | 0.017 |
Performance of UroSeg
UroSeg, a neural network segmentation tool, demonstrated excellent performance on our internal validation set in predicting the pixelwise presence of the nucleus and cytoplasm (AUC=0.971 macro-averaged) in order to calculate nuclear to cytoplasm (NC) ratio (Figures 2-3; Table 4). Figure 3F also shows nearly perfect receiver operating characteristic curves for both the nucleus and cytoplasm, indicating the high accuracy of UroSeg in predicting these structures. Additionally, we found that the NC ratios calculated from the segmentation masks produced by UroSeg correlated nearly perfectly with the ground truth NC ratios (r=0.965; MAE=0.015) annotated by the cytopathologists (Figure 3G). Figure 3E demonstrates the alignment of the true and predicted nuclear and cytoplasmic segmentation masks, further highlighting the accuracy of UroSeg.
UroSeg was similarly effective when used in conjunction with BorderDet, our previously established urothelial cluster border separation tool. Cells extracted from urothelial clusters using BorderDet and confirmed to be urothelial via UroNet were assessed using UroSeg. We compared the NC ratios, averaged across each urothelial cluster, in our internal validation set with what was accomplished using watershedding techniques (which divided the clusters after seeding the watershed based on the location of the nuclei). Watershedding was not sensitive to the cell type as it did not leverage BorderDet and UroNet. In addition, for clusters containing urothelial cells and background debris or other confounding cell types, watershed heavily underestimated the NC ratio (Figure 4). This was universal across all of the urothelial clusters in the internal validation set. Through visual examination, it is clear that by precisely demarcating cytoplasmic borders between immediately adjacent and overlapping cells, BorderDet and UroNet allow for precise estimation of the NC ratio. Opting for alternative assessment approaches (e.g., watershedding) could reduce the predictive capacity of slides containing abundance of urothelial cell clusters by removing or unnecessarily skewing the reported statistics for these cells as compared to isolated cells.
Performance of AtyNet
Performance for AtyNet, the neural network which provides an atypia score estimate for each urothelial cell, was equally promising (Figure 2; Table 4). The algorithm achieved an area under the receiver operating characteristic curve of 0.917 on the internal validation set, indicating a strong ability to distinguish between atypical and normal cells. Model interpretation using integrated gradients revealed that the algorithm placed a high emphasis on irregularities in the nuclear membrane as a key feature in determining cytological atypia (Figure 2B) 56.
ABS Classifier Performance
Individual cell and cluster level features were cross tabulated across the slide and assessed using multiple statistical and machine learning algorithms. Many cellular and cluster level features correlated closely with specimen atypia (Supplementary Figures 2-4). Atypical urothelial cells as defined by both the NC ratio and atypia score, which were contained within clusters were, in some cases, more predictive of specimen atypia than assessment of isolated cells alone (e.g., cells with high NC ratio in clusters were more predictive than isolated cells with high NC ratio), further suggesting the importance of employing BorderDet for separating cells. The number of urothelial cells and cell clusters correlated directly with potential for malignancy. Urothelial cell clusters which were both atypical and contained dense regions were the third most predictive variable when assessed using univariable regression.
As part of the AutoParis-X framework, each machine learning model outputs the Atypia Burden Score (ABS)– the probability of assigning suspicious or positive UC exam as judged using AutoParis-X. Across all algorithms, ABS correlated closely with specimen atypia. The machine learning models which accounted for patient and pathologist-level variation, GPBoost and BART, outperformed all other approaches with AUCs of 0.89 and 0.88 respectively (Figure 5A; Table 5). The generalized linear mixed effects models also performed well. Across all models, ABS scores preserved the ordering of the UC categories (Negative<Atypical<Suspicious<Positive; Figure 5B). We fit an ordinal regression model to this data, which demonstrated a strong positive association with atypia (UC categories; β = 3.61; 95%CI: [3.12 – 4.11]; p < 0.0001). This information is corroborated by density heatmaps depicting the NC Ratio and Atypia score for individual urothelial cells across the entire cohort, after being filtered using UroNet. This yielded more than 6 million cells, which were separated based on their UC class. Figure 5D demonstrates the progression in cellular atypia across the categories– negative cases typically do not contain cells that have both high NC ratio and atypia, while these cells can be increasingly found at higher UC categories. Positive cases contain many cells that are both highly atypical with high NC ratio.
Table 5: Performance statistics for ABS Classifiers;
AUC | 2.5% CI | 97.5% CI | |
---|---|---|---|
RF | 0.873 | 0.846 | 0.897 |
GPBoost | 0.889 | 0.866 | 0.913 |
BART | 0.876 | 0.847 | 0.901 |
BGLMM | 0.833 | 0.788 | 0.873 |
BGLMM-Int | 0.843 | 0.808 | 0.874 |
Univariable and Multivariable associations with Specimen Atypia
Table 6 demonstrates the importance of the individual slide level predictors through both univariable and multivariable regression modeling. A few predictors remained in the unpenalized statistical model after applying the horseshoe lasso (Figure 5C). This included positive associations with number of clusters, number of both atypical and dense clusters, number of isolated atypical cells and an interaction between age and atypia. The interaction demonstrates that overall specimen atypia younger individuals more greatly impacted by number of atypical urothelial cells as compared to older individuals.
Table 6: Effect estimates, 95% credible intervals and p-values for multivariable regression model.
Parameter | OR | 2.5% CI | 97.5% CI | p-value |
---|---|---|---|---|
Number of Clusters | 1.31 | 1.06 | 1.68 | 0.016 |
Age | 1.57 | 1.07 | 2.39 | 0.029 |
History of Hematuria | 1.40 | 1.17 | 1.69 | 0.003 |
Dense/Atypical Clusters | 1.84 | 1.41 | 2.39 | <0.001 |
Number Isolated Atypical Cells | 1.81 | 1.32 | 2.44 | <0.001 |
Age:Number Isolated Atypical Cells | 0.80 | 0.65 | 0.99 | 0.050 |
Web Application Example
As a demonstration of Autoparis-X’s ability to facilitate rapid examination of UC specimens, we examined four specimens with the web application (see Supplementary Figures 5-7 for screenshots). Among thousands of specimens examined using this web tool, select cases (negative, atypical, suspicious, positive) can be further inspected using the demo application (see Web Application and Software Availability). The first case (Supplementary Figure 5) yielded an Atypia Burden Score of 0.14. Urothelial cells were selected with high atypia and were plotted on the WSI, revealing their locations. Zooming in on the WSI confirmed the reported cell-level statistics. We also used the table as means to rapidly examine all atypical cells in order of decreasing atypia as a faster method to examine cells versus zooming in using the web application. These examinations confirmed that this was in fact an atypical specimen. The second case produced an atypia burden score of 0.6– a similar examination revealed specimen atypia on par with that of a suspicious assignment. The final case was a positive patient with an atypia burden score of 0.76. We focused on only a few cells which demonstrated the highest potential for malignancy in order to focus our examination given the high cellularity of the specimen. Many of these cells were nested in urothelial cell clusters. This search identified cells which were indeed highly malignant morphologically, allowing for rapid assignment of a positive finding. In Supplementary Figure 8, we used the WSI viewer to zoom in on a few malignant cells identified using the AutoParis-X web application.
Discussion
Advances in urine examination from ancient times to the information age have been accompanied by improvements in both specimen preparation and rigorous quantitative bladder cancer screening criteria 4. Urine cytology (UC) examination for specimen atypia has emerged as the staple of modern-day bladder cancer screening and is often accompanied by more invasive methods for cases demonstrating suspicious or positive classifications. For example, TPS is a widely used grading system in urine cytology screening for bladder cancer, which assigns four main categories based on the presence of high-grade urothelial carcinoma cells and specific cellular features. Yet, despite advances in manual examination methods, there is often poor inter-rater variability in the interpretation of atypical or suspicious specimens, and TPS does not include rigorous criteria for evaluating urothelial cell clusters 11,17,89-94. Automation in cytopathology can improve the reliability of cytological assessments and help clinicians address growing numbers of tests and avoid diagnostic errors, as has been demonstrated in the gynecologic cytology market with the adoption of systems such as ThinPrep® Imaging System and FocalPoint™ GS Imaging system 24. Existing systems for semi-autonomous UC examination have addressed many existing challenges, though have yet to adequately account for many additional complexities which can confound assessment (e.g., clusters, polyomavirus, etc.) 20,21. In this study, we detailed the development of an artificial intelligence tool, AutoParis-X, which improves upon its previous incarnation, to allow for the rapid and nuanced examination of UC specimens; validation on a large-scale retrospective cohort illustrated the maturity and technical sophistication of this tool. For instance, challenges associated with calculation of NC ratios and overall cellular atypia within dense, overlapping urothelial cell clusters were addressed with remarkably good performance 44. The importance of many previously understudied predictors were evaluated (e.g., number of atypical and dense urothelial clusters). Finally, the featured interactive web application was designed for ease-of-use for semi-autonomous diagnostic decision making.
All of these innovations suggest AutoParis-X’s potential to greatly facilitate the process of bladder cancer screening, potentially resulting in a significant increase in diagnostic accuracy and a subsequent decrease in potential avenues for error (similar to what occurred with wide adoption of FocalPoint for Pap tests) 31,95. For instance, results suggest that UroSeg can be used to accurately calculate NC ratios in a high-throughput manner. AutoParis-X can be used to examine hundreds to thousands of cytology specimens overnight, permitting semi-autonomous evaluation from the cytopathologist via the web application the following day (or in real time as results are generated). This is expected to increase the number and throughput of cytology exams that can be performed by any given institution while accounting for the necessary safeguards (i.e., secondary manual review of random cohort of cases as is now done with Pap tests). Cases unable to be assessed using this web-based platform could be shunted to the classical manual interpretation pathway. With any newly introduced technology, rigorous real-world clinical trials will be required to evaluate the potential impact of adopting this system. As there are only limited applications of AI technologies in digital pathology that have been approved by the FDA for clinical usage, several existing practicalities are worth addressing before AutoParis-X can be safely employed in the clinic. Social barriers for adoption can be identified through surveys on attitudes and beliefs about the tool, which will allow for iterative refinement of the output display and additional algorithmic finetuning. AutoParis-X will also need to demonstrate non-inferiority in a clinical trial (i.e., random assignment of individuals to assessment via manual and semi-autonomous examination). As non-inferiority is evaluated with respect to a ground-truth measurement, it will be difficult to prove the utility of AutoParis-X to assign specimen atypia based on alignment to cytopathologist ratings alone given the high inter-observer variation (e.g. there is no universal, quantitative ground truth in urine cytology) 12,17,93. Additional validation will likely require assessment of its capacity to predict more objective outcomes, such as disease recurrence or death 96-99. Additionally, its cost-effectiveness over traditional methods will also need to be proven (e.g., CPT codes, RVUs, number of specimens per day, technologist and pathologist time spent), which will communicate revenue to be expected / workforce needed when operating the device 100-103. A clearer understanding of how these tools can impact clinical decision making is needed before implementation (e.g., what conditions/thresholds are necessary to flag the case for manual review under a microscope) 104.
There are several limitations worth noting that will require future improvements and developments. We observed potential scanning artifacts (e.g., pixelation of cells), deficiencies in specimen preparation, high cellular density, and blood in the samples, which complicate the assessment. However, we have not yet developed methods to address these challenges. In addition to surveying attitudes, beliefs and adoption barriers, cytopathologists unfamiliar with digital technologies may favor assessment through analog means (e.g., microscope)– this will either require additional training and education on how to operate these nascent technologies or may require further subspecialization / training of cytopathologists to perform a digital assessment 105-109. AutoParis-X does not account for Z-stacking of cytology slides which can be accounted for in future iterations to model cells in 3D 73,110. Annotation of individual cells and clusters were performed by a small group of cytopathologists. Some of these annotations (e.g., nucleus, delineation of cytoplasmic borders in clusters, cell type) may differ between cytopathologists. In addition, data was only collected and validated at a single institution which may limit generalization of these approaches as other institutions may have heterogenous patient characteristics/demographics and different specimen preparation methods 111. Additional data collection from multiple institutions can ameliorate these potential challenges by improving the diversity of the dataset, allowing additional flexibility. There is also room for improvement for deriving slide level features. While we utilized Bayesian Optimization to decide which cells/clusters were atypical, dense, clusters, etc., consideration of additional thresholds or forms to summarize this information could improve the model accuracy. There exists a plethora of modeling approaches which can be utilized to predict specimen atypia. For instance, attention and graph-based neural network architectures can take as input the entire WSI broken into constituent cells, each of which has stored attribute/morphological information. and perform what amounts to a weighted average across the cells to derive a final summary statistic 112,113. The ordinal nature of UC class assignment was not explicitly taken into account for most of the results in this study and can be incorporated into these machine learning models using the appropriate model likelihoods 114. Institutions aiming to adopt these digital technologies will also require significant computing infrastructure. This requires the purchase and utilization of GPU enabled compute nodes (cloud computing services such as AWS and Google Cloud present viable alternatives to in-house purchases), adoption of containerized workflows, which standardize and scale analyses, and hosting of front-facing applications with appropriate databasing, security and credentialling.
Conclusion
Bladder cancer screening through urine cytology exams is a tedious and fatigable process as cytopathologists assess tens to hundreds of thousands of cells per specimen. Algorithmic techniques to emulate these assessments are beginning to address the incredibly nuanced nature of these assessments. This study featured the design and large-scale validation of a digital diagnostic decision aid, AutoParis-X, which iterates on previous incarnations of urine cytology assessment algorithms to address many remaining complexities associated with challenging examination; further, it features a web application that allows for accurate and rapid examination of specimens. We encourage interested parties to utilize the AutoParis-X workflow and consider validating and finetuning the algorithm for other practice settings to enhance its wider generalizability. The current study demonstrated that quantitative digital urine cytology assessment methods have come of age and are prepared for further rigorous prospective evaluation to investigate its future role in augmenting clinical diagnostic decision making.
Supplementary Material
Footnotes
Conflict of Interest
None to disclose.
References
- 1.Barkan GA et al. The Paris System for Reporting Urinary Cytology: The Quest to Develop a Standardized Terminology. ACY 60, 185–197 (2016). [DOI] [PubMed] [Google Scholar]
- 2.Bostwick DG 7 - Urine Cytology. in Urologic Surgical Pathology (Fourth Edition) (eds. Cheng L, MacLennan GT & Bostwick DG) 322–357.e7 (Elsevier, 2020). [Google Scholar]
- 3.Mossanen M & Gore JL The burden of bladder cancer care: direct and indirect costs. Curr Opin Urol 24, 487–491 (2014). [DOI] [PubMed] [Google Scholar]
- 4.Magiorkinis E & Diamantis A The fascinating story of urine examination: From uroscopy to the era of microscopy and beyond. Diagnostic Cytopathology 43, 1020–1036 (2015). [DOI] [PubMed] [Google Scholar]
- 5.Salem S, Mitchell RE, El-Alim El-Dorey A, Smith JA & Barocas DA Successful control of schistosomiasis and the changing epidemiology of bladder cancer in Egypt. BJU international 107, 206–211 (2011). [DOI] [PubMed] [Google Scholar]
- 6.Botelho MC, Alves H & Richter J Halting Schistosoma haematobium-associated bladder cancer. International journal of cancer management 10, (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Papanicolaou GN Cytology of the urine sediment in neoplasms of the urinary tract. The Journal of urology 57, 375–379 (1947). [DOI] [PubMed] [Google Scholar]
- 8.Layfield LJ, Elsheikh TM, Fili A, Nayar R & Shidham V Review of the state of the art and recommendations of the Papanicolaou Society of Cytopathology for urinary cytology procedures and reporting: the Papanicolaou Society of Cytopathology Practice Guidelines Task Force. Diagnostic cytopathology 30, 24–30 (2004). [DOI] [PubMed] [Google Scholar]
- 9.Wang Y-H et al. Diagnostic Agreement for High-Grade Urothelial Cell Carcinoma in Atypical Urine Cytology: A Nationwide Survey Reveals a Tendency for Overestimation in Specimens with an N/C Ratio Approaching 0.5. Cancers 12, 272 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Barkan GA Enough is enough: adequacy of voided urine cytology. (2016). [DOI] [PubMed] [Google Scholar]
- 11.Roy M et al. An institutional experience with The Paris System: A paradigm shift from ambiguous terminology to more objective criteria for reporting urine cytology. Cytopathology 28, 509–515 (2017). [DOI] [PubMed] [Google Scholar]
- 12.Levy JJ et al. Large-scale longitudinal comparison of urine cytological classification systems reveals potential early adoption of The Paris System criteria. J Am Soc Cytopathol S2213-2945(22)00241–1 (2022) doi: 10.1016/j.jasc.2022.08.001. [DOI] [PubMed] [Google Scholar]
- 13.Kurtycz DF, Wojcik EM & Rosenthal DL Perceptions of Paris: an international survey in preparation for The Paris System for Reporting Urinary Cytology 2.0 (TPS 2.0). Journal of the American Society of Cytopathology 12, 66–74 (2023). [DOI] [PubMed] [Google Scholar]
- 14.Nikas IP et al. The Paris System for Reporting Urinary Cytology: A Meta-Analysis. Journal of Personalized Medicine 12, 170 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Wojcik EM, Kurtycz DF & Rosenthal DL The Paris system for reporting urinary cytology. (Springer, 2022). [DOI] [PubMed] [Google Scholar]
- 16.Wojcik EM, Kurtycz DF & Rosenthal DL We’ll always have Paris the Paris system for reporting urinary cytology 2022. Journal of the American Society of Cytopathology 11, 62–66 (2022). [DOI] [PubMed] [Google Scholar]
- 17.Long T. et al. Interobserver reproducibility of The Paris System for Reporting Urinary Cytology. Cytojournal 14, 17 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Landau MS & Pantanowitz L Artificial intelligence in cytopathology: a review of the literature and overview of commercial landscape. Journal of the American Society of Cytopathology 8, 230–241 (2019). [DOI] [PubMed] [Google Scholar]
- 19.Pouliakis A. et al. Artificial Neural Networks as Decision Support Tools in Cytopathology: Past, Present, and Future. Biomed Eng Comput Biol 7, 1–18 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.McAlpine ED, Pantanowitz L & Michelow PM Challenges Developing Deep Learning Algorithms in Cytology. ACY 65, 301–309 (2021). [DOI] [PubMed] [Google Scholar]
- 21.Thiryayi SA & Rana DN Urine cytopathology: challenges, pitfalls, and mimics. Diagnostic Cytopathology 40, 1019–1034 (2012). [DOI] [PubMed] [Google Scholar]
- 22.Jiang H et al. Deep learning for computational cytology: A survey. Med Image Anal 84, 102691 (2023). [DOI] [PubMed] [Google Scholar]
- 23.Rezende MT, Bianchi AGC & Carneiro CM Cervical cancer: Automation of Pap test screening. Diagn Cytopathol 49, 559–574 (2021). [DOI] [PubMed] [Google Scholar]
- 24.Okuda C. et al. Quantitative cytomorphological comparison of SurePath and ThinPrep liquid-based cytology using high-grade urothelial carcinoma cells. Cytopathology 32, 654–659 (2021). [DOI] [PubMed] [Google Scholar]
- 25.Tolles WE The cytoanalyzer-an example of physics in medical research. Trans N Y Acad Sci 17, 250–256 (1955). [DOI] [PubMed] [Google Scholar]
- 26.Bourghardt S, Hyden H & Nyquist B A scanning and computing microphotometer for cell analyses. Experientia 11, 163–165 (1955). [DOI] [PubMed] [Google Scholar]
- 27.Abels E. et al. Computational pathology definitions, best practices, and recommendations for regulatory guidance: a white paper from the Digital Pathology Association. J Pathol 249, 286–294 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Pantanowitz L. Improving the Pap test with artificial intelligence. Cancer Cytopathol 130, 402–404 (2022). [DOI] [PubMed] [Google Scholar]
- 29.Xue P. et al. Deep learning in image-based breast and cervical cancer detection: a systematic review and meta-analysis. NPJ Digit Med 5, 19 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Hou X. et al. Artificial Intelligence in Cervical Cancer Screening and Diagnosis. Front Oncol 12, 851367 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Thrall MJ Automated screening of Papanicolaou tests: A review of the literature. Diagn Cytopathol 47, 20–27 (2019). [DOI] [PubMed] [Google Scholar]
- 32.Chantziantoniou N. BestCyte® Cell Sorter Imaging System: Primary and adjudicative whole slide image rescreening review times of 500 ThinPrep Pap test thin-layers - An intra-observer, time-surrogate analysis of diagnostic confidence potentialities. J Pathol Inform 13, 100095 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Delga A. et al. Evaluation of CellSolutions BestPrep® automated thin-layer liquid-based cytology Papanicolaou slide preparation and BestCyte® cell sorter imaging system. Acta Cytol 58, 469–477 (2014). [DOI] [PubMed] [Google Scholar]
- 34.Pantanowitz L & Bui MM Image Analysis in Cytopathology. in Monographs in Clinical Cytology (eds. Bui MM & Pantanowitz L) vol. 25 91–98 (S. Karger AG, 2020). [Google Scholar]
- 35.Pantanowitz L, Hornish M & Goulart RA Informatics applied to cytology. Cytojournal 5, 16 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Wilbur DC Digital cytology: current state of the art and prospects for the future. Acta Cytol 55, 227–238 (2011). [DOI] [PubMed] [Google Scholar]
- 37.Dov D. et al. Weakly supervised instance learning for thyroid malignancy prediction from whole slide cytopathology images. Med Image Anal 67, 101814 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Yao K et al. A Study of Thyroid Fine Needle Aspiration of Follicular Adenoma in the ‘Atypia of Undetermined Significance’ Bethesda Category Using Digital Image Analysis. J Pathol Inform 13, 100004 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Girolami I. et al. Impact of image analysis and artificial intelligence in thyroid pathology, with particular reference to cytological aspects. Cytopathology 31, 432–444 (2020). [DOI] [PubMed] [Google Scholar]
- 40.Sanyal P, Mukherjee T, Barui S, Das A & Gangopadhyay P Artificial Intelligence in Cytopathology: A Neural Network to Identify Papillary Carcinoma on Thyroid Fine-Needle Aspiration Cytology Smears. J Pathol Inform 9, 43 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Elliott Range DD et al. Application of a machine learning algorithm to predict malignancy in thyroid cytopathology. Cancer Cytopathol 128, 287–295 (2020). [DOI] [PubMed] [Google Scholar]
- 42.Guan Q. et al. Deep convolutional neural network VGG-16 model for differential diagnosing of papillary thyroid carcinomas in cytological images: a pilot study. J Cancer 10, 4876–4882 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Sanghvi AB, Allen EZ, Callenberg KM & Pantanowitz L Performance of an artificial intelligence algorithm for reporting urine cytopathology. Cancer Cytopathology 127, 658–666 (2019). [DOI] [PubMed] [Google Scholar]
- 44.Levy JJ et al. Uncovering additional predictors of urothelial carcinoma from voided urothelial cell clusters through a deep learning-based image preprocessing technique. Cancer Cytopathol (2022) doi: 10.1002/cncy.22633. [DOI] [PubMed] [Google Scholar]
- 45.Mahmood F. et al. Deep Adversarial Training for Multi-Organ Nuclei Segmentation in Histopathology Images. IEEE Transactions on Medical Imaging (2020) doi: 10.1109/TMI.2019.2927182. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Bankhead P. et al. QuPath: Open source software for digital pathology image analysis. Sci Rep 7, 1–7 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Humphries MP, Maxwell P & Salto-Tellez M QuPath: The global impact of an open source digital pathology system. Computational and Structural Biotechnology Journal 19, 852–859 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Vaickus LJ, Suriawinata AA, Wei JW & Liu X Automating the Paris System for urine cytopathology—A hybrid deep-learning and morphometric approach. Cancer Cytopathology 127, 98–115 (2019). [DOI] [PubMed] [Google Scholar]
- 49.Singh HK, Bubendorf L, Mihatsch MJ, Drachenberg CB & Nickeleit V Urine Cytology Findings of Polyomavirus Infections. in Polyomaviruses and Human Diseases (ed. Ahsan N) 201–212 (Springer, 2006). doi: 10.1007/0-387-32957-9_15. [DOI] [PubMed] [Google Scholar]
- 50.Paszke A et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. arXiv:1912.01703 [cs, stat] (2019). [Google Scholar]
- 51.Wu Y, Kirillov A, Massa F, Lo W-Y & Girshick R Detectron2. (2019). [Google Scholar]
- 52.Matthes E Python Crash Course, 2nd Edition: A Hands-On, Project-Based Introduction to Programming. (No Starch Press, 2019). [Google Scholar]
- 53.Bürkner P-C brms: An R Package for Bayesian Multilevel Models Using Stan. Journal of Statistical Software 80, 1–28 (2017). [Google Scholar]
- 54.Tippmann S. Programming tools: Adventures with R. Nature 517, 109–110 (2015). [DOI] [PubMed] [Google Scholar]
- 55.Rocklin M Dask: Parallel Computation with Blocked algorithms and Task Scheduling. in 126–132 (2015). doi: 10.25080/Majora-7b98e3ed-013. [DOI] [Google Scholar]
- 56.Harvey SE & VandenBussche CJ Nuclear membrane irregularity in high-grade urothelial carcinoma cells can be measured by using circularity and solidity as morphometric shape definitions in digital image analysis of urinary tract cytology specimens. Cancer Cytopathol (2023) doi: 10.1002/cncy.22682. [DOI] [PubMed] [Google Scholar]
- 57.Louppe G. Bayesian optimisation with scikit-optimize. in PyData Amsterdam (2017). [Google Scholar]
- 58.Sigrist F Gaussian Process Boosting. Journal of Machine Learning Research 23, 1–46 (2022). [Google Scholar]
- 59.Sigrist F Latent Gaussian Model Boosting. IEEE Transactions on Pattern Analysis and Machine Intelligence 1–1 (2022) doi: 10.1109/TPAMI.2022.3168152. [DOI] [PubMed] [Google Scholar]
- 60.Tan YV & Roy J Bayesian additive regression trees and the General BART model. Statistics in Medicine 38, 5048–5069 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Chipman HA, George EI & McCulloch RE BART: Bayesian additive regression trees. The Annals of Applied Statistics 4, 266–298 (2010). [Google Scholar]
- 62.Hajjem A, Bellavance F & Larocque D Mixed-effects random forest for clustered data. Journal of Statistical Computation and Simulation 84, 1313–1328 (2014). [Google Scholar]
- 63.Levy JJ et al. Mixed Effects Machine Learning Models for Colon Cancer Metastasis Prediction using Spatially Localized Immuno-Oncology Markers. Pac Symp Biocomput 27, 175–186 (2022). [PMC free article] [PubMed] [Google Scholar]
- 64.Bürkner P-C Advanced Bayesian Multilevel Modeling with the R Package brms. The R Journal 10, 395–411 (2018). [Google Scholar]
- 65.Perkel JM Data visualization tools drive interactivity and reproducibility in online publishing. Nature 554, 133–134 (2018). [DOI] [PubMed] [Google Scholar]
- 66.Bradski G The openCV library. Dr. Dobb’s Journal: Software Tools for the Professional Programmer 25, 120–123 (2000). [Google Scholar]
- 67.Virtanen P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature methods 17, 261–272 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Silversmith W. fill-voids: Fill voids in 3D binary images fast. [Google Scholar]
- 69.Nishino R & Loomis SHC Cupy: A numpy-compatible library for nvidia gpu calculations. 31st confernce on neural information processing systems 151, (2017). [Google Scholar]
- 70.Cheng B. et al. Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation. arXiv:1911.10194 [cs] (2020). [Google Scholar]
- 71.Van der Walt S. et al. scikit-image: image processing in Python. PeerJ 2, e453 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Pedregosa F. et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011). [Google Scholar]
- 73.Kim D. et al. Evaluating the role of Z-stack to improve the morphologic evaluation of urine cytology whole slide images for high-grade urothelial carcinoma: Results and review of a pilot study. Cancer Cytopathology 130, 630–639 (2022). [DOI] [PubMed] [Google Scholar]
- 74.Allison DB et al. Should the BK polyomavirus cytopathic effect be best classified as atypical or benign in urine cytology specimens? Cancer cytopathology 124, 436–442 (2016). [DOI] [PubMed] [Google Scholar]
- 75.Wu Z. et al. Representing long-range context for graph neural networks with global attention. Advances in Neural Information Processing Systems 34, 13266–13279 (2021). [Google Scholar]
- 76.Levy JJ, Salas LA, Christensen BC, Sriharan A & Vaickus LJ PathFlowAI: A High-Throughput Workflow for Preprocessing, Deep Learning and Interpretation in Digital Pathology. Pac Symp Biocomput 25, 403–414 (2020). [PMC free article] [PubMed] [Google Scholar]
- 77.Sundararajan M, Taly A & Yan Q Axiomatic attribution for deep networks. in Proceedings of the 34th International Conference on Machine Learning - Volume 70 3319–3328 (JMLR.org, 2017). [Google Scholar]
- 78.Kokhlikyan N. et al. Captum: A unified and generic model interpretability library for PyTorch. arXiv:2009.07896 [cs, stat] (2020). [Google Scholar]
- 79.Falk T. et al. U-Net: deep learning for cell counting, detection, and morphometry. Nature methods 16, 67–70 (2019). [DOI] [PubMed] [Google Scholar]
- 80.Ronneberger O, Fischer P & Brox T U-net: Convolutional networks for biomedical image segmentation. in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18 234–241 (Springer, 2015). [Google Scholar]
- 81.He K, Zhang X, Ren S & Sun J Deep Residual Learning for Image Recognition. in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (2016). doi: 10.1109/CVPR.2016.90. [DOI] [Google Scholar]
- 82.Koss LG, Deitch D, Ramanathan R & Sherman AB Diagnostic value of cytology of voided urine. Acta cytologica 29, 810–816 (1985). [PubMed] [Google Scholar]
- 83.Onur I, Rosenthal DL & VandenBussche CJ Benign-appearing urothelial tissue fragments in noninstrumented voided urine specimens are associated with low rates of urothelial neoplasia. Cancer Cytopathology 123, 180–185 (2015). [DOI] [PubMed] [Google Scholar]
- 84.Thompson CG, Kim RS, Aloe AM & Becker BJ Extracting the variance inflation factor and other multicollinearity diagnostics from typical regression results. Basic and Applied Social Psychology 39, 81–90 (2017). [Google Scholar]
- 85.Carvalho CM, Polson NG & Scott JG Handling Sparsity via the Horseshoe. in Artificial Intelligence and Statistics 73–80 (PMLR, 2009). [Google Scholar]
- 86.McKinley TJ, Morters M & Wood JLN Bayesian Model Choice in Cumulative Link Ordinal Regression Models. Bayesian Analysis 10, 1–30 (2015). [Google Scholar]
- 87.Bender R & Grouven U Ordinal Logistic Regression in Medical Research. J R Coll Physicians Lond 31, 546–551 (1997). [PMC free article] [PubMed] [Google Scholar]
- 88.OpenSeadragon. http://openseadragon.github.io/.
- 89.Wolfson WL & Rosenthal DL Cell clusters in urinary cytology. Acta Cytol 22, 138–141 (1978). [PubMed] [Google Scholar]
- 90.Mikou P. et al. Evaluation of the Paris System in atypical urinary cytology. Cytopathology 29, 545–549 (2018). [DOI] [PubMed] [Google Scholar]
- 91.Kurtycz DF et al. Paris interobserver reproducibility study (PIRST). Journal of the American Society of Cytopathology 7, 174–184 (2018). [DOI] [PubMed] [Google Scholar]
- 92.Kurtycz DFI, Sundling KE & Barkan GA The Paris system of Reporting Urinary Cytology: Strengths and opportunities. Diagnostic Cytopathology 48, 890–895 (2020). [DOI] [PubMed] [Google Scholar]
- 93.Bakkar R. et al. Impact of the Paris system for reporting urine cytopathology on predictive values of the equivocal diagnostic categories and interobserver agreement. Cytojournal 16, (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94.Hassan M. et al. Impact of Implementing the Paris System for Reporting Urine Cytology in the Performance of Urine Cytology: A Correlative Study of 124 Cases. American Journal of Clinical Pathology 146, 384–390 (2016). [DOI] [PubMed] [Google Scholar]
- 95.Pantanowitz L. Automated pap tests. Practical Informatics for Cytopathology 147–155 (2014). [Google Scholar]
- 96.Yamashita S. et al. Urethral recurrence following neobladder in bladder cancer patients. The Tohoku Journal of Experimental Medicine 199, 197–203 (2003). [DOI] [PubMed] [Google Scholar]
- 97.Pierconti F. et al. DNA methylation analysis in urinary samples: A useful method to predict the risk of neoplastic recurrence in patients with urothelial carcinoma of the bladder in the high-risk group. Cancer Cytopathology n/a,. [DOI] [PubMed] [Google Scholar]
- 98.Shalata AT et al. Predicting Recurrence of Non-Muscle-Invasive Bladder Cancer: Current Techniques and Future Trends. Cancers 14, 5019 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 99.Soorojebally Y. et al. Urinary biomarkers for bladder cancer diagnosis and NMIBC follow-up: a systematic review. World J Urol 41, 345–359 (2023). [DOI] [PubMed] [Google Scholar]
- 100.Lujan G. et al. Dissecting the business case for adoption and implementation of digital pathology: a white paper from the digital pathology association. Journal of Pathology Informatics 12, 17 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 101.Acs B & Rimm DL Not just digital pathology, intelligent digital pathology. JAMA oncology 4, 403–404 (2018). [DOI] [PubMed] [Google Scholar]
- 102.Jones-Hall Y Digital pathology in academia: Implementation and impact. Lab Animal 50, 229–231 (2021). [DOI] [PubMed] [Google Scholar]
- 103.Olswang LB & Prelock PA Bridging the gap between research and practice: Implementation science. Journal of Speech, Language, and Hearing Research 58, S1818–S1826 (2015). [DOI] [PubMed] [Google Scholar]
- 104.Li RC, Asch SM & Shah NH Developing a delivery science for artificial intelligence in healthcare. npj Digit. Med 3, 1–3 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 105.Char DS, Abràmoff MD & Feudtner C Identifying ethical considerations for machine learning healthcare applications. The American Journal of Bioethics 20, 7–17 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 106.Jackson BR et al. The Ethics of Artificial Intelligence in Pathology and Laboratory Medicine: Principles and Practice. Acad Pathol 8, (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 107.Chauhan C & Gullapalli RR Ethics of AI in pathology: current paradigms and emerging issues. The American journal of pathology 191, 1673–1683 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 108.Baxi V, Edwards R, Montalto M & Saha S Digital pathology and artificial intelligence in translational medicine and clinical practice. Modern Pathology 35, 23–32 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 109.Niazi MKK, Parwani AV & Gurcan MN Digital pathology and artificial intelligence. The Lancet Oncology 20, e253–e261 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 110.Bouyssoux A, Fezzani R & Olivo-Marin J-C Cell Instance Segmentation Using Z-Stacks in Digital Cytology. in 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI) 1–4 (2022). doi: 10.1109/ISBI52829.2022.9761495. [DOI] [Google Scholar]
- 111.Vaickus LJ & Tambouret RH Young investigator challenge: The accuracy of the nuclear-to-cytoplasmic ratio estimation among trained morphologists. Cancer Cytopathol 123, 524–530 (2015). [DOI] [PubMed] [Google Scholar]
- 112.Butke J. et al. End-to-end Multiple Instance Learning for Whole-Slide Cytopathology of Urothelial Carcinoma. in Proceedings of the MICCAI Workshop on Computational Pathology 57–68 (PMLR, 2021). [Google Scholar]
- 113.Lu MY et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nature Biomedical Engineering 1–16 (2021) doi: 10.1038/s41551-020-00682-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 114.Vargas VM, Gutiérrez PA & Hervás-Martínez C Cumulative link models for deep ordinal classification. Neurocomputing 401, 48–58 (2020). [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.