Abstract
Machine learning (ML) models are poised to transform surgical pathology practice. The most successful models use attention mechanisms to examine whole slides, identify which areas of tissue are diagnostic, and use them to guide diagnosis. Tissue contaminants, such as floaters, represent unexpected tissue. Human pathologists are extensively trained to consider and detect tissue contaminants; here, we examined their impact on ML models. We trained 4 whole-slide models. Three operate in placenta for the following functions: (1) detection of decidual arteriopathy, (2) estimation of gestational age, and (3) classification of macroscopic placental lesions. We also developed a model to detect prostate cancer in needle biopsies. We designed experiments in which patches of contaminant tissue were randomly sampled from known slides and digitally added to patient slides, and we measured model performance. We measured the proportion of attention given to contaminants and examined the impact of contaminants in the t-distributed stochastic neighbor embedding feature space. Every model showed performance degradation in response to one or more tissue contaminants. In decidual arteriopathy detection, balanced accuracy decreased from 0.74 to 0.69 ± 0.01 with the addition of 1 patch of prostate tissue for every 100 patches of placenta (1% contaminant). Bladder, added at 10% contaminant, raised the mean absolute error in estimating gestational age from 1.626 weeks to 2.371 ± 0.003 weeks. Blood, incorporated into placental sections, induced false-negative diagnoses of intervillous thrombi. Addition of bladder to prostate cancer needle biopsies induced false positives; a selection of high-attention bladder patches, representing only 0.033 mm² of tissue, resulted in a 97% false-positive rate when added to needle biopsies. Contaminant patches received attention at or above the rate of the average patch of patient tissue. Tissue contaminants induce errors in modern ML models.
The high level of attention given to contaminants indicates a failure to encode biological phenomena. Practitioners should move to quantify and ameliorate this problem.
Keywords: artificial intelligence, digital pathology, histology, machine learning, placenta, prostate, quality, tissue contaminants
Introduction
Machine learning (ML) models are poised to transform pathology practice. Models have been developed to detect and grade cancers, quantify immunohistochemistry, and identify transplant rejection.1–5 The most successful and potentially transformative models are supervised or weakly supervised models that take entire slides as input and produce slide- or patient-level diagnoses.6–10 To process the large amount of data represented by each slide, these models can rely on attention or other pooling mechanisms to identify key areas on the slide. The attention is then used to explain the model’s decision and guide pathologists to areas of concern. Any ML model can give unpredictable results when presented with out-of-distribution data not seen during training, but the impact on attention is less studied. Pathology also presents a unique challenge to attention-based models because of the sporadic presence of tissue contaminants—material from different patients or specimens that is unintentionally included in the slide.11 The goal of this study was to test the impact of tissue contaminants on model performance and examine how tissue contaminants interact with the attention mechanism.
Tissue Contamination
The process of tissue handling, in which patient tissue becomes a slide, contains multiple steps in which tissue from one patient can appear on the slide of a different patient. This could be a “push” from an insufficiently cleaned tool at the grossing bench, “block contamination” that occurs during tissue processing in a retort shared by tissues from multiple patients, or a “floater” that occurs when histology water baths are insufficiently cleaned between blocks.11–14 These errors, collectively referred to as “tissue contamination,” are well described in the pathology literature but often come as a surprise to nonpathologist researchers or physicians.15
Tissue contaminants have been identified in up to 3% of slides examined, with an average size of 1 mm².11,13 In all, 12.7% of contaminants appeared to be neoplasia and 0.4% were judged to have a high risk of causing misdiagnosis. This low proportion belies a surprising count. An institution that produces 1 million slides per year—as is common for many high-volume academic and commercial laboratories—can expect 30,000 slides with tissue contaminants, of which 120 have a high risk of causing errors. Nonetheless, the actual error rate may be much lower. In a review of 276 legal cases against pathologists for misdiagnosis during 2004–2010, only 1 involved a floater.16
Human Learning and Machine Learning
The low error rate owing to tissue contaminants in real-world clinical practice may relate to pathologist education. Demonstrated knowledge of normal histology is a basic level-1 (of 5) milestone competency in pathology resident education in the United States.17,18 Identification of specimen integrity issues, specifically including floaters, is a level-2 skill.17,18 Thus, any pathologist examining a placental slide with tissue contamination by prostate should perform the following: (1) identify which portions of the specimen are placenta and which are prostate and (2) recognize that the prostate tissue is present in error and ignore it.
The contrast with ML approaches is stark. ML approaches in digital pathology generally address a single question (such as cancer detection or antibody quantification) or a set of related questions (multiple mutation identification) in a narrowly defined specimen type. For example, Paige Prostate was trained to detect prostate adenocarcinoma in prostate needle biopsies and is Food and Drug Administration authorized for that diagnosis in that specimen type.19,20
What Do We Know About Contaminants?
There is robust literature on slide quality and artifacts in digital pathology.21–24 Recent work has shown that digitally mimicking artifacts, including out-of-focus areas, threads, folds, markers, and crush artifacts, results in increasing rates of patch misclassification as artifacts become more severe.21,24 Some countermeasures have been considered—inclusion of fields with imaging artifacts during training improves robustness to those artifacts at inference without compromising performance on pristine images.25 Pantanowitz et al26 reported a version of their Yottixel system that suggests the probable source tissue of floaters, although their system relies on pathologists to identify the tissue regions of concern.
Systematic Contaminants
Differences in procedure between institutions or over time may create nonrandom patterns of contamination. In placental pathology, guidelines support submission of membrane rolls and umbilical cord in the same block.27 However, our institution submits membrane rolls and umbilical cord separately. Thus, a model trained using membrane roll slides from our institution would not be exposed to umbilical cord during training. If it were deployed at a site that cosubmitted membranes and umbilical cord, the response to the cord is uncertain. Another example may be seen in prostate biopsies. Historically, most biopsies were performed transrectally, with a shift in the past decade to transperineal biopsies.28,29 Biopsy needles are designed to capture only prostatic tissue, but there is some risk of “pick-up” from the tissues transited. Thus, older prostate biopsies are more likely to contain fragments of large bowel, whereas newer biopsies are more likely to contain skin and subcutaneous tissue.
Placenta Models
A diverse spectrum of placental abnormalities has been linked with a variety of fetal and neonatal outcomes.27,30 The breadth of these anomalies, high interobserver variability, and sparsity of perinatal pathology expertise motivate ongoing work in this field.31–34 Placenta is considered a high-risk destination for tissue contaminants due to its sponge-like architecture.14 Prior studies on placenta have demonstrated classification and detection of normal and abnormal villous morphology,35–37 cell type and relationships,38 gestational age (GA),39 and decidual arteriopathy (DA).40 For this study, we chose to examine 3 models in placental pathology and physiology.
DA is a spectrum of abnormalities in decidual vessels associated with gestational hypertension and preeclampsia.27 DA is part of the maternal vascular malperfusion group of diagnoses. DA, specifically mural hypertrophy of membrane arterioles, has been associated with SARS-CoV-2 infection in pregnancy, though concerns have been raised about interobserver variability in this diagnosis.41–43 Previous work by Clymer et al40 has demonstrated the identifiability of DA in whole-slide images of placental membranes.
The placenta undergoes a reproducible series of changes over gestation, correlating with the clinically estimated GA. Accelerated maturation is associated with growth restriction and gestational hypertension, whereas delayed maturation may be associated with diabetes in pregnancy.44–46 We previously published a model to predict GA.39
Several macroscopic lesions can be identified within the placental disc, including villous infarction, perivillous fibrin deposition (PVFD), and intervillous thrombi (IVT). Villous infarcts are areas of interrupted maternal circulation associated with hypertension in pregnancy, preeclampsia, growth restriction, stillbirth, and risk of cerebral palsy.47–51 PVFD may be focal, in which it is thought to represent a reparative response to turbulent flow, or massive, in which it can be associated with fetal growth restriction and stillbirth.52–56 IVT are foci of clotted blood in the intervillous space. They have no clear clinical significance but merit recognition due to their frequency and risk of confusion with infarcts and PVFD.41,57,58 We recently developed a model to distinguish these 3 lesions from one another and from placental disc sections without macroscopic lesions.59
These problems correspond to 3 of the most common tasks in pathology—finding a small area of abnormality, estimating a quantity based on gestalt, and classifying a known abnormality.
Prostate Models
Cancer detection is a common paradigm in ML, and models to identify prostate cancer are among those Food and Drug Administration authorized. We developed a model to detect prostatic adenocarcinoma in needle biopsy specimens.
Materials and Methods
Data Set
Placentas
Inclusion criteria were patients who underwent placental examination and reporting at our institution between 2011 and 2023 and had slides scanned as part of an ongoing digitization study (institutional review board number: STU00214052; Table 1). Each model used additional criteria as noted below. Placentas were examined and diagnoses were rendered according to the Amsterdam criteria or precursor guidelines.27 Slides were digitized using a Leica GT450 scanner with ×40 objective magnification (0.263 μm per pixel). A ×10 magnification layer was used for placental studies. Pathology reports were obtained from the institutional electronic data warehouse and processed using natural language processing.42 Patient and diagnosis information was stored in REDCap.
Table 1.
Placenta cohort
| Maternal age (y, n = 3624) | 33.0 ± 5.3 | |
| Gestational age (wk, n = 3667) | 37.2 ± 3.6 | |
| Nulliparous (n = 3681) | 2033 (0.55) | |
| Race (n = 3681) | Black | 577 (0.16) |
| White | 1944 (0.53) | |
| More than one | 618 (0.17) | |
| American Indian, Asian, Hawaiian | 271 (0.07) | |
| Declined | 271 (0.07) | |
| Ethnicity (n = 3681) | Hispanic or Latino | 772 (0.21) |
| Non-Hispanic or Latino | 2645 (0.72) | |
| Declined | 264 (0.07) | |
| Fetal sex (n = 3530) | Female | 1700 (0.48) |
| Male | 1830 (0.52) | |
| Placental diagnoses | DA (n = 3665) | 506 (0.14) |
| Villous infarction (n = 3665) | 570 (0.16) | |
| IVT (n = 3642) | 421 (0.12) | |
| PVFD (n = 3642) | 311 (0.09) | |
| Comorbidities (n = 3681) | Diabetes in pregnancy (any) | 725 (0.20) |
| Hypertension in pregnancy (any) | 898 (0.24) |
DA, GA, and mass lesion cases represent a subset of these cases. Comorbidities are identified by ICD9/ICD10 codes using the electronic data warehouse. DA, decidual arteriopathy; GA, gestational age; ICD, International Classification of Diseases; IVT, intervillous thrombus; PVFD, perivillous fibrin deposition.
Decidual Arteriopathy Model
Additional inclusion criteria were patients delivering after 26 weeks, 0 days.27 DA cases were those with a clinical diagnosis of mural hypertrophy of membrane arterioles. Controls were those without this diagnosis. One slide containing membrane rolls from each placenta was selected. A total of 2336 cases were used, split 70:15:15 between training, validation, and test sets with stratification based on case or control. The model was trained using the Adam optimizer and focal loss. Early stopping was used when the validation loss plateaued.
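Focal loss down-weights the contribution of well-classified examples, which suits the class imbalance between DA cases and controls. A minimal NumPy sketch of the binary form (the α and γ values used in this study are not stated; the defaults below are the common ones from the focal loss literature and are illustrative only):

```python
import numpy as np

def focal_loss(y_true, p_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: mean of -alpha_t * (1 - p_t)^gamma * log(p_t).

    With gamma = 0 and alpha = 0.5, this reduces to a scaled
    binary cross-entropy.
    """
    p_pred = np.clip(p_pred, eps, 1 - eps)
    p_t = np.where(y_true == 1, p_pred, 1 - p_pred)        # prob of the true class
    alpha_t = np.where(y_true == 1, alpha, 1 - alpha)      # class weighting
    return np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t))
```

For confident correct predictions, the `(1 - p_t)^gamma` modulating factor shrinks the loss, so training focuses on hard examples such as rare DA-positive slides.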
Gestational Age Model
Additional inclusion criteria were patients delivering a singleton at 24 weeks, 0 days gestation or later with a clinical diagnosis of appropriate villous maturation for the stated GA.39,60 All placentas meeting these criteria were used. Up to 3 slides containing nonlesional villous tissue were used per patient. Eight hundred forty-six placentas were used, randomly split 70:10:20 between the training, validation, and test sets. Splits were performed stratified based on GA. The model was trained using the Adam optimizer and Huber loss. Early stopping was used when the validation loss plateaued.
Macroscopic Lesion Model
Additional inclusion criteria were diagnoses of infarction, PVFD, or IVT (for cases) or none of those for controls. Exclusion criteria included other macroscopic lesions, such as infarction hematoma, chorangioma, or SARS-CoV-2 placentitis. One slide was used per placenta, either containing lesions (for cases) or nonlesional villous tissue (for controls). Eight hundred thirty-three cases were split 70:15:15 into training, validation, and test sets. The model was trained using the Adam optimizer and categorical crossentropy loss. Early stopping was used when the validation loss plateaued.
Prostate Adenocarcinoma Model
Inclusion criteria were patients with prostatic needle biopsies examined at our institution, with slides scanned as part of an ongoing research effort. The data set included 1 to 3 hematoxylin and eosin slides each from 2602 blocks representing 647 patients (Table 2). The Gleason grade assigned to each block was retrieved, with blocks classified as cancer if any cancer was present and otherwise as negative. Cases were split 80% training:20% test, stratified by the presence of cancer, with all slides from a single patient assigned to the same group. Training and validation were similar to those of the placenta models except that features were extracted at ×20 magnification with a patch size of 224 × 224 pixels and no overlap. The feature extractor used was ConvNeXtXLarge, which produces a feature vector with 2048 values. Hinge loss was used.
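A patient-level split, in which all slides from one patient land in the same partition, can be sketched with scikit-learn's GroupShuffleSplit. Note that this groups without stratifying; the study also stratified by presence of cancer, for which StratifiedGroupKFold is one option. The data below are synthetic and the variable names illustrative:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical block-level labels and patient IDs.
rng = np.random.default_rng(0)
n_blocks = 200
patient_ids = rng.integers(0, 50, size=n_blocks)   # 50 hypothetical patients
has_cancer = rng.integers(0, 2, size=n_blocks)

# 80:20 split keeping every patient's blocks together.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(
    np.zeros((n_blocks, 1)), has_cancer, groups=patient_ids))

# No patient should appear in both partitions.
overlap = set(patient_ids[train_idx]) & set(patient_ids[test_idx])
```

Grouping by patient prevents leakage: two slides from the same biopsy session are highly correlated, so letting them straddle the train/test boundary would inflate apparent accuracy.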
Table 2.
Prostate cohort
| Patient age (y) | 67.1 ± 8.2 | |
| Race (n = 647) | American Indian or Alaskan native | 2 (<0.01) |
| Asian | 10 (0.02) | |
| Black | 88 (0.14) | |
| Native Hawaiian or other Pacific islander | 1 (<0.01) | |
| White | 442 (0.68) | |
| More than one/declined | 104 (0.16) | |
| Biopsies (n = 2602) | ||
| Gleason grade group | Count | Percent involvement (IQR) |
| Negative | 1681 | - |
| Grade group 1 | 305 | 10% (5%−25%) |
| Grade group 2 | 351 | 40% (20%−70%) |
| Grade group 3 | 141 | 56% (30%−80%) |
| Grade group 4 | 61 | 60% (20%−82.5%) |
| Grade group 5 | 63 | 60% (30%−90%) |
Patients had a median of 4 biopsies.
Model: Conceptual
All the models were implemented using Python version 3.7.7 and TensorFlow version 2.9.61
An overview of the model is shown in Figure 1. First, the slide or slides comprising each case are split into a set of smaller patches. Patches undergo feature extraction, the first step of any image ML pipeline. Features are fed into an attention subnetwork, which assigns an attention score to each patch and produces a pooled feature vector weighted toward the most highly attended patches. The pooled vector is run through a fully connected subnetwork to produce the result. The attention subnetwork and fully connected subnetwork are trained to minimize errors. All models use a batch size of 1.
Figure 1.

An overview of model and contaminants. Detection of DA is shown. Other diagnoses are conceptually similar. A whole-slide image is split into patches (small squares), which may (red outline) or may not (black outline) have DA. The patches are submitted to a fixed feature extraction subnetwork, resulting in feature vectors. Feature vectors are submitted to the attention subnetwork, which assigns an attention score to each feature vector. A weighted feature vector, representing a weighted average of the feature vectors, is generated. The weighted features are submitted to a classifier subnetwork to produce a result. During training, the parameters of the attention and classifier subnetworks are varied to minimize errors. To test the impact of tissue contaminants, random patches are sampled from the contaminant slide, feature extracted, assigned attention, and added to the weighted average. DA, decidual arteriopathy; WSI, whole-slide image.
Feature Extraction
An image layer (×10 for placenta and ×20 for prostate) was split into patches (256 × 256 pixels for placenta and 224 × 224 pixels for prostate; Fig. 1). Nontissue patches were masked using Otsu’s method.62 Each patch was passed through a fixed feature extractor network. EfficientNetV2L trained on ImageNet was used for placental models.63 ConvNeXtXLarge, trained on ImageNet, or KimiaNet, a version of DenseNet trained on The Cancer Genome Atlas slides, was used for prostate.64,65 The resulting feature vectors formed an N × D matrix, in which N was the number of patches analyzed for each case and D was the number of features in each vector—1280 for EfficientNetV2L and 2048 for ConvNeXtXLarge. Feature vectors were written to TensorFlow record files offline.
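Otsu's method finds the grayscale threshold that maximizes between-class variance, separating darkly stained tissue from bright glass background; patches with too little tissue are then discarded. A self-contained NumPy sketch (scikit-image's `threshold_otsu` provides an equivalent routine; the 50% tissue cutoff below is an illustrative assumption, not the study's stated parameter):

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold maximizing between-class variance
    for an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # class-0 (dark) weight
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean
    mu_t = mu[-1]                              # global mean
    # Between-class variance for each candidate threshold.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b = np.nan_to_num(sigma_b)
    return int(np.argmax(sigma_b))

def is_tissue(patch_gray, thresh, min_tissue_frac=0.5):
    """Keep a patch if at least min_tissue_frac of its pixels are
    darker than the threshold (tissue stains darker than glass)."""
    return (patch_gray < thresh).mean() >= min_tissue_frac
```

In practice the threshold would be computed once per slide thumbnail and applied to every candidate patch location before feature extraction.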
Attention Subnetwork
The feature vectors obtained from feature extraction were given to a trainable dense layer, reducing dimensionality to a 512-dimensional feature. Then, to generate attention for each patch, we used the dot product of 2 parallel layers with 256 neurons, one with hyperbolic tangent activation and the other with sigmoid activation. Attentions for all examined patches were normalized to sum to 1 using a softmax function.
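The subnetwork described above can be sketched in NumPy, using randomly initialized weights in place of trained ones. This is a variant of gated attention pooling; the layer sizes come from the text, whereas the weight names, initialization scale, and function signatures are illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_attention_pool(features, seed=0):
    """features: (N, D) patch feature vectors. Returns per-patch
    attentions (N,) summing to 1 and the attention-weighted
    pooled feature vector (512,)."""
    rng = np.random.default_rng(seed)   # random weights stand in for trained ones
    N, D = features.shape
    W_proj = rng.normal(0, 0.01, (D, 512))   # trainable dense projection to 512-d
    V = rng.normal(0, 0.01, (512, 256))      # tanh branch
    U = rng.normal(0, 0.01, (512, 256))      # sigmoid branch

    h = features @ W_proj                    # (N, 512) reduced features
    # Dot product of the two parallel 256-unit branches yields
    # one scalar attention score per patch.
    scores = np.sum(np.tanh(h @ V) * sigmoid(h @ U), axis=1)
    attn = softmax(scores)                   # normalized to sum to 1
    pooled = attn @ h                        # weighted-average pooled vector
    return attn, pooled
```

The softmax guarantees the attentions form a distribution over patches, which is what lets a handful of contaminant patches capture a large share of the total attention.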
Classification Subnetwork
The pooled feature vector is submitted to fully connected layers with appropriate activation. For determining class boundaries, individual patches are submitted to the classification network.
Contaminant Slides
Contaminant slides were one each of low-grade urothelial bladder tumors retrieved using transurethral resection (“bladder”), postdelivery nonadherent blood received with a placenta (“blood”), colonic mucosal biopsies showing a tubular adenoma (“colon”), fallopian tube fimbriae removed using salpingectomy for fertility control (“fallopian”), full-thickness sections of the placental disc (“placenta”), hypertrophic prostate excised using holmium laser (“prostate”), excised skin with intradermal nevus (“skin”), small bowel resected after traumatic injury (“small bowel”), and umbilical cord cross sections (“umbilical”). Contaminant slides were reviewed to ensure they themselves were not contaminated, but not otherwise selected or resampled. Contaminant slides were scanned using the same Leica GT450 scanner as that used for placentas and prostate biopsies and underwent the same feature extraction with the same magnification, tile size, and overlap. To add contaminants to a patient slide, a random subset of contaminant patches was selected and appended to the set of feature vectors from the placenta. The quantity of contaminant was varied, with 10% of the contaminant indicating 10 patches of the contaminant added for every 100 patches of relevant tissue. Unlike typical image corruption paradigms, the patient tissue patches were not altered or removed.
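Because contamination is simulated in feature space, the experiment reduces to appending randomly sampled contaminant feature vectors to a case's set; patient patches are never altered or removed. A sketch (array shapes and the function name are illustrative):

```python
import numpy as np

def add_contaminant(patient_feats, contaminant_feats, proportion, rng):
    """Append round(proportion * N) randomly sampled contaminant
    patch features to the N patient patch features, e.g. a
    proportion of 0.10 adds 10 contaminant patches per 100
    patient patches."""
    n_patient = patient_feats.shape[0]
    n_add = round(proportion * n_patient)
    idx = rng.choice(contaminant_feats.shape[0], size=n_add, replace=True)
    return np.vstack([patient_feats, contaminant_feats[idx]])
```

Each replicate resamples the contaminant patches, which is what produces the ± ranges reported with the contaminated-slide results.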
Interpretation Model Using T-Distributed Stochastic Neighbor Embedding Plots
t-distributed stochastic neighbor embedding (tSNE) is used to view high-dimensional data in a lower-dimensional (embedded) space. In this study, we used tSNE from sklearn 1.2.066 to examine how the attention and classification of individual patches clustered. The tSNE parameters perplexity and random_state were 30 and 0, respectively, with 2 dimensions in the embedded space.
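The projection with the stated parameters can be reproduced with scikit-learn; the feature array below is a random stand-in for real patch feature vectors:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 64))   # stand-in for patch feature vectors

# Parameters from the text: perplexity 30, random_state 0, 2-d embedding.
embedding = TSNE(n_components=2, perplexity=30,
                 random_state=0).fit_transform(features)
```

The same fitted transformation must be applied to the pooled feature vector so that the case-level point and its constituent patches share one embedded space.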
Isolation Forest
The IsolationForest algorithm from scikit-learn (version 1.3.0) was used.67 Forests were trained using features from 10 prostate slides from the training set. The proportion of rejected patches was then measured for each contaminant slide and for 10 sets of 10 prostate slides from the test set. Ten forests were trained using different sets of slides, and performance was measured for each.
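The anomaly-detection setup trains an isolation forest on in-distribution features and measures the fraction of patches flagged as outliers. A toy sketch with synthetic 2-dimensional features standing in for the 2048-value vectors (all data here are illustrative; a clear distributional shift is assumed so the contaminant is separable):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
in_dist = rng.normal(0.0, 1.0, size=(500, 2))      # "prostate" features
contaminant = rng.normal(8.0, 1.0, size=(200, 2))  # shifted "contaminant" features

forest = IsolationForest(random_state=0).fit(in_dist)

def rejection_rate(feats, model):
    """Proportion of patches the forest labels as outliers (-1)."""
    return float((model.predict(feats) == -1).mean())

r_in = rejection_rate(in_dist, forest)       # small but typically nonzero
r_out = rejection_rate(contaminant, forest)  # near 1 for a clear shift
```

The study's finding, that many contaminant patches were accepted as in distribution, corresponds to `r_out` being far below 1 when the real feature distributions overlap.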
Results
Multiple Contaminants Interfere With Detection of Decidual Arteriopathy
We trained a model to detect DA. At baseline, the model showed sensitivity of 0.6, specificity of 0.88, an area under the curve of 0.81, and balanced accuracy of 0.74 (Fig. 2 and Supplementary Table S1). Slides of normal membranes far outnumber DA cases in this data set, so balanced accuracy is a more representative measure. Each contaminant decreased balanced accuracy somewhat, with prostate showing the largest impact. Adding 1% prostate tissue reduced the balanced accuracy from 0.74 to 0.69 ± 0.01. Ten percent prostate tissue reduced accuracy to almost chance levels. Prostate contains abundant smooth muscle and medium-to-large muscular arterioles—this may explain the higher risk of misdiagnosis. Surprisingly, different contaminants resulted in different types of errors. Small bowel, fallopian tube, and bladder resulted in false-positive calls, whereas blood, umbilical cord, and prostate resulted in false negatives. We examined our test set and identified a slide in which umbilical cord was included, where it induced a false-negative result.
Figure 2.

Multiple contaminants impair balanced accuracy in detecting decidual arteriopathy (DA). (A) A normally remodeled spiral artery (top) and a spiral artery with DA, specifically mural hypertrophy of membrane arterioles (bottom), are shown for illustration. (B) The addition of even 1% of prostate tissue causes noticeable decrements in balanced accuracy, whereas bladder, fallopian tube, and blood at higher proportions also bring the accuracy near to or at chance level (0.5). (C) High proportions of contaminants may be seen deliberately, as in the inclusion of umbilical cord, when the model has only been trained with membrane rolls. DA is present, but in this case, the model produces a false-negative result. Scale bar: 100 μm (A) and 5 mm (C).
Contaminants Decrease the Accuracy of Gestational Age Estimation
We developed a model to estimate the GA by the examination of placental villi. At baseline, the model showed strong performance, with an R2 value of 0.75 and a mean absolute error (MAE)—the average difference between the estimated GA and the chronological GA—of 1.66 weeks (Fig. 3 and Supplementary Table S2). Adding contaminants resulted in an increased MAE, with bladder causing errors of 2.37 weeks at 5% contaminant and 3.36 weeks at 10% contaminant. Interestingly, the impact on the estimated GA was monotonic, such that even though the MAE with 30% bladder was 6.87 weeks, the R2 value, which depends primarily on the order of data points, only fell to 0.72. The impact of different contaminants was quite variable. Umbilical cord had a minimal impact—this may be because it resembles stem villi and the chorionic plate, which are normally present in the placental disc. Likewise, unclotted blood is a normal constituent of the intervillous space.
Figure 3.

The impact of contaminants on the estimation of gestational age. Placental villi undergo a reproducible series of morphologic changes between (A) 24 and (B) 42 weeks of gestation, becoming narrower with denser stromata, more exteriorized capillaries, and increased syncytial knots. (C) Our baseline model is highly accurate with a mean absolute error of 1.66 weeks. The addition of a contaminant, such as 30% bladder (example shown in panel D), results in a sharply higher mean absolute error. (E) The shift appears monotonic, with each case showing a similar decrease in estimated gestational age, which results in a misleadingly preserved correlation coefficient (R2). (F) Although bladder clearly shows the highest impact, other tissues including blood, prostate, small bowel, and fallopian tube also increase errors, although at higher proportions. Umbilical cord did not result in significantly increased errors. Scale bar: 50 μm.
Blood Contaminant Causes Misclassification of Macroscopic Placental Tissue as Intervillous Thrombi
We tested the impact of tissue contaminants on a model that classifies placental slides as villous infarction (infarct), IVT, PVFD, or none of the above (normal). At baseline, the model had an accuracy of 0.89. Bladder, fallopian tube, prostate, small bowel, and umbilical cord had minor impacts (Supplementary Table S3). However, the addition of blood caused misclassification of normal and, to a lesser extent, PVFD slides as IVT. Four cases were misclassified as IVT at baseline, reaching 26.5 ± 2.0 cases at 20% contaminant and 64.2 ± 0.4 at 70% (n = 10 replicates). Infarcts, actual IVT, and, to a lesser extent, PVFD remained accurately classified, so balanced accuracy poorly reflected the error, falling only from 0.894 at baseline to 0.891 at 20% contaminant, with a larger drop to 0.60 at 70% contaminant.
Contaminants Cause False Positives in a Prostate Cancer Detection Model
We developed a prostate cancer detection model with baseline accuracy of 0.923 and an area under the curve of 0.954 (Fig. 4 and Supplementary Table S4). The model was extremely sensitive to bladder contaminants, which resulted in high rates of false positives. Even small amounts of otherwise unremarkable low-grade bladder tumors could induce false positives (Fig. 4D). The cause of this is unclear, although the bladder tumor shows nucleoli and high mitotic rates, which could be seen as cancer features. Other contaminants, including colon and fallopian tube, resulted in lower magnitude errors, including false-negative diagnoses.
Figure 4.

Prostate cancer detection model. (A) We developed a model to detect prostate carcinoma in needle biopsies, which attends to areas of cancer and results in high block-level accuracy (scale bar: 1 mm). (B) Adding bladder resulted in marked increases in the false-positive rate, from 20/314 at baseline to 51 ± 2.7 at 10% bladder contaminant and 197 ± 7.9 when the amount of added bladder equaled the amount of prostate tissue (100% contaminant; n = 10 replicates). (C) Patch-level attention of one selected case with 10% bladder contaminant showed the expected power law distribution. Surprisingly, 3 of the 4 most highly attended patches were contaminant, rather than patient, tissue. We identified the (D) 10 most highly attended patches of the contaminant bladder slide (scale bar: 20 μm), all of which showed unremarkable low-grade bladder tumors. Adding these 10 patches to each prostate case resulted in 306/316 false positives. These patches represent a minute quantity of tissue (E, patches from panel D resized to the same scale as those from panel A).
All Contaminants Distract Attention in All Models
We measured the proportion of attention given to each contaminant by each model (Fig. 5). Unsurprisingly, increasing amounts of contaminants received increasing amounts of attention. More surprisingly, the level of attention given to contaminant tissues often exceeded, patch-for-patch, that given to patient tissue. This represents a significant failure of the attention mechanism, even if it did not result in errors. High attention did not always correlate with high rates of errors. For example, umbilical cord was among the most highly attended tissues by the GA model, particularly at lower proportions, but it did not result in errors.
Figure 5.

All contaminants distract attention. We measured the proportion of attention given to contaminant patches as a function of the proportion of contaminant added for models of (A) decidual arteriopathy, (B) gestational age, (C) macroscopic lesions, and (D) prostate adenocarcinoma. The dashed line represents attention at par, in which the average patch of contaminant receives as much attention as the average patch of patient tissue. For example, 20% contaminant represents 20 patches of contaminant for each 100 patches of patient tissue. Therefore, the contaminant represents 20/(100 + 20) = 16.7% of the patches present. Every contaminant received some attention. Multiple contaminants in decidual arteriopathy, gestational age, lesions, and prostate adenocarcinoma were attended above par.
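The "par" line follows from simple counting: at contaminant proportion p, contaminant patches make up p/(1 + p) of all patches, so any attention share above that value is above par. A one-line check (function name is illustrative):

```python
def par_attention_share(p):
    """Expected attention share if every patch, contaminant or not,
    received equal attention: p contaminant patches are added per
    1 patch of patient tissue."""
    return p / (1 + p)
```

For instance, at 20% contaminant this gives 20/120 of the total attention; anything above that fraction means contaminant patches are, patch-for-patch, more attended than patient tissue.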
Contaminants Occupy a Somewhat Distinct Location in Embedded Feature Space and Divert Pooled Features From the Decision Boundary
To track the impact of contaminants on model decisions, we visualized the feature space using tSNE (Fig. 6). The patches classified as positive or negative occupy distinct regions. By applying the same transformation to the pooled feature vector, we can see where the model places the specimen overall. In a case that shifts from true negative to false positive at a low contaminant proportion, the feature vector is close to the apparent boundary between the positive and negative instances. Adding a relatively small amount of contaminant shifts the pooled feature vector in the feature space, consistent with the changed slide-level interpretation. The relative separation of contaminant and native features raises the possibility that simple anomaly detection can be used to exclude contaminants. We trained isolation forests using prostate slides and examined the proportion of patches rejected (Supplementary Fig. S2). In this technique, a model is trained on in-distribution data points—in this case, on patches of prostate tissue. The model is then presented with new data points and classifies them as in distribution or out of distribution. An ideal model would classify all prostate patches as in distribution and reject all contaminant patches. However, we found that the vast majority of placenta, colon, and bladder patches were accepted as in distribution, whereas a small, but nonzero, proportion of prostate patches was rejected.
Figure 6.

The impact of contaminants in the feature space. Feature maps for one slide, (A) a true negative at baseline that becomes (B) a false positive when 10% bladder contaminant is added. Each point represents one patch, transformed into a feature vector and then plotted in 2-dimensional space using tSNE. Patches are classified as decidual arteriopathy (DA) (rose) or not DA (light blue), which are largely distinguishable. The pooled feature vector, representing a weighted average, is marked in green. Added contaminants are largely classified as DA (bright red), with a few not classified as DA (dark blue). Adding contaminants causes the pooled feature vector to more clearly cluster with the positive instances, resulting in the false-positive call. tSNE, t-distributed stochastic neighbor embedding.
Contaminants Present in the Test Set Receive High Attention and Alter Diagnoses
To supplement our findings in digitally adding contaminants, we examined the impact of contaminants that were present in the original slides. Umbilical cord in the diagnosis of DA (Fig. 2C) and blood in the diagnosis of IVT (Fig. 7F) have been described above. In prostate, we identified cases with systematic contaminants (colonic mucosa and skeletal muscle) and a colon polyp floater in which the contaminant received the highest attention, resulting in false-positive or false-negative diagnoses (Fig. 8).
Figure 7.

Blood causes misclassification of macroscopic placental lesions. Paradigmatic examples of (A) normal placenta, (B) infarct, (C) intervillous thrombi (IVT), and (D) PVFD. (E) A section of fresh blood was used as the contaminant. Note lines of Zahn as a critical distinguishing morphology (B vs E). (F) In clinical practice, areas of unclotted blood may be submitted incidentally with normal placenta sections, causing misclassification as IVT. (G) The lesion classification model is highly accurate at baseline, with balanced multiclass accuracy of 0.89. With increasing amounts of blood contaminant, normal and PVFD cases are progressively misclassified as IVT. At baseline, 4 cases are misclassified as IVT. At 20% blood, 26.5 ± 2.0 cases are misclassified as IVT. Scale bar: 10 mm. PVFD, perivillous fibrin deposition.
Figure 8.

Tissue contaminants in the wild. Prostate needle biopsy slides from our test set with contaminants. Relative model attention is shown (clear: low; green: high). (A, B) False-positive diagnoses with high attention to colonic mucosa picked up during transrectal biopsy. (C) A false-negative diagnosis with high attention to an area of skeletal muscle. (D) A false-negative biopsy with high attention to a folded piece of tissue that appears to be a “floater” of colon polyp. Scale bars: 1 mm (A-D) and 100 μm (C and D insets).
The Feature Extractor Fine-Tuned Using Histology Images Resolves Some Aspects of Contamination
We tested the impact of training a prostate cancer detection algorithm using the KimiaNet feature extractor, which was fine-tuned using histology images.65 Accuracy showed little to no degradation when the tested contaminants, including bladder, were added (Supplementary Fig. S2). However, the contaminants were still assigned significant levels of attention, in some cases higher, patch for patch, than in the main model. An isolation forest built using KimiaNet features rejected only a modest proportion of contaminant patches.
Discussion
Recitation
We report 4 whole-slide ML models developed with contemporary techniques. The models encompass tissues including placental membrane, placental disc, and prostate core, and tasks including detection, classification, and regression. Each model, other than the KimiaNet-based model, shows degraded performance when tissue contaminants are added to previously well-classified data. Each model, including the KimiaNet-based model, attends to contaminating tissue, often at or above the level shown to patient tissue. Contaminating tissue induces errors by altering the attention distribution and, in the feature space, redirecting the pooled feature vector across the decision boundary.
Comparison
In the broadest sense, all ML models perform reliably when given data like that seen in training and perform unpredictably when given data from outside that distribution. More narrowly, this work falls within the digital pathology ML and computer vision literature on image corruptions and visual artifacts.21–24 Our findings contribute to this literature with a few key differentiations. Our focus on the attention mechanism identifies a key weakness of supervised whole-slide models; the impact of contaminants on unsupervised or self-supervised learning remains unclear. Using added contaminants, rather than distorting or obscuring the underlying image, guarantees that the original diagnostic information is preserved intact. Tissue contaminants themselves are a recognized problem in the pathology quality literature but have not attracted broader attention.11,14
False positives due to tissue contaminants have been previously reported in ML cancer detection tasks. Liu et al68 reported one floater (of 80 slides) causing false-positive metastatic breast cancer diagnoses in the CAMELYON data set, although they did not further consider the implications. Similarly, Campanella et al69 showed a false-positive lymph node owing to a benign papillary inclusion in a lymph node—an example of a biological systemic contaminant. da Silva et al70 reported 27 false suspicious calls from the Paige prostate algorithm in a data set containing 404 negative biopsies. Of these, 5 represented intermixed muscle and fibrous tissue (their Figure 4, panels M, V, W, X, and Y).70 Naito et al71 reported only 1 false-positive case of pancreatic cancer in their test set (of 34 slides); however, that sole false positive was due to a “pick-up” of gastric tissue transited during the endoscopic biopsy. This form of systemic contamination is described as “ever present” in pathology textbooks, so the true risk may be higher.72 These previous publications suggest that false-positive cancer detection owing to tissue contaminants may be endemic.
Our methods—altering patient data to test robustness—are similar to those seen in adversarial attacks.73 In digital pathology literature, these studies have taken the form of alterations to individual patches that are imperceptible to human observers yet induce incorrect responses from models.10,74,75 Our work is similar, in that the alterations—subtle pixel-level alterations or tissue contaminants—are ignored by trained human observers. However, there are key differences. We focus on attention-level mechanisms, leaving the original data completely unchanged. Our “attack” does not rely on access to the model or model architecture. Most significantly, although adversarial attack invokes the (so far unobserved) possibility of bad actors gaining access to digital pathology images and making targeted alterations, tissue contaminants are already inside the building.
Attention Is a Core Model Mechanism and Key Model Output
Saliency mapping, such as gradient-weighted class activation mapping (GradCAM), has been used to demonstrate shortcuts taken by models, such as using hospital-specific markings, rather than examination of the lungs, to classify chest X-rays as COVID-19 pneumonia.76 The case in digital pathology is similar, but the problem is more acute. First, while GradCAM and other saliency mechanisms are post hoc explanations, attention in whole-slide models is central to how the model produces output. Second, in a computer-assisted diagnosis paradigm, pathologists will primarily be working with visualizations of the attention maps rather than the final prediction. In that sense, attention is a much more important output than the prediction itself. If the model generates accurate slide-level predictions but attends to irrelevant tissue, pathologists will question the credibility of the model.
Those doubts are well founded. Models are credible insofar as they represent the underlying process. By attending to contaminant tissue, the models demonstrate that they do not encode the underlying biological phenomena, in this case, that pathologies are tissue specific. Prostate cancer does not arise in the bladder. DA arises only in the decidua—not in the prostate. Although it is shocking that the GA model attends to the average patch of prostate twice as much as it attends to the average patch of placenta, any significant attention given to contaminants demonstrates the failure of the attention mechanism.
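The mechanism by which a small contaminant can capture the model's attention is easy to see in a minimal numpy sketch of attention pooling. The single unit scoring vector below is a hypothetical stand-in for the learned attention network of a whole-slide model: one patch whose features happen to align with the scoring direction draws a disproportionate share of the softmax attention and therefore dominates the pooled representation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 16-d patch features; a unit scoring vector stands in for
# the model's learned attention network.
w = rng.normal(size=16)
w /= np.linalg.norm(w)
patient = rng.normal(size=(99, 16))

# One contaminant patch whose features align with the scoring direction.
contaminant = 7.0 * w
bag = np.vstack([patient, contaminant])

# Softmax attention over patches, then attention-weighted pooling.
scores = bag @ w
attn = np.exp(scores - scores.max())
attn /= attn.sum()
pooled = attn @ bag        # slide-level representation

print(f"contaminant attention: {attn[-1]:.2f} (uniform would be {1 / len(bag):.2f})")
```

A single patch out of 100 ends up contributing most of the pooled feature vector, which is the behavior we observed when contaminants shifted slide-level calls across the decision boundary.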
Considering Contaminants: Which Tissues?
The contaminants tested in this study represent a small sample of the tissues commonly encountered in surgical pathology practice and include some of those seen as at highest risk for producing or capturing contaminants (colon, bladder, prostate, and placenta).12,14 We tested only one slide per tissue type, with only one of the myriad diagnoses that the tissue may display; that is, we included a slide of low-grade bladder tumor, but not of normal bladder, high-grade tumor, malakoplakia, or other diagnoses. Our experiments therefore cover only a small subset of the possible combinations of tissue of interest, diagnosis of interest, and contaminant. Yet even this small sample shows striking variability. Some explanations suggest themselves. DA consists of increased smooth muscle surrounding decidual arterioles, and the abundance of smooth muscle in prostate tissue may cause misdiagnosis. Normal placental tissues, such as umbilical cord, have limited impact on GA. The lesion model is quite resistant to the contaminants tested but fails to distinguish clotted from unclotted blood, which resembles IVT. The nucleoli in bladder tumors could drive false-positive diagnoses of prostate cancer. Understanding these errors may guide future studies in improving model design.
Considering Contaminants: How Much Is Realistic?
Sporadic contaminants are reported to be 1 mm2 on average.13 For our prostate cores, this represents a median of 2.4%, with an IQR of 1.5% to 3.3%. This is enough to produce a small but meaningful increase in errors. The averages conceal significant variation—the 10 patches we identified that trigger a 97% false-positive rate represent only 0.14 mm2 of tissue.
The situation with placental diagnoses is quite different. The average slide of membrane rolls has 687 mm2 of tissue (IQR, 592–796 mm2), so a prostate contaminant 2.6 mm on a side would be needed to measurably impair diagnostic accuracy. Therefore, the highest-risk tissues are those present in the placenta specimen that may be included inadvertently or deliberately in the material submitted for histology. Our DA model is trained using placental membrane sections only. Many institutions submit umbilical cord sections with their membrane rolls, resulting in high contaminant percentages. In our model, this induces significant errors (Fig. 2). In our test set, we observed false-positive diagnoses associated with this mode of contamination (Fig. 2C). Similarly, unclotted blood is commonly received with placental specimens and submitted for histology (Fig. 7F). The distinction between this blood and true thrombi, indicated by the lines of Zahn, is a core pathologist skill.77 The volume of blood thus submitted is often large, easily representing one-third of the tissue present (ie, 50% contaminant), sufficient to induce errors in multiple placental classifiers.
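The area figures above can be checked directly. This sketch assumes that errors become measurable at roughly the 1% contaminant fraction at which the DA model first degraded; the variable names are for illustration only.

```python
import math

# Median membrane-roll tissue area reported in the text.
membrane_area_mm2 = 687.0

# Assume errors become measurable at ~1% contaminant, the lowest
# fraction at which the DA model degraded.
needed_area = 0.01 * membrane_area_mm2   # 6.87 mm^2 of contaminant
side = math.sqrt(needed_area)            # edge of a square contaminant
print(f"{side:.1f} mm on a side")        # ~2.6 mm

# Conversely, a typical 1 mm^2 sporadic floater at the 2.4% median
# fraction implies a median prostate-core tissue area of 1 / 0.024 mm^2.
core_area = 1.0 / 0.024
print(f"{core_area:.0f} mm^2")           # ~42 mm^2
```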
Next Steps: What Can Be Done?
The data presented in this study are sufficient to establish tissue contaminants as a risk to whole-slide learning models in modern digital pathology. We do not aim for a complete accounting or explanation of tissue contaminants, nor are there simple answers to address this problem. The use of humans or different ML models to “pre-screen” slides for contaminants suggests itself as a solution but is fraught. If ML models are needed to increase efficiency in the face of increasing specimens and a declining pathologist workforce, adding human intervention is self-defeating.78 Regarding the use of a prescreening ML model to identify contaminants—the diversity of tissues and their pathologies in the human body necessitates a very advanced model to distinguish contaminants from atypical presentations of the disease of interest. Such an advanced prescreener would obviate the need for specialized single-organ models. Building a specialized classifier on top of a foundation model could capture these advantages, though current digital pathology foundation models have limitations.
There are some findings that suggest future actions for pathologists, decision makers, and ML practitioners: (1) An ML transformation of pathology may require a renewed emphasis within the laboratory on reducing tissue contaminants. (2) Clinical decisions should continue to be made with a human in the loop. As described in the “Introduction” section, recognizing tissue contaminants is a basic skill of human pathologists, and this expertise should be used. (3) Model vulnerability to common tissue contaminants should be assessed. (4) More work is needed to understand how the attention mechanism reacts to out-of-distribution data.
Funding
J.A.G. is supported by National Institute of Biomedical Imaging and Bioengineering K08EB030120 and the Walder Foundation Fund to Retain Clinician Scientists. L.A.D.C. is supported by R01LM013523 and U01CA220401. REDCap and other key infrastructure are supported by UL1TR001422.
Declaration of Competing Interest
The authors state that no conflict of interest exists.
Supplementary Material
The online version contains supplementary material available at https://doi.org/10.1016/j.modpat.2024.100422.
Data Availability
Whole-slide image data are available after execution of a data use agreement with Northwestern University.
References
- 1. Steiner DF, Chen PC, Mermel CH. Closing the translation gap: AI applications in digital pathology. Biochim Biophys Acta Rev Cancer. 2021;1875(1):188452. 10.1016/j.bbcan.2020.188452
- 2. Echle A, Rindtorff NT, Brinker TJ, Luedde T, Pearson AT, Kather JN. Deep learning in cancer pathology: a new generation of clinical biomarkers. Br J Cancer. 2021;124(4):686–696. 10.1038/s41416-020-01122-x
- 3. He Y, Zhao H, Wong STC. Deep learning powers cancer diagnosis in digital pathology. Comput Med Imaging Graph. 2021;88:101820. 10.1016/j.compmedimag.2020.101820
- 4. Baxi V, Edwards R, Montalto M, Saha S. Digital pathology and artificial intelligence in translational medicine and clinical practice. Mod Pathol. 2022;35(1):23–32. 10.1038/s41379-021-00919-2
- 5. Parwani AV. Next generation diagnostic pathology: use of digital pathology and artificial intelligence tools to augment a pathological diagnosis. Diagn Pathol. 2019;14(1):138. 10.1186/s13000-019-0921-2
- 6. Gadermayr M, Tschuchnig M. Multiple instance learning for digital pathology: a review on the state-of-the-art, limitations & future potential. Published online June 9, 2022. Accessed September 28, 2022. http://arxiv.org/abs/2206.04425
- 7. Lu MY, Chen TY, Williamson DFK, et al. AI-based pathology predicts origins for cancers of unknown primary. Nature. 2021;594(7861):106–110. 10.1038/s41586-021-03512-4
- 8. Lu MY, Williamson DFK, Chen TY, Chen RJ, Barbieri M, Mahmood F. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat Biomed Eng. 2021;5(6):555–570. 10.1038/s41551-020-00682-w
- 9. Lipkova J, Chen TY, Lu MY, et al. Deep learning-enabled assessment of cardiac allograft rejection from endomyocardial biopsies. Nat Med. 2022;28(3):575–582. 10.1038/s41591-022-01709-2
- 10. Ghaffari Laleh N, Muti HS, Loeffler CML, et al. Benchmarking weakly-supervised deep learning pipelines for whole slide classification in computational pathology. Med Image Anal. 2022;79:102474. 10.1016/j.media.2022.102474
- 11. Gephardt GN, Zarbo RJ. Extraneous tissue in surgical pathology: a College of American Pathologists Q-Probes study of 275 laboratories. Arch Pathol Lab Med. 1996;120(11):1009–1014.
- 12. Zarbo RJ. The unsafe archaic processes of tissue pathology. Am J Clin Pathol. 2022;158(1):4–7. 10.1093/ajcp/aqac018
- 13. Layfield LJ, Witt BL, Metzger KG, Anderson GM. Extraneous tissue: a potential source for diagnostic error in surgical pathology. Am J Clin Pathol. 2011;136(5):767–772. 10.1309/AJCP4FFSBPHAU8IU
- 14. Carll T, Fuja C, Antic T, Lastra R, Pytel P. Tissue contamination during transportation of formalin-fixed, paraffin-embedded blocks. Am J Clin Pathol. 2022;158(1):96–104. 10.1093/ajcp/aqac014
- 15. Burke NG, McCaffrey D, Mackle E. Contamination of histology biopsy specimen—a potential source of error for surgeons: a case report. Cases J. 2009;2:7619. 10.1186/1757-1626-0002-0000007619
- 16. Troxel DB. Trends in pathology malpractice claims. Am J Surg Pathol. 2012;36(1):e1–e5. 10.1097/PAS.0b013e31823836bb
- 17. Naritoku WY, Alexander CB. Pathology milestones. J Grad Med Educ. 2014;6(suppl 1):180–181. 10.4300/JGME-06-01s1-10
- 18. ACGME. Pathology Milestones. Accessed April 3, 2023. https://www.acgme.org/specialties/pathology/milestones/
- 19. Raciti P, Sue J, Ceballos R, et al. Novel artificial intelligence system increases the detection of prostate cancer in whole slide images of core needle biopsies. Mod Pathol. 2020;33(10):2058–2066. 10.1038/s41379-020-0551-y
- 20. Zhu S, Gilbert M, Chetty I, Siddiqui F. The 2021 landscape of FDA-approved artificial intelligence/machine learning-enabled medical devices: an analysis of the characteristics and intended use. Int J Med Inform. 2022;165:104828. 10.1016/j.ijmedinf.2022.104828
- 21. Schömig-Markiefka B, Pryalukhin A, Hulla W, et al. Quality control stress test for deep learning-based diagnostic model in digital pathology. Mod Pathol. 2021;34(12):2098–2108. 10.1038/s41379-021-00859-x
- 22. Zanjani FG, Zinger S, Piepers B, Mahmoudpour S, Schelkens P, de With PHN. Impact of JPEG 2000 compression on deep convolutional neural networks for metastatic cancer detection in histopathological images. J Med Imaging (Bellingham). 2019;6(2):027501. 10.1117/1.JMI.6.2.027501
- 23. Chen Y, Janowczyk A, Madabhushi A. Quantitative assessment of the effects of compression on deep learning in digital pathology image analysis. JCO Clin Cancer Inform. 2020;(4):221–233. 10.1200/CCI.19.00068
- 24. Wang NC, Kaplan J, Lee J, Hodgin J, Udager A, Rao A. Stress testing pathology models with generated artifacts. J Pathol Inform. 2021;12(1):54. 10.4103/jpi.jpi_6_21
- 25. Wright AI, Dunn CM, Hale M, Hutchins GGA, Treanor DE. The effect of quality control on accuracy of digital pathology image analysis. IEEE J Biomed Health Inform. 2021;25(2):307–314. 10.1109/JBHI.2020.3046094
- 26. Pantanowitz L, Michelow P, Hazelhurst S, et al. A digital pathology solution to resolve the tissue floater conundrum. Arch Pathol Lab Med. 2021;145(3):359–364. 10.5858/arpa.2020-0034-OA
- 27. Khong TY, Mooney EE, Ariel I, et al. Sampling and definitions of placental lesions: Amsterdam Placental Workshop Group Consensus Statement. Arch Pathol Lab Med. 2016;140(7):698–713. 10.5858/arpa.2015-0225-CC
- 28. Xiang J, Yan H, Li J, Wang X, Chen H, Zheng X. Transperineal versus transrectal prostate biopsy in the diagnosis of prostate cancer: a systematic review and meta-analysis. World J Surg Oncol. 2019;17(1):31. 10.1186/s12957-019-1573-0
- 29. Bhanji Y, Allaway MJ, Gorin MA. Recent advances and current role of transperineal prostate biopsy. Urol Clin North Am. 2021;48(1):25–33. 10.1016/j.ucl.2020.09.010
- 30. Roescher AM, Timmer A, Erwich JJHM, Bos AF. Placental pathology, perinatal death, neonatal outcome, and neurological development: a systematic review. PLoS One. 2014;9(2):e89419. 10.1371/journal.pone.0089419
- 31. Marchevsky AM, Walts AE, Lissenberg-Witte BI, Thunnissen E. Pathologists should probably forget about kappa. Percent agreement, diagnostic specificity and related metrics provide more clinically applicable measures of interobserver variability. Ann Diagn Pathol. 2020;47:151561. 10.1016/j.anndiagpath.2020.151561
- 32. Kramer MS, Chen MF, Roy I, et al. Intra- and interobserver agreement and statistical clustering of placental histopathologic features relevant to preterm birth. Am J Obstet Gynecol. 2006;195(6):1674–1679. 10.1016/j.ajog.2006.03.095
- 33. Sun CCJ, Revell VO, Belli AJ, Viscardi RM. Discrepancy in pathologic diagnosis of placental lesions. Arch Pathol Lab Med. 2002;126(6):706–709. 10.5858/2002-126-0706-DIPDOP
- 34. Redline RW, Vik T, Heerema-McKenney A, et al. Interobserver reliability for identifying specific patterns of placental injury as defined by the Amsterdam Classification. Arch Pathol Lab Med. 2022;146(3):372–378. 10.5858/arpa.2020-0753-OA
- 35. Mukherjee A. Pattern Recognition and Machine Learning as a Morphology Characterization Tool for Assessment of Placental Health. Dissertation; 2021. 10.20381/RUOR-26948
- 36. Khodaee A, Grynspan D, Bainbridge S, Ukwatta E, Chan ADC. Automatic placental distal villous hypoplasia scoring using a deep convolutional neural network regression model. 2022 IEEE International Instrumentation and Measurement Technology Conference (I2MTC). IEEE; 2022:1–5. 10.1109/I2MTC48687.2022.9806589
- 37. Kidron D, Vainer I, Fisher Y, Sharony R. Automated image analysis of placental villi and syncytial knots in histological sections. Placenta. 2017;53:113–118. 10.1016/j.placenta.2017.04.004
- 38. Vanea C, Džigurski J, Rukins V, et al. HAPPY: a deep learning pipeline for mapping cell-to-tissue graphs across placenta histology whole slide images. Posted online February 27, 2023. 10.1101/2022.11.21.517353
- 39. Mobadersany P, Cooper LAD, Goldstein JA. GestAltNet: aggregation and attention to improve deep learning of gestational age from placental whole-slide images. Lab Invest. 2021;101(7):942–951. 10.1038/s41374-021-00579-5
- 40. Clymer D, Kostadinov S, Catov J, et al. Decidual vasculopathy identification in whole slide images using multiresolution hierarchical convolutional neural networks. Am J Pathol. 2020;190(10):2111–2122. 10.1016/j.ajpath.2020.06.014
- 41. Shanes ED, Miller ES, Otero S, et al. Placental pathology after SARS-CoV-2 infection in the pre-variant of concern, alpha/gamma, delta, or omicron eras. Int J Surg Pathol. 2023;31(4):387–397. 10.1177/10668969221102534
- 42. Shanes ED, Mithal LB, Otero S, Azad HA, Miller ES, Goldstein JA. Placental pathology in COVID-19. Am J Clin Pathol. 2020;154(1):23–32. 10.1093/ajcp/aqaa089
- 43. Conde-Agudelo A, Romero R. SARS-CoV-2 infection during pregnancy and risk of preeclampsia: a systematic review and meta-analysis. Am J Obstet Gynecol. 2022;226(1):68–89.e3. 10.1016/j.ajog.2021.07.009
- 44. Christians JK, Grynspan D. Placental villous hypermaturation is associated with improved neonatal outcomes. Placenta. 2019;76:1–5. 10.1016/j.placenta.2019.01.012
- 45. Leavey K, Benton SJ, Grynspan D, Bainbridge SA, Morgen EK, Cox BJ. Gene markers of normal villous maturation and their expression in placentas with maturational pathology. Placenta. 2017;58:52–59. 10.1016/j.placenta.2017.08.005
- 46. Jaiman S, Romero R, Pacora P, et al. Placental delayed villous maturation is associated with evidence of chronic fetal hypoxia. J Perinat Med. 2020;48(5):516–518. 10.1515/jpm-2020-0014
- 47. Blair E, de Groot J, Nelson KB. Placental infarction identified by macroscopic examination and risk of cerebral palsy in infants at 35 weeks of gestational age and over. Am J Obstet Gynecol. 2011;205(2):124.e1–124.e7. 10.1016/j.ajog.2011.05.022
- 48. Vinnars MT, Vollmer B, Nasiell J, Papadogiannakis N, Westgren M. Association between cerebral palsy and microscopically verified placental infarction in extremely preterm infants. Acta Obstet Gynecol Scand. 2015;94(9):976–982. 10.1111/aogs.12688
- 49. Vinnars MT, Nasiell J, Ghazi S, Westgren M, Papadogiannakis N. The severity of clinical manifestations in preeclampsia correlates with the amount of placental infarction. Acta Obstet Gynecol Scand. 2011;90(1):19–25. 10.1111/j.1600-0412.2010.01012.x
- 50. Gibbins KJ, Silver RM, Pinar H, et al. Stillbirth, hypertensive disorders of pregnancy, and placental pathology. Placenta. 2016;43:61–68. 10.1016/j.placenta.2016.04.020
- 51. Roberts DJ, Post MD. The placenta in pre-eclampsia and intrauterine growth restriction. J Clin Pathol. 2008;61(12):1254–1260. 10.1136/jcp.2008.055236
- 52. Faye-Petersen OM, Ernst LM. Maternal floor infarction and massive perivillous fibrin deposition. Surg Pathol Clin. 2013;6(1):101–114. 10.1016/j.path.2012.10.002
- 53. Katzman PJ, Genest DR. Maternal floor infarction and massive perivillous fibrin deposition: histological definitions, association with intrauterine fetal growth restriction, and risk of recurrence. Pediatr Dev Pathol. 2002;5(2):159–164. 10.1007/s10024001-0195-y
- 54. Romero R, Whitten A, Korzeniewski SJ, et al. Maternal floor infarction/massive perivillous fibrin deposition: a manifestation of maternal antifetal rejection? Am J Reprod Immunol. 2013;70(4):285–298. 10.1111/aji.12143
- 55. Becroft DMO, Thompson JMD, Mitchell EA. Placental infarcts, intervillous fibrin plaques, and intervillous thrombi: incidences, cooccurrences, and epidemiological associations. Pediatr Dev Pathol. 2004;7(1):26–34. 10.1007/s10024-003-4032-3
- 56. Redline RW. Extending the spectrum of massive perivillous fibrin deposition (maternal floor infarction). Pediatr Dev Pathol. 2021;24(1):10–11. 10.1177/1093526620964353
- 57. Romero R, Kim YM, Pacora P, et al. The frequency and type of placental histologic lesions in term pregnancies with normal outcome. J Perinat Med. 2018;46(6):613–630. 10.1515/jpm-2018-0055
- 58. Basnet KM, Bentley-Lewis R, Wexler DJ, Kilic F, Roberts DJ. Prevalence of intervillous thrombi is increased in placentas from pregnancies complicated by diabetes. Pediatr Dev Pathol. 2016;19(6):502–505. 10.2350/15-11-1734-OA.1
- 59. Goldstein JA, Nateghi R, Irmakci I, Cooper LAD. Machine learning classification of placental villous infarction, perivillous fibrin deposition, and intervillous thrombus. Placenta. In revision.
- 60. Dobbs v. Jackson Women's Health Organization. United States Supreme Court; 2022.
- 61. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous systems. Published online 2015. https://www.tensorflow.org
- 62. Otsu N. A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern. 1979;9(1):62–66. 10.1109/TSMC.1979.4310076
- 63. Tan M, Le Q. EfficientNetV2: smaller models and faster training. International Conference on Machine Learning. PMLR; 2021:10096–10106.
- 64. Woo S, Debnath S, Hu R, et al. ConvNeXt V2: co-designing and scaling ConvNets with masked autoencoders. Published online 2023. 10.48550/ARXIV.2301.00808
- 65. Riasatian A, Babaie M, Maleki D, et al. Fine-tuning and training of DenseNet for histopathology image representation using TCGA diagnostic slides. Med Image Anal. 2021;70:102032. 10.1016/j.media.2021.102032
- 66. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–2830.
- 67. Liu FT, Ting KM, Zhou ZH. Isolation forest. 2008 Eighth IEEE International Conference on Data Mining. IEEE; 2008:413–422. 10.1109/ICDM.2008.17
- 68. Liu Y, Kohlberger T, Norouzi M, et al. Artificial intelligence-based breast cancer nodal metastasis detection: insights into the black box for pathologists. Arch Pathol Lab Med. 2019;143(7):859–868. 10.5858/arpa.2018-0147-OA
- 69. Campanella G, Hanna MG, Geneslaw L, et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med. 2019;25(8):1301–1309. 10.1038/s41591-019-0508-1
- 70. da Silva LM, Pereira EM, Salles PG, et al. Independent real-world application of a clinical-grade automated prostate cancer detection system. J Pathol. 2021;254(2):147–158. 10.1002/path.5662
- 71. Naito Y, Tsuneki M, Fukushima N, et al. A deep learning model to detect pancreatic ductal adenocarcinoma on endoscopic ultrasound-guided fine-needle biopsy. Sci Rep. 2021;11(1):8454. 10.1038/s41598-021-87748-0
- 72. Cibas ES, Ducatman BS. Cytology: Diagnostic Principles and Clinical Correlates. 5th ed. Elsevier; 2021.
- 73. Finlayson SG, Bowers JD, Ito J, Zittrain JL, Beam AL, Kohane IS. Adversarial attacks on medical machine learning. Science. 2019;363(6433):1287–1289. 10.1126/science.aaw4399
- 74. Foote A, Asif A, Azam A, Marshall-Cox T, Rajpoot N, Minhas F. Now you see it, now you don't: adversarial vulnerabilities in computational pathology. Published online 2021. 10.48550/ARXIV.2106.08153
- 75. Korpihalkola J, Sipola T, Kokkonen T. Color-optimized one-pixel attack against digital pathology images. 2021 29th Conference of Open Innovations Association (FRUCT). IEEE; 2021:206–213. 10.23919/FRUCT52173.2021.9435562
- 76. DeGrave AJ, Janizek JD, Lee SI. AI for radiographic COVID-19 detection selects shortcuts over signal. Nat Mach Intell. 2021;3(7):610–619. 10.1038/s42256-021-00338-7
- 77. Baergen RN, Benirschke K. Manual of Benirschke and Kaufmann's Pathology of the Human Placenta. Springer; 2005.
- 78. Robboy SJ, Weintraub S, Horvath AE, et al. Pathologist workforce in the United States: I. Development of a predictive model to examine factors influencing supply. Arch Pathol Lab Med. 2013;137(12):1723–1732. 10.5858/arpa.2013-0200-OA