Author manuscript; available in PMC: 2025 Feb 13.
Published in final edited form as: Circ Heart Fail. 2024 Feb 13;17(2):e010950. doi: 10.1161/CIRCHEARTFAILURE.123.010950

Failing to Make the Grade: Conventional Cardiac Allograft Rejection Grading Criteria are Inadequate for Predicting Rejection Severity

Sara Arabayarmohammadi 1, Cai Yuan 2, Vidya Sankar Viswanathan 2, Priti Lal 3, Michael D Feldman 3, Pingfu Fu 4, Kenneth B Margulies 5, Anant Madabhushi 2,6,*, Eliot G Peyster 5,*
PMCID: PMC10940208  NIHMSID: NIHMS1958077  PMID: 38348670

Abstract

Background:

Cardiac allograft rejection is the leading cause of early graft failure and is a major focus of post-heart transplant patient care. While histologic grading of endomyocardial biopsy samples remains the diagnostic standard for acute rejection, this standard has limited diagnostic accuracy. Discordance between biopsy rejection grade and patient clinical trajectory frequently leads to both over-treatment of indolent processes and delayed treatment of aggressive ones, spurring the need to investigate the adequacy of the current histologic criteria for assessing clinically important rejection outcomes.

Methods:

N=2881 endomyocardial biopsy images were assigned a rejection grade label (high vs. low grade) and a clinical trajectory label (evident vs. silent rejection). Using an image analysis approach, n=370 quantitative morphology features describing the lymphocytes and stroma were extracted from each slide. Two models were constructed to compare the subset of features associated with rejection grades vs. those associated with clinical trajectories. A proof-of-principle machine learning pipeline – the Cardiac Allograft Rejection Evaluator (CARE) – was then developed to test the feasibility of identifying the clinical severity of a rejection event.

Results:

The histopathologic findings associated with conventional rejection grades differ substantially from those associated with clinically evident allograft injury. Quantitative assessment of a small set of well-defined morphologic features can be leveraged to more accurately reflect the severity of rejection, as compared to that achieved by ISHLT grades.

Conclusions:

Conventional endomyocardial samples contain morphologic information that enables accurate identification of clinically evident rejection events, and this information is incompletely captured by the current, guideline-endorsed, rejection grading criteria.

Keywords: Heart transplantation, Cardiac allograft rejection, Endomyocardial Biopsies, Computational image analysis, Machine learning, Cardiac pathology, ISHLT grading criteria, Histologic rejection grades, Rejection diagnosis, Rejection syndrome severity

Introduction

Cardiac allograft rejection (CAR) is the leading cause of allograft injury and loss in the first year after heart transplantation (HT)[1]–[4]. As a result, methods for detecting CAR have been a major focus of research since HT first became a viable clinical procedure. Despite evolving methods for periodic non-invasive CAR screening, the reference standard for CAR detection is still a histologic examination of endomyocardial biopsy tissue (EMB) to detect infiltrating immune cells[5]–[8]. HT EMBs are assigned International Society of Heart and Lung Transplantation (ISHLT) histologic grades 0R, 1R, 2R, or 3R corresponding to no rejection, mild rejection, moderate rejection, and severe rejection respectively[6]–[8]. These grades are based on a rough estimate of the number, extent, and invasiveness of sporadically distributed infiltrating lymphocyte foci, and cardiomyocyte injury. In clinical practice, the higher grades - 2R and 3R - are generally considered the threshold for providing CAR treatment with augmented immunosuppression.

The ISHLT grading criteria have been criticized since their inception due to limitations in both reliability and accuracy[6], [8]–[15]. Grading reliability, assessed as the agreement between pathologist graders, has been shown to be modest in repeated investigations, with published agreement rates in the 60-70% range[12], [13]. Diagnostic accuracy, assessed as the agreement between histologic CAR grade and clinical CAR severity, is also modest, with the majority of high-grade EMBs (2R/3R) occurring in patients without any evidence of allograft injury, and a large proportion of CAR events with evident allograft injury occurring in patients without high-grade EMBs[8]–[14]. Given the potential for harm both in over-treating benign conditions and in delaying treatment of significant rejection, the frequent critiques of the ISHLT grading framework[5], [13], [15]–[20] seem well founded.

Recent efforts using artificial-intelligence (AI)-enabled methods to perform automated grading of standard, H&E-stained, digitized EMB pathology slides have shown promise for improving grading reliability[12], [21]. However, none of these grading tools have been shown to improve the diagnostic accuracy of the ISHLT grades which they assign. Prior work by our group utilizing high-plex immunofluorescence staining of EMBs demonstrated that clinically evident rejection events differ substantially from clinically silent rejection events with regards to immune cell contents, and that these differences are largely independent of ISHLT grade[5]. Although reliant on a more sophisticated staining method than is conventionally available for clinical CAR diagnosis, this work suggests that EMB samples contain biological information relevant for making more accurate CAR severity determinations. We hypothesize that the set of histologic features needed to reproduce ISHLT grading is distinct from the set of features needed to accurately identify CAR severity. In this manuscript, we describe the development of a digital pathology-based image analysis pipeline designed to take a holistic look at conventional EMB slides, computationally extracting distinct sets of ‘histologic biomarkers’: those associated with ISHLT grade versus those associated with clinical CAR severity.

Materials and Methods

The source code for the CARE image analysis pipeline is publicly available at http://github.com/sarayar/CARE. Additional data are available from the corresponding author on reasonable request.

Study Cohort and Design Overview:

The study cohort was selected from the transplant records at the Hospital of the University of Pennsylvania and consisted of biopsy events between 2007 and 2020. This cohort consisted of N=2900 archived hematoxylin and eosin (H&E) stained histology slides generated from transplant EMB tissue blocks obtained as part of routine clinical care. Access to clinical data and archival tissue was approved by the University of Pennsylvania Institutional Review Board, with waiver of consent authorized by 45 CFR 46.116(d) and 45 CFR 164.512(i).

For these retrospective cases, the ISHLT grade assigned to the slide by the clinical pathologist of record was used as the reference standard for rejection diagnosis (see Supplemental Table S1 for summary statistics of the pathology diagnoses assigned to study biopsies). These grades were further simplified by giving a binary grade label: “low-grade” rejection was defined as ISHLT 2004 consensus criteria histologic grades 0R or 1R, and “high-grade” rejection was defined as ISHLT cellular rejection grade 2R or 3R, or humoral grade pAMR-1(h+) or pAMR-2 (the two grades which require pathologic findings on H&E slides). From a clinical perspective, this dichotomy divides the population into a group that almost never receives augmented immunosuppression (low grade), and a group that typically does receive empiric increased immunosuppression[6]. For the same N=2900 EMB events, “clinically silent” and “clinically evident” rejection labels were assigned using previously published criteria for determining whether allograft injury was present (Table 1)[5]. Briefly, clinical metadata from within seven days before and after each EMB event was collected to determine the clinical trajectory corresponding to the EMB event. These data were derived from electronic health records, and include symptoms, physical exam findings, lab results, echocardiographic parameters, electrocardiogram findings, and invasive hemodynamic data.

Table 1.

Criteria for determining ‘clinically evident’ rejection trajectory

Admission to hospital for rejection treatment, along with at least 1 major or 2 minor criteria:
Major Criteria
Cardiac index ≤2.0 and use of inotropes
Absolute decrease in LVEF of ≥20%
Minor Criteria
Cardiac index ≤2.3, provided this represents a ≥20% decrease in cardiac index from baseline
Right atrial pressure >10 mm Hg or pulmonary capillary wedge pressure >18 mm Hg provided this represents a ≥40% increase from baseline
Absolute decrease in LVEF of ≥10% and to a level of ≤50%
New arrhythmia—atrial fibrillation, flutter, or ventricular arrhythmia
New low voltage ECG not due to pericardial effusion or pulmonary disease
Cardiac troponin elevated ≥3× the upper limit of normal and ≥3× the patient’s baseline, not due to coronary artery disease/graft vasculopathy
Documented diagnosis of increased LV wall thickness and an LV wall thickness increase of >2 mm from baseline value
Documented new or worsened right ventricular dysfunction by echo
Documented clinical signs or symptoms of rejection or heart failure:
    Sign = new gallop, new low pulse volumes, new rales.
    Symptoms = new or worsened dyspnea, orthopnea, and exercise intolerance documented by a provider as likely due to a cardiac cause.

ECG = electrocardiogram; LV = left ventricular; LVEF = left ventricular ejection fraction.

Reprinted from Peyster et al[5]
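The decision rule in Table 1 can be sketched as a small function. This is an illustrative sketch only: the function and argument names are hypothetical, and it assumes the number of major and minor criteria met has already been counted during chart review.

```python
def is_clinically_evident(admitted_for_rejection: bool,
                          n_major: int,
                          n_minor: int) -> bool:
    """Table 1 rule: hospital admission for rejection treatment,
    plus at least 1 major or at least 2 minor criteria."""
    if n_major < 0 or n_minor < 0:
        raise ValueError("criteria counts must be non-negative")
    return admitted_for_rejection and (n_major >= 1 or n_minor >= 2)
```

For example, an admitted patient meeting a single minor criterion would remain "clinically silent" under this rule, while the same patient meeting two minor criteria would be "clinically evident".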

Finally, based on assigned ISHLT grades and clinical rejection trajectory labels, EMBs were labeled either ‘concordant’ if clinical and histologic findings matched (e.g., clinically silent low-grade or clinically evident high-grade EMB events) or ‘discordant’ if they did not match (e.g., clinically evident low-grade or clinically silent high-grade EMB events).
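The two binary labels and the resulting concordance label can be expressed compactly. The sketch below is not taken from the CARE repository; the grade strings (including the default "pAMR-0") and function names are assumptions chosen for illustration.

```python
# Grades treated as "high-grade" in the binary scheme described above.
HIGH_ACR = {"2R", "3R"}
HIGH_AMR = {"pAMR-1(h+)", "pAMR-2"}


def is_high_grade(acr: str, amr: str = "pAMR-0") -> bool:
    """High-grade if the cellular OR the humoral grade crosses the threshold."""
    return acr in HIGH_ACR or amr in HIGH_AMR


def concordance_label(high_grade: bool, clinically_evident: bool) -> str:
    """Concordant when histology and clinical trajectory agree."""
    return "concordant" if high_grade == clinically_evident else "discordant"
```

Under this sketch, a 1R biopsy from a patient with evident allograft injury is labeled discordant, as is a 2R biopsy from a clinically silent patient.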

Digitized slides underwent quality control (QC) assessment using HistoQC, an open-source digital pathology tool for identifying artifacts and measuring slide quality[22]. HistoQC uses a combination of image metrics (e.g., intensity), features (e.g., edge detectors), and supervised classifiers (e.g., pen detection) to identify artifact-free regions on slide images. This assessment led to the removal of n=19 slides with low resolution and excessive artifacts from the study.

The final analysis cohort contained n=2274 concordant (n=134 evident high-grade and n=2140 silent low-grade) cases and n=471 discordant (n=171 evident low-grade and n=300 silent high-grade) cases. Figure 1 provides an overview of the study’s EMB distribution and utilization, while Figure 2 illustrates the CARE pipeline development and deployment workflow.
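As a quick arithmetic check, the four reported cell counts can be cross-tabulated. The shares computed at the end are derived here from those counts for illustration and are not separately reported figures.

```python
# Reported counts, keyed by (clinical trajectory, binary grade).
counts = {
    ("evident", "high"): 134,
    ("silent", "low"): 2140,
    ("evident", "low"): 171,
    ("silent", "high"): 300,
}

concordant = counts[("evident", "high")] + counts[("silent", "low")]
discordant = counts[("evident", "low")] + counts[("silent", "high")]
total = concordant + discordant

high_total = counts[("evident", "high")] + counts[("silent", "high")]
evident_total = counts[("evident", "high")] + counts[("evident", "low")]

# Share of high-grade biopsies that were clinically silent.
silent_share_of_high = counts[("silent", "high")] / high_total
# Share of clinically evident events that were low-grade.
low_share_of_evident = counts[("evident", "low")] / evident_total
```

On these counts, a majority of high-grade biopsies are clinically silent and more than half of clinically evident events are low-grade, in line with the discordance pattern described in the Introduction.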

Figure 1.


A CONSORT diagram outlines the eligibility criteria and distribution of patients in this study.

Figure 2.


Illustration of Cardiac Allograft Rejection Evaluator (CARE) pipeline development, including image analysis, feature extraction, feature selection, model training, and prediction validation. The CARE pipeline involves automatic identification and quantitation of immune cell infiltrates, and automatic segmentation and characterization of stromal fibers. After feature extraction, features were evaluated and ranked based on their ability to discriminate between cardiac rejection trajectories. Features were added to the CARE models until optimal performance for differentiating “silent” vs “evident”, and “low-grade” vs “high-grade” was achieved on the training set. These features were subsequently ‘locked down’ into the final models and deployed on the validation set.

Every whole slide image passing quality control was then divided into 4096 x 4096 pixel (1024 x 1024 micron) image tiles for ‘CARE’ pipeline development and validation. In total, n=89,518 tiles were processed by the CARE pipeline.
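The tiling step can be sketched as follows. The stated tile size implies a resolution of 0.25 microns per pixel (1024 / 4096). The sketch assumes non-overlapping tiles laid out on a regular grid, with partial tiles at the right and bottom edges; how the actual pipeline handles edges or background-only tiles is not described here, and the function name is hypothetical.

```python
# Resolution implied by "4096 x 4096 pixel (1024 x 1024 micron)" tiles.
MICRONS_PER_PIXEL = 1024 / 4096


def tile_origins(width: int, height: int, tile: int = 4096):
    """Yield (x, y) top-left pixel coordinates of non-overlapping tiles
    covering a slide of the given pixel dimensions, row by row."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y
```

A slide of 10,000 x 5,000 pixels would produce a 3 x 2 grid of tile origins under this scheme, with the last column and row only partially filled.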

Image Analysis:

The image analysis workflow for CARE represents an extension of prior ‘feature engineered’ pipelines designed for assigning rejection grades[12]. Our ‘hand-crafted’ feature extraction approach is built on fundamental cardiac histology principles, identifying clearly defined structures, fibers, cell types, and tissue regions, then conducting diverse measurements of these features. In addition to identifying, counting, and assessing the local environment around lymphocyte infiltrates as in prior rejection grading efforts, CARE is designed to also perform comprehensive analysis of interstitial and other stromal elements in the EMB sample. A brief overview of the technical methods and extracted features is provided below:

Lymphocyte detection and foci identification.

A stain color deconvolution algorithm was used on tiles of H&E-stained cardiac biopsy images to detect lymphocytes across tissue specimens. This method determines the color densities and surface areas stained with a specific color[23]. As was described in previous work[12], the approach involves first identifying lymphocyte clusters by performing disc-dilation and area thresholding and then identifying lymphocyte foci by aggregating the clusters using proximity graphs with individual lymphocytes acting as vertices. Subsequently, the edges of the graph are built within and between clusters based on thresholding of Euclidean distances.
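The graph-based grouping step can be illustrated with a minimal sketch. It covers only the final stage (linking points whose Euclidean distance falls below a threshold and taking connected components as foci) and omits color deconvolution, disc dilation, and area thresholding. The function name and the threshold value used in the example are assumptions, not parameters from the CARE pipeline.

```python
from math import dist


def group_foci(points, max_dist):
    """Group 2D points into connected components of the graph whose
    edges join points at Euclidean distance <= max_dist.
    Returns a sorted list of index lists, one per component."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= max_dist:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two well-separated groups of lymphocyte centroids (toy coordinates).
centroids = [(0, 0), (1, 0), (2, 0), (10, 10), (11, 10)]
foci = group_foci(centroids, max_dist=1.5)
```

In this toy example the first three centroids chain together into one focus even though the two end points are farther apart than the threshold, which is the behavior expected of a proximity graph.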

Interstitial Fiber segmentation.

A local difference-local binary pattern (LD-LBP) operator combined with the OTSU algorithm[24] was used to segment interstitial fibers in three regions (endocardial, interstitial and myocardial replacement). LD-LBP is a texture operator that labels an image’s pixels by thresholding the magnitude relationship between the target pixel and the neighboring pixels[25]–[27], and OTSU is a thresholding algorithm that separates pixels into two classes by maximizing inter-class variance. Experienced cardiac pathologists visually confirmed the quality of interstitial fiber segmentation in this study. Figure S1 in the supplements shows an example of interstitial fiber segmentation.
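The thresholding half of this step can be sketched in a few lines. The code below is a plain implementation of Otsu's criterion (choose the threshold that maximizes between-class variance) over integer intensities; it does not implement the LD-LBP texture operator, and it is not the code used in CARE.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance,
    where class 0 holds intensities <= t and class 1 holds the rest.
    `pixels` is an iterable of integers in [0, levels)."""
    pixels = list(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w0 = 0        # number of pixels in class 0
    sum0 = 0      # intensity sum of class 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # class 0 still empty
        w1 = total - w0
        if w1 == 0:
            break             # class 1 empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

For a clearly bimodal input, any threshold between the two modes separates the classes equally well; this sketch returns the lowest such value.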

Feature Extraction.

Features are designed to reflect the spatial pattern and arrangement of immune cells, as well as interstitial fiber shape, orientation, distribution patterns, and heterogeneity. Each feature’s mean, median, standard deviation, and skewness were calculated across all tiles from a patient to arrive at a patient-level feature value. This process yielded a vector of 370 domain-inspired, quantitative histologic features for each patient, which encodes their associated immune cell and interstitial fiber presentation characteristics. These features fell into two main categories: 154 immune cell features and 216 interstitial fiber features.
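Aggregating tile-level measurements to a patient-level value might look like the sketch below. The text does not specify which standard deviation or skewness estimator was used; this sketch assumes population moments (the Fisher-Pearson coefficient for skewness), and the function names are hypothetical.

```python
from statistics import mean, median, pstdev


def skewness(values):
    """Fisher-Pearson skewness from population moments (assumed estimator)."""
    values = list(values)
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m3 = mean([(v - m) ** 3 for v in values])
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def summarize(tile_values):
    """Collapse one feature's per-tile values into four summary statistics."""
    tile_values = list(tile_values)
    return {
        "mean": mean(tile_values),
        "median": median(tile_values),
        "sd": pstdev(tile_values),
        "skewness": skewness(tile_values),
    }
```

Applied to a symmetric set of tile values such as 1 through 5, the skewness is zero, while a few tiles with very high values would push it positive.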

Data Analysis and Statistical Methods:

Feature Selection.

Feature selection and model development were conducted in a training subset (St). St comprised n = 100 randomly selected EMB slides from each of four categories: 1) evident low grade, 2) silent low grade, 3) evident high grade, and 4) silent high grade, yielding a balanced training set of n = 400 slides. The remaining n = 2345 cases were reserved for validation (Sv).
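A balanced split of this kind can be sketched as stratified random sampling without replacement. The sketch assumes 100 slides are drawn from each of the four categories, which is what a balanced training set of 400 implies; the seed, identifiers, and function name are illustrative only.

```python
import random


def balanced_split(labels, per_class=100, seed=0):
    """labels: dict mapping slide_id -> category name.
    Returns (train_ids, validation_ids) with `per_class` training
    slides sampled at random from every category."""
    rng = random.Random(seed)
    by_class = {}
    for slide_id, category in labels.items():
        by_class.setdefault(category, []).append(slide_id)

    train = []
    for category in sorted(by_class):
        ids = sorted(by_class[category])
        if len(ids) < per_class:
            raise ValueError(f"not enough slides in {category!r}")
        train.extend(rng.sample(ids, per_class))

    train_set = set(train)
    validation = [s for s in labels if s not in train_set]
    return train, validation
```

With the category counts reported for this cohort (134, 2140, 171, and 300), this yields 400 training slides and 2345 validation slides, matching the sizes of St and Sv.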

Experiment 1: Construction of CARE using pathological features most closely associated with the clinical trajectory

A univariate analysis of image features extracted by the CARE image analysis pipeline was performed in training set St. Exact binomial 95% confidence intervals (CIs) and odds ratios were calculated using the glm function in R, version 4.3.0. Variables with a p-value less than 0.05 were introduced into a multivariable logistic regression model for model derivation. The regression model was trained in training set St, and the 5 most discriminating features, which gave optimal performance for predicting clinical trajectories, comprised the final CARE model for predicting silent vs. evident rejection trajectories. The hyperparameters were all tuned using grid search. We ultimately used “family=binomial()”, “alpha=1” (the LASSO penalty), and “link="logit"” as hyperparameters. CARE model performance was then validated on unseen data from validation set Sv. Selecting only the few features most strongly correlated with clinical trajectories, together with the use of dedicated and independent training and validation sets, helped avoid overfitting with CARE.
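To make the role of the LASSO penalty concrete, the sketch below fits an L1-penalized logistic regression with a simple proximal-gradient loop. It is a teaching sketch in plain Python, not the R code used for CARE: the optimizer, learning rate, penalty strength, and toy data are all assumptions.

```python
from math import exp


def sigmoid(z):
    # Numerically stable logistic function.
    return 1.0 / (1.0 + exp(-z)) if z >= 0 else exp(z) / (1.0 + exp(z))


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_lasso_logistic(X, y, lam=0.05, lr=0.1, n_iter=2000):
    """Minimize mean logistic loss + lam * ||w||_1 (intercept unpenalized).
    X: list of feature rows, y: list of 0/1 labels."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        p = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]
        grad_b = sum(pi - yi for pi, yi in zip(p, y)) / n
        grad_w = [
            sum((pi - yi) * row[j] for pi, yi, row in zip(p, y, X)) / n
            for j in range(d)
        ]
        b -= lr * grad_b
        # Gradient step followed by soft-thresholding (ISTA update).
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Toy data: feature 0 separates the classes, feature 1 is pure noise.
X_toy = [(-2, 1), (-2, -1), (-1, 1), (-1, -1), (1, 1), (1, -1), (2, 1), (2, -1)]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]
w_toy, b_toy = fit_lasso_logistic(X_toy, y_toy)
```

In this toy example the penalty leaves the uninformative feature with a coefficient of exactly zero while the informative feature keeps a positive weight, which is the sparsity behavior that motivates using LASSO to arrive at a small feature set.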

The performance of CARE was assessed based on the percentage agreement of CARE outputs with the rejection trajectory labels assigned based on the criteria in Table 1. The area under the receiver operating characteristic curve (AUC), sensitivity, and specificity were also calculated.
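The reported metrics have standard definitions, sketched below for binary labels (1 = evident, 0 = silent). AUC is computed with the rank-based (Mann-Whitney) formulation, counting ties as one half. The function names are illustrative, and both classes are assumed to be present in the data.

```python
def confusion_metrics(y_true, y_pred):
    """Agreement (accuracy), sensitivity, and specificity for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, scores):
    """Probability that a random positive case scores above a random
    negative case (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Because silent cases greatly outnumber evident ones in the validation set, overall agreement is dominated by specificity, which is one reason sensitivity and AUC are reported alongside it.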

Uniform manifold approximation and projection (UMAP) embedding was employed to provide an unsupervised evaluation of potential differential patterns between silent and evident CAR groups. After removing highly correlated features (where the Pearson correlation coefficient for the two features was more than 0.85) from the n=364 immune cell and stromal fiber features, the remaining 226 features were embedded and then plotted into two dimensions using the UMAP algorithm[28] to visualize the distribution of features between the silent and evident groups.
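The correlation filter applied before embedding can be sketched as a greedy pass over the features. Two details are assumptions here: the sketch thresholds the absolute value of the Pearson coefficient, and within a correlated pair it keeps whichever feature was encountered first. The text does not state how either choice was made.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for constant inputs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def drop_correlated(features, threshold=0.85):
    """features: dict mapping name -> list of values (one per case).
    Keep a feature only if |r| <= threshold against every feature kept so far."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

The surviving feature names would then be passed to the embedding step; the UMAP projection itself is not sketched here.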

Experiment 2: Compare the constellation of morphologic biomarkers needed to predict ISHLT rejection grades vs. those needed to predict clinical rejection trajectories

This experiment was designed to assess which morphologic features as extracted by the CARE pipeline were most important for predicting clinical CAR trajectories and conventional ISHLT CAR grades. The goal was to investigate whether the set of morphologic features needed for optimal ISHLT grading prediction was distinct from those needed for optimal clinical rejection trajectory assessments.

Five hundred iterations of 3-fold cross-validation on the combined set of EMBs (St+Sv, n=2745) allowed feature importance to be tracked across multiple independent observations (1500 in total). The Wilcoxon rank-sum test was applied in every iteration of 3-fold cross-validation to identify the top features associated with clinically evident vs. silent disease (for rejection trajectory assessments) or with high vs. low ISHLT grade (for conventional grading assessments). In each iteration, one of the three folds was held out and the top 10 features for binary classification were selected based on the other two folds of the dataset. At the end of each iteration the data were shuffled. After the last iteration, the frequency with which each feature appeared in the list of top features was computed to assess feature importance for the desired binary classification task. After model development for both classification tasks (clinical trajectories and ISHLT grading), we compared the importance of every feature in terms of its association with different endpoints (trajectories vs grades).
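The frequency-based importance score can be sketched as follows. Features are ranked within each training split by the absolute normal-approximation z statistic of the Wilcoxon rank-sum test (computed through the equivalent Mann-Whitney U, without tie correction), and the number of times each feature lands in the top k is counted. The ranking statistic, the fold construction, and all names are assumptions; the original analysis may differ in these details.

```python
import random
from collections import Counter
from math import sqrt


def rank_sum_z(a, b):
    """Normal-approximation z for the rank-sum test via Mann-Whitney U."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return 0.0
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    return (u - n * m / 2) / sqrt(n * m * (n + m + 1) / 12)


def top_feature_frequency(X, y, k=10, n_iter=500, n_folds=3, seed=0):
    """X: dict mapping feature name -> list of values; y: list of 0/1 labels.
    Returns a Counter of how often each feature ranked in the top k."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    freq = Counter()
    for _ in range(n_iter):
        rng.shuffle(idx)                          # reshuffle every iteration
        folds = [idx[f::n_folds] for f in range(n_folds)]
        for held_out in range(n_folds):
            train = [i for f, fold in enumerate(folds)
                     if f != held_out for i in fold]
            scores = {}
            for name, vals in X.items():
                pos = [vals[i] for i in train if y[i] == 1]
                neg = [vals[i] for i in train if y[i] == 0]
                scores[name] = abs(rank_sum_z(pos, neg))
            freq.update(sorted(scores, key=scores.get, reverse=True)[:k])
    return freq
```

With the default settings, each feature can be selected at most 500 x 3 = 1500 times, so dividing a feature's count by 1500 gives a selection frequency between 0 and 1 that can be compared across the two prediction tasks.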

Experiment 3: Assess the performance of classification models designed to predict ISHLT rejection grades vs. clinical rejection trajectories:

Given the recent development of digital pathology grading systems and given the concerns over the accuracy of ISHLT grading, this experiment was designed to test whether a digital pathology pipeline optimized for grading can provide accurate assessments of rejection trajectories and vice-versa.

Using analogous methods to those described for CARE model derivation in Experiment 1, a multivariable logistic regression classifier (Mgrd) was trained on St to predict “high” (ACR≥2R or pAMR≥1) vs. “low” (ACR≤1R and pAMR=0) ISHLT histologic rejection grades. The hyperparameters were “family=binomial()”, “alpha=1” (the LASSO penalty), and “link="logit"”. The Mgrd grading model was then tested to assess performance at predicting clinical rejection trajectories (i.e. “silent” vs. “evident” rejection) in validation set Sv. In a separate experiment, the CARE pipeline – optimized for predicting clinical trajectory as described in Experiment 1 – was tested to assess performance at predicting low vs. high rejection grades in Sv. AUC, accuracy, sensitivity, and specificity were calculated at the optimal operating point of the ROC curve, defined as the threshold which maximized overall accuracy.
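Choosing the operating point that maximizes overall accuracy can be sketched as a scan over candidate thresholds. The sketch assumes a case is called positive when its score is at or above the threshold and breaks ties in favor of the lowest threshold; neither convention is stated in the text.

```python
def best_accuracy_threshold(y_true, scores):
    """Return (threshold, accuracy) for the score cut-off that maximizes
    overall accuracy, predicting positive when score >= threshold."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        correct = sum(
            1 for yt, s in zip(y_true, scores) if (s >= t) == (yt == 1)
        )
        acc = correct / len(y_true)
        if acc > best_acc:          # strict: keep the lowest tied threshold
            best_t, best_acc = t, acc
    return best_t, best_acc
```

Sensitivity and specificity would then be computed from the predictions made at the returned threshold.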

Results

Experiment 1: Validation of CARE for predicting clinical trajectories

The five pathological features that were associated with clinical trajectories and were selected on St are listed in Table 2. The results of the univariate analysis for initial feature ranking are summarized in Supplemental Table S2, while the results of the logistic regression model derivation are summarized in Table 3. In validation set Sv, CARE achieved an agreement of 86% with clinical rejection trajectory, an AUC of 0.81, sensitivity of 0.68, and specificity of 0.81. Results for CARE performance are summarized in Figure 3A.

Table 2.

Predictive features for differentiating ‘clinically silent’ from ‘clinically evident’ rejection trajectories

No. Feature Description
1 Endocardial stroma solidity quantifies area fraction of the interstitial fibers in the endocardial region as compared to their convex hull. For any convex fiber, the solidity is 1.
2 Interstitial stroma eccentricity shows how “un-circular” the interstitial fibers are. A circle has an eccentricity of zero, and bigger eccentricities are less curved.
3 Lymphocyte foci count captures the sum of proximity graphs that group lymphocytic clusters across WSI.
4 Lymphocyte area ratio captures the area covered by lymphocyte nuclei in lymphocyte foci divided by tissue area.
5 Lymphocyte foci count in myocardium captures the sum of proximity graphs that group lymphocytic clusters only in myocardium region.
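The two shape descriptors in Table 2 can be made concrete with a small geometric sketch. Solidity is computed here for a polygon outline as its area divided by the area of its convex hull, and eccentricity from the major and minor axis lengths of a fitted ellipse. CARE computes these on segmented pixel regions, so the polygon representation and the function names are simplifying assumptions.

```python
from math import sqrt


def polygon_area(pts):
    """Shoelace formula for a simple polygon given as ordered vertices."""
    nxt = pts[1:] + pts[:1]
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(pts, nxt))) / 2


def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def solidity(pts):
    """Area of the shape divided by the area of its convex hull."""
    return polygon_area(pts) / polygon_area(convex_hull(pts))


def eccentricity(major_axis, minor_axis):
    """Ellipse eccentricity: 0 for a circle, approaching 1 when elongated."""
    return sqrt(1 - (minor_axis / major_axis) ** 2)


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
```

The square is convex and has a solidity of 1, whereas the L-shaped outline has a notch that its convex hull fills in, giving a solidity below 1.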

Table 3.

Multivariate regression analysis in predicting rejection trajectories for combination of immune cell and stromal fiber features in St.

No. Covariate (Feature) Odds Ratio (OR) 95% CI p-value
1 Endocardial stroma solidity 0.674 0.490–0.926 0.0149
2 Interstitial stroma eccentricity 1.843 1.328–2.558 0.0002
3 Lymphocyte foci count 1.060 1.006–1.117 0.0298
4 Lymphocyte area ratio 1.738 1.284–2.352 0.0003
5 Lymphocyte foci count in myocardium 0.941 0.889–0.996 0.0366
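The odds ratios in Table 3 relate to a logistic model through its coefficients: each coefficient is the natural log of the corresponding odds ratio, so a one-unit increase in a feature multiplies the odds of an evident trajectory by that ratio. The sketch below illustrates this relationship only. The intercept and the scaling of the features are not reported here, so the default intercept of 0 and the dictionary keys are placeholders, and the output should not be read as a CARE prediction.

```python
from math import exp, log

# Odds ratios as listed in Table 3 (keys are illustrative names).
ODDS_RATIOS = {
    "endocardial_stroma_solidity": 0.674,
    "interstitial_stroma_eccentricity": 1.843,
    "lymphocyte_foci_count": 1.060,
    "lymphocyte_area_ratio": 1.738,
    "lymphocyte_foci_count_myocardium": 0.941,
}


def evident_probability(features, intercept=0.0):
    """Logistic model with coefficients ln(OR); intercept is a placeholder."""
    z = intercept + sum(log(ODDS_RATIOS[name]) * value
                        for name, value in features.items())
    return 1.0 / (1.0 + exp(-z))


def odds(p):
    return p / (1.0 - p)
```

Odds ratios above 1 (eccentricity, lymphocyte area ratio, overall foci count) push the prediction toward an evident trajectory as the feature increases, while those below 1 (endocardial solidity, myocardial foci count) push it toward a silent one.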

Figure 3.


The results from the first and third experiments conducted in this study. A. The precision-recall curve, ROC, and confusion matrix show CARE’s performance for predicting clinical CAR trajectories in the large validation set Sv, achieving good discrimination with an AUC of 0.81. B. The ROC and confusion matrix show CARE’s performance for predicting ISHLT rejection grades in Sv – a task at which negligible predictive performance was achieved (AUC 0.59). C. The ROC and confusion matrix for the Mgrd predictive model, trained to predict conventional ISHLT rejection grades, when performance is tested in Sv for ability to predict clinical trajectories. Negligible predictive performance was observed, with an AUC of 0.49. D. The UMAP plot for low-grade cases using interstitial fiber and immune cell features. E. The UMAP plot for high-grade patients using interstitial fiber and immune cell features.

The results of UMAP embedding, illustrated in Figure 3 (panels D, E), suggest that unsupervised plotting of CARE-derived morphologic features yields segregation of silent and evident clinical rejection cases. Note that in Figure 3D the clinically discordant, serious low-grade biopsies almost exclusively occupy the margins of the low-grade cluster, representing true ‘edge’ cases in both a literal and figurative sense. Note also that in Figure 3E (the ‘high-grade’ plot) the bulk of the clinically serious, high-grade cases occupy the upper and right-most portions of the figure. Taken together, the UMAP results reinforce the central premise of the paper: that there are morphologic differences, even within grade groups, between clinically silent and clinically serious rejection.

Qualitatively, Figure 4 illustrates the discriminability of endocardial stroma solidity, density and interstitial stroma eccentricity features for representative silent and evident low-grade patients. The values for the solidity feature are higher in evident low-grade patients compared to silent low-grade patients.

Figure 4.


The qualitative image shows the differences between the evident high-grade and silent high-grade groups in terms of the stromal fiber features. In A, and B, the endocardial stroma solidity and density is visualized on two random samples for clinically evident and clinically silent cases, respectively. As the figure illustrates, the stromal fibers in the endocardial region are more dispersed and shorter/more convex in an evident case compared to a silent case. In C, and D, the interstitial stroma eccentricity is shown on evident and silent cases respectively. The stromal fibers appear to have less eccentricity and are closer in appearance to a circle in silent cases compared to evident.

Experiment 2: Comparing Importance of Features in terms of their association with ISHLT Grade vs. Clinical Rejection Trajectory

A comparison of the feature-importance scores for the morphologic predictors most strongly associated with ISHLT grades vs. clinical rejection trajectories reveals substantial differences between these two prediction tasks. As demonstrated in Figure 5, both stromal and immune cell features were important in both prediction tasks, though the profile and distribution of the most important features are quite distinct. Some features were shared (mostly stromal fiber features, seen at the very top of Figure 5), but among the 66 total features that appeared in the final lists of most important features for predicting ISHLT grades and clinical rejection trajectories, only 15 (22%) were common to both. This means that more than 50% of the features needed for each prediction task are exclusive to that task (70% were exclusive to predicting trajectories and 53% exclusive to predicting rejection grades), highlighting the importance of carefully calibrating morphologic analysis to the specific question of interest.

Figure 5.


The vertical axis corresponds to each of the N=364 stromal fiber and immune cell morphologic features extracted by CARE. The horizontal axis shows the importance of each feature for predicting “evident” versus “silent” clinical rejection trajectory (blue bars on the right) and “high” versus “low” ISHLT grade (pink bars on the left). The constellation of morphologic features with high importance for predicting conventional rejection grades differs substantially from that with high importance for predicting clinical rejection trajectory.

Experiment 3: An automated digital pathology analysis pipeline optimized to predict ISHLT grades cannot predict clinical rejection trajectories:

The Mgrd model, trained to optimize ISHLT grade prediction, failed to achieve meaningful performance when attempting to predict clinical trajectories in the large validation set Sv. Specifically, Mgrd achieved an accuracy of 0.48, AUC of 0.48, sensitivity of 0.50, and specificity of 0.48 in distinguishing silent vs. evident clinical rejection trajectories. In the partner experiment, the CARE model, trained to optimize prediction of clinical rejection trajectory, demonstrated a similar phenomenon with poor performance for predicting ISHLT grades. CARE achieved an accuracy of 0.56, AUC of 0.59, sensitivity of 0.66, and specificity of 0.55 in distinguishing low vs. high rejection grades in Sv. Results are summarized in Figure 3, panels B and C.

Discussion

In this manuscript, we sought to explore whether there were meaningful differences in the morphologic features needed to predict ISHLT rejection grades vs. those needed to predict the clinical severity of the rejection syndrome. Using a novel cohort design and advanced digital pathology image analysis methods, we developed the CARE pipeline to test this question. In Experiment 1, we confirm the existence of a distinct set of ‘allograft injury’ morphologic features which can be used to assess the clinical severity of a rejection event. In subsequent experiments, the CARE pipeline proved far better at assessing clinical rejection severity than an alternative digital pathology pipeline which was optimized to assign ISHLT rejection grades.

A key conclusion from the present research is that the ISHLT rejection grading framework may be fundamentally ill-suited to generating clinically accurate rejection diagnoses. In the context of allograft rejection, accuracy can reasonably be defined as concordance between EMB-based diagnoses and clinical metrics of allograft injury[5], [11]. Numerous theories have been proposed to explain clinical-histologic discordance, though despite this long record of publications ([5], [10], [11], [29]), published research exploring specific solutions has been scarce.

It has been theorized that early detection of clinically silent/high-grade rejection, with subsequent early treatment, might mitigate the development of an impending overt clinical CAR. While plausible, this theory would not explain the existence of clinically evident, low-grade, rejection events[14], [15], since no confounding, early intervention is ever administered in these cases due to the falsely reassuring initial histologic grade. It has also been theorized that sampling error during biopsy procedures could affect the diagnostic accuracy of ISHLT grading, based on the fact that lymphocyte foci can be heterogeneously distributed in tissues. While also plausible, our prior work utilizing quantitative high-plex immunofluorescence staining of transplant EMBs calls into question the true impact of sampling error. In this prior research, the cellular environment of clinically evident rejection differed substantially from clinically silent rejection, with these differences conferring risk in a grade-independent manner[5]. Thus, while sampling error may indeed affect ISHLT grading, there may still exist tissue-level findings capable of predicting rejection severity if sufficiently sophisticated methods are used.

Given that neither early detection nor sampling related errors fully explain the discordance between ISHLT grades and clinical severity, it is important to consider a third possibility: that the histologic findings which currently comprise the ISHLT grading schema simply do not include all the relevant findings needed to accurately identify the clinical severity of a CAR event. The results from Experiments 2 and 3 strongly support this theory. Experiment 2 showed substantial differences in the sets of morphologic features most closely associated with an EMB’s rejection grade vs. those most closely associated with the patient’s clinical rejection syndrome. Experiment 3 takes this finding one step further, showing that the sets of features associated with rejection grade and rejection severity are not only distinct, but that the overlap between them is insufficient to enable ‘cross-over’ prediction. In Experiment 3, the digital pathology model optimized to predict ISHLT grades cannot provide reliable clinical rejection assessments (AUC: 0.48), while the CARE model – optimized to predict clinical trajectory – is similarly unable to provide reliable ISHLT grade prediction (AUC: 0.59).

The findings in this manuscript have important implications for both transplant clinicians and for the burgeoning application of digital pathology methods for EMB analysis. Most prior research utilizing digital pathology in heart transplantation has focused on recapitulating ISHLT rejection grades[12], [21]. While these prior publications have been successful from a technical perspective, they are limited by their shared focus on reproducing a flawed and unreliable diagnostic metric. Although ISHLT grades are the most readily apparent and easily collated acute rejection outcome, they are not the most clinically relevant. This study differs from previous efforts by focusing on an undeniably important clinical outcome: the presence of evident allograft injury in patients.

Our proof-of-principle CARE pipeline not only demonstrates the feasibility of identifying such a translationally relevant outcome, but also highlights the importance of experimental design in conducting impactful computational pathology research. The ISHLT criteria consist of a small set of morphologic findings, chosen for ease of use and prioritized based on a priori assumptions about which findings should be important. Attempts to reproduce ISHLT grades needlessly limit computational pathology models to considering only the small fraction of morphologic features which overlap with the conventional, simplistic, and potentially biased criteria. Our ‘outcome-first’ approach, enabled by a carefully phenotyped cohort, allows us to fully leverage the comprehensive morphologic data generated by computational pathology methods. In this regard, the present research is most similar to another recent publication in which an ‘outcome-first’ framework was used as the basis for selecting morphologic features associated with the important clinical outcome of allograft vasculopathy[30].

The CARE pipeline utilizes a ‘hand-crafted’ feature extraction method in which morphologic features are intentionally selected based on expert knowledge of cardiac pathobiology. In contrast to more opaque deep learning methods[21], this hand-crafted approach permits a direct, readily interpretable assessment of the morphologic processes which predict allograft injury. Although lymphocyte features describing the number, density, and distribution of infiltrating cells clearly have a role in identifying rejection trajectory (features partially captured by conventional ISHLT grading), two of the top predictive features comprising CARE do not involve lymphocytes. These non-lymphocyte features pertain to both the endocardial and interstitial stroma. Specifically, stromal fibers appear to be more stretched/straightened and less tightly packed in clinically evident rejection, possibly representing mass effect from infiltrating cells, myocyte loss, and/or tissue edema. The concepts of quantifying myocyte death/damage, tissue edema, and the relative quantity of infiltrating lymphocytes in a given tissue area are not novel in and of themselves, having been described in prior published descriptions of allograft histopathology[6]–[8], [31]. However, quantitatively (and hence, reliably) measuring these findings and appropriately weighting their contribution to risk is only feasible with a hand-crafted computational pathology approach.
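To make the idea of an interpretable, hand-crafted stromal feature concrete, the sketch below computes two simple descriptors: a straightness index for a traced fiber (end-to-end distance divided by traced path length) and the fraction of a tissue region occupied by stroma as a crude proxy for packing. Both descriptors, the function names, and the toy inputs are assumptions made for illustration; they are not the features used by CARE.

```python
import math


def fiber_straightness(points):
    """End-to-end distance divided by traced path length.
    Returns 1.0 for a perfectly straight fiber, lower values for wavy ones."""
    if len(points) < 2:
        raise ValueError("a fiber trace needs at least two points")
    path_length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if path_length == 0:
        raise ValueError("fiber trace has zero length")
    return math.dist(points[0], points[-1]) / path_length


def stroma_area_fraction(mask):
    """Fraction of pixels flagged as stroma in a binary mask
    (a list of rows of 0/1 values)."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask is empty")
    return sum(sum(row) for row in mask) / total


# Hypothetical fiber traces as (x, y) points.
taut_fiber = [(0, 0), (3, 4), (6, 8)]  # collinear points
wavy_fiber = [(0, 0), (3, 4), (6, 0)]  # same path length, bent

# Hypothetical binary stroma masks (1 = stroma pixel).
dense_mask = [[1, 1, 1, 1], [1, 1, 1, 0]]
loose_mask = [[1, 0, 0, 1], [0, 1, 0, 0]]

print(fiber_straightness(taut_fiber), fiber_straightness(wavy_fiber))
print(stroma_area_fraction(dense_mask), stroma_area_fraction(loose_mask))
```

Each value maps directly onto a visible tissue property, which is the interpretability advantage that hand-crafted features offer over learned representations.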

This study has limitations which merit discussion. Although the cohort was quite large, the slides with evident symptoms (evident low-grade and evident high-grade cases) were relatively few, resulting in an imbalanced dataset. Although this reflects real-world incidence (a potential strength of the cohort), the relatively small number of evident rejection events, and low-grade evident rejection events in particular, represents a challenge in both model development and in the strength of the validation results for these cases. An additional limitation of the study cohort is that all study slides originated from a single site. A larger multicenter cohort will be necessary for future definitive CARE model development and validation. A final cohort limitation worth considering is the assignment of the ‘silent high-grade’ label to study EMBs. Due to usual clinical practice based on established transplant guidelines[8], patients with high histologic grades as defined in this study typically receive some form of altered immunosuppression, even in the absence of any clinical signs of disease. Because of this empiric treatment, there is no way to be certain that these events would never have manifested clinical signs/symptoms of evident rejection at some point in the coming days or weeks. However, we can say with certainty that neither testing nor documentation within 1 week of EMB met criteria for evident rejection, and thus, these cases are either less severe or further removed from overt allograft injury than any of the cases classified as ‘evident’ in this study. Regarding our image analysis approach, although the inclusion of novel, quantitative stromal features assisted in evaluating clinical rejection trajectory, the CARE pipeline does not currently offer an exhaustive assessment of allograft micro-architecture. Further enhancement of the CARE pipeline with incorporation of additional morphologic elements (e.g., myocyte nuclear features, cell-size/shape parameters, vascular features, and more elaborate spatial interactions) could potentially further boost predictive performance[32], [33].

Conclusions

This research represents an important step toward precision diagnosis in heart transplantation. Using the CARE automated digital pathology analysis pipeline, we conclusively demonstrate that ISHLT rejection grading based on a limited assessment of immune cell presence and distribution is insufficient for accurately identifying the clinical severity of an alloimmune reaction. Using the same CARE pipeline and conventionally prepared EMB slides, we also show that incorporation of additional morphologic features can improve clinical rejection severity assessments. Future efforts aimed at maximally extracting the biologically relevant data contained within clinical EMB histology slides are warranted to fulfill the translational potential of computational pathology analysis tools.

Supplementary Material

Supplemental Publication Material

Clinical Perspective:

What is New?

  • The current standard for diagnosing cardiac allograft rejection – manual histologic grading of cardiac biopsy samples – relies on a small set of histopathology findings that often do not reflect the clinical severity of the patient-level rejection syndrome.

  • A novel, quantitative, computational pathology approach which extracts a more comprehensive set of cardiac histopathology parameters provides improved discrimination between clinically serious and clinically silent rejection syndromes.

What are the Clinical Implications?

  • The lack of a true ‘gold-standard’ for cardiac allograft rejection diagnosis remains a barrier to timely and appropriate patient care, with potential for both overtreatment and undertreatment.

  • Quantitative computational pathology can improve the diagnostic accuracy of histologic rejection assessments and has the potential to identify the histologic changes which confer the greatest risk of imminent allograft injury.

  • Such a ‘gold-standard’ approach would provide direct patient-care benefits while also providing a better reference standard for calibrating and validating emerging, non-invasive diagnostic methods.

Sources of Funding:

Research reported in this publication was supported by the National Heart, Lung and Blood Institute of the National Institutes of Health under award numbers K08HL159344-01 to E.G.P. and R01HL151277-01A1 to K.B.M. and A.M., by the W.W. Smith Charitable Trust to E.G.P., and by the Bogle Family Foundation to E.G.P.

Disclosures:

Dr. Feldman is an equity holder and has technology licensed to both Elucid Bioimaging and Inspirata Inc. Dr. Feldman is a scientific advisory consultant for Inspirata Inc. and sits on its scientific advisory board. Dr. Feldman is also a consultant for Philips Healthcare, XFIN, and Virbio. Dr. Margulies holds research grants from Amgen and serves as a scientific consultant/advisory board member for Bristol Myers Squibb. Dr. Madabhushi is an equity holder in Elucid Bioimaging, Inspirata Inc., and Picture Health. In addition, he has served as a scientific advisory board member for Picture Health, SimbioSys, and Aiforia Inc. He also has sponsored research agreements with AstraZeneca, Bristol Myers Squibb, Boehringer-Ingelheim, and Eli Lilly. His technology has been licensed to Elucid Bioimaging and Picture Health. He is also involved in three different NIH R01 grants with Inspirata Inc.

Non-standard Abbreviations and Acronyms

CAR: Cardiac allograft rejection

EMB: Endomyocardial biopsy tissue

ISHLT: International Society of Heart and Lung Transplantation

HT: Heart transplantation

CARE: Cardiac Allograft Rejection Evaluator

AI: Artificial intelligence

H&E: Hematoxylin and eosin

QC: Quality control

LD-LBP: A local difference local binary pattern

CI: Confidence interval

AUC: Area under the receiver operating characteristic curve

UMAP: Uniform manifold approximation and projection

References:

  • [1]. Lund LH, Edwards LB, Kucheryavaya AY, Benden C, Christie JD, Dipchand AI, Dobbels F, Goldfarb SB, Levvey BJ, Meiser B, et al. “The registry of the international society for heart and lung transplantation: Thirty-first official adult heart transplant report - 2014; Focus theme: Retransplantation,” in Journal of Heart and Lung Transplantation, Oct. 2014, vol. 33, no. 10, pp. 996–1008, doi: 10.1016/j.healun.2014.08.003.
  • [2]. Kobashigawa J, Mancini D, Sørensen K, Hummel M, Lind JM, Abeywickrama KH, Bernhardt P, “Everolimus for the Prevention of Allograft Rejection and Vasculopathy in Cardiac-Transplant Recipients,” pp. 847–858, 2003.
  • [3]. Kobashigawa JA, Miller LW, Russell SD, Ewald GA, Zucker MJ, Goldberg LR, Eisen HJ, Salm K, Tolzman D, Gao J, Fitzsimmons W, et al. “Tacrolimus with Mycophenolate Mofetil (MMF) or Sirolimus vs. Cyclosporine with MMF in Cardiac Transplant Patients: 1-Year Report,” Am. J. Transplant, vol. 6, no. 6, pp. 1377–1386, Jun. 2006, doi: 10.1111/j.1600-6143.2006.01290.x.
  • [4]. Patel JK, & Kobashigawa JA, “Should we be doing routine biopsy after heart transplantation in a new era of anti-rejection?,” Curr. Opin. Cardiol, vol. 21, no. 2, pp. 127–131, 2006, doi: 10.1097/01.hco.0000210309.71984.30.
  • [5]. Peyster EG, Wang C, Ishola F, Remeniuk B, Hoyt C, Feldman MD, & Margulies KB, “In Situ Immune Profiling of Heart Transplant Biopsies Improves Diagnostic Accuracy and Rejection Risk Stratification,” Basic to Transl. Sci, vol. 5, no. 4, pp. 328–340, Apr. 2020, doi: 10.1016/j.jacbts.2020.01.015.
  • [6]. Costanzo MR, Dipchand A, Starling R, Anderson A, Chan M, Desai S, Fedson S, Fisher P, Gonzales-Stawinski G, Martinelli L, et al. “The International Society of Heart and Lung Transplantation Guidelines for the care of heart transplant recipients,” J. Heart Lung Transplant, vol. 29, no. 8, pp. 914–956, 2010, doi: 10.1016/j.healun.2010.05.034.
  • [7]. Berry GJ, Angelini A, Burke MM, Bruneval P, Fishbein MC, Hammond E, Miller D, Neil D, Revelo MP, Rodriguez ER, et al. “The ISHLT working formulation for pathologic diagnosis of antibody-mediated rejection in heart transplantation: evolution and current status (2005–2011),” J. Heart Lung Transplant, vol. 30, no. 6, pp. 601–611, 2011.
  • [8]. Stewart S, Winters GL, Fishbein MC, Tazelaar HD, Kobashigawa J, Abrams J, Andersen CB, Angelini A, Berry GJ, Burke MM, et al. “Revision of the 1990 working formulation for the standardization of nomenclature in the diagnosis of heart rejection,” in Journal of Heart and Lung Transplantation, Nov. 2005, vol. 24, no. 11, pp. 1710–1720, doi: 10.1016/j.healun.2005.03.019.
  • [9]. Pham MX, Teuteberg JJ, Kfoury AG, Starling RC, Deng MC, Cappola TP, Kao A, Anderson AS, Cotts WG, Ewald GA, et al. “Gene-Expression Profiling for Rejection Surveillance after Cardiac Transplantation,” N. Engl. J. Med, vol. 362, no. 20, pp. 1890–1900, 2010, doi: 10.1056/nejmoa0912965.
  • [10]. Dandel M, Hummel M, Meyer R, Müller J, Kapell S, Ewert R, & Hetzer R, “Left ventricular dysfunction during cardiac allograft rejection: early diagnosis, relationship to the histological severity grade, and therapeutic implications,” in Transplantation proceedings, 2002, vol. 6, no. 34, pp. 2169–2173.
  • [11]. Kobashigawa JA, “The search for a gold standard to detect rejection in heart transplant patients: are we there yet?,” Circulation, vol. 135, no. 10. Am Heart Assoc, pp. 936–938, 2017.
  • [12]. Peyster EG, Arabyarmohammadi S, Janowczyk A, Azarianpour-Esfahani S, Sekulic M, Cassol C, Blower L, Parwani A, Lal P, Feldman MD, et al. “An automated computational image analysis pipeline for histological grading of cardiac allograft rejection,” Eur. Heart J, vol. 42, no. 24, pp. 2356–2369, Jun. 2021, doi: 10.1093/EURHEARTJ/EHAB241.
  • [13]. Crespo-Leiro MG, Zuckermann A, Bara C, Mohacsi P, Schulz U, Boyle A, Ross HJ, Parameshwar J, Zakliczyński M, Fiocchi R, et al. “Concordance among pathologists in the second cardiac allograft rejection gene expression observational study (CARGO II),” Transplantation, vol. 94, no. 11, pp. 1172–1177, Dec. 2012, doi: 10.1097/TP.0B013E31826E19E2.
  • [14]. Tang Z, Kobashigawa J, Rafiei M, Stern LK, & Hamilton M, “The natural history of biopsy-negative rejection after heart transplantation,” J. Transplant, vol. 2013, pp. 1–6, 2013, doi: 10.1155/2013/236720.
  • [15]. Fishbein MC, & Kobashigawa J, “Biopsy-negative cardiac transplant rejection: Etiology, diagnosis, and therapy,” Curr. Opin. Cardiol, vol. 19, no. 2, pp. 166–169, 2004, doi: 10.1097/00001573-200403000-00018.
  • [16]. Duong Van Huyen JP, Fedrigo M, Fishbein GA, Leone O, Neil D, Marboe C, Peyster E, von der Thüsen J, Loupy A, Mengel M, Revelo MP, et al. “The XVth Banff Conference on Allograft Pathology the Banff Workshop Heart Report: Improving the diagnostic yield from endomyocardial biopsies and Quilty effect revisited,” Am. J. Transplant, vol. 20, no. 12, pp. 3308–3318, Dec. 2020, doi: 10.1111/AJT.16083.
  • [17]. Rodriguez ER, “The pathology of heart transplant biopsy specimens: revisiting the 1990 ISHLT working formulation,” J. Heart Lung Transplant, vol. 22, no. 1, pp. 3–15, 2003.
  • [18]. Winters G, “The challenge of endomyocardial biopsy interpretation in assessing cardiac allograft rejection,” Curr. Opin. Cardiol, vol. 12, no. 2, pp. 146–152, 1997.
  • [19]. Winters GL, Marboe CC, & Billingham ME, “The International Society for Heart and Lung Transplantation grading system for heart transplant biopsy specimens: clarification and commentary,” J. Heart Lung Transplant, vol. 17, no. 8, pp. 754–760, 1998.
  • [20]. Marboe CC, Billingham M, Eisen H, Deng MC, Baron H, Mehra M, Hunt S, Wohlgemuth J, Mahmood I, Prentice J, et al. “Nodular endocardial infiltrates (Quilty lesions) cause significant variability in diagnosis of ISHLT Grade 2 and 3A rejection in cardiac allograft recipients,” J. Heart Lung Transplant, vol. 24, no. 7 SUPPL., pp. 219–226, 2005, doi: 10.1016/j.healun.2005.04.001.
  • [21]. Lipkova J, Chen TY, Lu MY, Chen RJ, Shady M, Williams M, Wang J, Noor Z, Mitchell RN, Turan M, et al. “Deep learning-enabled assessment of cardiac allograft rejection from endomyocardial biopsies,” Nat. Med, vol. 28, no. 3, pp. 575–582, Mar. 2022, doi: 10.1038/S41591-022-01709-2.
  • [22]. Janowczyk A, Zuo R, Gilmore H, Feldman M, Madabhushi A, “HistoQC: an open-source quality control tool for digital pathology slides,” JCO Clin. cancer informatics, vol. 3, pp. 1–7, 2019.
  • [23]. Ruifrok AC, Johnston DA, “Quantification of histochemical staining by color deconvolution,” Anal. Quant. Cytol. Histol, vol. 23, no. 4, pp. 291–299, 2001.
  • [24]. Otsu N, “A threshold selection method from gray-level histograms,” IEEE Trans. Syst. Man. Cybern, vol. 9, no. 1, pp. 62–66, 1979.
  • [25]. Liu Y, Zhu X, Huang Z, Cai J, Chen R, Xiong S, Chen G, & Zeng H, “Texture analysis of collagen second-harmonic generation images based on local difference local binary pattern and wavelets differentiates human skin abnormal scars from normal scars,” J. Biomed. Opt, vol. 20, no. 1, p. 16021, 2015.
  • [26]. Reinhard E, Ashikhmin M, Gooch B, & Shirley P, “Color transfer between images,” IEEE Comput. Graph. Appl, vol. 21, no. 5, pp. 34–41, 2001.
  • [27]. Yuan C, Arabyarmohammadi S, Li H, Peyster E, Lal P, Feldman MD, Margulies KB, & Madabhushi A, “Novel Morphologic Biomarkers of Cardiac Allograft Remodeling are Associated with Multiple Peri- and Post-transplant Inflammatory Processes - ATC Abstracts,” 2021 American Transplant Congress, 2021. https://atcmeetingabstracts.com/abstract/novel-morphologic-biomarkers-of-cardiac-allograft-remodeling-are-associated-with-multiple-peri-and-post-transplant-inflammatory-processes/ (accessed Dec. 16, 2022).
  • [28]. McInnes L, Healy J, & Melville J, “Umap: Uniform manifold approximation and projection for dimension reduction,” arXiv Prepr. arXiv1802.03426, 2018.
  • [29]. Peyster EG, Madabhushi A, & Margulies KB, “Advanced morphologic analysis for diagnosing allograft rejection: the case of cardiac transplant rejection,” Transplantation, vol. 102, no. 8, p. 1230, 2018.
  • [30]. Peyster EG, Janowczyk A, Swamidoss A, Kethireddy S, Feldman MD, & Margulies KB, “Computational Analysis of Routine Biopsies Improves Diagnosis and Prediction of Cardiac Allograft Vasculopathy,” Circulation, vol. 145, no. 21, pp. 1563–1577, 2022, doi: 10.1161/CIRCULATIONAHA.121.058459.
  • [31]. Billingham ME, Cary NRB, Hammond ME, Kemnitz J, Marboe C, McCallister HA, Snovar DC, Winters GL, & Zerbe A, “A working formulation for the standardization of nomenclature in the diagnosis of heart and lung rejection: Heart rejection study group,” J. Heart Transplant, vol. 9, no. 6, pp. 587–592, 1990.
  • [32]. Ahmedt-Aristizabal D, Armin MA, Denman S, Fookes C, & Petersson L, “A survey on graph-based deep learning for computational histopathology,” Comput. Med. Imaging Graph, vol. 95, p. 102027, Jan. 2022, doi: 10.1016/J.COMPMEDIMAG.2021.102027.
  • [33]. Ali S, Veltri R, Epstein J, Christudass C, Madabhushi Sahirzeeshan Ali A, Epstein JA, & Madabhushi A, “Cell cluster graph for prediction of biochemical recurrence in prostate cancer patients from tissue microarrays,” 10.1117/12.2008695, vol. 8676, pp. 164–174, Mar. 2013, doi: 10.1117/12.2008695.
