Abstract
In kidney transplantation, day-zero biopsies are used to assess organ quality and to discriminate between donor-inherited lesions and those acquired post-transplantation. However, many centers do not perform such biopsies because they are invasive, costly, and may delay the transplant procedure. We aim to generate a non-invasive virtual biopsy system using routinely collected donor parameters. Using 14,032 day-zero kidney biopsies from 17 international centers, we develop a virtual biopsy system. Eleven basic donor parameters are used to predict four Banff kidney lesions: arteriosclerosis, arteriolar hyalinosis, interstitial fibrosis and tubular atrophy, and the percentage of renal sclerotic glomeruli. Six machine learning models are aggregated into an ensemble model. The virtual biopsy system shows good performance in the internal and external validation sets, and we confirm its generalizability in various scenarios. This system could assist physicians in assessing organ quality and optimizing allograft allocation, as well as in discriminating between donor-derived and post-transplant acquired lesions.
Subject terms: Renal replacement therapy, Epidemiology, Pathology
Despite being recommended, day-zero biopsies are often not performed because of their cost and the time they require. Here, the authors show that machine learning and a donor’s basic parameters can predict the biopsy results, offering a reliable virtual estimation of day-zero biopsy findings.
Introduction
In medicine, biopsy has become a standard test for establishing a diagnosis for both malignant and benign tumors as well as characterizing inflammatory diseases and other pathologic processes, thereby guiding therapeutic management1.
In transplant medicine, organ biopsy has been performed since the pioneering work of Barry et al. and of Hamburger in Paris, and it has become the gold standard for diagnosing allograft rejection and various other pathological processes that harm the allograft2,3. Histological evaluation of donor organs, also called “day-zero biopsies,” has been implemented in several transplant programs4–6 to judge the quality of a donor organ and, on occasion, to rule out underlying diseases in donors7. In addition, day-zero biopsies provide a valuable baseline against which the findings of subsequent biopsies of the kidney allograft can be compared, and they may also inform therapeutic strategies8,9.
Despite their potential usefulness, day-zero biopsies are still not performed at many transplant centers, or only in specific situations10,11, because they remain invasive, time-consuming, and costly procedures that require the organization of surgical, medical, pathological, and technical resources and might increase cold ischemia time, which is associated with worse outcomes12. In addition, as we previously reported, organ quality assessment has become ever more important with the current worldwide increase in transplantation from older donors, donors after circulatory death, and donors with significant clinical risk factors, as it helps optimize the use of these kidneys and improve transplant outcomes13–15. At the time of transplantation, these vulnerable organs may carry arteriosclerosis, fibrosis, hyalinosis, and glomerulosclerosis lesions16. If such lesions are identified in a post-transplant biopsy without a day-zero biopsy for comparison, their non-specificity means they might be wrongly attributed to calcineurin inhibitor toxicity, infectious diseases, or an allo-immune response, with significant impact on decision-making and patient management6–8.
To circumvent these limitations, we designed a study to develop and validate a non-invasive virtual biopsy system that uses routinely collected donor parameters to predict kidney day-zero biopsy findings, helping physicians guide diagnostics, therapeutics, and immediate post-transplant patient management. The virtual biopsy system, an artificial intelligence model, provides virtual estimates of the results that would have been obtained had a biopsy been performed. Since machine learning has demonstrated clinical relevance in several medical specialties, with discriminative performance comparable to that of logistic regression17–20, we based our analyses on machine learning methods, using a large, well-characterized international cohort of donors who underwent routine, protocolized collection of donor parameters together with day-zero biopsy assessment according to the standards of the international Banff classification of allograft histopathology21.
Results
Baseline characteristics of the derivation cohort
We included a total of 12,402 day-zero biopsies from the 15 participating transplant centers in the derivation cohort. The mean donor age was 46.7 ± 14.9 (standard deviation, SD) years; 5450 (44.0%) donors were female, and 9395 (75.8%) were deceased donors. The mean serum creatinine was 1.2 ± 1.0 mg/dL. Baseline characteristics of the derivation cohort by country are shown in Table 1. The population is described in detail in Supplementary Method 1. Baseline characteristics of the derivation cohort stratified by center are described in Supplementary Table 1.
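Table 1. Baseline donor characteristics and day-zero biopsy findings of the derivation cohort, by country
Characteristic | Overall (n = 12,402) | France (n = 2594) | USA (n = 5744) | Canada (n = 1578) | Australia (n = 370) | Belgium (n = 864) | Spain (n = 799) | Croatia (n = 453) |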
---|---|---|---|---|---|---|---|---|
Age (years), mean (SD) | 46.7 (14.9) | 52.0 (16.2) | 43.6 (13.7) | 42.0 (13.5) | 46.9 (14.4) | 46.2 (12.8) | 61.1 (11.2) | 47.8 (12.2) |
Sex female, No. (%) | 5450 (44.0%) | 1072 (41.3%) | 2581 (44.9%) | 757 (48.0%) | 187 (51.9%) | 386 (44.7%) | 281 (35.2%) | 186 (41.1%) |
Donor type | ||||||||
Deceased donor, No. (%) | 9395 (75.8%) | 2528 (97.5%) | 3402 (59.2%) | 1065 (67.5%) | 284 (76.8%) | 864 (100.0%) | 799 (100.0%) | 453 (100.0%) |
Death from circulatory disease, No. (%)a | 1471 (15.7%) | 195 (7.7%) | 531 (15.8%) | 126 (11.8%) | 65 (23.0%) | 225 (26.0%) | 329 (41.2%) | 0 (0.0%) |
Death from cerebrovascular disease, No. (%)a | 4001 (42.8%) | 1391 (55.0%) | 942 (28.0%) | 326 (30.6%) | 113 (40.5%) | 433 (50.1%) | 513 (64.2%) | 283 (62.5%) |
Diabetes mellitus, No. (%) | 782 (7.4%) | 175 (6.9%) | 428 (8.5%) | 48 (3.6%) | 10 (2.7%) | 3 (3.3%) | 111 (14.2%) | 7 (1.5%) |
Hypertension, No. (%) | 2375 (21.1%) | 613 (24.9%) | 916 (18.1%) | 145 (11.1%) | 47 (12.8%) | 122 (14.4%) | 407 (52.2%) | 125 (27.6%) |
BMI (kg/m²), mean (SD) | 26.9 (5.5) | 25.2 (4.7) | 28.1 (6.0) | 26.4 (5.3) | 26.8 (5.7) | 25.3 (4.1) | 27.6 (5.0) | 26.3 (3.6) |
HCV status, No. (%) | 233 (1.9%) | 34 (1.4%) | 180 (3.2%) | 16 (1.1%) | 0 (0.0%) | 0 (0.0%) | 3 (0.4%) | 0 (0.0%) |
Creatinine (mg/dL), mean (SD) | 1.2 (1.0) | 1.0 (0.6) | 1.6 (1.3) | 1.0 (0.6) | 0.8 (0.3) | 0.8 (0.5) | 1.0 (0.5) | 0.9 (0.4) |
Proteinuria, No. (%) | 1904 (20.7%) | 1101 (49.2%) | 317 (6.2%) | 266 (35.6%) | 3 (3.7%) | 42 (42.4%) | 83 (17.9%) | 92 (20.3%) |
Number of Glomeruli, mean (SD) | 39.3 (33.5) | 21.5 (15.5) | 65.0 (41.4) | 32.9 (24.8) | 38.2 (17.4) | 27.6 (16.7) | N/A | 57.0 (33.9) |
Arteriosclerosis (cv) Banff score, No. (%) | ||||||||
0 | 7073 (60.2%) | 915 (36.9%) | 3697 (65.6%) | 1007 (73.9%) | 81 (51.3%) | 765 (88.5%) | 305 (38.3%) | 303 (66.9%) |
1 | 3105 (26.4%) | 818 (33.0%) | 1335 (23.7%) | 263 (19.3%) | 48 (30.4%) | 76 (8.8%) | 425 (53.3%) | 140 (30.9%) |
2 | 1325 (11.3%) | 645 (26.0%) | 482 (8.5%) | 89 (6.5%) | 18 (11.4%) | 22 (2.5%) | 63 (7.9%) | 6 (1.3%) |
3 | 252 (2.1%) | 103 (4.2%) | 125 (2.2%) | 4 (0.3%) | 11 (7.0%) | 1 (0.1%) | 4 (0.5%) | 4 (0.9%) |
Arteriolar hyalinosis (ah) Banff score, No. (%) | ||||||||
0 | 8242 (68.8%) | 1010 (39.8%) | 4857 (86.8%) | 831 (54.2%) | 280 (79.3%) | 640 (74.2%) | 375 (59.2%) | 249 (55.0%) |
1 | 2546 (21.3%) | 950 (37.4%) | 548 (9.8%) | 415 (27.1%) | 63 (17.8%) | 178 (20.6%) | 214 (33.8%) | 178 (39.3%) |
2 | 968 (8.1%) | 462 (18.2%) | 141 (2.5%) | 251 (16.4%) | 8 (2.3%) | 41 (4.8%) | 42 (6.6%) | 23 (5.1%) |
3 | 217 (1.8%) | 117 (4.6%) | 52 (0.9%) | 37 (2.4%) | 2 (0.6%) | 4 (0.5%) | 2 (0.3%) | 3 (0.7%) |
Interstitial fibrosis and tubular atrophy (IFTA) Banff score, No. (%) | ||||||||
0 | 7822 (64.4%) | 1594 (62.0%) | 4072 (71.9%) | 830 (58.1%) | 328 (88.6%) | 648 (75.0%) | 185 (23.2%) | 165 (36.4%) |
1 | 3647 (30.0%) | 806 (31.3%) | 1229 (21.7%) | 549 (38.4%) | 37 (10.0%) | 198 (22.9%) | 572 (71.6%) | 256 (56.5%) |
2 | 562 (4.6%) | 131 (5.1%) | 293 (5.2%) | 48 (3.4%) | 4 (1.1%) | 14 (1.6%) | 41 (5.1%) | 31 (6.8%) |
3 | 117 (1.0%) | 41 (1.6%) | 68 (1.2%) | 1 (0.1%) | 1 (0.3%) | 4 (0.5%) | 1 (0.1%) | 1 (0.2%) |
Glomerulosclerosis (%), median (interquartile range) | 3.0 (0.0–10.0) | 5.9 (0.0–13.3) | 0.0 (0.0–6.0) | 4.8 (0.0–9.5) | 3.9 (0.0–9.1) | 0.0 (0.0–8.3) | 7.4 (3.1–7.4) | 3.3 (0.0–7.7) |
Proteinuria was considered positive when the dipstick reading was ≥ 1 or the urine protein-to-creatinine ratio (UPCR) was ≥ 0.5 g/g.
BMI body mass index, HCV hepatitis C virus.
aNumber and % were calculated among deceased donors.
Kidney histology lesions in the derivation cohort
Table 1 depicts the day-zero kidney biopsy findings of the derivation cohort. The median percentage of glomerulosclerosis was 3.0% (interquartile range, IQR 0.0–10.0). The arteriosclerosis (Banff cv score) distribution was 60.2%, 26.4%, 11.3%, and 2.1% for None (Banff score 0), Mild (Banff score 1), Moderate (Banff score 2), and Severe (Banff score 3), respectively. The arteriolar hyalinosis (Banff ah score) distribution was 68.8%, 21.3%, 8.1%, and 1.8% for scores 0, 1, 2, and 3, respectively. Finally, the interstitial fibrosis and tubular atrophy (Banff IFTA score) distribution was 64.4%, 30.0%, 4.6%, and 1.0% for scores 0, 1, 2, and 3, respectively. Most moderate or severe (score 2 or 3) cv, ah, and IFTA lesions were from deceased donors (Supplementary Table 2).
Kidney virtual biopsy system development
Missing values were imputed separately in the derivation and external cohorts, and the data were then pre-processed (Supplementary Tables 3, 4). We tuned and generated the best-performing models for predicting the lesion scores from the donor parameters; details of the hyperparameter tuning are available in Supplementary Table 5. We then generated an ensemble model that combines these models. For each biopsy lesion score, the ensemble model was selected as the virtual biopsy system (see Methods).
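The imputation method itself is not stated in this section. Purely as an illustration of the "separately by cohort" principle, the sketch below assumes median imputation for continuous columns and mode imputation for 0/1 columns, with the fill values computed inside each cohort so that no derivation-cohort statistics leak into the external cohorts. Column names and values are hypothetical.

```python
import numpy as np

def impute_within_cohort(columns, binary):
    """Fill NaNs column by column using statistics from this cohort only.

    columns: dict mapping a (hypothetical) column name to a 1-D float array.
    binary: set of column names holding 0/1 indicators (mode imputation);
            all other columns are treated as continuous (median imputation).
    """
    imputed = {}
    for name, values in columns.items():
        values = np.asarray(values, dtype=float)
        observed = values[~np.isnan(values)]
        if name in binary:
            # Mode of a 0/1 indicator: 1 if at least half of observed values are 1.
            fill = float(observed.mean() >= 0.5)
        else:
            fill = float(np.median(observed))
        imputed[name] = np.where(np.isnan(values), fill, values)
    return imputed

# Each cohort is imputed on its own, never with the other cohort's statistics.
derivation = {"age": [40.0, np.nan, 60.0], "hypertension": [1.0, np.nan, 1.0]}
external = {"age": [30.0, 50.0, np.nan], "hypertension": [0.0, 0.0, np.nan]}
derivation_imp = impute_within_cohort(derivation, binary={"hypertension"})
external_imp = impute_within_cohort(external, binary={"hypertension"})
```

Fitting the imputer within each cohort mirrors how the model would meet a new center's data in practice, where only that center's own statistics are available.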
Donor parameters’ relative importance on lesion prediction
We examined the importance of the 11 donor parameters used to develop the virtual biopsy system by averaging the importance produced by the models (Fig. 1). Overall, the three most important predictive parameters for the biopsy lesions were age, serum creatinine, and body mass index (BMI), followed by hypertension and a cerebrovascular cause of death.
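How the per-model importances were put on a common scale before averaging is not detailed here. A minimal sketch, assuming each model's raw importances are first rescaled to 0–100 (so that models reporting importance in different units contribute equally) and then averaged per parameter; the model names and numbers are hypothetical.

```python
import numpy as np

def averaged_importance(importances_by_model):
    """Average per-model importances after rescaling each model to a 0-100 range.

    importances_by_model: dict mapping model name -> {parameter: raw importance}.
    The 0-100 rescaling is an assumption of this sketch, not the paper's stated method.
    """
    params = sorted(next(iter(importances_by_model.values())))
    scaled = []
    for raw in importances_by_model.values():
        v = np.array([raw[p] for p in params], dtype=float)
        span = v.max() - v.min()
        scaled.append(100.0 * (v - v.min()) / span if span > 0 else np.zeros_like(v))
    mean = np.mean(scaled, axis=0)
    # Rank parameters from most to least important.
    return sorted(zip(params, mean), key=lambda item: item[1], reverse=True)

# Hypothetical importances on two different raw scales.
ranking = averaged_importance({
    "model_a": {"age": 10, "creatinine": 6, "bmi": 2},
    "model_b": {"age": 0.9, "creatinine": 0.5, "bmi": 0.1},
})
```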
Model prediction performance on derivation cohort
During cross-validation, the ensemble models achieved Hand and Till’s multi-class areas under the curve (multi-AUCs) of 0.833 (SD 0.013), 0.773 (0.020), and 0.830 (0.027) for cv, ah, and IFTA lesions, respectively. Additionally, they achieved areas under the receiver operating characteristic curve (AUROCs) of 0.880 (0.016), 0.823 (0.019), and 0.900 (0.023) for cv, ah, and IFTA lesions, respectively (Fig. 2). The ensemble models’ cut-offs were chosen to maximize Youden’s J statistic. With the resulting cut-offs of 0.582 for cv, 0.596 for ah, and 0.637 for IFTA, the balanced accuracies (mean of sensitivity and specificity) were 0.786 (0.021) for cv, 0.736 (0.021) for ah, and 0.813 (0.024) for IFTA. For the glomerulosclerosis lesion, the mean absolute error (MAE) was 5.999 (0.032) and the root mean square error (RMSE) was 8.888 (0.059). The ensemble models and the random forest models showed comparable performance. Table 2 summarizes the performance of all generated models. Detailed cross-validation results are available in Supplementary Table 6, and calibration is shown as confusion matrices for each model in Supplementary Table 7.
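Hand and Till’s multi-AUC averages a pairwise AUC over all pairs of classes. For classes i and j, Â(i, j) = [A(i|j) + A(j|i)]/2, where A(i|j) is the probability that a random class-i biopsy receives a higher predicted probability of class i than a random class-j biopsy; the multi-AUC is M = 2/(c(c − 1)) Σ_{i<j} Â(i, j) over the c classes (here, Banff scores 0–3). The sketch below implements that definition directly; it assumes class labels equal the column indices of the probability matrix and is quadratic in class sizes, so a rank-based version would be preferable for very large cohorts.

```python
import numpy as np
from itertools import combinations

def pairwise_auc(scores_pos, scores_neg):
    """Mann-Whitney estimate of P(score_pos > score_neg), counting ties as 1/2."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def hand_till_multi_auc(y, prob):
    """Hand and Till (2001) multi-class AUC.

    y: integer labels (e.g. Banff scores 0-3), used as column indices into prob.
    prob: (n, c) matrix of predicted class probabilities.
    """
    y = np.asarray(y)
    prob = np.asarray(prob, dtype=float)
    pair_aucs = []
    for i, j in combinations(np.unique(y), 2):
        in_i, in_j = y == i, y == j
        # A(i|j): class-i biopsies should score higher on p(i|x) than class-j biopsies.
        a_ij = pairwise_auc(prob[in_i, i], prob[in_j, i])
        a_ji = pairwise_auc(prob[in_j, j], prob[in_i, j])
        pair_aucs.append((a_ij + a_ji) / 2)
    # The plain mean over the c(c-1)/2 pairs equals the 2/(c(c-1)) normalization.
    return float(np.mean(pair_aucs))
```

A perfect classifier scores 1.0, and uninformative (constant) probabilities score 0.5.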
Table 2. Cross-validated performance of all models in the derivation cohort
Model | Multi-AUC, arteriosclerosis (cv Banff score) | Multi-AUC, arteriolar hyalinosis (ah Banff score) | Multi-AUC, interstitial fibrosis and tubular atrophy (IFTA Banff score) | MAE, glomerulosclerosis (%) |
---|---|---|---|---|
Random Forest | 0.836 | 0.774 | 0.830 | 5.807 |
Gradient Boosting Machine | 0.807 | 0.750 | 0.805 | 6.486 |
Extreme Gradient Boosting Tree | 0.830 | 0.767 | 0.827 | 5.768 |
Linear Discriminant Analysisᵃ | 0.761 | 0.703 | 0.750 | N/Aᵃ |
Model Averaged Neural Network | 0.777 | 0.720 | 0.757 | 6.573 |
Multinomial Logistic Regressionᵃ | 0.763 | 0.706 | 0.753 | N/Aᵃ |
Ensemble Model | 0.833 | 0.773 | 0.830 | 5.999 |
The models used for the ordinal scores (multiclass classification) were random forest, gradient boosting machine, extreme gradient boosting tree, linear discriminant analysis, model averaged neural network, and multinomial logistic regression. The models used for the percentage of glomerulosclerosis (regression) were random forest, gradient boosting machine, extreme gradient boosting tree, and model averaged neural network. Finally, we created ensemble models: for the ordinal day-zero lesion scores, we averaged the predicted probabilities of the six models; for the percentage of glomerulosclerosis, we used a linear regression on the four models’ predictions. Performance was assessed by Hand and Till’s multi-class area under the curve (multi-AUC) for the ordinal lesion scores and by mean absolute error (MAE) for the percentage of glomerulosclerosis, using 10-fold cross-validation repeated three times (30 resamples). The ensemble models were selected as the virtual biopsy system.
AUC area under the curve (higher is better). MAE mean absolute error (lower is better).
ᵃLinear discriminant analysis and multinomial logistic regression are classification methods and were not used for the regression task.
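The two ensembling rules in the note above can be sketched as follows. Probability averaging is unweighted, as described. For the glomerulosclerosis ensemble, the sketch assumes an ordinary least-squares fit with an intercept on out-of-fold base-model predictions, which is one common way to stack regressors but is not confirmed by this section.

```python
import numpy as np

def average_probabilities(prob_list):
    """Classification ensemble: element-wise mean of each model's (n, c) probability matrix."""
    stacked = np.stack([np.asarray(p, dtype=float) for p in prob_list])  # (models, n, c)
    return stacked.mean(axis=0)  # rows still sum to 1 if every input row does

def fit_linear_stack(base_preds, y):
    """Regression ensemble: least-squares weights plus intercept over base-model predictions.

    base_preds: (n, m) matrix, one column per base model (assumed out-of-fold predictions).
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(base_preds, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef  # [intercept, w_1, ..., w_m]

def predict_linear_stack(coef, base_preds):
    """Apply fitted stacking weights to new base-model predictions."""
    X = np.column_stack([np.ones(len(base_preds)), np.asarray(base_preds, dtype=float)])
    return X @ coef
```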
External validation of the virtual biopsy system
We included a total of 1630 day-zero biopsies from the USA and China for external validation (Supplementary Method 1). Comparisons with the derivation cohort and the baseline donor characteristics are available in Supplementary Tables 8, 9. The median percentage of glomerulosclerosis was 2.1% (IQR 0.0–12.5). The cv lesion score distribution was 27.9%, 33.9%, 36.3%, and 1.9% for Banff scores None (Banff score 0), Mild (Banff score 1), Moderate (Banff score 2), and Severe (Banff score 3), respectively. The ah lesion score distribution was 53.8%, 38.4%, 6.4%, and 1.4% for scores 0 to 3, respectively, and the IFTA score distribution was 40.4%, 30.7%, 28.7%, and 0.2%, respectively. As in the derivation cohort, most moderate or severe (score 2 or 3) cv, ah, and IFTA lesions were from deceased donors (Supplementary Table 10).
In the Columbia University cohort, the ensemble models achieved multi-AUCs of 0.740 (95% confidence interval [CI] 0.711–0.768), 0.733 (0.694–0.778), and 0.723 (0.705–0.772) and AUROCs of 0.880 (0.862–0.896), 0.922 (0.882–0.955), and 0.905 (0.889–0.920) for cv, ah, and IFTA lesions, respectively. With the same cut-offs obtained from internal validation, the balanced accuracies (mean of sensitivity and specificity) were 0.787 (0.764–0.808), 0.808 (0.741–0.872), and 0.843 (0.824–0.862) for cv, ah, and IFTA, respectively. For glomerulosclerosis, the ensemble model yielded an MAE of 5.200 (4.971–5.422) and an RMSE of 6.630 (6.339–6.908).
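The cut-off logic described above (choose the threshold that maximizes Youden’s J during internal validation, then reuse it unchanged on an external cohort) can be sketched as below. This is a minimal illustration assuming a binary target and one predicted probability per biopsy; how the ordinal Banff scores were dichotomized for the AUROC and balanced accuracy is not specified in this section, so the labels and scores here are hypothetical.

```python
import numpy as np

def youden_cutoff(y_true, score):
    """Threshold maximizing Youden's J = sensitivity + specificity - 1.

    A biopsy is predicted positive when score >= threshold; every observed score is tried.
    """
    y_true = np.asarray(y_true, dtype=bool)
    score = np.asarray(score, dtype=float)
    best_t, best_j = None, -np.inf
    for t in np.unique(score):
        pred = score >= t
        j = np.mean(pred[y_true]) + np.mean(~pred[~y_true]) - 1
        if j > best_j:
            best_t, best_j = t, j
    return float(best_t)

def balanced_accuracy(y_true, score, threshold):
    """Mean of sensitivity and specificity at a fixed, previously chosen threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    pred = np.asarray(score, dtype=float) >= threshold
    return float((np.mean(pred[y_true]) + np.mean(~pred[~y_true])) / 2)

# Cut-off chosen on (hypothetical) internal data, then frozen for external data.
cutoff = youden_cutoff([0, 0, 1, 1], [0.1, 0.4, 0.6, 0.9])
external_bacc = balanced_accuracy([0, 1, 1, 0], [0.5, 0.7, 0.55, 0.65], cutoff)
```

Freezing the cut-off is what makes the external balanced accuracy an honest estimate: re-optimizing it on the external cohort would inflate the result.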
In the Sun Yat-sen University cohort, the ensemble models achieved multi-AUCs of 0.740 (95% CI 0.663–0.807), 0.736 (0.654–0.821), and 0.798 (0.731–0.839) and AUROCs of 0.902 (0.783–0.978), 0.895 (0.825–0.950), and 0.935 (0.867–0.985) for cv, ah, and IFTA lesions, respectively. Using the same cut-offs obtained from internal validation, the balanced accuracies were 0.760 (0.578–0.950), 0.840 (0.762–0.899), and 0.797 (0.638–0.959) for cv, ah, and IFTA lesions, respectively. For glomerulosclerosis, the ensemble model yielded an MAE of 4.608 (4.229–4.989) and an RMSE of 5.731 (5.269–6.197).
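This section reports 95% CIs for the external metrics without stating how they were obtained. A percentile bootstrap over biopsies is one common approach; the sketch below assumes it, purely for illustration, for the glomerulosclerosis MAE and RMSE (in percentage points). The data are hypothetical.

```python
import numpy as np

def mae(y, pred):
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(pred))))

def rmse(y, pred):
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(pred)) ** 2)))

def bootstrap_ci(metric, y, pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample biopsies with replacement, recompute the metric."""
    rng = np.random.default_rng(seed)
    y, pred = np.asarray(y, dtype=float), np.asarray(pred, dtype=float)
    n = len(y)
    stats = [metric(y[idx], pred[idx]) for idx in (rng.integers(0, n, n) for _ in range(n_boot))]
    low, high = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return metric(y, pred), float(low), float(high)

# Hypothetical observed vs predicted glomerulosclerosis percentages.
observed = [0.0, 10.0, 20.0, 30.0]
predicted = [1.0, 8.0, 23.0, 30.0]
est, low, high = bootstrap_ci(mae, observed, predicted)
```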
Figure 2 summarizes the performance of the ensemble models. Calibration in the external validation cohorts is shown in Supplementary Table 11.
Validation of the virtual biopsy system in various scenarios
We evaluated the robustness of the virtual biopsy system across subpopulations and clinical scenarios during internal cross-validation, including (i) region (Europe, North America, or Australia), (ii) donor ethnicity (African American, Caucasian, and others [Hispanic, Asian, and Arabic]), (iii) donor criteria (extended criteria donors vs standard criteria donors plus living donors), and (iv) biopsy type (preimplantation vs postreperfusion). Overall, the system performed well across these subpopulations. These analyses are depicted in Supplementary Table 12.
Pathologists’ biopsy findings reliability
We assessed inter-pathologist agreement among four expert nephropathologists from Necker Hospital and Mayo Clinic who evaluated the biopsy findings, obtaining Fleiss’ kappas of 0.68 (95% CI 0.63–0.73), 0.59 (0.53–0.65), and 0.51 (0.44–0.59) for cv, ah, and IFTA lesions, respectively. The overall Fleiss’ kappa for all lesions was 0.63 (0.60–0.66).
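Fleiss’ kappa extends Cohen’s kappa to more than two raters. With N biopsies, n raters per biopsy, and k score categories, it compares the mean per-biopsy agreement P̄ with the agreement expected by chance P̄ₑ: κ = (P̄ − P̄ₑ)/(1 − P̄ₑ). A minimal sketch, assuming every biopsy was scored by the same number of pathologists; the input tables are hypothetical and the confidence intervals reported above are not reproduced here.

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa from an (N biopsies x k categories) table of rating counts.

    counts[i, j] = number of raters assigning biopsy i to score j; every row must sum
    to the same number of raters n. Undefined if all ratings fall in one category.
    """
    counts = np.asarray(counts, dtype=float)
    n_subjects = counts.shape[0]
    n_raters = counts[0].sum()
    assert np.allclose(counts.sum(axis=1), n_raters), "each biopsy needs the same number of ratings"
    p_j = counts.sum(axis=0) / (n_subjects * n_raters)  # share of all ratings per category
    p_i = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))  # per-biopsy agreement
    p_bar, p_e = p_i.mean(), np.sum(p_j ** 2)
    return float((p_bar - p_e) / (1 - p_e))

# Four pathologists, two biopsies, Banff scores 0-1: perfect agreement gives kappa = 1.
print(fleiss_kappa([[4, 0], [0, 4]]))
```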
Performance of kidney donor profile index (KDPI) score
For the KDPI analysis, the derivation cohort included 4241 biopsies and the external validation cohort comprised 1124 biopsies (920 from Columbia University Medical Center and 204 from Sun Yat-sen University). The mean KDPI was 53.43 (SD 29.49) in the derivation cohort and 63.24 (SD 26.63) in the external validation cohort.
Supplementary Table 13 shows model performance with KDPI as a parameter. During internal validation, the KDPI-based model achieved multi-AUCs of 0.688, 0.644, and 0.716 for cv, ah, and IFTA lesions, respectively, and an MAE of 6.647 for glomerulosclerosis. During external validation, it achieved multi-AUCs for cv, ah, and IFTA of 0.625, 0.668, and 0.638, respectively, in the Columbia University cohort, and 0.659, 0.552, and 0.710, respectively, in the Sun Yat-sen University cohort.
Virtual biopsy system online application for physicians
Based on these results, we built a ready-to-use online application that gives physicians open access to the virtual day-zero biopsy system (Supplementary Movie 1). The application allows physicians to enter a single donor’s data and obtain (i) the personalized probabilities of each day-zero histological lesion score and (ii) a radar-chart visualization of the prediction. The application is available online: https://transplant-prediction-system.shinyapps.io/Virtual_Biopsy_System. Figure 3 and Supplementary Fig. 1 provide examples of application usage in clinical practice, using real donor clinical cases. The potential clinical utility and impact of this application are illustrated in Supplementary Fig. 2.
Discussion
In this international, multicohort study of kidney transplant biopsies from 17 centers worldwide, including the largest Organ Procurement Organization (OPO) in the USA, with biopsies labeled by expert kidney pathologists, we derived and validated a virtual biopsy system that uses non-invasive, routinely collected donor parameters to predict kidney histological lesions. The virtual biopsy system consists of four ensemble models, one per biopsy lesion, each aggregating several machine learning algorithms (six for the ordinal lesion scores and four for glomerulosclerosis) to reduce bias and maximize generalizability. Overall, the virtual biopsy system showed good discrimination, calibration, robustness, and generalizability across countries, external validation cohorts, and clinical scenarios.
Over the past decade, the use of kidneys from older donors with comorbidities has expanded the donor pool, raising the question of whether pathological examination of donated kidneys could help better characterize organ quality or instead drive inefficiencies in organ allocation22. Additionally, the biopsy must be performed and interpreted by trained experts, which is difficult to guarantee around the clock23. Furthermore, in the USA, the United Network for Organ Sharing organ allocation policy recommends the use of KDPI, day-zero biopsy results, and donor characteristics to assess organ quality before transplantation. Despite its importance, the time lost to this procedure can be critical when the biopsy result is used for allocation, as every additional hour of cold ischemia time is strongly associated with worse graft outcomes. Therefore, many centers are discouraged from performing day-zero biopsies because they remain invasive, time-consuming procedures that can increase cold ischemia time10,11.
Our literature search (Supplementary Method 2) revealed a dearth of studies addressing a virtual biopsy that evaluates the presence and severity of biopsy lesions from non-invasive factors such as donor parameters. Meanwhile, non-invasive diagnosis using machine learning has been studied. Yin et al. demonstrated the potential of multiple machine learning classifiers to distinguish histological features in bladder tumor images24. Prediction of kidney biopsy findings has been explored predominantly from histological images using deep learning. In 2018, Marsh et al. developed a convolutional neural network model to identify and classify glomerulosclerosis in day-zero kidney biopsies, improving pre-transplant evaluation25. Hara et al. showcased a U-Net-based segmentation model for classifying normal and abnormal tubules in kidney biopsies26. However, there remains a need to compensate for the absence of day-zero biopsies for kidney allografts by virtually assessing the presence and severity of biopsy lesions from non-invasive donor parameters.
In this context, we believe that the virtual biopsy system has many potential implications. First, it not only predicts the presence of lesions (binary classification) but also predicts the severity grades of the lesion (multiclass classification), which fosters a more complete clinical interpretation.
Second, the virtual biopsy system can help physicians evaluate and contextualize post-transplant lesions, which might be inherited from the donor or acquired after transplantation; this could reinforce precision medicine and the monitoring of these nonspecific histological lesions, and help guide therapeutics27–30.
Third, since the virtual biopsy system is trained on high-quality data and biopsies labeled by expert kidney pathologists, its inferences are highly reliable. Because day-zero biopsy labeling depends on the skills and experience of the observer (e.g., general or kidney pathologist) and on temporal settings, the virtual biopsy system may partly address current issues of the histology-based Banff classification, such as inter-observer variability and limited reproducibility in labeling biopsy findings. Additionally, it may be of great interest to many centers, especially in developing countries, that cannot yet afford either digital pathology with whole-slide imaging or day-zero biopsies owing to a lack of resources.
Fourth, by replacing the standard-of-care day-zero biopsy procedure with a virtual biopsy based on basic donor characteristics, the system could decrease cold ischemia time and the mobilization of team resources. Removing this step could shorten allocation time and improve graft outcomes31. This could be achieved by running the virtual biopsy system before organ retrieval (procurement) to provide physicians with a reliable surrogate of the true day-zero biopsy.
Fifth, the virtual biopsy system may be attractive for clinical trials, helping to improve patient randomization at the time of transplantation by using not only baseline characteristics but also the chronic lesions of kidney donors, thereby avoiding selection bias. Moreover, the efficacy of a new treatment is very often assessed with protocol biopsies, in which chronic lesions such as fibrosis and arteriosclerosis can be found. Because antibody-mediated rejection or immunosuppressive toxicity can induce those lesions27–29, knowing their origin (inherited from the donor or a consequence of treatment inefficacy) is crucial to avoid misinterpreting the findings and discarding potentially useful treatments6,8.
Last, although rapid improvements in computing power and large digitized medical records have led many researchers to attempt integrative approaches to explore unknown fields of medicine17,32,33, it is still difficult for health professionals to use these tools in real life. Because the virtual biopsy system is intended to be more than a proof of concept, we built an easy-to-use online application to support physicians and reinforce applicability; this application is available immediately. Beyond transplantation, the idea of a virtual biopsy system, using routinely accessible parameters to predict biopsy findings with the power of algorithms, can readily be transferred to other fields of medicine that have a comparable need to predict specific lesions for an enhanced interpretation of patient prognosis.
Our study has numerous strengths, but we also acknowledge the following limitations.
First, owing to the multicenter nature of the study, interobserver variability in labeling biopsy findings, together with differences in practices and procedures, may have caused compatibility issues and affected the study results23. However, four pathologists reassessed 10% of the biopsies and showed that this variability was limited, supporting the reliability of the biopsy findings. Besides, our data collection followed high-quality structured protocols to ensure compatibility across study centers. Second, the large number of centers introduced some heterogeneity in biopsy techniques, tissue processing, and staining; this reflects differing practices worldwide and concerns only a limited part of the derivation cohort (<7% of biopsies used only hematoxylin and eosin staining or only frozen tissue). This heterogeneity should, in turn, make the model more generalizable and robust across settings. Third, wedge biopsies are more likely to capture only subcapsular tissue, which could underestimate the extent of vascular intimal thickening or overestimate glomerulosclerosis; this could have biased the cv and glomerulosclerosis scores34,35. However, most biopsies included came from centers that used core needle rather than wedge biopsies. Additionally, to limit this issue, we included only centers with large kidney transplant volumes and a relatively low proportion of inadequate biopsies (7.2%) compared with the literature (30%)36. Fourth, additional predictors beyond the 11 donor parameters used to derive the virtual biopsy, such as gene expression or new biomarkers, may improve its performance. However, the parameters used in this study are the most commonly accessible, and including less standard ones would likely increase missing data and thereby reduce generalizability.
Last, other resampling methods, such as nested cross-validation, may provide more precise estimates of prediction performance. However, given the large derivation cohort drawn from heterogeneous data sources, we considered 10-fold cross-validation repeated three times adequate for internal validation37. Moreover, we assessed the models in subpopulations and various clinical scenarios. Finally, we showed that model performance was comparable in internal and external validations.
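The resampling scheme named above (10 folds, repeated three times, 30 resamples in total) can be sketched without any machine learning library. Stratifying the folds by lesion score is an assumption of this sketch, chosen because severe scores are rare (1.0–2.1% score 3 overall in Table 1); the authors’ exact resampling tool is not stated in this section.

```python
import numpy as np

def repeated_stratified_kfold(y, n_splits=10, n_repeats=3, seed=0):
    """Yield (train_idx, test_idx) for n_repeats x n_splits resamples (3 x 10 = 30 by default).

    Within each repeat, the indices of every class (e.g. each Banff score) are shuffled and
    dealt round-robin into the folds, so rare scores are spread across as many folds as possible.
    """
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    all_idx = np.arange(len(y))
    for _ in range(n_repeats):
        fold_of = np.empty(len(y), dtype=int)
        offset = 0
        for cls in np.unique(y):
            idx = rng.permutation(np.flatnonzero(y == cls))
            fold_of[idx] = (np.arange(len(idx)) + offset) % n_splits
            offset += len(idx)  # continue the round-robin so fold sizes stay balanced
        for k in range(n_splits):
            yield all_idx[fold_of != k], all_idx[fold_of == k]
```

Each repeat partitions the cohort into 10 disjoint test folds; the 30 resulting scores are then summarized as a mean and SD, as in the cross-validation results above.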
In conclusion, we derived and validated a machine learning-driven virtual kidney allograft biopsy system that uses easily accessible donor parameters at the time of transplantation. The virtual biopsy system demonstrated good performance and robustness across 17 geographically distinct centers and in many clinical scenarios. This system can provide physicians with a reliable estimation of day-zero biopsy findings, which may reduce the costs of invasive and time-consuming procedures and help guide subsequent biopsy interpretation and patient management.
Methods
Study design and population
The population consisted of living or deceased adult kidney donors, whose kidneys were either transplanted or discarded, enrolled from January 1st, 2000, to December 31st, 2021, who underwent kidney biopsies prior to kidney transplantation as part of standard of care. For the derivation cohort, the study involved 15 centers: 14 institutions from seven countries (France, Belgium, Croatia, Spain, United States, Canada, and Australia) and the largest OPO in the USA (OneLegacy). For the external validation cohorts, two institutions from two countries were involved: Columbia University Medical Center (USA) and Sun Yat-sen University (China). A total of 15,121 kidney biopsies were assessed. The exclusion criterion was an inadequate biopsy according to the Banff international classification requirements (n = 1089, 7.2%)21. A total of 14,032 kidney allograft biopsies were included in the final analyses, including 1372 (9.8%) from discarded kidneys: 12,402 in the derivation cohort and 1630 in the external validation cohorts.
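The cohort counts in this paragraph are internally consistent; the short check below reproduces them from the numbers stated in the text.

```python
# Cohort flow, with every count copied from the paragraph above.
assessed = 15_121
inadequate = 1_089          # excluded under the Banff adequacy requirements
derivation, external = 12_402, 1_630
discarded = 1_372           # included biopsies from discarded kidneys

included = assessed - inadequate
assert included == derivation + external == 14_032
print(f"excluded as inadequate: {inadequate / assessed:.1%}")   # 7.2%
print(f"from discarded kidneys: {discarded / included:.1%}")    # 9.8%
```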
Inclusion and ethics statement
All data were anonymized, and the clinical and biological data were collected from each center and entered into the Paris Transplant Group database (French data protection authority (CNIL) registration number 363505). The data were accessed from the database on January 1st, 2021; the Chinese data were accessed on November 19th, 2021, and the OneLegacy OPO data on June 8th, 2022. The protocol of this study (NCT04759209) was approved by the Paris Transplant Group’s Institutional Review Board (IRB). Written informed consent was given by all living donors at the time of transplantation. The IRB of the Paris Transplant Institute approved the study and waived informed consent for deceased donors (registration no. 2018-1017-Virtual-Biopsy). The original collection and export of the data were approved by the Ministry of Science and Technology for Sun Yat-sen University in China. All data from the Paris Transplant Group centers (Necker, Saint Louis, and Toulouse Hospitals) were entered prospectively at the time of transplantation; a structured protocol was used to ensure harmonization across study centers. To ensure data accuracy, an annual audit was performed. As part of standard clinical procedures, the other datasets from the European, North American, Australian, and Asian centers were compiled, entered in the centers’ databases in accordance with local and national regulatory standards, and submitted anonymously to the Paris Transplant Group.
Kidney biopsy histological assessment and protocols
Day-zero biopsies were performed by a surgeon after the organ was removed from the donor, in accordance with standard practices, using a 16-gauge needle device or a straight blade. The tissue was immediately fixed in aqueous formaldehyde solution (formalin) or alcohol–formalin–acetic acid solution and subsequently embedded in paraffin, or it was immediately frozen. The biopsy sections (4 μm) were stained with periodic acid–Schiff, Masson’s trichrome, and hematoxylin and eosin. Using the international Banff classification kidney lesion scoring system21, expert kidney pathologists assessed the following: number of glomeruli, arteriosclerosis, arteriolar hyalinosis, interstitial fibrosis and tubular atrophy, and the percentage of sclerotic glomeruli. A detailed table summarizing the participating centers’ biopsy practices and procedures is presented in Supplementary Table 14.
Outcomes of interest
The outcomes of interest were the biopsy findings graded according to the international Banff classification of allograft pathology, a validated semi-quantitative ordinal scheme covering all kidney compartments: (i) arteriosclerosis, defined by arterial intimal thickening in the most severely affected artery (Banff “cv” score); (ii) arteriolar hyalinosis, defined by periodic acid-Schiff (PAS)-positive arteriolar hyaline thickening (Banff “ah” score); and (iii) interstitial fibrosis and tubular atrophy (Banff “IFTA” score), derived from the extent of cortical interstitial fibrosis (Banff “ci” score) and cortical tubular atrophy (Banff “ct” score)21. These semi-quantitative grades are ordinal rather than linear. Finally, glomerulosclerosis was a continuous outcome, defined as the percentage of the total number of glomeruli affected by global sclerosis (“glomerulosclerosis” score)5. The Banff grading scheme is detailed in Supplementary Method 3 and Supplementary Table 15.
Candidate predictors of kidney biopsy histological lesions
Eleven candidate predictors of day-zero histological lesions, all universally collected from donors at donation, were examined: donor age, sex, donor type (living or deceased), cerebrovascular cause of death, donation after circulatory death (DCD), history of hypertension, diabetes, hepatitis C virus (HCV) status, body mass index (BMI), lowest serum creatinine at donation, and proteinuria status. Details of these predictors are available in Supplementary Method 4.
Statistical analyses
We followed the TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) statement38, adapted to machine learning (Supplementary Method 5), to report the development and validation of the virtual biopsy system. Figure 4 summarizes the process of generating and validating the machine learning models.
Descriptive analyses of baseline characteristics
Continuous variables were summarized as means and standard deviations or as medians and interquartile ranges. Groups were compared using Student’s t-test or analysis of variance (ANOVA) (or the Mann-Whitney and Kruskal-Wallis tests, as appropriate) for continuous variables, and the chi-squared test (or Fisher’s exact test, as appropriate) for proportions. All tests were two-tailed.
Algorithm pre-processing
The lesion scores were imbalanced, with mild/lower grades over-represented and severe/higher grades under-represented. To reduce this imbalance and maximize predictive performance, we applied up-sampling during model training, randomly resampling kidneys from the severe/higher grades. The three continuous donor parameters (age, body mass index, and creatinine) were standardized to a mean of zero and a standard deviation of one. These pre-processing steps were performed with the caret R package39.
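The two pre-processing steps can be illustrated with a minimal Python sketch (standard library only). It is not the caret implementation used in the study: `upsample` draws random rows, with replacement, from each under-represented grade until every grade matches the most frequent one, and `standardize` z-scores a continuous column using its own mean and sample standard deviation. The function names and toy data are hypothetical.

```python
import random
import statistics
from collections import Counter


def upsample(rows, labels, seed=0):
    """Randomly resample (with replacement) minority classes until every
    class has as many rows as the most frequent class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw the missing rows at random from this class's own rows.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def standardize(values):
    """Center to mean 0 and scale to (sample) standard deviation 1."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


# Hypothetical toy data: donor ages and Banff cv grades (0 = none ... 3 = severe).
ages = [34, 51, 62, 45, 70, 58]
grades = [0, 0, 0, 1, 1, 3]
bal_ages, bal_grades = upsample(ages, grades)
print(Counter(bal_grades))  # each grade now appears 3 times
print([round(z, 2) for z in standardize(ages)])
```

Up-sampling is applied only to the training data; resampling before splitting would leak duplicated kidneys into the validation folds.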
Development of the virtual biopsy system
To develop the virtual biopsy system, we computed probabilities for each day-zero histological lesion score from six machine learning models: random forests (RF)40, model-averaged neural networks (avNNet)41, gradient boosting machine (GBM)42, extreme gradient boosting (XGBoost)43, linear discriminant analysis (LDA)41, and multinomial logistic regression (MNOM)44. To limit overfitting and sampling bias, hyperparameters were tuned with 3-times repeated 10-fold cross-validation45. We then aggregated the classification models by averaging the probabilities provided by each model, producing an ensemble model (meta-classifier) intended to reduce bias and overfitting; in line with the “no free lunch” theorem, no single algorithm can be expected to perform best on every problem46–48. MNOM and LDA were not used to predict glomerulosclerosis (a regression task) because they are designed exclusively for categorical outcomes (classification). For glomerulosclerosis, we combined the regression models through a linear model to create an ensemble model, a meta-regression.
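A minimal Python sketch of the two aggregation schemes described above. For the ordinal lesions, the class-probability vectors from the base models are averaged and the grade with the highest mean probability is predicted. For glomerulosclerosis, a linear meta-regression is fitted by ordinary least squares on the base models' predictions. The unweighted mean, the plain least-squares fit, the function names, and all numbers are assumptions for illustration; caretEnsemble, which the study lists among its packages, may differ in detail (for example, in fitting the stacker on resampled predictions).

```python
def average_probabilities(per_model_probs):
    """Mean class-probability vector across models for one biopsy."""
    n_models = len(per_model_probs)
    return [sum(col) / n_models for col in zip(*per_model_probs)]


def predict_grade(per_model_probs):
    """Index of the grade with the highest averaged probability."""
    avg = average_probabilities(per_model_probs)
    return max(range(len(avg)), key=lambda k: avg[k])


def fit_linear_stack(base_preds, y):
    """Least-squares weights [b0, b1, ..., bm] for
    y ~ b0 + b1 * pred_1 + ... + bm * pred_m (one prediction list per base model)."""
    X = [[1.0, *row] for row in zip(*base_preds)]
    p = len(X[0])
    # Normal equations (X^T X) b = X^T y, solved by Gaussian elimination.
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    b = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        b[i] = (rhs[i] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b


def predict_stack(weights, preds_for_one):
    """Meta-regression prediction from one biopsy's base-model predictions."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], preds_for_one))


# Hypothetical probabilities over grades 0..3 for one biopsy from three models.
probs = [[0.5, 0.3, 0.1, 0.1], [0.2, 0.5, 0.2, 0.1], [0.3, 0.4, 0.2, 0.1]]
print(predict_grade(probs))  # 1

# Hypothetical glomerulosclerosis (%) predictions from two base models.
rf_pred = [2.0, 5.0, 9.0, 14.0]
gbm_pred = [3.0, 4.0, 10.0, 12.0]
observed = [2.5, 4.0, 10.0, 13.5]
weights = fit_linear_stack([rf_pred, gbm_pred], observed)
print(predict_stack(weights, [6.0, 7.0]))
```

Averaging probabilities keeps each base model's uncertainty, whereas majority voting on predicted grades would discard it.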
Virtual biopsy system prediction performances
Model performance was assessed in internal and external validation. For internal validation, performance was assessed on the 30 resamples from the 3-times repeated 10-fold cross-validation in the derivation cohort; for external validation, it was assessed on the external cohorts. For glomerulosclerosis, a continuous outcome, we used the mean absolute error (MAE), with the root mean square error (RMSE) as a supplementary metric49. For the ordinal day-zero lesion scores (cv, ah, and IFTA), we used the multiclass area under the curve (multi-AUC) computed with Hand and Till’s formula50. Supplementary metrics for cv, ah, and IFTA were also reported for both internal and external validation: sensitivity, specificity, balanced accuracy (the average of sensitivity and specificity), accuracy, and area under the receiver operating characteristic curve (AUROC). To compute these supplementary metrics, we dichotomized the Banff scores, grouping “None” (Banff score 0) and “Mild” (Banff score 1) as the negative class and “Moderate” (Banff score 2) and “Severe” (Banff score 3) as the positive class. Cut-offs for the dichotomized Banff lesions were derived with Youden’s J statistic in the internal validation51; Supplementary Method 6 gives the rationale for these cut-offs. 95% CIs were obtained from 1000 bootstrap resamples, whereas point estimates for each metric were computed on the samples of the external validation cohorts.
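The two less familiar metrics above can be sketched in plain Python: Hand and Till’s multiclass AUC (M), which averages pairwise AUCs over all pairs of grades, and a Youden’s J search for the cut-off of the dichotomized outcome (grade ≥ 2 versus grade ≤ 1). This is an illustrative sketch, not the study’s code (which lists pROC among its packages). The toy probabilities are invented, and using P(grade 2) + P(grade 3) as the score to dichotomize is an assumption, since the text does not specify which score the cut-offs were applied to.

```python
from itertools import combinations


def pairwise_auc(pos_scores, neg_scores):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hand_till_m(prob_rows, labels, classes):
    """Hand & Till (2001) M: mean over class pairs (i, j) of (A(i|j) + A(j|i)) / 2.
    A(i|j) ranks members of classes i and j by their predicted probability of class i.
    Assumes every class in `classes` occurs at least once in `labels`."""
    pairs = list(combinations(range(len(classes)), 2))
    total = 0.0
    for i, j in pairs:
        in_i = [p for p, y in zip(prob_rows, labels) if y == classes[i]]
        in_j = [p for p, y in zip(prob_rows, labels) if y == classes[j]]
        a_ij = pairwise_auc([p[i] for p in in_i], [p[i] for p in in_j])
        a_ji = pairwise_auc([p[j] for p in in_j], [p[j] for p in in_i])
        total += (a_ij + a_ji) / 2
    return total / len(pairs)


def youden_cutoff(scores, is_positive):
    """Threshold t maximizing J = sensitivity + specificity - 1,
    calling a biopsy positive when score >= t (observed scores are the candidates)."""
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, is_positive) if y and s >= t)
        tn = sum(1 for s, y in zip(scores, is_positive) if not y and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical predicted probabilities for Banff grades 0-3 and observed grades.
probs = [[0.70, 0.20, 0.05, 0.05], [0.20, 0.60, 0.10, 0.10],
         [0.10, 0.20, 0.50, 0.20], [0.05, 0.15, 0.30, 0.50]]
grades = [0, 1, 2, 3]
print(hand_till_m(probs, grades, [0, 1, 2, 3]))  # 1.0: every pair is perfectly ranked

# Hypothetical dichotomization score: P(grade 2) + P(grade 3); positive = grade >= 2.
p_pos = [p[2] + p[3] for p in probs]
print(youden_cutoff(p_pos, [g >= 2 for g in grades]))
```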
Model calibration was examined with confusion matrices. To identify the donor parameters that drive model performance, we averaged the feature importances obtained from RF, GBM, XGBoost, avNNet, LDA, and MNOM (the last two for classification models only).
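Averaging importances across models is meaningful only once each model’s scores share a scale. A minimal sketch, assuming each model’s raw importances are first min-max rescaled to 0-100 (similar in spirit to caret’s `varImp(..., scale = TRUE)`) and then averaged over the models that report each predictor; the rescaling choice, model names, and raw scores below are hypothetical.

```python
def rescale_0_100(importances):
    """Min-max rescale one model's importance scores to the range 0-100."""
    lo, hi = min(importances.values()), max(importances.values())
    span = hi - lo
    return {k: (100.0 * (v - lo) / span if span else 0.0) for k, v in importances.items()}


def average_importance(per_model):
    """Average rescaled importances over the models that report each predictor,
    sorted from most to least important."""
    sums, counts = {}, {}
    for imp in per_model.values():
        for name, score in rescale_0_100(imp).items():
            sums[name] = sums.get(name, 0.0) + score
            counts[name] = counts.get(name, 0) + 1
    avg = {name: sums[name] / counts[name] for name in sums}
    return dict(sorted(avg.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical raw importances from two models on three predictors.
raw = {
    "RF": {"age": 40.0, "hypertension": 10.0, "creatinine": 20.0},
    "GBM": {"age": 0.6, "hypertension": 0.3, "creatinine": 0.1},
}
print(average_importance(raw))
```

Without rescaling, a model whose importances happen to be on a large numeric scale (here RF) would dominate the average.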
Imputation of missing data
For biopsies with at least one missing predictor value, missing data were imputed with the random forest imputation algorithm implemented in the missForest R package52, with the maximum number of iterations set to 10. The details of the imputation process and results are presented in Supplementary Method 7.
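The iterative scheme behind missForest can be outlined as follows: fill missing values with column means, then repeatedly re-predict each variable’s missing entries from the other variables, visiting variables from least to most missing, and stop when the normalized change between successive imputations increases (keeping the previous imputation) or when the iteration cap (10, as in the study) is reached. The sketch below handles numeric columns only and, to stay dependency-free, swaps the random forest for a hypothetical 1-nearest-neighbour regressor (`nn_predict`). It illustrates the loop structure only and is not the missForest package.

```python
import math


def nn_predict(train_X, train_y, query):
    """Hypothetical stand-in for a random forest: return the target of the
    nearest training row (Euclidean distance)."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return train_y[best]


def features(X, r, skip):
    """All columns of row r except column `skip`."""
    return [v for k, v in enumerate(X[r]) if k != skip]


def iterative_impute(data, max_iter=10):
    """missForest-style loop on a numeric matrix (list of rows); None = missing.
    Assumes every column has at least one observed value."""
    n_rows, n_cols = len(data), len(data[0])
    missing = [[v is None for v in row] for row in data]
    X = [list(row) for row in data]
    # Initial guess: column mean of the observed values.
    for c in range(n_cols):
        obs = [row[c] for row in data if row[c] is not None]
        for row in X:
            if row[c] is None:
                row[c] = sum(obs) / len(obs)
    # Visit columns from least to most missing.
    order = sorted(range(n_cols), key=lambda c: sum(m[c] for m in missing))
    prev_delta = math.inf
    for _ in range(max_iter):
        old = [list(row) for row in X]
        for c in order:
            obs_rows = [r for r in range(n_rows) if not missing[r][c]]
            mis_rows = [r for r in range(n_rows) if missing[r][c]]
            if not mis_rows:
                continue
            train_X = [features(X, r, c) for r in obs_rows]
            train_y = [X[r][c] for r in obs_rows]
            for r in mis_rows:
                X[r][c] = nn_predict(train_X, train_y, features(X, r, c))
        # Normalized squared change between successive imputations.
        num = sum((X[r][c] - old[r][c]) ** 2 for r in range(n_rows) for c in range(n_cols))
        den = sum(X[r][c] ** 2 for r in range(n_rows) for c in range(n_cols))
        delta = num / den
        if delta > prev_delta:
            return old  # the change grew: keep the previous imputation
        prev_delta = delta
    return X


# Hypothetical donor rows: [age, creatinine]; None marks a missing value.
rows = [[45.0, 80.0], [60.0, 95.0], [70.0, None], [None, 110.0]]
print(iterative_impute(rows))
```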
Kidney donor profile index (KDPI)
We conducted a sensitivity analysis to investigate whether the KDPI could predict day-zero biopsy lesions, developing a model that used the KDPI score alone. Biopsies from living donors and those with missing ethnicity, height, or weight data were excluded from the imputed dataset. The KDPI was calculated following the Organ Procurement and Transplantation Network (OPTN) guidelines, based on the database as of April 7, 2023. The ensemble comprised RF, XGBoost, LDA, avNNet, and MNOM models, with LDA and MNOM excluded for glomerulosclerosis. GBM was excluded because of the difficulty of deriving a univariate model.
Assessment of the consistency in the biopsy evaluation
To evaluate inter-pathologist agreement in biopsy assessment, we randomly selected 10% of the biopsies to be reassessed by four expert nephropathologists at the two original transplant centers (Necker Hospital and Mayo Clinic). The pathologists were blinded to the previous biopsy findings. Agreement was measured with Fleiss’ kappa, weighted to account for the magnitude of disagreements in the reassessment.
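A weighted extension of Fleiss’ kappa for several raters and ordinal grades can be computed as below. This sketch follows one common formulation (as described by Gwet) and uses linear agreement weights w(k, l) = 1 - |k - l| / (q - 1); the study does not state which weighting scheme it used, so the weights, function names, and toy ratings are assumptions. With identity weights the statistic reduces to ordinary Fleiss’ kappa.

```python
def weighted_fleiss_kappa(counts, weights):
    """counts[i][k]: number of raters who put biopsy i in grade k (>= 2 raters per biopsy).
    weights[k][l]: agreement credit for grades k and l (1 on the diagonal)."""
    n = len(counts)
    q = len(counts[0])
    p_obs = 0.0
    pi = [0.0] * q
    for row in counts:
        r = sum(row)  # raters for this biopsy
        for k in range(q):
            # Weighted number of raters "agreeing" with grade k.
            r_star = sum(weights[k][l] * row[l] for l in range(q))
            p_obs += row[k] * (r_star - 1) / (r * (r - 1))
            pi[k] += row[k] / r
    p_obs /= n
    pi = [p / n for p in pi]  # overall proportion of ratings in each grade
    p_exp = sum(weights[k][l] * pi[k] * pi[l] for k in range(q) for l in range(q))
    return (p_obs - p_exp) / (1 - p_exp)


def linear_weights(q):
    return [[1 - abs(k - l) / (q - 1) for l in range(q)] for k in range(q)]


def identity_weights(q):
    return [[1.0 if k == l else 0.0 for l in range(q)] for k in range(q)]


# Hypothetical: 4 pathologists grading 3 biopsies on a 0-3 Banff scale.
counts = [[4, 0, 0, 0], [0, 3, 1, 0], [0, 0, 1, 3]]
print(weighted_fleiss_kappa(counts, linear_weights(4)))
print(weighted_fleiss_kappa(counts, identity_weights(4)))
```

Linear weights give partial credit when pathologists disagree by one grade, so the weighted value is higher than the unweighted one for this toy set.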
Software and package
Descriptive analyses and machine learning analyses were conducted using R (version 3.5.1, R Foundation for Statistical Computing) and RStudio (version 2022.7.2.576). Packages used for data and machine learning analyses were: randomForest (version 4.6-14), gbm (version 2.1.5), xgboost (version 1.4.1.1), plyr (version 1.8.4), MASS (version 7.3-51.4), nnet (version 7.3-12), caret (version 6.0-84), caretEnsemble (version 2.0.1), tidyverse (version 1.3.0), ggsci (version 2.9), rsample (version 0.1.1), tidymodels (version 0.0.2), patchwork (version 1.0.0), dplyr (version 1.0.7), ggplot2 (version 3.3.1), yardstick (version 0.0.8), readr (version 1.3.1), cvms (version 1.3.3), pROC (version 1.18.0), rlist (version 0.4.6.2), autoxgboost (version 0.0.0.9000), shiny (version 1.6.0), shinythemes (version 1.1.2), kableExtra (version 1.3.4), and compareGroups (version 4.0.0).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
We thank Sophie Ferlicot, Sandra Cockfield, Sumit Mohan, Syed A. Husain, David J. Cohen, Lloyd E. Ratner, and Maisarah Jalalonmuhali for data acquisition. Financial support was provided by the French government, through funding managed by the National Research Agency (ANR) under grant agreement ANR-17-RHUS-0010, and by the European Union’s Horizon 2020 research and innovation program EU-TRAIN under grant agreement no. 754995. The funders of this study had no role in the study design, data collection, analysis, or interpretation of the manuscript.
Author contributions
A.L. and O.A. supervised the study. D.Y., G.D., M.R., O.A., and A.L. designed the study, analyzed and interpreted the data, wrote and edited the manuscript. D.Y., G.D., M.R., A.C., T.M., J.R., A.J.B., M.D.S., M.N., H.Z., C.W., J.G., N.K., A.B., I.B., S.M.C., J.S.G., F.O., E.D.S.-A., D.R.J.K., A.D., D.S., M.R., J.-P.D.V.H., P.C., S.S., M.M., O.B., N.B.-J., I.J., P.B., L.D.C., M.P.A., P.T.C., C.L., P.P.R., C.L., O.A., and A.L. contributed to the data acquisition and D.Y., G.D., M.R., O.A., and A.L. verified the data. D.Y. performed the data analysis. D.Y., G.D., M.R., O.A., and A.L. wrote the manuscript. The corresponding author attests that all authors have read and approved the manuscript. A.L. was responsible for the decision to submit the manuscript for publication. All authors revised the manuscript.
Peer review
Peer review information
Nature Communications thanks Nada Alachkar and the other anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Data availability
The figure data generated in this study have been deposited in the public Synapse database (https://www.synapse.org/#!Synapse:syn51702348/files/)53; access requires signing in. The raw data are available from the corresponding author. Source data are provided with this paper.
Code availability
Complete code to reproduce the figures is available in the public Synapse database (https://www.synapse.org/#!Synapse:syn51702348/files/)53; access requires signing in.
Competing interests
A.L. holds shares in Predict4Health, a software company that is not involved in the present research. The other authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Daniel Yoo, Gillian Divard.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-023-44595-z.
References
1. Mallory, T. B. Pathology. N. Engl. J. Med. 236, 438–443 (1947).
2. Barry, J. M. & Murray, J. E. The first human renal transplants. J. Urol. 176, 888–890 (2006).
3. Michon, L. et al. [An attempted kidney transplantation in man: medical and biological aspects]. Presse Med. 61, 1419–1423 (1953).
4. Gaber, L. W. et al. Glomerulosclerosis as a determinant of posttransplant function of older donor renal allografts. Transplantation 60, 334–339 (1995).
5. Naesens, M. Zero-time renal transplant biopsies: a comprehensive review. Transplantation 100, 1425–1439 (2016).
6. Mengel, M. et al. Protocol biopsies in renal transplantation: insights into patient management and pathogenesis. Am. J. Transpl. 7, 512–517 (2007).
7. Chauhan, A. et al. Using implantation biopsies as a surrogate to evaluate selection criteria for living kidney donors. Transplantation 96, 975–980 (2013).
8. Randhawa, P. Role of donor kidney biopsies in renal transplantation. Transplantation 71, 1361–1365 (2001).
9. Solez, K. et al. Banff 07 classification of renal allograft pathology: updates and future directions. Am. J. Transplant. 8, 753–760 (2008).
10. Sung, R. S. et al. Determinants of discard of expanded criteria donor kidneys: impact of biopsy and machine perfusion. Am. J. Transpl. 8, 783–792 (2008).
11. Mengel, M. & Sis, B. An appeal for zero-time biopsies in renal transplantation. Am. J. Transpl. 8, 2181–2182 (2008).
12. Springfield, D. S. & Rosenberg, A. Biopsy: complicated and risky. J. Bone Jt. Surg. Am. 78, 639–643 (1996).
13. Aubert, O. et al. Long term outcomes of transplantation using kidneys from expanded criteria donors: prospective, population based cohort study. BMJ 351, h3557 (2015).
14. Matas, A. J. et al. OPTN/SRTR 2013 Annual Data Report: kidney. Am. J. Transpl. 15, 1–34 (2015).
15. Jadlowiec, C. C. et al. Transplant outcomes using kidneys from high KDPI acute kidney injury donors. Clin. Transpl. 35, e14279 (2021).
16. Mancilla, E. et al. Time-zero renal biopsy in living kidney transplantation: a valuable opportunity to correlate predonation clinical data with histological abnormalities. Transplantation 86, 1684–1688 (2008).
17. Bora, A. et al. Predicting the risk of developing diabetic retinopathy using deep learning. Lancet Digit. Health 3, e10–e19 (2021).
18. Miles, J., Turner, J., Jacques, R., Williams, J. & Mason, S. Using machine-learning risk prediction models to triage the acuity of undifferentiated patients entering the emergency care system: a systematic review. Diagn. Progn. Res. 4, 16 (2020).
19. Sufriyana, H. et al. Comparison of multivariable logistic regression and other machine learning algorithms for prognostic prediction studies in pregnancy care: systematic review and meta-analysis. JMIR Med. Inf. 8, e16503 (2020).
20. Huang, P. et al. Prediction of lung cancer risk at follow-up screening with low-dose CT: a training and validation study of a deep learning method. Lancet Digit. Health 1, e353–e362 (2019).
21. Roufosse, C. et al. A 2018 reference guide to the Banff classification of renal allograft pathology. Transplantation 102, 1795–1814 (2018).
22. Pérez-Sáez, M. J., Montero, N., Redondo-Pachón, D., Crespo, M. & Pascual, J. Strategies for an expanded use of kidneys from elderly donors. Transplantation 101, 727–745 (2017).
23. Azancot, M. A. et al. The reproducibility and predictive value on outcome of renal biopsies from expanded criteria donors. Kidney Int. 85, 1161–1168 (2014).
24. Yin, P.-N. et al. Histopathological distinction of non-invasive and invasive bladder cancers using machine learning approaches. BMC Med. Inform. Decis. Mak. 20, 162 (2020).
25. Marsh, J. N. et al. Deep learning global glomerulosclerosis in transplant kidney frozen sections. IEEE Trans. Med. Imaging 37, 2718–2728 (2018).
26. Hara, S. et al. Evaluating tubulointerstitial compartments in renal biopsy specimens using a deep learning-based approach for classifying normal and abnormal tubules. PLoS One 17, e0271161 (2022).
27. Chapman, J. R. Chronic calcineurin inhibitor nephrotoxicity-lest we forget. Am. J. Transpl. 11, 693–697 (2011).
28. Loupy, A. et al. Determinants and outcomes of accelerated arteriosclerosis: major impact of circulating antibodies. Circ. Res. 117, 470–482 (2015).
29. Gosset, C. et al. Circulating donor-specific anti-HLA antibodies are a major factor in premature and accelerated allograft fibrosis. Kidney Int. 92, 729–742 (2017).
30. Loupy, A. & Lefaucheur, C. Antibody-mediated rejection of solid-organ allografts. N. Engl. J. Med. 379, 1150–1160 (2018).
31. Debout, A. et al. Each additional hour of cold ischemia time significantly increases the risk of graft failure and mortality following renal transplantation. Kidney Int. 87, 343–349 (2015).
32. Obermeyer, Z. & Emanuel, E. J. Predicting the future - big data, machine learning, and clinical medicine. N. Engl. J. Med. 375, 1216–1219 (2016).
33. Rajkomar, A., Dean, J. & Kohane, I. Machine learning in medicine. N. Engl. J. Med. 380, 1347–1358 (2019).
34. Haas, M. et al. Arteriosclerosis in kidneys from healthy live donors: comparison of wedge and needle core perioperative biopsies. Arch. Pathol. Lab. Med. 132, 37–42 (2008).
35. Muruve, N. A., Steinbecker, K. M. & Luger, A. M. Are wedge biopsies of cadaveric kidneys obtained at procurement reliable? Transplantation 69, 2384–2388 (2000).
36. Bago-Horvath, Z. et al. The cutting (w)edge–comparative evaluation of renal baseline biopsies obtained by two different methods. Nephrol. Dial. Transpl. 27, 3241–3248 (2012).
37. Wainer, J. & Cawley, G. Nested cross-validation when selecting classifiers is overzealous for most practical applications. Expert Syst. Appl. 182, 115222 (2021).
38. Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. M. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 350, g7594 (2015).
39. Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 28, 1–26 (2008).
40. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
41. Ripley, B. D. & Hjort, N. L. Pattern Recognition and Neural Networks (Cambridge University Press, 1996).
42. Friedman, J. H. Stochastic gradient boosting. Comput. Stat. Data Anal. 38, 367–378 (2002).
43. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16) 785–794 (ACM Press, 2016). 10.1145/2939672.2939785.
44. Ripley, B. D. Modern Applied Statistics with S (Springer, 2002).
45. Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proc. 14th International Joint Conference on Artificial Intelligence vol. 14, 1137–1145 (Morgan Kaufmann Publishers, 1995).
46. Doan, H. T. X. & Foody, G. M. Increasing soft classification accuracy through the use of an ensemble of classifiers. Int. J. Remote Sens. 28, 4609–4623 (2007).
47. Wolpert, D. H. & Macready, W. G. No free lunch theorems for optimization. IEEE Trans. Evolut. Comput. 1, 67–82 (1997).
48. Wolpert, D. H. Stacked generalization. Neural Netw. 5, 241–259 (1992).
49. Willmott, C. J. & Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 30, 79–82 (2005).
50. Hand, D. J. & Till, R. J. A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach. Learn. 45, 171–186 (2001).
51. Youden, W. J. Index for rating diagnostic tests. Cancer 3, 32–35 (1950).
52. Stekhoven, D. J. & Bühlmann, P. MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics 28, 112–118 (2012).
53. Loupy, A. et al. Virtual biopsy system. 10.7303/syn51702348