Author manuscript; available in PMC: 2023 May 1.
Published in final edited form as: J Urol. 2021 Dec 30;207(5):1105–1115. doi: 10.1097/JU.0000000000002390

Computer-Generated R.E.N.A.L. Nephrometry Scores Yield Comparable Predictive Results to Human-Expert Scores in Predicting Oncologic and Perioperative Outcomes

N Heller a, R Tejpaul a, F Isensee b, T Benidir h, M Hofmann h, P Blake c, Z Rengal c, K Moore d, N Sathianathen e, A Kalapara e, J Rosenberg c, S Peterson f, E Walczak c, A Kutikov g, RG Uzzo g, DA Palacios h, EM Remer i,h, SC Campbell h, N Papanikolopoulos a, CJ Weight h
PMCID: PMC8995335  NIHMSID: NIHMS1765759  PMID: 34968146

Abstract

Purpose:

To automate R.E.N.A.L. nephrometry scoring of preoperative computed tomography (CT) scans and create an artificial intelligence-generated score (AI-Score). Subsequently, to evaluate its ability to predict meaningful oncologic and perioperative outcomes as compared to expert human-generated nephrometry scores (H-Score).

Materials and Methods:

300 patients with preoperative CTs were identified from a cohort of 544 consecutive patients undergoing surgical extirpation for suspected renal cancer at a single institution. A deep neural network approach was used to automatically segment kidneys and tumors, and geometric algorithms were developed to estimate the components of R.E.N.A.L. Tumors were independently scored by medical personnel blinded to the AI-Scores. AI- and H-Score agreement was assessed using Lin's concordance correlation, and their abilities to predict both oncologic and perioperative outcomes were assessed using areas under the curve.

Results:

Median age was 60 years (IQR 51–68), and 40% of patients were female. Median tumor size was 4.2 cm; 91.3% had malignant tumors, including 27%, 37% and 24% with high stage, high grade and necrosis, respectively. There was significant agreement between H-Scores and AI-Scores (Lin's ρ=0.59). Both AI- and H-Scores similarly predicted meaningful oncologic outcomes (p<0.001), including presence of malignancy, necrosis, and high grade and high stage disease (p<0.003). They also predicted surgical approach (p<0.004) and specific perioperative outcomes (p<0.05).

Conclusions:

Fully automated AI-generated R.E.N.A.L. scores are comparable to human-generated R.E.N.A.L. scores and predict a wide variety of meaningful patient-centered outcomes. This unambiguous AI-based scoring is intended to facilitate wider adoption of the R.E.N.A.L. score.

Keywords: Nephrometry, R.E.N.A.L. Score, Machine Learning, Artificial Intelligence

INTRODUCTION

Developed over a decade ago, R.E.N.A.L. and PADUA nephrometry scores have aided in surgical decision making.1–3 In addition, they were found to predict meaningful perioperative outcomes and oncologic parameters including tumor grade, stage, and patient survival.1,4,5 Despite clear advantages, their widespread adoption, especially outside of academic centers, has been modest. Potential reasons for this lack of adoption include the unreimbursed time required of busy clinicians to complete such scores, score ambiguity, and interobserver variability.6,7

Deep Learning (DL), a promising subfield of machine learning (ML), has made tremendous progress in prediction problems with high-dimensional data.8–11 DL models such as convolutional neural networks have shown considerable promise of non-inferiority when compared to human experts in predicting malignancy and pathologic features.12,13 Radiographically, DL can be applied by means of semantic segmentation (SS), in which every image pixel (or voxel) is individually classified to a specific "region of interest" to allow high-level delineation of disease parameters and spatial relationships.14–18

Despite meaningful advances in urologic DL,19–29 its role in the context of renal mass evaluation at the time of initial diagnosis has remained underexplored. We hypothesize that a fully automated R.E.N.A.L. nephrometry score could be achieved by DL-based SS followed by the extraction of R.E.N.A.L.'s components using geometric algorithms (AI-Score). Second, we aimed to study its correlation with human-based nephrometry scores (H-Score) and compare its predictive abilities for meaningful patient-centered outcomes.

Methods

1. Cohort Assembly

Between 2010 and 2018, following ethics board approval, we identified 544 consecutive patients undergoing surgical extirpation for a renal mass at a single institution. Among those, we identified 300 consecutive patients with preoperative arterial phase CTs. Preoperative CTs were acquired from over 70 different institutions in the Midwest United States, using scanners from over four different manufacturers. Overall inclusion was based on a well-described and previously published KiTS19 challenge protocol.28,29 Patients were excluded if they were undergoing treatment for known benign disease or had tumor thrombus. Following patient selection, tumors were assigned a human-generated R.E.N.A.L. score (H-Score) by medical personnel trained by a urologic oncologist (CW) who demonstrated proficiency in R.E.N.A.L. scoring in a test cohort. Human scorers were blinded to the AI-Scores.

2. Semantic Segmentation

a. Creating the "ground truth" for Deep Learning

To compute R.E.N.A.L. automatically, we first needed segmentations of the relevant structures from CT images; we therefore manually segmented each individual voxel of all 300 CT scans as "kidney," "kidney tumor" or "background." These human-generated segmentations were completed by expert medical personnel and accounted for nearly 50,000 individual axial slices. The resulting dataset is referred to as the "ground truth" and provided the training material for the DL algorithm.

The presence of foreign bodies, inflammation/edema, or stranding made voxel delineation challenging. In these situations, segmentation consistency was achieved using pre-defined Hounsfield Unit (HU) thresholds, which advised the reader of the likelihood that adipose tissue was present (see Figure 1a,b). Thereafter, an algorithm was developed to bound the renal hilum. This was accomplished on the axial view by computing the line between the outermost flanking parenchyma on each side of the hilum (see Figure 1c,d). Tumors were then segmented with no constraint other than that they must lie within tissue labeled "kidney" (see Figure 1e). The resulting segmentation (Figure 1f) is what our DL model was trained to produce, and formed the basis for our geometric algorithm to extract R.E.N.A.L. scores. Figure 1 provides a stepwise approach to the AI-generated R.E.N.A.L. score.
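The HU-threshold step described above can be sketched as follows. This is an illustrative reconstruction, not the study's code; in particular, the −120 to −30 HU adipose window is an assumed range chosen for illustration, as the exact pre-defined thresholds are not reproduced here.

```python
import numpy as np

# Illustrative sketch: flag voxels whose Hounsfield Units fall within a
# typical adipose range, to advise the reader whether fat is likely present.
# NOTE: the (-120, -30) HU window is an assumption for this sketch.
FAT_HU_RANGE = (-120, -30)

def likely_fat_mask(ct_volume_hu: np.ndarray) -> np.ndarray:
    """Return a boolean mask of voxels likely to be adipose tissue."""
    lo, hi = FAT_HU_RANGE
    return (ct_volume_hu >= lo) & (ct_volume_hu <= hi)

# Tiny synthetic example: four of the eight voxels fall in the fat range.
volume = np.array([[[-100, 40], [-50, 200]],
                   [[-80, 0], [-110, 35]]])
mask = likely_fat_mask(volume)  # mask.sum() == 4
```

In the study this density cue was an adjunct to visual interpretation rather than a replacement for it.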

Figure 1:

Figure 1:

A demonstration of the progression from manual delineation to kidney and tumor segmentations to be used to train the machine learning models. a) the manual kidney segmentation, b) the result of applying the HU threshold inside of the contour from a, c) the hilum bound automatically identified in cases where there is a significant deficiency in convexity—present in all but case iii, d) the result of including voxels within the hilum bound, e) the subsequent manual delineation of the tumor, and f) the result of restricting tumor voxels to lie within the kidney region from d.

b. Extracting R.E.N.A.L. components from AI segmented CT scans

The segmentations predicted by the AI model provided an excellent description of renal parenchyma and tumor regions, comparable to the human-generated segmentations (Sørensen-Dice score=0.92). However, they did not define the urinary collecting system (UCS) or polar lines (the "N" and "L" components). To define these regions, we used HU thresholds to identify adipose density within the kidney region, which was assumed to be sinus fat. Thereafter, polar lines were unambiguously defined by the highest (cranial) and lowest (caudal) axial slices in which sinus fat voxels were adjacent to voxels labeled "background" (see Figure 2). Similarly, since the sinus fat can be thought of as surrounding the UCS, we used a convex hull operation to identify all voxels above our HU threshold that resided inside the hull of the sinus fat, and took those voxels as an approximation of a UCS segmentation (see Figure 3a–c). After the tumor and kidney had been segmented and the UCS and polar lines inferred, each component of R.E.N.A.L. was extracted. Of note, the "A" component, which describes anterior vs. posterior location, was omitted for this study. We summarize the algorithms used to extract each component individually below. Figure 4 provides a visual demonstration for each component.
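The polar-line rule above (cranial-most and caudal-most axial slices in which sinus fat borders background) can be sketched as below. This is an illustrative reconstruction rather than the study's implementation; the simple `np.roll` neighbour test wraps around volume edges, which is acceptable for a sketch but not for production use.

```python
import numpy as np

def polar_slices(sinus_fat: np.ndarray, background: np.ndarray):
    """Given boolean volumes of shape (z, y, x), return the indices of the
    cranial-most and caudal-most axial slices in which any sinus-fat voxel
    has an in-plane neighbour labeled "background"."""
    touching = np.zeros(sinus_fat.shape[0], dtype=bool)
    for axis, shift in [(1, 1), (1, -1), (2, 1), (2, -1)]:
        # Shift the background mask by one voxel in-plane; where the shifted
        # mask overlaps sinus fat, a fat voxel borders background there.
        neighbor_bg = np.roll(background, shift, axis=axis)
        touching |= (sinus_fat & neighbor_bg).any(axis=(1, 2))
    slices = np.flatnonzero(touching)
    if slices.size == 0:
        return None  # no polar lines could be inferred
    return int(slices.min()), int(slices.max())
```

The two returned slice indices correspond to the superior and inferior polar lines; the center line is then the axial slice halfway between them (Figure 2d).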

Figure 2:

Figure 2:

A demonstration of the process used to identify the renal polar and center lines. a) a coronal view with the polar lines marked in solid yellow and the center line marked in dashed yellow, b) an axial cut superior to the interpolar region, c) the cranial-most axial cut in which sinus or UCS lies on the exterior of the kidney region, defining the superior polar line, d) the axial cut halfway between the superior and inferior polar lines, defining the center line, e) the caudal-most axial cut meeting the criteria of c, defining the inferior polar line, and f) an axial cut inferior to the interpolar region.

Figure 3:

Figure 3:

A demonstration of the automatic process of identifying the renal sinus and urinary collecting system given a kidney segmentation. a) an axial slice without any segmentation, b) the full kidney segmentation with a HU threshold used to identify sinus fat (blue) within it, c) the urinary collecting system (red) is identified as all voxels within the convex hull which encapsulates the sinus.

Figure 4:

Figure 4:

Segmentation-based characterization of RENAL score components. (A) Diameter, (B) endophyticity, (C) nearness to renal sinus and (D) location relative to polar lines

  1) Radius: Find the distance between the two "tumor" voxels that are furthest apart. Traditional R.E.N.A.L. size cutoffs were used to assign a score of 1, 2, or 3 (<4 cm, 4–7 cm, and >7 cm, respectively) (Figure 4a).

  2) Endophyticity: For every axial slice that contains tumor voxels, find the convex hull of the "kidney" region excluding the tumor. For every "tumor" voxel, record whether it lies inside (endophytic) or outside (exophytic) this convex hull. Endophytic proportions were quantized according to the traditional R.E.N.A.L. score instructions, where a score of 1, 2, or 3 is assigned for endophytic proportions of <50%, 50–<100%, and 100%, respectively (Figure 4b).

  3) Nearness to the Collecting System: Compute the distance between the nearest pair of voxels where one lies within the "tumor" region and the other lies within the "UCS" region. Traditional R.E.N.A.L. thresholds were used to assign a score of 1, 2, or 3 (>7 mm, 4–7 mm, and <4 mm, respectively).

  4) Longitudinal Location: Count tumor voxels according to whether they lie within or on the polar lines, or outside of them (see Figure 4d). If any tumor voxels exist in the axial slice at the center line, a 3 is assigned regardless. Otherwise, thresholds of 0% and 50% of tumor voxels within the polar lines are used to delineate scores of 1 from 2 and 2 from 3, respectively.
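As one concrete example, the "R" component above reduces to a farthest-pair search over tumor voxels followed by binning at the standard cutoffs. The sketch below is illustrative only (a brute-force search, with an assumed voxel-spacing parameter in millimetres), not the study's implementation.

```python
from itertools import combinations
import numpy as np

def radius_score(tumor_coords: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> int:
    """Score the 'R' component from an (n, 3) array of tumor voxel indices.

    spacing gives mm per axis (an assumed parameter for this sketch)."""
    pts = tumor_coords * np.asarray(spacing)  # convert indices to mm
    # Brute-force farthest pair; fine for a sketch, O(n^2) in general.
    d_max_mm = max(np.linalg.norm(a - b) for a, b in combinations(pts, 2))
    d_cm = d_max_mm / 10.0
    if d_cm < 4.0:       # < 4 cm  -> 1 point
        return 1
    if d_cm <= 7.0:      # 4-7 cm  -> 2 points
        return 2
    return 3             # > 7 cm  -> 3 points

# Two voxels 50 mm apart -> 5 cm -> score 2
# radius_score(np.array([[0, 0, 0], [0, 0, 50]]))  # -> 2
```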

3. Challenging the AI-generated R.E.N.A.L. algorithm

A competition within the DL community, named KiTS19, was used to identify a high-quality DL system for automatically segmenting CT scans.28,29 A cross-validation approach was used to train five segmentation models on a randomly selected set of 210 of our 300 cases. For each case in the cohort, only the model(s) that did not see that case during training were used to predict its segmentation mask. For the 90 cases that were not used to train any of the models, majority voting was used to synthesize the segmentations predicted by all five models. The 300 predicted segmentations were then fed to the geometric algorithm, which extracted each component of the R.E.N.A.L. score.
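The out-of-fold prediction scheme described above can be sketched as follows. This is a schematic reconstruction, not the KiTS19 training code: the round-robin fold assignment and the `predict` callable are placeholder assumptions.

```python
import numpy as np

def out_of_fold_predictions(case_ids, train_ids, models, predict):
    """Predict a mask for every case using only models that never saw it.

    models[i] is assumed trained WITHOUT fold i; predict(model, case_id)
    returns a binary mask. Held-out cases (not in train_ids) receive a
    majority vote over all models."""
    n_folds = len(models)
    # Placeholder round-robin fold assignment for the training cases.
    fold_of = {cid: i % n_folds for i, cid in enumerate(train_ids)}
    masks = {}
    for cid in case_ids:
        if cid in fold_of:
            # Training case: use the one model whose training fold excluded it.
            masks[cid] = predict(models[fold_of[cid]], cid)
        else:
            # Held-out case: majority vote across all models.
            votes = np.stack([predict(m, cid) for m in models])
            masks[cid] = votes.sum(axis=0) > n_folds // 2
    return masks
```

In the study this yields an out-of-fold segmentation for each of the 300 cases, with majority voting over the five models for the 90 cases outside the training set.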

Statistical Analysis

The agreement between AI- and H-scores was assessed using Lin’s concordance correlation coefficient, and their discriminative abilities using area under the curve (AUC) for surgical approach, perioperative and pathologic outcomes. JMP® Pro 14.2 statistical software (SAS Institute Inc.) was used where appropriate. Figure 5 provides a schematic breakdown of our DL system.

Figure 5.

Figure 5.

Schematic diagram demonstrating AI generated segmentation, subsequent coding to fully automated R.E.N.A.L scores and its comparison to a human generated R.E.N.A.L score.
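For reference, Lin's concordance correlation coefficient used in the agreement analysis can be computed directly; the sketch below is a generic implementation of the standard formula, not the JMP routine used in the study.

```python
import numpy as np

def lins_ccc(x, y) -> float:
    """Lin's concordance correlation coefficient between paired scores.

    Uses population (biased) variances, per the standard definition:
    ccc = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return float(2 * cov / (x.var() + y.var() + (mx - my) ** 2))

# Identical score lists agree perfectly:
# lins_ccc([1, 2, 3], [1, 2, 3])  # -> 1.0
```

Unlike Pearson's correlation, this statistic penalizes both location and scale shifts between the two raters, which is why it is preferred for agreement analyses.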

Results

Concordance between H- and AI-generated R.E.N.A.L. Scores

The median age was 60 years (IQR 51–68), with 120 (40%) female and 180 (60%) male patients (Table 1). The median tumor size was 4.2 cm (IQR 2.6–6.2). Among all resected tumors, 275 (92%) were malignant, including 75 (27%), 92 (37%) and 64 (24%) with high stage, high grade and tumor necrosis, respectively. Surgically, 221 (74%) underwent a minimally invasive approach and 188 (63%) had a nephron-sparing approach. Additional pathologic and perioperative outcomes are described in Table 1.

Table 1.

Patient Characteristics of 300 patients undergoing surgical extirpation of renal mass

N=300

Demographics Gender - n (%)
 Male 180 (60%)
 Female 120 (40%)

Age (yrs.)-Median (IQR) 60 (51–68)

Tumor diameter (cm)-Median (IQR) 4.2 (2.6–6.2)

Body Mass Index (kg/m2)-Mean (SD) 30.9 (6.7)

Baseline eGFR (ml/min/1.73m2)-Median (IQR) 71 (60–80)

AI-R.E.N.A.L. score-Median (IQR) 8 (6–9)

Human-R.E.N.A.L. score-Median (IQR) 8 (6–9)

Pathologic Outcomes Malignant Renal Mass - n (%) 275 (92%)

Pathologic Stage - n (%)
 1a 121 (44%)
 1b 59 (21%)
 2a 15 (5%)
 2b 5 (1.8%)
 3 70 (25%)
 4 5 (1.8%)

Tumor Necrosis Present -n (%) 64 (24%)

Tumor Grade
 1 33 (14%)
 2 119 (49%)
 3 66 (27%)
 4 26 (10%)

Surgical Approach Surgical Technique - n (%)
 MIS (Laparoscopic/Robotic) 221 (73.7%)
 Open 79 (26.3%)

Nephrectomy Type - n (%)
 Partial 188 (63%)
 Radical 112 (37%)

Perioperative Outcomes Estimated Blood Loss (mL)-Median (IQR) 200 (100–400)

Blood Transfusions-n (%) 5 (1.7%)

Complications
 Any Grade Complication 72 (24%)
 High Grade Complications 21 (7%)

eGFR change 3 months post-surgery mean (SD)
 Entire Cohort −13 (15.3)
  Radical Nephrectomy −24 (11.3)
  Partial Nephrectomy −6 (13.3)

Length of Hospital Stay-Median (IQR) 3 (2–4)

Readmission to Hospital - n (%) 29 (9.7%)

The algorithm was able to formulate an AI-generated score in 294 (98%) of candidates, while 6 (2%) patients had limitations in the segmentation masks preventing such a computation. The median (IQR) AI-Score and H-Score were both 8 (6–9). There was significant agreement between AI- and human-generated R.E.N.A.L. scores, with a Lin's concordance correlation coefficient of ρ=0.59 (95% CI 0.51–0.66, p<0.0001) (Figure 6). In addition to total R.E.N.A.L. score agreement, each individual component of the R.E.N.A.L. system demonstrated significant agreement (p<0.001), with "R" showing the greatest (kappa coefficient 0.8, p<0.0001) and the remaining components demonstrating fair agreement (0.27–0.40, p<0.0001) (Table 2).

Figure 6.

Figure 6.

Comparison of median (IQR indicated by error bars) human-generated RENAL scores (H-Scores) on the y axis vs. computer-generated RENAL scores (AI-Scores) on the x axis. For example, for all patients with an AI-generated RENAL score of 9, the median H-Score for that group was also 9; for patients with an AI-generated RENAL score of 4, however, the median H-Score for that group was 6. Lin's concordance correlation coefficient ρ=0.59, p<0.0001.

Table 2.

Agreement between Human Generated R.E.N.A.L. Nephrometry Score and AI Generated Nephrometry Score

Agreement Statistic (95% CI) P value
Total R.E.N.A.L. Score (a) 0.60 Moderate <0.0001
Components of R.E.N.A.L. (b)
 R 0.80 (0.74–0.86) Substantial <0.0001
 E 0.27 (0.18–0.36) Fair <0.0001
 N 0.28 (0.20–0.38) Fair <0.0001
 A 0.40 (0.33–0.48) Fair <0.0001
 L 0.31 (0.23–0.40) Fair <0.0001
(a) Spearman's ρ
(b) kappa coefficient

AI and H Score prediction of oncologic outcomes

AI-Scores performed similarly to H-Scores in predicting meaningful oncologic parameters, including 1) presence of malignancy (AUC 0.67, p=0.0026 vs. AUC 0.63, p=0.037), 2) high stage disease (>pT2) (AUC 0.65, p=0.0002 vs. AUC 0.71, p<0.0001), 3) high grade disease (Fuhrman Grade 3–4) (AUC 0.63, p=0.0002 vs. AUC 0.65, p<0.0001) and 4) tumor necrosis (AUC 0.72, p=0.0001 vs. AUC 0.74, p<0.0001) (Table 3 and Figure 7 describe these findings in detail).

Table 3.

Comparison of the Predictive Ability of the Human-Generated R.E.N.A.L. Nephrometry Score and AI-Generated Nephrometry Score for 294 patients (a)

Human AUC Human p value AI AUC AI p value
Pathologic Outcomes
 Malignant vs. Benign 0.63 0.0237 0.67 0.0026
 High Pathologic Stage (pT3, T4) 0.71 <0.0001 0.65 0.0002
 Tumor Necrosis Present 0.74 <0.0001 0.72 <0.0001
 High Tumor Grade (Fuhrman Grades 3–4) 0.65 <0.0001 0.63 0.0002
Surgical Approach
 Surgical Technique (minimally invasive approach) 0.68 <0.0001 0.61 0.0038
 Nephrectomy Type: Partial Nephrectomy 0.79 <0.0001 0.74 <0.0001
Perioperative Outcomes
 Estimated Blood Loss (b) 35.8 0.0017 36.1 0.0455
 Blood Transfusions 0.72 0.10 0.77 0.0451
 Length of Hospital Stay (b) 1.3 0.0322 1.8 0.3315
 Readmission to Hospital 0.51 0.87 0.54 0.50
 Any Grade Complication 0.50 0.87 0.55 0.21
 High Grade Complications 0.57 0.2892 0.55 0.42
 eGFR change post surgery (b) −2.1 0.002 −0.3 0.0002
(a) The deep learning segmentation failed on 6 tumors; a R.E.N.A.L. Nephrometry Score could not be calculated for these 6 patients, who were therefore excluded from the comparison.
(b) Continuous variables report the parameter estimate per unit and associated p value.

Figure 7.

Figure 7.

Comparison of areas under the curve (AUCs) using the computer-generated R.E.N.A.L. score (AI-Score) and human-generated R.E.N.A.L. score (H-Score) in predicting pathologic outcomes: A) Malignant vs. Benign Final Pathology, B) High Stage Tumor (pT3–4), C) High Grade Tumor (ISUP Grade 3–4), D) Tumor Necrosis.

AI and H Score prediction of non-oncologic outcomes

The AI-Scores were significantly associated with surgical approach, including nephron-sparing surgery (AUC 0.74, p=0.0002) and minimally invasive surgery (AUC 0.61, p=0.0038). Both the AI-Score and H-Score also predicted several other perioperative outcomes, notably estimated blood loss, perioperative blood transfusion requirements (p<0.05) and postoperative change in estimated glomerular filtration rate (p<0.001) (Table 3). Neither the AI-Score nor the H-Score predicted hospital readmission or individual complications.

Discussion

Nephrometry scores represent a significant advance in our ability to systematically extract clinical information from preoperative images.14–21 However, infiltrative histology, high grade disease and patient factors (e.g., inflammation) limit confidence in true tumor boundaries with respect to contiguous structures. Combined with the time required to generate such scores, widespread adoption has remained modest at best. The use of DL in this space is highly intriguing as a way of reducing evaluation time, providing unambiguous assessment of imaging, and yielding a similar, if not greater, set of conclusions to guide optimal care.

Using SS, we were able to extract components of the R.E.N.A.L. score. The segmentations predicted by the AI model described renal parenchyma and tumor regions comparably to human-generated segmentations (Sørensen-Dice score=0.92). Despite a rigorous evaluation of nearly 50,000 images to create the "ground truth" upon which the AI was trained, limitations do exist. Voxel ambiguity was a foreseeable limitation and, though unavoidable, was minimized using standardized HU measurements to predict the likelihood of collecting system, sinus fat or parenchymal density; this was a useful adjunct to visual interpretation. Future work could utilize additional CT phases (e.g., arterial followed by delayed phase) to delineate renal parenchyma from the UCS. However, challenges would exist here as well, as a DL system must accurately identify the same voxel across images taken approximately 10–15 minutes apart. Fortunately, the single-phase arterial CT evaluation was robust enough for confident segmentation by our AI model. Second, the "N" and "L" components of the R.E.N.A.L. score required unique segmentations to identify the UCS and polar lines; our solution was to utilize rule-based geometric algorithms. The convex hull approach differs slightly from the R.E.N.A.L. approach by using a straight line between the edges of the tumor (solid line in Figure 4b) rather than an estimated parenchymal arc (dashed line in Figure 4b). This and other instances where automatic segmentations could diverge from the true anatomy may account for some of the disagreement between automatic and manual R.E.N.A.L. scores. Because segmentation is an inherently transparent vehicle for radiomic image analysis, clear errors can always be discarded in favor of traditional manual measurements. The overseeing clinician is encouraged to review such segmentations and decide whether they agree with the automated masks.

Another potential limitation is our single-center experience. Though surgery was conducted at a single institution, the preoperative images were captured at over 70 external hospitals; we believe this variability supports the generalizability and external validity of the segmentation model and its results. In addition, once DL training was complete, we assessed reliability using 210 cases for training with five-fold cross-validation and a 90-case test set. The winning segmentation model from the KiTS19 challenge was selected for our study. By promoting competition in the research community, we were able to objectively select the most promising AI segmentation model. In future studies, we encourage ongoing challenges and validation in geographically and temporally disparate patient cohorts.

Our algorithm generated an AI score for 98% of provided cases. Whether the remaining 2% failed due to poor image quality, motion artifact or significant renal atrophy was not elucidated. This small failure rate is consistent with real-world datasets, in which certain images are difficult to interpret, and supports the importance of medical experts overseeing AI algorithms and interpreting inconclusive results. As the algorithm is trained on more data, we expect the proportion of inconclusive cases to diminish further.

Finally, while various imaging modalities exist to extract meaningful data, using CT was the most pragmatic. We omitted patients with renal vein thrombus, which often requires magnetic resonance imaging (MRI) for better delineation. Further work is encouraged in other imaging modalities (e.g., MRI segmentation in patients with tumor thrombus).

Our automated AI-Scores were noninferior to manually rendered scores in their ability to predict numerous meaningful outcomes. Total AI- and H-Scores had a Lin's concordance of ρ=0.59, which is comparable to interobserver agreement between human observers. Each individual component also demonstrated agreement, with "R" having the best agreement (kappa coefficient 0.8, p<0.0001). AI-Scores performed similarly to H-Scores for oncologic parameters including presence of malignancy, high stage/grade disease and the presence of tumor necrosis. Their ability to predict use of a minimally invasive approach and partial nephrectomy further supports their use in surgical decision making (p<0.004). Finally, both scores predicted estimated blood loss, blood transfusion and post-operative eGFR (p<0.05), forewarning the surgeon of potentially needed preoperative consults and operative planning. Like the human-generated scores, the AI-Score was unable to predict individual complications or readmission; as variable and multifactorial outcomes, this is not surprising.

One advantage of automating the R.E.N.A.L. score is mitigating the time and unreimbursed effort required of the clinician to perform such calculations. With an easy-to-use system, our hope is to eliminate this barrier and encourage greater adoption. In addition, we hope to provide a standardized platform that reduces ambiguity regardless of disease complexity or the treating team's experience. This work lays the groundwork for generating not just the R.E.N.A.L. score, but all nephrometry scores simultaneously; the components can be automatically derived and expressed as either continuous or categorical variables, likely increasing predictive capacity in future work.

Conclusions

Our AI-generated nephrometry scores were similar to human generated scores in providing reliable quantification of renal tumor complexity. In addition, the AI Score robustly predicted a variety of oncologic and perioperative outcomes. External validation of these results is necessary prior to implementation into clinical practice.

Glossary

AI Score

Artificial Intelligence Score

AUC

area under curve

CT

computed tomography

DL

Deep learning

H Score

Human generated Score

HU

Hounsfield units

IQR

interquartile range

KiTS19

Kidney and Kidney tumor segmentation challenge

ML

Machine learning

MRI

Magnetic Resonance Imaging

SS

Semantic Segmentation

UCS

urinary collecting system

References

  • 1.Joshi SS, Uzzo RG. Renal Tumor Anatomic Complexity: Clinical Implications for Urologists. Urol. Clin. North Am 44, 179–187 (2017). [DOI] [PubMed] [Google Scholar]
  • 2.Kutikov A, Uzzo RG The R.E.N.A.L. Nephrometry Score: A Comprehensive Standardized System for Quantitating Renal Tumor Size, Location and Depth. J. Urol 182, 844–853 (2009). [DOI] [PubMed] [Google Scholar]
  • 3.Ficarra V, Novara G, Secco S, et al. Preoperative aspects and dimensions used for an anatomical (PADUA) classification of renal tumours in patients who are candidates for nephron-sparing surgery. Eur Urol 56, 786–793 (2009). [DOI] [PubMed] [Google Scholar]
  • 4.Weight CJ, Atwell TD, Fazzio RT et al. A Multidisciplinary Evaluation of Inter-Reviewer Agreement of the Nephrometry Score and the Prediction of Long-Term Outcomes. J. Urol 186, 1223–1228 (2011). [DOI] [PubMed] [Google Scholar]
  • 5.Kutikov A, Smaldone MC, Egleston BL, et al. Anatomic features of enhancing renal masses predict malignant and high-grade pathology: a preoperative nomogram using the RENAL Nephrometry score. Eur. Urol 60, 241–248 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Spaliviero M, Poon BY, Aras O, et al. Interobserver variability of R.E.N.A.L., PADUA, and centrality index nephrometry score systems. World J. Urol 33, 853–858 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Chapin BF & Wood CG The RENAL nephrometry nomogram: Statistically significant, but is it clinically relevant? Eur. Urol 60, 249–251 (2011). [DOI] [PubMed] [Google Scholar]
  • 8.Rajkomar A, Dean J & Kohane I Machine Learning in Medicine. N. Engl. J. Med 380, 1347–1358 (2019). [DOI] [PubMed] [Google Scholar]
  • 9.Beam AL & Kohane IS Big data and machine learning in health care. JAMA - Journal of the American Medical Association 319, 1317–1318 (2018). [DOI] [PubMed] [Google Scholar]
  • 10.LeCun Y, Bengio Y & Hinton G Deep learning. Nature 521, 436–44 (2015). [DOI] [PubMed] [Google Scholar]
  • 11.Russakovsky O et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis 211–252 (2015). doi: 10.1007/s11263-015-0816-y [DOI] [Google Scholar]
  • 12.Brinker TJ, Hekler A, Utikal JS et al. Skin Cancer Classification Using Convolutional Neural Networks: Systematic Review. Journal of medical Internet research 20, e11936 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA - J. Am. Med. Assoc 316, 2402–2410 (2016). [DOI] [PubMed] [Google Scholar]
  • 14.Lambin P, Rios-Velazquez E, Leijenaar R, et al. Radiomics: Extracting more information from medical images using advanced feature analysis. Eur. J. Cancer 48, 441–446 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Gillies RJ, Kinahan PE & Hricak H Radiomics: Images Are More than Pictures, They Are Data. Radiology 278, 563–577 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Parmar C, Grossmann P, Bussink J, et al. Machine Learning methods for Quantitative Radiomic Biomarkers. Sci. Rep 5, 13087 (2015). doi: 10.1038/srep13087 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Ranjbar S & Ross Mitchell J An Introduction to Radiomics: An Evolving Cornerstone of Precision Medicine. in Biomedical Texture Analysis 223–245 (Elsevier Ltd, 2017). doi: 10.1016/b978-0-12-812133-7.00008-9 [DOI] [Google Scholar]
  • 18.Stai B, Heller N, McSweeney S, et al. Public Perceptions of Artificial Intelligence and Robotics in Medicine. J. Endourol (2020). doi: 10.1089/end.2020.0137 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Kocak B, Yardimci AH, Bektas CT, et al. Textural differences between renal cell carcinoma subtypes: Machine learning-based quantitative computed tomography texture analysis with independent external validation. Eur. J. Radiol 107, 149–157 (2018). [DOI] [PubMed] [Google Scholar]
  • 20.Feng Z, Rong P, Cao P, et al. Machine learning-based quantitative texture analysis of CT images of small renal masses: Differentiation of angiomyolipoma without visible fat from renal cell carcinoma. Eur. Radiol 28, 1625–1633 (2017). [DOI] [PubMed] [Google Scholar]
  • 21.Simmons MN, Ching CB, Samplaski MK, et al. Kidney tumor location measurement using the C index method. J Urol 183, 1708–1713 (2010). [DOI] [PubMed] [Google Scholar]
  • 22.Hsieh P-F, Wang Y-D, Huang C-P, et al. A Mathematical Method to Calculate Tumor Contact Surface Area: An Effective Parameter to Predict Renal Function after Partial Nephrectomy. J. Urol 196, 33–40 (2016). [DOI] [PubMed] [Google Scholar]
  • 23.Ficarra V, Crestani A, Bertolo R, et al. Tumour contact surface area as a predictor of postoperative complications and renal function in patients undergoing partial nephrectomy for renal tumours. BJU Int. 123, 639–645 (2019). [DOI] [PubMed] [Google Scholar]
  • 24.Suk-Ouichai C, Wu J, Dong W, et al. Tumor Contact Surface Area As a Predictor of Functional Outcomes After Standard Partial Nephrectomy: Utility and Limitations. Urology 116, 106–113 (2018). [DOI] [PubMed] [Google Scholar]
  • 25.Haifler M, Ristau BT, Higgins AM, et al. External Validation of Contact Surface Area as a Predictor of Postoperative Renal Function in Patients Undergoing Partial Nephrectomy. J. Urol 199, 649–654 (2018). [DOI] [PubMed] [Google Scholar]
  • 26.Jiang J, Qian J, Zhang Q, et al. Evaluation of surgery-related kidney volume loss to predict the outcomes of laparoscopic partial nephrectomy with segmental renal artery clamping. Int. Urol. Nephrol 52, 35–40 (2020). [DOI] [PubMed] [Google Scholar]
  • 27.Sharma N, Zhang Z, Mir MC, et al. Comparison of 2 Computed Tomography-based Methods to Estimate Preoperative and Postoperative Renal Parenchymal Volume and Correlation With Functional Changes After Partial Nephrectomy. Urology 86, 80–86 (2015). [DOI] [PubMed] [Google Scholar]
  • 28.Heller N et al. The KiTS19 Challenge Data: 300 Kidney Tumor Cases with Clinical Context, CT Semantic Segmentations, and Surgical Outcomes. arXiv e-prints arXiv:1904.00445 (2019). [Google Scholar]
  • 29.Heller N et al. The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: Results of the KiTS19 Challenge. (2019). [DOI] [PMC free article] [PubMed]
