Author manuscript; available in PMC: 2025 Mar 1.
Published in final edited form as: Acad Radiol. 2023 Oct 3;31(3):889–899. doi: 10.1016/j.acra.2023.09.009

Test Re-Test Reproducibility of Organ Volume Measurements in ADPKD using 3D Multi-Modality Deep Learning

Xinzi He 1,2, Zhongxiu Hu 2, Hreedi Dev 2, Dominick J Romano 2, Arman Sharbatdaran 2, Syed I Raza 2, Sophie J Wang 2, Kurt Teichman 2, George Shih 2, James M Chevalier 3,4, Daniil Shimonov 3,4, Jon D Blumenfeld 3,4, Akshay Goel 2, Mert R Sabuncu 1,2, Martin R Prince 2,5
PMCID: PMC10957335  NIHMSID: NIHMS1930463  PMID: 37798206

Abstract

Rationale and Objectives:

Following autosomal dominant polycystic kidney disease (ADPKD) progression by measuring organ volumes requires low measurement variability. The objective of this study is to reduce organ volume measurement variability on MRI of ADPKD patients by utilizing all pulse sequences to obtain multiple measurements which allows outlier analysis to find errors and averaging to reduce variability.

Methods:

In order to make measurements on multiple pulse sequences practical, a 3D multi-modality multi-class segmentation model based on nnU-net was trained/validated using T1, T2, SSFP, DWI and CT from 413 subjects. Reproducibility was assessed with test re-test methodology on ADPKD subjects (n = 19) scanned twice within a 3-week interval, correcting outliers and averaging the measurements across all sequences. Absolute percent differences in organ volumes were compared with the paired Student's t-test.

Results:

DICE > 97%, Jaccard Index > 0.94, mean surface distance < 1 mm and mean Hausdorff Distance < 2 cm for all three organs and all five sequences were found on internal (n = 25), external (n = 27) and test re-test reproducibility assessment (38 scans in 19 subjects). When averaging volumes measured from 5 MRI sequences, the model automatically segmented kidneys with test re-test reproducibility (percent absolute difference between exam 1 and exam 2) of 1.3%, which was better than all 5 expert observers. It reliably stratified ADPKD into Mayo Imaging Classification (area under the curve = 100%) compared to the radiologist.

Conclusion:

3D deep learning measures organ volumes on 5 MRI sequences leveraging the power of outlier analysis and averaging to achieve 1.3% total kidney test-retest reproducibility.

Keywords: ADPKD, artificial intelligence, reproducibility, MRI, deep learning, kidney, liver, spleen

INTRODUCTION

Autosomal dominant polycystic kidney disease (ADPKD) is the most common inherited renal disease [1, 2]. These patients develop cysts in the kidneys and liver, enlarging those organs; the spleen is also often enlarged but without cysts. Total kidney volume (TKV) tracks ADPKD progression and can predict the time to various stages of chronic kidney disease [1, 2]. TKV was used as a primary endpoint for evaluating tolvaptan efficacy in treating ADPKD. Compared with placebo, tolvaptan significantly decreased the annualized rate of TKV growth (2.8% vs 5.5% with placebo), resulting in FDA approval [3, 4]. The Mayo Imaging Classification, which utilizes height-adjusted TKV (htTKV) and age, is now a standard predictor of disease severity in patients with ADPKD [5] and is used to determine eligibility for tolvaptan treatment [1, 2]. Follow-up TKV is measured to assess how well the medication is slowing ADPKD progression. TKV can also be used for assessing the utility of other therapies such as high water intake.

Given mean annual TKV increases of 2.8% and 5.5% with and without tolvaptan treatment, respectively, annual TKV measurement variability needs to be < 2.8% [4]. However, the widely utilized ellipsoidal technique for estimating TKV based upon measuring kidney length, width and depth has inter-reader variability of about 7% to 15% [6,7]. Manually contouring the kidneys has lower inter-reader variability, 3% to 6.7%, but is time consuming, subject to operator error and still not acceptable for annual TKV follow-up measurements [6,7]. Furthermore, these TKV inter-reader repeatability assessments underestimate true variability, which should be evaluated by test-retest methodology (imaging the patient twice, in quick succession) instead of just re-analyzing the same images by multiple observers.

In 2017, Kline et al. introduced deep learning for automating kidney segmentation to calculate TKV in ADPKD from coronal T2 images, later extending it to measure the liver. The approach was further refined by others, with deep learning now approaching the performance of experts using manual organ contouring [8–20]. Model-assisted segmentations allow deep learning models to attain the performance of an expert (e.g., radiologist) but in a fraction of the time normally required for manual contouring [13, 21]. However, there remains the problem that reproducibility of the radiologist expert is still not ideal. Deep learning models can readily measure organ volumes on multiple MR pulse sequences so they can be averaged, further improving reproducibility [22]. Multiple measurements also allow for outlier analysis to detect errors in data acquisition and measurement [23]. Using a 2D U-net deep learning algorithm with model-assisted organ contouring, Dev et al. showed improved TKV reproducibility of 2.5% by averaging measurements from 5 MRI pulse sequences [22].

The aim of this study is to further improve organ volume measurement reproducibility using a 3D multi-modality, multi-class model that segments liver and spleen [24] in addition to kidneys on multiple types of MRI pulse sequences trained with data from over 1000 scans on over 400 subjects. Outlier analysis is employed for quality control and reproducibility is assessed rigorously using test-retest methodology on consecutive MRI exams performed over a short interval, where no changes in organ volumes are anticipated, comparing to assessments by 5 expert observers.

MATERIALS AND METHODS

Patients

This IRB-approved, Health Insurance Portability and Accountability Act (HIPAA) compliant study utilized existing images from 413 patients for training the deep learning algorithm (Figure 1). Among these 413 subjects that had organ volumes requested by their care providers, 389 (94%) were diagnosed with ADPKD. Performance of the model was assessed on images from 71 ADPKD patients, including local internal MRIs on 25 ADPKD patients that were not included in the training/validation pool, external MRIs from 27 ADPKD patients who were scanned at outside institutions, external CTs from 10 ADPKD patients who were scanned at outside institutions and 38 MRIs from 19 ADPKD “reproducibility subjects” who prospectively underwent serial MRI scanning twice within a 3-week interval so that no changes in organ volumes were expected. All reproducibility subjects signed informed consent prior to imaging and had no scans included in the training data. Seventeen reproducibility subjects were scanned twice on the same scanner and 2 were scanned on different scanner models. MRI scan protocol details are provided in Supplemental Table S1.

Figure 1.

Flow Chart showing 413 subjects used for training/validation (note that most subjects had more than 1 sequence) and the 71 subjects used for internal test set (n=25), External test set (MRI=27, CT=10) and Reproducibility (n=19).

AI Model Development

This work implements a 3D, multi-modality, multi-class model based on nnU-net, which is trained on manually segmented images subject to a rigorous quality control process (see supplementary methods for labeling training data), with final results verified by a board-certified radiologist in every case [25]. As shown in Figure 2, this model is an encoder-decoder network, with skip connections from the encoder to the decoder to enhance information details in the deep layers and gradient accumulation in the early layers. Both encoder and decoder contain 7 levels. Each level is made up of two consecutive 3D convolutional layers, followed by an instance normalization layer and LeakyReLU (negative slope 0.01). The first 3D convolutional layer in each level uses stride (2,2,2) to downsample the feature maps by a factor of 2, except in the first and second levels, which use strides (1,1,1) and (1,2,2), respectively.
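The stride pattern described above determines how the 96×256×256 input patch shrinks through the encoder. The following is a minimal sketch of that arithmetic only, not the authors' implementation; it assumes padded convolutions (so that only the stride changes the spatial size) and the axis order (slices, rows, columns). The function name `encoder_shapes` is hypothetical.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]

# Stride of the first convolution at each of the 7 encoder levels,
# as described in the text: (1,1,1), then (1,2,2), then (2,2,2) five times.
ENCODER_STRIDES: List[Shape] = [(1, 1, 1), (1, 2, 2)] + [(2, 2, 2)] * 5


def encoder_shapes(patch: Shape, strides: List[Shape] = ENCODER_STRIDES) -> List[Shape]:
    """Return the spatial shape of the feature map after each encoder level."""
    shapes: List[Shape] = []
    current = patch
    for stride in strides:
        # Assumption: "same"-style padding, so only the stride reduces the size.
        current = tuple(dim // s for dim, s in zip(current, stride))
        shapes.append(current)
    return shapes


shapes = encoder_shapes((96, 256, 256))
for level, shape in enumerate(shapes, start=1):
    print(f"level {level}: {shape}")
```

Under these assumptions the in-plane dimensions are halved six times and the slice dimension five times, so the deepest level works on a 3×4×4 grid, which is consistent with the stated aim of giving the network a field of view that spans the whole patch.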

Figure 2.

The network architecture. Our model uses the U-Net architecture with 7 levels and an input patch size of 96×256×256. The array on the left-hand side of each level in the encoder of the U-Net represents the shape of the input tensor. Each layer undergoes a convolution as specified in the superimposed black box, followed by instance normalization and leaky ReLU. Dotted lines correspond to skip connections from encoder to decoder, which allow the model to utilize important spatial features in the decoding steps.

We trained the model for multi-organ segmentation using a combination of axial T2 (n = 351), axial SSFP (n = 133), axial T1 (n = 125), axial DWI (n = 17), coronal T2 (n = 175), coronal SSFP (n = 77) and CT (n = 11) images from 413 patients with right kidney (red), left kidney (green), liver (yellow) and spleen (blue) labeled by expert observers (H.D., Z.H., D.R., A.S., S.R.), then checked and corrected as necessary by a radiologist with 20 years of experience segmenting kidney and liver in ADPKD subjects (M.R.P.) [13, 26–30]. Some patients had exams from more than one timepoint included in the training data set, which helped to maximize training data. However, no data from patients contributing to training data were used for testing. Further training data quality control checks are described in supplementary methods.

Due to the large size of the 3D data sets, training was performed using patches (256×256×96 voxels). This was increased from the commonly used patch size of 96×96×96, which was too small to allow for proper identification of more distant landmarks necessary for the model to learn to discriminate among the various organs and to differentiate right from left kidney. To match the increased patch size, we increased the neural network field of view by using a U-net depth of 7 levels [31]. Z-normalization was used to set the mean signal intensity to zero and the standard deviation of signal intensity to 1. We used a stochastic gradient descent (SGD) optimizer, starting the learning rate at 0.01 and gradually decreasing it with each epoch, such that the learning rate reached 0 at the 1000th epoch [32]. Model code with instructions is available at https://github.com/Novestars/organ_volume_measurement.
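The two preprocessing and optimization steps just described can be sketched as follows. This is an illustration under stated assumptions, not the released training code: the text specifies only the starting learning rate (0.01) and that it reaches 0 at epoch 1000, so the polynomial decay with exponent 0.9 used here is an assumption (it is the form commonly used with nnU-net). The function names `z_normalize` and `poly_learning_rate` are hypothetical.

```python
import statistics
from typing import List, Sequence


def z_normalize(intensities: Sequence[float]) -> List[float]:
    """Rescale signal intensities to zero mean and unit standard deviation."""
    mean = statistics.fmean(intensities)
    std = statistics.pstdev(intensities)
    if std == 0:
        # A constant image carries no contrast; map every voxel to zero.
        return [0.0 for _ in intensities]
    return [(value - mean) / std for value in intensities]


def poly_learning_rate(
    epoch: int,
    initial_lr: float = 0.01,
    max_epochs: int = 1000,
    exponent: float = 0.9,  # assumed value, not stated in the text
) -> float:
    """Learning rate that starts at initial_lr and decays to 0 at max_epochs."""
    return initial_lr * (1 - epoch / max_epochs) ** exponent


normalized = z_normalize([120.0, 340.0, 560.0, 780.0])
schedule = [poly_learning_rate(epoch) for epoch in (0, 250, 500, 750, 1000)]
print(normalized)
print(schedule)
```

Any monotonically decreasing schedule ending at zero would match the description in the text equally well; the polynomial form is shown only because it is a familiar default.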

AI model assessment and outlier analysis

See Supplementary Methods for formulas and details for measuring Dice similarity coefficient (DSC), Jaccard index (JI), Hausdorff distance (HD), mean surface distance (MSD) that compare model outputs to the radiologist-corrected segmentations for internal validation (n = 13), external validation (n = 27) and prospective reproducibility data sets (n = 19). For all reproducibility scans, outliers were identified as volume measurements differing by > 20% from the median volume obtained by averaging measurements from all 5 MRI pulse sequences. Outliers were investigated and corrected for data acquisition, image processing and segmentation errors, and they were re-analyzed by the model after corrections. Reproducibility of Mayo Imaging Classification based upon TKV measured from the model output (averaged across all pulse sequences acquired in one session or from a single pulse sequence) was assessed using area under the curve (AUC) analysis on all three test sets using radiologist’s TKV measurements as ground truth.
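The outlier rule above (a measurement differing by > 20% from the median across pulse sequences) can be sketched as below. The volumes are hypothetical values chosen for illustration, the function name `flag_outliers` is hypothetical, and averaging the remaining sequences is shown only as one possible handling; in the study, flagged measurements were investigated, corrected and re-analyzed.

```python
import statistics
from typing import Dict, List


def flag_outliers(volumes_ml: Dict[str, float], threshold: float = 0.20) -> List[str]:
    """Return sequences whose volume differs from the median by more than threshold."""
    median = statistics.median(volumes_ml.values())
    return [
        name
        for name, volume in volumes_ml.items()
        if abs(volume - median) / median > threshold
    ]


# Hypothetical total kidney volumes (mL) from the 5 MRI pulse sequences of one exam.
example = {
    "axial_t2": 1000.0,
    "axial_t1": 980.0,
    "axial_ssfp": 1010.0,
    "coronal_t2": 1300.0,  # e.g. inflated by duplicated or overlapping slices
    "coronal_ssfp": 990.0,
}

outliers = flag_outliers(example)
kept = [volume for name, volume in example.items() if name not in outliers]
average_ml = statistics.fmean(kept)
print(outliers, average_ml)
```

Using the median as the reference keeps a single bad measurement from shifting the reference itself, which is why one gross error among five sequences can still be detected.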

Reproducibility and accuracy

MRIs from the prospectively acquired ADPKD reproducibility subjects (n = 19), who were scanned twice (each exam including all 5 MRI pulse sequences) within a 3-week interval during which there were no clinical events, were segmented using the final model checkpoints. For each case, 5 expert observers corrected the model contours and recorded the time required to correct each organ. To prevent memory bias, the corrections were completed in random order with a minimum 1-week interval between correcting annotations on the same patient. To compare correction time to the time required for manual TKV measurements, the axial T2-weighted scans were manually contoured by one observer. Since the reproducibility subjects received two scans within three weeks, the volumes of organs measured from images can be expected to remain unchanged. Therefore, the reproducibility (test-retest reliability) of the organ volume measurement was assessed by calculating the absolute percent difference between the volumes measured from MRI exam 1 and MRI exam 2. Similarly, averages of the organ volumes measured by all five observers on the 38 sets of reproducibility scans were taken as ground truth and compared to each observer's and the model's volume measurements to assess accuracy.
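The test-retest metric can be written out as a short sketch. The text does not state which volume serves as the denominator, so using the mean of the two exams is an assumption made here; the example volumes are hypothetical and the function names are illustrative.

```python
from typing import Iterable, Tuple


def abs_percent_difference(volume_exam1: float, volume_exam2: float) -> float:
    """Absolute percent difference between two measurements of the same organ."""
    # Assumption: the reference is the mean of the two exams (symmetric definition).
    reference = (volume_exam1 + volume_exam2) / 2
    return abs(volume_exam1 - volume_exam2) / reference * 100


def mean_abs_percent_difference(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average the per-subject absolute percent differences."""
    diffs = [abs_percent_difference(first, second) for first, second in pairs]
    return sum(diffs) / len(diffs)


# Hypothetical (exam 1, exam 2) total kidney volumes in mL for three subjects.
pairs = [(1000.0, 1013.0), (2500.0, 2470.0), (640.0, 648.0)]
print(round(mean_abs_percent_difference(pairs), 2))
```

A symmetric denominator has the convenient property that the result does not depend on which exam is labeled first, which suits a setting where no true change is expected.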

RESULTS

Deep learning-based Organ volume measurement

Artificial intelligence (AI) organ volume measurement, as shown in Figure 1, was based on a multi-modality, multi-class U-Net trained with pixel-level annotations. This model takes a 3D MRI volume from each of the MRI pulse sequences used in training and generates a segmentation of left kidney, right kidney, liver and spleen. The final AI volume measurement takes the segmentation maps from each of the 5 pulse sequences in one MRI study to generate averaged organ volumes. Just prior to averaging, the 5 measurements are searched for outliers (those differing by > 20% from the median), which are discarded or corrected to ensure robust averaging.

Study population

Demographic data along with the number of DICOM images and mean TKV, eGFR, for the 494 patients utilized in training and validation (n = 413), internal testing (n = 25), external testing (n = 27 MRI and 10 CT) and prospective reproducibility (n = 19) testing are shown in Table 1. Age, gender and estimated glomerular filtration rate (eGFR) were similar for all groups. For some training data the kidneys were not completely included within the field of view and thus Mayo classification could not be determined. The number of training cases with truncated kidneys for each modality is also indicated in Table 1. All 5 MRI pulse sequences were available in PACS for analysis in 21 of the 25 (84%) internal test set subjects. All 5 sequences were not available for any of the external test set subjects, but four out of five MRI pulse sequences were available in 5 of 27 (18%); 3 MRI pulse sequences were available in 16 and only 2 sequences for 6 cases; see Supplemental Table S2. The average volume was calculated from all available sequences. For training/validation data, labels were available for all 5 sequences in 51 subjects. In 17 of the 19 (89%) reproducibility subjects, the scan was repeated on the same scanner and for the other 2 it was repeated on a different scanner.

Table 1.

Clinical and demographic characteristics for the training/validation as well as prospective, external and reproducibility test sets.

Columns Axial T2 through CT are the training/validation data.

| Parameter | Axial T2 | Axial SSFP | Axial T1 | Coronal T2 | Coronal SSFP | Axial DWI | CT | Internal Test Set | External CT | External Test Set | Prospective Reproducibility Test Set | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Patients | 351 | 133 | 125 | 175 | 77 | 17 | 11 | 25 | 10 | 27 | 19 | 488 |
| Scans | 445 | 146 | 134 | 199 | 79 | 17 | 12 | 118 | 10 | 80 | 190 | 1,319 |
| DICOM images | 26,133 | 8,037 | 18,728 | 8,151 | 3,261 | 978 | 1,663 | 9,199 | 1,203 | 4,486 | 11,124 | 92,067 |
| Male:Female (% male) | 195:250 (44%) | 65:81 (45%) | 64:70 (48%) | 91:108 (46%) | 33:46 (42%) | 8:9 (47%) | 7:5 (58%) | 51:67 (43%) | 2:8 (20%) | 52:28 (65%) | 100:90 (53%) | 612:712 (46%) |
| Age: median (IQR, note 2) | 48 (39–60) | 44 (35–58) | 46 (37–59) | 45 (38–58) | 42 (34–57) | 46 (37–61) | 48 (38–54) | 43 (30–56) | 55 (54–66) | 49 (40–60) | 51 (35–65) | 46 (37–58) |
| eGFR (note 3): median (IQR) | 71 (47–94) | 71 (42–95) | 66 (41–88) | 67 (41–89) | 71 (41–94) | 71 (58–80) | 62 (44–69) | 78 (61–100) | 54 (40–92) | 60 (42–76) | 62 (40–82) | 70 (43–94) |
| BMI (note 4): median (IQR) | 26 (23–28) | 25 (22–28) | 26 (22–29) | 26 (23–28) | 25 (22–28) | 25 (24–26) | 27 (26–30) | 24 (22–27) | 25 (23–27) | 26 (23–28) | 25 (21–30) | 25 (23–28) |
| htTKV (note 5): median (IQR) | 619 (315–1168) | 712 (393–1346) | 720 (393–1378) | 733 (416–1468) | 709 (425–1409) | 651 (265–1116) | 681 (390–862) | 661 (396–1078) | 871 (426–1069) | 624 (378–1406) | 545 (322–1213) | 674 (360–1245) |
| Mayo class 1A | 86 | 7 | 16 | 25 | 9 | 4 | 1 | 1 | 2 | 6 | 8 | 177 |
| Mayo class 1B | 107 | 29 | 34 | 43 | 20 | 3 | 4 | 40 | 2 | 22 | 14 | 334 |
| Mayo class 1C | 107 | 24 | 39 | 62 | 22 | 5 | 5 | 44 | 6 | 13 | 6 | 342 |
| Mayo class 1D | 56 | 14 | 22 | 41 | 15 | 3 | 2 | 26 | 0 | 19 | 4 | 203 |
| Mayo class 1E | 37 | 12 | 11 | 21 | 13 | 1 | 0 | 5 | 0 | 4 | 6 | 119 |
| Truncated Kidneys (note 6) | 51 | 60 | 12 | 7 | 0 | 1 | 0 | 2 | 0 | 16 | 0 | 149 |

Note 1: based upon scans

Note 2: IQR = interquartile range

Note 3: eGFR = estimated glomerular filtration rate (mL/min/1.73 m²)

Note 4: BMI = body mass index (kg/m²)

Note 5: htTKV = height adjusted total kidney volume (cc), excluding cases with truncated kidneys (see note 6)

Note 6: truncated kidneys means kidneys not completely included within the field-of-view and therefore Mayo Classification could not be determined; Class 2 patients with asymmetric disease were excluded.

Model validation

Model validation results for each MRI pulse sequence are shown in Table 2a (internal test set) and Table 2b (external test set). Mean Dice similarity coefficient was ≥ 0.97 and mean Jaccard index was ≥ 0.94 for all 5 pulse sequences and all 4 organs (right kidney, left kidney, liver, and spleen) in the internal test set; the corresponding values for the external test set were ≥ 0.96 and ≥ 0.94. A slightly higher DICE for left kidney compared to right kidney (Supplemental Table S3) may reflect the difficulty of distinguishing right kidney from liver when the liver is also polycystic. Mean surface distance and Hausdorff distance also show the ability of the model to accurately segment organs. The external test set had a larger average mean surface distance (0.28 ± 2 mm) than the internal test set (0.25 ± 0.6 mm), but this is still excellent and less than the voxel dimensions. While the mean surface distance captures the average performance of the model, Hausdorff distance measures the largest distance between a labeled voxel and the ground truth. Mean Hausdorff distance was 1 ± 2 cm for the internal test set and 0.8 ± 1.2 cm for the external test set. An example of performance on internal test images is shown in Figure 3a, along with the automatically produced report (Figure 3b) and a graph showing ht-TKV/Mayo Classification over time (Figure 3c). Model performance on external images from a 1.2 T Hitachi OASIS open MRI scanner, a scanner not used for any training data and at a field strength different from all training data, is shown in Figure 4, demonstrating excellent generalizability. Supplemental Figure S1 shows excellent generalizability to a subject with no cysts and Supplemental Figure S2 shows good generalizability to decubitus and oblique rotations.
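For readers comparing the two overlap metrics reported here, the standard definitions can be sketched as follows (the study's own formulas are in its Supplementary Methods, which are not reproduced in this text). Segmentations are represented as sets of voxel coordinates purely for illustration; the function names are hypothetical.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def dice(pred: Set[Voxel], truth: Set[Voxel]) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not pred and not truth:
        return 1.0  # two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def jaccard(pred: Set[Voxel], truth: Set[Voxel]) -> float:
    """Jaccard index: |A ∩ B| / |A ∪ B|."""
    if not pred and not truth:
        return 1.0
    return len(pred & truth) / len(pred | truth)


def dice_to_jaccard(d: float) -> float:
    """For a single pair of masks, J = D / (2 - D)."""
    return d / (2 - d)


# Tiny toy masks: 3 of 4 voxels overlap.
pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
truth = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)}
print(dice(pred, truth), jaccard(pred, truth), dice_to_jaccard(0.97))
```

Because the Jaccard index is always the lower of the two for imperfect overlap, a Dice of 0.97 corresponds to a Jaccard index of about 0.94 for a single case, which matches the pairing of thresholds quoted in the Abstract. The relation holds per case, so means over cases only approximately follow it.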

Table 2.

Internal test set (a, n = 25) and external test set (b, n = 27) and external CT (n = 10) agreement with ground truth (expert radiologist).

a. Internal (mean ± standard deviation)
Axial T2 Axial T1 Axial SSFP Coronal T2 Coronal SSFP Average
Dice similarity coefficient 0.99 ± 0.01 0.97 ± 0.07 0.98 ± 0.03 0.98 ± 0.01 0.97 ± 0.04 0.98 ± 0.04
Jaccard Index 0.97 ± 0.02 0.96 ± 0.08 0.96 ± 0.05 0.97 ± 0.03 0.95 ± 0.06 0.96 ± 0.06
Mean surface distance (mm) 0.137 ± 0.2 0.498 ± 1 0.240 ± 0.5 0.143 ± 0.2 0.228 ± 0.4 0.25 ± 0.6
Hausdorff distance (mm) 9.4 ± 23 13.1 ± 25 7.7 ± 8 7.6 ± 13 10.7 ± 19 9.7 ± 19
Right Kidney Left Kidney Liver Spleen Average
Dice similarity coefficient 0.98 ± 0.03 0.99 ± 0.008 0.98 ± 0.03 0.97 ± 0.07 0.98 ± 0.04
Jaccard Index 0.96 ± 0.04 0.97 ± 0.02 0.96 ± 0.04 0.94 ± 0.09 0.96 ± 0.06
Mean surface distance (mm) 0.278 ± 0.5 0.190 ± 0.6 0.268 ± 0.5 0.264 ± 0.7 0.25 ± 0.6
Hausdorff Distance (mm) 10.5 ± 25 6.6 ± 15 15.2 ± 20 6.4 ± 11 9.7 ± 19
b. External (mean ± standard deviation)
Axial T2 Axial T1 Axial SSFP Coronal T2 Coronal SSFP Average
Dice similarity coefficient 0.99 ± 0.01 0.96 ± 0.1 0.98 ± 0.03 0.98 ± 0.02 0.98 ± 0.02 0.98 ± 0.2
Jaccard Index 0.97 ± 0.02 0.94 ± 0.1 0.96 ± 0.06 0.97± 0.03 0.96 ± 0.04 0.97 ± 0.04
Mean surface distance (mm) 0.102 ± 0.08 2.5 ± 20 0.252 ± 0.5 0.226 ± 1 0.202 ± 0.3 0.281 ± 2
Hausdorff Distance (mm) 5.5 ± 6 19.8 ± 42 7.76 ± 8 8.0 ± 13 8.4 ± 10 7.85 ± 12
Right Kidney Left Kidney Liver Spleen Average
Dice similarity coefficient 0.98 ± 0.04 0.99 ± 0.01 0.98 ± 0.02 0.98 ± 0.01 0.98 ± 0.2
Jaccard Index 0.962 ± 0.06 0.972 ± 0.02 0.966 ± 0.3 0.966 ± 0.022 0.966 ± 0.04
Mean surface distance (mm) 0.530 ± 3 0.268 ± 1 0.215 ± 0.3 0.109 ± 0.1 0.281 ± 2
Hausdorff Distance (mm) 7.7 ± 13 7.0 ± 14 12.4 ± 14 4.3 ± 4 7.85 ± 12
c. External CT (mean ± standard deviation)
Right Kidney Left Kidney Liver Spleen Average
Dice similarity coefficient 0.96 ± 0.06 0.95 ± 0.08 0.95 ± 0.04 0.93 ± 0.1 0.95 ± 0.08
Jaccard Index 0.92 ± 0.1 0.92 ± 0.1 0.88 ± 0.2 0.91 ± 0.07 0.91 ± 0.1
Mean surface distance (mm) 1.51 ± 2 1.23 ± 3 1.44 ± 1 0.87 ± 2 1.26 ± 2
Hausdorff Distance (mm) 38.1 ± 49 33.0 ± 52 32.6 ± 14 9.19 ± 9 28.2 ± 37

Figure 3.

Example of Model output of a coronal T2 image from the internal set: (a) on the XNAT viewer with the 5 MRI sequences along the left and the near perfect automatic model segmentations (red = right kidney, green = left kidney, yellow = liver, purple = spleen) on the right; (b) measurement report including organ volumes on each sequence, the average of all sequences and the standard deviation which helps to assess confidence. Note that when outlier values for organ volumes are detected, those segmentations can be corrected or discarded and the report re-run. (c) Temporal plot showing total kidney volume at every timepoint for which data was provided to the model.

Figure 4.

Example of performance on Hitachi OASIS open MRI scanner at 1.2T. The acquired image contains multiple white pixel artifacts, demonstrating excellent model performance generalization to a field strength and scanner manufacturer not included in the training data. White pixel artifact is another feature of this internal test set case that was not present in the training data.

Performance of the algorithm on diffusion weighted (DWI) images was not as good as for the other sequences (Supplemental Figure S3a) due to the low resolution, low SNR and image distortion of DWI. Accordingly, this sequence was not further evaluated. Performance on CT appeared to be excellent (Figure S3b and Table 2c) but was not evaluated prospectively for reproducibility in this study because of the ethical concerns over nonbeneficial radiation exposure. There are more challenging cases in the test set, with lower DICE, including a subject that had a left kidney nephrectomy and iron overload with a dark liver as shown in Supplemental Figure S4a and a patient with massive cystic enlargement of the liver making it challenging to distinguish from right kidney, Supplemental Figure S4b.

Model-assisted segmentation is an order of magnitude faster than manual segmentation

Average times to manually segment the organs from scratch on axial T2 images were as follows: right and left kidneys (9:13 ± 5:30 and 9:44 ± 6:00 minutes, respectively), liver (17:51 ± 12:16 minutes) and spleen (4:32 ± 2:29 minutes), totaling 41:20 ± 26:15 minutes (see Table 3). For the 5 observers, however, the mean time per sequence for model-assisted segmentation was much faster: right and left kidney (0:47 ± 0:06 and 0:46 ± 0:07 minutes, respectively), liver (1:02 ± 0:06 minutes) and spleen (1:12 ± 0:11 minutes), totaling 3:47 ± 0:30 minutes, an order of magnitude reduction compared to fully manual organ contouring. The time for model-assisted segmentation of all 5 MRI pulse sequences averaged 18:54 ± 12:42 minutes, which is about twice as fast as manual segmentation of just one pulse sequence.
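The speed-up factors quoted above follow directly from the reported mean times. A small sketch of the arithmetic, using only the totals given in the text (the helper name `to_seconds` is hypothetical):

```python
def to_seconds(minutes_seconds: str) -> int:
    """Convert a 'minutes:seconds' string such as '41:20' to seconds."""
    minutes, seconds = minutes_seconds.split(":")
    return int(minutes) * 60 + int(seconds)


manual_axial_t2 = to_seconds("41:20")      # fully manual, one sequence
assisted_per_sequence = to_seconds("3:47")  # model-assisted, mean per sequence
assisted_five_sequences = to_seconds("18:54")  # model-assisted, all 5 sequences

speedup_per_sequence = manual_axial_t2 / assisted_per_sequence
speedup_five_sequences = manual_axial_t2 / assisted_five_sequences
print(round(speedup_per_sequence, 1), round(speedup_five_sequences, 1))
```

These ratios compare mean times only; the reported standard deviations are large relative to the means, so individual cases can deviate substantially.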

Table 3.

Average times (minutes:seconds) for observers (n = 5) to manually correct the model segmentation in 19 subjects scanned twice.

All columns except "Manual: Axial T2" are model-assisted segmentation times.

| Organ | Manual: Axial T2 | Axial T2 | Axial T1 | Axial SSFP | Coronal T2 | Coronal SSFP | Average Time per Sequence | 5-Sequence Time |
|---|---|---|---|---|---|---|---|---|
| Liver | 17:51 ± 12:16 | 0:54 ± 0:29 | 1:04 ± 0:29 | 1:03 ± 0:23 | 0:58 ± 0:43 | 1:11 ± 0:25 | 1:02 ± 0:06 | 5:10 ± 2:28 |
| Spleen | 4:32 ± 2:29 | 1:27 ± 2:27 | 1:25 ± 0:52 | 1:01 ± 0:23 | 1:01 ± 0:15 | 1:07 ± 0:28 | 1:12 ± 0:11 | 5:59 ± 4:24 |
| Right kidney | 9:13 ± 5:30 | 0:40 ± 0:22 | 1:12 ± 1:24 | 0:37 ± 0:15 | 0:44 ± 0:20 | 0:43 ± 0:18 | 0:47 ± 0:06 | 3:55 ± 2:39 |
| Left kidney | 9:44 ± 6:00 | 0:40 ± 0:19 | 1:11 ± 0:52 | 0:43 ± 0:23 | 0:41 ± 0:15 | 0:43 ± 0:28 | 0:46 ± 0:07 | 3:50 ± 3:11 |
| Total | 41:20 ± 26:15 | 3:40 ± 3:35 | 4:51 ± 4:25 | 3:23 ± 1:25 | 3:18 ± 1:18 | 3:42 ± 1:58 | 3:47 ± 0:30 | 18:54 ± 12:42 |

Model organ volume measurement reproducibility

Absolute percent differences for organ volume measurements between MRI exam 1 and MRI exam 2 for the model and each observer are shown in Table 4. For manual contouring, the absolute percent difference in TKV between MRI exam 1 and MRI exam 2 on axial T2 was 5.6%, which is similar to prior reports of variability for manual contouring [33]. Compared to manual contouring, model-assisted contouring had improved reproducibility, with the percent absolute differences between MRI exam 1 and MRI exam 2 ranging from 1.5% to 3.2% for TKV, 1.3% to 3.9% for liver and 2.5% to 6.2% for spleen among the 5 observers. Although the smallest organ, the spleen, had the greatest measurement variability, Bland Altman plots (Figure 5) show that percent differences did not vary by volume within individual organs. By averaging the volumes measured using individual pulse sequences, the reproducibility of model-assisted volume measurement improved, with |% difference| dropping to 1.6%, 1.6% and 3.4% for TKV, liver volume and spleen volume, respectively. The model alone without any corrections performed even better for TKV, with an absolute percent difference of 1.3%.
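To put the benefit of averaging in context, the sketch below compares the model's single-sequence TKV values from Table 4 with the value obtained after averaging all sequences. The comparison against a 1/√n reduction is an assumption introduced here, not an analysis from the study: it treats |% difference| as a rough proxy for random spread and supposes fully independent errors across sequences.

```python
import math

# Model |% difference| in TKV for each single pulse sequence (Table 4).
model_tkv_single = {
    "axial_t2": 1.9,
    "axial_t1": 1.9,
    "axial_ssfp": 2.3,
    "coronal_t2": 2.5,
    "coronal_ssfp": 1.7,
}
observed_all_sequences = 1.3  # Model, "All Sequences" row of Table 4

n = len(model_tkv_single)
mean_single = sum(model_tkv_single.values()) / n

# Assumption: independent, equal-variance errors would shrink by sqrt(n).
ideal_if_independent = mean_single / math.sqrt(n)

print(round(mean_single, 2), round(ideal_if_independent, 2), observed_all_sequences)
```

Under these assumptions the observed 1.3% lies between the single-sequence mean and the idealized value, which would be consistent with errors that are partly shared across sequences (for example, from patient positioning or hydration on a given day), although other explanations are possible.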

Table 4.

Absolute percent difference between TKV measurements on consecutive MRI scans for automatic contouring performed by the model, for model-assisted contouring performed by 5 expert observers correcting the model output and for completely manual contouring by a single observer with data only on axial T2. Absolute percent difference is provided for each MRI pulse sequence and for the combination of all sequences.

Pulse Sequence(s) for |%diff| in TKV Model Model-assisted contouring: 5 Observers Manual Contouring
1 2 3 4 5 Mean
Axial T2 1.9% 2.3% 2.3% 2.0% 2.3% 2.2% 2.2% 5.6%
Axial T1 1.9% 2.3% 2.8% 1.9% 2.3% 2.0% 2.3%
Axial SSFP 2.3% 3.2% 3.3% 3.1% 3.2% 3.1% 3.2%
Coronal T2 2.5% 2.4% 2.7% 2.1% 1.9% 2.3% 2.3%
Coronal SSFP 1.7% 1.5% 1.7% 1.7% 1.6% 1.3% 1.5%
All Sequences 1.3% 1.5% 1.7% 1.5% 1.7% 1.5% 1.6%
Pulse Sequence(s) for |%diff| in liver volume Model Model-assisted contouring: 5 Observers Manual Contouring
1 2 3 4 5 Mean
Axial T2 3.2% 4.1% 3.5% 5.1% 3.5% 3.5% 3.9% 6.8%
Axial T1 1.3% 1.0% 1.1% 1.9% 0.9% 1.5% 1.3%
Axial SSFP 2.0% 2.3% 2.5% 2.3% 2.2% 2.6% 2.4%
Coronal T2 2.3% 2.3% 2.8% 2.6% 2.9% 2.4% 2.6%
Coronal SSFP 1.3% 1.2% 1.5% 1.3% 1.2% 1.4% 1.3%
All Sequences 1.6% 1.5% 1.5% 1.9% 1.7% 1.5% 1.6%
Pulse Sequence(s) for |%diff| in spleen volume Model Model-assisted contouring: 5 Observers Manual Contouring
1 2 3 4 5 Mean
Axial T2 5.6% 5.8% 6.0% 5.6% 5.4% 6.1% 5.8% 9.4%
Axial T1 3.0% 2.6% 2.5% 3.1% 2.1% 2.4% 2.5%
Axial SSFP 6.0% 6.0% 6.4% 6.3% 6.8% 5.4% 6.2%
Coronal T2 4.7% 3.8% 4.4% 3.9% 4.2% 5.0% 4.3%
Coronal SSFP 5.4% 5.4% 5.9% 6.3% 4.9% 6.0% 5.7%
All Sequences 3.7% 3.2% 3.5% 3.6% 3.6% 3.3% 3.4%

Figure 5.

Bland Altman plots of percent differences in organ volumes between scan 1 and scan 2 for (a) TKV, (b) liver, (c) spleen. Open diamonds correspond to model output and colored dots correspond to the 5 observers. Corrections to the model output made by 5 expert observers (colored dots) only minimally improved performance over the automated model.

Model Accuracy relative to average of expert observers

The accuracy of volume measurements made utilizing the model inferences was assessed using the average organ volumes of all expert observers as the standard of reference, where the absolute percent differences between the volumes are shown in Table 5 and Figure 5 along with the volumes measured by individual observers compared to the standard of reference (See Supplemental Table S3 for |% difference| between measurements from each pulse sequence and the standard of reference). The range of absolute percent differences for each observer was 0.27% to 0.57% (TKV), 0.27% to 0.46% (Liver) and 0.46% to 0.87% (spleen) compared to 0.57%, 0.61% and 1.14% for the fully automatic, uncorrected model respectively. Thus, the model alone performed within this range of 5 observers for TKV and liver and nearly within this range for spleen.

Table 5.

Accuracy of model and each observer compared to the mean of all observers (ground truth) for total kidney volume (TKV), liver and spleen.

|% Difference| vs. Standard Model Individual Observers
1 2 3 4 5 Mean
TKV 0.57% 0.57% 0.30% 0.27% 0.35% 0.33% 0.36%
Liver 0.61% 0.39% 0.27% 0.34% 0.29% 0.46% 0.34%
Spleen 1.14% 0.75% 0.46% 0.63% 0.69% 0.87% 0.68%

Accuracy of fully automatic volume measurements for predicting Mayo Imaging Classification is shown in Table 6; it was 100% for the average of 5 measurements.

Table 6.

Model Accuracy for Predicting Mayo Classification for 71 test cases compared to the radiologist’s prospective report.

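| Test set | Area Under Curve*: By Exam | Area Under Curve*: By Sequence |
|---|---|---|
| Internal Test Set | 100% | 98.3% |
| External Test Set | 100% | 100% |
| External CT Test Set | 100% | |
| Prospective Reproducibility, MRI exam 1 | 100% | 96.8% |
| Prospective Reproducibility, MRI exam 2 | 100% | 98.9% |

* Excluding cases with truncated organs.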
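Table 6 reports agreement in Mayo Imaging Classification, which the Introduction describes as derived from htTKV and age [5]. The sketch below shows how such a classification is commonly computed for typical (Class 1) disease. The growth-rate thresholds and the theoretical starting htTKV of 150 mL/m are taken from the published classification as generally described, not from this text, and should be checked against reference [5] before use; the function names are hypothetical.

```python
def estimated_annual_growth_percent(
    httkv_ml_per_m: float,
    age_years: float,
    baseline_httkv: float = 150.0,  # assumed theoretical starting htTKV (mL/m)
) -> float:
    """Constant yearly growth rate (%) that would reach htTKV at the given age."""
    return ((httkv_ml_per_m / baseline_httkv) ** (1 / age_years) - 1) * 100


def mayo_class(httkv_ml_per_m: float, age_years: float) -> str:
    """Assign Class 1A-1E from the estimated yearly htTKV growth rate."""
    rate = estimated_annual_growth_percent(httkv_ml_per_m, age_years)
    # Assumed cut-offs: <1.5%, 1.5-3%, 3-4.5%, 4.5-6%, >6% per year.
    if rate < 1.5:
        return "1A"
    if rate < 3.0:
        return "1B"
    if rate < 4.5:
        return "1C"
    if rate <= 6.0:
        return "1D"
    return "1E"


# Overall median htTKV (674 mL/m) and median age (46 years) from Table 1.
print(mayo_class(674.0, 46.0))
```

Because the class depends on TKV through a growth-rate threshold, small volume errors change the class only for patients whose htTKV lies close to a boundary, which may help explain why classification agreement can be high even when volume measurements differ slightly.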

DISCUSSION

These data from 1416 MRI scans on 492 subjects demonstrate an AI assisted approach to measuring TKV with superior test-retest reliability, 1.3%, compared to 5 expert observers and superior to prior reports of inter-reader variability for the popular ellipsoidal and manual contouring organ volume measurement methods [5, 34]. Here reproducibility was assessed in 19 ADPKD subjects scanned twice within a 3-week interval where no changes in TKV were expected [35]. High performance was achieved by making multiple TKV measurements, one measurement from each of the imaging sequences typically obtained from abdominal MRI. Finding outliers among these multiple measurements provided an opportunity for quality control to correct data acquisition issues including duplicate or overlapping slices. Averaging these multiple measurements reduced random measurement variation. ADPKD Mayo Classifications calculated from the fully automatic AI TKV measurements were in agreement with the expert observers.

This highly reproducible measurement approach required training our 3D model with multiple different types of images including T1-weighted, T2-weighted and SSFP images as well as CT in multiple imaging planes. We believe this augmented the model's delineation of ADPKD kidney, liver and spleen anatomy, making it robust, as reflected in the excellent performance on external validation. The superhuman reproducibility of the AI model may partly reflect the ability of the model to better integrate the third dimension of anatomic information as well as multi-modality information into the segmentation compared to human operators, who excel at analyzing 2D images but are not as good with 3D and multi-modality data.

Our 3D model Dice similarity coefficients of 0.97 to 0.99 are better than most prior models, which reported Dice similarity coefficients in the range of 0.8 to 0.97 [8–20]. Compared to 2D-based segmentation methods, 3D convolution empowers our model to better capture anatomy since the extra third dimension adds unique discriminating information. Our model performance is also high because of the large number of training cases and our rigorous quality control on the training data. All training data were reviewed by multiple observers and searched for errors using an earlier implementation of the model. Interestingly, model performance in the spleen was not as good as in kidneys and liver. This surprised us because the homogeneous spleen is easily segmented. It may reflect minor errors and partial volume inaccuracies at edges being proportionately larger for the spleen due to its smaller size. This is supported by the highest resolution sequence, axial T1, having the best test-retest reproducibility for spleen. For the kidneys, coronal SSFP had the best reproducibility, and for liver, axial T1 and coronal SSFP were equally best for reproducibility. However, T2 sequences had the highest DICE and axial T1 had the lowest DICE, so it is difficult to recommend a single sequence. Indeed, given the presence of sequence biases, averaging multiple sequences is necessary to maximize measurement consistency and reproducibility.

The 1.3% absolute difference in TKV between MRI exam 1 and MRI exam 2 for the model averaging measurements from 5 MRI pulse sequences is lower than the 3% to 6.7% variability reported previously for inter-reader agreement by manual contouring [6,7]. It is well under the typical 2.8% to 5.5% annual TKV growth rate in ADPKD with or without tolvaptan treatment, making it well suited for annual TKV measurements and treatment follow-up [4]. Model output can be further refined by input from an expert observer; this is known as model-assisted segmentation. Our data indicate, however, that model-assisted segmentation does not substantially further improve measurement reproducibility. The Mayo imaging classification calculated from model TKV was the same as for expert observers in all 19 prospective reproducibility cases.
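The test re-test metric discussed above can be sketched as follows. This is an illustration only: the per-sequence TKV values are invented, the function names are hypothetical, and the choice of the mean of the two exams as the denominator is an assumption, since the normalization is not spelled out in this section.

```python
def mean_volume(volumes_ml):
    """Average the TKV measurements (in mL) obtained from several pulse sequences."""
    values = list(volumes_ml)
    if not values:
        raise ValueError("at least one measurement is required")
    return sum(values) / len(values)

def percent_abs_difference(v1, v2):
    """Absolute difference between two exams as a percentage of their mean (assumed denominator)."""
    reference = (v1 + v2) / 2.0
    if reference <= 0:
        raise ValueError("volumes must be positive")
    return 100.0 * abs(v1 - v2) / reference

# Hypothetical TKV measurements (mL) for one subject scanned twice.
exam1 = {"Ax T2": 1500.0, "Ax T1": 1480.0, "Cor T2": 1520.0, "Cor SSFP": 1490.0, "Ax SSFP": 1510.0}
exam2 = {"Ax T2": 1520.0, "Ax T1": 1500.0, "Cor T2": 1530.0, "Cor SSFP": 1510.0, "Ax SSFP": 1540.0}

tkv1 = mean_volume(exam1.values())
tkv2 = mean_volume(exam2.values())
pct = percent_abs_difference(tkv1, tkv2)
print(f"exam 1: {tkv1:.1f} mL, exam 2: {tkv2:.1f} mL, difference: {pct:.2f}%")
```

A value computed this way can be compared directly with the 2.8% to 5.5% annual TKV growth rate cited above to judge whether a measured change exceeds measurement variability.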

Accurate performance of the model on external data sets indicates excellent generalizability. This likely results from training the model with both CT and MRI data including many MRI sequences with different acquisition planes and contrast mechanisms. Model performance on external CT scans, albeit not as good as MRI, was surprisingly good given only 12 CT scans were used for training/validation. This suggests a high degree of transfer learning. Likely, the model grasps organ anatomy sufficiently well to recognize organ boundaries independent of modality.

Van Gastel et al. have reported a TKV measurement bias on MRI, with T2-weighted imaging producing slightly larger TKV measurements than T1-weighted images [35], and this has recently been confirmed by others [21]. The approach here of averaging TKV measurements from 5 MRI pulse sequences mitigates these biases, which may also contribute to the improved reproducibility. The best mitigation will occur when averaging all 5 sequences. However, in many of the external cases, not all 5 sequences were available, and even in some of our internal test cases a sequence might be missing due to data corruption, incomplete anatomic coverage, excessive artifact or protocol error. For these cases we believe it is better to average all available sequences instead of basing organ volume measurements on just a single sequence, but we do not know the magnitude of this benefit. In the future it may be useful to explore combining multiple sequences into a super-resolution volume instead of just averaging.
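Averaging whatever sequences are available, while flagging a measurement that disagrees with the others, can be sketched as below. This is a hypothetical illustration rather than the procedure used in this study: the function name, the use of the median as the reference value, the 5% tolerance and the example volumes are all assumptions.

```python
from statistics import mean, median

def average_available(volumes_by_sequence, outlier_tolerance=0.05):
    """Average the non-missing volumes and list sequences far from the median.

    volumes_by_sequence maps a sequence name to a volume in mL, or None if missing.
    outlier_tolerance is an assumed fractional deviation from the median (5% here).
    """
    available = {s: v for s, v in volumes_by_sequence.items() if v is not None}
    if not available:
        raise ValueError("no usable sequence for this exam")
    center = median(available.values())
    # Flag for human review; this sketch does not correct or drop the measurement.
    flagged = sorted(
        s for s, v in available.items()
        if abs(v - center) > outlier_tolerance * center
    )
    return mean(available.values()), flagged

# Hypothetical exam: coronal SSFP is missing and axial SSFP looks suspicious.
exam = {"Ax T2": 1520.0, "Ax T1": 1480.0, "Cor T2": 1510.0, "Cor SSFP": None, "Ax SSFP": 1700.0}
average_ml, needs_review = average_available(exam)
print(f"mean of available sequences: {average_ml:.1f} mL; review: {needs_review}")
```

Flagged sequences would still need to be inspected and corrected by an observer before the final average is reported.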

Although the 3D model performed within the range of the expert observers for TKV and liver volume measurements, there is still room for further improvement. In particular, the reproducibility and accuracy of the spleen measurements were inferior to those for TKV and liver. All patients were scanned supine, with none scanned in prone or decubitus positions, so there is no assessment of the generalizability to various body rotations in ADPKD patients, although the performance on rotations of a healthy volunteer was promising (Supplemental Figure S4). Another limitation of this study is that manual measurements were only performed on axial T2 pulse sequences; the comparisons with manual measurements may have been different for other sequences. This model calculates only TKV, liver volume and spleen volume, even though refinements to the TKV biomarker, including exophytic cysts, cyst number, cyst size, and hemorrhagic cysts, have been proposed to improve renal function prediction accuracy based upon average population statistics [36–40]. On a population basis, these are exciting refinements. Once we can establish reproducibility on an individual patient basis, these additional measures can be added along with additional tissues/organs relevant to ADPKD.

Supplementary Material

Supplemental Figure S1. Example of model performance on a subject without ADPKD showing excellent generalization of model performance to kidneys and liver without cysts as well as to massive splenomegaly.

Supplemental Figure S2. Example of model performance on a healthy subject without ADPKD scanned in a) right side down lateral decubitus, b) right anterior oblique and c) right posterior oblique positions showing excellent generalization of model performance in different body rotations in spite of all training cases being scanned supine.

Supplemental Figure S3. (a) Low SNR and low resolution on DWI make it challenging to assess segmentation accuracy. (b) Typical, accurate model performance on CT.

Supplemental Figure S4. Examples of sub-optimal model performance leading to lower DICE.

Supplemental Table S1. MR imaging parameters for the reproducibility cases at 1.5T and 3T.

Supplemental Table S2. Number of subjects with each combination of MRI sequences.

Supplemental Table S3. Dice similarity coefficient, Jaccard Index, mean surface distance and Hausdorff distance for right and left kidney by sequence (n = 25 for Ax T2 and Ax T1, n = 24 for Cor T2, n = 22 for Cor SSFP and Ax SSFP) of the internal test set.

Acknowledgements:

Support from Weill Cornell Medicine Radiology, the Shaw Foundation and NIH, grant UL1TR002384 is gratefully acknowledged.

Footnotes

Conflicts of Interest: The study “Polycystic Kidney Disease Data Repository” is registered on ClinicalTrials.gov (NCT00792155). George Shih discloses serving as co-chairman of the Society of Abdominal Radiology Artificial Intelligence committee and the Society of Imaging Informatics machine learning committee. Daniil Shimonov discloses consulting for Accordant/CVS on chronic kidney disease education. None of the other authors have anything to disclose.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  • 1. Grantham JJ, Torres VE. The importance of total kidney volume in evaluating progression of polycystic kidney disease. Nat Rev Nephrol 2016. 12: 667–677.
  • 2. Chapman AB, Bost JE, Torres VE, et al. Kidney Volume and Functional Outcomes in Autosomal Dominant Polycystic Kidney Disease. Clin J Am Soc Nephrol 2012. 7: 479–486.
  • 3. Torres VE, Higashihara E, Devuyst O, et al. Effect of Tolvaptan in Autosomal Dominant Polycystic Kidney Disease by CKD Stage: Results from the TEMPO 3:4 Trial. Clin J Am Soc Nephrol 2016. 11: 803–811.
  • 4. Torres VE, Chapman AB, Devuyst O, et al. Tolvaptan in Later-Stage Autosomal Dominant Polycystic Kidney Disease. N Engl J Med 2017. 377: 1930–1942.
  • 5. Irazabal MV, Rangel LJ, Bergstralh EJ, et al. Imaging Classification of Autosomal Dominant Polycystic Kidney Disease: A Simple Model for Selecting Patients for Clinical Trials. JASN 2015. 26: 160–172.
  • 6. Demoulin N, Nicola V, Michoux N, et al. Limited Performance of Estimated Total Kidney Volume for Follow-up of ADPKD. Kidney Int Rep 2021. 6: 2821–2829.
  • 7. Sharma K, Caroli A, Quach LV, Petzold K, Bozzetto M, et al. Kidney volume measurement methods for clinical studies on autosomal dominant polycystic kidney disease. PLOS ONE 2017. 12(5): e0178488.
  • 8. Sharbatdaran A, Romano D, Teichman K, et al. Deep Learning Automation of Kidney, Liver, and Spleen Segmentation for Organ Volume Measurements in Autosomal Dominant Polycystic Kidney Disease. Tomography 2022. 8: 1804–1819.
  • 9. van Gastel MDA, Edwards ME, Torres VE, et al. Automatic Measurement of Kidney and Liver Volumes from MR Images of Patients Affected by Autosomal Dominant Polycystic Kidney Disease. JASN 2019. 30: 1514–1522.
  • 10. Kline TL, Korfiatis P, Edwards ME, et al. Performance of an Artificial Multi-observer Deep Neural Network for Fully Automated Segmentation of Polycystic Kidneys. JDI 2017. 30: 442–448.
  • 11. Kim Y, Ge Y, Tao C, et al. Automated Segmentation of Kidneys from MR Images in Patients with Autosomal Dominant Polycystic Kidney Disease. Clin J Am Soc Nephrol 2016. 11: 576–584.
  • 12. Jagtap JM, Gregory AV, Homes HL, et al. Automated measurement of total kidney volume from 3D ultrasound images of patients affected by polycystic kidney disease and comparison to MR measurements. Abdom Radiol 2022. 47: 2408–2419.
  • 13. Goel A, Shih G, Riyahi S, et al. Deployed Deep Learning Kidney Segmentation for Polycystic Kidney Disease MRI. Radiol Artif Intell 2022. 4: e210205.
  • 14. Raj A, Tollens F, Hansen L, et al. Deep Learning-Based Total Kidney Volume Segmentation in Autosomal Dominant Polycystic Kidney Disease Using Attention, Cosine Loss, and Sharpness Aware Minimization. Diagnostics 2022. 12: 1159.
  • 15. Mu G, Ma Y, Han M, et al. Automatic MR kidney segmentation for autosomal dominant polycystic kidney disease. SPIE 2019. 10950.
  • 16. Taylor J, Thomas R, Metherall P, et al. MO012: Development of an Accurate Automated Segmentation Algorithm to Measure Total Kidney Volume in ADPKD Suitable for Clinical Application (The Cystvas Study). NDT 2022. 37.
  • 17. Keshwani D, Kitamura Y, Li Y. Computation of Total Kidney Volume from CT images in Autosomal Dominant Polycystic Kidney Disease using Multi-Task 3D Convolutional Neural Networks. arXiv 2018.
  • 18. Onthoni DD, Sheng T, Sahoo PK, et al. Deep Learning Assisted Localization of Polycystic Kidney on Contrast-Enhanced CT Images. Diagnostics 2020. 10: 1113.
  • 19. Shin TY, Kim H, Lee JH, et al. Expert-level segmentation using deep learning for volumetry of polycystic kidney and liver. Investig Clin Urol 2020. 61: 555–564.
  • 20. Hsiao CH, Lin PC, Chung LA, et al. A deep learning-based precision and automatic kidney segmentation system using efficient feature pyramid networks in computed tomography images. Comput Methods Programs Biomed 2022. 221: 106854.
  • 21. Potretzke TA, Korfiatis P, Blezek DJ, et al. Clinical Implementation of an Artificial Intelligence Algorithm for Magnetic Resonance–Derived Measurement of Total Kidney Volume. Mayo Clin Proc 2023.
  • 22. Dev H, Zhu C, Sharbatdaran A, et al. Effect of Averaging Measurements From Multiple MRI Pulse Sequences on Kidney Volume Reproducibility in Autosomal Dominant Polycystic Kidney Disease. JMRI 2023.
  • 23. Zhu C, Dev H, Sharbatdaran A, He X, Shimonov D, Chevalier JM, Blumenfeld JD, Wang Y, Teichman K, Shih G, et al. Clinical Quality Control of MRI Total Kidney Volume Measurements in Autosomal Dominant Polycystic Kidney Disease. Tomography 2023. 9: 1341–1355.
  • 24. Yin X, Prince WK, Blumenfeld JD, et al. Spleen phenotype in autosomal dominant polycystic kidney disease. Clinical Radiology 2019. 74: 975.e917–975.e924.
  • 25. Isensee F, Jaeger PF, Kohl SAA, et al. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 2021. 18: 203–211.
  • 26. Zhang W, Stephens CJ, Blumenfeld JD, et al. Relationship of Seminal Megavesicles, Prostate Median Cysts, and Genotype in Autosomal Dominant Polycystic Kidney Disease. JMRI 2019. 49: 894–903.
  • 27. Farooq Z, Behzadi AH, Blumenfeld JD, et al. Complex liver cysts in Autosomal Dominant Polycystic Kidney Disease. Clin Imaging 2017. 46: 98–101.
  • 28. Liu J, Yin X, Dev H, et al. Pleural Effusions on MRI in Autosomal Dominant Polycystic Kidney Disease. J Clin Med 2023. 12: 386.
  • 29. Kim JA, Blumenfeld JD, Prince MR. Seminal Vesicles in Autosomal Dominant Polycystic Kidney Disease. Codon Publications, 2015, pp 443–455.
  • 30. Deng J, Dong W, Socher R, et al. ImageNet: A large-scale hierarchical image database. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Florida, USA, 18 Aug 2009.
  • 31. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015.
  • 32. Zhang S, Choromanska A, LeCun Y. Deep learning with Elastic Averaging SGD. arXiv 2015.
  • 33. Edwards ME, Periyanan S, Anaam D, et al. Automated total kidney volume measurements in pre-clinical magnetic resonance imaging for resourcing imaging data, annotations, and source code. Kidney Int 2021. 99: 763–766.
  • 34. Zöllner FG, Svarstad E, Munthe-Kaas AZ, et al. Assessment of Kidney Volumes From MRI: Acquisition and Segmentation Techniques. AJR 2012. 199: 1060–1069.
  • 35. van Gastel MDA, Messchendorp AL, Kappert P, et al. T1 vs. T2 weighted magnetic resonance imaging to assess total kidney volume in patients with autosomal dominant polycystic kidney disease. Abdom Radiol 2018. 43: 1215–1222.
  • 36. Torres VE, Chapman AB, Devuyst O, et al. Tolvaptan in patients with autosomal dominant polycystic kidney disease. N Engl J Med 2012. 367: 2407–2418.
  • 37. Bae KT, Tao C, Zhu F, et al. MRI-based kidney volume measurements in ADPKD: reliability and effect of gadolinium enhancement. Clin J Am Soc Nephrol 2009. 4: 719–725.
  • 38. Riyahi S, Dev H, Blumenfeld JD, et al. Hemorrhagic Cysts and Other MR Biomarkers for Predicting Renal Dysfunction Progression in Autosomal Dominant Polycystic Kidney Disease. JMRI 2021. 53: 564–576.
  • 39. Kline TL, Korfiatis P, Edwards ME, et al. Image texture features predict renal function decline in patients with autosomal dominant polycystic kidney disease. Kidney Int 2017. 92: 1206–1216.
  • 40. Karner LA, Arjune S, Todorova P, et al. Cyst Fraction as a Biomarker in Autosomal Dominant Polycystic Kidney Disease. J Clin Med 2022. 12.
  • 41. Pei Y. Diagnostic approach in autosomal dominant polycystic kidney disease. Clin J Am Soc Nephrol 2006. 1: 1108–1114.
