Abstract
Objective:
To evaluate the quality of radiomics studies on pituitary adenoma according to the radiomics quality score (RQS) and Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD).
Methods:
PubMed MEDLINE and EMBASE were searched to identify radiomics studies on pituitary adenomas. From 138 articles, 20 relevant original research articles were included. Studies were scored based on RQS and TRIPOD guidelines.
Results:
Most included studies did not perform pre-processing; isovoxel resampling, signal intensity normalization, and N4 bias field correction were performed in only five (25%), eight (40%), and four (20%) studies, respectively. Only two (10%) studies performed external validation. The mean RQS and basic adherence rate were 2.8 (7.6% of the ideal score) and 26.6%, respectively. Adherence rates were low for comparison to a “gold standard” (20%), multiple segmentation (25%), and statement of potential clinical utility (25%). No study assessed biological correlates, conducted a test–retest or phantom study, was prospective, conducted a cost-effectiveness analysis, or provided open-source code and data, which resulted in a low level of evidence. The overall adherence rate for TRIPOD was 54.6%, and adherence was low for the title (5%), abstract (0%), explanation of the sample size (10%), and presentation of the full prediction model (5%).
Conclusion:
The reporting quality of radiomics studies on pituitary adenoma is insufficient. Pre-processing is required for feature reproducibility, and external validation is necessary. Feature reproducibility, demonstration of clinical utility, level of evidence, and open science need to be improved. For transparent reporting, titles, abstracts, and presentation of the full prediction model should be improved.
Advances in knowledge:
Despite the rapidly increasing number of radiomics studies on pituitary adenoma, the quality of science in these studies is unknown. Our study indicates that the overall quality of radiomics studies on pituitary adenoma needs to be significantly improved. Because the concepts of the RQS and the Image Biomarker Standardization Initiative (IBSI) are still unfamiliar to many clinicians and radiologist researchers, our study may help future studies achieve greater technical and clinical impact.
Introduction
Pituitary adenomas are the third most common primary brain tumors and comprise 16.7% of all brain tumors. 1 Although pituitary adenomas are rarely malignant, they can substantially impair quality of life and often require medical therapy, surgery, radiation therapy, and lifelong follow-up. 2 In combination with biochemical and visual testing, MRI remains the imaging gold standard for diagnosis and follow-up. 3–6 MRI is used to evaluate tumor volume and the relationship of the tumor to adjacent structures (e.g., the optic chiasm and cavernous sinus). MRI also plays an important role in assessing the therapeutic response of pituitary adenomas by tracking changes in tumor volume. 7
Radiomics is an emerging field that extracts quantitative, mineable, high-dimensional data from medical images to support decision-making. The underlying concept is that quantitative features (e.g., shape, first-order, and second-order [texture] features) may reflect the underlying pathophysiology of the tissue. 8 Previous radiomics studies in neuroradiology have mostly focused on glioma. 9–11 However, an increasing number of recent radiomics studies have shown promising findings for diverse tasks in pituitary adenomas, specifically differential diagnosis 12–21 and prediction of underlying pathology, 22–25 response to treatment, 26–29 and recurrence or progression. 30,31
For radiomics to bridge the current translational gap between exploratory research and a clinically approved decision-making tool, its methodology must be standardized for reproducibility, and clinical–biomarker and biomarker–outcome correlations must be evaluated. 32 Recently, the radiomics quality score (RQS) was proposed as a tool for assessing the methodological quality of radiomics studies. 33 The RQS has previously been applied to radiomics studies in oncology 34,35 and glioma. 36 Furthermore, adherence to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement is also desirable, because radiomics research typically involves developing prediction models. To the best of our knowledge, the quality of science in radiomics studies on pituitary adenoma remains unclear.
Therefore, this study aimed to evaluate the quality of radiomics studies on pituitary adenoma using the RQS and TRIPOD. Specifically, we aimed to discuss the current challenges underlying the translational gap in radiomics-based imaging biomarkers of pituitary adenomas and to propose directions for future clinical application.
Methods and materials
Systematic search strategy and study selection
The PubMed MEDLINE (n = 64) and EMBASE (n = 74) databases were searched for all original research papers using radiomics analysis published up to September 17, 2020, with the following search terms: (“pituitary adenoma” OR “pituitary macroadenoma” OR “prolactinoma” OR “acromegaly” OR “Cushing”) AND (“radiomic” OR “texture”). Of the 138 papers retrieved, 38 duplicates were removed, and a further 79 were excluded for the following reasons: non-radiomics studies (n = 34), not in the field of interest (n = 29), non-brain images (n = 9), conference abstracts (n = 5), review article (n = 1), and editorial (n = 1). Of the remaining 21 articles, one with the main text in a language other than English was excluded. Finally, 20 articles were included in the analysis (Figure 1 and Supplementary Material 1).
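The screening arithmetic above can be cross-checked with simple bookkeeping. The Python sketch below only re-derives counts stated in this paragraph; the variable names are illustrative and not part of the original analysis.

```python
# Records retrieved from each database (counts from the search description).
pubmed, embase = 64, 74
identified = pubmed + embase

# Duplicate records found in both databases are removed first.
duplicates = 38
after_dedup = identified - duplicates

# Exclusions applied during screening, with their reported counts.
exclusions = {
    "non-radiomics studies": 34,
    "not in field of interest": 29,
    "non-brain image": 9,
    "conference abstracts": 5,
    "review article": 1,
    "editorial": 1,
}
eligible = after_dedup - sum(exclusions.values())

# One article with its main text in a language other than English is dropped.
non_english = 1
included = eligible - non_english

print(identified, after_dedup, eligible, included)  # 138 100 21 20
```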
Figure 1.
Flow chart of the study selection process.
Analysis of method quality based on RQS
The RQS consists of 16 components, which were grouped into six domains as reported in previous studies (Supplementary Material 1). 35,36 Before the RQS evaluation, the reviewers met to reach consensus on the evaluation criteria. Two reviewers (with 6 and 9 years of experience in radiology, respectively) independently scored each article for each of the six domains (Supplementary Material 1). Disagreements were resolved by a senior radiologist (with 15 years of experience), who made the final decision.
For RQS items that required further discussion because of the nature of studies on pituitary adenoma, the reviewers reached consensus through discussion. These items were ‘validation’ (domain 2), ‘multivariate analysis with non-radiomics features’ (domain 3), and ‘comparison with gold-standard’ (domain 3) (Supplementary Material 1).
Analysis of reporting completeness based on TRIPOD statement
The TRIPOD checklist, consisting of 37 items in 22 main criteria, was applied to each article to determine the completeness of reporting. 37 The details for TRIPOD checklist and data extraction are shown in Supplementary Material 1.
Statistical analysis
The characteristics of all 20 articles were reviewed. An article was considered to show basic adherence to an RQS item if it obtained at least one point for that item (16 items in total). The basic adherence rate (%) was calculated as the proportion of articles with basic adherence among all articles. The score for each RQS item was reported as the mean and standard deviation. The percentage of the ideal score (%) was calculated as the ratio of the mean score to the ideal score for each item, and the total RQS score (range, −8 to 36) was calculated for each article. A total of 35 TRIPOD items were scored, and adherence to individual TRIPOD items was described as a proportion (%). When the overall adherence rate was calculated, the validation items (10c, 10e, 12, 13c, 17, and 19a) and the “if relevant” or “if done” items (5c and 11) were excluded from both the numerator and the denominator.
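The adherence and ideal-score metrics defined above can be sketched in Python as follows. This is a minimal illustration, assuming each article's RQS item scores are available as plain numbers; the function names are hypothetical. The pooled variant reflects our reading that the total and domain rows of Table 2 pool article–item pairs (85 of 20 × 16 = 320 pairs gives 26.6%).

```python
from statistics import mean

def basic_adherence_rate(item_points):
    """Percentage of articles that earned at least one point on one RQS item.

    `item_points` holds that item's score for every article. Items that can
    be negative (e.g. validation, -5 to +5) count only when the score is >= 1.
    """
    return 100 * sum(p >= 1 for p in item_points) / len(item_points)

def pooled_basic_adherence_rate(score_matrix):
    """Basic adherence pooled over all (article, item) pairs.

    `score_matrix` is a list with one list of item scores per article.
    """
    pairs = [p for article in score_matrix for p in article]
    return 100 * sum(p >= 1 for p in pairs) / len(pairs)

def percentage_of_ideal(scores, ideal_score):
    """Mean score expressed as a percentage of the ideal (maximum) score."""
    return 100 * mean(scores) / ideal_score
```

For instance, applying `percentage_of_ideal` to the highest and lowest total scores reported in the Results (18 and −7, ideal score 36) gives 50% and about −19.4%.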
Results
Characteristics of the 20 included radiomics studies in pituitary adenomas
Table 1 and Figure 2 present the characteristics of the 20 included radiomics studies. 12–31 The median number of patients included in the studies was 84 (range: 27–235). Regarding journal type, 7 (35%), 11 (55%), and 2 (10%) studies were published in clinical, 12,21,22,24,26,27,31 imaging, 13–15,17–20,23,25,28,29 and computer science/neuroscience journals, 16,30 respectively. The radiomics studies were diagnostic (70%), 12–25 predictive (20%), 26–29 or prognostic (10%). 30,31 The study topics included differential diagnosis (50%), 12–21 prediction of recurrence/progression (10%), 30,31 pathologic classification (20%), 22–25 and response to treatment (20%). 26–29 The assessed tumor subtypes included pituitary adenomas of all types (50%), 13–21,25 acromegaly (30%), 12,22,24–27,29 non-functioning pituitary adenoma (15%), 23,30,31 and invasive functional pituitary adenoma (5%). 28
Table 1.
Characteristics of the 20 included radiomics studies
| Article characteristics | No. of articles |
|---|---|
| No. of subjects, median | 84 (27–235) |
| Journal type | |
| Clinical journal | 7 (35) |
| Imaging journal | 11 (55) |
| Computer science/Neuroscience journal | 2 (10) |
| Biomarker | |
| Diagnostic | 14 (70) |
| Predictive | 4 (20) |
| Prognostic | 2 (10) |
| Topics in pituitary adenoma | |
| Differential diagnosis | 10 (50) |
| Prediction of recurrence/progression | 2 (10) |
| Pathologic classification | 4 (20) |
| Response to treatment | 4 (20) |
| Assessed tumor subtype | |
| Entire pituitary adenoma | 10 (50) |
| Acromegaly | 6 (30) |
| NFPA | 3 (15) |
| IFPA | 1 (5) |
| Preprocessing performed | |
| Isovoxel resampling | 5 (25) |
| N4 bias field correction | 4 (20) |
| Signal intensity normalization | 8 (40) |
| Sequence used for feature extraction | |
| Conventional images | 18 (90) |
| Advanced images (perfusion or diffusion) | 2 (10) |
| Region of interest in tumor segmentation | |
| Placement of small regions of interest | 1 (5) |
| Tumor on a single 2D slice | 5 (25) |
| Whole tumor on all slices | 11 (55) |
| N/A | 3 (15) |
| Software for feature extraction | |
| PyRadiomics | 10 (50) |
| LIFEx | 2 (10) |
| Matlab | 3 (15) |
| ImageJ | 2 (10) |
| Omni-kinetics | 2 (10) |
| N/A | 1 (5) |
| Segmentation method | |
| Manual | 18 (90) |
| Semi-automatic | 2 (10) |
| External validation | |
| Performed | 2 (10) |
| Not performed | 18 (90) |
| Magnetic field strength (Tesla) | |
| 1.5 | 2 (10) |
| 3.0 | 14 (70) |
| 1.5 & 3.0 | 4 (20) |
IFPA, Invasive functional pituitary adenoma; N/A, not available; NFPA, non-functioning pituitary adenoma.
Numbers in parentheses are the range (for the number of subjects) or percentages.
Figure 2.
Summary chart of the radiomics studies, according to the (a) topics in pituitary adenomas, (b) assessed tumor subtypes, (c) segmentation method, (d) region of interest in tumor segmentation. IFPA, invasive functional pituitary adenoma; NFPA, non-functioning pituitary adenoma.
Most studies did not perform preprocessing: only 5 (25%), 13,14,17,22,29 4 (20%), 17,22,29,30 and 8 (40%) 13,14,17,22,23,25,29,30 studies performed isovoxel resampling, N4 bias field correction, and signal intensity normalization, respectively. Most studies (90%) used conventional images (non-contrast T1-weighted, T2-weighted, or contrast-enhanced T1-weighted images) for feature extraction, whereas the remaining studies (10%) 19,20 used advanced images, including perfusion or diffusion images.
For delineation of the region of interest (ROI) in tumor segmentation, 1 (5%) study 17 placed small ROIs instead of including the whole tumor, 5 (25%) studies 13,14,18,26,31 segmented only part of the tumor on a single two-dimensional slice, and 11 (55%) studies 15,16,19,21,22,24,25,27–30 segmented the whole three-dimensional tumor volume on all slices. The other three studies (15%) 12,20,23 did not provide information about the ROI. The software used for radiomics feature extraction included PyRadiomics (50%), 12–14,17,22,24,25,27,29,30 LIFEx (10%), 16,21 Matlab (15%), 15,23,28 ImageJ (10%), 26,31 and Omni-kinetics (10%). 19,20 Of these, PyRadiomics and LIFEx adhere to the Image Biomarker Standardization Initiative (IBSI). 38
Manual segmentation (90%) 12–21,23–29,31 was mainly used, followed by semi-automatic segmentation (10%). 22,30 Only two (10%) studies 12,24 performed external validation. Regarding magnetic field strength, 14 (70%), 4 (20%), and 2 (10%) studies used 3.0 T, 1.5 T, and both 1.5 T and 3.0 T scanners, respectively.
Basic adherence rate of the reporting quality according to the six key domains
Table 2 summarizes the basic adherence rates for the 16 RQS items. Regarding domain 1, 16 (80%) studies used well-documented imaging protocols. 12–15,17–23,25,27–30 None of the studies performed imaging at multiple time points (test–retest) or a phantom study to assess reliability. Two (10%) studies performed semi-automatic segmentation. 22,30
Table 2.
Radiomics quality score according to the six key domains
| RQS item (possible points) | Basic adherence, n (%) | Mean score (mean ± SD) | Percentage of the ideal score (%) |
|---|---|---|---|
| Total 16 items (ideal score 36) | 85 (26.6) | 2.8 ± 8.7 | 7.6 |
| Domain 1 – protocol quality and stability in image and segmentation (0–5 points) | 21 (26.3) | 1.1 ± 0.7 | 21.0 |
| Protocol quality (2) | 16 (80.0) | 0.8 ± 0.4 | 40.0 |
| Test-retest (1) | 0 (0) | 0 | 0 |
| Phantom study (1) | 0 (0) | 0 | 0 |
| Multiple segmentation (1) | 5 (25.0) | 0.3 ± 0.4 | 25.0 |
| Domain 2 – feature selection and validation (−8 to 8 points) | 20 (50.0) | −1.3 ± 6.0 | −16.3 |
| Feature reduction or adjustment for multiple testing (−3 or 3) | 12 (60.0) | 0.6 ± 3.0 | 20.0 |
| Validation (−5, 2, 3, 4, or 5) | 8 (40.0) | −1.9 ± 4.0 | −38.0 |
| Domain 3 – biologic/clinical validation and utility (0–6 points) | 15 (18.8) | 1.2 ± 2.1 | 20.0 |
| Non-radiomics features (1) | 6 (30.0) | 0.3 ± 0.5 | 30.0 |
| Biologic correlations (1) | 0 (0) | 0.0 ± 0.0 | 0.0 |
| Comparison to “gold-standard” (2) | 4 (20.0) | 0.4 ± 0.8 | 20.0 |
| Potential clinical utility (2) | 5 (25.0) | 0.5 ± 0.9 | 25.0 |
| Domain 4 – model performance index (0–5 points) | 29 (48.3) | 1.8 ± 1.3 | 36.0 |
| Cut-off analysis (1) | 10 (50.0) | 0.5 ± 0.5 | 50.0 |
| Discrimination statistics (2) | 13 (65.0) | 0.9 ± 0.8 | 45.0 |
| Calibration statistics (2) | 6 (30.0) | 0.4 ± 0.7 | 20.0 |
| Domain 5 – high level of evidence (0–8 points) | 0 (0) | 0 | 0 |
| Prospective study (7) | 0 (0) | 0 | 0 |
| Cost-effectiveness analysis (1) | 0 (0) | 0 | 0 |
| Domain 6 – open science and data (0–4 points) | 0 (0) | 0 | 0 |
Data are numbers of studies; for the total and domain rows, counts are summed over the items in each row. Numbers in parentheses are percentages.
Regarding domain 2, 12 (60%) studies performed feature reduction or adjustment for multiple testing. 12–15,17,23–25,27,28 Six (30%) studies performed validation based on a data set from the same institution and received +2 points. 13–16,23,28 Only two (10%) studies performed validation based on three or more data sets from distinct institutions and received the full score of +5. 12,24 The remaining 12 studies (60%) did not perform any form of validation.
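Because each RQS item has a fixed set of possible points, the domain 2 means in Table 2 can be reproduced from the study counts in this paragraph. The sketch below assumes the point values stated here (+3/−3 for feature reduction; +2, +5, or −5 for validation) and uses exact fractions to avoid floating-point rounding; the helper names are hypothetical.

```python
from fractions import Fraction

def expand(counts):
    """Turn {points: number_of_studies} into one score per study."""
    return [points for points, n in counts.items() for _ in range(n)]

def exact_mean(scores):
    """Arithmetic mean as an exact fraction (no floating-point rounding)."""
    return Fraction(sum(scores), len(scores))

# Feature reduction or adjustment for multiple testing: +3 if done, -3 if not.
feature_reduction = expand({3: 12, -3: 8})
# Validation: +2 (same-institution data set), +5 (three or more data sets
# from distinct institutions), -5 (no validation).
validation = expand({2: 6, 5: 2, -5: 12})

fr_mean = exact_mean(feature_reduction)
val_mean = exact_mean(validation)
# The mean of per-study domain totals equals the sum of the item means.
domain2_mean = fr_mean + val_mean
val_adherence = 100 * sum(p >= 1 for p in validation) / len(validation)

for label, value, ideal in [
    ("feature reduction", fr_mean, 3),
    ("validation", val_mean, 5),
    ("domain 2", domain2_mean, 8),
]:
    print(f"{label}: mean {float(value):.2f}, {float(100 * value / ideal):.2f}% of ideal")
print(f"validation basic adherence: {val_adherence:.0f}%")
```

This yields 0.6 for feature reduction, −1.9 (−38% of the ideal score) for validation, and −1.3 (−16.25% of the ideal 8 points, reported as −16.3%) for domain 2, with a basic adherence rate of 40% for validation.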
In domain 3, six (30%) studies performed multivariate analysis with non-radiomics features to develop a differential diagnosis model, a pathologic classification model, or a treatment response prediction model. 12,15,23,24,27,28 No study evaluated biological correlates. Four (20%) studies performed a comparative analysis with a “gold standard”, such as Ki-67 or hormone levels. 12,24,27,28 Five (25%) studies demonstrated current and potential clinical utility through decision curve analysis for the diagnosis of tumor consistency and cavernous sinus invasion, prediction of treatment response, and prediction of Ki-67; thus, they received two points for the “potential clinical utility” item. 12,15,24,27,28
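Decision curve analysis compares the net benefit of acting on a model's predictions with treating all or no patients across a range of threshold probabilities. The minimal sketch below uses the standard net benefit formula, NB = TP/n − FP/n × p_t/(1 − p_t), where a case is called positive when its predicted probability is at least the threshold p_t; treating no one has a net benefit of zero by definition. The outcome labels and probabilities are hypothetical and not taken from any included study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on model predictions at one threshold probability."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # Cases called positive are those at or above the threshold probability.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy that treats every patient as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical outcomes (1 = event) and predicted probabilities.
y = [1, 1, 0, 0, 1, 0, 0, 1]
p = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2, 0.3, 0.55]
for t in (0.2, 0.4, 0.6):
    print(t, round(net_benefit(y, p, t), 3), round(net_benefit_treat_all(y, t), 3))
```

A model is considered clinically useful over the thresholds where its curve lies above both the treat-all and the treat-none (zero) curves.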
In domain 4, 10 (50%) studies performed cut-off analysis using a reference standard cut-off value. 13,15,17,19,24,26–28,31 Thirteen (65%) studies reported discrimination statistics (including the receiver operating characteristic curve or area under the curve) together with their statistical significance or a resampling method (including cross-validation or bootstrapping). 12,14,15,17,18,20–25,27,28 Six (30%) studies reported calibration statistics with their statistical significance or a resampling method, including cross-validation or bootstrapping. 12,15,23,24,27,28
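Discrimination statistics with a resampling-based estimate of uncertainty, as rewarded in domain 4, can be illustrated with the area under the ROC curve (AUC). This sketch computes the AUC as the Mann–Whitney concordance probability and adds a percentile bootstrap confidence interval; the data, the 1,000 resamples, and the fixed seed are illustrative assumptions.

```python
import random

def auc(y_true, scores):
    """AUC as pairwise concordance: the share of positive-negative pairs in
    which the positive case scores higher, counting ties as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC.

    Patients are resampled with replacement; resamples containing only one
    class are skipped because the AUC is undefined for them.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:
            stats.append(auc(yb, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

# Hypothetical labels (1 = outcome present) and model scores.
y = [1, 1, 0, 0, 1, 0, 0, 1]
s = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2, 0.3, 0.55]
print(auc(y, s), bootstrap_auc_ci(y, s))
```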
In domain 5, none of the studies were prospective or performed a cost-effectiveness analysis. Finally, in domain 6, none of the studies made their code or data publicly available.
RQS assessment
The mean overall RQS for the 20 radiomics studies was 2.8 ± 8.7, corresponding to 7.6% of the ideal score (Table 2 and Figure 3). The highest and lowest scores were 18 and −7, corresponding to 50% and −19.4% of the ideal score, respectively. Among the six domains, domain 2 (feature selection and validation) had the lowest mean score and percentage of the ideal score (−1.3 and −16.3%, respectively), and domain 4 (model performance) had the highest (1.8 and 36%, respectively).
Figure 3.
RQS assessment results according to the six key domains. RQS, radiomics quality score.
Seven (35%) studies 18–21,26,30,31 did not perform feature selection or validation; among them, three studies had the lowest RQS scores. 26,30,31 The study with the highest score obtained the ideal score for protocol quality, feature reduction or adjustment for multiple testing, validation, non-radiomics features, comparison to the “gold-standard”, potential clinical utility, discrimination statistics, and calibration statistics items.
Completeness in reporting a radiomics-based multivariable prediction model using TRIPOD assessment
When all 35 items were included, the mean number of reported TRIPOD items was 15.7 ± 3.85 (standard deviation; range: 11–24). When the validation items and the “if relevant” and “if done” items were excluded from both the numerator and the denominator, the overall adherence rate for TRIPOD was 54.6%. Table 3 shows the adherence rates for individual TRIPOD items.
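The overall TRIPOD adherence rate described above can be sketched as a pooled proportion over articles and items. This assumes each article is coded as a mapping from TRIPOD item ID to a reported/not-reported flag; the exclusion set follows the Methods (with 13c taken as the validation item listed in Table 3), and the function names are hypothetical.

```python
# Items left out of the overall rate: validation-only items (13c as listed
# in Table 3) and the "if relevant"/"if done" items.
EXCLUDED_ITEMS = frozenset({"10c", "10e", "12", "13c", "17", "19a", "5c", "11"})

def overall_adherence(reports, excluded=EXCLUDED_ITEMS):
    """Pooled adherence (%) across articles.

    `reports` has one dict per article mapping an item ID (e.g. "1", "3b",
    "15a") to True if the item was adequately reported. Excluded items are
    dropped from both the numerator and the denominator.
    """
    reported = applicable = 0
    for report in reports:
        for item, ok in report.items():
            if item in excluded:
                continue
            applicable += 1
            reported += bool(ok)
    return 100 * reported / applicable

def item_adherence(reports, item):
    """Adherence (%) to one item among articles in which it was scored."""
    scored = [r[item] for r in reports if item in r]
    return 100 * sum(scored) / len(scored)

# Hypothetical coding of two articles.
example = [
    {"1": False, "2": False, "3a": True, "10c": True},
    {"1": True, "2": False, "3a": True, "5c": True},
]
print(overall_adherence(example))  # 50.0
```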
Table 3.
Adherence to individual TRIPOD items in radiomics study
| TRIPOD item (35 items in total) | All articles (n = 20) |
|---|---|
| Title and abstract | |
| Title | |
| 1. Identify developing/validating a model, target population, and the outcome | 1 (5) |
| Abstract | |
| 2. Provide a summary of objectives, study design, setting, participants, sample size, predictors, outcome, statistical analysis, results, and conclusions | 0 (0) |
| Introduction | |
| Background and objectives | |
| 3a. Explain the medical context and rationale for developing/validating the model | 20 (100) |
| 3b. Specify the objectives, including whether the study describes the development/validation of the model or both | 6 (30) |
| Methods | |
| Source of data | |
| 4a. Describe the study design or source of data (randomized trial, cohort, or registry data) | 3 (15) |
| 4b. Specify the key dates | 19 (95) |
| Participants | |
| 5a. Specify key elements of the study setting including number and location of centers | 13 (65) |
| 5b. Describe eligibility criteria for participants (inclusion and exclusion criteria) | 20 (100) |
| 5c. Give details of treatment received, if relevant (n = 2) | 2 (100) |
| Outcome | |
| 6a. Clearly define the outcome, including how and when assessed | 16 (80) |
| 6b. Report any actions to blind assessment of the outcome | 5 (25) |
| Predictors | |
| 7a. Clearly define all predictors, including how and when assessed | 19 (95) |
| 7b. Report any actions to blind assessment of predictors for the outcome and other predictors | 9 (45) |
| Sample size | |
| 8. Explain how the study size was arrived at | 2 (10) |
| Missing data | |
| 9. Describe how missing data were handled with details of any imputation method | 5 (25) |
| Statistical analysis methods | |
| 10a. Describe how predictors were handled | 20 (100) |
| 10b. Specify type of model, all model-building procedures (any predictor selection), and method for internal validation | 14 (70) |
| 10d. Specify all measures used to assess model performance and if relevant, to compare multiple models (discrimination and calibration) | 6 (30) |
| Risk groups | |
| 11. Provide details on how risk groups were created, if done (yes or no, n = 0) | N/A |
| Results | |
| Participants | |
| 13a. Describe the flow of participants, including the number of participants with and without the outcome. A diagram may be helpful | 19 (95) |
| 13b. Describe the characteristics of the participants, including the number of participants with missing data for predictors and outcome | 15 (75) |
| Model development | |
| 14a. Specify the number of participants and outcome events in each analysis | 5 (25) |
| 14b. Report the unadjusted association between each candidate predictor and outcome, if done (yes or no, n = 20) | 3 (15) |
| Model specification | |
| 15a. Present the full prediction model to allow predictions for individuals (regression coefficients, intercept) | 1 (5) |
| 15b. Explain how to use the prediction model (nomogram, calculator, etc.) | 7 (35) |
| Model performance | |
| 16. Report performance measures (with confidence intervals) for the prediction model | 10 (50) |
| Discussion | |
| Limitations | |
| 18. Discuss any limitations of the study | 20 (100) |
| Interpretation | |
| 19b. Give an overall interpretation of the results | 19 (95) |
| Implications | |
| 20. Discuss the potential clinical use of the model and implications for future research | 18 (90) |
| For validation (Types 2a, 2b, 3, and 4) | n = 8 |
| Methods—Statistical analysis methods | |
| 10c. For validation, describe how the predictions were calculated | 4 (50) |
| 10e. Describe any model updating (recalibration), if done (n = 0) | 0 (0) |
| Methods | |
| 12. Identify any differences from the development data in setting, eligibility criteria, outcome, and predictors | 6 (75) |
| Results | |
| 13c. Show a comparison with the development data of the distribution of important variables | 5 (62.5) |
| Results—model updating | |
| 17. Report the results from any model updating, if done (n = 0) | 0 (0) |
| Discussion—interpretation | |
| 19a. Discuss the results with reference to performance in the development data and any other validation data | 3 (37.5) |
N/A, not available.
Numbers in parentheses are percentages.
Discussion
Given the rapid increase in radiomics studies on pituitary adenoma, a comprehensive evaluation of research quality is needed to promote clinical translation. The science and reporting quality of radiomics studies on pituitary adenomas were suboptimal, with a basic adherence rate of 26.6% for the RQS and an overall adherence rate of 54.6% for TRIPOD. Our findings indicate that radiomics studies on pituitary adenomas have insufficient methodological quality and require significant improvement to become clinically applicable tools.
Radiomics research on pituitary adenomas differs from radiomics research on other brain diseases in several respects. 39,40 First, unlike other brain diseases with open-source datasets, such as The Cancer Imaging Archive (TCIA) for glioma and the Alzheimer’s Disease Neuroimaging Initiative (ADNI) for Alzheimer’s disease, 41 pituitary adenomas lack a well-labeled, publicly available dataset. This contributed to the low scores for validation (−1.9) in domain 2 and for open science and data (0) in domain 6; moreover, most of the included studies were single-center studies. To improve radiomics research on pituitary adenomas, an open-source dataset needs to be established for testing and possible clinical translation. Second, the small size of pituitary adenomas may limit the application of radiomics. Second-order features may capture limited information in small regions; consequently, several radiomics studies on pituitary adenomas have focused on macroadenomas (>1 cm) 12,14,17,19–21,23,24,26,30 rather than including both micro- and macroadenomas. Third, pituitary adenomas were more often segmented manually than semi-automatically, and in most studies segmentation was not performed by multiple readers, which limits the reproducibility and generalizability of radiomics features. The predominance of manual segmentation could result from the unique anatomy of the pituitary gland. Specifically, because the pituitary gland is located outside the blood–brain barrier, 42 the normal gland shows contrast enhancement. As a result, the adenoma, which shows delayed enhancement relative to the normal gland, lacks a clear contrast-enhanced boundary, which impedes its semi-automatic delineation. In contrast, other brain tumors, such as gliomas and meningiomas, 43 have relatively distinct contours against the background parenchyma. A practical approach may be independent tumor segmentation by multiple readers followed by interobserver agreement analysis; currently, only 5 (25%) studies have performed multiple segmentation.
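Interobserver agreement for multiple segmentation is commonly summarized at two levels: spatial overlap of the readers' masks (e.g., the Dice similarity coefficient) and agreement of the extracted feature values (e.g., an intraclass correlation coefficient). The sketch below implements Dice and ICC(2,1) (two-way random effects, absolute agreement, single rater). Which ICC form to use is a study-design choice, and the input layouts (masks as voxel-coordinate collections, ratings as a patients × readers table) are assumptions.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two segmentations, each given as
    a collection of voxel coordinates such as (z, y, x) tuples."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # both empty: treated as perfect agreement
    return 2 * len(a & b) / (len(a) + len(b))

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has one row per patient and one column per reader, e.g. the
    value of one radiomics feature extracted from each reader's mask.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares (patients, readers, residual).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Because ICC(2,1) measures absolute agreement, a systematic offset between readers lowers it even when their rankings of patients are identical.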
We applied the six key domains previously proposed for grouping the RQS items. 35 Most of the key domains require significant improvement. Regarding technical validation in domain 1, no studies performed test–retest or phantom studies, indicating that data on precision and technical bias are generally insufficient. Additionally, the basic adherence rates of domain 2 (feature selection and validation; 50.0%) and domain 3 (biological/clinical validation and utility; 18.8%) were substantially lower than those reported by studies on glioma and oncology (81.4% and 39.2%, respectively). 35 In domain 2, the basic adherence rate for feature selection was 60%, which was substantially lower than the rates previously reported by studies on glioma and oncology (94.1% and 96.1%, respectively). 35 Radiomics studies lacking appropriate feature reduction are susceptible to overfitting. 44 Similarly, the basic adherence rate for validation in domain 2 was 40%, which was lower than the rates reported by previous glioma and oncology studies (68.6% and 70.1%, respectively), 35 and the mean score for this item was below zero (−1.9). Twelve (60%) studies lacked any form of validation, and only two (10%) performed external validation. This lack of proper validation likely produced overoptimistic performance estimates. Regarding domain 3, no study correlated radiomics features with biological features. For domain 5, none of the studies were prospective or analyzed cost-effectiveness. For domain 6, none of the studies provided data, code, or segmentation masks as open source. Multicenter studies are required to establish the reproducibility of the radiomics technique, and releasing data and code as open source can accelerate the development of the radiomics field.
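One simple form of feature reduction is a correlation filter that removes redundant features before model fitting. The sketch below is a greedy Pearson filter with an assumed threshold of |r| > 0.9; it is one of many possible approaches, and the feature names and values are hypothetical. In practice, any such reduction should be fitted on the training data only, so that the validation data do not influence feature selection.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def drop_redundant(features, threshold=0.9):
    """Greedy filter: keep a feature only if its absolute correlation with
    every previously kept feature is at most `threshold`. Constant features
    are skipped because they carry no information."""
    kept = []
    for name, values in features.items():
        if max(values) == min(values):
            continue
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature values for five training patients.
train = {
    "glcm_contrast": [1.0, 2.0, 3.0, 4.0, 5.0],
    "glcm_dissimilarity": [1.1, 2.1, 2.9, 4.2, 5.0],  # nearly duplicates contrast
    "shape_sphericity": [0.7, 0.9, 0.6, 0.8, 0.75],
    "firstorder_constant": [3.0, 3.0, 3.0, 3.0, 3.0],
}
print(drop_redundant(train))
```

Because the filter is greedy, the order of the features determines which member of a correlated pair is kept.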
The adherence rate to the TRIPOD statement, which covers prediction model studies in general, was 54.6%, similar to that reported for radiomics studies in oncology. 35 Complete reporting of prediction model studies allows replication and evaluation of study impact, which could facilitate the clinical application of prediction models, whereas inadequate reporting impedes the use of all available evidence on a prediction model. The title and abstract were among the least well-reported items. The abstract should contain several elements, and every study omitted at least one of them. Specifically, most studies did not clearly describe “development” or “validation” in the study objective. Moreover, explanation of how the sample size was determined and presentation of the full prediction model were poorly reported. Our results suggest the need to improve the transparent reporting of radiomics models.
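Presenting the full prediction model (TRIPOD item 15a) means reporting the intercept and every coefficient, so that readers can compute a prediction for an individual patient. The sketch below shows what such a report enables for a logistic regression model; the intercept, coefficients, and predictor names are hypothetical and not taken from any included study.

```python
import math

# Hypothetical fully reported model, for illustration only.
INTERCEPT = -2.0
COEFFICIENTS = {
    "radiomics_score": 1.5,   # e.g. a linear combination of selected features
    "age_per_10_years": 0.2,
}

def predict_probability(patient):
    """Predicted probability from a fully reported logistic regression model."""
    linear_predictor = INTERCEPT + sum(
        beta * patient[name] for name, beta in COEFFICIENTS.items()
    )
    # Logistic link: convert the linear predictor to a probability.
    return 1 / (1 + math.exp(-linear_predictor))

print(round(predict_probability({"radiomics_score": 1.2, "age_per_10_years": 5.0}), 3))
```

The radiomics predictors are only reproducible if they are extracted with the same preprocessing and software settings, so a fully reported model is most useful alongside a complete description of the feature extraction pipeline.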
Image pre-processing is a crucial step for standardizing heterogeneous imaging data before feature extraction because it increases the reproducibility of radiomics features. 44,45 A lack of reproducible features increases the risk of type I errors, or false discoveries. 46 Isovoxel resampling is applied to images with heterogeneous spatial resolution to maintain rotational invariance, whereas bias field correction is frequently applied in MRI to mitigate distortions or non-uniformities that can drastically alter the extracted features and hamper reproducibility. 47 Signal intensity normalization is crucial in MRI radiomics analysis for reducing variance in the arbitrary signal intensities of conventional MRI sequences. A previous study demonstrated that a lack of image intensity normalization causes classification errors, especially for second-order features. 48 However, most radiomics studies on pituitary adenomas lacked descriptions of isovoxel resampling, bias field correction, and signal intensity normalization (75%, 80%, and 60%, respectively). Future studies should perform and report these pre-processing steps to achieve greater reproducibility.
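Two of the pre-processing steps discussed above reduce to simple computations: the target grid for isovoxel resampling and z-score intensity normalization. The sketch below assumes a 1 mm isotropic target and a user-chosen reference region for the normalization statistics (e.g., the tumor ROI or a brain mask); both are design choices rather than fixed rules, and the function names are hypothetical. The interpolation itself and N4 bias field correction are normally delegated to an imaging library (for example, SimpleITK's `ResampleImageFilter` and `N4BiasFieldCorrectionImageFilter`) and are not reimplemented here.

```python
from statistics import mean, pstdev

def isotropic_shape(shape, spacing_mm, target_mm=1.0):
    """Grid size after resampling to `target_mm` isotropic voxels.

    Each axis keeps its physical extent (voxel count x spacing); the new
    count is rounded to the nearest integer and kept at least 1.
    """
    return tuple(max(1, round(n * s / target_mm)) for n, s in zip(shape, spacing_mm))

def zscore_normalize(intensities, reference_mask):
    """Z-score normalization using the mean and SD of a reference region.

    `intensities` and `reference_mask` are flat sequences of equal length;
    voxels where the mask is truthy define the reference statistics.
    """
    ref = [v for v, m in zip(intensities, reference_mask) if m]
    mu, sigma = mean(ref), pstdev(ref)
    if sigma == 0:
        raise ValueError("reference region has constant intensity")
    return [(v - mu) / sigma for v in intensities]

# Hypothetical sellar MRI volume: 0.5 x 0.5 mm in-plane, 3 mm slices.
print(isotropic_shape((256, 256, 20), (0.5, 0.5, 3.0)))  # (128, 128, 60)
```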
The IBSI recently sought to address the lack of standardization in radiomics by defining standardized nomenclature and producing tools for verifying radiomics software implementations, 38 which resulted in a validated set of consensus-based reference values for radiomics features. Only 12 (60%) radiomics studies on pituitary adenomas provided descriptions indicating that their radiomics software adhered to the IBSI recommendations. To allow the clinical application of radiomics models, future studies should adopt a standardized approach that complies with the IBSI standard.
This study has several limitations. First, relatively few radiomics studies on pituitary adenoma were available for analysis. Second, because the RQS is based on expert opinion, some items are too idealistic; for example, phantom studies and multiple imaging acquisitions are rarely available in real-world clinical settings. Nonetheless, future clinical translation of radiomics research requires higher reporting quality.
Conclusion
Radiomics studies on pituitary adenomas have insufficient reporting quality. Preprocessing is required for feature reproducibility, and external validation is necessary. Feature reproducibility, demonstration of clinical utility, level of evidence, and open science need to be improved. Regarding transparent reporting, the title, abstract, and presentation of the full prediction model should be improved.
Contributor Information
So Yeon Won, Email: wsy0622@yuhs.ac.
Narae Lee, Email: lnr84@yonsei.ac.kr.
Yae Won Park, Email: yaewonpark@yuhs.ac, yawonpark@naver.com.
Sung Soo Ahn, Email: SUNGSOO@yuhs.ac.
Cheol Ryong Ku, Email: CR079@yuhs.ac.
Eui Hyun Kim, Email: EUIHYUNKIM@yuhs.ac.
Seung-Koo Lee, Email: SLEE@yuhs.ac.
REFERENCES
- 1. Ezzat S, Asa SL, Couldwell WT, Barr CE, Dodge WE, Vance ML, et al. The prevalence of pituitary adenomas: a systematic review. Cancer 2004; 101: 613–19. doi: 10.1002/cncr.20412
- 2. Lake MG, Krook LS, Cruz SV. Pituitary adenomas: an overview. Am Fam Physician 2013; 88: 319–27.
- 3. Bladowska J, Biel A, Zimny A, Lubkowska K, Bednarek-Tupikowska G, Sozanski T, et al. Are T2-weighted images more useful than T1-weighted contrast-enhanced images in assessment of postoperative sella and parasellar region? Med Sci Monit 2011; 17: MT83–90. doi: 10.12659/msm.881966
- 4. Freda PU, Beckers AM, Katznelson L, Molitch ME, Montori VM, Post KD, et al. Pituitary incidentaloma: an endocrine society clinical practice guideline. J Clin Endocrinol Metab 2011; 96: 894–904. doi: 10.1210/jc.2010-1048
- 5. Casanueva FF, Molitch ME, Schlechte JA, Abs R, Bonert V, Bronstein MD, et al. Guidelines of the pituitary society for the diagnosis and management of prolactinomas. Clin Endocrinol (Oxf) 2006; 65: 265–73. doi: 10.1111/j.1365-2265.2006.02562.x
- 6. Melmed S, Casanueva FF, Cavagnini F, Chanson P, Frohman L, Grossman A, et al. Guidelines for acromegaly management. J Clin Endocrinol Metab 2002; 87: 4054–58. doi: 10.1210/jc.2002-011841
- 7. Melmed S, Casanueva FF, Hoffman AR, Kleinberg DL, Montori VM, Schlechte JA, et al. Diagnosis and treatment of hyperprolactinemia: an endocrine society clinical practice guideline. J Clin Endocrinol Metab 2011; 96: 273–88. doi: 10.1210/jc.2010-1692
- 8. Gillies RJ, Kinahan PE, Hricak H. Radiomics: images are more than pictures, they are data. Radiology 2016; 278: 563–77. doi: 10.1148/radiol.2015151169
- 9. Park YW, Han K, Ahn SS, Choi YS, Chang JH, Kim SH, et al. Whole-tumor histogram and texture analyses of DTI for evaluation of IDH1-mutation and 1p/19q-codeletion status in World Health Organization grade II gliomas. AJNR Am J Neuroradiol 2018; 39: 693–98. doi: 10.3174/ajnr.A5569
- 10. Yu J, Shi Z, Lian Y, Li Z, Liu T, Gao Y, et al. Noninvasive IDH1 mutation estimation based on a quantitative radiomics approach for grade II glioma. Eur Radiol 2017; 27: 3509–22. doi: 10.1007/s00330-016-4653-3
- 11. Park YW, Choi YS, Ahn SS, Chang JH, Kim SH, Lee SK. Radiomics MRI phenotyping with machine learning to predict the grade of lower-grade gliomas: a study focused on nonenhancing tumors. Korean J Radiol 2019; 20: 1381–89. doi: 10.3348/kjr.2018.0814
- 12. Fan Y, Hua M, Mou A, Wu M, Liu X, Bao X, et al. Preoperative noninvasive radiomics approach predicts tumor consistency in patients with acromegaly: development and multicenter prospective validation. Front Endocrinol (Lausanne) 2019; 10: 403. doi: 10.3389/fendo.2019.00403
- 13. Ugga L, Cuocolo R, Solari D, Guadagno E, D’Amico A, Somma T, et al. Prediction of high proliferative index in pituitary macroadenomas using MRI-based radiomics and machine learning. Neuroradiology 2019; 61: 1365–73. doi: 10.1007/s00234-019-02266-1
- 14. Cuocolo R, Ugga L, Solari D, Corvino S, D’Amico A, Russo D, et al. Prediction of pituitary adenoma surgical consistency: radiomic data mining and machine learning on T2-weighted MRI. Neuroradiology 2020; 62: 1649–56. doi: 10.1007/s00234-020-02502-z
- 15. Niu J, Zhang S, Ma S, Diao J, Zhou W, Tian J, et al. Preoperative prediction of cavernous sinus invasion by pituitary adenomas using a radiomics method based on magnetic resonance images. Eur Radiol 2019; 29: 1625–34. doi: 10.1007/s00330-018-5725-3
- 16. Zhang Y, Chen C, Tian Z, Cheng Y, Xu J. Differentiation of pituitary adenoma from Rathke cleft cyst: combining MR image features with texture features. Contrast Media Mol Imaging 2019; 2019: 6584636. doi: 10.1155/2019/6584636
- 17. Zeynalova A, Kocak B, Durmaz ES, Comunoglu N, Ozcan K, Ozcan G, et al. Preoperative evaluation of tumour consistency in pituitary macroadenomas: a machine learning-based histogram analysis on conventional T2-weighted MRI. Neuroradiology 2019; 61: 767–74. doi: 10.1007/s00234-019-02211-2
- 18. Rui W, Wu Y, Ma Z, Wang Y, Wang Y, Xu X, et al. MR textural analysis on contrast enhanced 3D-SPACE images in assessment of consistency of pituitary macroadenoma. Eur J Radiol 2019; 110: 219–24. doi: 10.1016/j.ejrad.2018.12.002
- 19. Liu YQ, Gao BB, Dong B, Padikkalakandy Cheriyath SS, Song QW, Xu B, et al. Preoperative vascular heterogeneity and aggressiveness assessment of pituitary macroadenoma based on dynamic contrast-enhanced MRI texture analysis. Eur J Radiol 2020; 129: 109125. doi: 10.1016/j.ejrad.2020.109125
- 20. Su C-Q, Zhang X, Pan T, Chen X-T, Chen W, Duan S-F, et al. Texture analysis of high b-value diffusion-weighted imaging for evaluating consistency of pituitary macroadenomas. J Magn Reson Imaging 2020; 51: 1507–13. doi: 10.1002/jmri.26941
- 21. Zhang Y, Chen C, Tian Z, Xu J. Discrimination between pituitary adenoma and craniopharyngioma using MRI-based image features and texture features. Jpn J Radiol 2020; 38: 1125–34. doi: 10.1007/s11604-020-01021-4
- 22. Park YW, Kang Y, Ahn SS, Ku CR, Kim EH, Kim SH, et al. Radiomics model predicts granulation pattern in growth hormone-secreting pituitary adenomas. Pituitary 2020; 23: 691–700. doi: 10.1007/s11102-020-01077-5
- 23. Zhang S, Song G, Zang Y, Jia J, Wang C, Li C, et al. Non-invasive radiomics approach potentially predicts non-functioning pituitary adenomas subtypes before surgery. Eur Radiol 2018; 28: 3692–3701. doi: 10.1007/s00330-017-5180-6
- 24. Fan Y, Chai Y, Li K, Fang H, Mou A, Feng S, et al. Non-invasive and real-time proliferative activity estimation based on a quantitative radiomics approach for patients with acromegaly: a multicenter study. J Endocrinol Invest 2020; 43: 755–65. doi: 10.1007/s40618-019-01159-7
- 25. Peng A, Dai H, Duan H, Chen Y, Huang J, Zhou L, et al. A machine learning model to precisely immunohistochemically classify pituitary adenoma subtypes with radiomics based on preoperative magnetic resonance imaging. Eur J Radiol 2020; 125: 108892. doi: 10.1016/j.ejrad.2020.108892
- 26. Galm BP, Buckless C, Swearingen B, Torriani M, Klibanski A, Bredella MA, et al. MRI texture analysis in acromegaly and its role in predicting response to somatostatin receptor ligands. Pituitary 2020; 23: 212–22. doi: 10.1007/s11102-019-01023-0
- 27. Fan Y, Jiang S, Hua M, Feng S, Feng M, Wang R. Machine learning-based radiomics predicts radiotherapeutic response in patients with acromegaly. Front Endocrinol (Lausanne) 2019; 10: 588. doi: 10.3389/fendo.2019.00588
- 28. Fan Y, Liu Z, Hou B, Li L, Liu X, Liu Z, et al. Development and validation of an MRI-based radiomic signature for the preoperative prediction of treatment response in patients with invasive functional pituitary adenoma. Eur J Radiol 2019; 121: 108647. doi: 10.1016/j.ejrad.2019.108647
- 29. Kocak B, Durmaz ES, Kadioglu P, Polat Korkmaz O, Comunoglu N, Tanriover N, et al. Predicting response to somatostatin analogues in acromegaly: machine learning-based high-dimensional quantitative texture analysis on T2-weighted MRI. Eur Radiol 2019; 29: 2731–39. doi: 10.1007/s00330-018-5876-2
- 30. Machado LF, Elias PCL, Moreira AC, Dos Santos AC, Murta Junior LO. MRI radiomics for the prediction of recurrence in patients with clinically non-functioning pituitary macroadenomas. Comput Biol Med 2020; 124: 103966. doi: 10.1016/j.compbiomed.2020.103966
- 31. Galm BP, Martinez-Salazar EL, Swearingen B, Torriani M, Klibanski A, Bredella MA, et al. MRI texture analysis as a predictor of tumor recurrence or progression in patients with clinically non-functioning pituitary adenomas. Eur J Endocrinol 2018; 179: 191–98. doi: 10.1530/EJE-18-0291
- 32. Waterton JC, Pylkkanen L. Qualification of imaging biomarkers for oncology drug development. Eur J Cancer 2012; 48: 409–15. doi: 10.1016/j.ejca.2011.11.037
- 33. Lambin P, Leijenaar RTH, Deist TM, Peerlings J, de Jong EEC, van Timmeren J, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol 2017; 14: 749–62. doi: 10.1038/nrclinonc.2017.141
- 34. Sanduleanu S, Woodruff HC, de Jong EEC, van Timmeren JE, Jochems A, Dubois L, et al. Tracking tumor biology with radiomics: a systematic review utilizing a radiomics quality score. Radiother Oncol 2018; 127: 349–60. doi: 10.1016/j.radonc.2018.03.033
- 35. Park JE, Kim D, Kim HS, Park SY, Kim JY, Cho SJ, et al. Quality of science and reporting of radiomics in oncologic studies: room for improvement according to radiomics quality score and TRIPOD statement. Eur Radiol 2020; 30: 523–36. doi: 10.1007/s00330-019-06360-z
- 36. Park JE, Kim HS, Kim D, Park SY, Kim JY, Cho SJ, et al. A systematic review reporting quality of radiomics research in neuro-oncology: toward clinical utility and quality improvement using high-dimensional imaging features. BMC Cancer 2020; 20: 29. doi: 10.1186/s12885-019-6504-5
- 37. Moons KGM, Altman DG, Reitsma JB, Ioannidis JPA, Macaskill P, Steyerberg EW, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015; 162: W1–73. doi: 10.7326/M14-0698
- 38. Zwanenburg A, Vallières M, Abdalah MA, Aerts HJWL, Andrearczyk V, Apte A, et al. The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology 2020; 295: 328–38. doi: 10.1148/radiol.2020191145
- 39. Choi KS, Sunwoo L. Artificial intelligence in neuroimaging: clinical applications. Investig Magn Reson Imaging 2022; 26: 1–9.
- 40. Park YW, Lee N, Ahn SS, Chang JH, Lee S-K. Radiomics and deep learning in brain metastases: current trends and roadmap to future applications. Investig Magn Reson Imaging 2021; 25: 266–80.
- 41. Won SY, Park YW, Park M, Ahn SS, Kim J, Lee SK. Quality reporting of radiomics analysis in mild cognitive impairment and Alzheimer’s disease: a roadmap for moving forward. Korean J Radiol 2020; 21: 1345–54. doi: 10.3348/kjr.2020.0715
- 42. Nussey S, Whitehead S. Endocrinology: an integrated approach. Oxford: BIOS Scientific Publishers, 2001.
- 43. Egger J, Bauer MHA, Kuhnt D, Freisleben B, Nimsky C. Pituitary adenoma segmentation. 2011. Available from: https://arxiv.org/abs/1103.1778
- 44. Park JE, Park SY, Kim HJ, Kim HS. Reproducibility and generalizability in radiomics modeling: possible strategies in radiologic and statistical perspectives. Korean J Radiol 2019; 20: 1124–37. doi: 10.3348/kjr.2018.0070
- 45. Moradmand H, Aghamiri SMR, Ghaderi R. Impact of image preprocessing methods on reproducibility of radiomic features in multimodal magnetic resonance imaging in glioblastoma. J Appl Clin Med Phys 2020; 21: 179–90. doi: 10.1002/acm2.12795
- 46. Traverso A, Wee L, Dekker A, Gillies R. Repeatability and reproducibility of radiomic features: a systematic review. Int J Radiat Oncol Biol Phys 2018; 102: 1143–58. doi: 10.1016/j.ijrobp.2018.05.053
- 47. Van Leemput K, Maes F, Vandermeulen D, Suetens P. Automated model-based bias field correction of MR images of the brain. IEEE Trans Med Imaging 1999; 18: 885–96. doi: 10.1109/42.811268
- 48. Collewet G, Strzelecki M, Mariette F. Influence of MRI acquisition protocols and image intensity normalization methods on texture classification. Magn Reson Imaging 2004; 22: 81–91. doi: 10.1016/j.mri.2003.09.001