Medeniyet Medical Journal. 2022 Mar 18;37(1):36–43. doi: 10.4274/MMJ.galenos.2022.58538

Multivariable Diagnostic Prediction Model to Detect Hormone Secretion Profile From T2W MRI Radiomics with Artificial Neural Networks in Pituitary Adenomas

Prediction of the Hormone Secretion Profile in Pituitary Macroadenomas with T2-MRI-Based Radiomics and Neural Networks

Begumhan BAYSAL 1,*, Mehmet Bilgin ESER 1, Mahmut Bilal DOGAN 1, Muhammet Arif KURSUN 1
PMCID: PMC8939455  PMID: 35306784

Abstract

Objective:

This study aims to develop neural networks to detect hormone secretion profiles in pituitary adenomas based on T2 weighted magnetic resonance imaging (MRI) radiomics.

Methods:

This retrospective model-development study included a cohort of patients with pituitary adenomas (n=130) from January 2015 to January 2020 at one tertiary center. The mean age was 46.49±13.69 years, and 76/130 (58.46%) were women. Three observers segmented lesions on coronal T2 weighted MRI, and interrater agreement was evaluated using the Dice coefficient. Predictors were determined as radiomics features (n=851). Feature selection was based on the intraclass correlation coefficient, coefficient of variation, variance inflation factor, and LASSO regression analysis. Outcomes were identified as 7 hormone secretion profiles [nonfunctioning pituitary adenoma, growth hormone-secreting adenomas, prolactinomas, adrenocorticotropic hormone-secreting adenomas, pluri-hormonal secreting adenomas (PHA), follicle-stimulating hormone and luteinizing hormone-secreting adenomas, and thyroid-stimulating hormone adenomas]. A multivariable diagnostic prediction model was developed with artificial neural networks (ANN) for 7 outcomes. ANN performance was presented as the area under the receiver operating characteristic curve (AUC) and accepted as successful if the AUC was >0.85 and the p-value was <0.01.

Results:

The performance of the ANN distinguishing prolactinomas from other adenomas was validated (AUC=0.95, p<0.001, sensitivity: 91%, and specificity: 98%). The model distinguishing PHA had the lowest AUC (AUC=0.74 and p<0.001). The AUC values for the other five ANN were >0.85 and p values were <0.001.

Conclusions:

This study was successful in training neural networks that could differentiate the hormone secretion profile of pituitary adenomas.

Keywords: Pituitary adenoma, magnetic resonance imaging, machine learning, artificial intelligence, radiomics

INTRODUCTION

Pituitary adenoma is the second most common primary central nervous system tumor and constitutes approximately 14% of intracranial masses1,2,3. Pituitary adenomas are classified according to their size as microadenoma (<1 cm), macroadenoma (≥1 cm), and giant adenoma (>4 cm). Additionally, 54-62% of pituitary adenomas are active hormone-secreting tumors [growth hormone-secreting adenomas (GHSA), prolactinomas, adrenocorticotropic hormone-secreting adenomas (ACTHSA), pluri-hormonal adenomas (PHA), follicle-stimulating hormone and luteinizing hormone-secreting adenomas (FSH&LHSA), and thyroid-stimulating hormone adenomas (TSHA)], and 38-46% are non-functioning4,5,6. The hormone secretion profile can be determined from plasma hormone concentrations. However, with the increasing use of radiological imaging, many pituitary adenomas are now detected incidentally7. For these tumors, it may be possible to estimate the hormone secretion profile at the time of imaging by exploiting tumor heterogeneity8.

Radiomics is a quantitative approach that extracts many image features from medical images and allows the development of diagnostic tools8. The success of this approach in determining tumor subtypes has been studied and confirmed in some other tumors9,10. In addition, a limited number of recent studies have developed radiomics-based models to predict tumor consistency in patients with pituitary adenoma11,12,13,14,15.

In prior radiomics studies, the stability of radiomics features was evaluated only at the level of interrater agreement, using the intraclass correlation coefficient11. This approach may be inadequate to establish the stability of radiomics features, because a recent statement proposed that stable features should also have high precision and accuracy16. Therefore, creating diagnostic models based on stable radiomics features may positively affect reproducibility, precision, and accuracy.

This study aims to develop neural networks to detect hormone secretion profiles in pituitary adenomas based on T2 weighted magnetic resonance imaging (MRI) radiomics.

MATERIALS and METHODS

Ethical Considerations

This retrospective model-development study was conducted after approval by the Istanbul Medeniyet University Goztepe Prof. Dr. Suleyman Yalcin Training and Research Hospital Clinical Research Ethics Committee (decision no: 2020/0304, date: 05.18.2020), and written informed consent was waived. The study was reported in accordance with the STARD 2015 statement, and white papers and statements of multiple societies were followed17,18,19,20,21. The study scored 18/36 on the radiomics quality score17.

Study Population and Data Collection

This model-development study was carried out in a single tertiary-care center. Of the patients documented between January 2015 and January 2020, 130 who met the inclusion criteria were included in the study. The inclusion criteria were as follows: 1. An MRI examination of the patient, including T2W sequences, had to be available. 2. Image quality had to be sufficient to allow segmentation. Patients who were diagnosed in our center but whose imaging was performed in another center were excluded. The MRI protocol is described in Table 1.

Table 1. Magnetic resonance imaging protocol for pituitary gland used in the study.


Predictors: Analysis of the T2W Images

Three radiologists with 8 years, 3 years, and 1 year of experience performed segmentation using 3D Slicer software, version 4.10.2 (https://www.slicer.org). Segmentation was done volumetrically on T2W images. The 851 radiomics features, which are the predictors of this study, were extracted with PyRadiomics (version 2.2.0). All the features (shape, first order, and higher order) in this module were selected. Resampling was performed, normalization was enabled, and wavelet-based filters were activated (Figure 1).
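The total of 851 features can be cross-checked with a short calculation. The sketch below is not taken from the study: the per-class counts are an assumption based on the features that PyRadiomics enables by default, and the eight wavelet sub-bands are the usual result of a one-level 3D decomposition. Shape features are counted once because they do not depend on the filtered image.

```python
# Assumed default-enabled feature counts per PyRadiomics class (not from the paper).
DEFAULT_FEATURES_PER_CLASS = {
    "shape": 14,
    "firstorder": 18,
    "glcm": 24,
    "glrlm": 16,
    "glszm": 16,
    "gldm": 14,
    "ngtdm": 5,
}

# One-level 3D wavelet decomposition yields 8 sub-bands (LLL ... HHH); assumed here.
N_WAVELET_SUBBANDS = 8


def expected_feature_count(per_class, n_filtered_images):
    """Shape features once, all other classes on the original plus each filtered image."""
    shape = per_class["shape"]
    intensity_and_texture = sum(per_class.values()) - shape
    return shape + intensity_and_texture * (1 + n_filtered_images)


total = expected_feature_count(DEFAULT_FEATURES_PER_CLASS, N_WAVELET_SUBBANDS)
print(total)
```

Under these assumptions the count matches the number of predictors reported in the text.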

Figure 1.


Pipeline of the study.

GH: Growth hormone, ACTH: Adrenocorticotropic hormone, TSH: Thyroid-stimulating hormone, FSH/LH: Follicle-stimulating hormone/luteinizing hormone, PRL: Prolactin

Outcomes

Outcomes were identified as 7 hormone secretion profiles [non-functioning pituitary adenoma (n=19), GHSA (n=21), prolactinomas (n=64), ACTHSA (n=6), PHA (n=6), FSH&LHSA (n=8), and TSHA (n=6)].

Features Stability Analyses: Interobserver Agreement Evaluation and Coefficient of Variation Analysis

Segmentations and radiomics features were assessed separately for interobserver agreement. For segmentation, the Dice similarity coefficient was used to measure interobserver reliability, while the intraclass correlation coefficient (ICC [3,k], two-way random effects model, absolute agreement) was used for radiomics features22. Features with an ICC>0.75 were included in the coefficient of variation (CoV) analysis, and those with a CoV >15% were eliminated16. The predictor features that passed the CoV analysis were subjected to Spearman’s correlation (SC) analysis, and correlation matrices were constructed for the variance inflation factor (VIF) analysis.
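The two simplest stability checks described here can be sketched in a few lines. This is an illustrative sketch, not the software used in the study: masks are assumed to be flat sequences of 0/1 voxels, the CoV is assumed to be the sample standard deviation divided by the absolute mean of one feature's repeated measurements, and the ICC is taken as a precomputed input. All function names are hypothetical.

```python
from statistics import mean, stdev


def dice_coefficient(mask_a, mask_b):
    """Dice similarity between two binary masks given as flat 0/1 sequences."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Two empty masks are treated as perfect agreement by convention.
    return 1.0 if total == 0 else 2.0 * overlap / total


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CoV is undefined for a zero mean")
    return stdev(values) / abs(m)


def is_stable(icc, values, icc_min=0.75, cov_max=0.15):
    """Apply the ICC and CoV thresholds from the text to a single feature."""
    return icc > icc_min and coefficient_of_variation(values) <= cov_max


# Hypothetical toy data: two 4-voxel masks and one feature measured three times.
print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
print(is_stable(0.80, [10.0, 11.0, 9.0]))
```

A feature passes only if both conditions hold, which mirrors the sequential filtering in the text, where only features with ICC>0.75 reach the CoV step.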

Features Selection Analyses: Collinearity-multicollinearity Evaluation and Least Absolute Shrinkage and Selection Operator Regression

VIF analyses were performed to reduce collinearity and multicollinearity using the formula $\mathrm{VIF} = 1/(1 - R^2)$. If the VIF was above 10, the feature was eliminated23. When collinear features were eliminated, the features with the smaller CoV were preserved. Further, validated imaging biomarkers were evaluated using SC analysis between features and outcomes (p<0.01).
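As a worked illustration of the VIF threshold, the sketch below evaluates the formula for a given R² (the coefficient of determination obtained when one feature is regressed on the others). The function names are hypothetical; only the formula and the cut-off of 10 come from the text.

```python
def vif_from_r2(r_squared):
    """Variance inflation factor for a feature with the given R^2."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def exceeds_vif_threshold(r_squared, threshold=10.0):
    """True when the feature would be eliminated under the VIF > 10 rule."""
    return vif_from_r2(r_squared) > threshold


print(vif_from_r2(0.5))              # half of the variance explained by other features
print(exceeds_vif_threshold(0.95))   # strongly collinear feature
print(exceeds_vif_threshold(0.80))   # moderately collinear feature
```

A VIF of 10 corresponds to R² = 0.9, so the rule removes features for which more than 90% of the variance is explained by the remaining features.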

Features were selected with least absolute shrinkage and selection operator (LASSO) regression, which applies L1 regularization. Random sampling and 5-fold cross-validation were used for seeding LASSO.
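The way L1 regularization selects features can be illustrated with a minimal coordinate-descent LASSO. This is a sketch under stated assumptions, not the procedure run in the study: it minimizes (1/2n)·||y − Xw||² + alpha·||w||₁ for a fixed, hypothetical alpha, uses toy data, and omits the random sampling and 5-fold cross-validation mentioned in the text.

```python
def soft_threshold(value, alpha):
    """Shrink a value toward zero by alpha; values within [-alpha, alpha] become 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Fit LASSO weights by cyclic coordinate descent (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


# Hypothetical toy data: the outcome depends strongly on feature 0, weakly on feature 1.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.05, 1.95, -1.95, -2.05]
weights = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, w_j in enumerate(weights) if w_j != 0.0]
print(weights, selected)
```

The weakly related feature is shrunk exactly to zero and drops out, which is the property that makes LASSO usable as a feature selector.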

Structuring Artificial Neural Networks

For training, multilayer perceptron and radial basis function networks were selected. The software determined the number of layers, the number of neurons, the error function, the hidden activation, and the output activation in these models. The software used a random number generator to assign 70% of the patients to the training set, 15% to the test set, and 15% to the validation (hold-out) set for each network training session. These subgroups had a similar distribution in terms of predictors and outcomes. Hyperparameter tuning was performed with the “early stopping” algorithm, which trains the neural networks on the training set and performs hyperparameter tuning on the test set at the end of each epoch. Training continues as long as the error rate decreases in both sets and is terminated when the error rate starts to increase in the test set. Finally, network performance is measured on the validation (hold-out) set.
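The sampling scheme and the stopping rule described above can be sketched as follows. This is an illustrative sketch and not the behavior of the software used in the study: the split is a plain random shuffle without any balancing of predictors or outcomes, the seed is arbitrary, and the per-epoch test errors are hypothetical numbers.

```python
import random


def split_patients(n_patients, seed=0, train_pct=70, test_pct=15):
    """Randomly split patient indices into train, test, and validation (hold-out) sets."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)
    n_train = n_patients * train_pct // 100
    n_test = n_patients * test_pct // 100
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]  # whatever remains is held out
    return train, test, validation


def early_stopping_epoch(test_errors):
    """Return the index of the last epoch before the test error first increases."""
    for epoch in range(1, len(test_errors)):
        if test_errors[epoch] > test_errors[epoch - 1]:
            return epoch - 1
    return len(test_errors) - 1


train, test, validation = split_patients(130)
print(len(train), len(test), len(validation))

# Hypothetical test-set error per epoch: it rises at epoch 3, so epoch 2 is kept.
print(early_stopping_epoch([0.90, 0.70, 0.60, 0.65, 0.50]))
```

With 130 patients, integer rounding gives 91, 19, and 20 patients for the three sets. The hold-out set is never consulted by the stopping rule, so it remains available for the final performance estimate.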

Statistical Analysis

Statistical analyses and neural network development were performed using TIBCO Statistica version 13.0.5 (TIBCO Software, Palo Alto, CA). The neural network results with the highest diagnostic accuracy are presented as the area under the receiver operating characteristic curve (AUC) with 95% confidence intervals. In the receiver operating characteristic curve analysis, a neural network with an AUC >0.85 and a p-value <0.01 was considered a validated classifier16.
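The acceptance rule can be made concrete with a small sketch. The AUC is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (the Mann-Whitney formulation), which is one standard way to obtain it and not necessarily how Statistica computes it. The labels and scores are hypothetical, and the p-value is taken as an input rather than computed.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def is_validated(auc, p_value, auc_min=0.85, p_max=0.01):
    """Apply the acceptance rule from the text: AUC > 0.85 and p < 0.01."""
    return auc > auc_min and p_value < p_max


# Hypothetical scores for four cases (1 = target subtype, 0 = other adenomas).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))

# AUC values reported in the abstract; 0.0009 stands in for "p < 0.001".
print(is_validated(0.95, 0.0009))
print(is_validated(0.74, 0.0009))
```

Under this rule a network with a highly significant p-value is still rejected when its AUC does not exceed 0.85.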

RESULTS

Patient Characteristics

This study included 130 consecutive patients with pituitary adenoma. The mean age was 46.49±13.69 years, and 76/130 (58.46%) were women. All patients were Caucasians. A full summary of clinicopathologic characteristics of the patients is presented in Table 2.

Table 2. Characteristics of the participants.


Model Development and Specification

The interobserver median Dice coefficient values for segmentations were 0.84 [interquartile range (IQR): 0.06] between observers 1 and 2; 0.84 (IQR: 0.17) between observers 1 and 3; and 0.79 (IQR: 0.20) between observers 2 and 3.

A total of 204 features were eliminated by using the ICC (<0.75). By using CoV analysis (>0.15), 552 features were eliminated. Finally, another 44 features were eliminated by using VIF analysis due to collinearity. Most of the radiomics features were found to be unstable (n=800, 94%).
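The elimination counts above can be checked with simple arithmetic. The snippet uses only the numbers reported in the text; the VIF cut-off of 10 in the label comes from the Methods section.

```python
total_features = 851

# Features removed at each step, as reported in the text.
removed = {
    "ICC < 0.75": 204,
    "CoV > 0.15": 552,
    "VIF > 10": 44,
}

eliminated = sum(removed.values())
remaining = total_features - eliminated
share_eliminated = eliminated / total_features

print(eliminated, remaining, f"{share_eliminated:.0%}")
```

The three steps remove 800 of the 851 features (94%), leaving the 51 stable predictors that enter the correlation analysis.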

Stable predictors (n=51) and all outcomes were used for correlation analysis, and correlation matrices were created to evaluate the unadjusted relation between each candidate predictor and the outcomes (Figure 2). In this analysis, all SC coefficients were below 0.30, with p<0.01 for only five predictors (Figure 3). Finally, LASSO regression was used for regularization, and the most relevant predictors were selected for neural network training.

Figure 2.


Correlation matrix between predictors and outcomes.

GH: Growth hormone, PRL: Prolactin, ACTH: Adrenocorticotropic hormone, FSH/LH: Follicle-stimulating hormone/luteinizing hormone, TSH: Thyroid-stimulating hormone

Figure 3.


Heatmap of the predictors. Each predictor is coded with a variable number, and the list of variables is available in a supplemental file. The heatmap was created from the Spearman rank correlation analysis, and the highly collinear variables were eliminated by VIF analyses.

VIF: Variance inflation factor

Diagnostic Prediction Model Results

The performance of the ANN distinguishing prolactinomas from other adenomas was validated (AUC=0.95, p<0.001, sensitivity: 91%, and specificity: 98%). The model distinguishing PHA had the lowest AUC (AUC=0.74 and p<0.001). Results of seven neural networks are presented in Table 3.

Table 3. Neural networks performance results.


DISCUSSION

The most notable result of this study was that prolactinomas, which were found in about half of the included patients, were predicted with high accuracy based on the heterogeneity in the T2W MRI images. However, the model distinguishing PHA had the lowest AUC. The difficulty in distinguishing these tumors, which contain more than one cell group, suggests that the results are not random but are related to tissue heterogeneity.

There are limited studies in the literature on the classification of pituitary adenomas from MRI images11,12,13,14,15. Four of these studies investigated surgical consistency after surgical excision of adenomas11,13,14,15. In a study that included 89 macroadenomas, Cuocolo et al.11 predicted the outcomes of 28 patients in the test group; only two soft tumors were misclassified as fibrous tumors, and all fibrous tumors were correctly classified. Fan et al.14 reported that adding clinical data such as age, sex, and hormone levels improved the model’s accuracy. These results meant that patients who might require re-surgery were identified by imaging in the early phase of the disease. This information can give the surgeon confidence in surgical planning and reduce residual tumor and recurrence rates. A second benefit is that the patient can be informed that the tumor is of firm consistency and may require re-surgery in the future. In another study, Peng et al.12 used T1W, contrast-enhanced T1W, and T2W MRI images and three different machine learning algorithms to predict three different immunohistochemical classes of pituitary adenomas preoperatively. They observed that the accuracy of the T2W radiomics-based model was the highest and that the best classifier was the support vector machine. Considering these results, we performed our study with T2W radiomics features and pre-trained neural networks.

Currently, radiomics studies are facing a reproducibility crisis. Therefore, the European Society of Radiology (ESR) has recently presented a statement on the stability of imaging biomarkers such as radiomics16. Cuocolo et al.11 and Zeynalova et al.13 evaluated the reproducibility of radiomics features by using the ICC and included the features with ICC>0.75 and ICC>0.90, respectively. Peng et al.12, Fan et al.14, and Rui et al.15 did not evaluate the reproducibility of radiomics features. In this study, we followed the ESR statement to evaluate feature stability. Therefore, we eliminated high-variance features by using CoV and highly collinear features by using VIF analysis16. Although Cuocolo et al.11 did not accept variance and collinearity as criteria of stability, they also eliminated these features, similar to our study.

The incidence of incidental adenoma is increasing due to the increasing frequency of imaging7. Detecting these lesions’ secretion profiles and consistency at the time of imaging can be beneficial for accelerating patient management. Because several studies have already addressed tumor stiffness and consistency, we focused on the secretion profile in this study11,13,14,15. We hypothesized that the cells that determine the secretion profile could be detected by quantitative analysis, and we consider that estimating PHA with the lowest accuracy while estimating prolactinoma with the highest accuracy supports this hypothesis. Because each pluri-hormonal tumor contains different amounts of different secretory cell types, imaging profiling is restricted in these tumors, whereas it is successful in a tumor containing a single cell type, such as a prolactinoma.

This study had several limitations. First, prolactinomas were found in half of the patients, so the neural network for prolactinomas was trained on a balanced class distribution, whereas the other networks were not. Second, the ground truth was plasma hormone levels because our patient population consisted of patients admitted to the endocrinology outpatient clinic. Third, the study was single-centered. However, radiomics features were subjected to rigorous stability analyses to increase reproducibility and precision, and internal validation methods were used in training neural networks to increase accuracy.

CONCLUSIONS

Soon, this study and previous studies will become parts of a complex, accumulating web of evidence, allowing us to obtain much more quantitative data on patients than is currently available. Until then, we need to increase our quantitative data and closely test our imaging biomarkers’ reproducibility, precision, and accuracy. This study shows that the ANN distinguishes whether a pituitary adenoma is a prolactinoma with an AUC of 0.95.

Footnotes

Ethics

Ethics Committee Approval: This retrospective model-development study was done after it was approved by the Istanbul Medeniyet University Goztepe Prof. Dr. Suleyman Yalcin Training and Research Hospital Clinical Research Ethics Committee (decision no: 2020/0304, date: 05.18.2020).

Informed Consent: Written informed consent was waived.

Peer-review: Externally and internally peer-reviewed.

Author Contributions

Surgical and Medical Practices: B.B., M.B.E., M.B.D., M.A.K., Concept: B.B., M.B.E., M.B.D., M.A.K., Design: B.B., M.B.E., M.B.D., Data Collection and/or Processing: B.B., M.B.D., M.A.K., Analysis and/or Interpretation: B.B., M.B.E., Literature Search: B.B., M.B.E., M.B.D., Writing: B.B., M.B.E., M.B.D., M.A.K.

Conflict of Interest: The authors have no conflict of interest to declare.

Financial Disclosure: The authors declared that this study has received no financial support.

References

1. Dolecek TA, Propp JM, Stroup NE, Kruchko C. CBTRUS Statistical Report: Primary Brain and Central Nervous System Tumors Diagnosed in the United States in 2005-2009. Neuro Oncol. 2012;14(Suppl 5):1–49. doi: 10.1093/neuonc/nos218.
2. Ezzat S, Asa SL, Couldwell WT, et al. The prevalence of pituitary adenomas: A systematic review. Cancer. 2004;101:613–9. doi: 10.1002/cncr.20412.
3. Guo S, Wang Z, Kang X, Xin W, Li X. A Meta-Analysis of Endoscopic vs. Microscopic Transsphenoidal Surgery for Non-functioning and Functioning Pituitary Adenomas: Comparisons of Efficacy and Safety. Front Neurol. 2021;12:614382. doi: 10.3389/fneur.2021.614382.
4. Agustsson TT, Baldvinsdottir T, Jonasson JG, et al. The epidemiology of pituitary adenomas in Iceland, 1955-2012: a nationwide population-based study. Eur J Endocrinol. 2015;173:655–64. doi: 10.1530/EJE-15-0189.
5. Raappana A, Koivukangas J, Ebeling T, Pirilä T. Incidence of pituitary adenomas in Northern Finland in 1992-2007. J Clin Endocrinol Metab. 2010;95:4268–75. doi: 10.1210/jc.2010-0537.
6. Tjörnstrand A, Gunnarsson K, Evert M, et al. The incidence rate of pituitary adenomas in western Sweden for the period 2001-2011. Eur J Endocrinol. 2014;171:519–26. doi: 10.1530/EJE-14-0144.
7. Sivakumar W, Chamoun R, Nguyen V, Couldwell WT. Incidental pituitary adenomas. Neurosurg Focus. 2011;31:18. doi: 10.3171/2011.9.FOCUS11217.
8. Lambin P, Rios-Velazquez E, Leijenaar R, et al. Radiomics: Extracting more information from medical images using advanced feature analysis. Eur J Cancer. 2012;48:441–6. doi: 10.1016/j.ejca.2011.11.036.
9. Lu CF, Hsu FT, Hsieh KL, et al. Machine Learning-Based Radiomics for Molecular Subtyping of Gliomas. Clin Cancer Res. 2018;24:4429–36. doi: 10.1158/1078-0432.CCR-17-3445.
10. Huang YQ, Liang CH, He L, et al. Development and Validation of a Radiomics Nomogram for Preoperative Prediction of Lymph Node Metastasis in Colorectal Cancer. J Clin Oncol. 2016;34:2157–64. doi: 10.1200/JCO.2015.65.9128.
11. Cuocolo R, Ugga L, Solari D, et al. Prediction of pituitary adenoma surgical consistency: radiomic data mining and machine learning on T2-weighted MRI. Neuroradiology. 2020;62:1649–56. doi: 10.1007/s00234-020-02502-z.
12. Peng A, Dai H, Duan H, et al. A machine learning model to precisely immunohistochemically classify pituitary adenoma subtypes with radiomics based on preoperative magnetic resonance imaging. Eur J Radiol. 2020;125:108892. doi: 10.1016/j.ejrad.2020.108892.
13. Zeynalova A, Kocak B, Durmaz ES, et al. Preoperative evaluation of tumour consistency in pituitary macroadenomas: a machine learning-based histogram analysis on conventional T2-weighted MRI. Neuroradiology. 2019;61:767–74. doi: 10.1007/s00234-019-02211-2.
14. Fan Y, Hua M, Mou A, et al. Preoperative noninvasive radiomics approach predicts tumor consistency in patients with acromegaly: Development and multicenter prospective validation. Front Endocrinol (Lausanne). 2019;10:403. doi: 10.3389/fendo.2019.00403.
15. Rui W, Wu Y, Ma Z, et al. MR textural analysis on contrast enhanced 3D-SPACE images in assessment of consistency of pituitary macroadenoma. Eur J Radiol. 2019;110:219–24. doi: 10.1016/j.ejrad.2018.12.002.
16. European Society of Radiology (ESR). ESR Statement on the Validation of Imaging Biomarkers. Insights Imaging. 2020;11:76. doi: 10.1186/s13244-020-00872-9.
17. Park JE, Kim D, Kim HS, et al. Quality of science and reporting of radiomics in oncologic studies: room for improvement according to radiomics quality score and TRIPOD statement. Eur Radiol. 2020;30:523–36. doi: 10.1007/s00330-019-06360-z.
18. Bossuyt PM, Reitsma JB, Bruns DE, et al. STARD 2015: An updated list of essential items for reporting diagnostic accuracy studies. Radiology. 2015;277:826–32. doi: 10.1148/radiol.2015151516.
19. Jaremko JL, Azar M, Bromwich R, et al. Canadian Association of Radiologists White Paper on Ethical and Legal Issues Related to Artificial Intelligence in Radiology. Can Assoc Radiol J. 2019;70:107–18. doi: 10.1016/j.carj.2019.03.001.
20. Geis JR, Brady AP, Wu CC, et al. Ethics of artificial intelligence in radiology: Summary of the joint European and North American multisociety statement. Radiology. 2019;293:436–40. doi: 10.1148/radiol.2019191586.
21. European Society of Radiology (ESR). What the radiologist should know about artificial intelligence - an ESR white paper. Insights Imaging. 2019;10:44. doi: 10.1186/s13244-019-0738-2.
22. Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med. 2016;15:155–63. doi: 10.1016/j.jcm.2016.02.012.
23. Kim JH. Multicollinearity and misleading statistical results. Korean J Anesthesiol. 2019;72:558–69. doi: 10.4097/kja.19087.

Articles from Medeniyet Medical Journal are provided here courtesy of Istanbul Medeniyet University
