Acta Obstetricia et Gynecologica Scandinavica. 2025 May 1;104(8):1433–1442. doi: 10.1111/aogs.15146

Performance of radiomics analysis in ultrasound imaging for differentiating benign from malignant adnexal masses: A systematic review and meta‐analysis

Francesca Moro 1,2, Marianna Ciancia 2, Maria Sciuto 3, Giulia Baldassari 4, Huong Elena Tran 4, Antonella Carcagnì 5, Anna Fagotti 2,3, Antonia Carla Testa 2,3
PMCID: PMC12283166  PMID: 40312890

Abstract

Introduction

We present the state of the art of ultrasound‐based machine learning (ML) radiomics models in the context of ovarian masses and analyze their accuracy in differentiating between benign and malignant adnexal masses.

Material and Methods

Web of Science, PubMed, and Scopus databases were searched. All studies were imported into RAYYAN QCRI software. All studies that developed and internally or externally validated ML models using only radiomics features extracted from ultrasound images were included. The overall quality of the included studies was assessed using the QUADAS‐AI tool. Summary sensitivity and specificity analyses with corresponding 95% confidence intervals (CIs) were reported.

Results

Twelve studies developed ML models including only radiomics features extracted from ultrasound images, and six of them were included in the meta‐analysis. The overall sensitivity and specificity for differentiating benign from malignant adnexal masses were 0.80 (95% CI 0.74–0.87) and 0.86 (95% CI 0.80–0.90), respectively, in the validation set. All studies demonstrated a high risk of bias in the patient selection domain (e.g., lack of details on image sources or scanner models; absence of image preprocessing), and most also showed a high risk of bias in the index test domain (e.g., models were not validated on external datasets). In contrast, the risk of bias was generally low in the reference standard domain (i.e., most studies used a reference standard that accurately identified the target condition) and in the flow and timing domain (i.e., the time interval between the index test and the reference standard was appropriate).

Conclusions

The good performance of ultrasound‐based radiomics models in the validation set suggests that radiomics is worth further exploration to improve the diagnosis of adnexal masses. To date, however, the available studies carry a high risk of bias owing to small sample sizes, single‐center designs, and the lack of external validation.

Keywords: artificial intelligence, machine learning, ovarian cancer, radiomics, ultrasonography


Currently, the IOTA‐ADNEX model and the O‐RADS are the most reliable methods for calculating the risk of malignancy in adnexal masses. However, some skill is required to accurately recognize the ultrasound variables included in these methods. In recent years, several authors have demonstrated that radiomic analysis, which does not rely on expert interpretation or the recognition of ultrasound variables, performs almost as well as the methods currently in use in discriminating between benign and malignant adnexal masses. With further development and validation, it has the potential to be integrated into ultrasound imaging systems, enhancing clinical decision‐making and potentially evolving into a more autonomous diagnostic tool in the future.



Abbreviations

ADNEX

Assessment of Different Neoplasia in the adnexa

AI

artificial intelligence

CT

computed tomography

IOTA

International Ovarian Tumor Analysis group

ML

machine learning

MRI

magnetic resonance imaging

O‐RADS

Ovarian‐Adnexal Reporting and Data System

QUADAS‐AI

Quality Assessment Tool for Artificial Intelligence‐Centered Diagnostic Test Accuracy Studies

Key message.

The IOTA‐ADNEX model and O‐RADS reliably assess adnexal masses but require expert interpretation. Radiomics, which does not require expert input, shows promise in distinguishing between benign and malignant adnexal masses and could, with further development, evolve into an autonomous ultrasound tool.

1. INTRODUCTION

Ultrasound is the first‐line imaging modality for the diagnosis of gynecological diseases 1 and plays a crucial role in the characterization of ovarian neoplasms. 2 Since 1999, ultrasound has achieved a high degree of accuracy in the diagnosis of ovarian masses when performed by an experienced examiner. 3 Over the past decade, the International Ovarian Tumor Analysis (IOTA) group has developed and tested new tools aimed at matching the performance of an experienced examiner in differentiating benign from malignant ovarian masses. In particular, the IOTA‐Assessment of Different Neoplasia in the adneXa (ADNEX) model, which combines clinical and simple ultrasound variables, 4 has been proposed in international guidelines for the diagnosis and management of patients with adnexal masses. 2 The model showed high accuracy (AUC 0.94, sensitivity 0.97, and specificity 0.71) in distinguishing between benign and malignant adnexal masses.

In 2020, the IOTA group and the American College of Radiology jointly published a consensus guideline on the Ovarian‐Adnexal Reporting and Data System (O‐RADS) for ultrasound. 5 O‐RADS categorizes adnexal masses into six risk groups ranging from a normal ovary to high risk of malignancy and guides the management of adnexal masses. According to the consensus statement, malignancy risk can be estimated using either the ADNEX model or the examiner's interpretation of ultrasound findings based on the O‐RADS lexicon.

Both the ADNEX model and the O‐RADS lexicon are practical diagnostic tools that can be integrated into ultrasound devices, providing an objective evaluation of ovarian masses without requiring extensive experience. However, some skill is still needed to accurately identify the ultrasound variables on which these methods rely.

In recent years, the scientific community has developed a great interest in artificial intelligence (AI) and radiomics applied to imaging, including ultrasound. 6 , 7 , 8 , 9 AI and radiomics‐based models have the potential to change the current approach to diagnosis and patient management because they are objective and clinician‐independent. In medical imaging, AI systems help interpret images, detect anomalies, and predict clinical outcomes by integrating imaging data, clinical variables, and radiomics features, thereby supporting diagnostic accuracy and treatment planning. Radiomics is a technique used to extract, analyze, and interpret quantitative data from medical images. 10 The features most commonly extracted are statistical features, which reflect the intensity distribution and heterogeneity of the image, and textural features, which capture patterns and variations in tissue architecture based on the spatial arrangement of gray‐level pixels. Other types, such as morphological features (describing the shape and size of structures) and wavelet features (analyzing the image at multiple scales and orientations), can also be used to gain a more comprehensive understanding of the underlying tissue characteristics.
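To make the distinction between feature families concrete, the following Python sketch computes three first‐order (statistical) features and two textural features derived from a gray‐level co‐occurrence matrix (GLCM): contrast and homogeneity (inverse difference). It is a minimal illustration under stated assumptions, not the pipeline of any reviewed study: the region of interest (ROI) is assumed to be already segmented and quantized to a few gray levels, only one pixel offset (one pixel to the right) is considered, and the function names and the 4 × 4 example are hypothetical. Dedicated radiomics toolkits such as PyRadiomics implement many more features, with their own discretization and normalization settings.

```python
import math
from collections import Counter


def first_order_features(roi):
    """Statistical features describing the gray-level distribution of the ROI."""
    pixels = [p for row in roi for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    # Shannon entropy (bits) of the gray-level histogram: a heterogeneity measure
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())
    return {"mean": mean, "variance": variance, "entropy": entropy}


def glcm(roi, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(roi), len(roi[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = roi[r][c], roi[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_texture_features(p):
    """Contrast (local gray-level variation) and homogeneity from a normalized GLCM."""
    idx = range(len(p))
    contrast = sum(p[i][j] * (i - j) ** 2 for i in idx for j in idx)
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i in idx for j in idx)
    return {"contrast": contrast, "homogeneity": homogeneity}


# Hypothetical 4 x 4 ROI, already quantized to gray levels 0-3
roi = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
print(first_order_features(roi))
print(glcm_texture_features(glcm(roi, levels=4)))
```

For this toy ROI, 8 of the 12 horizontal neighbor pairs share the same gray level, so the GLCM contrast is low (7/12, about 0.58) while homogeneity is high; a more heterogeneous texture would shift both values in the opposite direction.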

Since 2020, the role of AI and radiomics has been increasingly explored in gynecological ultrasound, for both benign and malignant conditions. 6 Most studies in the literature have addressed classification tasks, mainly the differentiation between benign and malignant ovarian tumors. 6 , 7 However, the available evidence on radiomics models for predicting ovarian malignancy has not yet been synthesized.

We aim to present the state of the art of ultrasound‐based machine learning (ML) radiomics models in the context of ovarian masses and analyze their accuracy in differentiating between benign and malignant adnexal masses.

2. MATERIAL AND METHODS

2.1. Search strategy

A comprehensive literature search was conducted using the Web of Science, PubMed, and Scopus databases to identify potentially eligible articles published up to 1 November 2024. A search strategy was developed using a combination of Medical Subject Headings (MeSH), keywords, and free‐text terms, including: “radiomics,” “ultrasound‐based radiomics,” “artificial intelligence,” “machine learning,” “deep learning,” “ultrasonography,” “gynecology,” “gynecological diseases,” “ovary,” “ovarian,” “ovaries,” and “fallopian tube.” The search was restricted to studies involving human participants and published in English. No additional filters or limitations were applied. The full search strategies for all databases are available in Supporting Information Appendix S1. This systematic review was conducted and reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta‐Analyses (PRISMA) guidelines. The protocol was registered in PROSPERO (registration number CRD42024613425).

2.2. Inclusion criteria

Articles that developed ML models using only radiomics features extracted from ultrasound images to differentiate between benign and malignant adnexal masses were included. Articles that did not include an independent (internal or external) validation set were not considered for the meta‐analysis.

2.3. Exclusion criteria

Articles that did not use an independent validation set were considered but not included in the meta‐analysis. This applied to papers that relied only on cross‐validation, in which the whole dataset is divided into a number of folds (usually 10): in each cycle, one fold is used as the test set and the remaining folds (usually nine) as the training set, and the cycle is repeated once per fold so that every fold serves as the test set exactly once. In the case of papers with multiple ML models (e.g., a model with only radiomics features alongside models combining radiomics with other types of features and/or a deep learning model using images), we considered only the model with radiomics features for the analysis.
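The difference between cross‐validation and an independent validation set can be illustrated with a short Python sketch of k‐fold index splitting. This is a generic illustration, not the procedure of any specific reviewed study; the function name `k_fold_indices` and its parameters are hypothetical, and libraries such as scikit‐learn provide equivalent utilities (e.g., `sklearn.model_selection.KFold`). Every case ends up in exactly one test fold, so each case contributes to both training and testing across the k cycles; an independent validation set, by contrast, is held out once and never used during model development.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample appears in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so fold sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Hypothetical dataset of 25 cases split into 10 folds
folds = list(k_fold_indices(25, k=10))
for fold_number, (train, test) in enumerate(folds):
    print(fold_number, len(train), len(test))
```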

Systematic reviews, non‐empirical studies, animal studies, conference abstracts, editorials, commentaries, book reviews, and abstracts without corresponding full‐text articles were excluded from eligibility for inclusion in the present systematic review.

2.4. Study selection

All studies retrieved through the search strategy were imported into the RAYYAN QCRI software, and duplicates were removed. Two authors (M.C., M.S.) independently reviewed all abstracts, with consensus reached regarding potential relevance. The full‐text versions of the selected papers were acquired, and four reviewers (F.M., M.C., G.B., and M.S.) independently extracted relevant data on study characteristics. Any discrepancies were discussed among the reviewers, and consensus was achieved. In cases where multiple studies were published for the same cohort with identical endpoints, the report containing the most comprehensive information on the population was included to prevent overlap.

Reference lists of the included studies were searched to identify additional studies.

When the full text could not be retrieved online or specific data needed for the meta‐analysis (e.g., number of patients, images of benign tumors, images of malignant tumors, or the number of patients in the training and validation sets) were unavailable, the corresponding authors of the articles were contacted.

2.5. Data extraction and analysis

Data extraction was performed by four researchers (F.M., M.C., G.B., and M.S.) using a standardized data extraction form. The following information was extracted for each eligible study: (1) study identification (first author, publication year); (2) study characteristics (study period, country, population); (3) specific type of ML model assessed; and (4) performance outcomes expressed as sensitivity and specificity. A summary of the results is presented in a dedicated table.

The reported performance refers to either the external or internal validation set. Specifically, if a model was validated on an external cohort, its performance is based on the results obtained from the external validation set. In the absence of external validation, performance data from the internal validation set were reported. If neither external nor internal validation was performed, the performance of the training set was reported, and the study was excluded from the meta‐analysis.
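The hierarchy used to choose which performance to report can be written as a small decision function. The Python sketch below is a hypothetical encoding of the rule described above; the dictionary keys `external`, `internal`, and `training` are illustrative assumptions, not a data format used by the authors.

```python
def select_reported_performance(study):
    """Apply the external > internal > training hierarchy.

    `study` maps a validation type to its metrics (None or absent if not performed).
    Returns (source, metrics, eligible_for_meta_analysis).
    """
    for source in ("external", "internal"):
        metrics = study.get(source)
        if metrics is not None:
            return source, metrics, True
    # No independent validation: training performance is reported,
    # but the study is excluded from pooling.
    return "training", study.get("training"), False


# Hypothetical studies
with_external = {"training": {"sensitivity": 0.95},
                 "internal": {"sensitivity": 0.82},
                 "external": {"sensitivity": 0.84}}
internal_only = {"training": {"sensitivity": 0.95}, "internal": {"sensitivity": 0.80}}
training_only = {"training": {"sensitivity": 1.0}}

print(select_reported_performance(internal_only))
# ('internal', {'sensitivity': 0.8}, True)
```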

Statistical analyses were conducted using R statistical software (version 4.2.1) with the meta and metaplus packages.

Sensitivity and specificity, along with their corresponding 95% confidence intervals (CIs), were calculated. A random‐effects model was applied to account for heterogeneity arising from variations in clinical settings. Heterogeneity between studies was assessed using Cochran's Q test and the I² index, with p‐values <0.05 considered indicative of significant heterogeneity. Pooled estimates and their 95% CIs were visualized using forest plots.
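For readers less familiar with random‐effects pooling, the Python sketch below shows one common approach: DerSimonian–Laird estimation on logit‐transformed proportions, which also yields Cochran's Q and I². It is an illustration under assumptions, not a reproduction of the authors' analysis: the R meta package supports several estimators and transformations, and the exact settings used in this study are not detailed here. The study counts are hypothetical. A p‐value for Q can be obtained from the chi‐squared distribution with k − 1 degrees of freedom (for example with `scipy.stats.chi2.sf`); it is omitted to keep the sketch dependency‐free.

```python
import math


def logit_and_variance(events, total):
    """Logit of a proportion and its approximate variance (0.5 correction at 0 or 1)."""
    if events == 0 or events == total:
        events, total = events + 0.5, total + 1.0
    p = events / total
    return math.log(p / (1 - p)), 1 / events + 1 / (total - events)


def expit(x):
    """Inverse of the logit transformation."""
    return 1 / (1 + math.exp(-x))


def dersimonian_laird(studies, z=1.96):
    """Random-effects pooling of (events, total) pairs, e.g. (TP, TP + FN) for sensitivity."""
    ys, vs = zip(*(logit_and_variance(e, n) for e, n in studies))
    w = [1 / v for v in vs]  # inverse-variance (fixed-effect) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    df = len(ys) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0  # I² in percent
    w_re = [1 / (v + tau2) for v in vs]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {"pooled": expit(y_re),
            "ci": (expit(y_re - z * se), expit(y_re + z * se)),
            "Q": q, "df": df, "I2": i2, "tau2": tau2}


# Hypothetical (true positives, malignant cases) for four studies
result = dersimonian_laird([(45, 50), (30, 40), (80, 100), (18, 30)])
print(result)
```

Applied separately to (true positives, malignant cases) and (true negatives, benign cases), the same function yields pooled sensitivity and pooled specificity, mirroring the separate sensitivity and specificity analyses reported in the Results.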

2.6. Quality assessment

The overall quality of the studies included in the meta‐analysis was assessed using the Quality Assessment Tool for Artificial Intelligence‐Centered Diagnostic Test Accuracy Studies (QUADAS‐AI) criteria. 11 The details are provided in Table S1. These criteria are based on the revised and extended QUADAS‐2 12 and QUADAS‐C 13 guidelines and include four domains: patient selection, index test, reference standard, and flow and timing, all in relation to the risk of bias. This updated tool evaluates each domain, offering a robust framework for conducting reviews of AI‐centered diagnostic studies.

3. RESULTS

3.1. General characteristics

Twelve studies 14 , 15 , 16 , 17 , 18 , 19 , 20 , 21 , 22 , 23 , 24 , 25 developed ML models that included only radiomics features extracted from ultrasound images (Table 1). Five of them lacked an independent internal or external validation set, 14 , 15 , 16 , 17 , 18 and one lacked the information required for the analysis 19 (i.e., the number of patients or images with benign and malignant histology in the validation set); these six studies were therefore excluded from the meta‐analysis (Figure 1; Table 2).

TABLE 1.

Data source for the 12 studies that developed machine learning models using only radiomic features extracted from ultrasound images.

First author, year | Country | Assessment time | Sample size | Number of images | Type of ML | Family of radiomics features | Performance (AUC) | Sensitivity | Specificity | Accuracy | Type of validation
Acharya U. R., 2014 | NA | NA | 20 | 2600 | KNN | Textural, statistical | NA | 1 | 1 | 1 | Cross‐validation
Pathak H., 2015 | NA | NA | NA | 120 | SVM | Wavelet | NA | 0.92 | 0.92 | 0.92 | Internal
Martinez‐Mas J., 2019 | Belgium | NA | NA | 384 | Extreme Learning Machine with linear‐Sigmoid‐Gaussian kernel | Geometric (Fast Fourier Transform) features | 0.88 | 0.93 | 0.77 | 0.87 | Cross‐validation
Al‐karawi D., 2021 | UK | 2005–2013 | 232 | 150 | SVM | Statistical, fractal, Gabor filter, Uniform Local Binary Pattern, Histograms of Oriented Gradient | NA | 0.90 | 0.90 | 0.90 | Internal
Chiappa V., 2021 | Italy | 2017–2019 | 241 | NA | Ensemble of SVM (TRACE4©) | Morphology, intensity‐based statistics, texture | Solid: 0.87; Cystic: 0.88; Mixed: 0.89 | Solid: 0.78; Cystic: 0.75; Mixed: 0.81 | Solid: 0.83; Cystic: 0.90; Mixed: 0.81 | Solid: 0.80; Cystic: 0.87; Mixed: 0.81 | Cross‐validation
Hussein I. J., 2021 | NA | NA | NA | 125 | SVM | Local Binary Pattern | NA | 0.97 | 0.90 | 0.95 | Cross‐validation
Qi L., 2021 | China, Tianjin | 2013–2016 | 265 | 279 | Multivariate logistic regression | Textural, wavelet | 0.88 | 0.76 | 0.90 | 0.84 | Internal
Yin Q., 2022 | China, Guangzhou | 2008–2019 | 137 | NA | Upgrade Logistic regression | Local Binary Pattern, Laws' Texture Energy | NA | NA | NA | 0.97 | No validation
Barcroft J. F., 2024 | Multicenter (UK; Italy) | 2017–2022 | 761 | 1920 | ODS radiomics model | Fractal, wavelet | 0.90 | 1 | 0.73 | NA | External
Du Y., 2024 | China | 2014–2022 | 849 | NA | LightGBM | Statistical, shape, textural | 0.75 | 0.60 | 0.89 | 0.72 | Internal
Moro F., 2024 | Italy | 2014–2021 | 326 | 775 | Random Forest | Statistical and textural | 0.80 | 0.78 | 0.76 | 0.78 | Internal
Liu L., 2024 | Multicenter (China) | 2021–2023 | 1080 | NA | Not specified | Statistical, shape, textural | 0.87 | 0.84 | 0.80 | 0.81 | External

Note: Summary of the studies included in the paper, listed according to different parameters (first author, publication year, study period, country, population) and AI model characteristics (specific type of ML model assessed, model input, performance, validation type). NA, not specified; AUC, area under the receiver operating characteristic curve; ML, machine learning; LightGBM, Light Gradient Boosting Machine; KNN, k‐Nearest Neighbor; ODS, Ovarian Diagnostic Score; SVM, Support Vector Machine. Performance is reported as AUC when available, otherwise as sensitivity/specificity, and then as accuracy. The indicated performance refers to the external or internal validation set. The studies subsequently included in the meta‐analysis were Pathak 2015, Al‐karawi 2021, Qi 2021, Du 2024, Moro 2024, and Liu 2024.

FIGURE 1.


Flow diagram.

TABLE 2.

Data source for the 6 studies that developed and validated machine learning models using only radiomic features extracted from ultrasound images included in the meta‐analysis.

First author, year | Country | Assessment time | Sample size | Validation sample size | Validation sample, malignant | Validation sample, benign | Number of images | Validation images | Validation images, malignant | Validation images, benign | Type of ML | Family of radiomics features | Performance (AUC) | AUC (CI) | Sensitivity | Sensitivity (CI) | Specificity | Specificity (CI) | Accuracy | Type of validation
Pathak H., 2015 | NA | NA | NA | NA | NA | NA | 120 | 50 | 25 | 25 | SVM | Wavelet | NA | NA | 0.92 | NA | 0.92 | NA | 0.92 | Internal
Al‐karawi D., 2021 | UK | 2005–2013 | 232 | NA | NA | NA | 150 | 76 | 38 | 38 | SVM | Statistical, fractal, Gabor filter, Uniform Local Binary Pattern, Histograms of Oriented Gradient | NA | NA | 0.90 | NA | 0.90 | NA | 0.90 | Internal
Qi L., 2021 | China, Tianjin | 2013–2016 | 265 | NA | NA | NA | 279 | 83 | 53 | 30 | Multivariate logistic regression | Textural, wavelet | 0.88 | 0.798–0.957 | 0.76 | 0.574–0.883 | 0.90 | 0.774–0.963 | 0.84 | Internal
Du Y., 2024 | China | 2014–2022 | 849 | 169 | 60 | 109 | NA | NA | NA | NA | LightGBM | Statistical, shape, textural | 0.75 | NA | 0.60 | NA | 0.89 | NA | 0.72 | Internal
Moro F., 2024 | Italy | 2014–2021 | 326 | 98 | 73 | 25 | 775 | NA | NA | NA | Random Forest | Statistical and textural | 0.80 | 0.70–0.90 | 0.78 | 0.69–0.88 | 0.76 | 0.69–0.88 | 0.78 | Internal
Liu L., 2024 | China | 2021–2023 | 1080 | 397 | 107 | 290 | NA | NA | NA | NA | Not specified | Statistical, shape, textural | 0.87 | 0.827–0.903 | 0.84 | NA | 0.80 | NA | 0.81 | External

Note: Summary of the studies included in the paper listed according to different parameters (first author, publication year, study period, country, study population, validation set population) and AI models characteristics (specific type of ML model being assessed, model input, performance, validation type). NA (Not Specified); AUC (Area Under the receiver operating characteristic Curve); CI (Confidence Interval); ML (Machine Learning); LightGBM (Light Gradient Boosting Machine); SVM (Support Vector Machine). The performance has been reported in “AUC” when available, otherwise in “sensitivity/specificity”, then in “accuracy”. The indicated performance refers to the external or internal validation set.

The results of the quality assessment of the studies included in the meta‐analysis, evaluated using the QUADAS‐AI tool, are reported in Figure 2. All studies had a high risk of bias in the patient selection domain (e.g., the image source was not specified, image preprocessing was not performed, and the scanner model used for image acquisition was not reported), and all but one showed a high risk of bias in the index test domain (i.e., the radiomics model was not validated in an external cohort). The risk of bias was generally low in the reference standard domain (i.e., the reference standard used in most studies accurately classified the target condition), as histology was specified as the reference standard in most studies (4/6), and in the flow and timing domain (i.e., the time interval between the index test and the reference standard was deemed reasonable in most studies).

FIGURE 2.


Risk of bias for each domain in the studies included in the meta‐analysis, assessed using QUADAS‐AI. “High” = high risk of bias; “Low” = low risk of bias; “Uncertain” = insufficient data to judge whether the risk of bias is high or low.

3.2. Individual characteristics of the included studies

All studies but one were single‐center, and half (3/6) were conducted in China. All studies were published between 2015 and 2024, and among studies reporting the number of patients, sample sizes ranged from 232 to 1080.

The summary sensitivity and specificity of radiomics‐based models in discriminating between benign and malignant adnexal masses were 0.80 (95% CI 0.71–0.87) and 0.86 (95% CI 0.80–0.90), respectively (Figure 3A,B).

FIGURE 3.


Forest plots of summary sensitivity (A) and specificity (B) of studies included in the meta‐analysis. CI, confidence interval.

The study by Liu et al. 25 included the largest number of patients (n = 1080) and was the only study with an external validation set (training = 683, validation = 397). However, it was also the only study that used O‐RADS rather than histology as the reference standard, and it did not specify the type of ML model. Specifically, O‐RADS risk categories 1–3 were classified as benign (low risk of malignancy) and categories 4–5 as malignant (intermediate/high risk of malignancy). The model showed good performance in the external validation set (AUC 0.87, sensitivity 0.84, and specificity 0.80), similar to that of O‐RADS assessed by junior examiners (AUC 0.88, sensitivity 0.84, specificity 0.92).
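Because Liu et al. used a dichotomized imaging classification rather than histology as the reference standard, the reported sensitivity and specificity measure agreement with O‐RADS. The hypothetical Python sketch below makes this explicit: it maps O‐RADS categories to the binary reference described above (1–3 benign, 4–5 malignant) and computes the sensitivity and specificity of model predictions against it. The function names and the example data are illustrative only; O‐RADS category 0 (incomplete evaluation) is deliberately rejected, since its handling is not described.

```python
def orads_to_label(category):
    """Binary reference as described for Liu et al.: O-RADS 1-3 -> benign (0), 4-5 -> malignant (1)."""
    if category in (1, 2, 3):
        return 0
    if category in (4, 5):
        return 1
    # Category 0 (incomplete evaluation) has no benign/malignant meaning here
    raise ValueError(f"O-RADS category {category!r} cannot be dichotomized")


def sensitivity_specificity(reference, predicted):
    """Sensitivity and specificity of binary predictions against a binary reference."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both reference classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical O-RADS categories and model predictions (1 = predicted malignant)
orads = [1, 2, 3, 3, 4, 5, 5, 2]
model = [0, 0, 1, 0, 1, 1, 0, 0]
reference = [orads_to_label(c) for c in orads]
sens, spec = sensitivity_specificity(reference, model)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # sensitivity=0.67, specificity=0.80
```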

Du et al., 23 including 849 patients (training = 680, validation = 169), developed and internally validated a LightGBM radiomics model that showed AUC of 0.75, sensitivity of 0.60, and specificity of 0.89 in the internal validation set.

Moro et al., 24 in a population of 326 patients (training = 228, validation = 98), developed and internally validated a random forest model with an AUC of 0.80, sensitivity of 0.78, and specificity of 0.76 in the validation set. The performance of the model was lower than that of subjective assessment by an expert (sensitivity 0.99, specificity 0.72) and of IOTA‐ADNEX (AUC 0.88, sensitivity 0.99, and specificity 0.64). In this study, the authors included only adnexal masses with a solid ultrasound morphology.

Qi et al. 22 developed and internally tested a multivariate logistic regression radiomics model on a total of 279 images (training = 196, validation = 83). In the validation set, the model outperformed the subjective assessment of both junior and senior sonographers in terms of AUC (0.88 vs. 0.70 and 0.79); its sensitivity and specificity were higher than those of the junior sonographer (0.76 vs. 0.57 and 0.90 vs. 0.80) and similar to those of the senior sonographer (0.76 vs. 0.70 and 0.90 vs. 0.86).

Al‐karawi et al. 21 collected 150 images of ovarian masses (training = 74, validation = 76). Their Support Vector Machine model, which fused the five best‐performing textural features, showed high performance in distinguishing between benign and malignant adnexal masses, with sensitivity 0.90, specificity 0.90, and accuracy 0.90. Similar results were obtained by Pathak et al., 20 who developed and internally tested a Support Vector Machine radiomics model on a total of 120 images (training = 70, validation = 50). Their model showed sensitivity 0.92, specificity 0.92, and accuracy 0.92 in classifying adnexal masses using the six most relevant of the 14 extracted features.
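Both studies reduced the feature set before classification, but the ranking criterion they used is not described here. As a hypothetical illustration, the Python sketch below ranks features by a univariate Fisher score (squared difference of the class means divided by the sum of the class variances) and keeps the top k; the score, the function names, and the data are assumptions, and the subsequent Support Vector Machine is not shown. In practice such a ranking should be computed on the training set only, since selecting features on the full dataset leaks information from the validation set and inflates the apparent performance.

```python
import math
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Squared difference of class means over the sum of class variances (0 = benign, 1 = malignant)."""
    benign = [v for v, y in zip(values, labels) if y == 0]
    malignant = [v for v, y in zip(values, labels) if y == 1]
    separation = (mean(malignant) - mean(benign)) ** 2
    spread = pvariance(benign) + pvariance(malignant)
    if spread == 0:
        # Zero within-class spread: perfectly separating unless the means coincide
        return math.inf if separation > 0 else 0.0
    return separation / spread


def top_k_features(features, labels, k):
    """Return the names of the k highest-scoring features (features: name -> values per lesion)."""
    scores = {name: fisher_score(vals, labels) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical training-set features for six lesions
labels = [0, 0, 0, 1, 1, 1]
features = {
    "glcm_contrast": [0.5, 0.7, 0.6, 0.6, 0.8, 0.7],   # weakly separating
    "entropy": [1.0, 1.1, 0.9, 2.0, 2.1, 1.9],         # strongly separating
    "mean_intensity": [3.0, 1.0, 2.0, 2.0, 3.0, 1.0],  # not separating
}
print(top_k_features(features, labels, k=2))  # ['entropy', 'glcm_contrast']
```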

4. DISCUSSION

In the present study, we summarized the results of articles that developed and validated ML models using only radiomics features extracted from ultrasound images to distinguish between benign and malignant adnexal masses. We found that the summary accuracy of the models in the validation set was good and similar to that of an expert examiner. Most studies were at high risk of bias in the patient selection and index test domains because of small sample sizes, lack of external validation, and single‐center settings.

To the best of our knowledge, this is the first meta‐analysis specifically focused on the application of radiomics to ultrasound imaging in the context of adnexal masses. We conducted a literature search across multiple databases to ensure the robustness of the study. Data such as sample size, number of images, type of ML model developed, and the families of radiomic features included were reported. Additionally, the quality of the studies was assessed using the QUADAS‐AI tool, which has been specifically adapted for AI research, representing a key strength of this study.

Finally, we included only models constructed with radiomic features, excluding mixed models developed with both radiomic and other variables (e.g., ultrasound or clinical features), because we aimed to assess the performance of radiomic analysis alone in characterizing adnexal masses. This choice is a strength but also a limitation: it prevented us from verifying whether the models would perform better if other data were added.

Currently, various AI approaches are employed in ultrasound imaging to enhance diagnostic accuracy, with primary applications in classification and automatic segmentation. In particular, ML techniques play a crucial role in radiomics, where quantitative features extracted from ultrasound images are analyzed to classify lesions, predict malignancy, and support clinical decision‐making by detecting patterns that are not discernible to the human eye. Supervised learning models, such as Support Vector Machines, Logistic Regression, and Random Forests, are trained on labeled datasets to predict outcomes from input features and are widely used in radiomic analysis. Support Vector Machines identify optimal decision boundaries between classes, Logistic Regression estimates the probability of specific outcomes, and Random Forests aggregate predictions from multiple decision trees to enhance classification robustness. Additional algorithms include k‐Nearest Neighbors (kNN), which classifies a case by comparing it with the most similar labeled cases, and gradient boosting methods such as Extreme Gradient Boosting (XGBoost) and LightGBM, which iteratively add models to minimize prediction errors. In contrast, unsupervised learning models are less commonly used to predict outcomes such as the distinction between benign and malignant ovarian masses. Instead, they are primarily employed to uncover inherent patterns in unlabeled data, typically through clustering techniques, revealing hidden structures or groupings without predefined labels. For instance, fuzzy c‐means clustering assigns each pixel a degree of membership to multiple clusters based on its features, an approach particularly useful for assessing echogenicity and segmenting regions of interest. These unsupervised approaches deserve further exploration to clarify their potential role in radiomic feature selection and in the diagnosis of ovarian masses.
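To illustrate the soft assignment that distinguishes fuzzy c‐means from hard clustering, the Python sketch below implements the standard fuzzy c‐means updates on scalar values such as pixel intensities. It is a simplified, hypothetical example: real applications would use richer per‐pixel features and often spatial constraints, and the fuzzifier m = 2 is a common default rather than a value taken from the studies discussed here.

```python
import random


def fuzzy_c_means(values, n_clusters=2, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Fuzzy c-means on scalar features such as pixel intensities.

    Returns (centers, memberships); memberships[i][k] is the degree to which
    value i belongs to cluster k, and each row sums to 1.
    """
    rng = random.Random(seed)
    # Random initial memberships, normalized so each value's row sums to 1
    u = []
    for _ in values:
        row = [rng.random() for _ in range(n_clusters)]
        total = sum(row)
        u.append([r / total for r in row])
    centers = [0.0] * n_clusters
    exponent = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        # Center update: mean of the values weighted by membership ** m
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(len(values))]
            centers[k] = sum(wi * x for wi, x in zip(w, values)) / sum(w)
        # Membership update from the relative distances to every center
        max_change = 0.0
        for i, x in enumerate(values):
            d = [abs(x - c) for c in centers]
            if min(d) == 0.0:
                # Value sits exactly on a center: share membership among such centers
                hits = [1.0 if dk == 0.0 else 0.0 for dk in d]
                new_row = [h / sum(hits) for h in hits]
            else:
                new_row = [1.0 / sum((d[k] / d[j]) ** exponent for j in range(n_clusters))
                           for k in range(n_clusters)]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if max_change < tol:
            break
    return centers, u


# Hypothetical intensities: four hypoechoic and four hyperechoic pixels
values = [10, 12, 11, 13, 200, 205, 198, 202]
centers, memberships = fuzzy_c_means(values)
print(sorted(round(c, 1) for c in centers))
```

Unlike hard k‐means, each pixel keeps a graded membership in every cluster, so pixels with intermediate intensities would receive intermediate memberships instead of being forced into one group.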

AI also includes deep learning (DL) algorithms, a subset of ML that uses multi‐layer neural networks to model complex data and automatically learn hierarchical features directly from raw inputs. In the context of ovarian masses, DL models are primarily applied for two purposes: (1) automated segmentation of regions of interest and (2) image‐based classification. Within the radiomics pipeline, the first application is the relevant one, since DL‐based segmentation can delineate the region of interest from which features are then extracted.

The role of radiomics in the field of adnexal masses has also been studied and summarized for other imaging modalities, including computed tomography (CT) and magnetic resonance imaging (MRI). 26 In particular, the systematic review by Huang et al. included 57 articles on radiomics applied to CT and MRI in ovarian cancer, 16 of which aimed to classify adnexal masses. However, the outcomes assessed varied across the included studies and comprised discrimination between epithelial ovarian and metastatic tumors, between borderline and invasive tumors, between benign and borderline tumors, and between malignant and benign adnexal masses. Of the nine studies reported by Huang et al. that aimed to discriminate between benign and malignant masses 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 (Table S2), seven had an independent internal or external validation set, and the performance of their models was similar to that found for the ultrasound‐based models, with sensitivity ranging from 0.60 to 1.0 and specificity from 0.60 to 1.0. The largest of these studies, including 1329 patients, developed and internally validated a radiomics‐based model applied to CT imaging, reporting a sensitivity of 0.84 and a specificity of 0.82 in the validation set. 31

As with ultrasound, studies applying radiomics to CT and MRI for the assessment of adnexal tumors are few, mostly include small sample sizes, and lack external validation. All these factors limit the generalizability of the results and therefore the application of radiomics analysis in clinical practice. Indeed, to date, the IOTA‐ADNEX model remains the tool of choice for the preoperative diagnosis of adnexal masses. 5 However, we believe that increasing the sample size and adding other variables to the radiomics analysis could improve the performance of radiomics in classifying adnexal masses. We also suggest that authors developing radiomics models for ultrasound images compare them with ADNEX, to assess the true clinical impact and suitability of the radiomics model in discriminating between benign and malignant adnexal masses. In addition, any radiomics model that achieves performance similar to ADNEX should be validated in an external cohort to ensure its reproducibility. If radiomics models prove similar or superior to ADNEX, radiomics and ML models could in the future be integrated into ultrasound machines to assist clinicians in the automated, consistent, and immediate evaluation of adnexal masses using the information provided by a single ultrasound image. Incorporating radiomics systems into clinical practice could improve diagnosis and patient management, reduce healthcare costs, and help reduce the workload of gynecologists by increasing their efficiency and accuracy. Other prospects include models that predict specific histology, to personalize surgical treatment, and models that predict patients' response to treatment.

5. CONCLUSION

Our meta‐analysis demonstrated the suitability of radiomics analysis applied to ultrasound images for discriminating between benign and malignant adnexal masses, with an accuracy that appears similar to that of an expert examiner. However, the comparison with the reference IOTA‐ADNEX model has not been sufficiently investigated. Moreover, the available studies are few, lack external validation, and include small sample sizes.

AUTHOR CONTRIBUTIONS

Francesca Moro: Conceptualization; data curation; investigation; project administration; supervision; visualization; writing – review and editing. Marianna Ciancia: Conceptualization; data curation; investigation; project administration; supervision; visualization; writing – review and editing. Maria Sciuto: Data curation; investigation; writing – original draft. Giulia Baldassari: Data curation; formal analysis; investigation; visualization; writing – original draft. Huong Elena Tran: Data curation; formal analysis; investigation; visualization; supervision. Antonella Carcagnì: Methodology. Anna Fagotti: Supervision. Antonia Carla Testa: Supervision; writing – review and editing.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflict of interest.

Supporting information

Appendix S1. Search strategy.

Table S1. Description of quality assessment based on QUADAS‐AI domains.

Table S2. Data source for the 9 studies that developed machine learning models using only radiomic features extracted from Computed Tomography (CT) or Magnetic Resonance Imaging (MRI) images, reported in the study by Huang et al.

AOGS-104-1433-s001.docx (24.2KB, docx)

ACKNOWLEDGMENTS

We would like to express our profound gratitude to Prof. Giovanni Scambia for his teaching and inspiration. Open access funding provided by BIBLIOSAN.

Moro F, Ciancia M, Sciuto M, et al. Performance of radiomics analysis in ultrasound imaging for differentiating benign from malignant adnexal masses: A systematic review and meta‐analysis. Acta Obstet Gynecol Scand. 2025;104:1433‐1442. doi: 10.1111/aogs.15146

Francesca Moro and Marianna Ciancia contributed equally to the work.

REFERENCES

1. American College of Obstetricians and Gynecologists. Practice bulletin No. 174: evaluation and management of adnexal masses. Obstet Gynecol. 2016;128:e210‐e226.
2. Timmerman D, Planchamp F, Bourne T, et al. ESGO/ISUOG/IOTA/ESGE consensus statement on preoperative diagnosis of ovarian tumors. Ultrasound Obstet Gynecol. 2021;58:148‐168.
3. Sokalska A, Timmerman D, Testa AC, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol. 2009;34:462‐470.
4. Van Calster B, Van Hoorde K, Valentin L, et al. Evaluating the risk of ovarian cancer before surgery using the ADNEX model to differentiate between benign, borderline, early and advanced stage invasive, and secondary metastatic tumours: prospective multicentre diagnostic study. BMJ. 2014;349:g5920.
5. Andreotti RF, Timmerman D, Strachowski LM, et al. O‐RADS US risk stratification and management system: a consensus guideline from the ACR ovarian‐adnexal reporting and data system committee. Radiology. 2020;294:168‐185.
6. Moro F, Ciancia M, Zace D, et al. Role of artificial intelligence applied to ultrasound in gynecology oncology: a systematic review. Int J Cancer. 2024;155:1832‐1845.
7. Moro F, Giudice MT, Ciancia M, et al. ABST‐0726 artificial intelligence applied to ultrasound imaging for benign gynecologic disorders: a systematic review. European Society for Gynaecological Endoscopy (ESGE) 33rd Annual Congress; 2024.
8. Whitney HM, Yoeli‐Bik R, Abramowicz JS, et al. AI‐based automated segmentation for ovarian/adnexal masses and their internal components on ultrasound imaging. J Med Imaging. 2024;11:044505.
9. Mikdadi D, O'Connell KA, Meacham PJ, et al. Applications of artificial intelligence (AI) in ovarian cancer, pancreatic cancer, and image biomarker discovery. Cancer Biomark. 2022;33(2):173‐184. doi: 10.3233/CBM-210301
10. Lambin P, Leijenaar RTH, Deist TM, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol. 2017;14:749‐762.
11. Sounderajah V, Ashrafian H, Rose S, et al. A quality assessment tool for artificial intelligence‐centered diagnostic test accuracy studies: QUADAS‐AI. Nat Med. 2021;27:1663‐1665.
12. Whiting PF, Rutjes AWS, Westwood ME, et al. QUADAS‐2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155:529‐536.
13. Yang B, Mallett S, Takwoingi Y, et al. QUADAS‐C: a tool for assessing risk of bias in comparative diagnostic accuracy studies. Ann Intern Med. 2021;174:1592‐1599.
14. Acharya UR, Sree VS, Kulshreshtha S, et al. GyneScan: an improved online paradigm for screening of ovarian cancer via tissue characterization. Technol Cancer Res Treat. 2014;13:529‐540.
15. Martínez‐Más J, Bueno‐Crespo A, Khazendar S, et al. Evaluation of machine learning methods with Fourier transform features for classifying ovarian tumors based on ultrasound images. PLoS One. 2019;14:e0219388.
16. Chiappa V, Bogani G, Interlenghi M, et al. The adoption of Radiomics and machine learning improves the diagnostic processes of women with ovarian MAsses (the AROMA pilot study). J Ultrasound. 2021;24:429‐437.
17. Hussein IJ, Burhanuddin MA, Mohammed MA, et al. Fully automatic segmentation of gynaecological abnormality using a new viola–jones model. Comput Mater Continua. 2021;66(3):3161‐3182. doi: 10.32604/cmc.2021.012691
18. Yin Q, Zhong M, Wang Z, Sheng XJ. Clinical analysis of 137 cases of ovarian tumors in pregnancy. J Oncol. 2022;2022:1907322.
19. Barcroft JF, Linton‐Reid K, Landolfo C, et al. Machine learning and radiomics for segmentation and classification of adnexal masses on ultrasound. NPJ Precis Oncol. 2024;8:41.
20. Pathak H, Kulkarni V. Identification of ovarian mass through ultrasound images using machine learning techniques. 2015 IEEE International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN). IEEE; 2015:137‐140.
21. Al‐karawi D, Al‐Assam H, Du H, et al. An evaluation of the effectiveness of image‐based texture features extracted from static B‐mode ultrasound images in distinguishing between benign and malignant ovarian masses. Ultrason Imaging. 2021;43:124‐138.
22. Qi L, Chen D, Li C, et al. Diagnosis of ovarian neoplasms using nomogram in combination with ultrasound image‐based Radiomics signature and clinical factors. Front Genet. 2021;12:753948. doi: 10.3389/fgene.2021.753948
23. Du Y, Guo W, Xiao Y, Chen H, Yao J, Wu J. Ultrasound‐based deep learning radiomics model for differentiating benign, borderline, and malignant ovarian tumours: a multi‐class classification exploratory study. BMC Med Imaging. 2024;24:89.
24. Moro F, Vagni M, Tran HE, et al. Radiomics analysis of ultrasound images to discriminate between benign and malignant adnexal masses with solid ultrasound morphology. Ultrasound Obstet Gynecol. 2025;65:353‐363.
25. Liu L, Cai W, Tian H, et al. Ultrasound image‐based nomogram combining clinical, radiomics, and deep transfer learning features for automatic classification of ovarian masses according to O‐RADS. Front Oncol. 2024;14:1377489.
26. Huang ML, Ren J, Jin ZY, et al. A systematic review and meta‐analysis of CT and MRI radiomics in ovarian cancer: methodological issues and clinical utility. Insights Imaging. 2023;14:117.
27. Wei M, Zhang Y, Bai G, et al. T2‐weighted MRI‐based radiomics for discriminating between benign and borderline epithelial ovarian tumors: a multicenter study. Insights Imaging. 2022;13:130.
28. Nagawa K, Kishigami T, Yokoyama F, et al. Diagnostic utility of a conventional MRI‐based analysis and texture analysis for discriminating between ovarian thecoma‐fibroma groups and ovarian granulosa cell tumors. J Ovarian Res. 2022;15:65.
29. Liu P, Liang X, Liao S, Lu Z. Pattern classification for ovarian tumors by integration of Radiomics and deep learning features. Curr Med Imaging Rev. 2022;18:1486‐1502.
30. Li S, Liu J, Xiong Y, et al. Application values of 2D and 3D Radiomics models based on CT plain scan in differentiating benign from malignant ovarian tumors. Biomed Res Int. 2022;2022:1‐11.
31. Li J, Zhang T, Ma J, Zhang N, Zhang Z, Ye Z. Machine‐learning‐based contrast‐enhanced computed tomography radiomic analysis for categorization of ovarian tumors. Front Oncol. 2022;12:934735.
32. Song XL, Ren JL, Zhao D, Wang L, Ren H, Niu J. Radiomics derived from dynamic contrast‐enhanced MRI pharmacokinetic protocol features: the value of precision diagnosis ovarian neoplasms. Eur Radiol. 2021;31:368‐378.
33. Li S, Liu J, Xiong Y, et al. A radiomics approach for automated diagnosis of ovarian neoplasm malignancy in computed tomography. Sci Rep. 2021;11:8730.
34. Li NY, Shi B, Chen YL, et al. The value of MRI findings combined with texture analysis in the differential diagnosis of primary ovarian granulosa cell tumors and ovarian Thecoma‐Fibrothecoma. Front Oncol. 2021;11:758036. doi: 10.3389/fonc.2021.758036
35. Zhang H, Mao Y, Chen X, et al. Magnetic resonance imaging radiomics in categorizing ovarian masses and predicting clinical outcome: a preliminary study. Eur Radiol. 2019;29:3358‐3371.

Articles from Acta Obstetricia et Gynecologica Scandinavica are provided here courtesy of Nordic Federation of Societies of Obstetrics and Gynecology (NFOG) and John Wiley & Sons Ltd