2017 Jun 1;49(6):784–792. doi: 10.1002/uog.17225

Estimating risk of malignancy in adnexal masses: external validation of the ADNEX model and comparison with other frequently used ultrasound methods

E M J Meys 1,2, L S Jeelof 1, N M J Achten 1, B F M Slangen 1,2, S Lambrechts 1,2, R F P M Kruitwagen 1,2, T Van Gorp 1,2
PMCID: PMC5488216  PMID: 27514486

ABSTRACT

Objectives

To validate externally the performance of the Assessment of Different NEoplasias in the adneXa (ADNEX) model and compare this model with other frequently used models in the differentiation between benign and malignant adnexal masses.

Methods

In this retrospective diagnostic accuracy study, we assessed data collected prospectively from patients with adnexal pathology who underwent real‐time transvaginal or transrectal ultrasound by a single expert ultrasonographer in a tertiary care hospital between July 2011 and July 2015. The presence of a malignancy was determined by subjective assessment and use of four prediction models: the ADNEX model, simple ultrasound‐based rules (simple rules), Logistic Regression model 2 (LR2) and the Risk of Malignancy Index (RMI), of which three different variants were assessed. Pathology was the clinical reference standard.

Results

In total, 851 consecutive patients underwent ultrasound examination for an adnexal mass. For 326 patients (128 premenopausal and 198 postmenopausal), pathology results were available (211 (64.7%) benign; 115 (35.3%) malignant) and these were included in the analysis. The area under the receiver–operating characteristics curve (AUC) of the ADNEX model for the discrimination between benign and malignant tumors was 0.93 (95% CI, 0.89–0.95). AUCs for the subtypes of malignancy (i.e. borderline, Stage I–IV and metastatic adnexal tumors) ranged between 0.60 and 0.90. Only subjective assessment (AUC, 0.96 (95% CI, 0.93–0.98)) was superior to the ADNEX model (P = 0.01) in differentiating malignant from benign tumors. AUCs for the other models were 0.92 (95% CI, 0.89–0.95) for LR2, 0.85 (95% CI, 0.81–0.89) for RMI‐I, 0.82 (95% CI, 0.77–0.86) for RMI‐II and 0.84 (95% CI, 0.80–0.88) for RMI‐III. At the proposed cut‐off of ≥ 10%, the ADNEX model had the highest sensitivity (0.98 (95% CI, 0.93–1.00)) but the lowest specificity (0.62 (95% CI, 0.55–0.68)) compared with the other models. Both subjective assessment (sensitivity, 0.90 (95% CI, 0.83–0.95); specificity 0.91 (95% CI, 0.86–0.94)) and the simple rules model with inconclusive cases classified by subjective assessment (sensitivity, 0.89 (95% CI, 0.81–0.94); specificity, 0.90 (95% CI, 0.85–0.94)) had lower sensitivity, but their sensitivity and specificity were better balanced.

Conclusions

Although the test performance of subjective assessment by an expert remains superior, the ADNEX model can help in the differentiation between benign and malignant ovarian tumors. The advantage of the ADNEX model as a polytomous model remains to be shown. © 2016 The Authors. Ultrasound in Obstetrics & Gynecology published by John Wiley & Sons Ltd on behalf of the International Society of Ultrasound in Obstetrics and Gynecology.

Keywords: ADNEX model, diagnostic test accuracy, LR2, ovarian carcinoma, RMI, simple rules, subjective assessment

INTRODUCTION

Adnexal masses are common. Although only a minority are malignant, their management, and therefore both patient morbidity and mortality, depends on their correct preoperative differentiation. For benign masses, conservative management or laparoscopic and fertility‐sparing surgery is preferred. Laparoscopy is associated with reduced morbidity and lower cost when compared with laparotomy1. In the case of malignancy, however, more extensive surgery is necessary, preferably performed in an oncology center. This is essential to optimize care and thereby survival of the patient2, 3. Ultrasound examination, more specifically subjective assessment by an expert examiner, is considered the best way to differentiate malignant from benign adnexal masses prior to surgery4. However, an expert examiner is not always available.

Various ultrasound‐based prediction models and scoring systems have been developed to support the diagnosis of adnexal masses by less experienced examiners. The Risk of Malignancy Index (RMI) is one such scoring system5, 6, 7, and is currently recommended by many national guidelines. However, performance of this model is poor4, 8. Other models, with better test accuracy, include the International Ovarian Tumour Analysis (IOTA) simple ultrasound‐based rules (‘simple rules’)9 and IOTA Logistic Regression model 2 (LR2)10. Another model with excellent test performance was developed recently: the Assessment of Different NEoplasias in the adneXa (ADNEX) model11. This model predicts not only whether a mass is malignant, but also, to a certain extent, the type of malignancy. Insight into the specific tumor type makes it possible to optimize treatment, which may reduce morbidity and enhance the chances of survival3. For example, the distinction between malignancy and borderline malignancy is relevant for the treatment of premenopausal women in the context of fertility preservation.

The aim of this study was to validate externally the performance of the ADNEX model, and compare it with that of other frequently used models, in the differentiation between benign and malignant adnexal masses.

METHODS

Study design and setting

This was a retrospective, single‐center, diagnostic accuracy study, conducted at a tertiary care hospital using data collected prospectively between July 2011 and July 2015. A single ultrasonographer (T.V.G.) with more than 10 years' experience in gynecological ultrasound (Level‐3 examiner) assessed all consecutively recruited patients with adnexal pathology12. All women underwent transvaginal or transrectal grayscale and color Doppler ultrasound examination, using a Voluson E8 (GE Healthcare Ultrasound, Milwaukee, WI, USA) ultrasound machine. If the mass was too large to be seen entirely by transvaginal ultrasound, or if malignancy was suspected, transabdominal ultrasound was also performed. The operator assessed the sonographic tumor morphology based on the nomenclature of the IOTA Group13, recording the ultrasound findings in a secure electronic data‐collection system (Astraia version 1.23.6, Astraia Software GmbH, Munich, Germany) together with demographic data, tumor markers and tumor diagnosis based on subjective assessment. The complete list of prospectively collected ultrasound features is shown in Table S1. Along with the subjective classification as benign or malignant, the ultrasound examiner noted his level of confidence (certain, probable or uncertain). All assessments were done prior to obtaining the histological diagnosis.

Patients were excluded when no pathology result was obtained, when the pathology result was known before the ultrasound examination (from transabdominal biopsy in the case of metastasis), when pathology was obtained > 120 days after the ultrasound examination and when a patient had previously undergone a bilateral oophorectomy. Patients with a previous hysterectomy who were 50 years of age or older and patients with amenorrhea of more than 1 year were defined as postmenopausal.

Pathology was the clinical reference standard used for all patients in this study. Results were obtained by either surgery or biopsy of a metastasis and added to the database. The pathologist was unaware of the results of the ultrasound examination. Tumors were classified according to the World Health Organization International Classification of Ovarian Tumors14. Tumor stage was defined according to the International Federation of Gynecology and Obstetrics (FIGO) 2012 classification15.

The study was approved by the Medical Research Ethics Committee of the Maastricht University Medical Center in The Netherlands. According to Dutch law, this study was not subject to formal ethics committee assessment and therefore no informed consent of patients was required. STARD guidelines16 were followed for the conduct, analysis and reporting of the study.

Prediction models

Risk of malignancy was determined by four prediction models and subjective assessment by the expert ultrasonographer.

The ADNEX model11 includes nine variables: age (years), serum CA 125 level (U/mL), type of center (oncology center/other hospital), maximum diameter of the lesion (mm), proportion of solid tissue (%), number of papillary projections (0/1/2/3/> 3), more than 10 cyst locules (yes/no), acoustic shadow (yes/no) and ascites (yes/no). The formula for the risk calculation can be found in the original article11; for use in clinical practice, an application is available (http://www.iotagroup.org/adnexmodel). The outcome of this model is an absolute risk estimate (expressed as a percentage) for five different types of adnexal pathology: benign, borderline, Stage‐I invasive, Stage‐II–IV invasive and secondary metastatic. Furthermore, a risk estimate for the overall risk of malignancy is given (which is the sum of the estimates for all subtypes of malignancy). A cut‐off of ≥ 10% for the overall risk of malignancy was used to predict malignancy11, 17.
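The relationship between the subtype estimates and the overall risk can be sketched as follows. This is only an illustration of the summation and the ≥ 10% cut‐off described above; it does not implement the ADNEX risk formula itself, and the function names, dictionary keys and example values are hypothetical.

```python
# Illustrative sketch only: sums subtype risks that are assumed to have been
# produced already by the ADNEX model. Keys and example values are hypothetical.
MALIGNANT_SUBTYPES = ("borderline", "stage_I", "stage_II_IV", "metastatic")


def overall_malignancy_risk(subtype_risks):
    """Return the overall risk of malignancy as the sum of the malignant subtype risks."""
    return sum(subtype_risks[name] for name in MALIGNANT_SUBTYPES)


def predicts_malignancy(subtype_risks, cutoff=0.10):
    """Apply the >= 10% cut-off to the overall risk of malignancy."""
    return overall_malignancy_risk(subtype_risks) >= cutoff


# Hypothetical risk estimates (fractions that add up to 1 across the five outcomes).
example = {
    "benign": 0.85,
    "borderline": 0.05,
    "stage_I": 0.04,
    "stage_II_IV": 0.04,
    "metastatic": 0.02,
}
print(round(overall_malignancy_risk(example), 2), predicts_malignancy(example))
```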

The IOTA simple rules model (Table 1)9 includes five ultrasound features suggestive of benignity (B‐features) and five features suggestive of malignancy (M‐features). If one or more B‐features are present in the absence of M‐features, the mass is classified as benign, and vice versa. If both B‐ and M‐features exist or if none of the 10 features is present, the simple rules yield an inconclusive result. Two different approaches were used for these difficult‐to‐diagnose masses: use of subjective assessment by the expert ultrasonographer as a second‐stage test, and classification of all inconclusive masses as malignant.

Table 1.

International Ovarian Tumour Analysis (IOTA) simple ultrasound‐based rules9 for prediction of malignancy in adnexal mass, divided into five benign (B)‐features and five malignant (M)‐features

| B‐feature | Description | M‐feature | Description |
|---|---|---|---|
| B1 | Unilocular tumor | M1 | Irregular solid tumor |
| B2 | Solid component with largest diameter < 7 mm | M2 | Presence of ascites |
| B3 | Presence of acoustic shadow | M3 | ≥ 4 papillary projections |
| B4 | Smooth multilocular tumor with largest diameter < 100 mm | M4 | Irregular multilocular solid tumor with largest diameter ≥ 100 mm |
| B5 | No blood flow (color score 1) | M5 | Very strong blood flow (color score 4) |
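The decision logic of the simple rules, including the two ways of handling inconclusive results used in this study, can be sketched as below. The function names, the string labels and the `strategy` argument are hypothetical choices made for illustration; the presence of each B‐ and M‐feature is assumed to have been scored by the examiner beforehand.

```python
def simple_rules(b_features, m_features):
    """Classify a mass from five B-feature flags and five M-feature flags (booleans)."""
    has_b = any(b_features)
    has_m = any(m_features)
    if has_b and not has_m:
        return "benign"
    if has_m and not has_b:
        return "malignant"
    # Both kinds of feature present, or none of the 10 features present.
    return "inconclusive"


def resolve_inconclusive(result, strategy, expert_opinion=None):
    """Apply one of the two second-stage approaches to an inconclusive result."""
    if result != "inconclusive":
        return result
    if strategy == "malignant":
        return "malignant"
    if strategy == "expert":
        if expert_opinion is None:
            raise ValueError("expert_opinion is required for the 'expert' strategy")
        return expert_opinion
    raise ValueError("unknown strategy: " + repr(strategy))


# Hypothetical mass with B1 (unilocular) and M2 (ascites): the rules conflict.
b = [True, False, False, False, False]
m = [False, True, False, False, False]
first_stage = simple_rules(b, m)
print(first_stage, resolve_inconclusive(first_stage, "malignant"))
```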

The LR2 model10 uses six variables to estimate the probability of malignancy: (a) age (years); (b) presence of ascites (yes = 1, no = 0); (c) presence of blood flow within a papillary projection (yes = 1, no = 0); (d) maximum diameter of the solid component (mm; capped at 50 mm); (e) irregular internal cyst walls (yes = 1, no = 0); and (f) presence of acoustic shadow (yes = 1, no = 0). The estimated probability of malignancy for an adnexal tumor is calculated by LR2 as: 1/(1 + exp(−z)), where z = −5.3718 + 0.0354a + 1.6159b + 1.1768c + 0.0697d + 0.9586e − 2.9486f. A cut‐off of ≥ 0.1 (≥ 10%) was used to predict malignancy.
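As a worked illustration, the LR2 calculation can be written in a few lines. The sketch uses the coefficients given above and assumes the standard logistic form 1/(1 + exp(−z)); the function name, argument names and the example patients are hypothetical.

```python
import math


def lr2_probability(age, ascites, papillary_flow, solid_diameter_mm,
                    irregular_walls, acoustic_shadow):
    """Estimated probability of malignancy; binary inputs are coded 1 (yes) or 0 (no)."""
    d = min(solid_diameter_mm, 50)  # solid component diameter is capped at 50 mm
    z = (-5.3718
         + 0.0354 * age
         + 1.6159 * ascites
         + 1.1768 * papillary_flow
         + 0.0697 * d
         + 0.9586 * irregular_walls
         - 2.9486 * acoustic_shadow)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical example patients (not taken from the study data).
low_risk = lr2_probability(50, 0, 0, 0, 0, 0)
high_risk = lr2_probability(65, 1, 0, 60, 1, 0)
for p in (low_risk, high_risk):
    print(round(p, 3), "malignant" if p >= 0.1 else "benign")
```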

The RMI scoring system5, 6, 7 combines the ultrasound features of the mass (U), the menopausal status of the patient (M) and the serum CA 125 level (U/mL) into a risk score (U × M × serum CA 125). The ultrasound features are multilocularity, solid areas, bilaterality, ascites and intra‐abdominal metastases. Three principal variants of the RMI were applied (RMI‐I, RMI‐II and RMI‐III), which differed according to the points attributed to the different ultrasound variables and the menopausal status of the patient (Table 2). A total score of ≥ 200 was used as a cut‐off for malignancy.

Table 2.

Characteristics of three variants of Risk of Malignancy Index (RMI) scoring system for prediction of malignancy in adnexal mass5, 6, 7

| RMI variant | Ultrasound score (U): characteristic | Score | Menopausal status (M): characteristic | Score |
|---|---|---|---|---|
| RMI‐I | No features present | 0 | Premenopausal | 1 |
| RMI‐I | 1 feature present | 1 | Postmenopausal | 3 |
| RMI‐I | ≥ 2 features present | 3 | | |
| RMI‐II | ≤ 1 feature present | 1 | Premenopausal | 1 |
| RMI‐II | ≥ 2 features present | 4 | Postmenopausal | 4 |
| RMI‐III | ≤ 1 feature present | 1 | Premenopausal | 1 |
| RMI‐III | ≥ 2 features present | 3 | Postmenopausal | 3 |

Ultrasound score (U) includes five features: multilocular cyst, solid areas, bilateral cysts, ascites and intra‐abdominal metastases.

Total scores for U and M are inserted into the following formula to calculate RMI: U × M × serum CA 125.
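The scoring in Table 2 and the formula U × M × serum CA 125 can be sketched as follows. The function names and the example values are hypothetical; the number of ultrasound features is assumed to have been counted by the examiner.

```python
def ultrasound_score(n_features, variant):
    """Ultrasound score U for the number of features present (0 to 5)."""
    if variant == "I":
        return 0 if n_features == 0 else (1 if n_features == 1 else 3)
    if variant == "II":
        return 1 if n_features <= 1 else 4
    if variant == "III":
        return 1 if n_features <= 1 else 3
    raise ValueError("variant must be 'I', 'II' or 'III'")


def menopausal_score(postmenopausal, variant):
    """Menopausal score M: 1 if premenopausal; 3 (RMI-I, RMI-III) or 4 (RMI-II) otherwise."""
    if variant not in ("I", "II", "III"):
        raise ValueError("variant must be 'I', 'II' or 'III'")
    if not postmenopausal:
        return 1
    return 4 if variant == "II" else 3


def rmi(n_features, postmenopausal, ca125, variant="I"):
    """Risk of Malignancy Index: U x M x serum CA 125 (U/mL)."""
    return ultrasound_score(n_features, variant) * menopausal_score(postmenopausal, variant) * ca125


# Hypothetical patient: two ultrasound features, postmenopausal, CA 125 of 35 U/mL.
for v in ("I", "II", "III"):
    score = rmi(2, True, 35, v)
    print(v, score, "malignant" if score >= 200 else "benign")
```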

Statistical analysis

All data analyses were performed with IBM SPSS Statistics v20 (IBM Corp, Los Angeles, CA, USA) and MedCalc v16.1 (MedCalc Software, Mariakerke, Belgium). For statistical purposes, borderline tumors were considered malignant. In women with bilateral tumors, only the tumor with the most complex ultrasound morphology was included in the statistical analysis. If both masses had the same morphology, the larger mass was used. We calculated sensitivity, specificity, positive and negative predictive values (PPV and NPV) and positive and negative likelihood ratios (LR+ and LR–) for the cut‐off points proposed in the original publications for each model. We also performed a subgroup analysis for pre‐ and postmenopausal patients. Missing values of serum CA 125 were handled by multiple imputation (fully conditional specification)11, 18. Predictive mean matching regression was applied, using variables from our dataset related to the level of CA 125 (i.e. variables included in the ADNEX model and others such as pathology results, previous hysterectomy and parity), or to the unavailability of this tumor marker (i.e. a binary indicator with value 1 if results from CA 125 were missing and value 0 if they were not).
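The accuracy indices named above all follow from a 2 × 2 table of test result against pathology. The sketch below shows the standard definitions; the function name is hypothetical and the counts in the example are invented for illustration, not taken from the study.

```python
def diagnostic_indices(tp, fn, fp, tn):
    """Standard accuracy indices from true/false positive and negative counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Likelihood ratios are undefined when the denominator is zero;
        # infinity is returned here as a simple convention.
        "lr_pos": sens / (1 - spec) if spec < 1 else float("inf"),
        "lr_neg": (1 - sens) / spec if spec > 0 else float("inf"),
    }


# Invented counts: 100 malignant and 200 benign masses.
indices = diagnostic_indices(tp=90, fn=10, fp=20, tn=180)
for name, value in indices.items():
    print(name, round(value, 2))
```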

Receiver–operating characteristics (ROC) curves were derived for the ADNEX model, subjective assessment, LR2 and RMI, and summarized by calculating the area under the curve (AUC) with 95% CI using exact methods based on the binomial distribution. To calculate the AUC for subjective assessment, six levels of diagnostic confidence were used (certainly benign; probably benign; uncertain, but most likely benign; uncertain, but most likely malignant; probably malignant; certainly malignant). The method described by DeLong et al.19 was used to calculate statistical significance of differences between AUCs. The McNemar test was used to test the statistical significance of differences in sensitivity and specificity between the various models, when an AUC could not be calculated (i.e. the two different variants of the simple rules). P < 0.05 was considered statistically significant for all comparisons.
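To illustrate how an AUC can be obtained from an ordinal rating such as the six confidence levels, the sketch below computes the empirical AUC as the proportion of malignant/benign pairs in which the malignant mass received the higher rating, counting ties as one half. This gives only the point estimate; it does not reproduce the exact binomial confidence intervals or the DeLong comparison performed in MedCalc. The function name, the 1 to 6 coding and the example ratings are hypothetical.

```python
def empirical_auc(scores, labels):
    """Empirical AUC from scores and binary labels (1 = malignant, 0 = benign)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(pos) * len(neg))


# Hypothetical ratings: 1 = certainly benign ... 6 = certainly malignant.
ratings = [1, 2, 2, 3, 5, 4, 5, 6, 6, 3]
pathology = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
auc = empirical_auc(ratings, pathology)
print(round(auc, 2))
```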

RESULTS

Between July 2011 and July 2015 a total of 851 patients visited our hospital to undergo adnexal ultrasound examination by an expert, and pathology results were obtained for 424 of them. The final cohort consisted of 326 consecutive patients who fulfilled our inclusion criteria, involving 128 (39.3%) premenopausal and 198 (60.7%) postmenopausal patients. A detailed overview of patient inclusion is shown in Figure 1. Patient characteristics and data for the ultrasound features used in the different models are shown in Table 3.

Figure 1.

Flow diagram summarizing inclusion of patients in the study. US, ultrasound examination.

Table 3.

Descriptive statistics for patient characteristics and ultrasound features according to tumor type in 326 patients with adnexal mass

| Variable | Benign (n = 211) | Borderline (n = 27) | Stage I (n = 18) | Stage II–IV (n = 56) | Metastatic (n = 14) |
|---|---|---|---|---|---|
| Age (years) | 53.2 (16.1–87.2) | 50.6 (36.9–65.8) | 63.1 (50.3–68.5) | 67.7 (32.3–87.0) | 64.6 (20.0–87.1) |
| CA 125 (U/mL)* | 26.0 [16.5–27.0] | 61.9 [27.5–295.0] | 109.5 [16.8–361.5] | 456.0 [170.8–1175.0] | 78.6 [27.5–260.8] |
| Missing values for CA 125 | 31 (14.7) | 1 (3.7) | 0 (0.0) | 0 (0.0) | 0 (0.0) |
| Menopausal status | | | | | |
| Premenopausal | 97 (46.0) | 15 (55.6) | 6 (33.3) | 7 (12.5) | 3 (21.4) |
| Postmenopausal | 114 (54.0) | 12 (44.4) | 12 (66.7) | 49 (87.5) | 11 (78.6) |
| Patient pregnant | 2 (0.9) | 1 (3.7) | 0 (0.0) | 1 (1.8) | 0 (0.0) |
| Family history of ovarian cancer | 6 (2.8) | 1 (3.7) | 0 (0.0) | 3 (5.3) | 0 (0.0) |
| Laterality of tumor | | | | | |
| Unilateral | 169 (80.1) | 21 (77.8) | 16 (88.9) | 35 (62.5) | 10 (71.4) |
| Bilateral | 42 (19.9) | 6 (22.2) | 2 (11.1) | 21 (37.5) | 4 (28.6) |
| Maximum diameter of lesion (mm) | 80.0 [59.0–115.0] | 155.0 [123.0–229.0] | 122.5 [92.0–214.0] | 71.5 [50.3–102.8] | 105.0 [63.0–133.5] |
| Type of tumor | | | | | |
| Unilocular | 65 (30.8) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 1 (7.1) |
| Multilocular | 71 (33.6) | 6 (22.2) | 0 (0.0) | 0 (0.0) | 0 (0.0) |
| Unilocular‐solid | 19 (9.0) | 5 (19.5) | 2 (11.1) | 4 (7.1) | 0 (0.0) |
| Multilocular‐solid | 35 (16.6) | 16 (59.3) | 11 (61.1) | 21 (37.5) | 4 (28.6) |
| Solid | 20 (9.5) | 0 (0.0) | 5 (27.8) | 31 (55.4) | 9 (64.3) |
| Unclassifiable | 1 (0.5) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) |
| Solid tissue | 1 (0.5) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) |
| Presence of solid tissue | 78 (37.0) | 21 (77.8) | 18 (100.0) | 55 (98.2) | 14 (100.0) |
| Maximum diameter of solid tissue (mm) | 0.0 [0.0–16.0] | 50.3 [25.0–60.5] | 52.5 [29.8–93.5] | 58.0 [33.5–80.0] | 60.5 [34.5–123.5] |
| Proportion of solid tissue if present (%) | 28.9 [15.7–100.0] | 27.9 [18.4–46.9] | 55.9 [20.9–100.0] | 100.0 [56.1–100.0] | 100.0 [54.6–100.0] |
| Number of locules | | | | | |
| 0 | 21 (10.0) | 0 (0.0) | 5 (27.8) | 31 (55.4) | 9 (64.3) |
| 1–4 | 134 (63.5) | 12 (44.4) | 3 (16.7) | 7 (12.5) | 3 (21.4) |
| 5–10 | 29 (13.7) | 4 (14.8) | 4 (22.2) | 13 (23.2) | 1 (7.1) |
| > 10 | 27 (12.8) | 11 (40.7) | 6 (33.3) | 5 (8.9) | 1 (7.1) |
| Number of papillary projections | | | | | |
| 0 | 167 (79.1) | 13 (48.1) | 13 (72.2) | 46 (82.1) | 12 (85.7) |
| 1 | 22 (10.4) | 3 (11.1) | 1 (5.6) | 5 (8.9) | 1 (7.1) |
| 2 | 8 (3.8) | 1 (3.7) | 0 (0.0) | 1 (1.8) | 0 (0.0) |
| 3 | 1 (0.5) | 1 (3.7) | 0 (0.0) | 1 (1.8) | 0 (0.0) |
| > 3 | 13 (6.2) | 9 (33.3) | 4 (22.2) | 3 (5.4) | 1 (7.1) |
| Blood flow in papillary projections: color Doppler score | | | | | |
| 1 | 106 (50.2) | 3 (11.1) | 1 (5.6) | 3 (5.4) | 2 (14.3) |
| 2 | 75 (35.5) | 11 (40.7) | 3 (16.7) | 11 (19.6) | 2 (14.3) |
| 3 | 24 (11.4) | 12 (44.4) | 11 (61.1) | 24 (42.9) | 4 (28.6) |
| 4 | 6 (2.8) | 1 (3.7) | 3 (16.7) | 18 (32.1) | 6 (42.9) |
| Irregular cyst wall | 78 (37.0) | 20 (74.1) | 13 (72.2) | 51 (91.1) | 13 (92.9) |
| Metastases | 2 (0.9) | 3 (11.1) | 1 (5.6) | 34 (60.7) | |
| Acoustic shadow | 78 (37.0) | 4 (14.8) | 2 (11.1) | 5 (8.9) | 1 (7.1) |
| Ascites | 13 (6.2) | 6 (22.2) | 3 (16.7) | 35 (62.5) | 2 (14.3) |

Data are given as n (%), median (range) or median [interquartile range].

*Results based on multiple imputation of missing values.

The median interval between ultrasound examination and obtaining the pathology results was 21 days. Results were benign for 211 (64.7%) masses and malignant for 115 (35.3%) masses (Table 4). The most common benign pathologies were cystadenoma, endometrioma, mature teratoma, fibroma and cystadenofibroma. Six benign masses consisted of mixed pathology (two or more different histological subtypes) and therefore could not be categorized into a specific subtype.

Table 4.

Pathology results of 326 adnexal masses

Pathology n (%)
Benign 211 (64.7)
Cystadenoma  82 (25.2)
Endometriotic cyst  39 (12.0)
Mature teratoma  29 (8.9) 
Fibroma  23 (7.1) 
Cystadenofibroma  15 (4.6) 
Salpingitis   6 (1.8) 
Functional cyst   4 (1.2) 
Parasalpingeal cyst   2 (0.6) 
Struma ovarii   2 (0.6) 
Pseudocyst   2 (0.6) 
Unknown type   1 (0.3) 
Mixed   6 (1.8) 
Borderline  27 (8.3) 
Serous  11 (3.4) 
Mucinous  13 (4.0) 
Other   3 (0.9) 
Malignant  88 (27.0)
Epithelial ovarian cancer  70 (21.5)
Stage
Stage I  14 (20.0)
Stage II   8 (11.4)
Stage III  27 (38.6)
Stage IV  21 (30.0)
Differentiation grade
Grade 1   8 (11.4)
Grade 2  12 (17.1)
Grade 3  46 (65.7)
Unknown   4 (5.7) 
Granulosa cell carcinoma   3 (0.9) 
Yolk‐sac tumor   1 (0.3) 
Metastatic tumor  10 (3.1) 
Non‐primary ovarian carcinoma   4 (1.2) 

The majority (84.3% (97/115)) of malignancies consisted of epithelial ovarian tumors (70 invasive carcinomas and 27 borderline tumors; Table 4). Almost a quarter of all malignant masses were borderline tumors. Furthermore, 14 patients were diagnosed with extraovarian primary tumors; 10 of these were extraovarian tumors (mainly of gastrointestinal or endometrial origin) with metastases to the ovaries, while the others were primary tumors of rectosigmoid or endometrial origin, mimicking a primary tumor of the ovary.

Validation of ADNEX model

The ADNEX model, at a cut‐off ≥ 10%, had a sensitivity of 0.98 (95% CI, 0.93–1.00) and a specificity of 0.62 (95% CI, 0.55–0.68) (Table 5). The AUC for the overall discrimination between benign and malignant tumors was 0.93 (95% CI, 0.89–0.95). AUCs for discrimination between different tumor subgroups ranged between 0.60 and 0.97 (Table 6). The model was particularly able to distinguish benign from Stage‐II–IV tumors, benign from secondary metastatic cancer and borderline from secondary metastatic cancer. In contrast, discrimination between borderline and Stage‐I tumors and between Stage‐II–IV tumors and secondary metastatic cancer was mediocre (Table 6).

Table 5.

Diagnostic performance indices for subjective assessment (SA) and four prediction models for differentiation between benign and malignant adnexal masses, in whole study population (n = 326) and in premenopausal (n = 128) and postmenopausal (n = 198) subgroups

| Assessment method | Sensitivity | Specificity | PPV | NPV | LR+ | LR– |
|---|---|---|---|---|---|---|
| All patients | | | | | | |
| ADNEX | 0.98 (0.93–1.00) | 0.62 (0.55–0.68) | 0.58 (0.51–0.65) | 0.98 (0.94–0.99) | 2.56 (2.15–3.04) | 0.03 (0.01–0.11) |
| SA | 0.90 (0.83–0.95) | 0.91 (0.86–0.94) | 0.83 (0.76–0.90) | 0.95 (0.90–0.97) | 9.54 (6.26–14.54) | 0.11 (0.06–0.19) |
| IOTA‐SR + mal | 0.93 (0.86–0.97) | 0.68 (0.61–0.70) | 0.61 (0.54–0.69) | 0.93 (0.87–0.97) | 2.51 (2.07–3.06) | 0.11 (0.06–0.22) |
| IOTA‐SR + SA | 0.89 (0.81–0.94) | 0.90 (0.85–0.94) | 0.83 (0.75–0.89) | 0.94 (0.89–0.96) | 8.91 (5.91–13.44) | 0.13 (0.08–0.21) |
| LR2 | 0.93 (0.86–0.97) | 0.79 (0.73–0.84) | 0.71 (0.63–0.78) | 0.95 (0.91–0.98) | 4.46 (3.41–5.83) | 0.09 (0.04–0.17) |
| RMI‐I | 0.71 (0.62–0.79) | 0.79 (0.72–0.84) | 0.65 (0.56–0.73) | 0.83 (0.77–0.88) | 3.34 (2.52–4.44) | 0.36 (0.27–0.49) |
| RMI‐II | 0.74 (0.65–0.81) | 0.73 (0.66–0.79) | 0.60 (0.51–0.68) | 0.84 (0.77–0.89) | 2.74 (2.14–3.50) | 0.36 (0.26–0.49) |
| RMI‐III | 0.71 (0.62–0.79) | 0.81 (0.75–0.86) | 0.67 (0.58–0.75) | 0.84 (0.78–0.88) | 3.76 (2.78–5.09) | 0.35 (0.26–0.47) |
| Premenopausal | | | | | | |
| ADNEX | 1.00 (0.86–1.00) | 0.71 (0.61–0.80) | 0.53 (0.39–0.66) | 1.00 (0.93–1.00) | 3.46 (2.53–4.73) | 0.00 (0–NA) |
| SA | 0.84 (0.66–0.94) | 0.96 (0.89–0.99) | 0.87 (0.68–0.96) | 0.95 (0.88–0.98) | 20.30 (7.70–53.75) | 0.17 (0.08–0.38) |
| IOTA‐SR + mal | 0.94 (0.77–0.99) | 0.76 (0.66–0.84) | 0.56 (0.41–0.69) | 0.97 (0.89–1.00) | 3.95 (2.73–5.70) | 0.08 (0.02–0.33) |
| IOTA‐SR + SA | 0.87 (0.69–0.96) | 0.96 (0.89–0.99) | 0.87 (0.69–0.96) | 0.96 (0.89–0.99) | 21.12 (8.01–55.66) | 0.13 (0.05–0.34) |
| LR2 | 0.83 (0.66–0.94) | 0.92 (0.84–0.96) | 0.76 (0.58–0.89) | 0.95 (0.87–0.98) | 10.17 (5.14–20.10) | 0.18 (0.08–0.39) |
| RMI‐I | 0.42 (0.25–0.61) | 0.94 (0.86–0.97) | 0.68 (0.43–0.86) | 0.83 (0.75–0.90) | 6.78 (2.82–16.32) | 0.62 (0.46–0.84) |
| RMI‐II | 0.45 (0.28–0.64) | 0.92 (0.84–0.96) | 0.64 (0.41–0.82) | 0.84 (0.75–0.90) | 5.48 (2.54–11.81) | 0.60 (0.43–0.82) |
| RMI‐III | 0.39 (0.22–0.58) | 0.95 (0.88–0.98) | 0.71 (0.44–0.89) | 0.83 (0.74–0.89) | 7.51 (2.87–19.65) | 0.65 (0.49–0.86) |
| Postmenopausal | | | | | | |
| ADNEX | 0.98 (0.91–1.00) | 0.54 (0.44–0.63) | 0.61 (0.52–0.69) | 0.97 (0.88–0.99) | 2.10 (1.72–2.56) | 0.04 (0.01–0.18) |
| SA | 0.93 (0.86–0.97) | 0.86 (0.78–0.92) | 0.83 (0.74–0.90) | 0.94 (0.87–0.98) | 6.62 (4.19–10.46) | 0.08 (0.04–0.18) |
| IOTA‐SR + mal | 0.93 (0.85–0.97) | 0.61 (0.52–0.70) | 0.64 (0.55–0.72) | 0.92 (0.83–0.97) | 2.41 (1.89–3.06) | 0.12 (0.05–0.25) |
| IOTA‐SR + SA | 0.89 (0.80–0.95) | 0.85 (0.77–0.91) | 0.82 (0.72–0.89) | 0.92 (0.84–0.96) | 5.99 (3.84–9.34) | 0.13 (0.07–0.23) |
| LR2 | 0.96 (0.89–0.99) | 0.68 (0.59–0.77) | 0.69 (0.60–0.77) | 0.96 (0.89–0.99) | 3.05 (2.32–4.01) | 0.05 (0.02–0.16) |
| RMI‐I | 0.82 (0.72–0.89) | 0.66 (0.56–0.74) | 0.64 (0.54–0.73) | 0.83 (0.74–0.90) | 2.40 (1.83–3.16) | 0.27 (0.17–0.43) |
| RMI‐II | 0.85 (0.75–0.91) | 0.57 (0.47–0.66) | 0.59 (0.50–0.68) | 0.83 (0.73–0.90) | 1.97 (1.56–2.48) | 0.27 (0.16–0.45) |
| RMI‐III | 0.83 (0.73–0.90) | 0.69 (0.60–0.77) | 0.67 (0.57–0.75) | 0.85 (0.76–0.91) | 2.71 (2.03–3.63) | 0.24 (0.15–0.39) |

Values in parentheses are 95% CI.

Prediction models: Assessment of Different NEoplasias in the adneXa (ADNEX) model11; subjective assessment (SA); International Ovarian Tumour Analysis simple ultrasound‐based rules9 (IOTA‐SR), applied both with inconclusive results being considered to be malignant (IOTA‐SR + mal) and with inconclusive results diagnosed by subjective assessment (IOTA‐SR + SA); IOTA Logistic Regression model 2 (LR2)10; and three variants of Risk of Malignancy Index (RMI)5, 6, 7.

For both ADNEX and LR2 models, cut‐off value of ≥ 0.1 (≥ 10%) was used; for variants of RMI model, cut‐off value of ≥ 200 was used.

LR–, negative likelihood ratio; LR+, positive likelihood ratio; NPV, negative predictive value; PPV, positive predictive value.

Table 6.

Performance of Assessment of Different NEoplasias in the adneXa (ADNEX) model11 for five tumor types, expressed as area under the receiver–operating characteristics curve (AUC)

Tumor type AUC (95% CI)
Benign vs borderline 0.81 (0.75–0.86)
Benign vs Stage I 0.87 (0.84–0.91)
Benign vs Stage II–IV 0.97 (0.94–0.99)
Benign vs metastatic 0.93 (0.89–0.96)
Borderline vs Stage I 0.60 (0.44–0.74)
Borderline vs Stage II–IV 0.87 (0.78–0.93)
Borderline vs metastatic 0.90 (0.77–0.97)
Stage I vs Stage II–IV 0.82 (0.71–0.90)
Stage I vs metastatic 0.72 (0.53–0.86)
Stage II–IV vs metastatic 0.67 (0.55–0.78)

ADNEX model vs other methods

When comparing overall test performance, expressed as AUC, subjective assessment performed significantly better than did the ADNEX model (P = 0.01), with AUCs of 0.96 (95% CI, 0.93–0.98) and 0.93 (95% CI, 0.89–0.95), respectively (Table 7 and Figure 2). The difference between the ADNEX model and LR2 (AUC, 0.92 (95% CI, 0.89–0.95)) was not significant (P = 0.60). The AUCs of all variants of the RMI were significantly lower than those of the other methods in our comparison (all P < 0.001).

Table 7.

Pairwise receiver–operating characteristics (ROC) curve comparisons expressed as differences in area under the curve (AUC) and P‐values calculated for whole study population

| | SA | LR2 | RMI‐I | RMI‐II | RMI‐III |
|---|---|---|---|---|---|
| ADNEX | 0.027 (0.008–0.047)*, P = 0.01 | 0.005 (−0.015–0.025), P = 0.5968 | 0.075 (0.040–0.109), P < 0.0001 | 0.108 (0.066–0.149), P < 0.0001 | 0.088 (0.049–0.127), P < 0.0001 |
| SA | | 0.033 (0.007–0.058), P = 0.0119 | 0.102 (0.062–0.141), P < 0.0001 | 0.135 (0.089–0.182), P < 0.0001 | 0.115 (0.072–0.159), P < 0.0001 |
| LR2 | | | 0.069 (0.029–0.110), P = 0.0009 | 0.103 (0.057–0.148), P < 0.0001 | 0.082 (0.039–0.126), P = 0.0002 |
| RMI‐I | | | | 0.033 (0.016–0.051), P = 0.0003 | 0.013 (0.004–0.022), P = 0.0041 |
| RMI‐II | | | | | 0.020 (0.004–0.036), P = 0.0123 |
| RMI‐III | | | | | |

Prediction models: Assessment of Different NEoplasias in the adneXa (ADNEX) model11; subjective assessment (SA); IOTA Logistic Regression model 2 (LR2)10; and three variants of Risk of Malignancy Index (RMI)5, 6, 7.

Methods in left column are used as reference standard for comparisons:

*model in upper row outperforms corresponding model in left column;

model in left column outperforms corresponding model in upper row.

Values in parentheses are 95% CI.

Figure 2.

Receiver–operating characteristics curves for detection of malignant disease (including borderline ovarian tumors) for the Assessment of Different NEoplasias in the adneXa (ADNEX) model11, subjective assessment (SA), International Ovarian Tumour Analysis (IOTA) Logistic Regression model 2 (LR2)10 and three variants of the Risk of Malignancy Index (RMI)5, 6, 7 in the whole population (n = 326) (a) and in premenopausal (n = 128) (b) and postmenopausal (n = 198) (c) subgroups. AUC, area under the curve.

For the study population as a whole, among all the methods assessed, the sensitivity of the ADNEX model (at cut‐off ≥ 10%) was highest, although the specificity was lowest (Table 5). The sensitivity and specificity of subjective assessment differed significantly from those of the ADNEX model (P = 0.01 and P < 0.0001, respectively). The sensitivity and specificity of the simple rules, using subjective assessment in case of inconclusive test results, were comparable to those of subjective assessment and, as for subjective assessment, differed significantly from those of the ADNEX model (P = 0.03 and P < 0.001 for sensitivity and specificity, respectively). When all masses yielding inconclusive results using the simple rules were classified as malignant instead of using subjective assessment, the specificity dropped significantly (P < 0.0001), while the sensitivity remained high (P = 0.06). The three different variants of the RMI (cut‐off ≥ 200) performed worst of all the methods, with sensitivities as low as 0.71 for both RMI‐I and RMI‐III (95% CI, 0.62–0.79 for both), resulting in the largest differences in sensitivity from that of the ADNEX model (P < 0.0001 for both).
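The paired comparisons of sensitivity and specificity reported above rely on the McNemar test, which uses only the discordant pairs (masses classified correctly by one method and incorrectly by the other). A minimal sketch of the exact, binomial version is shown below. Whether the exact or the chi‐squared version was used in the analysis is not stated, so this choice is an assumption, and the function name and example counts are hypothetical.

```python
import math


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar P-value from the two discordant pair counts b and c."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Probability of k or fewer discordant pairs in one direction under Binomial(n, 0.5).
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical example: among malignant masses, 10 were detected only by
# method A and 2 only by method B.
p_value = mcnemar_exact_p(10, 2)
print(round(p_value, 4))
```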

Optimal cut‐off

We calculated optimal cut‐off values for all models with a cut‐off (i.e. ADNEX model, LR2, RMI‐I, RMI‐II and RMI‐III) at a fixed sensitivity of 90% (Table S2). The optimal cut‐off values for both the ADNEX model and LR2 were higher in our population than the values applied in the original articles: ≥ 26.1% for the ADNEX model, with a specificity of 0.76 (95% CI, 0.66–0.85) and ≥ 16.5% for LR2, with a specificity of 0.82 (95% CI, 0.68–0.89). Optimal cut‐off values for RMI‐I, RMI‐II and RMI‐III, on the other hand, were lower in our population than the original cut‐off of ≥ 200 (≥ 63.7, ≥ 51.3 and ≥ 64.3, respectively).
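One way to derive a cut‐off at a fixed sensitivity is to take the highest observed value at which the target sensitivity is still reached, since raising the cut‐off can only increase specificity. The sketch below follows that approach; it is an assumption about the procedure rather than a description of the software routine used, and the function name and example data are hypothetical.

```python
def cutoff_at_sensitivity(risks, labels, target=0.90):
    """Return (cutoff, sensitivity, specificity) for the highest cut-off reaching the target.

    A mass is classified as malignant when its risk is >= cutoff.
    labels: 1 = malignant, 0 = benign.
    """
    pos = [r for r, y in zip(risks, labels) if y == 1]
    neg = [r for r, y in zip(risks, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    for cutoff in sorted(set(risks), reverse=True):
        sensitivity = sum(r >= cutoff for r in pos) / len(pos)
        if sensitivity >= target:
            specificity = sum(r < cutoff for r in neg) / len(neg)
            return cutoff, sensitivity, specificity
    raise ValueError("target sensitivity cannot be reached")


# Invented predicted risks for 10 malignant and 10 benign masses.
malignant = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.35, 0.30, 0.05]
benign = [0.02, 0.04, 0.08, 0.12, 0.20, 0.28, 0.33, 0.45, 0.01, 0.03]
result = cutoff_at_sensitivity(malignant + benign, [1] * 10 + [0] * 10)
print(result)
```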

Pre‐ and postmenopausal subgroups

Malignant masses occurred more frequently in postmenopausal than in premenopausal women (42.4% and 24.2%, respectively). Subjective assessment had the highest diagnostic accuracy for differentiating between benign and malignant adnexal masses in both pre‐ and postmenopausal subgroups (Table 5 and Figure 2). Nonetheless, the differences between the AUC for subjective assessment and that for the ADNEX model and for LR2 were not significant (P = 0.65 and P = 0.08, respectively) for premenopausal women (Table S3), while in the postmenopausal subgroup this difference for the ADNEX model was significant (P = 0.02) (Table S4).

DISCUSSION

We have shown that the ADNEX model has good overall performance in the differentiation between benign and malignant adnexal masses, with an AUC of 0.93 (95% CI, 0.89–0.95). At the recommended cut‐off of ≥ 10%, the model had high sensitivity; however, this was at the expense of specificity. In our population, the optimal cut‐off of ≥ 26.1% gave somewhat more balanced results for sensitivity and specificity. The model is particularly good at differentiating benign from Stage‐II–IV or secondary metastatic tumors and borderline from secondary metastatic cancer. However, other tumor types could be distinguished less easily. Furthermore, our study suggests that subjective assessment remains superior to the ADNEX model.

In the original article11, validation AUCs for the ADNEX model were slightly higher than ours, especially for differentiating between various types of malignancy. In the original study, the model showed better discrimination of borderline from Stage‐I tumors (AUC, 0.75 in the original vs 0.60 in the current study) and of Stage‐II–IV from metastatic tumors (AUC, 0.82 in the original vs 0.67 in the current study). This could be due to the distribution of tumor types in each dataset. For example, in the present study, the number of borderline tumors amounted to almost a quarter (23.4%) of all malignancies, and borderline tumors are generally known to be difficult to diagnose20. This also resulted in a slightly lower‐than‐expected test accuracy of subjective assessment. Furthermore, the number of inconclusive results when applying the IOTA simple rules was higher than usual (26.3% in the present study vs 19% in a recent review4). However, the general malignancy rate in our study (27%) was similar to that in the original publication (33%). Moreover, this study was conducted in an oncology center, while the original study was performed in both second‐ and third‐level hospitals. Although type of center is the weakest predictor in this model, this could mean that results from our study might not be generalizable11.

The poorest performance was seen for RMI, yet this method is advocated by many guidelines. Although a sensitivity of ≥ 90% is generally considered most important in the preoperative diagnosis of ovarian carcinoma, the sensitivity of RMI‐III was only 71% in our study. This is in accordance with the sensitivity of 0.71 (95% CI, 0.67–0.75) for RMI‐III reported in a recent review of 18 studies validating RMI4. Thus, more than a quarter of ovarian carcinomas will be missed, leading to incorrect treatment of these masses and subsequently to deterioration of the prognosis of these patients3, 21.

This external validation study compared the ADNEX model with other frequently used models to evaluate its added value. It is a strength of our study that the collection of clinical and ultrasound data was meticulous and prospective, in accordance with IOTA nomenclature and measurement techniques, based on real‐time ultrasound and with blinding to pathology results. Although data analysis was done retrospectively, this was not regarded as a limitation, since previous research on the effects of design‐related biases in studies assessing diagnostic tests showed that a retrospective design is not associated with overestimation or underestimation of diagnostic accuracy [22]. Overestimation may occur in diagnostic accuracy studies that use different reference tests or have inadequate blinding; this was not the case in our study.

A limitation of this study is that CA 125 values were missing in 32 (9.8%) patients. When an adnexal mass gives the impression of being completely benign from the overall clinical picture and morphology on ultrasound, clinicians may be less inclined to determine the CA 125 level preoperatively. Had we excluded these patients, we would have introduced selection bias, since all but one of the cases in which CA 125 data were missing were benign (the exception concerned a mucinous borderline tumor). Another potential limitation is that we included both pregnant patients (n = 4) and non‐primary ovarian carcinomas (n = 4). The level of CA 125 can rise during pregnancy, which could lead to an overestimation of the risk of malignancy by models incorporating CA 125 [23]. However, an analysis performed on all patients except the pregnant patients confirmed that pregnancy hardly influenced the results of the ADNEX model (data not shown). The non‐primary ovarian carcinomas were included only if the ultrasound findings were suspicious for ovarian pathology. This is in accordance with daily clinical practice, because the risk of malignancy is estimated after ultrasound and before surgery, and therefore before pathology results revealing non‐primary ovarian carcinoma become available. Finally, ultrasound examinations in our study were performed by an expert ultrasonographer. It remains to be shown if the ADNEX model retains its performance when applied by non‐experts.

Using the ADNEX model, absolute risk estimates for benign tumors and four types of malignancy can be obtained with acceptable diagnostic performance. However, how to use the model clinically is not straightforward, as also observed by Van Calster et al. [17]. Two options are available. First, a cut‐off can be used, such as the one applied in this study. However, rigid use of a cut‐off may result in suboptimal and even unethical judgment, according to Van Calster et al. [17]. Furthermore, this can only be used to distinguish between benign and malignant masses, thereby losing the advantage of a polytomous model (i.e. a model differentiating between more than two subgroups). Second, an assessment per tumor type can be made to estimate how the predicted risk per type relates to the baseline risk. This requires certain calculations (not supplied by the IOTA application for the ADNEX model) and can also be difficult to interpret.
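One way to picture the second option is as a ratio of predicted to baseline risk for each tumor type. The sketch below is an assumption about how such a calculation could look, not the calculation prescribed by the IOTA group. The function name `risk_relative_to_baseline` and all baseline and predicted values are hypothetical and are not taken from this study.

```python
TUMOR_TYPES = ("benign", "borderline", "stage I", "stage II-IV", "metastatic")


def risk_relative_to_baseline(predicted, baseline):
    """Return predicted risk divided by baseline risk for each tumor type.

    Both arguments map every tumor type to a probability; each set of
    probabilities must sum to 1 because the model is polytomous.
    """
    for name, risks in (("predicted", predicted), ("baseline", baseline)):
        if set(risks) != set(TUMOR_TYPES):
            raise ValueError(f"{name} must contain exactly the five tumor types")
        if abs(sum(risks.values()) - 1.0) > 1e-6:
            raise ValueError(f"{name} risks must sum to 1")
    return {t: predicted[t] / baseline[t] for t in TUMOR_TYPES}


# Hypothetical baseline (population) risks, for illustration only.
baseline = {"benign": 0.68, "borderline": 0.06, "stage I": 0.07,
            "stage II-IV": 0.14, "metastatic": 0.05}
# Hypothetical model output for one patient, for illustration only.
predicted = {"benign": 0.40, "borderline": 0.30, "stage I": 0.15,
             "stage II-IV": 0.10, "metastatic": 0.05}

for tumor_type, ratio in risk_relative_to_baseline(predicted, baseline).items():
    print(f"{tumor_type}: {ratio:.1f} times the baseline risk")
```

A ratio well above 1 for one type (here, borderline) flags the subtype whose risk has risen most relative to baseline, even when its absolute risk is not the largest. This also shows why such output can be difficult to interpret.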

In this study the ADNEX model was used as a single test, but it can also be applied as a two‐step triage test. For example, when results of the ADNEX model are between certain values (e.g. 5–25%), subjective assessment can be used as a second‐line test to increase diagnostic accuracy. The same kind of triage test could be performed with the other models investigated.
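The two‐step approach can be sketched as follows. The 5% and 25% bounds are the example values mentioned above, not validated thresholds. The function name `two_step_triage`, the inclusive handling of the bounds and the representation of subjective assessment as a callable are assumptions made for illustration.

```python
def two_step_triage(adnex_risk, subjective_assessment, low=0.05, high=0.25):
    """Classify a mass using the ADNEX risk first, deferring to subjective
    assessment only when the risk falls in the intermediate range.

    adnex_risk: predicted risk of malignancy as a fraction between 0 and 1.
    subjective_assessment: callable returning "benign" or "malignant"; it is
        invoked only for intermediate risks (low <= risk <= high, an assumption).
    """
    if not 0.0 <= adnex_risk <= 1.0:
        raise ValueError("adnex_risk must be between 0 and 1")
    if adnex_risk < low:
        return "benign"
    if adnex_risk > high:
        return "malignant"
    # Intermediate risk: second-line test by an ultrasound examiner.
    return subjective_assessment()


# Example: an intermediate risk of 15% is referred for subjective assessment.
print(two_step_triage(0.15, lambda: "malignant"))
```

Passing the second‐line test as a callable means that subjective assessment is requested only for masses in the intermediate range, which mirrors the purpose of triage.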

In conclusion, the ADNEX model can be used as a good alternative to subjective assessment in the estimation of risk of malignancy of adnexal masses. However, the advantage of the ADNEX model as a polytomous model for the differentiation between various subtypes of malignancy was modest in our study. More guidance on how to use the ADNEX model in clinical practice would be useful.

Supporting information

Table S1 List of prospectively collected ultrasound features (per adnexal mass)

Table S2 Optimal cut‐off and corresponding performance indices for models with a cut‐off value at a fixed sensitivity of 90%

Tables S3 and S4 Pairwise receiver–operating characteristics (ROC) curve comparisons expressed as differences in the area under the curve (AUC) and P‐values calculated for premenopausal (Table S3) and postmenopausal (Table S4) patients

ACKNOWLEDGMENTS

This study received funding from the Academic Fund, Maastricht University Medical Center+, The Netherlands and the CZ Fund, The Netherlands. We thank Ben van Calster from the Department of Development and Regeneration, KU Leuven, Belgium for his help with the multiple imputation analysis.

REFERENCES

  • 1. Weber S, McCann CK, Boruta DM, Schorge JO, Growdon WB. Laparoscopic surgical staging of early ovarian cancer. Rev Obstet Gynecol 2011; 4: 117–122.
  • 2. American College of Obstetricians and Gynecologists Committee on Gynecologic Practice. Committee Opinion No. 477: the role of the obstetrician‐gynecologist in the early detection of epithelial ovarian cancer. Obstet Gynecol 2011; 117: 742–746.
  • 3. Woo YL, Kyrgiou M, Bryant A, Everett T, Dickinson HO. Centralisation of services for gynaecological cancer. Cochrane Database Syst Rev 2012; 3: CD007945.
  • 4. Meys EM, Kaijser J, Kruitwagen RF, Slangen BF, Van Calster B, Aertgeerts B, Verbakel JY, Timmerman D, Van Gorp T. Subjective assessment versus ultrasound models to diagnose ovarian cancer: A systematic review and meta‐analysis. Eur J Cancer 2016; 58: 17–29.
  • 5. Jacobs I, Oram D, Fairbanks J, Turner J, Frost C, Grudzinskas JG. A risk of malignancy index incorporating CA 125, ultrasound and menopausal status for the accurate preoperative diagnosis of ovarian cancer. Br J Obstet Gynaecol 1990; 97: 922–929.
  • 6. Tingulstad S, Hagen B, Skjeldestad FE, Onsrud M, Kiserud T, Halvorsen T, Nustad K. Evaluation of a risk of malignancy index based on serum CA125, ultrasound findings and menopausal status in the pre‐operative diagnosis of pelvic masses. Br J Obstet Gynaecol 1996; 103: 826–831.
  • 7. Tingulstad S, Hagen B, Skjeldestad FE, Halvorsen T, Nustad K, Onsrud M. The risk‐of‐malignancy index to evaluate potential ovarian cancers in local hospitals. Obstet Gynecol 1999; 93: 448–452.
  • 8. Kaijser J, Sayasneh A, Van Hoorde K, Ghaem‐Maghami S, Bourne T, Timmerman D, Van Calster B. Presurgical diagnosis of adnexal tumours using mathematical models and scoring systems: a systematic review and meta‐analysis. Hum Reprod Update 2014; 20: 449–462.
  • 9. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, Paladini D, Van Calster B, Vergote I, Van Huffel S, Valentin L. Simple ultrasound‐based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008; 31: 681–690.
  • 10. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, Van Calster B, Collins WP, Vergote I, Van Huffel S, Valentin L, International Ovarian Tumor Analysis Group. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005; 23: 8794–8801.
  • 11. Van Calster B, Van Hoorde K, Valentin L, Testa AC, Fischerova D, Van Holsbeke C, Savelli L, Franchi D, Epstein E, Kaijser J, Van Belle V, Czekierdowski A, Guerriero S, Fruscio R, Lanzani C, Scala F, Bourne T, Timmerman D, International Ovarian Tumour Analysis Group. Evaluating the risk of ovarian cancer before surgery using the ADNEX model to differentiate between benign, borderline, early and advanced stage invasive, and secondary metastatic tumours: prospective multicentre diagnostic study. BMJ 2014; 349: g5920.
  • 12. European Federation of Societies for Ultrasound in Medicine and Biology. Minimum training recommendations for the practice of medical ultrasound. Ultraschall Med 2006; 27: 79–105.
  • 13. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I, International Ovarian Tumor Analysis Group. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000; 16: 500–505.
  • 14. WHO. World Health Organization Classification of Tumours Pathology and Genetics of Tumours of the Breast and Female Genital Organs. Tavassoli FA, Devilee P. (eds). IARC Press: Lyon, 2003.
  • 15. Prat J, FIGO Committee on Gynecologic Oncology. Staging classification for cancer of the ovary, fallopian tube, and peritoneum. Int J Gynaecol Obstet 2014; 124: 1–5.
  • 16. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Lijmer JG, Moher D, Rennie D, de Vet HC, Standards for Reporting of Diagnostic Accuracy. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. BMJ 2003; 326: 41–44.
  • 17. Van Calster B, Van Hoorde K, Froyman W, Kaijser J, Wynants L, Landolfo C, Anthoulakis C, Vergote I, Bourne T, Timmerman D. Practical guidance for applying the ADNEX model from the IOTA group to discriminate between different subtypes of adnexal tumors. Facts Views Vis Obgyn 2015; 7: 32–41.
  • 18. Van Calster B, Valentin L, Van Holsbeke C, Zhang J, Jurkovic D, Lissoni AA, Testa AC, Czekierdowski A, Fischerova D, Domali E, Van de Putte G, Vergote I, Van Huffel S, Bourne T, Timmerman D. A novel approach to predict the likelihood of specific ovarian tumor pathology based on serum CA‐125: a multicenter observational study. Cancer Epidemiol Biomarkers Prev 2011; 20: 2420–2428.
  • 19. DeLong ER, DeLong DM, Clarke‐Pearson DL. Comparing the areas under two or more correlated receiver operating characteristics curves: a nonparametric approach. Biometrics 1988; 44: 837–845.
  • 20. Fischerova D, Zikan M, Dundr P, Cibula D. Diagnosis, treatment, and follow‐up of borderline ovarian tumors. Oncologist 2012; 17: 1515–1533.
  • 21. Vergote I, De Brabanter J, Fyles A, Bertelsen K, Einhorn N, Sevelda P, Gore ME, Kaern J, Verrelst H, Sjovall K, Timmerman D, Vandewalle J, Van Gramberen M, Trope CG. Prognostic importance of degree of differentiation and cyst rupture in stage I invasive epithelial ovarian carcinoma. Lancet 2001; 357: 176–182.
  • 22. Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JH, Bossuyt PM. Empirical evidence of design‐related bias in studies of diagnostic tests. JAMA 1999; 282: 1061–1066.
  • 23. Han SN, Lotgerink A, Gziri MM, Van Calsteren K, Hanssens M, Amant F. Physiologic variations of serum tumor markers in gynecological malignancies during pregnancy: a systematic review. BMC Med 2012; 10: 86.

Articles from Ultrasound in Obstetrics & Gynecology are provided here courtesy of Wiley