2021 Oct 22;2021(4):hoab025. doi: 10.1093/hropen/hoab025

Table II.

Overview of replication, validation and clinical value of published systems.

Diagnosis/pre-operative assessment | Description | Staging | Treatment selection | Prediction of difficulty of surgery/complication | Prediction of pain remediation/QoL | Prediction of conception | Feasibility | Interobserver agreement | Aim of the study | Sample size | Age mean (range or SD) | Population source | Endometriosis case definition | Main results | Reference
ENZIAN
  • +

  • length of hospital stay

Correlation with Clavien-Dindo complication grading 401 34.8 years (SD 8.73) Single center Histologic confirmation ENZIAN A2, C1, C3 and FA were risk factors for the length of hospital stay. Nicolaus et al. (2020)

+
  • ENZIAN (MRI)

  • Correlation with intraoperative findings

63 33.5 years (22–49) 2 centers Surgical confirmation Sensitivity and NPV of MRI confirmed by surgery were 95.2% and 91.7% (lesions in the vaginal/rectovaginal space), 78.4% and 56% (utero-sacral ligaments), 91.4% and 89.7% (rectum/sigmoid colon), 57.1% and 94.1% (myometrium), 85.7% and 98.3% (bladder), and 73.3% and 92.2% (intestine), respectively. Burla et al. (2019)

No analysis Application of the rENZIAN system 60 30.5 years (28.6–32.3) Single center Laparoscopic diagnosis The medial compartment was the most affected, in 80% of cases (mainly ovarian endometriomas), followed by the posterior compartment in 65% and, less frequently, the anterior compartment. Morgan-Ortiz et al. (2018)

+
  • ENZIAN (MRI)

  • Accuracy of the score compared to surgical-pathologic findings

115 36 years (20.3–48) Single center Histologic confirmation The sensitivity, specificity, accuracy, PPV and NPV of MRI were 94%, 97%, 95%, 99% and 86%, respectively. The highest accuracy was for adenomyosis (100%) and endometriosis of utero-sacral ligaments (98%), slightly lower for vagina-rectovaginal septum and colorectal walls (96%), and the lowest for bladder endometriosis (92%). The concordance with histopathology was excellent. Di Paola et al. (2015)
ENZIAN
  • +

  • Operating time

Preoperative estimation of laparoscopic operating time 151 31 years (19–53) Single center Histologic confirmation An ENZIAN-based model for estimating operating time for DE, assuming complication-free procedures (model’s predictive power: P < 0.001). The error of estimation for the operating time prediction is 0 ± 35.35 min (range −83 to +117 min). Haas et al. (2013a)

+ Identification of duplicate classifications of the same lesions 219 Not reported Single center Histologic confirmation Comparison to rAFS: The severity of DE according to ENZIAN was as follows: grade 1: 45%; grade 2: 26%; grade 3: 19%; grade 4: 10%. Fifty-eight patients were classified according to ENZIAN although they did not fulfill the criteria of DE and had previously been classified according to the rAFS classification. Adaptation of the ENZIAN score would reduce the diagnoses of DE by 36% (95% CI: 29–44%). Haas et al. (2011)
UBESS
+ Correlation with RCOG laparoscopic level of complexity 1, 2, and 3 293 Not reported Multi-center history of chronic pelvic pain, or endometriosis.
  • Strengths: UBESS predicted the requirement for RCOG level 3 laparoscopic surgical skills (accuracy, 89.4–95.4%).

  • Limitation: misclassification of women who require surgical ureterolysis in the absence of bowel DE.

Espada et al. (2020)

Correlation with the difficulty of the surgery 33 32.8 years (SD 7.7) Single center Histologic confirmation Weak concordance between pre-operative UBESS score and the difficulty of the surgery (RCOG, concordance Kendall Tau 0.22) and between UBESS and CHI (concordance 0.30). Chaabane et al. (2019)
UBESS
  • +

  • surgical skills

Validation for predicting the correct RANZCOG/AGES’ laparoscopic skill level. 155 32.7 years (SD 8.6) Multi-center history of chronic pelvic pain and/or endometriosis The accuracy, sensitivity, specificity, PPV and NPV, and positive and negative likelihood ratios of the UBESS I to predict the RANZCOG/AGES surgical skill levels 1/2 were 99.4%, 98.9%, 100%, 100%, 98.5%, not applicable, and 0.011; those of UBESS II to predict surgical skill levels 3/4 were: 98.1%, 96.8%, 98.4%, 93.8%, 99.2%, 60 and 0.033, and those for UBESS III to predict surgical skill level 6 were: 98.7%, 97.2%, 99.2%, 97.2%, 99.2%, 115.7, and 0.028. The rate of correctly predicting the exact level of skills needed was 98.1%, and Cohen’s kappa statistic for the agreement between UBESS prediction and levels of training required at surgery was 0.97, indicating almost perfect agreement. Tompsett et al. (2019)
EFI
+ Accuracy for the prediction of non-ART pregnancy 4598 NA NA Cumulative non-ART pregnancy rate at 36 months increased from 10% (95% CI: 3, 16%) in women with EFI score 0–2 to 69% (95% CI: 58, 79%) in women with EFI score 9–10, with a significant increase for each score category (0–2, 3–4, 5–6, 7–8, 9–10)

Meta-analysis

Vesali et al. (2020)

EFI
Acceptable Reproducibility among three experts 82 Reproductive age as inclusion criterion Single-center Surgical confirmation A near ‘inter-expert’ clinical agreement rate (1.000, 95% CI 0.956–1.000; P = 0.0149) was observed. The numerical agreement between two experts was also high (0.988, 95% CI 0.934–1.000); similarly, high agreement rates were observed for both ‘junior-expert’ comparisons (clinical 0.963, 95% CI 0.897–0.992; numerical 0.988, 95% CI 0.934–1.000) and ‘intra-expert’ comparisons (clinical 0.988, 95% CI 0.934–1.000; numerical 1.000, 95% CI 0.956–1.000). Tomassetti et al. (2020)

  • +

  • Conception rate

Accuracy for the prediction of pregnancy 123 32.4 years (no range) Single-center Surgical confirmation 8 (40%) patients with low, 20 (58.82%) with moderate, and 26 (96.29%) with high EFI conceived. The EFI score showed a statistically significant positive correlation with pregnancy outcome (P = 0.001). Patients conceived spontaneously, after ovulation induction (±IUI) or after IVF. Negi et al. (2019)

+ Accuracy for the prediction of non-ART pregnancy in recurrent endometriosis 107 31.1 years (SD 0.39) Single-center Surgical confirmation—recurrent endometriosis Cumulative pregnancy rates (CPR) during the first 2 years were 51.86% in women with EFI ≥5, and 26.00% in women with EFI <5. At 3- and 5-year post-surgery, the CPR increased further in women with EFI ≥5, but not in women with EFI <5. The EFI score had good predictive power for postoperative pregnancy in women with recurrent endometriosis. Zhou et al. (2019)
EFI
+ Accuracy for the prediction of non-ART pregnancy 68 XX Single-center Surgical confirmation Among the 68 women, the mean EFI scores of those who did not and those who did become pregnant were 5.43 ± 0.36 and 6.88 ± 0.28, respectively. The relation between EFI and natural pregnancy was significant (cumulative overall PR, P = 0.006), whereas rAFS stage was not (univariate logistics, P = 0.853). The cut-point for maximum natural pregnancy outcomes was 6 (area under ROC curve = 0.710, 95% CI 0.586–0.835). Kim et al. (2019) *

+ Accuracy for the prediction of non-ART pregnancy 1097 29.8 years (20–46) Single-center Surgical confirmation The difference in cumulative pregnancy incidence among EFI scores 10, 7–9, 4–6, and 2–3 was statistically significant (Kaplan–Meier survival analysis). A significant relationship was found between EFI and time to achieving pregnancy. Zhang et al. (2018) *

+ Accuracy for the prediction of non-ART pregnancy 235 34 years (20–47) 2 centers Histologic confirmation The EFI was highly associated with live births (P < 0.001): for EFI of 0–2, the estimated cumulative non-ART LBR at 5 years was 0% and steadily increased up to 91% with an EFI of 9–10, while the proportion of women who attempted ART and had a live birth, steadily increased from 38% to 71% among the same EFI strata (P = 0.1). A low least function score was the most significant predictor of failure, followed by having had a previous resection or incomplete resection, being older than 40 compared to <35 years, and having leiomyomas. Maheux-Lacroix et al. (2017) *
EFI
+ Accuracy for the prediction of non-ART pregnancy and ART pregnancy 196 32.3 years (SD 4.8) Single center Surgical confirmation The cumulative PR was 76%. The PR, non-ART PR and ART PR for EFI ≤4 were 42.3%, 0% and 50%; for EFI 5–6, 67.9%, 30.5% and 60.6%; and for EFI ≥7, 87.7%, 48.2% and 80.3%, respectively. The benefit of ART was inversely correlated with the mean EFI score. On multivariate analysis, the EFI score was significantly associated with non-ART pregnancy (OR 1.629, 95% CI 1.235–2.150). Boujenah et al. (2017) *

  • +

  • ART outcome

+ Accuracy for the prediction of non-ART pregnancy + use for treatment selection (Surgery vs surgery + IVF-ET) 345 32.2 years (22.0–45.0) Single center Histologic confirmation Significant differences in spontaneous PRs among different EFI scores were identified (chi2 = 29.945, P < 0.05). The least function score was proved to be the most important factor for EFI. In patients with an EFI score ≥5 after 12 months from surgery, the cumulative PRs of those who received both surgery and IVF-ET were much higher than the spontaneous PRs of those who received surgery alone (chi2 = 4.16, ns). Li et al. (2017) *

  • +

  • ART outcome

+ Accuracy for the prediction of non-ART pregnancy + use for treatment selection (Surgery vs surgery + IVF-ET) 412 32.5 years (SD 4.6) Single center Histologic confirmation A significant relationship between EFI and spontaneous PR was observed at 12 months (P = 0.001). The least function score and complete removal of endometriotic lesions and pelvic adhesions were significantly associated with spontaneous pregnancy (P = 0.006). Cumulative PR at 18 months was 78.8%. ART benefits were higher for patients with poor EFI. Boujenah et al. (2015) *
EFI
  • +

  • non-ART/ART outcome

Accuracy for the prediction of non-ART pregnancy and ART pregnancy 104 34.5 years (SD 4.5) Single center Surgical confirmation Differences in time to non-ART pregnancy for the six EFI groups were statistically significant (log-rank, P = 1.4 × 10⁻⁴). The AUC for EFI as ART outcome predictor was 0.75 (95% CI 0.61–0.89, P = 6.2 × 10⁻³), while the best cut-point for pregnancy was 5.5. Garavaglia et al. (2015) *

+ Accuracy for the prediction of non-ART pregnancy 161 32.08 years (22–40) Single center Surgical confirmation Comparison to rAFS: Significant differences in cumulative PRs were observed among EFI scores (EFI score 0–3, 8.3%; EFI score 4–7 41.2%, and EFI score 8–10 60.9%; chi2 = 16.254, P < 0.001). EFI scores, but not rAFS stage, predict PRs in patients with endometriosis-associated infertility. Zeng et al. (2014) *

  • +

  • ART outcome

Ability of the EFI score and rAFS classification for predicting IVF outcomes 199 32.0 years (SD 4.2) Single center Histologic confirmation Comparison to rAFS: The AUC of the EFI score (AUC = 0.641, standard error (SE) = 0.039, 95% CI = 0.564–0.717, cutoff score = 6) was significantly larger than that of the r-AFS classification (AUC = 0.445, SE = 0.041, and 95% CI = 0.364–0.526). The antral follicle count, estradiol level on day of hCG, number of oocytes retrieved, number of oocytes fertilised, number of cleaved embryos, implantation rate, CPR, and cumulative pregnancy rate were greater in the ≥6 EFI score group compared to the ≤5 EFI score group. EFI has more predictive power for IVF outcomes than r-AFS. Wang et al. (2013)
EFI
+ Accuracy for the prediction of non-ART pregnancy 233 31.3 years (SD 3.9) Single center Surgical confirmation Highly significant relationship between EFI and the time to non-ART pregnancy (P = 0.0004), with the K-M estimate of cumulative overall PR at 12 months after surgery equal to 45.5% (95% CI 39.47–49.87) ranging from 16.67% (95% CI 5.01–47.65) for EFI scores 0–3, to 62.55% (95% CI 55.18–69.94) for EFI scores 9–10. For each increase of 1 point in the EFI score, the relative risk of becoming pregnant increased by 31% (95% CI 16–47%; i.e. HR 1.31). The ‘least function score’ was found to be the most important contributor to the total EFI score. Tomassetti et al. (2013) *
ECO system
+ Validation 166 34.0 years (SD 7.2) 2 centers Histologic confirmation Among patients, 78 (47.0%) were medically treated and 88 (53.0%) underwent therapeutic laparoscopy. All 3 patients scoring 2 had undergone hormonal treatment. Among 51 patients scoring 3, 49 (96.1%) were clinically managed and 2 (3.9%) underwent surgery. Among 52 patients scoring 4, 26 (50.0%) had undergone medical treatment and 26 (50.0%) surgical treatment. All 56 patients who scored 5 and the 4 patients who scored 6 underwent surgery. Lasmar et al. (2015)
rASRM/rAFS/AFS
Accuracy for the prediction of non-ART pregnancy 161 32.08 years (22–40) Single center Surgical confirmation Comparison to EFI: The cumulative PR 36 months after surgery was 46.6% (stage I, 53.6%; stage II, 36.0%; stage III, 51.7%, and stage IV, 41.7%; chi2 = 4.143, P = 0.246). In the 1st year, PRs significantly differed between patients with rAFS stage IV and those with stages I–III (chi2 = 6.024, P = 0.014). rAFS stage did not predict PR in patients with endometriosis-associated infertility. Zeng et al. (2014)

  • ART

Ability of rAFS (vs EFI) to predict IVF outcomes 199 32.0 years (SD 4.2) Single center Histologic confirmation Comparison to EFI: The AUC of the EFI score was significantly larger than that of the r-AFS classification (AUC = 0.445, SE = 0.041, and 95% CI = 0.364–0.526). Wang et al. (2013)

+ Correlation with Clavien-Dindo complication grading 401 34.8 years (SD 8.73) Single center Histologic confirmation rASRM IV was a risk factor for the length of hospital stay. Clavien-Dindo Grade III complications were significantly associated with rASRM stage IV. Nicolaus et al. (2020)
rASRM/rAFS/AFS
Acceptable Inter-observer agreement 148 32.0 years (SD 6.7) Single center 105 women with and 43 women without a postoperative endometriosis diagnosis Surgeons and expert reviewers had substantial agreement on diagnosis and staging after viewing digital images (n = 148; mean κ = 0.67, range 0.61–0.69; mean κ = 0.64, range 0.53–0.78, respectively) and after additionally viewing operative reports (n = 148; mean κ = 0.88, range 0.85–0.89; mean κ = 0.85, range 0.84–0.86, respectively). Although additionally viewing MRI findings (n = 36) did not greatly impact agreement, agreement substantially decreased after viewing histological findings (n = 67), with expert reviewers changing their assessment from a positive to a negative diagnosis in up to 20% of cases. Schliep et al. (2017)

+ Prognostic value of individual adhesion scores for recurrence 379 31.8 years (SD 6.7) Single center histologic confirmation In endometriosis of advanced stage, younger age at the time of surgery, bilateral ovarian cysts at the time of diagnosis, a rAFS ovarian adhesion score >24, and complete cul-de-sac obliteration were independent risk factors of poor outcomes, and a rAFS ovarian adhesion score >24 had the highest risk of recurrence [hazard ratio = 2.948 (95% CI: 1.116–7.789), P = 0.029]. Yun et al. (2015)
rASRM/rAFS/AFS
  • ns

  • AFC, FSH

Correlation with the number of follicles, the level of FSH 39 28.7 years (22–34) Single center Surgical confirmation No statistically significant correlation between the AFC, the level of FSH and the stage of endometriosis was found. Posadzka et al. (2014)

+

ART outcome

Prediction of IVF outcome 40 34.7 years (SD 4.3) Single center Surgical confirmation Higher cancelation rates, higher total gonadotropin requirements, and lower oocyte yield were found in women with endometriosis Stage III and IV compared with both the Stage I/II and control groups. The fertilization rate was higher in Stage III/IV endometriosis compared to Stage I/II. CPR and LBR were comparable between patients with endometriosis Stage I/II and the control group, whereas they were significantly lower in patients with endometriosis Stage III/IV compared to the other two groups. Pop-Trajkovic et al. (2014)

  • ART outcome

Correlation of rASRM stage with outcome of ART treatment 1764 (11) Not applicable Not applicable Not reported Comparison of women with Stage-III/IV vs Stage-I/II endometriosis: LBR, RR = 0.94 (95% CI, 0.80–1.11); CPR, RR = 0.90 (95% CI, 0.82–1.00); miscarriage, RR = 0.99 (95% CI, 0.73–1.36); number of oocytes retrieved, MD = −1.03 (95% CI, −1.67 to −0.39). No relevant difference between Stage-III/IV and Stage-I/II in LBR following ART.

Meta-analysis

Barbosa et al. (2014)

rASRM/rAFS/AFS
Acceptable (surgeons) Interrater and intrarater reliability (8 experts) 148 Not reported Single center Not reported The interrater reliability for endometriosis diagnosis among the 8 surgeons was substantial: Fleiss kappa = 0.69 (95% CI 0.64–0.74). Surgeons agreed on revised ASRM endometriosis staging criteria after experienced assessment in a majority of cases (mean 61%, range 52–75%) with moderate interrater reliability: Fleiss kappa = 0.44 (95% CI 0.41–0.47). Schliep et al. (2012)

Correlation with symptoms 319 Age categories reported Single center Surgical confirmation, histologic confirmation in 72.9% A correlation between endometriosis stage and severity of symptoms was observed only for dysmenorrhea (chi2 = 5.14, P = 0.02) and non-menstrual pain (chi2 = 5.63, P = 0.018). However, the point estimates of ORs were very close to unity (respectively, 1.33, 95% CI 1.04–1.71, and 1.01, 95% CI 1.00–1.03). The association between endometriosis stage and severity of pelvic symptoms was marginal and inconsistent. Vercellini et al. (2007)
rASRM/rAFS/AFS
  • pain recurrence

  • + relapse

  • pregnancy

Predictive value for response to surgical treatment 537 Age categories reported Single center Histologic confirmation The cumulative probability of pregnancy at 3 years from surgery was 47% (51% at stage I, 45% at stage II, 46% at stage III and 44% at stage IV; chi2 = 1.50, ns). The cumulative probability of moderate or severe dysmenorrhoea recurrence in 425 symptomatic subjects was 24% (32% at stage I, 24% at stage II, 21% at stage III and 19% at stage IV; chi2 = 6.39, ns). The cumulative probability of disease relapse was 12% (3% at stage I, 11% at stage II, 11% at stage III and 23% at stage IV; chi2 = 24.95, P = 0.0001). Vercellini et al. (2006)

+ +
  • +

  • pain

Association with type and severity of pain, and with symptoms after laparoscopic surgery 95 Not reported Single center Surgical confirmation In patients with AFS ≥16, preoperative pain scores were significantly higher for dysmenorrhea (P = 0.0022) and deep dyspareunia (P < 0.0001) but not for non-menstrual pain. After surgery, dysmenorrhea improved in 43% of cases in patients with AFS <16 vs 66% with AFS ≥16 (P = 0.0037). For deep dyspareunia, improvement was reported by 33% and 67%, respectively (ns). Improvement in non-menstrual pain was not significantly different (67% vs 56%). Cases with advanced disease benefit the most from laparoscopy. Milingos et al. (2006)
rASRM/rAFS/AFS
Impact of treatments on pain + association pain scores 181 Not reported Single center Histologic confirmation No correlation was found between the stage of endometriosis according to R-AFS score and the severity of CPP. Szendei et al. (2005)

Variable Comparison of laparoscopic and laparotomic scoring 84 Not reported Single center Surgical confirmation There was considerable variability in laparoscopic vs laparotomic scoring by the same observer, with largest variability in ovarian endometriosis and cul-de-sac obliteration subscores, and least variability for peritoneum endometriosis. The inter-method variation was sufficient to alter the staging in 34.5% of patients, with a difference of two stages in 3.6% of patients. In general, there was fair-to-good agreement (kappa coefficient 0.49). Lin et al. (1998)

  • ART pregnancy

Impact of severity of endometriosis on the outcome of IVF 61 33.6 years (SD 3.0) and 34.4 years (SD 4.0) Single center Surgical confirmation Response to COH and the number, maturity, and quality of the oocytes was comparable between stages. Fertilization rates for oocytes of patients with stages III/IV were significantly impaired compared to those in stage I/II (P = 0.004). The implantation rate, CPR, and miscarriage rate were comparable between stages I/II and stages III/IV. Pal et al. (1998)
rASRM/rAFS/AFS
Variable Intraobserver and interobserver variability—5 experts 20 Not reported Single center Not reported The grand total score varied with an SD of 13.44 when the videotape of a single patient was rated twice by the same observer and varied with an SD of 17.12 when rated by two observers. The greatest variability occurred in endometriosis of the ovary and cul-de-sac obliteration, with less variability for peritoneum endometriosis and for ovarian and tubal adhesions. Comparison of intraobserver and interobserver scores resulted in a change in endometriosis stage in 38% and 52% of patients, respectively. Hornstein et al. (1993)

Acceptable Reproducibility—2 experts 315 Not reported Single center Not reported Good to fair agreement scoring endometriosis between the investigator and the blinded reviewer was noted. Rock (1995)

No analysis Feasibility of AFS and adnexal score 89 Not reported Single center Surgical confirmation Suggestion to split class IV in class IV and class V (with higher rate of bilateral adnexal disease/adhesions). Canis et al. (1992)

  • age, symptoms

Relation with endometriosis-associated symptoms and patients’ age. 206 30 years (18–44) Single center Surgical confirmation No significant differences were found in total endometriosis scores, in active scores or in adhesion scores in different age groups. There was no significant difference in prevalence rate of symptoms for different aspects of endometriosis (implants, cysts or adhesions). AFS score does not reflect the intensity of symptoms. Marana et al. (1991)
rASRM/rAFS/AFS
Variable Feasibility of measuring endometrioma 52 29.5 years (24–39) Single center NA Cyst diameter was calculated using the geometric formula $r = \sqrt[3]{3V/(4\pi)}$, where $r$ = radius and $V$ = volume of liquid aspirated. Eight patients with apparently normal pelvis had endometriosis, and 14 with apparent minimal or mild endometriotic lesions were restaged. Laparoscopic ovarian puncture of enlarged ovaries was important for correct diagnosis and staging of endometriosis. Candiani et al. (1990)

  • pregnancy

Relation with pregnancy after therapy 214 Not reported Single center Surgical confirmation The AFS scale poorly specifies the relation between severity of disease and pregnancy outcome after therapy. A nonparametric monotonic estimator, generating a relationship between AFS score and pregnancy following treatment is shown to improve the discriminatory power of the AFS scale. Guzick et al. (1982)

  • +

  • pregnancy

(+ Kistner, Buttram) Prediction of pregnancy 214 28.6 years (17–37) Single center Surgical confirmation The AFS score revealed significant differences in pregnancy rate only if categories were combined (mild plus moderate versus severe plus extensive, P ≤ 0.05). The AFS system revealed that pregnancy success was significantly reduced if an ovarian endometrioma was greater than 3 cm or had ruptured (P ≤ 0.01). Rock et al. (1981)

The symbols should be interpreted as follows: + indicates a significant positive result in a correlation (or similar) test, − indicates a significant negative result in a correlation (or similar) test, ns indicates a non-conclusive/non-significant result in a correlation (or similar) test. The highlighted columns represent the intended purpose of the classification/staging system (as in Table I).

AFC, antral follicle count; AFS, American Fertility Society; AGES, Australasian Gynaecological Endoscopy and Surgery; COH, controlled ovarian hyperstimulation; CPP, chronic pelvic pain; CPR, clinical pregnancy rate; DE, deep endometriosis; EFI, endometriosis fertility index; HR, hazard ratio; IVF-ET, IVF embryo transfer; K-M, Kaplan–Meier; LBR, live birth rate; NPV, negative predictive value; PPV, positive predictive value; RANZCOG, Royal Australian and New Zealand College of Obstetricians and Gynaecologists; RR, relative risk.

* Study included in meta-analysis (Vesali et al., 2020).
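
The positive and negative likelihood ratios reported for UBESS in the Tompsett et al. (2019) row follow from sensitivity and specificity by the standard definitions, LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The following is a minimal sketch of that arithmetic, not the study's own analysis; the function name `likelihood_ratios` is hypothetical, and the inputs are the rounded percentages from the table, so results may differ slightly from values the authors derived from raw counts.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for sensitivity and specificity given as fractions.

    LR+ is returned as None when specificity is 1, because the ratio is
    undefined (reported as "not applicable" in the table).
    """
    lr_neg = (1 - sensitivity) / specificity
    lr_pos = None if specificity == 1 else sensitivity / (1 - specificity)
    return lr_pos, lr_neg


# Rounded sensitivity/specificity values copied from the Tompsett et al. (2019) row.
ubess = {
    "UBESS I": (0.989, 1.000),
    "UBESS II": (0.968, 0.984),
    "UBESS III": (0.972, 0.992),
}

for name, (sens, spec) in ubess.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    pos_text = "not applicable" if lr_pos is None else f"{lr_pos:.1f}"
    print(f"{name}: LR+ = {pos_text}, LR- = {lr_neg:.3f}")
```

For UBESS II this gives LR+ of about 60.5 and LR− of about 0.033, in line with the tabulated 60 and 0.033. For UBESS III the rounded inputs do not reproduce the tabulated 115.7 exactly, which is expected when a ratio with a small denominator is recomputed from rounded percentages.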