2021 Oct 22;2021(4):hoab025. doi: 10.1093/hropen/hoab025

Table II.

Overview of replication, validation and clinical value of published systems.

Diagnosis/pre-operative assessment	Description	Treatment selection	Prediction of difficulty of surgery/complication	Prediction of pain remediation/QoL	Prediction of conception	Feasibility	Interobserver agreement	Aim of the study	Sample size	Age mean (range or SD)	Population source	Endometriosis case definition	Main results	Reference

ENZIAN
			+ length of hospital stays					Correlation with Clavien-Dindo complication grading	401	34.8 years (SD 8.73)	Single center	Histologic confirmation	ENZIAN A2, C1, C3 and FA were risk factors for the length of hospital stay.	Nicolaus et al. (2020)

+								ENZIAN (MRI) Correlation with intraoperative findings	63	33.5 years (22–49)	2 centers	Surgical confirmation	Sensitivity and NPV of MRI confirmed by surgery were 95.2% and 91.7% (lesions in the vaginal/rectovaginal space), 78.4% and 56% (utero-sacral ligaments), 91.4% and 89.7% (rectum/sigmoid colon), 57.1% and 94.1% (myometrium), 85.7% and 98.3% (bladder), and 73.3% and 92.2% (intestine), respectively.	Burla et al. (2019)

	No analysis							Application of the rENZIAN system	60	30.5 years (28.6–32.3)	Single center	Laparoscopic diagnosis	The medial compartment was the most frequently affected (80% of cases, mainly ovarian endometriomas), followed by the posterior compartment (65%) and, less frequently, the anterior compartment.	Morgan-Ortiz et al. (2018)

+								ENZIAN (MRI) Accuracy of the score compared to surgical-pathologic findings	115	36 years (20.3–48)	Single center	Histologic confirmation	The sensitivity, specificity, accuracy, PPV and NPV of MRI were 94%, 97%, 95%, 99%, 86%, respectively. The highest accuracy was for adenomyosis (100%) and endometriosis of utero-sacral ligaments (98%), slightly lower for vagina-rectovaginal septum and colorectal walls (96%), and the lowest for bladder endometriosis (92%). The concordance with histopathology was excellent.	Di Paola et al. (2015)
			+ Operating time					Preoperative estimation of laparoscopic operating time	151	31 years (19–53 years)	Single center	Histologic confirmation	An ENZIAN-based model for estimating operating time for DE, assuming complication-free procedures (model’s predictive power: P < 0.001). The error of estimation for the operating time prediction is 0 ± 35.35 min (range −83 to +117 min).	Haas et al. (2013a)

	+							Identification of duplicate classifications of the same lesions	219	Not reported	Single center	Histologic confirmation	Comparison to rAFS: The severity of DE according to ENZIAN was as follows: grade 1: 45%; grade 2: 26%; grade 3: 19%; grade 4: 10%. Fifty-eight patients were classified according to ENZIAN although they did not fulfill the criteria of DE and had previously been classified according to the rAFS classification. Adaptation of the ENZIAN score would reduce the diagnoses of DE by 36% (95% CI: 29–44%).	Haas et al. (2011)
UBESS
			+					Correlation with RCOG laparoscopic level of complexity 1, 2, and 3	293	Not reported	Multi-center	History of chronic pelvic pain or endometriosis	Strengths: UBESS predicted the requirement for RCOG level 3 laparoscopic surgical skills (accuracy, 89.4–95.4%). Limitation: misclassification of women who require surgical ureterolysis in the absence of bowel DE.	Espada et al. (2020)

			−					Correlation with the difficulty of the surgery	33	32.8 years (SD 7.7)	Single center	Histologic confirmation	Weak concordance between pre-operative UBESS score and the difficulty of the surgery (RCOG, concordance Kendall Tau 0.22) and between UBESS and CHI (concordance 0.30).	Chaabane et al. (2019)
			+ surgical skills					Validation for predicting the correct RANZCOG/AGES’ laparoscopic skill level.	155	32.7 years (SD 8.6)	Multi-center	history of chronic pelvic pain and/or endometriosis	The accuracy, sensitivity, specificity, PPV and NPV, and positive and negative likelihood ratios of the UBESS I to predict the RANZCOG/AGES surgical skill levels 1/2 were 99.4%, 98.9%, 100%, 100%, 98.5%, not applicable, and 0.011; those of UBESS II to predict surgical skill levels 3/4 were: 98.1%, 96.8%, 98.4%, 93.8%, 99.2%, 60 and 0.033, and those for UBESS III to predict surgical skill level 6 were: 98.7%, 97.2%, 99.2%, 97.2%, 99.2%, 115.7, and 0.028. The rate of correctly predicting the exact level of skills needed was 98.1%, and Cohen’s kappa statistic for the agreement between UBESS prediction and levels of training required at surgery was 0.97, indicating almost perfect agreement.	Tompsett et al. (2019)
EFI
					+			Accuracy for the prediction of non-ART pregnancy	4598	NA	NA		Cumulative non-ART pregnancy rate at 36 months increased from 10% (95% CI: 3, 16%) in women with EFI score 0–2 to 69% (95% CI: 58, 79%) in women with EFI score 9–10, with a significant increase for each score category (0–2, 3–4, 5–6, 7–8, 9–10).	Meta-analysis Vesali et al. (2020)
							Acceptable	Reproducibility among three experts	82	Reproductive age as inclusion criterion	Single-center	Surgical confirmation	A near ‘inter-expert’ clinical agreement rate (1.000, 95% CI 0.956–1.000; P = 0.0149) was observed. The numerical agreement between two experts was also high (0.988, 95% CI 0.934–1.000); similarly, high agreement rates were observed for both ‘junior-expert’ comparisons (clinical 0.963, 95% CI 0.897–0.992; numerical 0.988, 95% CI 0.934–1.000) and ‘intra-expert’ comparisons (clinical 0.988, 95% CI 0.934–1.000; numerical 1.000, 95% CI 0.956–1.000).	Tomassetti et al. (2020)

					+ Conception rate			Accuracy for the prediction of pregnancy	123	32.4 years (no range)	Single-center	Surgical confirmation	8 (40%) patients with low, 20 (58.82%) with moderate, and 26 (96.29%) with high EFI conceived. EFI score showed statistically significant positive correlation with pregnancy outcome P = 0.001. Patients conceived spontaneously, after ovulation induction (±IUI) or after IVF.	Negi et al. (2019)

					+			Accuracy for the prediction of non-ART pregnancy in recurrent endometriosis	107	31.1 years (SD 0.39)	Single-center	Surgical confirmation—recurrent endometriosis	Cumulative pregnancy rates (CPR) during the first 2 years were 51.86% in women with EFI ≥5, and 26.00% in women with EFI <5. At 3- and 5-year post-surgery, the CPR increased further in women with EFI ≥5, but not in women with EFI <5. The EFI score had good predictive power for postoperative pregnancy in women with recurrent endometriosis.	Zhou et al. (2019)
					+			Accuracy for the prediction of non-ART pregnancy	68	XX	Single-center	Surgical confirmation	Among the 68 women, the mean EFI scores were 5.43 ± 0.36 in those who did not become pregnant and 6.88 ± 0.28 in those who did. The relation between EFI and natural pregnancy was significant (cumulative overall PR, P = 0.006), whereas rAFS stage was not (univariate logistic regression, P = 0.853). The cut-point for maximum natural pregnancy outcomes was 6 (area under ROC curve = 0.710, 95% CI 0.586–0.835).	Kim et al. (2019) ^*

					+			Accuracy for the prediction of non-ART pregnancy	1097	29.8 years (20–46)	Single-center	Surgical confirmation	The difference in cumulative pregnancy incidence among EFI scores 10, 7–9, 4–6, and 2–3 was statistically significant (Kaplan–Meier survival analysis). A significant relationship was found between EFI and time to achieving pregnancy.	Zhang et al. (2018) ^*

					+			Accuracy for the prediction of non-ART pregnancy	235	34 years (20–47)	2 centers	Histologic confirmation	The EFI was highly associated with live births (P < 0.001): for EFI of 0–2, the estimated cumulative non-ART LBR at 5 years was 0% and steadily increased up to 91% with an EFI of 9–10, while the proportion of women who attempted ART and had a live birth, steadily increased from 38% to 71% among the same EFI strata (P = 0.1). A low least function score was the most significant predictor of failure, followed by having had a previous resection or incomplete resection, being older than 40 compared to <35 years, and having leiomyomas.	Maheux-Lacroix et al. (2017) ^*
					+			Accuracy for the prediction of non-ART pregnancy and ART pregnancy	196	32.3 years (SD 4.8)	Single center	Surgical confirmation	The cumulative PR was 76%. The PR, non-ART PR and ART PR for EFI ≤4 were 42.3%, 0% and 50%; for EFI 5–6, 67.9%, 30.5% and 60.6%; and for EFI ≥7, 87.7%, 48.2% and 80.3%, respectively. The benefit of ART was inversely correlated with the mean EFI score. On multivariate analysis, the EFI score was significantly associated with non-ART pregnancy (OR 1.629, 95% CI 1.235–2.150).	Boujenah et al. (2017) ^*

		+ ART outcome			+			Accuracy for the prediction of non-ART pregnancy + use for treatment selection (Surgery vs surgery + IVF-ET)	345	32.2 years (22.0–45.0)	Single center	Histologic confirmation	Significant differences in spontaneous PRs among different EFI scores were identified (chi² = 29.945, P < 0.05). The least function score proved to be the most important factor for EFI. In patients with an EFI score ≥5 after 12 months from surgery, the cumulative PRs of those who received both surgery and IVF-ET were much higher than the spontaneous PRs of those who received surgery alone (chi² = 4.16, ns).	Li et al. (2017) ^*

		+ ART outcome			+			Accuracy for the prediction of non-ART pregnancy + use for treatment selection (Surgery vs surgery + IVF-ET)	412	32.5 years (SD 4.6)	Single center	Histologic confirmation	A significant relationship between EFI and spontaneous PR was observed at 12 months (P = 0.001). The least function score and complete removal of endometriotic lesions and pelvic adhesions were significantly associated with spontaneous pregnancy (P = 0.006). Cumulative PR at 18 months was 78.8%. ART benefits were higher for patients with poor EFI.	Boujenah et al. (2015) ^*
					+ non-ART/ART outcome			Accuracy for the prediction of non-ART pregnancy and ART pregnancy	104	34.5 years (SD 4.5)	Single center	Surgical confirmation	Differences in time to non-ART pregnancy for the six EFI groups were statistically significant (log-rank, P = 1.4 × 10⁻⁴). The AUC for EFI as ART outcome predictor was 0.75 (95% CI 0.61–0.89, P = 6.2 × 10⁻³), while the best cut-point for pregnancy was 5.5.	Garavaglia et al. (2015) ^*

					+			Accuracy for the prediction of non-ART pregnancy	161	32.08 years (22–40)	Single center	Surgical confirmation	Comparison to rAFS: Significant differences in cumulative PRs were observed among EFI scores (EFI score 0–3, 8.3%; EFI score 4–7 41.2%, and EFI score 8–10 60.9%; chi² = 16.254, P < 0.001). EFI scores, but not rAFS stage, predict PRs in patients with endometriosis-associated infertility.	Zeng et al. (2014) ^*

					+ ART outcome			Ability of the EFI score and rAFS classification for predicting IVF outcomes	199	32.0 years (SD 4.2)	Single center	Histologic confirmation	Comparison to rAFS: The AUC of the EFI score (AUC = 0.641, standard error (SE) = 0.039, 95% CI = 0.564–0.717, cutoff score = 6) was significantly larger than that of the r-AFS classification (AUC = 0.445, SE = 0.041, and 95% CI = 0.364–0.526). The antral follicle count, estradiol level on day of hCG, number of oocytes retrieved, number of oocytes fertilised, number of cleaved embryos, implantation rate, CPR, and cumulative pregnancy rate were greater in the ≥6 EFI score group compared to the ≤5 EFI score group. EFI has more predictive power for IVF outcomes than r-AFS.	Wang et al. (2013)
					+			Accuracy for the prediction of non-ART pregnancy	233	31.3 years (SD 3.9)	Single center	Surgical confirmation	Highly significant relationship between EFI and the time to non-ART pregnancy (P = 0.0004), with the K-M estimate of cumulative overall PR at 12 months after surgery equal to 45.5% (95% CI 39.47–49.87) ranging from 16.67% (95% CI 5.01–47.65) for EFI scores 0–3, to 62.55% (95% CI 55.18–69.94) for EFI scores 9–10. For each increase of 1 point in the EFI score, the relative risk of becoming pregnant increased by 31% (95% CI 16–47%; i.e. HR 1.31). The ‘least function score’ was found to be the most important contributor to the total EFI score.	Tomassetti et al. (2013) ^*
ECO system
		+						Validation	166	34.0 years (SD 7.2)	2 centers	Histologic confirmation	Among patients, 78 (47.0%) were medically treated and 88 (53.0%) underwent therapeutic laparoscopy. All 3 patients scoring 2 had undergone hormonal treatment. Among 51 patients scoring 3, 49 (96.1%) were clinically managed and 2 (3.9%) underwent surgery. Among 52 patients scoring 4, 26 (50.0%) had undergone medical treatment and 26 (50.0%) surgical treatment. All 56 patients who scored 5 and the 4 patients who scored 6 underwent surgery.	Lasmar et al. (2015)
rASRM/rAFS/AFS
					−			Accuracy for the prediction of non-ART pregnancy	161	32.08 years (22–40)	Single center	Surgical confirmation	Comparison to EFI: The cumulative PR 36 months after surgery was 46.6% (stage I, 53.6%; stage II, 36.0%; stage III, 51.7%, and stage IV, 41.7%; chi² = 4.143, P = 0.246). In the 1st year, PRs significantly differed between patients with rAFS stage IV and those with stages I–III (chi² = 6.024, P = 0.014). rAFS stage did not predict PR in patients with endometriosis-associated infertility.	Zeng et al. (2014)

					− ART			Ability of rAFS (vs EFI) to predict IVF outcomes	199	32.0 years (SD 4.2)	Single center	Histologic confirmation	Comparison to EFI: The AUC of the EFI score was significantly larger than that of the r-AFS classification (AUC = 0.445, SE = 0.041, and 95% CI = 0.364–0.526).	Wang et al. (2013)

			+					Correlation with Clavien-Dindo complication grading	401	34.8 years (SD 8.73)	Single center	Histologic confirmation	rASRM IV was a risk factor for the length of hospital stay. Clavien-Dindo Grade III complications were significantly associated with rASRM stage IV.	Nicolaus et al. (2020)
							Acceptable	Inter-observer agreement	148	32.0 years (SD 6.7)	Single center	105 women with and 43 women without a postoperative endometriosis diagnosis	Surgeons and expert reviewers had substantial agreement on diagnosis and staging after viewing digital images (n = 148; mean κ = 0.67, range 0.61–0.69; mean κ = 0.64, range 0.53–0.78, respectively) and after additionally viewing operative reports (n = 148; mean κ = 0.88, range 0.85–0.89; mean κ = 0.85, range 0.84–0.86, respectively). Although additionally viewing MRI findings (n = 36) did not greatly impact agreement, agreement substantially decreased after viewing histological findings (n = 67), with expert reviewers changing their assessment from a positive to a negative diagnosis in up to 20% of cases.	Schliep et al. (2017)

				+				Prognostic value of individual adhesion scores for recurrence	379	31.8 years (SD 6.7)	Single center	Histologic confirmation	In endometriosis of advanced stage, younger age at the time of surgery, bilateral ovarian cysts at the time of diagnosis, a rAFS ovarian adhesion score >24, and complete cul-de-sac obliteration were independent risk factors of poor outcomes, and a rAFS ovarian adhesion score >24 had the highest risk of recurrence [hazard ratio = 2.948 (95% CI: 1.116–7.789), P = 0.029].	Yun et al. (2015)
					ns AFC, FSH			Correlation with the number of follicles, the level of FSH	39	28.7 years (22–34)	Single center	Surgical confirmation	No statistically significant correlation between the AFC, the level of FSH and the stage of endometriosis was found.	Posadzka et al. (2014)

					+ ART outcome			Prediction of IVF outcome	40	34.7 years (SD 4.3)	Single center	Surgical confirmation	Higher cancelation rates, higher total gonadotropin requirements, and lower oocyte yield were found in women with endometriosis Stage III and IV compared with both the Stage I/II and control groups. The fertilization rate was higher in Stage III/IV endometriosis compared to Stage I/II. CPR and LBR were comparable between patients with endometriosis Stage I/II and control group, whereas they were significantly lower in patients with endometriosis Stage III/IV compared to other two groups.	Pop-Trajkovic et al. (2014)

					− ART outcome			Correlation of rASRM stage with outcome ART treatment	1764 (11)	Not applicable	Not applicable	Not reported	Comparison of women with Stage-III/IV vs Stage-I/II endometriosis: LBR, RR = 0.94 (95% CI, 0.80–1.11); CPR, RR = 0.90 (95% CI, 0.82–1.00); miscarriage, RR = 0.99 (95% CI, 0.73–1.36); number of oocytes retrieved, MD = −1.03 (95% CI, −1.67 to −0.39). No relevant difference between Stage-III/IV and Stage-I/II in LBR following ART.	Meta-analysis Barbosa et al. (2014)
							Acceptable (surgeons)	Interrater and intrarater reliability (8 experts)	148	Not reported	Single center	Not reported	The interrater reliability for endometriosis diagnosis among the 8 surgeons was substantial: Fleiss kappa = 0.69 (95% CI 0.64–0.74). Surgeons agreed on revised ASRM endometriosis staging criteria after experienced assessment in a majority of cases (mean 61%, range 52–75%) with moderate interrater reliability: Fleiss kappa = 0.44 (95% CI 0.41–0.47).	Schliep et al. (2012)

−								Correlation with symptoms	319	Age categories reported	Single center	Surgical confirmation, histologic confirmation in 72.9%	A correlation between endometriosis stage and severity of symptoms was observed only for dysmenorrhea (chi² = 5.14, P = 0.02) and non-menstrual pain (chi² = 5.63, P = 0.018). However, the point estimates of ORs were very close to unity (respectively, 1.33, 95% CI 1.04–1.71, and 1.01, 95% CI 1.00–1.03). The association between endometriosis stage and severity of pelvic symptoms was marginal and inconsistent.	Vercellini et al. (2007)
				− pain recurrence + relapse	− pregnancy			Predictive value for response to surgical treatment	537	Age categories reported	Single center	Histologic confirmation	The cumulative probability of pregnancy at 3 years from surgery was 47% (51% at stage I, 45% at stage II, 46% at stage III and 44% at stage IV; chi² = 1.50, ns). The cumulative probability of moderate or severe dysmenorrhoea recurrence in 425 symptomatic subjects was 24% (32% at stage I, 24% at stage II, 21% at stage III and 19% at stage IV; chi² = 6.39, ns). The cumulative probability of disease relapse was 12% (3% at stage I, 11% at stage II, 11% at stage III and 23% at stage IV; chi² = 24.95, P = 0.0001).	Vercellini et al. (2006)

+		+		+ pain				Association with type and severity of pain, and with symptoms after laparoscopic surgery	95	Not reported	Single center	Surgical confirmation	In patients with AFS ≥16, preoperative pain scores were significantly higher for dysmenorrhea (P = 0.0022) and deep dyspareunia (P < 0.0001) but not for non-menstrual pain. After surgery, dysmenorrhea improved in 43% of cases in patients with AFS <16 vs 66% with AFS ≥16 (P = 0.0037). For deep dyspareunia, improvement was reported by 33% and 67%, respectively (ns). Improvement in non-menstrual pain was not significantly different (67% vs 56%). Cases with advanced disease benefit the most from laparoscopy.	Milingos et al. (2006)
−				−				Impact of treatments on pain + association pain scores	181	Not reported	Single center	Histologic confirmation	No correlation was found between the stage of endometriosis according to R-AFS score and the severity of CPP.	Szendei et al. (2005)

						Variable		Comparison of laparoscopic and laparotomic scoring	84	Not reported	Single center	Surgical confirmation	There was considerable variability in laparoscopic vs laparotomic scoring by the same observer, with largest variability in ovarian endometriosis and cul-de-sac obliteration subscores, and least variability for peritoneum endometriosis. The inter-method variation was sufficient to alter the staging in 34.5% of patients, with a difference of two stages in 3.6% of patients. In general, there was fair-to-good agreement (kappa coefficient 0.49).	Lin et al. (1998)

					− ART pregnancy			Impact of severity of endometriosis on the outcome of IVF	61	33.6 years (SD 3.0) and 34.4 years (SD 4.0)	Single center	Surgical confirmation	Response to COH and the number, maturity, and quality of the oocytes was comparable between stages. Fertilization rates for oocytes of patients with stages III/IV were significantly impaired compared to those in stage I/II (P = 0.004). The implantation rate, CPR, and miscarriage rate were comparable between stages I/II and stages III/IV.	Pal et al. (1998)
							Variable	Intraobserver and interobserver variability—5 experts	20	Not reported	Single center	Not reported	The grand total score varied with an SD of 13.44 when the videotape of a single patient was rated twice by the same observer and varied with an SD of 17.12 when rated by two observers. The greatest variability occurred in endometriosis of the ovary and cul-de-sac obliteration, with less variability for peritoneum endometriosis and for ovarian and tubal adhesions. Comparison of intraobserver and interobserver scores resulted in a change in endometriosis stage in 38% and 52% of patients, respectively.	Hornstein et al. (1993)

							Acceptable	Reproducibility—2 experts	315	Not reported	Single center	Not reported	Good to fair agreement scoring endometriosis between the investigator and the blinded reviewer was noted.	Rock (1995)

						No analysis		Feasibility of AFS and adnexal score	89	Not reported	Single center	Surgical confirmation	Suggestion to split class IV in class IV and class V (with higher rate of bilateral adnexal disease/adhesions).	Canis et al. (1992)

− age, symptoms								Relation with endometriosis-associated symptoms and patients’ age.	206	30 years (18–44)	Single center	Surgical confirmation	No significant differences were found in total endometriosis scores, in active scores or in adhesion scores in different age groups. There was no significant difference in prevalence rate of symptoms for different aspects of endometriosis (implants, cysts or adhesions). AFS score does not reflect the intensity of symptoms.	Marana et al. (1991)
						Variable		Feasibility of measuring endometrioma	52	29.5 years (24–39)	Single center	NA	Cyst diameter was calculated using the geometric formula $r = \sqrt[3]{3V/(4\pi)}$, where $r$ = radius and $V$ = volume of liquid aspirated. Eight patients with apparently normal pelvis had endometriosis, and 14 with apparent minimal or mild endometriotic lesions were restaged. Laparoscopic ovarian puncture of enlarged ovaries was important for correct diagnosis and staging of endometriosis.	Candiani et al. (1990)

					− pregnancy			Relation with pregnancy after therapy	214	Not reported	Single center	Surgical confirmation	The AFS scale poorly specifies the relation between severity of disease and pregnancy outcome after therapy. A nonparametric monotonic estimator, generating a relationship between AFS score and pregnancy following treatment is shown to improve the discriminatory power of the AFS scale.	Guzick et al. (1982)

					+ pregnancy			(+ Kistner, Buttram) Prediction of pregnancy	214	28.6 years (17–37)	Single center	Surgical confirmation	The AFS score revealed significant differences in pregnancy rate only if categories were combined (mild plus moderate versus severe plus extensive, P ≤ 0.05). The AFS system revealed that pregnancy success was significantly reduced if an ovarian endometrioma was greater than 3 cm or had ruptured (P ≤ 0.01).	Rock et al. (1981)

The symbols should be interpreted as follows: + indicates a significant positive result in a correlation (or similar) test, − indicates a significant negative result in a correlation (or similar) test, and ns indicates an inconclusive/non-significant result in a correlation (or similar) test. The highlighted columns represent the intended purpose of the classification/staging system (as in Table I).

AFC, antral follicle count; AFS, American Fertility Society; AGES, Australasian Gynaecological Endoscopy and Surgery; COH, controlled ovarian hyperstimulation; CPP, chronic pelvic pain; CPR, clinical pregnancy rate; DE, deep endometriosis; EFI, endometriosis fertility index; HR, hazard ratio; IVF-ET, IVF embryo transfer; K-M, Kaplan–Meier; LBR, live birth rate; NPV, negative predictive value; PPV, positive predictive value; RANZCOG, Royal Australian and New Zealand College of Obstetricians and Gynaecologists; RR, relative risk.
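
The positive and negative likelihood ratios listed for UBESS in the table (Tompsett et al., 2019) are standard functions of sensitivity and specificity: LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The sketch below recomputes them from the rounded percentages in the table. The function name `likelihood_ratios` and the dictionary layout are illustrative assumptions, not part of the cited study.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for sensitivity and specificity given as fractions.

    LR+ is returned as None when specificity is 1.0, because the
    false-positive rate is zero and the ratio is undefined
    (reported as "not applicable" in the table).
    Assumes specificity > 0.
    """
    false_positive_rate = 1.0 - specificity
    lr_pos = None if false_positive_rate == 0 else sensitivity / false_positive_rate
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Sensitivity and specificity as listed in the table, converted to fractions.
ubess = {
    "UBESS I": (0.989, 1.000),
    "UBESS II": (0.968, 0.984),
    "UBESS III": (0.972, 0.992),
}

for name, (sens, spec) in ubess.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    lr_pos_text = "not applicable" if lr_pos is None else f"{lr_pos:.1f}"
    print(f"{name}: LR+ = {lr_pos_text}, LR- = {lr_neg:.3f}")
```

Because the inputs are rounded percentages, the recomputed values only approximate the published ones. For UBESS III, for example, the rounded inputs give an LR+ of about 121.5 rather than the reported 115.7, presumably because the published figure was derived from unrounded counts.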

^* Study included in meta-analysis (Vesali et al., 2020).
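
The geometric formula cited for Candiani et al. (1990) in the table is the sphere-volume relation solved for the radius. A minimal sketch follows, assuming the endometrioma is approximated as a sphere, that the aspirated liquid volume equals the cyst volume, and that volume is given in mL (1 mL = 1 cm³). The function names are hypothetical.

```python
import math


def sphere_radius_cm(volume_ml):
    """Radius (cm) of a sphere with the given volume (mL, i.e. cm^3).

    Solves V = (4/3) * pi * r^3 for r.
    """
    if volume_ml < 0:
        raise ValueError("volume must be non-negative")
    return (3.0 * volume_ml / (4.0 * math.pi)) ** (1.0 / 3.0)


def cyst_diameter_cm(volume_ml):
    """Diameter (cm) of the equivalent sphere: twice the radius."""
    return 2.0 * sphere_radius_cm(volume_ml)


# Example: diameter of the equivalent sphere for 14 mL of aspirated liquid.
print(f"{cyst_diameter_cm(14.0):.2f} cm")
```

The spherical approximation is the only modelling choice here; an irregular or partially collapsed cyst would yield a different true diameter for the same aspirated volume.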