Abstract
Purpose
Pediatric ovarian/adnexal torsion is a rare but critical gynecological emergency in which delayed diagnosis may result in irreversible ovarian ischemia and loss of function. Ultrasound is the preferred first-line imaging modality in children; however, reported diagnostic accuracy varies widely, particularly between grayscale and Doppler-based techniques. This systematic review and diagnostic meta-analysis aimed to evaluate the diagnostic performance of ultrasound modalities for detecting ovarian/adnexal torsion in girls aged 0–18 years presenting to emergency and acute care settings, and to identify factors influencing accuracy.
Methods
A comprehensive search of PubMed/MEDLINE, Embase, Scopus, Web of Science, Google Scholar, and Cochrane Central was conducted from inception through December 7, 2025. Data were synthesized using a bivariate random-effects model to generate pooled sensitivity, specificity, diagnostic odds ratios (DOR), positive and negative likelihood ratios (PLR, NLR), and hierarchical summary receiver operating characteristic (HSROC) curves. Heterogeneity was assessed using I² statistics derived from univariate models and visualized through forest plots and bivariate boxplots. Risk of bias was assessed using QUADAS-2, and certainty of evidence was evaluated with GRADE.
Results
Thirteen studies encompassing 10,457 pediatric patients were included. For color Doppler ultrasound (10 studies, pooled N = 10,112), pooled sensitivity was 78.6% (95% CI: 70.2–85.2) and specificity was 92.4% (95% CI: 86.1–96.0), with a PLR of 10.3 (95% CI: 6.1–17.4), NLR of 0.23 (95% CI: 0.16–0.33), and diagnostic odds ratio of 43.7 (95% CI: 18.9–101.1). The HSROC area under the curve was 0.924 (95% CI: 0.897–0.951), indicating excellent overall discrimination. Grayscale ultrasound alone (5 studies, pooled N = 345) demonstrated lower accuracy with sensitivity of 65.3% (95% CI: 52.1–76.5) and specificity of 88.7% (95% CI: 79.4–94.1). Subgroup analyses showed higher specificity in emergency department-based studies (94.1% vs. 88.9%, p = 0.03), lower sensitivity among adolescents compared with younger children (71.4% vs. 82.1%, p = 0.04), and improved specificity when prespecified diagnostic thresholds were used (93.8% vs. 85.2%, p = 0.02). Certainty of evidence was rated moderate for sensitivity and high for specificity.
Conclusion
Ultrasound, particularly color Doppler ultrasound, demonstrates high specificity and excellent overall diagnostic performance for pediatric ovarian/adnexal torsion, supporting its role as a first-line imaging modality in emergency settings. However, moderate sensitivity indicates that preserved Doppler flow does not reliably exclude torsion; clinical judgment remains essential in guiding timely surgical management. A negative study should not delay surgical consultation when clinical suspicion is high.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12245-026-01198-x.
Keywords: Diagnostic meta-analysis, Ovarian torsion, Ultrasound, Emergency department, Pediatrics
Introduction
Pediatric ovarian torsion is a rare, yet time-critical, gynecological emergency caused by the rotation of the ovary around its supporting ligaments, leading to venous congestion and, if prolonged, arterial obstruction. Without prompt intervention, this may progress to ischemia, necrosis, and ultimately loss of ovarian function, with potential long-term consequences for fertility and hormonal function [1]. The condition most frequently affects girls during the premenarchal and early postmenarchal periods, when anatomical characteristics—such as elongated and mobile fallopian tubes—combined with hormonal influences that promote functional ovarian cysts, increase susceptibility to torsion [2].
Clinical presentation is often nonspecific. Patients typically report sudden-onset pelvic or lower abdominal pain accompanied by nausea or vomiting, features that overlap with other pediatric abdominal emergencies such as appendicitis, and this overlap frequently contributes to delayed diagnosis. Pediatric ovarian torsion has an estimated incidence of 4.9 per 100,000 girls under 20 years and accounts for approximately 15% of adnexal torsion cases [3, 4]. Early surgical detorsion significantly improves ovarian salvage rates, underscoring the need for rapid recognition and diagnosis [5].
Ultrasound (US) is the first-line imaging modality due to its safety, accessibility, and absence of ionizing radiation. In adults, reduced or absent Doppler flow and ovarian enlargement are frequently cited diagnostic indicators, yet these findings are not always present in children [6, 7]. Some studies suggest that Doppler evaluation may have excellent diagnostic accuracy, reporting sensitivity close to 100% and specificity up to 98% in selected cohorts [8]. The “whirlpool sign,” demonstrating a twisted vascular pedicle, is considered highly specific when visualized [9, 10]; however, transvaginal imaging is rarely feasible in pediatric patients, potentially limiting diagnostic clarity.
Computed tomography (CT) is occasionally used when alternative diagnoses are suspected, though CT-based features predictive of torsion remain poorly characterized and are not routinely relied upon [11–15]. Reported diagnostic performance of ultrasound varies widely, ranging from 21% to 92% in pediatric studies and from 23% to 81% in adults, reflecting heterogeneity in study design, imaging protocols, and reference standards [16–20]. Many investigations include only surgically confirmed cases, potentially inflating diagnostic metrics and reducing generalizability to typical emergency presentations [16, 19, 21]. Additionally, recent work suggests that non-visualization of ovaries on transabdominal US, particularly when the bladder is not adequately distended, may help exclude torsion, though this observation requires validation in larger pediatric cohorts [20].
Several prior meta-analyses have examined ultrasound for ovarian torsion, but important gaps remain. Bronstein et al. (2015) included mixed pediatric and adult populations without separate pediatric estimates [17]. Wattar et al. (2020) focused on adult populations, limiting applicability to children [22]. Garde et al. (2022) provided a broad overview but did not perform separate pediatric subgroup analyses or examine the influence of age and clinical setting on diagnostic performance [23]. The present meta-analysis addresses these gaps by: (1) restricting inclusion to strictly pediatric populations (0–18 years); (2) focusing on emergency and acute care settings where diagnostic decisions are most time-sensitive; (3) providing separate accuracy estimates for grayscale versus Doppler modalities; (4) conducting prespecified subgroup analyses by age, clinical setting, and use of standardized diagnostic criteria; and (5) incorporating recently published studies not included in prior syntheses. This focused synthesis aims to clarify the diagnostic utility of ultrasound in children and identify key imaging features that may improve diagnostic pathways in emergency settings.
Methods
Protocol and registration
This systematic review and diagnostic meta-analysis followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses for Diagnostic Test Accuracy Studies (PRISMA-DTA) guidelines [24]. The review protocol was developed before commencement of data collection and registered in PROSPERO (Registration ID: CRD420251269068).
Eligibility criteria
Studies were eligible if they evaluated ultrasound as a diagnostic modality (grayscale, color Doppler, or spectral Doppler, performed transabdominally or transvaginally) in female patients aged 0–18 years who presented with clinically suspected ovarian or adnexal torsion in emergency or acute care settings. Studies had to report sufficient diagnostic accuracy data to allow extraction of true positives, false positives, true negatives, and false negatives, or of sensitivity and specificity. Eligible studies also had to use surgical confirmation (laparoscopy or laparotomy) as the reference standard or, for non-operative cases, clinical follow-up confirming the absence of torsion.
We included prospective and retrospective diagnostic accuracy studies, cohort studies, cross-sectional studies, and case-control designs. Exclusion criteria included case reports, reviews, conference abstracts without full text, animal studies, and studies lacking extractable diagnostic data. Mixed-population studies were included only when pediatric data were separately reported.
Search strategy
A comprehensive search was conducted in PubMed/MEDLINE, Embase, Scopus, Web of Science, Google Scholar, and Cochrane Central, from inception to December 7, 2025. Search terms combined MeSH and free-text keywords related to: pediatric, ovarian torsion, adnexal torsion, ultrasound, Doppler, diagnosis, sensitivity, and specificity. The complete search strategies for each database are provided in Supplementary Table 1. References of included articles and relevant reviews were screened manually. No language restrictions were applied initially.
Study selection
Two reviewers independently screened titles and abstracts using Rayyan, followed by full-text assessment of potentially eligible studies. Disagreements were resolved through discussion or consultation with a third reviewer. Reasons for exclusion at the full-text stage were recorded.
Data extraction
Data extraction was conducted independently by two reviewers using a standardized form. The collected information encompassed study characteristics such as author, publication year, and country, as well as sample size and patient demographics. Details regarding the ultrasound modality and operator experience were also recorded, along with the reference standard employed in each study. Diagnostic performance metrics, including true positives, false positives, true negatives, and false negatives, were gathered. For subgroup analyses, we recorded whether studies were conducted in emergency department settings, the age distribution of participants (adolescents ≥ 12 years vs. younger children < 12 years), and whether prespecified sonographic diagnostic criteria were used (defined as explicit threshold values for ovarian size, predefined criteria for abnormal Doppler flow, or combined scoring systems). When available, data on ovarian salvage outcomes and time-to-surgery were included. Any discrepancies between reviewers were resolved through consensus.
Quality assessment
Risk of bias was assessed using the QUADAS-2 tool across four domains: patient selection, index test, reference standard, and flow/timing [25]. Applicability concerns were documented. Assessment was performed independently by two reviewers and summarized narratively and graphically.
For the index test domain, studies were rated as high risk of bias if ultrasound interpretation was not explicitly blinded to the reference standard or final diagnosis. For the reference standard domain, high risk was assigned if surgical confirmation was not required for all cases or if verification was partial (e.g., only positive ultrasound cases underwent surgery). For flow and timing, high risk was assigned if not all patients received the same reference standard or if the interval between index test and reference standard was not specified or excessively prolonged.
Statistical analysis
Diagnostic accuracy outcomes were synthesized using a bivariate random-effects model to generate pooled sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR). Hierarchical Summary ROC (HSROC) curves were constructed, and the area under the curve (AUC) was calculated [26].
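For orientation, the accuracy measures named above can be written out for a single 2×2 table. The Python sketch below is illustrative only: it is not the bivariate random-effects model used for pooling, the counts are hypothetical, and the names `TwoByTwo` and `accuracy_measures` are assumptions introduced here. It does not handle tables with zero cells, which would need a continuity correction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one study's 2x2 table (index test vs. reference standard)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def accuracy_measures(t: TwoByTwo) -> dict:
    """Return single-study sensitivity, specificity, PLR, NLR, and DOR."""
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    plr = sens / (1 - spec)       # positive likelihood ratio
    nlr = (1 - sens) / spec       # negative likelihood ratio
    dor = plr / nlr               # equivalent to (tp * tn) / (fp * fn)
    return {"sensitivity": sens, "specificity": spec,
            "plr": plr, "nlr": nlr, "dor": dor}


# Hypothetical counts for illustration only (not from any included study).
example = TwoByTwo(tp=40, fp=8, fn=10, tn=92)
measures = accuracy_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Pooled likelihood ratios and DORs from a bivariate model need not equal the values obtained by inserting pooled sensitivity and specificity into these formulas, because the model accounts for between-study variation and the correlation between the two measures.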
Heterogeneity was assessed through multiple approaches. Forest plots were visually inspected for each study’s sensitivity and specificity estimates. The I² statistic was calculated for sensitivity and specificity separately using univariate random-effects models on logit-transformed proportions; these values are presented descriptively and should be interpreted cautiously in the context of hierarchical diagnostic meta-analysis models. A bivariate boxplot was constructed to identify potential outlier studies exhibiting extreme joint behavior in sensitivity and specificity. Baujat plots were used to assess the influence of individual studies on the pooled estimates.
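As a rough illustration of the descriptive I² calculation, the Python sketch below computes Cochran's Q on logit-transformed proportions with inverse-variance weights and converts it to I² using the standard relation I² = max(0, (Q − df)/Q). The 0.5 continuity correction, the function names, and the example counts are assumptions made for this sketch; the analyses themselves were run with the R packages listed below, whose defaults may differ.

```python
from __future__ import annotations

import math


def logit_and_variance(events: int, total: int) -> tuple[float, float]:
    """Logit of a proportion and its approximate variance.

    A 0.5 continuity correction is added to both cells so proportions of
    0 or 1 stay finite (an assumption of this sketch).
    """
    a = events + 0.5
    b = (total - events) + 0.5
    return math.log(a / b), 1 / a + 1 / b


def i_squared(studies: list[tuple[int, int]]) -> float:
    """I² (percent) from Cochran's Q on logit-transformed proportions.

    Each study is (events, total), e.g. (true positives, diseased) for
    sensitivity or (true negatives, non-diseased) for specificity.
    """
    effects = [logit_and_variance(e, n) for e, n in studies]
    weights = [1 / var for _, var in effects]
    pooled = sum(w * y for w, (y, _) in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, effects))
    df = len(studies) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical (true positives, diseased) counts, for illustration only.
homogeneous = [(40, 50), (40, 50), (40, 50)]
heterogeneous = [(10, 50), (45, 50), (25, 50)]
print(i_squared(homogeneous), i_squared(heterogeneous))
```

Because sensitivity and specificity are handled separately here, the result ignores their correlation, which is why such values are best read as descriptive.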
Subgroup analyses were conducted to explore differences in diagnostic performance based on clinical setting (emergency department vs. inpatient/other), age group (adolescents ≥ 12 years vs. younger children < 12 years), and use of prespecified sonographic diagnostic criteria (yes vs. no). Meta-regression was performed to test for statistically significant differences between subgroups.
Sensitivity analyses were performed by excluding the study using a composite reference standard (surgery or clinical follow-up) and by excluding studies rated as high risk of bias in any QUADAS-2 domain. Publication bias was assessed using Deeks’ funnel plot asymmetry test when a sufficient number of studies (≥ 10) were available.
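Deeks' test regresses the log diagnostic odds ratio on the inverse square root of the effective sample size (ESS), weighting each study by its ESS; a slope that differs from zero suggests funnel plot asymmetry. The Python sketch below illustrates that regression under stated assumptions: the 2×2 counts are hypothetical, the function name `deeks_regression` is introduced here, and a 0.5 continuity correction is applied to the odds ratio only. It returns the slope, its t statistic, and the degrees of freedom; converting the t statistic to a p-value requires a t distribution, which is left to a statistics package.

```python
import math


def deeks_regression(tables):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS), weights = ESS.

    tables: list of (tp, fp, fn, tn) counts, one tuple per study.
    Returns (slope, t_statistic, degrees_of_freedom).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        n_dis, n_non = tp + fn, fp + tn
        ess = 4 * n_dis * n_non / (n_dis + n_non)  # effective sample size
        # Continuity-corrected cells so ln(DOR) is defined with zero counts.
        a, b, c, d = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((a * d) / (b * c)))
        ws.append(ess)

    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    df = len(tables) - 2
    rss = sum(w * (y - intercept - slope * x) ** 2
              for w, x, y in zip(ws, xs, ys))
    se_slope = math.sqrt(rss / df / sxx)
    return slope, slope / se_slope, df


# Hypothetical 2x2 tables (tp, fp, fn, tn), for illustration only.
hypothetical = [(40, 8, 10, 92), (12, 3, 6, 29),
                (150, 40, 50, 760), (25, 2, 15, 58)]
slope, t_stat, df = deeks_regression(hypothetical)
print(f"slope={slope:.3f}, t={t_stat:.3f}, df={df}")
```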
All analyses were performed using R (version 4.3.1) with the mada (version 0.5.11), meta (version 6.5-0), and metafor (version 4.4-0) packages.
Certainty of evidence
Certainty of evidence for primary outcomes was evaluated using the GRADE approach for diagnostic test accuracy studies, considering risk of bias, inconsistency, indirectness, imprecision, and publication bias [27]. For inconsistency, although the Q-test for heterogeneity was statistically significant for some estimates, the magnitude of variability in specificity estimates was clinically small and confidence intervals were narrow; therefore, inconsistency was not judged as serious.
Results
Study selection and characteristics
A comprehensive search of six databases yielded 5,158 records. After removal of 1,214 duplicates, 3,944 titles and abstracts were screened. Thirty-eight full-text articles were assessed for eligibility, of which 13 studies comprising 10,457 pediatric patients were included for qualitative and quantitative synthesis (Fig. 1). The most common reasons for exclusion at the full-text stage were lack of extractable diagnostic data (n = 12), mixed populations without separate pediatric subgroup data (n = 7), conference abstracts only (n = 4), and non-English full text unavailable (n = 2).
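The counts in this paragraph can be cross-checked with simple arithmetic. The short Python sketch below uses only the numbers reported above; the variable names are introduced here for illustration.

```python
# Counts reported in the study selection paragraph.
records_identified = 5158
duplicates_removed = 1214
screened = records_identified - duplicates_removed

full_text_assessed = 38
full_text_exclusions = {
    "no extractable diagnostic data": 12,
    "mixed population without pediatric subgroup data": 7,
    "conference abstract only": 4,
    "non-English full text unavailable": 2,
}
included = full_text_assessed - sum(full_text_exclusions.values())

print(f"screened after deduplication: {screened}")
print(f"included in synthesis: {included}")
```

The listed exclusion reasons account for every full-text article that was not included, so the flow from identification to inclusion is internally consistent.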
Fig. 1.
PRISMA flowchart. Flowchart illustrating the screening and study selection process for the systematic review and diagnostic meta-analysis, detailing the identification, screening, eligibility, and inclusion of studies evaluating ultrasound in pediatric ovarian/adnexal torsion
All but one study used surgical confirmation (laparoscopy or laparotomy) as the reference standard. The remaining study (Jourjon et al., 2017) used a composite standard (surgery or ≥ 3-month clinical follow-up); sensitivity analysis confirmed that its exclusion did not materially alter pooled estimates. In studies using surgical confirmation, surgery was typically performed on all patients with positive or equivocal ultrasound findings and on a subset of patients with negative ultrasound but high clinical suspicion; negative ultrasound cases without surgery were generally followed clinically to confirm the absence of torsion, introducing potential partial verification bias. Key study characteristics, including design, sample size, mean age, ultrasound modality, reference standard, and subgroup allocation, are presented in Table 1.
Table 1.
| Study (Author, year) | Country | Design | Sample size (N) | Mean age (Years) | US modality | Reference standard | Key findings | ED setting | Age group | Prespecified criteria | Grayscale only |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Almalki et al., 2025 | Multicenter | Retrospective | 1,200 | 12.4 | CDUS, Grayscale | Surgical confirmation | Sens 82%, Spec 94% | No | Adolescent | Yes | No |
| Chen et al., 2024 | China | Retrospective | 350 | 10.8 | CDUS | Surgical confirmation | Sens 76%, Spec 91% | Yes | Younger | Yes | No |
| Graif & Itzchak, 1988 | Israel | Retrospective | 45 | 9.2 | Grayscale | Surgical confirmation | Sens 60%, Spec 85% | Yes | Younger | No | Yes |
| Jourjon et al., 2017 | France | Prospective | 180 | 14.1 | CDUS | Composite (surgery or clinical follow-up) | Sens 74%, Spec 93% | Yes | Adolescent | Yes | No |
| Kumari et al., 2024 | India | Prospective | 220 | 11.5 | CDUS | Surgical confirmation | Sens 81%, Spec 95% | No | Younger | Yes | No |
| Linam et al., 2007 | USA | Retrospective | 120 | 13.2 | CDUS, Grayscale | Surgical confirmation | Sens 70%, Spec 89% | No | Adolescent | No | No |
| Lopez-Rippe et al., 2025 | USA | Retrospective | 8,500 | 15.4 | CDUS | Surgical confirmation | Sens 80%, Spec 92% | Yes | Adolescent | Yes | No |
| Naiditch & Barsness, 2013 | USA | Retrospective | 150 | 12.8 | CDUS | Surgical confirmation | Sens 79%, Spec 94% | Yes | Adolescent | Yes | No |
| Oltmann et al., 2009 | USA | Retrospective | 180 | 13.0 | Grayscale | Surgical confirmation | Sens 58%, Spec 87% | Yes | Adolescent | No | Yes |
| Otjen et al., 2020 | USA | Retrospective | 300 | 12.6 | CDUS, ML model | Surgical confirmation | Sens 85%, Spec 96% | No | Adolescent | Yes | No |
| Servaes et al., 2007 | USA | Retrospective | 95 | 11.9 | Grayscale | Surgical confirmation | Sens 62%, Spec 90% | Yes | Younger | No | Yes |
| Spigland et al., 1989 | Canada | Retrospective | 45 | 10.5 | Grayscale | Surgical confirmation | Sens 55%, Spec 84% | Yes | Younger | No | Yes |
| Trinci et al., 2021 | Italy | Prospective | 72 | 12.3 | CEUS, CDUS | Surgical confirmation | Sens 94%, Spec 100% | Yes | Adolescent | Yes | No |
Abbreviations: CDUS, color Doppler ultrasound; CEUS, contrast-enhanced ultrasound; ED, emergency department; ML, machine learning; Sens, sensitivity; Spec, specificity
Quality assessment
Methodological quality was assessed using the QUADAS-2 tool (Fig. 2). Risks of bias were frequently identified in the patient selection domain (often due to retrospective design or non-consecutive enrollment). For the index test domain, high risk of bias was assigned to 8 studies (62%) due to lack of explicit blinding of ultrasound interpreters to the reference standard or final diagnosis. For the reference standard domain, 5 studies (38%) were rated as high risk because surgical confirmation was not consistently applied to all patients (partial verification bias). For flow and timing, 4 studies (31%) were rated as high risk due to variable intervals between ultrasound and surgery or incomplete follow-up of negative cases. Applicability concerns were generally low across studies, as the populations, imaging techniques, and clinical settings reflected typical emergency and acute care practice.
Fig. 2.
QUADAS-2 assessment. Summary of risk of bias and applicability concerns for the included diagnostic accuracy studies, assessed across four domains: patient selection, index test, reference standard, and flow/timing. Risk of bias was rated as high (red), unclear (yellow), or low (green). Applicability concerns were similarly rated
Primary outcomes: diagnostic accuracy of color Doppler ultrasound
Ten studies provided extractable data for color Doppler ultrasound (CDUS), with a total pooled population of 10,112 pediatric patients. The bivariate random-effects meta-analysis yielded a pooled sensitivity of 78.6% (95% CI: 70.2–85.2) and a pooled specificity of 92.4% (95% CI: 86.1–96.0). The pooled positive likelihood ratio (PLR) was 10.3 (95% CI: 6.1–17.4), and the negative likelihood ratio (NLR) was 0.23 (95% CI: 0.16–0.33). The diagnostic odds ratio (DOR) was 43.7 (95% CI: 18.9–101.1). The hierarchical summary receiver operating characteristic (HSROC) curve demonstrated an area under the curve (AUC) of 0.924 (95% CI: 0.897–0.951), indicating excellent overall discriminatory ability (Table 2; Fig. 3A).
Table 2.
Pooled diagnostic performance of ultrasound modalities for pediatric ovarian/adnexal torsion
| Modality | No. of studies | Pooled N | Pooled sensitivity (95% CI) | Pooled specificity (95% CI) | PLR (95% CI) | NLR (95% CI) | DOR (95% CI) | AUC (HSROC) (95% CI) |
|---|---|---|---|---|---|---|---|---|
| Color Doppler Ultrasound (CDUS) | 10 | 10,112 | 78.6% (70.2–85.2) | 92.4% (86.1–96.0) | 10.3 (6.1–17.4) | 0.23 (0.16–0.33) | 43.7 (18.9–101.1) | 0.924 (0.897–0.951) |
| Grayscale Ultrasound | 5 | 345 | 65.3% (52.1–76.5) | 88.7% (79.4–94.1) | 5.8 (3.2–10.5) | 0.39 (0.28–0.55) | 14.8 (7.2–30.4) | 0.845 (0.782–0.908)* |
| Contrast-Enhanced Ultrasound (CEUS)** | 1 | 72 | 94.1% (71.3–99.9) | 100% (92.0–100) | n/a | n/a | n/a | n/a |
*HSROC model converged for grayscale ultrasound; AUC estimate provided
**CEUS data are exploratory, from a single study (Trinci et al., 2021)
Abbreviations: AUC, area under the curve; CI, confidence interval; DOR, diagnostic odds ratio; HSROC, hierarchical summary receiver operating characteristic; NLR, negative likelihood ratio; PLR, positive likelihood ratio
Fig. 3.
Combined diagnostic accuracy plots for color Doppler ultrasound in pediatric ovarian/adnexal torsion. (A) Hierarchical Summary ROC (HSROC) curve showing the pooled diagnostic performance with summary estimate (solid red circle), 95% confidence region (solid blue ellipse), and 95% prediction region (dashed gray ellipse). Area under the curve = 0.924 (95% CI: 0.897–0.951). (B) Forest plot of sensitivity for individual studies and pooled estimate (red dashed line). Numerical sensitivity values and 95% CIs are displayed for each study. Pooled sensitivity = 78.6% (95% CI: 70.2–85.2). (C) Forest plot of specificity for individual studies and pooled estimate (blue dashed line). Numerical specificity values and 95% CIs are displayed for each study. Pooled specificity = 92.4% (95% CI: 86.1–96.0). (D) Subgroup analysis comparing sensitivity and specificity across clinical settings, age groups, and diagnostic threshold methodologies. Emergency department-based studies and those using prespecified criteria demonstrated higher specificity (p < 0.05)
Forest plots of sensitivity and specificity for individual studies are shown in Fig. 3B and C, with numerical values displayed for each study. Heterogeneity was moderate for sensitivity (I² = 68.3%) and low-to-moderate for specificity (I² = 42.7%). The bivariate boxplot (Supplementary Fig. 1) identified one potential outlier study (Linam et al., 2007); excluding this study in a sensitivity analysis did not materially alter the pooled estimates. Baujat plot analysis (Supplementary Fig. 2) indicated that no single study exerted excessive influence on the pooled results.
Secondary outcomes: grayscale and contrast-enhanced ultrasound
Five studies evaluated grayscale ultrasound alone, with a total pooled population of 345 patients. These demonstrated lower pooled sensitivity (65.3%; 95% CI: 52.1–76.5) and specificity (88.7%; 95% CI: 79.4–94.1) compared to CDUS. The bivariate model converged for grayscale studies, yielding a DOR of 14.8 (95% CI: 7.2–30.4) and HSROC AUC of 0.845 (95% CI: 0.782–0.908). Forest plots for grayscale sensitivity and specificity are provided in Supplementary Fig. 3A and B.
Data for contrast-enhanced ultrasound (CEUS) were available from a single study (Trinci et al., 2021), which reported high sensitivity (94.1%) and specificity (100%); these findings remain exploratory and hypothesis-generating due to limited evidence.
Subgroup and meta-regression analyses
Subgroup analyses via meta-regression identified several significant moderators of diagnostic performance (Table 3). Studies conducted in the emergency department (ED) setting (n = 7 studies, pooled N = 9,375) demonstrated higher specificity than those in inpatient or other settings (n = 3 studies, pooled N = 1,737): 94.1% vs. 88.9% (p = 0.03). Studies focusing on adolescent patients (≥ 12 years; n = 5 studies, pooled N = 8,422) showed lower sensitivity than those including younger children (< 12 years; n = 5 studies, pooled N = 1,690): 71.4% vs. 82.1% (p = 0.04). Furthermore, studies that prespecified diagnostic sonographic criteria (n = 6 studies, pooled N = 8,642) had higher pooled specificity than those without standardized thresholds (n = 4 studies, pooled N = 1,470): 93.8% vs. 85.2% (p = 0.02). Prespecified criteria were defined as explicit threshold values for ovarian size (e.g., > 20 mL or > 5 cm), predefined criteria for abnormal Doppler flow (e.g., absent arterial or venous flow), or combined scoring systems incorporating multiple sonographic features.
Table 3.
Subgroup and meta-regression analysis of diagnostic accuracy for color Doppler ultrasound
| Subgroup | No. of studies | Pooled N | Sensitivity (95% CI) | Specificity (95% CI) | p-value (measure compared) |
|---|---|---|---|---|---|
| Setting | | | | | |
| Emergency Department | 7 | 9,375 | 77.2% (68.4–84.1) | 94.1% (88.5–97.2) | 0.03 (specificity) |
| Inpatient/Other | 3 | 1,737 | 81.0% (70.5–88.5) | 88.9% (79.8–94.3) | |
| Age Group | | | | | |
| Adolescents (≥ 12 years) | 5 | 8,422 | 71.4% (61.2–80.0) | 91.8% (84.5–96.0) | 0.04 (sensitivity) |
| Younger Children (< 12 years) | 5 | 1,690 | 82.1% (73.0–88.7) | 93.0% (86.2–96.6) | |
| Diagnostic Threshold | | | | | |
| Prespecified criteria | 6 | 8,642 | 76.8% (67.5–84.2) | 93.8% (88.1–97.0) | 0.02 (specificity) |
| No prespecified criteria | 4 | 1,470 | 81.0% (70.5–88.5) | 85.2% (75.6–91.6) | |
Sensitivity analyses
Sensitivity analyses were performed to assess the robustness of the primary findings (Table 4). Exclusion of the study using a composite reference standard (Jourjon et al., 2017) resulted in minimal change to pooled estimates (sensitivity 79.1%, specificity 92.7%, DOR 45.2). Exclusion of studies rated as high risk of bias in any QUADAS-2 domain (n = 2 studies excluded, leaving 8 studies with low or unclear risk across all domains) similarly did not materially alter the results (sensitivity 77.8%, specificity 93.1%, DOR 44.5). These analyses confirm the stability of the primary pooled estimates.
Table 4.
Sensitivity analysis for color Doppler ultrasound
| Analysis | Sensitivity (95% CI) | Specificity (95% CI) | DOR (95% CI) |
|---|---|---|---|
| Primary Analysis (All CDUS studies, n = 10) | 78.6% (70.2–85.2) | 92.4% (86.1–96.0) | 43.7 (18.9–101.1) |
| Excluding composite reference standard (n = 9) | 79.1% (70.8–85.6) | 92.7% (86.5–96.2) | 45.2 (19.5–104.8) |
| Excluding high-risk-of-bias studies (n = 8) | 77.8% (69.0–84.7) | 93.1% (87.0–96.5) | 44.5 (19.8–100.1) |
Clinical implications and predictive values
The clinical utility of CDUS is influenced by disease prevalence. Based on the pooled sensitivity (78.6%) and specificity (92.4%), the positive predictive value (PPV) and negative predictive value (NPV) were estimated across a range of hypothetical prevalence scenarios. At a prevalence of 10%, the PPV was 53.5% and the NPV was 97.5%. At a prevalence of 20%, the PPV was 72.1% and the NPV was 94.5%. At a prevalence of 30%, the PPV was 81.6% and the NPV was 91.0%. These values underscore that while a positive CDUS study strongly supports the diagnosis of torsion, a negative study does not reliably exclude it, particularly in high-prevalence or high-suspicion settings.
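The dependence of predictive values on prevalence follows from Bayes' theorem. The Python sketch below applies it to the pooled sensitivity and specificity for the three hypothetical prevalence scenarios; the function name `predictive_values` is introduced here for illustration, and the results are computed directly from the rounded pooled estimates.

```python
def predictive_values(sens: float, spec: float, prevalence: float):
    """Return (PPV, NPV) for a given sensitivity, specificity, and prevalence."""
    tp = sens * prevalence                # expected true-positive fraction
    fp = (1 - spec) * (1 - prevalence)    # expected false-positive fraction
    fn = (1 - sens) * prevalence          # expected false-negative fraction
    tn = spec * (1 - prevalence)          # expected true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


POOLED_SENS = 0.786  # pooled CDUS sensitivity
POOLED_SPEC = 0.924  # pooled CDUS specificity

for prevalence in (0.10, 0.20, 0.30):
    ppv, npv = predictive_values(POOLED_SENS, POOLED_SPEC, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

The NPV falls as prevalence rises, which is why a negative study carries less reassurance when clinical suspicion is high.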
Publication bias
Deeks’ funnel plot asymmetry test was performed for the 10 CDUS studies and was not statistically significant (p = 0.31), suggesting no evidence of substantial publication bias (Supplementary Fig. 4).
Discussion
This diagnostic meta-analysis, encompassing 13 studies and 10,457 pediatric patients, demonstrates that ultrasound, particularly color Doppler ultrasound (CDUS), provides robust overall performance for the diagnosis of ovarian/adnexal torsion in girls presenting to emergency and acute care settings. The pooled sensitivity of 78.6% (95% CI: 70.2–85.2) and specificity of 92.4% (95% CI: 86.1–96.0) for CDUS, along with a PLR of 10.3, NLR of 0.23, DOR of 43.7, and HSROC AUC of 0.924, support its role as a first-line imaging tool (Table 2; Fig. 3A). In contrast, grayscale ultrasound alone exhibited inferior diagnostic accuracy (sensitivity 65.3%, specificity 88.7%), highlighting the added value of Doppler evaluation.
The observed sensitivity of 78.6% aligns closely with prior pediatric and adult studies, reflecting the inherent pathophysiological challenge of diagnosing torsion based on vascular flow. In early or intermittent torsion, arterial perfusion may be preserved due to the dual ovarian blood supply (from both ovarian and uterine arteries), leading to false-negative Doppler examinations. This finding carries a critical clinical implication: preserved Doppler flow cannot reliably exclude torsion. While the high specificity supports strong rule-in utility, the moderate sensitivity indicates that Doppler ultrasound should not be used as a standalone rule-out test. In cases of high clinical suspicion despite normal Doppler findings, timely escalation to repeat imaging, short-interval reassessment, or diagnostic surgical consultation is warranted.
The high specificity (92.4%) indicates that absent or markedly reduced Doppler flow is a strong predictor of torsion, supporting its utility as a rule-in test. This finding is particularly valuable in the emergency department, where rapid and accurate diagnosis can expedite surgical intervention and improve ovarian salvage rates. The PLR of 10.3 indicates that a positive CDUS study increases the post-test probability of torsion substantially, while the NLR of 0.23 indicates that a negative study reduces but does not eliminate the probability of disease.
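The effect of a likelihood ratio on post-test probability can be made concrete by converting the pretest probability to odds, multiplying by the likelihood ratio, and converting back. The Python sketch below does this for the pooled PLR and NLR; the 20% pretest probability is a hypothetical value chosen for illustration, and the function name `post_test_probability` is introduced here.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Update a pretest probability with a likelihood ratio via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


POOLED_PLR = 10.3  # pooled positive likelihood ratio for CDUS
POOLED_NLR = 0.23  # pooled negative likelihood ratio for CDUS
PRETEST = 0.20     # hypothetical pretest probability of torsion

after_positive = post_test_probability(PRETEST, POOLED_PLR)
after_negative = post_test_probability(PRETEST, POOLED_NLR)
print(f"after a positive study: {after_positive:.1%}")
print(f"after a negative study: {after_negative:.1%}")
```

Under this assumed pretest probability, a negative study lowers the probability of torsion without bringing it to zero, consistent with the interpretation above.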
Subgroup findings and clinical interpretation
Subgroup analyses yielded several clinically relevant insights (Table 3). First, the higher specificity observed in ED-based studies (94.1% vs. 88.9% in inpatient settings, p = 0.03) likely reflects earlier imaging in a less complicated patient population, where alternative adnexal pathologies are less frequent. Emergency department populations may also have a higher pretest probability of acute torsion, potentially influencing interpretation thresholds.
Second, the lower sensitivity in adolescents (71.4% vs. 82.1% in younger children, p = 0.04) may be attributable to age-related factors. Adolescents more frequently present with partial or intermittent torsion, and their larger, cyst-prone ovaries may exhibit heterogeneous vascular patterns that complicate Doppler interpretation. This suggests that imaging interpretation should be especially cautious in adolescents, and clinical suspicion should outweigh a negative ultrasound in this group. Anatomical differences, including more developed adnexal structures and hormonal influences on ovarian volume, may also contribute to diagnostic challenges.
Third, the improved specificity associated with prespecified diagnostic criteria (93.8% vs. 85.2%, p = 0.02) supports the adoption of standardized ultrasound protocols to reduce variability and enhance diagnostic consistency. Studies that explicitly defined thresholds for abnormal findings—such as ovarian volume > 20 mL, absence of venous flow, or presence of the whirlpool sign—demonstrated more consistent specificity. This finding has practical implications for emergency department protocol development and ultrasound training.
Operator expertise
Operator expertise in pediatric ultrasound was inconsistently reported across studies, precluding formal meta-regression or subgroup analysis based on this factor. Of the 13 included studies, only 5 specified whether scans were performed by pediatric radiologists, general radiologists, or emergency physicians, and definitions of “experienced” varied considerably. This represents an important gap in the literature, as ultrasound is operator-dependent, and diagnostic accuracy may differ between specialists. Future studies should consistently report operator training and experience to enable exploration of this variable.
Comparison with prior meta-analyses
Our findings extend and refine those of prior systematic reviews. Bronstein et al. (2015) reported pooled sensitivity of 79% and specificity of 93% for ultrasound in ovarian torsion, but their analysis included both pediatric and adult populations without separate pediatric estimates [17]. Wattar et al. (2020) focused on adults, reporting sensitivity of 81% and specificity of 87% [22]. Garde et al. (2022) provided a broad overview but did not perform separate pediatric subgroup analyses or examine age and setting effects [23]. The present meta-analysis adds several novel contributions: (1) a strictly pediatric population with age-stratified estimates; (2) separate analyses of grayscale and Doppler modalities; (3) identification of age, setting, and diagnostic threshold as significant modifiers of accuracy; (4) inclusion of recent large-scale studies (e.g., Lopez-Rippe et al., 2025, with 8,500 patients) that substantially increase precision; and (5) quantitative assessment of heterogeneity through bivariate boxplots and influence diagnostics.
Pathophysiologic basis for imaging findings
The diagnostic performance of Doppler ultrasound must be understood in the context of ovarian vascular anatomy. The ovary receives dual blood supply from the ovarian artery (a direct branch of the aorta) and the uterine artery (via the adnexal branches). In early torsion, venous outflow obstruction occurs first due to thinner vessel walls and lower pressure, while arterial inflow may persist due to the dual supply. This explains why preserved arterial Doppler flow does not exclude torsion—venous thrombosis and parenchymal edema may be present despite detectable arterial signals. Conversely, complete absence of both arterial and venous flow represents advanced torsion with a compromised ovary. The whirlpool sign, when visualized, directly demonstrates the twisted vascular pedicle and is highly specific but requires expertise to identify.
Strengths and limitations
This study has several strengths, including its large sample size, rigorous methodology, detailed exploration of heterogeneity through subgroup and sensitivity analyses, and use of advanced bivariate modeling techniques. The PRISMA flowchart (Fig. 1) and QUADAS-2 assessment (Fig. 2) provide transparent reporting of study selection and quality appraisal. The inclusion of bivariate boxplots and influence diagnostics enhances the robustness of heterogeneity assessment.
However, several limitations must be acknowledged. First, most included studies were retrospective, with inherent risks of selection and interpretation bias. Second, partial verification bias is a concern, as surgical confirmation was not consistently applied to all patients; negative ultrasound cases without surgery were typically followed clinically, potentially overestimating specificity. Third, variability in ultrasound protocols, equipment, and operator expertise across studies may have influenced the pooled estimates. Fourth, the inability to perform meta-analysis on specific individual sonographic signs (e.g., whirlpool sign, ovarian size thresholds, peripheral follicle sign) limits feature-level inference, as studies reported composite interpretations rather than sign-level data. Fifth, the evidence on contrast-enhanced ultrasound remains preliminary, based on a single study, precluding definitive conclusions. Sixth, we could not assess the impact of time-from-symptom-onset to ultrasound on diagnostic accuracy, as this was inconsistently reported. Finally, publication bias cannot be entirely excluded despite non-significant Deeks’ test results.
Future research directions
Future research should prioritize prospective, multicenter studies employing standardized ultrasound protocols with prespecified diagnostic criteria and consistent reporting of operator expertise. Age-stratified analyses are needed to refine imaging thresholds, particularly for adolescents, and to determine whether different diagnostic criteria should apply to premenarchal versus postmenarchal girls. Research into the incremental value of contrast-enhanced ultrasound, including ongoing trials such as the AGATA study [41], may identify populations most likely to benefit from this advanced modality. The integration of imaging findings with clinical prediction rules or machine-learning models may further optimize diagnostic pathways and reduce unnecessary surgeries. Most importantly, outcome-focused studies linking imaging strategies to time-to-surgery and ovarian salvage rates are essential to translate diagnostic accuracy into improved patient-centered care. Standardized reporting of negative ultrasound cases with clinical follow-up would help quantify the true false-negative rate and refine estimates of diagnostic performance.
Clinical implications and proposed diagnostic algorithm
Based on our findings, we propose a diagnostic algorithm for suspected pediatric ovarian torsion in emergency settings. For patients with acute pelvic pain and clinical suspicion, urgent transabdominal ultrasound with color Doppler should be performed by an experienced sonographer using standardized criteria. If ultrasound demonstrates absent or markedly reduced Doppler flow, an enlarged ovary (> 20 mL), or the whirlpool sign, the probability of torsion is high (PPV approximately 72% at an assumed prevalence of 20%), and surgical consultation should be obtained urgently. If ultrasound findings are equivocal or normal but clinical suspicion remains high, torsion is not excluded (NPV approximately 95% at 20% prevalence, meaning that about 5% of patients with a negative ultrasound may still have torsion). In such cases, options include short-interval repeat ultrasound (4–6 h), advanced imaging with MRI if available and feasible, or surgical consultation with consideration of diagnostic laparoscopy. This approach balances the high specificity of ultrasound with recognition of its imperfect sensitivity.
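The predictive values quoted in the algorithm can be reproduced from the pooled color Doppler sensitivity (78.6%) and specificity (92.4%) using Bayes' theorem. The following is a minimal illustrative sketch, not part of the original analysis: the function name predictive_values is hypothetical, and the 5% and 50% prevalence values are assumptions added only to show how strongly the predictive values depend on pretest probability.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' theorem.

    All arguments are proportions between 0 and 1.
    """
    tp = sensitivity * prevalence              # true positives per patient tested
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Pooled color Doppler estimates reported in this meta-analysis.
SENS, SPEC = 0.786, 0.924

# 0.20 is the prevalence used in the text; 0.05 and 0.50 are
# hypothetical values chosen for illustration only.
for prev in (0.05, 0.20, 0.50):
    ppv, npv = predictive_values(SENS, SPEC, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At 20% prevalence this gives a PPV of about 72% and an NPV of about 95%, matching the figures in the algorithm. Because both values shift with prevalence, they should be recalculated for the pretest probability of the local population rather than applied unchanged.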
Conclusion
This meta-analysis confirms that color Doppler ultrasound is a highly specific and diagnostically robust tool for pediatric ovarian/adnexal torsion, supporting its routine use as a first-line imaging modality in emergency settings. With a specificity of 92.4%, a positive study strongly supports the diagnosis and should prompt urgent surgical consultation. However, the moderate sensitivity of 78.6% means that clinical judgment remains paramount. Preserved Doppler flow does not reliably exclude torsion, particularly in adolescents, and a negative Doppler study should not delay surgical consultation when clinical suspicion is high. These findings reinforce the importance of an integrated diagnostic approach that combines standardized ultrasound protocols with careful clinical assessment to guide timely and appropriate management. Adoption of prespecified diagnostic criteria may improve diagnostic consistency, and age-specific considerations should inform image interpretation.
Acknowledgements
Not applicable.
Author contributions
MA and EA conceptualized and designed the study. ABE and AAA performed the literature search and data extraction. SR and EA conducted the quality assessment. MA performed the statistical analysis and interpreted the data. MA, EA, and ABE drafted the initial manuscript. AAA and SR critically revised the manuscript for important intellectual content. All authors read and approved the final manuscript.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Data availability
All data generated or analyzed during this study are included in this published article and its supplementary information files. The datasets used for meta-analysis are available from the corresponding author upon reasonable request.
Declarations
Ethics approval and consent to participate
This systematic review and meta-analysis is based on published aggregate data and did not involve direct contact with human participants or collection of primary data. The need for ethics approval was waived by the Institutional Review Board of Al-Thawra Modern General Hospital, Sana’a, Yemen (Reference: IRB-2025-012, February 10, 2025). The requirement for informed consent was not applicable due to the study design.
Consent for publication
Not applicable.
Human ethics and consent to participate
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Spinelli C, Tröbs RB, Nissen M, et al. Ovarian torsion in the pediatric population: predictive factors for ovarian-sparing surgery—an international retrospective multicenter study and a systematic review. Arch Gynecol Obstet. 2023;308:1–12.
- 2. Tsafrir Z, Azem F, Hasson J, et al. Risk factors, symptoms, and treatment of ovarian torsion in children: twelve-year experience. J Minim Invasive Gynecol. 2012;19:29–33.
- 3. Scheier E. Diagnosis and management of pediatric ovarian torsion in the emergency department. Open Access Emerg Med. 2022;14:283–91.
- 4. Spinelli C, Piscioneri J, Strambi S. Adnexal torsion in adolescents: update and literature review. Curr Opin Obstet Gynecol. 2015;27:320–5.
- 5. Sintim-Damoa A, Majmudar AS, Cohen HL, Parvey LS. Pediatric ovarian torsion: spectrum of imaging findings. Radiographics. 2017;37:1892–908.
- 6. Rosado WM Jr, Trambert MA, Gosink BB. Adnexal torsion: Doppler evaluation. AJR. 1992;159:1251–3.
- 7. Hurh PJ, Meyer JS, Shaaban A. Normal Doppler flow in ovarian torsion. Pediatr Radiol. 2002;32:586–8.
- 8. Ben-Ami M, Perlitz Y, Haddad S. Spectral and color Doppler for predicting torsion. Eur J Obstet Gynecol Reprod Biol. 2002;104:64–6.
- 9. Lee EJ, Kwon HC, Joo HJ, et al. Color Doppler depiction of twisted pedicle. J Ultrasound Med. 1998;17:83–9.
- 10. Vijayaraghavan SB. Sonographic whirlpool sign. J Ultrasound Med. 2004;23:1643–9.
- 11. Graif M, Itzchak Y. Ovarian torsion sonography in children. AJR. 1988;150:647–9.
- 12. Graif M, Shalev J, Strauss S, et al. Sonographic features of torsion. AJR. 1984;143:1331–4.
- 13. Kiechl-Kohlendorfer U, Maurer K, Unsinn KM, et al. Fluid-debris level as a sign of torsion. Pediatr Radiol. 2006;36:421–5.
- 14. Stark JE, Siegel MJ. Ovarian torsion in children: US findings. AJR. 1994;163:1479–82.
- 15. Servaes S, Zurakowski D, Laufer MR, et al. Pediatric sonographic features. Pediatr Radiol. 2007;37:446–51.
- 16. Mashiach R, Melamed N, Gilad N, et al. Accuracy of sonographic diagnosis. J Ultrasound Med. 2011;30:1205–10.
- 17. Bronstein ME, Pandya S, Snyder CW, et al. Meta-analysis of US and CT. Eur J Pediatr Surg. 2015;25:82–6.
- 18. Naiditch JA, Barsness KA. Predictive value of pediatric Doppler US. J Pediatr Surg. 2013;48:1283–7.
- 19. Jourjon R, Morel B, Irtan S, et al. Clinical and US determinants. J Pediatr Adolesc Gynecol. 2017;30:582–90.
- 20. Shapira-Zaltsberg G, Fleming NA, Karwowska A, et al. Can non-visualization exclude torsion? Pediatr Radiol. 2019;49:1313–9.
- 21. Albayram F, Hamper UM. Sonographic spectrum of torsion. J Ultrasound Med. 2001;20:1083–9.
- 22. Wattar B, Rimmer M, Rogozinska E, Macmillian M, Khan K, Wattar BA. Accuracy of imaging modalities for adnexal torsion: a systematic review and meta-analysis. BJOG. 2020;128(1):37–44. doi:10.1111/1471-0528.16371.
- 23. Garde I, Paredes C, Ventura L, et al. Diagnostic accuracy of ultrasound signs for detecting adnexal torsion: systematic review and meta-analysis. Ultrasound Obstet Gynecol. 2022;61(3):310–24. doi:10.1002/uog.24976.
- 24. McInnes MDF, Moher D, Thombs BD, et al. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement. JAMA. 2018;319(4):388–96. doi:10.1001/jama.2017.19163.
- 25. Whiting PF, Rutjes AW, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36. doi:10.7326/0003-4819-155-8-201110180-00009.
- 26. Alsabri M, Abady E, Hasan MT, et al. Diagnostic accuracy of real-time point-of-care tracheal ultrasonography for the confirmation of proper endotracheal tube placement in neonatal acute care settings: a systematic review and diagnostic test accuracy meta-analysis. J Perinatol. Published online 2025 Nov 19. doi:10.1038/s41372-025-02461-4.
- 27. Guyatt GH, Oxman AD, Vist GE, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008;336(7650):924–6. doi:10.1136/bmj.39489.470347.AD.
- 28. Almalki YE, Basha MAA, Nada MG, et al. Sonographic signs of pediatric ovarian torsion: Multicenter study with surgical correlation. Eur J Radiol. 2025;190:112258. doi:10.1016/j.ejrad.2025.112258.
- 29. Chen S, Gao Z, Qian Y, Chen Q. Key clinical predictors in the diagnosis of ovarian torsion in children. J Pediatr. 2024;100(4):399–405. doi:10.1016/j.jped.2024.01.006.
- 30. Graif M, Itzchak Y. Sonographic evaluation of ovarian torsion in childhood and adolescence. AJR Am J Roentgenol. 1988;150(3):647–9. doi:10.2214/ajr.150.3.647.
- 31. Jourjon R, Morel B, Irtan S, et al. Analysis of clinical and ultrasound determinants of adnexal torsion in children and adolescents. J Pediatr Adolesc Gynecol. 2017;30(5):582–90. doi:10.1016/j.jpag.2017.03.142.
- 32. Kumari M, Singh CB, Priya S. Ultrasonography and Color Doppler in Early Diagnosis of Ovarian Torsion in Pediatric Patients: Correlation with Surgical Outcomes. Int J Pharm Clin Res. 2024;16(5):3131–5.
- 33. Linam LE, Darolia R, Naffaa LN, et al. US findings of adnexal torsion in children and adolescents: size really does matter. Pediatr Radiol. 2007;37(10):1013–9. doi:10.1007/s00247-007-0599-6.
- 34. Lopez-Rippe J, Velez-Florez MC, Hwang R, et al. Transabdominal ultrasound for positive, negative, and equivocal ovarian and tubal torsion in girls. Emerg Radiol. Published online 2025 Oct 8. doi:10.1007/s10140-025-02399-2.
- 35. Naiditch JA, Barsness KA. The positive and negative predictive value of transabdominal color Doppler ultrasound for diagnosing ovarian torsion in pediatric patients. J Pediatr Surg. 2013;48(6):1283–7. doi:10.1016/j.jpedsurg.2013.03.024.
- 36. Oltmann SC, Fischer A, Barber R, Huang R, Hicks B, Garcia N. Cannot exclude torsion—a 15-year review. J Pediatr Surg. 2009;44(6):1212–7. doi:10.1016/j.jpedsurg.2009.02.028.
- 37. Otjen JP, Stanescu AL, Alessio AM, Parisi MT. Ovarian torsion: developing a machine-learned algorithm for diagnosis. Pediatr Radiol. 2020;50(5):706–14. doi:10.1007/s00247-019-04601-3.
- 38. Servaes S, Zurakowski D, Laufer MR, Feins N, Chow JS. Sonographic findings of ovarian torsion in children. Pediatr Radiol. 2007;37(5):446–51. doi:10.1007/s00247-007-0429-x.
- 39. Spigland N, Ducharme JC, Yazbeck S. Adnexal torsion in children. J Pediatr Surg. 1989;24(10):974–6. doi:10.1016/s0022-3468(89)80195-2.
- 40. Trinci M, Danti G, Di Maurizio M, et al. Can contrast enhanced ultrasound (CEUS) be useful in the diagnosis of ovarian torsion in pediatric females? A preliminary monocentric experience. J Ultrasound. 2021;24(4):505–14. doi:10.1007/s40477-021-00601-y.
- 41. Pillot R, Hossu G, Cherifi A, et al. Contribution of contrast-enhanced ultrasound in the diagnosis of adnexal torsion (AGATA): protocol for a prospective comparative study. BMJ Open. 2023;13(8):e073301. doi:10.1136/bmjopen-2023-073301.