BMC Medicine. 2026 Mar 6;24:235. doi: 10.1186/s12916-026-04768-1

AI-based quality control was associated with improved fetal ultrasound image quality in low-resource settings: a real-world multicenter study from West China

Jianxin Zhao 1,2,#, Yao Tang 3,#, Shengli Li 4, Ke Wang 1,2, Jing Tao 1,2, Chunyi Chen 1,2, Jiayuan Zhou 1,2, Lang Cui 1,2, Yuji Wang 1,2, Cheng Huang 1,2, Zheng Liu 1,2, Hong Kang 1,2, Jun Zhu 1,2,, Yong Huang 5,
PMCID: PMC13077844  PMID: 41787418

Abstract

Background

Despite rapid advances in medical artificial intelligence (AI), robust evidence for real-world clinical application—particularly in low-resource settings (LRS)—remains limited. To address this gap, we conducted a multicenter evaluation of an AI-based quality control (AI-QC) system for fetal ultrasound images across hospitals in Guizhou Province, China.

Methods

We implemented an independent, post-examination AI-QC system in Guizhou. After images are uploaded, the system assigns a 0–100 score and classifies images as standard (≥ 80), basic-standard (60–79), or non-standard (< 60). From September 2020 to May 2025, we prospectively collected ultrasound examinations uploaded by sonographers. Examinations were categorized into four types according to national guidelines: first-trimester scan (2 planes), basic biometry scan (3 planes), limited anomaly scan (11 planes), and standard anomaly scan (23 planes). First-trimester and standard anomaly scans represent the highest technical demands. Quality was assessed at two levels: examination level (proportion of required images per examination classified as standard; 100% defined as full-standard); and plane level (proportion of images for a given view classified as standard). Primary outcomes were temporal trends in these two measures.

Results

We analyzed 61,959 examinations (551,144 images) from 186 sonographers at 34 hospitals. Over 36 months, the combined proportion of first-trimester and standard anomaly scans increased from 33.1% to 66.8% (p < 0.0001). The proportion of full-standard examinations increased significantly across all categories: first-trimester scans from 39.5% to 82.1%, basic biometry from 46.3% to 65.5%, limited anomaly from 29.2% to 58.8%, and standard anomaly scans from 16.1% to 53.3% (all p < 0.0001). By 18–24 months post-deployment, most counties surpassed a 60% examination-level standardization threshold; for example, for first-trimester scans, the proportion of counties with mean rates ≥ 60% increased from 31.6% to 68.4% (p for trend < 0.0001). At the plane level, representative views showed improvement; for example, standard transthalamic plane images increased from 91% to 97% (p for trend < 0.0001), accompanied by marked reductions in common deficiencies.

Conclusions

AI-based quality control was associated with improved image quality in LRS, with sustained improvements over time. Future studies linking image quality to diagnostic performance and perinatal outcomes are needed to establish clinical benefit.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12916-026-04768-1.

Keywords: Artificial Intelligence, Quality Control, Fetal Ultrasound, Low-Resource Settings, AI in obstetrics

Background

Fetal ultrasound is central to antenatal care, enabling early detection of fetal abnormalities, accurate gestation dating, and timely clinical decision-making [1–3]. However, in many low-resource settings (LRS), image quality remains suboptimal due to limited availability of well-trained personnel and the absence of systematic quality assurance mechanisms [4, 5]. These limitations may hinder the effectiveness of prenatal care and contribute to preventable maternal and neonatal morbidity [6].

Artificial intelligence (AI) has emerged as a promising tool to address these challenges by automating image quality assessment and providing instant feedback to sonographers [7–9]. AI-based systems can process large image volumes, with the potential to improve diagnostic reliability while reducing the burden on clinicians [10–12]. However, despite growing interest, the clinical utility of AI in obstetrics remains largely unproven [13]. As noted by Drukker and colleagues, most published studies are single-center, retrospective, and focused on algorithmic performance rather than clinical outcomes or integration into real-world workflows [14]. Recent perspectives in medical AI research have further emphasized that meaningful studies should move beyond model performance to address genuine clinical needs and demonstrate feasibility in real-world practice [15], a gap that the present work seeks to fill.

Since 2020, an independent AI-based platform for fetal-ultrasound quality control has been deployed across county-level hospitals in Guizhou, China. After each examination, sonographers upload the plane set captured in routine practice; the system instantly assesses compliance with national standards and provides actionable feedback, supporting quality improvement efforts without requiring changes to clinical workflows. In this multicenter real-world observational evaluation (> 72,000 examinations from 49 hospitals), we examine temporal changes in examination distribution, image standardisation, and common causes of non-standard images in an LRS.

Methods

Study design and setting

This multicenter real-world observational study was conducted in Guizhou Province, southwest China, a predominantly low-resource region. Guizhou ranks among the five provinces with the lowest per capita gross domestic product (¥58,700 [USD 8,260] in 2024), and over one quarter of its ~ 40 million residents live in poverty. Approximately 87% of the province is mountainous, and limited transportation infrastructure further constrains healthcare resources and accessibility [16].

Beginning in 2020, an AI-based quality control (AI-QC) system for fetal ultrasound was made available and phased in across all 88 district- and county-level maternal and child health hospitals in Guizhou Province. For this study, we identified 49 hospitals that actively used the system, as evidenced by uploading fetal ultrasound images for quality assessment.

In addition to the main dataset, we obtained complete (non-sampled) datasets from a subset of hospitals to provide contextual comparisons. Specifically, five of the hospitals equipped with AI-QC provided complete pre-use data within prespecified pre-deployment windows, allowing within-hospital comparisons of periods before versus during AI use. We also obtained complete data from one hospital that did not use the AI system and used it as a parallel non-AI comparator over corresponding calendar periods.

Participants and data collection

Between September 15, 2020, and May 22, 2025, sonographers at participating hospitals performed fetal ultrasound examinations as part of routine antenatal care. Data from all examinations uploaded to the AI platform during this period were prospectively collected. We applied the following exclusion criteria to ensure data reliability and consistency: (1) Fewer than 30 examinations performed by the sonographer; (2) AI system usage for less than 3 months or more than 36 months; (3) Failure of the AI system to identify uploaded images; (4) Incorrect file type submitted; and (5) Examinations conducted before November 2021 or after February 2025, due to low upload numbers. All sonographer and examination data were anonymised at the point of upload and stored in a secure cloud-based database for analysis.

AI-based quality control (AI-QC) system

The AI-QC system operates as an independent post-examination quality control platform and does not intervene in or alter routine clinical workflows. After completing an ultrasound examination, sonographers are required to upload a complete set of plane images for each examination, rather than individual images. Based on workload and local hospital policies, sonographers are expected to upload a variable number of complete examinations each month, typically ranging from a few to several dozen per scan type. Each examination is categorized into one of four types according to the national fetal ultrasound guidelines [17]: first-trimester scan, basic biometry scan, limited anomaly scan, and standard anomaly scan in the second/third trimester, with 2, 3, 11, and 23 required quality-controlled images, respectively. The specific scan planes required for each category are detailed in Table 1. In the first trimester, sonographers are required to obtain the crown–rump length (CRL) and nuchal translucency (NT) planes, which demand a high level of skill and precision. In the second and third trimesters, examinations are classified into three types: (1) basic biometry scan, typically performed for routine fetal biometry, requires relatively few planes; (2) limited anomaly scan, a simplified structural screening that focuses on detecting major or lethal anomalies and is often adopted in settings with constrained medical resources; and (3) standard anomaly scan, usually conducted around 20–24 weeks, involves a comprehensive assessment with numerous required planes and represents the highest level of scanning difficulty for sonographers.

Table 1.

Required ultrasound scan planes for AI-based quality control, categorized by examination type according to the 2012 national guidelines

Examination type (number of required planes) and planes
first-trimester scan 2

• Nuchal translucency (NT) plane

• Crown–rump length (CRL) plane

basic biometry scan 3

• Transthalamic plane

• Transverse section of the upper abdomen

• Longitudinal view of the femur (femoral diaphysis length)

limited anomaly scan 11

1. Brain

• Transthalamic plane

• Transcerebellar plane

2. Heart

• Four-chamber view of the heart (including lungs)

3. Abdomen

• Transverse section of the upper abdomen

• Transverse view of the umbilical cord insertion site into the fetal abdomen

4. Urinary system

• Transverse view of the bladder with color Doppler (the fetal bladder and both kidneys should be identified)

• Transverse view of both kidneys

5. Spine

• Sagittal view of the cervical and thoracic spine

• Sagittal view of the lumbar and sacral spine

6. Limbs

• Longitudinal view of the femur (femoral diaphysis length)

7. Cervix/Placenta

• Longitudinal view of the cervix (placental position should be determined in relation to the cervix)

standard anomaly scan 23

1. Cranial/Brain

• Transthalamic plane

• Transventricular plane

• Transcerebellar plane

2. Face

• Coronal view of the mouth, lips, and nose

• Transverse view of both orbits (symmetrical and intact, interorbital distance ≈ one orbital diameter)

3. Heart

• Four-chamber view of the heart (including lungs and cardiac chambers)

• Left ventricular outflow tract (LVOT) view

• Right ventricular outflow tract (RVOT) view

4. Abdomen

• Transverse section of the upper abdomen

• Transverse view of the umbilical cord insertion site into the fetal abdomen

5. Urinary system

• Transverse view of the bladder with color Doppler (fetal bladder and both kidneys identified)

• Transverse view of both kidneys

6. Spine

• Sagittal view of the cervical and thoracic spine

• Sagittal view of the lumbar and sacral spine

7. Limbs

• Longitudinal view of the femur × 2 (femoral diaphysis length)

• Coronal view of the tibia and fibula × 2

• Longitudinal view of the humerus × 2 (humeral diaphysis length)

• Coronal view of the radius and ulna × 2

8. Cervix/Placenta

• Longitudinal view of the cervix (placental position in relation to the cervix)

Previous validation studies of the AI-QC system, all led by Shengli Li and colleagues, have reported its accuracy and efficiency across diverse fetal imaging settings [18]. In a large-scale evaluation of 164,010 second- and third-trimester images from 64 hospitals in Shenzhen, China, the system showed high concordance with expert reviewers for both plane classification and image standardization, while reducing quality control time by over 20-fold [19]. For the mid-sagittal facial view in the first trimester, it achieved over 96% agreement with expert assessment and reduced evaluation time from hours to minutes [20]. Together, these findings are consistent with the platform being suitable for large, multicenter studies, such as the present investigation in Guizhou Province.

The system is accessible via both a web-based interface (https://www.ddxx56.com) and a mobile application, enabling flexible use across diverse clinical settings (Fig. 1). Once images are uploaded, the system automatically assigns a quality score to each one, evaluating anatomical completeness, correct orientation, and image clarity. Scores below 60 indicate non-standard images, 60–79 indicate basic-standard, and ≥ 80 indicate standard. Unrecognized or missing required views are flagged as absent.
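The three-tier classification described above can be stated precisely as a small sketch. This is our illustration, not the platform's code; the function name is hypothetical, and only the thresholds (≥ 80 standard, 60–79 basic-standard, < 60 non-standard) come from the paper:

```python
def classify_image(score: float) -> str:
    """Map a 0-100 AI quality score to the platform's three-tier label.

    Thresholds follow the paper: >= 80 standard, 60-79 basic-standard,
    < 60 non-standard. Unrecognised or missing views are handled
    separately by the platform (flagged as absent) and are not scored here.
    """
    if score >= 80:
        return "standard"
    if score >= 60:
        return "basic-standard"
    return "non-standard"
```

For example, a score of 79 falls in the basic-standard band, while 80 is the lowest score counted as standard.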

Fig. 1.

Fig. 1

Illustration of the AI-based quality control interface for prenatal ultrasound. A Example of an original ultrasound image (left; transthalamic plane) and corresponding AI-generated quality control result (right). Key anatomical landmarks are automatically detected and displayed with measurement values and individual quality scores (maximum score = 1.0). B Text-based quality control report showing whether required anatomical structures were displayed (left) and the overall cut-plane evaluation (right), including the total quality score and classification (standard (≥ 80), basic standard (60–79), or non-standard (< 60)). C Reference images for the transthalamic plane, including a representative standard ultrasound image (left) and an anatomical schematic (right). When a scan is classified as non-standard, sonographers can use these references to identify deficiencies and improve subsequent examinations

Results are generated promptly after upload, enabling sonographers to review image quality and access system-generated explanations for non-standard classifications (e.g., incorrect landmarks or suboptimal resolution). The system also provides reference standard images and schematic diagrams for comparison, supporting timely feedback, continuous improvement, and self-directed learning (Fig. 1).

Of note, examinations may contain more images than the minimum guideline-required plane set because sonographers often store additional optional views and/or multiple attempts for the same target plane in routine practice. Standardisation metrics are calculated only for the guideline-defined target planes and, when multiple images map to the same target plane, the system de-duplicates and retains the highest-scoring image for calculation. For example, the mid- and third-trimester standard anomaly scan requires 23 target planes. In routine practice, the number of stored images per examination can be substantially higher than this minimum.
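The de-duplication rule described above (keep the highest-scoring image when multiple images map to the same target plane) can be sketched as follows. This is a minimal illustration under our own naming; it assumes optional extra views that do not map to a guideline target plane have already been excluded:

```python
def best_score_per_plane(images):
    """Given (target_plane, score) pairs for one examination, keep only the
    highest score per guideline-defined target plane, mirroring the
    de-duplication rule used before standardisation metrics are computed."""
    best = {}
    for plane, score in images:
        # Retain the maximum score seen so far for this target plane
        if score > best.get(plane, float("-inf")):
            best[plane] = score
    return best


# Example: two attempts at the transthalamic plane; only the 85 is retained
uploads = [("transthalamic", 72), ("transthalamic", 85), ("NT", 90)]
deduped = best_score_per_plane(uploads)  # {"transthalamic": 85, "NT": 90}
```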

Quality assurance of AI assessments

An appeal mechanism is embedded in the AI-QC platform (Additional file 1: Fig. S1). If a sonographer considers the AI plane identification or quality score incorrect, an appeal can be submitted and is adjudicated within the system by designated quality-control personnel. For successful appeals (AI error), the reviewer manually corrects the relevant assessment in the platform (including the score) using the same predefined scoring criteria; for unsuccessful appeals (AI correct), the appeal is rejected with a documented reason and the AI-generated score remains unchanged.

Outcomes

The primary outcomes were assessed at two levels using the AI threshold for standard image (score ≥ 80):

  1. Examination level – the standardisation rate per examination, defined as the number of required images scoring ≥ 80 divided by the total number of required images. Examinations with 100% of required images scoring ≥ 80 were classified as fully standardised. Example: in a standard anomaly scan examination requiring 23 images, if 20 were classified as standard, the examination-level standardisation rate is 87% (20/23).

  2. Plane level – the standardisation rate per plane, defined as the number of images of a given plane scoring ≥ 80 divided by all images of that plane.
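The examination-level definitions above, including the worked example (20 of 23 required planes standard → 87%), can be expressed as a short sketch. Function names are our assumptions; the ≥ 80 threshold and the full-standard (100%) definition are from the paper:

```python
def exam_standardisation_rate(plane_scores, n_required, threshold=80):
    """Examination-level standardisation rate: number of required images
    scoring >= threshold, divided by the total number of required images.
    Unrecognised/missing planes (represented as None) count in the
    denominator but never in the numerator."""
    n_standard = sum(1 for s in plane_scores if s is not None and s >= threshold)
    return n_standard / n_required


def is_full_standard(plane_scores, n_required, threshold=80):
    """An examination is fully standardised when all required images
    score >= threshold (a rate of exactly 100%)."""
    return exam_standardisation_rate(plane_scores, n_required, threshold) == 1.0


# Worked example from the text: a standard anomaly scan requires 23 planes;
# 20 standard images give a rate of 20/23, approximately 87%
scores = [85] * 20 + [70, 55, None]
rate = exam_standardisation_rate(scores, n_required=23)  # ~0.87
```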

Temporal patterns in these two measures were described across ordered time intervals (quarters or 6-month groups). As a descriptive geographical summary, for each interval we also reported the county-level mean examination-level standardisation rate (counties defined by hospital location) and the number of counties with mean standardisation rates (examination level) < 60% versus ≥ 60%.

Secondary outcomes included the distribution of reasons for non-standard classification at the plane level. For each plane and interval, the proportion for a given issue was calculated as: (images labelled with that issue) ÷ (all images of the same plane). Example: if 100 transthalamic-plane images were uploaded and 5 were non-standard, of which 3 were due to “unclear cavum septi pellucidi”, then the proportion for this issue is 3% (3/100).

Statistical analysis

Descriptive statistics were used to summarise demographic and procedural characteristics. Continuous variables are presented as medians with 5th and 95th percentiles, and categorical variables as frequencies and percentages. The proportions of standard, basic standard, and non-standard images were described across ordered time intervals. Comparisons of baseline characteristics between counties with and without AI system implementation were performed using the Wilcoxon rank-sum test. Changes over time in the proportion of standard images were assessed using the Cochran–Armitage χ2 test for trend, with time grouped into quarters or 6-month intervals relative to the start of the AI-QC system. A two-sided p value < 0.05 was considered statistically significant. Data analysis was performed using JMP Pro version 18 (SAS Institute, Cary, NC, USA).
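The authors report running the Cochran–Armitage test for trend in JMP Pro. For readers without access to JMP, a minimal pure-Python sketch of the two-sided test is given below; the function name and the default equally spaced interval scores are our assumptions, and in practice a vetted implementation (e.g. from a statistics package) would be preferable:

```python
import math


def cochran_armitage(successes, totals, scores=None):
    """Two-sided Cochran-Armitage test for a linear trend in proportions
    across ordered groups (e.g. quarters since AI-QC deployment).

    successes[i] / totals[i] is the proportion of standard images in
    interval i; `scores` are the ordinal interval scores (default 0, 1, ...).
    Returns (z, p): the standard-normal trend statistic and two-sided p.
    """
    t = scores if scores is not None else list(range(len(totals)))
    N = sum(totals)
    p_bar = sum(successes) / N  # pooled proportion under no-trend null
    # Trend statistic: score-weighted deviations from the pooled expectation
    num = sum(ti * (xi - ni * p_bar) for ti, xi, ni in zip(t, successes, totals))
    s_t = sum(ni * ti for ti, ni in zip(t, totals))
    s_t2 = sum(ni * ti * ti for ti, ni in zip(t, totals))
    var = p_bar * (1 - p_bar) * (s_t2 - s_t ** 2 / N)
    z = num / math.sqrt(var)
    # Two-sided p from the standard normal CDF
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p


# A steadily rising proportion yields a large positive z and a tiny p
z, p = cochran_armitage([20, 40, 60, 80], [100, 100, 100, 100])
```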

Results

Examination characteristics

Between September 2020 and May 2025, we collected data on 72,373 fetal ultrasound examinations performed by 255 sonographers across 49 delivery facilities (Fig. 2). A total of 10,414 examinations were excluded for the following reasons: (1) fewer than 30 examinations uploaded by the sonographer; (2) AI system use for < 3 months or > 36 months; (3) failure of the AI system to identify uploaded images; (4) incorrect file type uploaded; or (5) examinations conducted before November 2021 or after February 2025 due to low upload volumes during these periods. Finally, 61,959 examinations contributed by 186 sonographers from 34 hospitals were analyzed. The geographic distribution of participating facilities is shown in Fig. 3.

Fig. 2.

Fig. 2

Study flowchart of ultrasound examinations included in the analysis

Fig. 3.

Fig. 3

Geographic distribution of the 34 hospitals equipped with AI-based quality control system in Guizhou, China. The left panel shows the location of Guizhou Province within China. The right panel highlights the counties (in red) where hospitals participated in the study and uploaded ultrasound images to the AI platform

As summarized in Table 2, there were no statistically significant differences in either county-level or hospital-level characteristics between hospitals equipped with AI-based quality-control (AI-QC) and those without AI-QC. Among hospitals equipped with AI-QC, ultrasound equipment came from multiple manufacturers with no single brand predominating, supporting the generalisability of the AI-QC system across heterogeneous devices; sonographers had a median age of 36 years and were predominantly bachelor’s educated, with most holding intermediate- or junior-level titles, consistent with county-level low-resource settings (LRS). A total of 551,144 images were identified by the AI-QC system. The median number of uploaded images per sonographer was 202 (33–809), with a median participation duration of 24 months (10–39). The first-trimester scan group accounted for 26.2% of cases, followed by 36.6%, 20.9%, and 16.2% in the three mid- and third-trimester groups (basic biometry scan, limited anomaly scan, and standard anomaly scan). The median number of images per case was 2 (2–29), 8 (3–26), 55 (12–100), and 52 (25–137), respectively. Where denominators were available from three hospitals, upload coverage (uploaded/total obstetric ultrasound examinations) ranged from 6.0% to 37.5% across hospitals and years (Additional file 1: Table S1).

Table 2.

Baseline characteristics of hospitals, sonographers, and uploaded fetal ultrasound examinations

Hospitals equipped with AI based quality-control (AI-QC) system Hospitals without AI-QC system P value
County-level characteristics (counties where hospital are located)
 Counties, n 30 58
 Population, thousand 306.3 (147.7–1118.7) 340.1 (125.5–1048.2) 1.0000
 GDP, CNY 100 million 143.8 (54.9–670.0) 157.0 (48.7–1014.3) 0.7782
 GDP per capita, thousand CNY 42.5 (30.4–90.1) 45.7 (26.7–129.7) 0.3859
 Area, km2 1881.1 (954.2–3451.1) 1844.7 (266.5–4059.1) 0.5822
Hospital-level characteristics
 Hospitals, n 34 54
 Hospital level, n (%) 0.5587
 Tertiary 3 (8.8%) 3 (5.6%)
 Secondary or below 31 (91.2%) 51 (94.4%)
 Open bedsᵃ 100 (38–300) 100 (27–1317) 0.2723
 Total staffᵇ 151 (97–4747) 106 (28–1567) 0.0162*
 Building area, m2 c 12,800 (3,153–55,678) 16,253 (1,776–90,748) 0.6517
 Annual deliveriesd 893 (99.7–8,935.9) 2,097 (48–12,473) 0.0721
 Fetal ultrasound examinations (annual)e 6,200 (891–36,521) Not available
Ultrasound equipment and sonographers (Hospitals with uploaded images only)
 Ultrasound equipment brands, n (% of machines)ᶠ NA
 GE Healthcare (USA) 9 (21.4%)
 Mindray (China) 9 (21.4%)
 SonoScape (China) 7 (16.7%)
 Philips Healthcare (Netherlands) 5 (11.9%)
 Othersj 12 (28.6%)
 Total 42 (100.0%)
 Sonographer age, yearsᵍ 36 (27–53) NA
 Education level, n (%)ʰ NA
 Master’s degree 4 (2.8%)
 Bachelor’s degree 122 (84.1%)
 Below bachelor’s degree level 19 (13.1%)
 Professional title, n (%)ⁱ NA
 Senior-level 26 (17.9%)
 Intermediate-level 58 (40.0%)
 Junior-level 49 (33.8%)
 None 12 (8.3%)
Uploaded examinations and images (Hospitals with uploaded images only)
 Number of uploaded cases, n (%)
 First-trimester scan 16,249 (26.2%) - -
 Mid- & third-trimester—basic biometry scan 22,705 (36.6%) - -
 Mid- & third-trimester—limited anomaly scan 12,959 (20.9%) - -
 Mid- & third-trimester—standard anomaly scan 10,046 (16.2%) - -
 Total 61,959 (100.0%)
 Number of uploaded images 551,144
 Number of sonographers 186 - -
 Number of uploaded images (per sonographer) 202 (33–809) - -
 Months (per sonographer) 24 (10–39) - -
 Number of scans per caseᵏ - -
 First-trimester (2 images) 2 (2–29) - -
 Mid- & third-trimester—basic biometry scan (3 images) 8 (3–26) - -
 Mid- & third-trimester—limited anomaly scan (11 images) 55 (12–100) - -
 Mid- & third-trimester—standard anomaly scan (23 images) 52 (25–137) - -

Abbreviations: GDP gross domestic product, NA not available

Data are shown as median (P5, P95) for continuous variables; P values are shown only where both groups had data; “—” indicates not applicable

ᵃData available: hospitals with uploaded images n = 16; hospitals without uploaded images n = 21

ᵇData available: hospitals with uploaded images n = 22; hospitals without uploaded images n = 28

ᶜData available: hospitals with uploaded images n = 17; hospitals without uploaded images n = 23

ᵈData available: hospitals with uploaded images n = 20; hospitals without uploaded images n = 17

ᵉData available only for hospitals with uploaded images (n = 13)

ᶠMachine brand data available from 16/34 hospitals with uploaded images; total number of machines n = 42

ᵍSonographer age data: n = 144 from 24/34 hospitals with available data

ʰEducation level data: n = 145 from 24/34 hospitals with available data

ⁱProfessional title data: n = 145 from 24/34 hospitals with available data

ʲOthers include Canon Medical/ALOKA (Japan), Hitachi Healthcare (Japan), Siemens Healthineers (Germany), Samsung Medison (South Korea), Sharp Medical Systems (Japan), and SIUI/Apogee (China)

ᵏExaminations may contain more images than the minimum guideline-required plane set because sonographers often store additional optional views and/or multiple attempts for the same target plane in routine practice. Standardisation metrics are calculated only for the guideline-defined target planes and, when multiple images map to the same target plane, the system de-duplicates and retains the highest-scoring image for calculation. For example, the mid- and third-trimester standard anomaly scan requires 23 target planes. In routine practice, the number of stored images per examination can be substantially higher than this minimum

*P values < 0.05

Additional file 1: Fig. S2 describes temporal trends in AI-QC platform usage. The number of hospitals equipped with AI-QC decreased from 34 in months 1–3 to 25 in months 21–24, and then to 12 by the end of follow-up. A similar pattern was observed at the sonographer level, with a gradual decline in active users over time. Average uploads per active user were relatively stable in the early and mid-study periods, and were lower toward the end of follow-up.

Changes in examination type distribution

Figure 4 summarises changes in the examination mix over time. In Fig. 4A (absolute counts), uploads of basic biometry scans and limited anomaly scans decreased, whereas first-trimester scans and standard anomaly scans showed smaller decreases. In Fig. 4B (proportions), the distribution shifted accordingly: basic biometry scans and limited anomaly scans decreased from 44.5% to 25.2% and from 22.4% to 8.0%, while first-trimester scans and standard anomaly scans increased from 23.8% to 28.8% and from 9.3% to 38.0% (p for trend < 0.0001). Overall, the case mix shifted toward first-trimester and standard anomaly scan examinations, which typically require higher technical proficiency.

Fig. 4.

Fig. 4

Temporal trends in the distribution of ultrasound examination types. A Top: Number of uploaded examinations by scan type (first trimester, basic biometry scan (Grade I), limited anomaly scan (Grade II), standard anomaly scan (Grade III)) across all hospitals. Bottom: Number of uploaded examinations by hospital (n = 34). B Proportional distribution of examination types over the same period. Trend analysis using the χ² test for trend demonstrated statistically significant changes in the proportional distribution over time (p < 0.0001)

Changes in image quality over time

Image quality metrics increased over time across all examination categories (Fig. 5). In first-trimester scans, the proportion of full-standard examinations (both required planes scoring ≥ 80) increased from 39.5% in months 1–3 to 82.1% in months 21–24 (2.1-fold). In basic biometry scans, it increased from 46.3% to 65.5% (1.4-fold); in limited anomaly scans, from 29.2% to 58.8% (2.0-fold); and in standard anomaly scans, from 16.1% to 53.3% (3.3-fold) (p for trend < 0.0001 for all). Later intervals (27–36 months) contained fewer uploads, resulting in greater variability in the estimated proportions. To assess potential survivorship bias due to declining hospital participation, we performed retention-stratified sensitivity analyses. Improvements in full-standard examinations were observed among hospitals retained for < 1 year, 1–2 years, and 2–3 years of active uploading (Additional file 1: Fig. S3). When the definition of a full-standard examination was relaxed from 100% to ≥ 90% of required planes scoring ≥ 80, the temporal trends were similar to those observed under the primary 100% definition for both the limited and standard anomaly scans (Additional file 1: Fig. S4).

Fig. 5.

Fig. 5

Quarterly distribution of image standardisation levels by scan type over the 36-month study period. Each panel presents a mosaic plot illustrating the quarterly distribution of examinations, grouped by the number of standard-quality images (score ≥ 80) within each case. A First-trimester scans (2 required images), with three possible outcomes: both images standard (green), one standard (red), or none standard (blue). B–D Second- and third-trimester scans requiring 3 (basic biometry scan), 11 (limited anomaly scan), or 23 (standard anomaly scan) images, respectively, with outcomes categorized by the proportion of standard images achieved. The width of each bar represents the number of cases in that quarter, and the height of each segment represents the proportion of cases in each category. Across all examination types, significant improvements were observed over time (χ² test for trend, p < 0.001)

Using additionally obtained complete (non-sampled) data from five hospitals equipped with AI-QC, in first-trimester scans the proportion of fully standardised examinations was similar in the prespecified pre-use windows (6–7 months before AI implementation vs within 2 months before implementation) but was higher after the start of AI system use (Additional file 1: Fig. S5); in one hospital without AI-QC, trends over corresponding calendar periods did not show a similarly sustained increase (Additional file 1: Fig. S6). Finally, to assess whether selective uploading could fully explain our findings, we performed a consistency check using complete (non-sampled) uploads from one hospital equipped with AI-QC (Additional file 1: Fig. S7), which showed trends consistent with the overall results.

Figure 6 maps county-level examination-level standardisation by 6-month intervals after the start of AI-QC system use (counties defined by hospital location). Across all examination categories, county-level mean examination-level standardisation rates generally increased over time. In first-trimester scans, typically among the most technically demanding protocols, the proportion of counties with a mean standardisation rate < 60% decreased from 68.4% (13/19) at 0–6 months to 31.6% (6/19) at 18–24 months (p for trend < 0.0001). Basic biometry, limited anomaly, and standard anomaly scans showed similar upward patterns, with most counties reaching ≥ 60% by 18–24 months (p for trend < 0.0001 for each category). These increases were apparent within two years; by 18–24 months, most counties had reached relatively high levels of standardisation.

Fig. 6.

Fig. 6

Regional changes in image quality standardisation following AI implementation. Each map shows the proportion of examinations meeting quality standards (score ≥ 80) across counties in Guizhou Province, displayed at 6-month intervals. Colors represent the proportion of standard-quality examinations in each county, ranging from low compliance (blue) to high compliance (red). Rows correspond to examination categories: first-trimester scans (top row) and second- and third-trimester scans at increasing levels of protocol complexity (basic biometry scan, limited anomaly scan, standard anomaly scan)

Trends in common causes of non-standard images

Of 551,144 uploaded images, 358 (0.065%) were submitted for adjudication through the in-platform appeal mechanism (Additional file 1: Table S2). As shown in Additional file 1: Fig. S8, among four representative planes, the proportion of successful appeals (adjudicated AI error) ranged from 11.1% for the NT plane to 42.1% for the transverse upper-abdomen plane, with intermediate proportions for the CRL measurement plane (23.1%) and the transthalamic plane (35.0%). Overall, the low appeal frequency provides pragmatic, real-world evidence supporting the accuracy of AI assessments during routine deployment.

At the plane level, we analysed standardisation rates (proportion scoring ≥ 80) for four representative planes: CRL, NT, transthalamic, and transverse abdominal. Across all four planes, the proportion of standard images increased over time, with parallel decreases in basic-standard (60–79), non-standard (< 60), and unidentified (unrecognised/missing) labels (Fig. 7; p for trend < 0.0001 for each plane). The largest increases were observed for the first-trimester planes (CRL and NT), whereas the transthalamic and transverse abdominal planes, which started with higher baseline proportions of standard images, showed smaller incremental increases over time.

Fig. 7.

Fig. 7

Temporal changes in image quality across four representative ultrasound scan planes. The four panels show the proportion of examinations classified as standard (score ≥ 80, blue), basic-standard (60–79, red), non-standard (< 60, green), or unidentified (purple) over 36 months. Scan planes include the crown–rump length (CRL) and nuchal translucency (NT) planes in the first trimester, and the transthalamic and transverse abdominal planes in later trimesters

Additional file 1: Table S3 summarises plane-specific quality issues. For the transthalamic plane, which involves multiple quality-control points, the most frequent reasons for non-standard classification were an unclear cavum septi pellucidi (1.95%), unclear Sylvian fissure (1.60%), cerebellum appearing in the plane (0.98%), unclear thalamus (0.53%), and unclear midline (0.30%). The proportions of these issues decreased across ordered intervals (Fig. 8A; p for trend < 0.0001). Similarly, the proportion of images without any flagged issue increased from 91% at 0–6 months to ≥ 96% in the final intervals (Fig. 8B; p for trend < 0.0001). Comparable decreases in common non-standard reasons were observed for the CRL, NT, and transverse abdominal planes (Additional file 1: Fig. S9; p for trend < 0.0001).

Fig. 8.

Fig. 8

Trends in common causes of non-standard images in the transthalamic plane. A Quarterly trends in the proportion of examinations with specific quality issues, including unclear cavum septi pellucidi, Sylvian fissure, thalamus, or midline. All categories showed progressive declines over time. B Proportion of transthalamic plane images without any problems, which increased steadily to exceed 96% by the end of follow-up

Discussion

In this large, multicenter, real-world study conducted in Guizhou Province—one of China’s less economically developed regions—use of an AI-based quality control (AI-QC) system for fetal ultrasound was associated with sustained increases in image-quality standardisation across diverse hospitals and sonographers. Over three years, the proportion of fully standardised examinations increased across all categories, with the largest increases observed in standard anomaly scans, among the most technically demanding protocols, and with increases also observed across all scan types in the second and third trimesters. Notably, these increases emerged within two years after the start of AI-QC system use, suggesting potential applicability in low-resource settings (LRS). These findings align with recent calls in the broader medical AI field to move beyond retrospective model performance and report real-world feasibility and implementation experience; accordingly, our study focuses on practical use at scale rather than technical accuracy alone [15].

Most prior studies have focused on validating the performance of AI models for fetal ultrasound—such as biometric measurement or quality control—rather than describing their deployment and use within clinical workflows in LRS. For instance, Taksoee-Vester et al. developed and prospectively validated a deep-learning model for fetal echocardiography quality assessment, reporting high accuracy but within a narrow diagnostic domain [21]. Similarly, Liu et al. applied deep learning to automate quality assessment of fetal nuchal translucency images in the first trimester, reporting excellent performance, but again within controlled technical settings [22]. He et al. (2025) introduced a vision-language model for blind-sweep ultrasound data, reporting robust performance in image quality assessment, yet still focusing primarily on algorithmic accuracy rather than clinical implementation [23]. While Tan et al. (2024) conducted a cost-effectiveness study comparing AI-based versus manual quality control, it was limited to a single center and short-term outcomes, underscoring the need for broader multicenter evaluations [9]. By contrast, our study reports a multi-site deployment of an AI-QC platform across 34 hospitals with integration into routine workflows; over follow-up, image-standardisation metrics increased across all examination types. To our knowledge, this is among the largest multicenter, long-term descriptions of AI workflow integration for prenatal ultrasound in an LRS.

The sustained increases over time observed in our study may be related to the system’s built-in feedback mechanism. After each upload, sonographers received immediate, case-specific reports highlighting reasons for non-standard classifications (e.g., incorrect landmarks, incomplete anatomy, poor resolution). Such real-time, actionable feedback could facilitate practice and skill development, consistent with a feedback-driven learning process. Related evidence has been reported in medical education and AI-supported ultrasound contexts. In telemedicine-based ultrasound training, AI-generated feedback has been associated with faster learning and reduced inter-operator variability through guided practice [24]. In primary care ultrasound settings, AI features that provide real-time actionable feedback have been associated with a flatter learning curve for general practitioners, as well as improved diagnostic confidence and image quality without adding training burden [25]. Moreover, a global survey of healthcare professionals reported higher confidence and perceived usability with immediate AI-assisted feedback in point-of-care ultrasound across diverse settings [26].

Over time, this feedback-driven cycle may have been associated with higher image-quality standardisation and with a shift toward more complex protocols, potentially reflecting increased sonographer confidence. Notably, these changes occurred without additional infrastructure or major alterations to routine workflows, suggesting that AI-QC platforms may support sustainable, scalable capacity-building in LRS.

Geographic analysis showed broadly distributed increases in image-quality standardisation across hospitals, suggesting that the observed changes were not limited to facilities with higher baseline performance. Hospitals with lower baseline standardisation also showed increases over time. Related patterns have been reported in other digital health initiatives, where province- or country-wide deployment of AI tools was associated with narrower performance gaps between facilities with different baseline capacities [27, 28]. Separately, tele-reviewed point-of-care obstetric ultrasound in rural Kenya was associated with improved image quality and reduced inter-clinician variability across sites [29]. More broadly, work on AI models intended to mitigate bias and support fairness in medical imaging—including applications to remote ultrasound and other diagnostic scans—suggests that AI-enabled tools may help narrow, rather than widen, regional disparities in diagnostic quality [30, 31]. Taken together, these observations suggest that such systems may help reduce disparities in ultrasound service quality across regions and facilities and support more equitable maternal–fetal health services in LRS.

Nevertheless, several limitations should be acknowledged. First, because this was an observational study without randomisation or a contemporaneous control group, causal inference is not possible, and image quality could also have changed over time in facilities not using the AI-QC system due to secular trends, increased monitoring (Hawthorne effect), or natural skill acquisition. Although this limitation cannot be fully addressed in our design, we strengthened the temporal context by additionally incorporating complete pre-use data from five hospitals equipped with AI-QC and complete data from one hospital without AI-QC over corresponding calendar periods; however, residual confounding and other time-varying factors may still contribute. Second, image uploading to the AI-QC platform was voluntary, and sonographers may have preferentially submitted higher-quality images, which could overestimate temporal increases in standardisation. Although this potential bias cannot be fully excluded, we provide supporting evidence by reporting upload coverage in three hospitals with available denominators and by presenting an additional analysis from one hospital with complete (non-sampled) uploads, which showed temporal trends consistent with the overall results. Third, although prior studies have reported the system’s validity, we did not perform independent expert review for all uploaded images in this multicenter study; therefore, some misclassification by the AI assessment cannot be excluded. However, the platform’s built-in appeal-and-adjudication mechanism provides an additional layer of real-world human oversight for disputed cases, offering supportive evidence for the reliability of AI assessments during routine deployment. Fourth, participation declined over time, highlighting challenges in sustaining engagement. To address potential attrition bias, we conducted sensitivity analyses stratified by participation duration (≤ 1 year, 1–2 years, and 2–3 years). 
Trends were consistent across strata, and hospitals participating for 2–3 years continued to show sustained improvement over time. Finally, the study period (2020–2025) overlaps with the COVID-19 pandemic, which may have affected healthcare utilisation, training opportunities, and staffing. Importantly, implementation of the AI-QC platform was staggered across hospitals (Additional file 1: Fig. S10), and a substantial proportion of sites (18 of 34 AI hospitals) initiated platform use after the major relaxation of COVID-19 control measures in late 2022. This reduces the likelihood that the observed trends were driven solely by a single pandemic phase. In this context, the AI quality-control tool may have provided continuous, standardised feedback when traditional training routes were constrained, particularly in LRS, where opportunities for training, learning, and access to quality-control processes decreased during the pandemic.

Our analyses focused on AI-derived image quality and standardisation metrics and did not link these to clinical outcomes such as anomaly detection, missed diagnoses, or maternal–fetal health outcomes. Although image quality is a relevant surrogate endpoint—because acquisition of standard planes and consistent measurements is a prerequisite for reliable interpretation [32, 33]—the present real-world dataset was generated through voluntary image uploads and was de-identified at upload, which precluded linkage to patient-level clinical records and follow-up outcomes. Before recommending large-scale adoption on the basis of clinical benefit, future clinical validation should explicitly evaluate diagnostic performance using outcome-linked datasets, including: (i) anomaly detection and missed-diagnosis assessment [34], by comparing prenatal ultrasound findings (before vs after implementation, or AI vs non-AI sites) against an independent reference standard such as postnatal diagnosis, surgical/pathology confirmation, or expert-adjudicated follow-up imaging, with predefined anomaly categories and blinded outcome adjudication; (ii) measurement-level benefit, such as the accuracy and reproducibility of ultrasound-estimated fetal weight and the resulting classification of Small for Gestational Age (SGA) [35]; and (iii) downstream care and perinatal outcomes [36], where feasible, including changes in referral patterns, follow-up imaging, and clinically relevant maternal–neonatal outcomes. Such evaluations would ideally be prospective, include complete denominators, and use linked clinical records to quantify benefit beyond image-quality improvements.

Conclusions

In conclusion, in this observational, multicenter, real-world study conducted in a resource-limited region, use of an AI-based quality control platform for fetal ultrasound was associated with sustained improvements in image-quality standardisation over time, with increases observed within 2–3 years of implementation. These findings reflect real-world implementation experience and should be interpreted as associative rather than causal. Future studies incorporating appropriate control groups and outcome-linked clinical data are needed to determine whether improvements in image quality translate into measurable diagnostic and perinatal benefits.

Supplementary Information

12916_2026_4768_MOESM1_ESM.docx (10.8MB, docx)

Additional File 1: Figures S1-S10, Tables S1-S3. FigS1-[Case example of the embedded appeal mechanism for quality assurance of AI-based ultrasound assessments]. FigS2-[Number of uploaded cases every three months by hospital and sonographer]. FigS3-[Sensitivity analysis of quarterly image standardisation levels stratified by hospital retention duration on the AI platform]. FigS4-[Sensitivity analysis using an alternative definition of full-standard examinations]. FigS5-[Within-hospital comparison before and during AI system use in five hospitals equipped with AI-based quality-control system]. FigS6-[Parallel comparator without AI-based quality-control system over corresponding calendar periods]. FigS7-[Quarterly distribution of image standardisation levels by scan type over the 36-month study period from one hospital with complete uploads]. FigS8-[Outcomes of sonographer appeals among appealed images for four representative ultrasound planes]. FigS9-[Trends in common causes of non-standard images across four key ultrasound scan planes]. FigS10-[Quarterly distribution of image uploads across 34 AI hospitals]. TabS1-[Upload coverage in three of 34 hospitals]. TabS2-[Distribution of images by plane type among 358 appealed planes]. TabS3-[Image quality issues identified by the AI-based quality control system in the four key fetal ultrasound planes].

Acknowledgements

We thank the participating sonographers from county-level hospitals in Guizhou Province, China, for their contributions to ultrasound examinations and data collection throughout the study. We also thank Professor Juan Liang for her guidance on the study design and manuscript revision.

Abbreviations

AI

Artificial Intelligence

AI-QC

AI-based Quality Control

CRL

Crown–Rump Length

LRS

Low-Resource Settings

NT

Nuchal Translucency

SGA

Small for Gestational Age

Authors’ contributions

JXZ, SLL, JZ and YH conceived and designed the study. YT, KW, JT, CYC, JYZ, LC, YJW, CH, ZL, and HK acquired or interpreted the data. JXZ and KW did the statistical analysis. JXZ and YT drafted the manuscript. All authors critically revised the manuscript for important intellectual content and approved the final version. YH and JZ supervised the study, accessed and verified the data, and had full access to all the data in the study. YH and JZ had final responsibility for the decision to submit for publication. All authors read and approved the final manuscript.

Funding

This study was funded by the National Key Research and Development Program of China (2024YFC2707000, 2024YFC2707003 and 2022YFC2704701).

Data availability

The clinical data and study protocol for this article can be obtained from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Declarations

Ethics approval and consent to participate

The study was approved by the Ethics Committee of West China Second University Hospital (No. 2019YFS0530). The requirement for informed consent was waived because this study used routinely collected clinical ultrasound data that were automatically anonymised upon upload to the AI-QC platform, with all personal identifiers removed. The research analysed only AI-generated quality scores and structured assessment outputs and did not involve manual review of identifiable ultrasound images. No additional procedures or interventions were introduced. The ethics committee determined that the study involved minimal risk to participants and therefore did not require individual informed consent.

Consent for publication

Not applicable.

Competing interests

Shengli Li has led the development and prior validation of the AI-based fetal ultrasound quality control system evaluated in this study, which may be perceived as an intellectual competing interest. All other authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Jianxin Zhao and Yao Tang contributed equally to this work.

Contributor Information

Jun Zhu, Email: zhujun028@163.com.

Yong Huang, Email: 18394819902@163.com.

References

  • 1.Khalil A, Sotiriadis A, D’Antonio F, Da Silva CF, Odibo A, Prefumo F, et al. ISUOG Practice Guidelines: performance of third-trimester obstetric ultrasound scan. Ultrasound Obstet Gynecol. 2024;63(1):131–47. [DOI] [PubMed] [Google Scholar]
  • 2.Bilardo CM, Chaoui R, Hyett JA, Kagan KO, Karim JN, Papageorghiou AT, et al. ISUOG Practice Guidelines (updated): performance of 11–14-week ultrasound scan. Ultrasound Obstet Gynecol. 2023;61(1):127–43. [DOI] [PubMed] [Google Scholar]
  • 3.Salomon LJ, Alfirevic Z, Berghella V, Bilardo CM, Chalouhi GE, Da Silva CF, et al. ISUOG Practice Guidelines (updated): performance of the routine mid-trimester fetal ultrasound scan. Ultrasound Obstet Gynecol. 2022;59(6):840–56. [DOI] [PubMed] [Google Scholar]
  • 4.Kim ET, Singh K, Moran A, Armbruster D, Kozuki N. Obstetric ultrasound use in low and middle income countries: a narrative review. Reprod Health. 2018;15(1):129. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Aliyu LD, Kurjak A, Wataganara T, De Sá RAM, Pooh R, Sen C, et al. Ultrasound in Africa: what can really be done? J Perinat Med. 2016;44(2):119–23. [DOI] [PubMed] [Google Scholar]
  • 6.Stanton K, Mwanri L. Global maternal and child health outcomes: the role of obstetric ultrasound in low resource settings. J Prev Med. 2013;1(3):22–9. [Google Scholar]
  • 7.He F, Wang Y, Xiu Y, Zhang Y, Chen L. Artificial intelligence in prenatal ultrasound diagnosis. Front Med (Lausanne). 2021;8:729978. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Chen Z, Liu Z, Du M, Wang Z. Artificial intelligence in obstetric ultrasound: an update and future applications. Front Med (Lausanne). 2021;8:733468. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Tan Y, Peng Y, Guo L, Liu D, Luo Y. Cost-effectiveness analysis of AI-based image quality control for perinatal ultrasound screening. BMC Med Educ. 2024;24(1):1437. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Recker F, Gembruch U, Strizek B. Clinical ultrasound applications in obstetrics and gynecology in the year 2024. J Clin Med. 2024;13(5):1244. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Horgan R, Nehme L, Abuhamad A. Artificial intelligence in obstetric ultrasound: a scoping review. Prenat Diagn. 2023;43(9):1176–219. [DOI] [PubMed] [Google Scholar]
  • 12.Cao X, Li B, Zhou Y, Cao Y, Yang X, Hu X, et al. Effectiveness and clinical impact of using deep learning for first-trimester fetal ultrasound image quality auditing. BMC Pregnancy Childbirth. 2025;25(1):375. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Iftikhar P, Kuijpers MV, Khayyat A, Iftikhar A, De Sa MD. Artificial intelligence: a new paradigm in obstetrics and gynecology research and clinical practice. Cureus. 2020;12(2):e7124. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Drukker L, Noble J, Papageorghiou A. Introduction to artificial intelligence in ultrasound imaging in obstetrics and gynecology. Ultrasound Obstet Gynecol. 2020;56(4):498–505. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Jain SS, Mortazavi BJ, You SC, Yao X, Lam CSP, Elias P, et al. Moving Beyond the Model: Our Perspective on Meaningful AI Research in Cardiovascular Care. J Am Coll Cardiol. 2025;86(10):691–5. [DOI] [PubMed] [Google Scholar]
  • 16.Guizhou Province Bureau of Statistics. Statistical Communiqué of Guizhou Province on the 2024 National Economic and Social Development. 2025. https://stjj.guizhou.gov.cn/tjsj/tjfbyjd/202503/t20250322_87236410.html. Accessed 2 Feb 2026.
  • 17.Chinese Ultrasound Doctor Association. Guideline on prenatal ultrasound examination 2012 edition. Chin J Med Ultrasound (Electronic Edition). 2012;9(7):574–80. [Google Scholar]
  • 18.Liang M, Li S, Wen H. EP02.24: Application of prenatal ultrasonic artificial intelligence quality control system in Shenzhen. Ultrasound Obstet Gynecol. 2023;62(S1):110. [Google Scholar]
  • 19.Ying T, Huaxuan W, Guiyan P, Dandan L, Xin W, Yao J, et al. Effectiveness of obstetric intelligent ultrasonic quality control system. Chin J Med Imaging Technol. 2022;38(9):1361–6. [Google Scholar]
  • 20.Wenlan H, Ying T, Guiyan P, Yi L, Qin Z, Yao J, et al. Artificial intelligence in the evaluation of the standard mid-sagittal view of fetal face in 11–13+6 weeks of gestation. Chin J Ultrasonogr. 2023;32(9):807–12. [Google Scholar]
  • 21.Taksoee-Vester CA, Mikolaj K, Bashir Z, Christensen AN, Petersen OB, Sundberg K, et al. AI supported fetal echocardiography with quality assessment. Sci Rep. 2024;14(1):5809. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Liu L, Wang T, Zhu W, Zhang H, Tian H, Li Y, et al. Intelligent quality assessment of ultrasound images for fetal nuchal translucency measurement during the first trimester of pregnancy based on deep learning models. BMC Pregnancy Childbirth. 2025;25(1):741. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.He D, Wang H, Yaqub M. Advancing Fetal Ultrasound Image Quality Assessment in Low-Resource Settings. arXiv preprint. 2025;arXiv:2507.22802.
  • 24.Daum N, Blaivas M, Goudie A, Hoffmann B, Jenssen C, Neubauer R, et al. Student ultrasound education, current view and controversies. Role of Artificial Intelligence, Virtual Reality and telemedicine. Ultrasound J. 2024;16(1):44. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Luntsi G, Ugwu AC, Nkubli FB, Emmanuel R, Ochie K, Nwobi CI. Achieving universal access to obstetric ultrasound in resource constrained settings: a narrative review. Radiography. 2021;27(2):709–15. [DOI] [PubMed] [Google Scholar]
  • 26.Wong A, Roslan NL, McDonald R, Noor J, Hutchings S, D’Costa P, et al. Clinical obstacles to machine-learning POCUS adoption and system-wide AI implementation (The COMPASS-AI survey). Ultrasound J. 2025;17(1):32. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Brian W, Aline C-G, Stefan G, Nina RS. Artificial intelligence (AI) and global health: how can AI contribute to health in resource-poor settings? BMJ Glob Health. 2018;3(4):e000798. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Schwalbe N, Wahl B. Artificial intelligence and the future of global health. Lancet. 2020;395(10236):1579–86. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Kim S, Fischetti C, Guy M, Hsu E, Fox J, Young SD. Artificial intelligence (AI) applications for point of care ultrasound (POCUS) in low-resource settings: a scoping review. Diagnostics (Basel). 2024;14(15):1669. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Davis MA, Lim N, Jordan J, Yee J, Gichoya JW, Lee R. Imaging artificial intelligence: a framework for radiologists to address health equity, from the AJR special series on DEI. AJR Am J Roentgenol. 2023;221(3):302–8. [DOI] [PubMed] [Google Scholar]
  • 31.Kocak B, Ponsiglione A, Romeo V, Ugga L, Huisman M, Cuocolo R. Radiology AI and sustainability paradox: environmental, economic, and social dimensions. Insights Imaging. 2025;16(1):88. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Debbink MP, Son SL, Woodward PJ, Kennedy AM. Sonographic assessment of fetal growth abnormalities. Radiographics. 2021;41(1):268–88. [DOI] [PubMed] [Google Scholar]
  • 33.Boumeridja H, Ammar M, Alzubaidi M, Mahmoudi S, Benamer LN, Agus M, et al. Enhancing fetal ultrasound image quality and anatomical plane recognition in low-resource settings using super-resolution models. Sci Rep. 2025;15(1):8376. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Buijtendijk MF, Bet BB, Leeflang MM, Shah H, Reuvekamp T, Goring T, et al. Diagnostic accuracy of ultrasound screening for fetal structural abnormalities during the first and second trimester of pregnancy in low-risk and unselected populations. Cochrane Database Syst Rev. 2024;5(5):Cd014715. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Morris RK, Johnstone E, Lees C, Morton V, Smith G. Investigation and care of a small-for-gestational-age fetus and a growth restricted fetus (Green-top Guideline No. 31). BJOG. 2024;131(9):e31–80. [DOI] [PubMed] [Google Scholar]
  • 36.Piergianni M, Della Valle L, Khalil A, Rizzo G, Mappa I, Villalain C, et al. Perinatal and maternal outcomes of extremely early-onset fetal growth restriction (≤ 26 weeks): systematic review and meta-analysis. Ultrasound Obstet Gynecol. 2025. 10.1002/uog.70124. [DOI] [PubMed]


