Intelligent quality control for first trimester ultrasound scans in Liaoning Province of China

Lizhu Chen; Fujiao He; Ting Lei; Sihong Wang; Yan Wang; Nan Wang; Yafei Yan; Hongning Xie; Ying Huang

doi:10.1038/s41746-025-02207-8

2025 Dec 6;9:35. doi: 10.1038/s41746-025-02207-8


PMCID: PMC12796406 PMID: 41353464

Abstract

The first-trimester (11 to 13 + 6 weeks) ultrasound examination offers an opportunity to detect fetal abnormalities and contributes to first-trimester aneuploidy screening. A scan conducted according to a standardized protocol can maximize its value. Here, we aim to develop the First Trimester Ultrasound Quality-control System (FUQS) for assessing first-trimester ultrasound scans and to apply it to quality control practice in Liaoning Province, China. To develop and test FUQS, 41,968 images from 7251 pregnancies were collected. The system achieved high consistency with experts on view scoring (ICC 0.957, p < 0.001). In quality audit practice among 35 randomly selected hospitals, the overall completion rate was 65.85% and the overall median score was 58.50. Tier 3 hospitals outperformed Tier 2 hospitals, while specialized hospitals outperformed general hospitals. Among the required standard views, the transventricular axial view showed the best performance, whereas the abdominal cord insertion view performed worst.

Subject terms: Health care, Medical research, Machine learning, Quality control

Introduction

An obstetrical ultrasound examination provides invaluable information about the fetus. In early pregnancy, it is important to confirm viability, establish gestational age accurately, determine the number of fetuses, and assess both chorionicity and amnionicity for a multiple pregnancy¹. With advancements in ultrasound resolution, however, the role of early pregnancy ultrasound has gradually shifted from primarily serving a screening function to also encompassing the diagnosis of structural abnormalities. A cohort study has indicated that the incidence rate of fetal structural abnormalities is approximately 3.0%, with 43.1% of these anomalies detectable in early pregnancy². The introduction of nuchal translucency (NT) aneuploidy screening at 11 to 13 + 6 weeks has further stimulated interest in early anatomical scanning³. An ultrasound examination carried out according to a standardized protocol can maximize its value and increase the detection rate of fetal abnormalities^2,4. Therefore, quality control of the early ultrasound scan between 11 and 13 + 6 weeks is essential for detecting fetal defects at this stage.

However, the implementation of large-scale quality control places substantial demands on manpower and time. Moreover, inter-observer variability frequently undermines the reliability of the results, presenting a further obstacle to its wider adoption^5,6. In recent years, a growing body of research has applied artificial intelligence (AI) to obstetric ultrasound, enabling the rapid and repeatable analysis of fetal ultrasound images. Studies have successfully applied AI to identify fetal structures, perform measurements of fetal biometry, and even make intelligent diagnoses^7,8. In addition, several methods for intelligent quality assessment of fetal ultrasound images have been proposed^9–11. Although promising, most existing studies are limited to mid- or late pregnancy and focus only on biometric or cardiac planes. Consequently, they are inapplicable to first-trimester scans and insufficient for comprehensive intelligent quality control throughout the entire scanning process.

In this study, we collaborated with the ultrasound department of the First Affiliated Hospital of Sun Yat-sen University and AIRIGIN company to develop a First Trimester Ultrasound Quality-control System (FUQS) according to the recognized guideline in China, the Chinese expert consensus on assessment standards of obstetric ultrasound training program (2022 edition)¹². This system was then used to evaluate first-trimester ultrasound examinations in Liaoning Province.

Results

Image data for FUQS development and test

In total, 41,968 images from 7251 pregnancies were enrolled for the development and testing of FUQS (Fig. 1). Of these, 33,697 images from 6854 pregnancies were collected for model development, including 30,326 images for training and validation (26,927 and 3399, respectively) and 3371 images for internal testing. Additionally, 8271 images from 300 pregnancies were used for external testing. The details are shown in Table 1.

Table 1.

Image distribution for the development and test of the FUQS

View Name	Training	Validation	Internal test	External test
Transventricular axial view	3471	458	437	294
Midsagittal view of fetus	2757	361	346	282
Nuchal translucency measurement view	2055	241	255	297
Four-chamber view	2577	343	325	267
Transverse abdominal view	2539	283	313	264
Abdominal cord insertion view	324	40	41	258
Long axis view of both upper limbs	2857	361	358	264
Long axis view of both lower limbs	1916	261	242	279
Other views	8431	1051	1054	6066
Total	26,927	3399	3371	8271


FUQS First Trimester Ultrasound Quality-control System.

Performance of FUQS

Identifying the anatomic structures accurately is a prerequisite for the quality audit. With the trained You Only Look Once version 5 (YOLOv5) network, we achieved good results in identifying structures from fetal ultrasound images, with a precision of 81.12%, a recall of 78.43%, and a mean average precision (mAP) of 0.8327 on the internal test dataset. In the task of multi-view clustering, defined as the simultaneous classification of multiple target views, FUQS achieved an overall accuracy of 96.05% relative to the experts’ decisions. The highest consistency (100%) was found in the transventricular axial view. However, the lowest consistency (91.86%) occurred in the abdominal cord insertion view, with the most common error being its confusion with the transverse abdominal view. Figure 2a visualizes the target view clustering results in the external test dataset.

Fig. 2. a The performance of FUQS in target view classification. The numbers on the abscissa represent the classification of views based on the experts’ decisions, while the columns of different colors indicate the view classification identified by FUQS. b The discrepancies in scoring between FUQS and the average evaluations provided by the three experts. Each violin plot depicts the discrepancy between the FUQS and expert scores for a specific view or for the overall score; the lines within each violin divide the data into four equal parts, with the thickest line representing the median value, and lower values indicate better agreement. 1 Midsagittal view of fetus; 2 Nuchal translucency measurement view; 3 Transventricular axial view; 4 Four-chamber view; 5 Transverse abdominal view; 6 Abdominal cord insertion view; 7 Long axis view of both upper limbs; 8 Long axis view of both lower limbs; Total, the total score of an examination. FUQS First Trimester Ultrasound Quality-control System.

When comparing FUQS with manual scoring, we found high agreement between FUQS and the experts, with an intraclass correlation coefficient (ICC) of 0.957 (P < 0.001). The ICC among the three experts was 0.955 (P < 0.001). We further created violin plots to quantitatively illustrate the differences between FUQS and the average assessment of the three experts. The differences were concentrated around zero for the eight target views as well as for the total score (Fig. 2b). FUQS took an average of only three seconds to complete the assessment of one examination, significantly less than the time required by experts (an average of 63.50 s, P < 0.001).

The application of FUQS: quality audit in Liaoning Province

In total, 35 hospitals in Liaoning Province were randomly selected for the quality audit program. Of these, 22/35 (62.86%) were Tier 3 hospitals, and 13/35 (37.14%) were Tier 2 hospitals. Additionally, 29/35 (82.86%) were general hospitals, while 6/35 (17.14%) were obstetrics-specialized hospitals. The characteristics of the 35 hospitals are detailed in Supplementary Table 1. Fifty ultrasound examinations performed in December 2022 were then randomly selected from each hospital, giving a total of 1750 cases in the audit.

The overall median number of images saved per case was 19 (inter-quartile range (IQR) 13–25), with significant variation among the 1750 cases (ranging from 2 to 87 images, P < 0.001). As shown in Table 2, the number of images saved differed significantly between Tier 3 and Tier 2 hospitals (P < 0.001) as well as between general and specialized hospitals (P < 0.001). The overall median score among the hospitals was 58.50 (IQR 45.00–72.00). Tier 3 hospitals had a median score of 62.00 (IQR 47.00–75.00), higher than that of Tier 2 hospitals (52.00, IQR 42.00–66.63), while specialized hospitals had a median score of 75.00 (IQR 59.00–85.00), outperforming general hospitals (56.00, IQR 43.13–69.00) (Table 2 and Fig. 3a).

Table 2.

Details concerning the number of images and scan scores among different types of hospitals

	Images (median (IQR))	P	Score (median (IQR))	P
Total	19 (13, 25)	NA	58.50 (45.00, 72.00)	NA
Tier 2	19 (12, 22)	(P < 0.001)	52.00 (42.00, 66.63)	(P < 0.001)
Tier 3	20 (13, 26)	(P < 0.001)	62.00 (47.00, 75.00)	(P < 0.001)
Specialized	22 (18, 32)	(P < 0.001)	75.00 (59.00, 85.00)	(P < 0.001)
General	19 (13, 24)	(P < 0.001)	56.00 (43.13, 69.00)	(P < 0.001)


IQR inter-quartile range.

Fig. 3. a The performance of first trimester ultrasound examinations among different types of hospitals. The color of the circle represents the score, while the size of the circle represents the completion rate. b The pass rate and excellent rate of the overall views among all of the hospitals. c1–c8 The rates of unfinished points that contributed to the lower scores of the target views. The blue bars represent the scoring points in the guideline, while the yellow bars represent the point deductions in the guideline. 1 Midsagittal view of fetus; 2 Nuchal translucency measurement view; 3 Transventricular axial view; 4 Four-chamber view; 5 Transverse abdominal view; 6 Abdominal cord insertion view; 7 Long axis view of both upper limbs; 8 Long axis view of both lower limbs; 9 The total examination.

The overall scan completion rate for the required views was 65.85%. We further summarized the completion rates for individual views. The rates were higher for the midsagittal view of fetus (99.43%), the transventricular axial view (91.31%), and the NT measurement view (89.37%). In contrast, the completion rates were lower for the abdominal cord insertion view (32.69%), the four-chamber view (34.23%), and the transverse abdominal view (40.57%). As for view scores, among the required standard views, the transventricular axial view (median score 90.00 (IQR 90.00–100.00), excellent rate 82.00%, pass rate 89.89%), the NT measurement view (median score 90.00 (IQR 70.00–100.00), excellent rate 59.66%, pass rate 82.23%), and the midsagittal view of fetus (median score 80.00 (IQR 72.00–92.00), excellent rate 38.29%, pass rate 90.97%) showed better performance. In contrast, the performance of the abdominal cord insertion view (median score 0.00 (IQR 0.00–60.00), excellent rate 20.11%, pass rate 30.06%), the transverse abdominal view (median score 0.00 (IQR 0.00–80.00), excellent rate 12.69%, pass rate 40.11%), and the four-chamber view (median score 0.00 (IQR 0.00–86.67), excellent rate 25.20%, pass rate 28.23%) was relatively inferior. The summary is shown in Table 3 and Fig. 3b.

Table 3.

Details concerning the completeness, scores, excellent rates, and pass rates of the different views

Views	Completeness	Median score (IQR)	Excellent rate	Pass rate
Midsagittal view of fetus	99.43%	80.00 (72.00, 92.00)	38.29%	90.97%
Transventricular axial view	91.31%	90.00 (90.00, 100.00)	82.00%	89.89%
Nuchal translucency measurement view	89.37%	90.00 (70.00, 100.00)	59.66%	82.23%
Long axis view of both lower limbs	71.89%	50.00 (0.00, 80.00)	23.71%	39.94%
Long axis view of both upper limbs	67.31%	50.00 (0.00, 80.00)	22.29%	38.23%
Transverse abdominal view	40.57%	0.00 (0.00, 80.00)	12.69%	40.11%
Four-chamber view	34.23%	0.00 (0.00, 86.67)	25.20%	28.23%
Abdominal cord insertion view	32.69%	0.00 (0.00, 60.00)	20.11%	30.06%
Total	65.85%	70.00 (0.00, 92.00)	35.49%	54.96%


IQR inter-quartile range.
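As a quick arithmetic check, the reported overall completion rate of 65.85% is consistent with the unweighted mean of the eight per-view completion rates in Table 3. The short sketch below reproduces that calculation; it assumes every view shares the same denominator (the audited cases), which the text does not state explicitly, and the variable names are illustrative only.

```python
# Per-view completion rates (%) copied from Table 3.
completion_rates = {
    "Midsagittal view of fetus": 99.43,
    "Transventricular axial view": 91.31,
    "Nuchal translucency measurement view": 89.37,
    "Long axis view of both lower limbs": 71.89,
    "Long axis view of both upper limbs": 67.31,
    "Transverse abdominal view": 40.57,
    "Four-chamber view": 34.23,
    "Abdominal cord insertion view": 32.69,
}

# Unweighted mean across the eight required views (assumes equal denominators).
overall_rate = sum(completion_rates.values()) / len(completion_rates)
print(f"{overall_rate:.2f}%")  # prints 65.85%
```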

We conducted a further analysis of the specific structures that contributed to lower scores among the stored views. As shown in Fig. 3c1–c8, the reduced detection rates for the nasal bone/nasal tip (37.18% and 37.92%, respectively) contributed to the lower excellent rates for both the midsagittal view of fetus and the NT measurement view. Additionally, the lower detection rates for the ventricle and choroid plexus (37.18%) and instances of incorrect amplification (35.69%) were key factors in the lower excellent rate for the transventricular axial view. In the four-chamber view and the transverse abdominal view, suboptimal performance was observed in the visualization of the ventricular septum and the umbilical vein, with failure rates of 52.78% and 68.59%, respectively. In addition, nearly half of the limb views included only one side of the limbs.

Discussion

Birth defects are an important public health issue worldwide. The “Report on the Prevention and Control of Birth Defects in China (2012)” showed that the total incidence of birth defects in China was 5.6%¹³. Routine ultrasound examination is an established part of antenatal care for the prevention of birth defects. Previously, anatomical evaluation of the fetus was mainly performed in the mid-trimester. In the past two decades, with technological advancements, ultrasound scanning in the first trimester has evolved to a level at which early fetal development can be assessed and monitored in detail. Towards the end of the first trimester (11 to 13 + 6 weeks), the scan offers an opportunity to detect gross fetal abnormalities, confirm gestational age, and measure NT in health systems that offer first-trimester aneuploidy screening¹. The 2013 ISUOG guidelines established systematic requirements for the first-trimester fetal ultrasound scan¹. Following this, the China Medical Association also set forth guidelines for screening¹². Additionally, several NT-certified institutions highlight the importance of quality control during this period.

Although routine scanning is increasingly offered during the first trimester in many hospitals in China, there are significant differences in operator expertise. Simply increasing the number of scans does not necessarily enhance scanning quality; however, conducting continuous quality audits can improve operators’ skills, especially through the provision of personalized feedback¹⁴. This underscores the importance of quality audits in first trimester ultrasound screening. Nonetheless, not all hospitals perform regular quality audits. Some large-scale quality control programs, like the Nuchal Translucency Quality Review program, rely on median distributions of NT or crown-rump measurements to assess operator performance, which may not fully reflect the overall quality of scan images^15–17. Moreover, manual large-scale quality control is labor-intensive and time-consuming, with potential issues in reproducibility and reliance on subjective judgments.

In this study, following the established guidelines in China, we successfully developed an AI-driven quality control system for first-trimester ultrasound examinations. Furthermore, this system was effectively implemented for first-trimester ultrasound quality control in Liaoning Province. Our model provides a comprehensive evaluation of first-trimester ultrasound examinations, including detailed scoring of the ultrasound scan and a checklist to ensure completeness. Additionally, it represents a breakthrough in quality control practice by offering an objective method to assess numerous examinations with reduced time requirements. Moreover, the intelligent system could also be applied in the process of qualification and accreditation. As one of the largest provinces in China, Liaoning Province has a diverse range of hospitals practicing obstetric ultrasound, which presents a challenge for maintaining audit quality. This study represents a quality audit of first trimester ultrasound examinations conducted by an official organization, supported by local governments and health committees. With the assistance of FUQS, we gained a better understanding of the current situation regarding fetal scans.

As reported, increased fetal NT thickness during the first trimester is a common phenotypic expression of fetal chromosomal defects, structural abnormalities, and genetic syndromes¹⁸. The accuracy of the measurement critically depends on the quality of the ultrasound view; therefore, the NT measurement view is considered the most important during the examination. Despite a high completion rate of 89.37% for this critical view, lower detection rates of the nasal bone or nasal tip were the primary reasons for the reduced excellent rate. Additionally, the absence of the fetal nasal bone is an independent characteristic associated with first-trimester screening for trisomy 21^19,20. Therefore, it is crucial to emphasize the importance of nasal bone detection. The midsagittal view of fetus is another important view of the ultrasound examination at this stage, as the measurement of crown-rump length in this view is considered the most reliable method for establishing gestational age²¹. The midsagittal view of fetus achieved a high completion rate of 99.43%; however, the excellent rate was only 38.29%. This lower excellent rate is attributed to the failure to detect the nasal bone or nasal tip, as well as the inability to show the full length of the spine. Among the target views, the abdominal cord insertion view and the four-chamber view exhibited the lowest completion rates, 32.69% and 34.23% respectively, which should be taken seriously. The abdominal cord insertion view is important for observing the integrity of the anterior abdominal wall and diagnosing omphalocele, which may be associated with chromosomal abnormalities²². In addition, first-trimester ultrasound examination of the fetal heart allows for the identification of over half of the fetuses affected by major cardiac defects; moreover, the use of color-flow Doppler in the four-chamber view assessment is independently associated with a significantly higher rate of detection (P < 0.001)²³. Therefore, enhancing the completeness of the four-chamber view could boost the detection rate of major cardiac abnormalities in the first trimester. Moreover, our data reveal that Tier 3 hospitals outperformed Tier 2 hospitals, and specialized hospitals performed better than general hospitals. This underscores the need for Tier 3 hospitals to maintain their leadership role and for general hospitals to enhance their prenatal ultrasound scanning practices. As demonstrated, FUQS represents an effective tool for quality control in ultrasound practice. Building on these findings, future efforts will focus on broadening its impact through two primary pathways: geographical expansion to more regions for protocol optimization, and integration into clinical training to systematically enhance scanning quality through a structured feedback mechanism for participating hospitals and sonographers.

However, our quality control system still has some limitations. Firstly, all images used for training or testing were obtained from fetuses without structural defects, indicating that the tool’s efficacy in assessing fetuses with defects has not yet been evaluated. Future work should include fetuses with structural defects to assess the tool’s clinical applicability in broader prenatal screening contexts. Secondly, the guidelines utilized in our quality control system currently reflect basic standards for first trimester ultrasound examination, which are more widely recognized in China. Incorporating additional standards could further enrich the quality control practice. Furthermore, the model was trained and validated on datasets from limited geographic regions (Guangdong and Liaoning). Expanding the validation to include examinations from more diverse hospitals and populations could further assess its generalizability. Another limitation of the study is that only 10% of hospitals were stratified and sampled for inclusion in the quality audit, with only 50 examinations collected from each hospital to assess first trimester ultrasound performance. Increasing the sample size in subsequent studies could provide a more comprehensive evaluation.

In conclusion, we have successfully developed a systematic quality audit system for first trimester ultrasound examination and applied it to quality control practice in Liaoning Province. The AI-based quality audit system provides rapid, objective, and repeatable evaluations. Beyond assessing overall performance, our model provides detailed scores for each view, thereby helping to pinpoint weaknesses in scanning with reduced time consumption. Furthermore, providing feedback to participants may further enhance the quality of their scanning. The application of FUQS could serve as an effective tool for quality control practice and skill advancement for physicians.

Methods

Study design and participants

In this multi-center study, we developed, tested, and applied an intelligent model, FUQS, designed to evaluate the quality of fetal ultrasound examinations in the first trimester. Furthermore, we assessed the performance of first-trimester ultrasound scans at medical institutions of all levels in Liaoning Province, China, with the assistance of FUQS.

Fetal anatomical ultrasound images of participants at a gestational age of 11 to 13 + 6 weeks were collected from the First Affiliated Hospital of Sun Yat-sen University between January 2017 and December 2022 to train the FUQS. The external test dataset included fetal ultrasound images stored during routine clinical work at Shengjing Hospital of China Medical University from January 2022 to December 2022. All fetuses included were confirmed to have no evident morphological abnormalities after birth. The quality control survey was conducted in hospitals that provide prenatal screening services in Liaoning Province of China. Initially, we randomly selected 10% of Tier 3 and Tier 2 hospitals, respectively, for inclusion in the quality audit program (Chinese hospitals are classified into three Tiers based on scale and technical capabilities, with Tier 1 hospitals not offering prenatal ultrasound screening). We then randomly selected 50 first-trimester (11 to 13 + 6 weeks) ultrasound examinations performed in December 2022 from each hospital. This study was conducted in accordance with the principles of the Declaration of Helsinki. The use of ultrasound images, clinical information, and data collection protocols was approved by the ethics committee of Shengjing Hospital of China Medical University (2017PS264K). As this study involved a retrospective analysis of pre-existing, de-identified data, patient consent was waived by the ethics committee. All data were handled with strict measures to ensure anonymity and protect patient privacy.

Procedures

Raw data were routinely saved during fetal ultrasound scans and extracted in JPEG format. Ultrasound scans were performed transabdominally using various machines, including the GE Voluson 730 Expert/E6/E8/E10, Samsung UGEO WS80A, and Philips EPIQ7C. All images were obtained by various operators, each with more than five years of independent fetal diagnostic experience. We divided the image dataset into standard views and other views. Standard views included all the views required to be saved according to the practice guideline recognized across China¹², while other views comprised those not required to be saved by the guideline but frequently captured during the scanning process. The development images were assigned randomly to training, validation, and internal test datasets at a ratio of approximately 8:1:1. Then, the images were labeled using the annotation tool LabelImg. The data annotation process was conducted by a seven-person team consisting of four resident physicians (with five years of independent fetal diagnostic experience) for initial labeling, two superior physicians (with ten years of independent fetal diagnostic experience) to check and adjust the labels, and one experienced physician (with 20 years of independent fetal diagnostic experience) for a final recheck of the labels. All images (both standard and other views) were labeled with a total of 240 annotations in the form of rectangular bounding boxes, including irrelevant information such as machine parameters, image types (single or double views), image location (left or right view for the image with double views), and specific structures assigned to the model. To make the algorithm robust, the training dataset was further augmented threefold before training using CutMix, Crop, Paste, Horizontal/Vertical flip, Rotation, and HSV (Hue, Saturation, Value) color jitter. After augmentation, all images were resized from their original scales to a uniform size of 640 × 640 pixels.
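The random 8:1:1 assignment described above can be sketched as follows. This is a minimal illustration only: the function name `split_dataset`, the fixed seed, and the placeholder image identifiers are assumptions, and the text does not say whether the split was stratified by view or by pregnancy.

```python
import random

def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle items and split them into training, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    total = sum(ratios)
    n = len(shuffled)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Hypothetical image identifiers, used only for illustration.
image_ids = [f"img_{i:05d}" for i in range(1000)]
train, val, test = split_dataset(image_ids)
print(len(train), len(val), len(test))  # prints 800 100 100
```

Splitting at the image level, as here, can place images from the same pregnancy in different subsets; grouping by pregnancy before shuffling would avoid that, if it matters for the evaluation.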

The development of FUQS

The algorithm we developed was based on YOLOv5, a state-of-the-art object detection framework. YOLOv5 is one of the fastest algorithms for image recognition in the AI field and has been extensively studied in areas such as industry and agriculture²⁴. Although research on its application in medicine is relatively rare, it could prove to be a valuable tool for detection and classification tasks in fetal ultrasound images, offering rapid and accurate assessments²⁵. The input of the model was the set of fetal ultrasound images stored during a routine first-trimester ultrasound screening. In the first step, the model analyzed the images from the exam, detecting the key structures within each ultrasound image. The second step involved a logical network that utilized prior clinical knowledge (the guidelines) to infer possible view classification results along with corresponding scores. Specifically, the logical network matched each target view against all eight standard views. This ensured that its key structures were captured in the detection network’s predictions, and the frame was classified as the view with the highest probability. For each image, a score was calculated by summing the scores of the detected structures according to the standard set by the guideline. Subsequently, the third step incorporated an additional view clustering method, a combinatorial optimization algorithm, to address the assignment problem for the target views. Specifically, the algorithm began with the initial image and iteratively optimized the view clustering results to find the optimal view decision based on the scoring. The algorithm then iteratively improved the matching by exploring augmenting paths until an optimal solution was reached, achieving a maximum-weight matching in a weighted bipartite graph. This effectively paired items from two sets (target views and specific ultrasound images) to maximize the total score. For each examination, the total score was the sum of the scores of all views. Finally, the output included a checklist of the standard views, along with scores for each view and for the entire examination. The model generated a predictive score between 0 and the maximum value for each view, giving a maximum total score of 100 for a complete scan. Note that if some of the standard planes were missing during the ultrasound scan, the corresponding views were scored zero points due to the lack of matching views. The model is illustrated in Fig. 4. The network was trained with stochastic gradient descent using an initial learning rate of 0.01, a momentum of 0.93, and a weight decay of 3 × 10⁻⁴. The batch size was set to 312 on 8 NVIDIA GeForce RTX 2080Ti GPUs. The model was trained for 710 epochs, and the optimal model was selected by the loss of validation set of segmentation.
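The view-to-image assignment step can be illustrated with a small sketch. The text does not give the implementation, so the code below is not the authors' algorithm: it solves the same kind of assignment problem (each image used for at most one view, total score maximized, unmatched views scoring zero) by memoized exhaustive search rather than augmenting paths. The function name `assign_views` and the score matrix are hypothetical.

```python
from functools import lru_cache

def assign_views(scores):
    """Maximum-score assignment of images to views.

    scores[v][i] is the score image i would earn as view v (0 if it does
    not match that view). Returns (total, assignment), where assignment[v]
    is an image index or None for a missing view.
    """
    n_views = len(scores)
    n_images = len(scores[0]) if scores else 0

    @lru_cache(maxsize=None)
    def best(view, used_mask):
        if view == n_views:
            return 0, ()
        # Option 1: leave this view unmatched, so it contributes zero.
        total, rest = best(view + 1, used_mask)
        best_total, best_assign = total, (None,) + rest
        # Option 2: give it any matching image that is still unused.
        for img in range(n_images):
            if used_mask & (1 << img) or scores[view][img] <= 0:
                continue
            total, rest = best(view + 1, used_mask | (1 << img))
            total += scores[view][img]
            if total > best_total:
                best_total, best_assign = total, (img,) + rest
        return best_total, best_assign

    total, assignment = best(0, 0)
    return total, list(assignment)

# Hypothetical scores: 3 target views (rows) x 4 stored images (columns).
example_scores = [
    [9, 0, 0, 8],  # view A matches images 0 and 3
    [8, 2, 0, 0],  # view B matches images 0 and 1
    [0, 0, 0, 0],  # view C has no matching image and scores zero
]
total, assignment = assign_views(example_scores)
print(total, assignment)  # prints 16 [3, 0, None]
```

In this example, giving image 0 to view A (its individually best match) would leave view B with only 2 points, for a total of 11; the joint optimum assigns image 3 to view A and image 0 to view B for a total of 16. Exhaustive search is practical only for a handful of images; a polynomial-time assignment solver would be the natural choice at scale.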

Fig. 4. a The structure of the quality-control system with an example of a transventricular axial view; b The structure of the YOLOv5 network.

FUQS, First Trimester Ultrasound Quality-control System.

Statistical analysis

The detection performance of the model was first evaluated on internal datasets using precision, recall, and mAP. Intersection over Union (IoU) was employed to quantify the spatial overlap between the ground truth annotation (A) and the model-predicted region (B). An IoU of 1 indicates perfect overlap between A and B, while an IoU of 0 indicates no overlap. In this study, the IoU threshold was set at 0.50. A detection was considered a true positive (TP) if the IoU between A and B exceeded 0.50; a false negative (FN) if the IoU was below 0.50; and a false positive (FP) if a prediction B had no corresponding ground truth A. Based on the counts of TP, FP, and FN, precision and recall were calculated as TP/(TP + FP) and TP/(TP + FN), respectively. The average precision (AP) was determined as the area under the precision-recall curve, and mAP was computed as the mean AP across all target views.
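The IoU, precision, and recall definitions above can be written out as a short sketch. The box format `(x_min, y_min, x_max, y_max)`, the function names, and the example counts are assumptions made for illustration; they are not taken from the study's code or results.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

IOU_THRESHOLD = 0.50  # threshold stated in the text

ground_truth = (0, 0, 10, 10)
good_prediction = (2, 0, 12, 10)  # overlap 80, union 120, IoU = 2/3
poor_prediction = (5, 0, 15, 10)  # overlap 50, union 150, IoU = 1/3

print(iou(ground_truth, good_prediction) > IOU_THRESHOLD)  # prints True
print(iou(ground_truth, poor_prediction) > IOU_THRESHOLD)  # prints False

# Hypothetical counts, for illustration only.
precision, recall = precision_recall(tp=80, fp=20, fn=25)
```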

Regarding the multi-view clustering results, the reliability of the model was tested against the experts’ assessment in the external test set. First, three other experts independently interpreted all the matching images selected by the model to assess the accuracy of the view clustering results. Then, the three experts scored each scan independently according to the same standard (Supplementary Table 2)¹². The degree of agreement between the experts and FUQS was computed using the ICC, with ratings of excellent (>0.90), good (0.75–0.90), moderate (0.50–0.75), and poor (<0.50). In addition, Wilcoxon’s signed-rank test was used to detect differences in the time required for FUQS and expert ratings. The experts involved were certified sonographers, trained in fetal scanning, with at least 20 years of independent clinical experience in fetal ultrasound diagnosis. All readers were masked to the clinical information of the examinations.

For the intelligent evaluation in Liaoning Province, quantitative data were expressed as the median (IQR) and analyzed using the Mann–Whitney U-test, while categorical variables were presented as numbers and percentages and analyzed using the Chi-squared test. A standard view was regarded as complete if it had been scanned and saved, and the completion rate was the proportion of completed standard views among all standard views required by the guideline. The total score for a complete examination was 100, and each view had a different maximum score, so we converted the score of each view into a standardized score (dividing the view score by the maximum score and then multiplying by 100) when analyzing a specific view. A score above 85 was considered excellent, and a score above 60 was considered a pass. We further used the excellent rate and pass rate to assess the performance of the complete examination as well as the performance of different views.
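The score standardization and grading rules above can be sketched as follows. The maximum score of 15 used in the example is hypothetical (the per-view maxima are in the cited guideline and are not reproduced here), the function names are illustrative, and "above" is read as a strict inequality, which is an interpretation of the wording.

```python
EXCELLENT_CUTOFF = 85  # standardized score above this counts as excellent
PASS_CUTOFF = 60       # standardized score above this counts as a pass

def standardized_score(view_score, max_score):
    """Rescale a view score to 0-100 by dividing by the view's maximum score."""
    return view_score * 100 / max_score

def grade(std_score):
    """Return the excellent and pass flags; an excellent score is also a pass."""
    return {
        "excellent": std_score > EXCELLENT_CUTOFF,
        "pass": std_score > PASS_CUTOFF,
    }

# Hypothetical view with a maximum score of 15 points.
for raw in (13.5, 12, 0):
    std = standardized_score(raw, max_score=15)
    print(raw, std, grade(std))
```

Keeping excellent and pass as separate flags mirrors how the results are reported, where the pass rate for each view is at least as high as its excellent rate.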

All statistical analyses were performed using SPSS software version 26.0 (IBM, Armonk, NY, USA) and Prism software version 9.5.0. Differences at P < 0.05 were considered statistically significant.

Supplementary information

Supplementary Tables (61.8 KB, PDF)

Acknowledgements

This study was supported by the National Scientific Foundation Committee of China (82272022, 82171938), the National Key Research and Development Program (2021YFC2701003), and Young and middle-aged scientific and technological innovation talents in Shenyang (RC220070). We thank the thirty-five hospitals in Liaoning Province that facilitated the project by participating as quality control study sites: Affiliated Hospital of Liaoning University of Traditional Chinese Medicine, Benxi Central Hospital, Chaoyang Central Hospital, Changtu Central Hospital, Dengta Central Hospital, Haicheng Hospital of Traditional Chinese Medicine, Huludao Central Hospital, Huludao Huihao Women’s and Children’s Hospital, Huludao Women’s and Children’s Health Hospital, Haicheng Central Hospital, Health industry group of Liaoning iron and Coal General Hospital, Liaoyang City Central Hospital, Liaoyang Obstetrics and Gynecology Hospital, Liaoyang Second People’s Hospital, Liaoyang Third People’s Hospital, Lianshan District People’s Hospital of Huludao, Panjin Liaoyou Baoshihua Hospital, People’s Hospital of Jianchang, Roicare Hospital and Clinics, Shengjing Hospital of China Medical University, Suizhong Hospital, Taian Enliang Hospital, The First Affiliated Hospital of Dalian Medical University, The First Hospital of China Medical University, The Fourth People’s Hospital of Shenyang, The People’s Hospital of Zhangwu, The Second Hospital of Chaoyang, The Second People’s Hospital of Huludao, Tieling County Central Hospital, Tieling Women’s and Children’s Hospital, The First Affiliated Hospital Of Jinzhou Medical University, The Fourth Affiliated Hospital of China Medical University, The People’s Hospital Of Liaoning Province, Women and Children’s Hospital of Anshan City, and Xifeng first Hospital.
Moreover, we are extremely grateful to Shengjing Hospital of China Medical University, the First Affiliated Hospital of Sun Yat-sen University, and AIRIGIN Company, which provided the ultrasound dataset and technical assistance throughout the project.

Author contributions

H.Y., C.L.Z., and X.H.N. contributed to the study design. L.T., H.F.J., W.S.H., and W.Y. performed the data collection and data analysis. Y.Y.F. and W.N. developed, trained, and applied the deep-learning-based network. C.L.Z., H.F.J., and L.T. wrote the main manuscript text and prepared the figures. All authors reviewed the manuscript.

Data availability

The data that support the findings of this study are available to qualified researchers on reasonable request from the first author. Please email the first author Prof Lizhu Chen at aliceclz@sina.com.

Code availability

The code for this paper is available on reasonable request. Please email the first author Prof Lizhu Chen at aliceclz@sina.com.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Lizhu Chen, Fujiao He, Ting Lei, Sihong Wang.

Contributor Information

Hongning Xie, Email: xiehn@mail.sysu.edu.cn.

Ying Huang, Email: huangying712@163.com.

Supplementary information

The online version contains supplementary material available at 10.1038/s41746-025-02207-8.

References

1. Salomon, L. J. et al. ISUOG practice guidelines: performance of first-trimester fetal ultrasound scan. Ultrasound Obstet. Gynecol. 41, 102–113 (2013).
2. Liao, Y. et al. Routine first-trimester ultrasound screening using a standardized anatomical protocol. Am. J. Obstet. Gynecol. 224, 396.e1–396.e15 (2021).
3. Nicolaides, K. H. Nuchal translucency and other first-trimester sonographic markers of chromosomal abnormalities. Am. J. Obstet. Gynecol. 191, 45–67 (2004).
4. Syngelaki, A. et al. Diagnosis of fetal non-chromosomal abnormalities on routine ultrasound examination at 11-13 weeks’ gestation. Ultrasound Obstet. Gynecol. 54, 468–476 (2019).
5. Salomon, L. J. et al. Feasibility and reproducibility of an image-scoring method for quality control of fetal biometry in the second trimester. Ultrasound Obstet. Gynecol. 27, 34–40 (2006).
6. Salomon, L. J., Winer, N., Bernard, J. P. & Ville, Y. A score-based method for quality control of fetal images at routine second-trimester ultrasound examination. Prenat. Diagn. 28, 822–827 (2008).
7. He, F., Wang, Y., Xiu, Y., Zhang, Y. & Chen, L. Artificial intelligence in prenatal ultrasound diagnosis. Front. Med. 8, 729978 (2021).
8. Ramirez Zegarra, R. & Ghi, T. Use of artificial intelligence and deep learning in fetal ultrasound imaging. Ultrasound Obstet. Gynecol. 62, 185–194 (2023).
9. Wu, L. et al. FUIQA: fetal ultrasound image quality assessment with deep convolutional networks. IEEE Trans. Cybern. 47, 1336–1349 (2017).
10. Dong, J. et al. A generic quality control framework for fetal ultrasound cardiac four-chamber planes. IEEE J. Biomed. Health Inf. 24, 931–942 (2020).
11. Cao, X. et al. Effectiveness and clinical impact of using deep learning for first-trimester fetal ultrasound image quality auditing. BMC Pregnancy Childbirth 25, 375 (2025).
12. UBoCMD Association. Chinese expert consensus on assessment standards of obstetric ultrasound training program (2022 edition). Chin. J. Ultrasonogr. 31, 10 (2022).
13. Han, L. The Ministry of Health issued the Report on the Prevention and Control of Birth Defects in China (2012). China Mod. Med. 19, 1–1 (2012).
14. Torrent, A. et al. Sonologist’s characteristics related to a higher quality in fetal nuchal translucency measured in primary antenatal care centers. Prenat. Diagn. 39, 934–939 (2019).
15. Evans, M. I., Krantz, D. A., Hallahan, T. W. & Sherwin, J. Impact of nuchal translucency credentialing by the FMF, the NTQR or both on screening distributions and performance. Ultrasound Obstet. Gynecol. 39, 181–184 (2012).
16. Cuckle, H. et al. Nuchal Translucency Quality Review (NTQR) program: first one and half million results. Ultrasound Obstet. Gynecol. 45, 199–204 (2015).
17. Thornburg, L. L. et al. United States’ experience in nuchal translucency measurement: variation according to provider characteristics in over five million ultrasound examinations. Ultrasound Obstet. Gynecol. 58, 732–737 (2021).
18. Souka, A. P., Snijders, R. J., Novakov, A., Soares, W. & Nicolaides, K. H. Defects and syndromes in chromosomally normal fetuses with increased nuchal translucency thickness at 10-14 weeks of gestation. Ultrasound Obstet. Gynecol. 11, 391–400 (1998).
19. Karim, J. N., Bradburn, E., Roberts, N. & Papageorghiou, A. T. First-trimester ultrasound detection of fetal heart anomalies: systematic review and meta-analysis. Ultrasound Obstet. Gynecol. 59, 11–25 (2022).
20. Kagan, K. O., Cicero, S., Staboulidou, I., Wright, D. & Nicolaides, K. H. Fetal nasal bone in screening for trisomies 21, 18 and 13 and Turner syndrome at 11-13 weeks of gestation. Ultrasound Obstet. Gynecol. 33, 259–264 (2009).
21. Salomon, L. J. et al. ISUOG Practice Guidelines: ultrasound assessment of fetal biometry and growth. Ultrasound Obstet. Gynecol. 53, 715–723 (2019).
22. Fong, K. W. et al. Detection of fetal structural abnormalities with US during early pregnancy. Radiographics 24, 157–174 (2004).
23. Sonek, J. First trimester ultrasonography in screening and detection of fetal anomalies. Am. J. Med. Genet. C Semin. Med. Genet. 145C, 45–61 (2007).
24. Meng, M., Zhang, M., Shen, D., He, G. & Guo, Y. Detection and classification of breast lesions with You Only Look Once version 5. Future Oncol. 18, 4361–4370 (2022).
25. Yang, Y. et al. Classification of normal and abnormal fetal heart ultrasound images and identification of ventricular septal defects based on deep learning. J. Perinat. Med. 51, 1052–1058 (2023).
