Abstract
Purpose
The Western Australian Preterm Birth Prevention Initiative recommends a transabdominal cervical length (TACL) measurement at the mid‐pregnancy ultrasound to screen low‐risk women for preterm birth risk. In view of this recommendation, we assessed the inter‐observer consistency of TACL screening in mid‐pregnancy.
Methods
Routinely collected mid‐pregnancy TACL ultrasound images were graded from 0 to 4 according to the anatomical landmarks identified by a single expert. A random selection of 10 images of each grade were disseminated in an electronic survey to determine inter‐ and intra‐observer variations in the classification of the cervical image.
Results
A total of 244 participants graded 50 TACL images. Six participants repeated the grading. Overall agreement to the exact initial grade for all images was 49.6%, and was highest for images at both ends of the spectrum (83.0% for Grade 0 and 70.4% for Grade 4). Overall agreement with the initial classification of images as diagnostic (Grades 3 and 4) or non‐diagnostic was 75.3% (95% CI 74.5–76.0%) and was higher when the maternal bladder was empty. There was moderate inter‐rater agreement (κ = 0.42) for the classification of images as diagnostic (Grades 3 and 4) or non‐diagnostic (Grades 0 to 2). The intra‐rater agreement was fair to good (κ = 0.59, 95% CI 0.49–0.70) for those who repeated the assessment (including the expert grader).
Conclusions
Sonographic CL screening is considered an important tool for the identification of women at high risk of preterm birth. Image classification of TACL performed poorly compared with previous studies assessing transvaginal cervical length. Improved reliability and measurement consistency may be achieved through high levels of quality assurance, ongoing training and image audit.
Keywords: cervical length screening, perception, preterm birth, transabdominal
Introduction
Preterm birth complicates 8.7% of pregnancies in Australia. 1 As a major cause of perinatal mortality and morbidity, it generates ongoing research aimed at reducing its prevalence. 2 , 3 , 4 A short cervical length (CL) on ultrasound in mid‐pregnancy is a recognised indicator of an increased risk of preterm birth. 5 There is an inverse association between the gestation when a shorter cervix is detected and risk of preterm delivery. 5 , 6 The use of vaginal progesterone for pregnancies with a short mid‐pregnancy CL has been demonstrated to significantly reduce the incidence of preterm birth. 7 Given the availability of this intervention to reduce preterm birth, the measurement of the CL at the time of the mid‐pregnancy fetal anatomy survey as a component of preterm birth prevention strategies has become common. 8
The Western Australian Preterm Birth Prevention Initiative (Initiative) commenced in November 2014 and comprised seven key initiatives. 9 Routine sonographic CL screening for all women, with identification and treatment of a short cervix in mid‐pregnancy, is a key component of the Initiative. 9 A transabdominal (TA) approach to CL screening was recommended as the initial approach in population screening, with the transvaginal (TV) approach reserved for those women in whom the cervix could not be adequately imaged transabdominally or who were considered at increased risk of preterm birth on prior risk factors.
The transabdominal measurement of the cervix (TACL) with a partially full maternal bladder has been recommended as the first line of screening in women at low risk of preterm birth. 8 A TACL of >35 mm is reported to preclude a short cervix (transvaginal cervical length (TVCL) <25 mm) with >95% sensitivity. 8 , 10 However, imaging of CL using the TA approach can be challenging. There is a relationship between maternal bladder filling and visualisation of the cervical landmarks, 11 with a degree of bladder filling required for adequate visualisation of the cervix in the majority of women. 8 , 11 , 12 , 13 Variation in the recorded CL has previously been noted between operators and with the approach used to obtain the measurement. 14 , 15 , 16
This study was designed to assess the inter‐observer variation between TACL measurements in mid‐pregnancy, in terms of recognition of important anatomic landmarks and the diagnostic quality of the image.
Methods
This was a prospective study conducted from March 2015 to the end of December 2016 and performed under the umbrella of the Initiative.
To accurately measure the CL with ultrasound, regardless of the approach, clear identification of the anatomic landmarks is required. The callipers should be placed on the internal and external os, incorporating only the closed length where the endocervical canal walls are touching 11 or bordered by endocervical mucosa 11 , 17 (Figure 1a TACL and b TVCL).
Figure 1.

(a) Transabdominal cervical ultrasound length. A – Internal cervical os; B – external cervical os. Yellow dotted line – cervical canal. (b) Transvaginal cervical length ultrasound. A – internal cervical os; B – external cervical os. Yellow dotted line – cervical canal.
In this study, 9008 de‐identified TA ultrasound images of the cervix taken during routine mid‐trimester pregnancy (16–22 weeks’ gestation) for the purpose of CL screening at public hospitals and private women's imaging practice in Western Australia during 2015 and 2016 were collected and collated in an image database. An ultrasound scoring system, based on the ability to visualise the cervical landmarks as previously reported by Saul et al., 18 was then applied.
Five landmarks were used in the CL ultrasound scoring system:
1. the cervical/vaginal interface (marked by a border of increased echogenicity),
2. the internal cervical os,
3. the external cervical os,
4. the full length of the cervical canal and
5. the outline of the cervical corpus.
All collected TACL images were graded, based on what anatomical structures could be identified in the TACL ultrasound image, by the primary author (MKP) using the following criteria:
Grade 0 – No anatomical markers.
Grade 1 – Visible internal os.
Grade 2 – Visible external os.
Grade 3 – Both internal and external os visible.
Grade 4 – Grade 3 and entire cervical canal visible.
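The grading criteria above can be expressed as a small decision rule. The following Python function is an illustrative sketch only and was not part of the study; the function and parameter names are hypothetical. It assumes that Grade 1 means only the internal os is visible, that Grade 2 means only the external os is visible, and that a visible canal does not raise the grade unless both os are seen (the criteria do not address that case).

```python
def tacl_grade(internal_os: bool, external_os: bool, full_canal: bool) -> int:
    """Map visible landmarks to the 0-4 grade described in the criteria above.

    Assumption (not stated in the source): a visible canal without both
    os visible does not change the grade.
    """
    if internal_os and external_os:
        # Grade 4 requires both os plus the entire cervical canal.
        return 4 if full_canal else 3
    if external_os:
        return 2  # external os only
    if internal_os:
        return 1  # internal os only
    return 0      # no anatomical markers


if __name__ == "__main__":
    for landmarks in [(False, False, False), (True, False, False),
                      (False, True, False), (True, True, False),
                      (True, True, True)]:
        print(landmarks, "-> Grade", tacl_grade(*landmarks))
```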
The prevalence of grades from the mid‐trimester CL scan imaging database of 9008 individual CL assessments is displayed in Table S1.
This initial grading prior to the survey compilation was considered to be the ‘standard’. The survey comprising this study consisted of 10 randomly selected TACL images of each grade. The resulting 50 TA mid‐pregnancy images of the cervix were randomly placed into an online survey, blinded to the initial expert grader (MKP).
Images were considered to be ‘diagnostic’ if the internal and the external cervical os were both visible, enabling the closed CL to be measured (i.e. Grade 3 or Grade 4), or ‘non‐diagnostic’ when neither or only one of the anatomic border markers (internal and external cervical os) could be identified (i.e. Grades 0, 1 or 2). The percentage agreement of participants' gradings to the initial expert grading was summarised overall and by grade (0 to 4) and usability (non‐diagnostic: Grades 0 to 2, diagnostic: Grades 3 to 4). The depth of the bladder and the skin thickness (from the transducer to the bladder) (Figure 2) were retrospectively measured and recorded for each image.
Figure 2.

Transabdominal measurement of the cervical length. A = skin thickness. B = bladder depth.
A link to the survey was disseminated through professional contacts. The Australasian Society for Ultrasound in Medicine (ASUM) and the Australasian Sonographers Association (ASA) made the survey available to their membership via a link in their respective electronic newsletters.
The survey asked participants their profession, how many mid‐trimester CL measurements they would perform each week on average, how long they had been practising ultrasound (including their training) and to grade TACL images based on the grading criteria provided. The survey data were collated and entered into an Excel database.
A partially filled maternal bladder was specified by the Initiative. To assess the impact of maternal bladder filling on sonographer ability to identify cervical landmarks, the 50 images were re‐reviewed by the initial grader as a separate component of the survey, and the skin thickness and bladder depth were measured on each image. The skin thickness was the distance (mm) from the transducer surface/skin surface to the maternal bladder (Figure 2). The bladder depth was the antero‐posterior (AP) measurement (mm) of the bladder (containing urine) and was used to quantify bladder filling (Figure 2): transabdominal bladder full (TABF) was defined as a bladder depth >30 mm and transabdominal bladder empty (TABE) as a bladder depth <30 mm. ‘Not recorded’ meant that the scan was zoomed in to the point that a skin measurement was not possible. Participants were not provided with these data. The bladder depth and the skin thickness were included to assess whether bladder filling and maternal phenotype had an impact on image perception, inter‐rater reliability and ‘usability’ of images. Skin thickness measurements and bladder volumes were negatively correlated as continuous measurements.
This study was a component of the ‘Implementation of a state‐wide CL screening program’, which was approved by the Women and Newborn Health Service Human Research Ethics Committee (2015028EW) RGS0000002660.
Statistical analysis
Proportions of agreement among raters to the initial grading were summarised by grade with confidence intervals calculated using the Wilson method recommended for sample proportions with small sample sizes. 19
Logistic regression analysis was conducted to assess the effect of years of experience practising ultrasound (≤10, 11–20 and >20 years), number of mid‐trimester scans performed per week (<10, 10–25 and >25), bladder status (empty and full) and skin measurement (not recorded, <32 mm and ≥32 mm) on the correct identification of usability of images (Grades 0–2 vs. 3–4). Skin thickness measurements were categorised using the 75th percentile as the cut‐off value into <32 mm, ≥32 mm or ‘not recorded’ categories. Interactions among bladder fullness, skin thickness, experience of operator and usability of images were also explored. To account for the correlation between multiple responses from each participant, a robust variance estimator was used. Odds ratios (OR) and their 95% confidence intervals (CI) were used to summarise the covariate effects. In addition to the initial gradings, another two ratings were completed by the initial grader and agreement for each of these ratings was compared against the initial assessment. Inter‐rater and intra‐rater reliability were calculated using the Fleiss variant of the kappa statistic for agreement between multiple raters using a dichotomous classification.
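The bladder and skin categories used as covariates can be sketched as follows. This Python snippet is illustrative only; the function and constant names are hypothetical and it is not the study's code. The text defines TABF as a bladder depth >30 mm and TABE as <30 mm but does not say how a depth of exactly 30 mm was handled, so the sketch leaves that case unclassified rather than guessing.

```python
from typing import Optional

BLADDER_CUTOFF_MM = 30.0  # TABF > 30 mm, TABE < 30 mm (from the Methods)
SKIN_CUTOFF_MM = 32.0     # 75th percentile cut-off reported in the text


def bladder_status(depth_mm: float) -> str:
    """Classify bladder filling from the antero-posterior depth in mm."""
    if depth_mm > BLADDER_CUTOFF_MM:
        return "full"
    if depth_mm < BLADDER_CUTOFF_MM:
        return "empty"
    # Assumption: exactly 30 mm is not defined in the source.
    return "unclassified"


def skin_category(thickness_mm: Optional[float]) -> str:
    """Categorise skin thickness; None stands for 'not recorded'."""
    if thickness_mm is None:
        return "not recorded"
    return "<32 mm" if thickness_mm < SKIN_CUTOFF_MM else ">=32 mm"


def covariate_group(depth_mm: float, thickness_mm: Optional[float]) -> str:
    """Combine both categories into a single bladder/skin label."""
    return f"{bladder_status(depth_mm)} and {skin_category(thickness_mm)}"


if __name__ == "__main__":
    print(covariate_group(42.0, 25.0))
    print(covariate_group(12.0, None))
```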
Data were analysed using Stata 16 Statistical Software, 2019 (College Station, TX).
Results
There were 252 surveys completed, of which eight were excluded; six repeated the survey, including the initial grader, a medical student and an application specialist. Therefore, the study data set available for analysis consisted of 244 surveys performed by 229 sonographers, 3 radiologists and 12 sonologists.
The professional characteristics of the 244 study participants are shown in Table 1. The majority of participants were sonographers (93.9%), and there was an even distribution of participants across the year categories of practising ultrasound (Table 1). Most participants performed <10 mid‐trimester scans per week (59.8%), and only 5.7% performed >25 scans per week.
Table 1.
Professional characteristics of participants (n = 244).
| Characteristics | N (%) |
|---|---|
| Role | |
| Sonographer | 229 (93.9%) |
| Radiologist | 3 (1.2%) |
| Sonologist | 12 (4.9%) |
| Years practising ultrasound | |
| ≤10 | 80 (32.8%) |
| 11–20 | 80 (32.8%) |
| >20 | 84 (34.4%) |
| Number of mid‐trimester scans per week | |
| <10 | 146 (59.8%) |
| 10–25 | 84 (34.4%) |
| >25 | 14 (5.7%) |
Diagnostic accuracy
The overall agreement to the initial grade for all images (Grades 0 to 4) was 49.6%. The highest agreement was recorded for Grades 0 and 4 (83.0% and 70.4%, respectively) and the lowest agreement for Grade 3 images (16.1%) (Table 2).
Table 2.
Percentage agreement with initial grading (shaded grey) and distribution of observed ratings within each initial grading (Grades 0 to 4), and percentage agreement and 95% confidence interval (CI) for the diagnostic usability of images overall and by bladder fullness.
| Observed grade | Initial Grade 0 (N = 2440) | Initial Grade 1 (N = 2440) | Initial Grade 2 (N = 2440) | Initial Grade 3 (N = 2440) | Initial Grade 4 (N = 2440) |
|---|---|---|---|---|---|
| Overall | | | | | |
| Grade 0 | 2024 (83.0%) | 920 (37.7%) | 468 (19.2%) | 289 (11.8%) | 70 (2.9%) |
| Grade 1 | 245 (10.0%) | 985 (40.4%) | 174 (7.1%) | 405 (16.6%) | 145 (5.9%) |
| Grade 2 | 108 (4.4%) | 194 (8.0%) | 935 (38.3%) | 587 (24.1%) | 232 (9.5%) |
| Grade 3 | 32 (1.3%) | 181 (7.4%) | 302 (12.4%) | 392 (16.1%) | 262 (10.7%) |
| Grade 4 | 24 (1.0%) | 152 (6.2%) | 550 (22.5%) | 756 (31.0%) | 1717 (70.4%) |
| Not graded | 7 (0.3%) | 8 (0.3%) | 11 (0.5%) | 11 (0.5%) | 14 (0.6%) |
| Full bladder N = 976 | | | | | |
| Grade 0 | 801 (82.1%) | 229 (31.3%) | 437 (22.4%) | 176 (14.4%) | 65 (3.8%) |
| Grade 1 | 39 (4.0%) | 260 (35.5%) | 63 (3.2%) | 42 (3.4%) | 102 (6.0%) |
| Grade 2 | 104 (10.7%) | 153 (20.9%) | 766 (39.2%) | 469 (38.4%) | 224 (13.1%) |
| Grade 3 | 17 (1.7%) | 53 (7.2%) | 229 (11.7%) | 155 (12.7%) | 179 (10.5%) |
| Grade 4 | 12 (1.2%) | 35 (4.8%) | 447 (22.9%) | 372 (30.5%) | 1129 (66.1%) |
| Not graded | 3 (0.3%) | 2 (0.3%) | 10 (0.5%) | 6 (0.5%) | 9 (0.5%) |
| Empty bladder N = 1464 | | | | | |
| Grade 0 | 1223 (83.5%) | 691 (40.5%) | 31 (6.4%) | 113 (9.3%) | 5 (0.7%) |
| Grade 1 | 206 (14.1%) | 725 (42.4%) | 111 (22.7%) | 363 (29.8%) | 43 (5.9%) |
| Grade 2 | 4 (0.3%) | 41 (2.4%) | 169 (34.6%) | 118 (9.7%) | 8 (1.1%) |
| Grade 3 | 15 (1.0%) | 128 (7.5%) | 73 (15.0%) | 237 (19.4%) | 83 (11.3%) |
| Grade 4 | 12 (0.8%) | 117 (6.9%) | 103 (21.1%) | 384 (31.5%) | 588 (80.3%) |
| Not graded | 4 (0.3%) | 6 (0.4%) | 1 (0.2%) | 5 (0.4%) | 5 (0.7%) |

| | Non‐diagnostic (Grade 0–2) % agreement (95% CI) | Diagnostic (Grade 3–4) % agreement (95% CI) |
|---|---|---|
| Overall | 82.7 (81.8–83.5) | 64.1 (62.7–65.4) |
| Full bladder | 78.2 (76.9–79.6) | 63.0 (61.2–64.7) |
| Empty bladder | 87.7 (86.6–88.8) | 64.4–68.6 midpoint reported as 66.5 (64.4–68.6) |
The overall diagnostic agreement to the initial grade was 75.3% (95% CI 74.5–76.0%). Agreement was higher for non‐diagnostic (Grades 0–2) (82.7%) than diagnostic (Grades 3–4) images (64.1%) (Table 2). Radiologists/sonologists had the highest percentage agreement for diagnostic images (72.3%) and sonographers for non‐diagnostic images (82.8%). The grading of the images generally clustered around the initial grade, except for Grades 2 and 3, where gradings were more evenly distributed across the entire range (Table 2).
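The agreement percentages quoted above can be re-derived from the ‘Overall’ counts in Table 2. The Python sketch below is a reader's check, not the authors' analysis; the variable and function names are hypothetical. It assumes that ‘not graded’ responses stay in the denominator as disagreements, which is the choice that reproduces the reported 49.6%, 82.7% and 64.1%.

```python
# Rows: observed grade 0..4; columns: initial grade 0..4 (Table 2, "Overall").
# "Not graded" responses are omitted from the matrix but kept in the denominator.
OBSERVED_BY_INITIAL = [
    [2024, 920, 468, 289, 70],
    [245, 985, 174, 405, 145],
    [108, 194, 935, 587, 232],
    [32, 181, 302, 392, 262],
    [24, 152, 550, 756, 1717],
]
RATINGS_PER_GRADE = 2440  # 244 participants x 10 images per initial grade


def is_diagnostic(grade: int) -> bool:
    """Grades 3 and 4 are diagnostic; Grades 0 to 2 are non-diagnostic."""
    return grade >= 3


def exact_agreement() -> float:
    """Proportion of ratings that match the initial grade exactly."""
    agreed = sum(OBSERVED_BY_INITIAL[g][g] for g in range(5))
    return agreed / (5 * RATINGS_PER_GRADE)


def usability_agreement(diagnostic: bool) -> float:
    """Proportion of ratings that fall in the same usability class."""
    grades = [g for g in range(5) if is_diagnostic(g) == diagnostic]
    agreed = sum(OBSERVED_BY_INITIAL[obs][init]
                 for init in grades for obs in grades)
    return agreed / (len(grades) * RATINGS_PER_GRADE)


if __name__ == "__main__":
    print(f"Exact grade agreement: {exact_agreement():.1%}")
    print(f"Non-diagnostic agreement: {usability_agreement(False):.1%}")
    print(f"Diagnostic agreement: {usability_agreement(True):.1%}")
```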
Skin measurements and bladder volumes were negatively correlated (r = −0.43) when considered as continuous variables. An association was observed when compared by categories: skin measurements ‘not recorded’ were more common with TABF than TABE (29.6% vs. 4.3%), and skin thickness ≥ 32 mm was more common with a TABE than TABF (52.2% vs. 3.7%), P < 0.001 (Table 3). Overall agreement of the images graded 3 or 4 to the initial assessment was higher for sonologists/radiologists compared with sonographers (77.5% vs. 75.1%, aOR 1.14, 95% CI 1.01–1.28, P = 0.028) (Table 3). Images graded 3 or 4 and years of practising ultrasound were inter‐related, whereby agreement increased with years of practising ultrasound among images graded 0–2 (78.9%, 82.4% and 86.6% agreements for ≤10, 11–20 and > 20 years, respectively) and conversely decreased with years of experience among diagnostic images (69.5%, 64.5% and 58.5%, respectively).
Table 3.
Diagnostic percentage agreement with initial grading and 95% confidence intervals (CI) presented in categories of years practising ultrasound, number of mid‐trimester scans per week, skin thickness and bladder fullness.
| Characteristics | Images N | % Agreement (95% CI) | Unadjusted OR (95% CI) | Adjusted OR (95% CI) | P‐value |
|---|---|---|---|---|---|
| Role | |||||
| Sonographer | 11,450 | 75.1 (74.3–75.9) | 1.00 | 1.00 | |
| Sonologist/radiologist | 750 | 77.5 (74.3–80.3) | 1.14 (2.88–3.16) | 1.14 (1.01–1.28) | 0.028 |
| Mid‐trimester scans per week | |||||
| >25 | 700 | 74.9 (71.5–77.9) | 1.00 | 1.00 | |
| 10–25 | 4200 | 74.6 (73.2–75.8) | 0.98 (0.83–1.17) | 0.99 (0.82–1.19) | 0.890 |
| <10 | 7300 | 75.7 (74.7–76.7) | 1.05 (0.89–1.23) | 1.05 (0.88–1.26) | 0.569 |
| Diagnostic/years practising US | |||||
| Diagnostic and >20 | 1680 | 58.5 (56.1–60.9) | 1.00 | 1.00 | |
| Diagnostic and 11–20 | 1600 | 64.5 (62.1–66.8) | 1.29 (1.03–1.62) | 1.28 (1.01–1.61) | 0.039 |
| Diagnostic and ≤10 | 1600 | 69.5 (67.2–71.7) | 1.62 (1.27–2.06) | 1.61 (1.25–2.06) | <0.001 |
| Non‐diagnostic and >20 | 2520 | 86.6 (85.2–87.9) | 4.58 (3.27–6.41) | 4.30 (3.06–6.04) | <0.001 |
| Non‐diagnostic and 11–20 | 2400 | 82.4 (80.8–83.9) | 3.32 (2.59–4.27) | 3.07 (3.38–3.95) | <0.001 |
| Non‐diagnostic and ≤10 | 2400 | 78.9 (77.2–80.5) | 2.65 (2.03–3.45) | 2.43 (1.86–3.19) | <0.001 |
| Bladder/skin measurement | |||||
| Full and not recorded | 1952 | 67.0 (64.9–69.1) | 0.80 (0.71–0.89) | 0.70 (0.63–0.77) | <0.001 |
| Full and skin <32 mm | 4392 | 71.8 (70.4–73.1) | 1.00 | 1.00 | |
| Empty and skin <32 mm | 2440 | 78.2 (76.5–79.8) | 1.41 (1.27–1.55) | 1.30 (1.17–1.44) | <0.001 |
| Empty and skin ≥32 mm | 2928 | 80.1 (78.6–81.5) | 1.58 (1.45–1.73) | 1.38 (1.26–1.51) | <0.001 |
| Full and skin ≥32 mm | 244 | 92.6 (88.6–95.3) | 4.93 (3.07–7.93) | 3.00 (1.95–4.63) | <0.001 |
| Empty and not recorded | 244 | 98.4 (95.9–99.4) | 23.58 (8.80–63.18) | 14.39 (5.47–37.9) | <0.001 |
Interactions between characteristics are presented showing results for all combinations of possible levels against a reference level in the combined group. Unadjusted and adjusted odds ratios (OR and aOR) and 95% confidence intervals (CI) are reported; the adjusted model includes all characteristics present in the table.
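As a worked illustration of how the unadjusted odds ratios relate to the agreement percentages in Table 3, the Python sketch below computes an odds ratio from two proportions. It is a reader's aid with hypothetical function names, not the study's Stata model. Because the inputs are the rounded percentages from the table, results match the published point estimates only approximately, and the confidence intervals (which depend on the robust variance estimator) are not reproduced.

```python
def odds(p: float) -> float:
    """Convert a proportion strictly between 0 and 1 into odds."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return p / (1 - p)


def odds_ratio(p_group: float, p_reference: float) -> float:
    """Unadjusted odds ratio of agreement for a group versus its reference."""
    return odds(p_group) / odds(p_reference)


if __name__ == "__main__":
    # Sonologist/radiologist (77.5%) vs. sonographer (75.1%).
    print(round(odds_ratio(0.775, 0.751), 2))
    # Empty bladder, skin <32 mm (78.2%) vs. full bladder, skin <32 mm (71.8%).
    print(round(odds_ratio(0.782, 0.718), 2))
```

With these inputs the two ratios round to 1.14 and 1.41, in line with the unadjusted point estimates in Table 3.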
Diagnostic agreement was higher for TABE at either skin thickness (<32 mm or ≥32 mm), and for TABF with skin thickness ≥32 mm, than for the reference category of TABF with skin thickness <32 mm (Table 3). Among images where the skin measurement was ‘not recorded’, a TABF was associated with lower agreement (67%, aOR 0.70, 95% CI 0.63–0.77) and a TABE was associated with higher agreement (98.4%, aOR 14.4, 95% CI 5.47–37.9) (all P‐values <0.001). Agreement was not associated with the number of mid‐trimester scans performed per week (Table 3).
Repeatability of image grading
The overall agreement of the initial grader's second attempt was 68% (95% CI 54–79%). The distribution of other ratings that disagreed within each grade is shown in Table 4a. Agreement to the initial grading for overall usability was 84% (95% CI 71–92%): 80% among diagnostic images and 87% among non‐diagnostic images.
Table 4.
Agreement between initial grader's second rating and the initial grading (4a) and by fullness or emptiness of bladder (4b).
(a)

| Repeat grade | Initial Grade 0 (N = 10) | Initial Grade 1 (N = 10) | Initial Grade 2 (N = 10) | Initial Grade 3 (N = 10) | Initial Grade 4 (N = 10) |
|---|---|---|---|---|---|
| Grade 0 | 7 (70%) | 3 (30%) | – | – | – |
| Grade 1 | 3 (30%) | 6 (60%) | – | 1 (10%) | – |
| Grade 2 | – | 1 (10%) | 6 (60%) | 3 (30%) | – |
| Grade 3 | – | – | 3 (30%) | 6 (60%) | 1 (10%) |
| Grade 4 | – | – | 1 (10%) | – | 9 (90%) |

| | Non‐diagnostic (Grades 0–2) | Diagnostic (Grades 3–4) |
|---|---|---|
| Diagnostic agreement | 87% (95% CI 70–95%) | 80% (95% CI 58–92%) |

(b) Full bladder

| Observed grade | Initial Grade 0 (N = 4) | Initial Grade 1 (N = 3) | Initial Grade 2 (N = 8) | Initial Grade 3 (N = 5) | Initial Grade 4 (N = 7) |
|---|---|---|---|---|---|
| Grade 0 | 4 (100%) | – | – | – | – |
| Grade 1 | – | 2 (66.7%) | – | – | – |
| Grade 2 | – | 1 (33.3%) | 5 (62.5%) | 2 (40%) | – |
| Grade 3 | – | – | 2 (25%) | 3 (60%) | 1 (14.3%) |
| Grade 4 | – | – | 1 (12.5%) | – | 6 (85.7%) |

| | Non‐diagnostic (Grades 0–2) | Diagnostic (Grades 3–4) |
|---|---|---|
| Diagnostic agreement | 80% (95% CI 55–93%) | 83% (95% CI 55–95%) |

(b) Empty bladder

| Observed grade | Initial Grade 0 (N = 6) | Initial Grade 1 (N = 7) | Initial Grade 2 (N = 2) | Initial Grade 3 (N = 5) | Initial Grade 4 (N = 3) |
|---|---|---|---|---|---|
| Grade 0 | 3 (50%) | 3 (42.9%) | – | – | – |
| Grade 1 | 3 (50%) | 4 (57.1%) | – | 1 (20%) | – |
| Grade 2 | – | – | 1 (50%) | 1 (20%) | – |
| Grade 3 | – | – | 1 (50%) | 3 (60%) | – |
| Grade 4 | – | – | – | – | 3 (100%) |

| | Non‐diagnostic (Grades 0–2) | Diagnostic (Grades 3–4) |
|---|---|---|
| Diagnostic agreement | 93% (95% CI 70–99%) | 75% (95% CI 41–93%) |
Number (percent) of images that agreed between gradings are highlighted in grey.
Overall agreement between the initial rating and the second grading for 27 images taken with TABF was 74% (95% CI 55–87%). Agreement for overall usability was 81% (95% CI 63–92%): 83% among diagnostic and 80% among non‐diagnostic images (Table 4b). Overall agreement between the initial rating and the second grading for 23 images taken with TABE was 61% (95% CI 41–78%). Agreement for overall usability was 87% (95% CI 68–95%): 75% among diagnostic and 93% among non‐diagnostic images (Table 4b).
There were five participants who completed the survey twice (including the initial grader). The test–retest reliability was calculated both for the initial grader's retest against the ‘initial grading’ and for all those who repeated the survey, and the results were similar. The intra‐rater agreement for the initial grader's repeated measurements was fair to good (κ = 0.67, 95% CI 0.45–0.89). The intra‐rater agreement among the five participants who had repeated measurements (including the initial grader) was fair to good (κ = 0.59, 95% CI 0.49–0.70). Intra‐rater agreement among the four participants who had repeated measurements (excluding the initial grader) was also fair to good (κ = 0.57, 95% CI 0.45–0.69).
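To make the kappa values concrete, the Python sketch below implements Fleiss' kappa and applies it to the initial grader's two ratings, dichotomised from Table 4a. It is a reconstruction for illustration; the authors' exact Stata procedure is not described beyond the ‘Fleiss variant’, and the names used here are hypothetical. From Table 4a, 26 images were non‐diagnostic on both occasions, 16 were diagnostic on both occasions and 8 changed class.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = ratings placing subject i in category j.

    Every subject must have the same total number of ratings (at least 2).
    """
    if not counts:
        raise ValueError("counts must not be empty")
    n_subjects = len(counts)
    n_ratings = sum(counts[0])
    if n_ratings < 2 or any(sum(row) != n_ratings for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    # Observed agreement: mean proportion of agreeing rating pairs per subject.
    p_bar = sum(
        (sum(c * c for c in row) - n_ratings) / (n_ratings * (n_ratings - 1))
        for row in counts
    ) / n_subjects
    # Chance agreement from the pooled category proportions.
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_ratings)
           for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)


# Each row is one image: [non-diagnostic ratings, diagnostic ratings].
table_4a_counts = [[2, 0]] * 26 + [[0, 2]] * 16 + [[1, 1]] * 8

if __name__ == "__main__":
    print(f"kappa = {fleiss_kappa(table_4a_counts):.2f}")
```

Under these assumptions the result rounds to 0.67, consistent with the value reported for the initial grader.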
Discussion
In our study of interpreting mid‐pregnancy TACL measurement, there was generally low agreement on the image analysis, with the overall agreement for all images (graded 0 to 4) being 49.6%. Higher agreement was recorded for the extreme ends of the grading scale (Grades 0 and 4) at 83.0% and 70.4%, respectively. The lowest agreement was for Grade 3 images (16.1%), where an assessment of the internal and external cervical os position was required. Agreement was higher for non‐diagnostic (Grades 0–2) (82.7%) than diagnostic (Grades 3–4) images (64.1%). Image usability and years of practising ultrasound were inter‐related, with agreement increasing with years of practice for non‐diagnostic images and decreasing for diagnostic images. The diagnostic agreement for TACL grading was higher with TABE (regardless of the skin thickness) and with skin thickness ≥32 mm with TABF.
TVCL imaging has superior test performance characteristics compared with TACL, 13 , 20 with the latter associated with overestimation of the true CL secondary to bladder over‐filling and frequent inability to perform the measurement secondary to failure to image the anatomic landmarks. 21 However, universal TVCL is time‐consuming, more expensive and may not be acceptable for some women. 22 The concept of a primary TACL in low‐risk women has been supported by research demonstrating that a 36‐mm threshold for cervical length with the transabdominal approach (bladder full) will detect 96% of TVCL <25 mm with 39% specificity. 23 More recently, Ginsberg and colleagues reported that the routine performance of TVCL screening at the fetal anatomy survey prolonged the examination time by 25%. 24 Their data were in accord with the earlier study by Friedman, 23 with a TACL <36 mm providing 100% sensitivity (95% CI 63–100%) and 57.5% specificity (95% CI 52.8–62.1%) for detecting a short CL <25 mm by TV ultrasound. 23 Both RANZCOG and the Initiative have proposed a two‐step model for CL screening, with the use of TACL as a primary screening modality for women considered at low risk for preterm birth at the time of the mid‐pregnancy ultrasound examination, 8 , 9 accepting that TVCL will still be required in 36% 24 secondary to the TACL measuring <36 mm or an inability to adequately image the cervix.
Bladder filling
Limitations of TACL measurement include uterine contractions and fetal parts obscuring the cervix, particularly with advancing gestation. 17 Bladder filling is also a confounding factor, with a TABF causing the cervix to appear falsely elongated. 11 , 12 , 13 , 17 , 23 Chaudhury et al. 11 proposed that increased bladder filling changes the position of the cervix, with progressive vertical orientation. They reported that a TABE or a bladder with a vertical pocket of ≤3 cm of urine provided cervical visualisation in most women. 11 , 25 To et al. 12 reported that TACL measurement was only possible in 49% of cases with TABE, increasing to 73%–80.7% with a bladder volume >150 mL. 10 , 12 , 13 Bladder filling can be assessed by measuring the maximum vertical pocket of urine as a means of standardising the measurement technique 18 or by the use of bladder volume, 13 although this is more subject to variation.
Suboptimal TACL imaging has been associated with a TABE (54%) and with a short cervix (TVCL ≤25 mm) (88.2%). 26 Consistent with our findings, Pandipati et al. 26 reported suboptimal imaging in over 50% of examinations even when TACL was performed at the end of the examination, suggesting bladder filling did not assist with cervical visualisation.
Pandipati reported a decrease in confidence of the ‘adequate’ image from >99% to 50% with sonographer experience over the duration of their study. 27 We also noted the more we viewed images the less confident we became in grading them, with participants who completed the survey more than once returning variable responses.
Sonographer training and experience
There is a relationship between operator experience, measurement accuracy and reproducibility. 18 , 27 , 28 These factors are important in recognising when the cervix has not been adequately identified 27 so as to progress to TVCL. Tolsgaard et al. 29 concluded that trainees required more frequent supervision when performing TA compared with TV ultrasound. However, we did not observe this phenomenon, with no significant difference in median percentage agreement between experience groups, or in the accuracy of grades 1, 2 and 3 ratings.
Friedman reported that image acceptability was at the discretion of the sonographer, suggesting that TA evaluation of the cervix is readily interpretable. 23 In a study of 192 women at high risk for preterm birth, Saul et al. 18 commented that TA cervical visualisation was principally a matter of sonographer training and experience. 18 They concluded that with adequate probe pressure and sonographer experience, nearly all TACL images could be visualised to an interpretable degree. 18 However, based on the images from our database, the number of diagnostic images was less than optimal for screening, with just over 50% classified as diagnostic, regardless of maternal bladder filling.
Berghella 30 reported that a sonographer should be supervised while expertise is being acquired for the first 50 TV examinations, and Pandipati 27 concluded that sonographers required a learning curve of several months to appropriately assess the cervix transabdominally. Saul concluded that the TACL technique, once mastered, may be useful for universal CL screening. 18 Optimal technique, experience and an understanding of the landmarks of the cervical canal are central to accurate TACL measurement 18 and re‐iterate the critical role of training in CL screening programmes. We support standardisation of imaging technique and training to maintain high levels of quality assurance 31 , 32 if universal screening with TACL is to be a first‐line tool.
We believe this is the first study that has not used preselected specific images for image assessment. The use of TACL images from routine mid‐pregnancy CL screening obtained from the community is a strength of our study. Images used were not specifically optimised for the purpose of the study. The distribution of images across the five grades was consistent between public and private sites. The guidelines currently employed as a component of two‐step CL screening in mid‐pregnancy may not be adequate in detection programmes for preterm birth prevention.
Given the importance of CL screening, the variability of image interpretation and the poorer performance of TACL when compared with prior TVCL measurement studies, 20 regular monitored quality assurance assessments of TACL or the implementation of universal TVCL screening should be considered. Ongoing, regular image assessment and review is mandatory, regardless of the approach taken to measure the CL, to maintain consistency and reliability.
Ethical approval
Women and Newborn Health Service Human Research Ethics Committee (2015028EW) RGS0000002660. Participant consent was implied via participation and completion of the survey.
Authorship declaration
The authorship listing conforms with the journal's authorship policy, and all authors are in agreement with the content of the submitted manuscript.
Funding
No funding information is provided.
Conflict of interest
The authors reported no conflict of interest. This study was supported in kind by the Women and Infants Research Foundation of Western Australia. The study sponsor had no role in the design, data collection, analysis, interpretation of data, writing of the manuscript or decision to submit the paper for publication.
Author contributions
Elizabeth Nathan: Data curation (equal); formal analysis (equal); software (equal); writing – original draft (equal); writing – review and editing (equal). Dorota Doherty: Conceptualisation (equal); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); project administration (equal); software (equal); supervision (equal); validation (equal); writing – original draft (equal); writing – review and editing (equal). Jan Dickinson: Conceptualisation (equal); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); project administration (equal); resources (equal); software (equal); supervision (lead); validation (equal); visualisation (equal); writing – original draft (equal); writing – review and editing (equal).
Supporting information
Table S1 Transabdominal mid‐trimester CL database graded images.
Acknowledgements
Many thanks to all the sonographers, sonologists and radiologists who completed this survey, without whom this research could not have been undertaken; to the Australasian Society for Ultrasound in Medicine (ASUM) and the Australasian Sonographers Association (ASA) for aiding in the dissemination of the survey to their members; and to Dr Jared Watts.
References
- 1. Morris J, Brown K, Newnham J. The Australian Preterm Birth Prevention Alliance. Aust N Z J Obstet Gynaecol 2020; 60(3): 321–3.
- 2. O'Reilly P, Dakin A, Keating N, Luethe L, Corcoran S. Does the use of gestation‐specific centiles for cervical length change the management of pregnancies at risk of recurrent spontaneous preterm birth? Eur J Obstet Gynecol Reprod Biol 2021; 264: 349–52.
- 3. Gulersen MBE, Domney A, Blitz MJ, Rafael TJ, Li X, Krantz D, et al. Cerclage in singleton gestations with an extremely short cervix (≤10 mm) and no history of spontaneous preterm birth. Am J Obstet Gynecol MFM 2021; 3(5): 100430.
- 4. Frey HA, Stout MJ, Abdelwahab M, Tuuli MG, Woolfolk C, Shamshirsaz AA, et al. Vaginal progesterone for preterm birth prevention in women with arrested preterm labor. J Matern Fetal Neonatal Med 2021; 18: 1–9.
- 5. Iams JD, Goldenberg RL, Meis PJ, Mercer BM, Moawad A, Das A, et al. The length of the cervix and the risk of spontaneous premature delivery. National Institute of Child Health and Human Development maternal fetal medicine unit network. N Engl J Med 1996; 334(9): 567–72.
- 6. Crane JM, Hutchens D. Transvaginal sonographic measurement of cervical length to predict preterm birth in asymptomatic women at increased risk: a systematic review. Ultrasound Obstet Gynecol 2008; 31(5): 579–87.
- 7. Romero R, Nicolaides K, Conde‐Agudelo A, Tabor A, O'Brien JM, Cetingoz E, et al. Vaginal progesterone in women with an asymptomatic sonographic short cervix in the midtrimester decreases preterm delivery and neonatal morbidity: a systematic review and metaanalysis of individual patient data. Am J Obstet Gynecol 2012; 206(2): 124.e1–19.
- 8. The Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG). Measurement of Cervical Length for Prediction of Preterm Birth C‐Obs 27; 2017.
- 9. Newnham JP, White SW, Meharry S, Lee HS, Pedretti MK, Arrese CA, et al. Reducing preterm birth by a statewide multifaceted program: an implementation study. Am J Obstet Gynecol 2017; 216(5): 434–42.
- 10. Cho HJ, Roh HJ. Correlation between cervical lengths measured by transabdominal and transvaginal sonography for predicting preterm birth. J Ultrasound Med 2016; 35(3): 537–44.
- 11. Chaudhury K, Ghosh M, Halder A, Senapati S, Chaudhury S. Is transabdominal ultrasound scanning of cervical measurement in mid‐trimester pregnancy a useful alternative to transvaginal ultrasound scan? J Turk Ger Gynecol Assoc 2013; 14(4): 225–9.
- 12. To MS, Skentou C, Cicero S, Nicolaides KH. Cervical assessment at the routine 23‐weeks' scan: problems with transabdominal sonography. Ultrasound Obstet Gynecol 2000; 15(4): 292–6.
- 13. Marren AJ, Mogra R, Pedersen LH, Walter M, Ogle RF, Hyett JA. Ultrasound assessment of cervical length at 18‐21 weeks' gestation in an Australian obstetric population: comparison of transabdominal and transvaginal approaches. Aust N Z J Obstet Gynaecol 2014; 54: 250–5.
- 14. Roh HJ, Ji YI, Jung CH, Jeon GH, Chun S, Cho HJ. Comparison of cervical lengths using transabdominal and transvaginal sonography in midpregnancy. J Ultrasound Med 2013; 32(10): 1721–8.
- 15. Kuusela P, Jacobsson B, Soderlund M, Bejlum C, Almstrom E, Ladfors L, et al. Transvaginal sonographic evaluation of cervical length in the second trimester of asymptomatic singleton pregnancies, and the risk of preterm delivery. Acta Obstet Gynecol Scand 2015; 94(6): 598–607.
- 16. Valentin L, Bergelin I. Intra‐ and interobserver reproducibility of ultrasound measurements of cervical length and width in the second and third trimesters of pregnancy. Ultrasound Obstet Gynecol 2002; 20(3): 256–62.
- 17. Stone PR, Chan EH, McCowan C, Taylor RS, Mitchell JM, Consortium S. Transabdominal scanning of the cervix at the 20‐week morphology scan: comparison with transvaginal cervical measurements in a healthy nulliparous population. Aust N Z J Obstet Gynaecol 2010; 50(6): 523–7.
- 18. Saul LL, Kurtzman JT, Hagemann C, Ghamsary M, Wing DA. Is transabdominal sonography of the cervix after voiding a reliable method of cervical length assessment? J Ultrasound Med 2008; 27(9): 1305–11.
- 19. Brown LD, Cat TT, DasGupta A. Interval estimation for a proportion. Stat Sci 2001; 16: 101–17.
- 20. Khalifeh A, Berghella V. Not transabdominal! Am J Obstet Gynecol 2016; 215(6): 739–44.e1.
- 21. Westerway SC, Pedersen LH, Hyett J. Cervical length measurement: comparison of transabdominal and transvaginal approach. Australas J Ultrasound Med 2015; 18(1): 19–26.
- 22. Pedretti MK, Kazemier BM, Dickinson JE, Mol BW. Implementing universal cervical length screening in asymptomatic women with singleton pregnancies: challenges and opportunities. Aust N Z J Obstet Gynaecol 2017; 57(2): 221–7.
- 23. Friedman AM, Srinivas SK, Parry S, Elovitz MA, Wang E, Schwartz N. Can transabdominal ultrasound be used as a screening test for short cervical length? Am J Obstet Gynecol 2013; 208(3): 190.e1–7.
- 24. Ginsberg Y, Zipori Y, Khatib N, Schwake D, Goldstein I, Shrim A, et al. It is about time. The advantage of transabdominal cervical length screening. J Matern Fetal Neonatal Med 2020; 1–6. doi: 10.1080/14767058.2020.1864317
- 25. Kongwattanakul K, Saksiriwuttho P, Komwilaisak R, Lumbiganon P. Short cervix detection in pregnant women by transabdominal sonography with post‐void technique. J Med Ultrason (2001) 2016; 43(4): 519–22.
- 26. Pandipati S, Combs CA, Fishman A, Lee SY, Mallory K, Ianovich F. Prospective evaluation of a protocol for using transabdominal ultrasound to screen for short cervix. Am J Obstet Gynecol 2015; 213(1): 99.e1–e13.
- 27. Pandipati S, Combs CA, Fishman A. Transabdominal ultrasound for cervical length screening (or not?). Am J Obstet Gynecol 2017; 216(6): 621–2.
- 28. Gascon A, Goya M, Mendoza M, Gracia‐Perez‐Bonfils A, Higueras T, Calero I, et al. Intraobserver and interobserver variability in first‐trimester transvaginal ultrasound cervical length. J Matern Fetal Neonatal Med 2020; 33(1): 136–41.
- 29. Tolsgaard MG, Rasmussen MB, Tappert C, Sundler M, Sorensen JL, Ottesen B, et al. Which factors are associated with trainees' confidence in performing obstetric and gynecological ultrasound examinations? Ultrasound Obstet Gynecol 2014; 43(4): 444–51.
- 30. Berghella V, Bega G, Tolosa JE, Berghella M. Ultrasound assessment of the cervix. Clin Obstet Gynecol 2003; 46(4): 947–62.
- 31. Stamilio D, Carlson LM. Transabdominal ultrasound is appropriate. Am J Obstet Gynecol 2016; 215(6): 739–43.e1.
- 32. Farras Llobet A, Regincos Marti L, Higueras T, Calero Fernandez IZ, Gascon Portales A, Goya Canino MM, et al. The uterocervical angle and its relationship with preterm birth. J Matern Fetal Neonatal Med 2018; 31(14): 1881–4.
