Maternal-Fetal Medicine. 2022 Apr 26;4(2):95–102. doi: 10.1097/FM9.0000000000000146

Interobserver Agreement and Reliability of Intrapartum Nonreassuring Cardiotocography and Prediction of Neonatal Acidemia

Zhuyu Li 1, Yan Wang 2, Jian Cai 1, Peizhen Zhao 3, Hanqing Chen 1, Haiyan Liu 1, Lixia Shen 1, Lian Chen 2, Shufang Li 2, Yangyu Zhao 2, Zilian Wang 1
Editor: Yang Pan
PMCID: PMC12094385  PMID: 40406449

Abstract

Objective:

To evaluate the agreement and reliability of intrapartum nonreassuring cardiotocography (CTG) interpretation and prediction of neonatal acidemia by obstetricians working in different centers.

Methods:

A retrospective cohort study involving two tertiary hospitals (The First Affiliated Hospital of Sun Yat-sen University and Peking University Third Hospital) was conducted between 30th September 2018 and 1st April 2019. Six obstetricians from the two hospitals with three levels of experience (junior, medium, and senior) reviewed 100 nonreassuring fetal heart rate (FHR) tracings from 1 hour before the onset of abnormalities until delivery. Each reviewer determined the FHR pattern, the baseline, variability, and the presence of accelerations, decelerations, and a sinusoidal pattern, and predicted whether neonatal acidemia and an abnormal umbilical arterial pH < 7.1 would occur. Inter-observer agreement was assessed using the proportions of agreement (Pa) and the proportions of specific agreement (Pa for each category). Reliability was evaluated with the kappa statistic (Light's kappa for n raters) and Gwet's AC1 statistic.

Results:

Good inter-observer agreement was found in the evaluation of most variables (Pa > 0.5), with the exception of early deceleration (Pa = 0.39, 95% confidence interval (CI): 0.36,0.43). Reliability was also good among most variables (AC1 > 0.40), except for acceleration, early deceleration, and prediction of neonatal acidemia (AC1 = 0.17, 0.10, and 0.25, respectively). There were no statistically significant differences among the three groups, except in the identification of accelerations (Pa = 0.89, 95% CI: 0.83,0.95; Pa = 0.50, 95% CI: 0.41,0.60, and Pa = 0.35, 95% CI: 0.25,0.43 in junior, medium and senior groups, respectively) and the prediction of neonatal acidemia (Pa = 0.52, 0.52, and 0.62 in junior, medium and senior groups, respectively), where agreement was highest and lowest in the junior-level group, respectively. The accuracy and sensitivity of the prediction for umbilical artery pH < 7.1 were similar among the three groups, but the specificity was higher in the senior group (93.68% vs. 92.53% vs. 98.85% in junior, medium and senior groups, P = 0.015).

Conclusion:

Although inter-observer agreement was statistically good for most basic CTG features and the FHR category, it was insufficient to meet the clinical requirement of “no objection” interpretation of FHR tracings. Further specialized training is needed for standardized interpretation of intrapartum FHR tracings.

Keywords: Cardiotocography, Fetal monitoring, Fetal distress

Introduction

Continuous intrapartum electronic fetal monitoring (EFM) is an integral part of intrapartum care to provide a screening test to predict the development of asphyxia in the fetus, subsequent hypoxic ischemic encephalopathy, long-term neurologic damage, and even fetal death.1 The widespread use of intrapartum fetal heart rate (FHR) monitoring has helped to decrease the rate of neonatal seizures at the cost of increasing operative delivery in western countries.2

In China, expert consensus on the application of EFM3 was established in 2015 based on the National Institute of Child Health and Human Development (NICHD) workshop report4 and the American College of Obstetricians and Gynecologists (ACOG).5 A three-tier system for the categorization of fetal heart rate (FHR) patterns including Category I, II and III is recommended. Category II includes a broad spectrum of heterogeneous FHR patterns that are inconsistently associated with fetal acidemia, making clinical management of these situations more uncertain.

The interpretation of FHR tracings has previously been shown to be controversial. Several studies6–9 have shown inadequate interobserver agreement, particularly for abnormal FHR tracings, while a few other investigations10,11 found good agreement and reliability in the evaluation of cardiotocographic tracings.

The aims of this study were (1) to ascertain the agreement and reliability among obstetricians with different skill levels on the interpretation of nonreassuring intrapartum EFM, and (2) to explore the accuracy of prediction for neonatal acidemia.

Material and methods

A retrospective cohort study involving two tertiary hospitals located in Southern and Northern China (The First Affiliated Hospital of Sun Yat-sen University and Peking University Third Hospital) was conducted between 30th September 2018 and 1st April 2019. This study was approved by The First Affiliated Hospital of Sun Yat-sen University Medical Ethics Committee (approval number: 2021 No. 761), and the ethics committee waived the requirement for signed informed consent.

One attending obstetrician from each hospital, who was not otherwise involved in this study, selected 50 tracings, all of which had been diagnosed as Category II FHR tracings (according to the Chinese experts’ consensus on EFM in 2015). The inclusion criteria were singleton pregnancy in cephalic presentation, >37 weeks of gestation, successful vaginal delivery, nonreassuring intrapartum FHR (as diagnosed by two appointed obstetricians), and available results of umbilical arterial pH. Nonreassuring intrapartum FHR was defined by the occurrence of recurrent decelerations at least 15 beats/min below the baseline and lasting for at least 15 seconds, irrespective of their severity; or the presence of baseline alterations; or the presence of minimal/absent variability. Continuous intrapartum EFM was performed from the start of the active phase of labor until delivery for all included patients. We excluded patients with multiple pregnancy, fetal anomalies, preterm delivery, and cesarean delivery. Cases were subsequently excluded if one of the following situations was documented: total nonreassuring FHR tracing lasting for <60 minutes and/or signal loss in the last hour of the tracing exceeding 10%.

The maternal baseline characteristics, including age, gravidity, parity, gestational age, history of cesarean section, and pregnancy complications (such as gestational diabetes mellitus and pregnancy-induced hypertension), as well as characteristics associated with fetal growth, such as estimated fetal weight and amniotic fluid, were provided to reviewers. All markings on the tracings were completely obscured, and no reviewer was made aware of the neonatal outcomes, neonatal resuscitation measures, or the other reviewers' interpretations. A total of 100 EFM tracings were reviewed independently by obstetricians with three different skill and experience levels from each hospital. In total, there were six reviewers: two senior-level obstetricians with obstetric experience of more than 10 years, two medium-level obstetricians with obstetric experience of 5–10 years, and two junior-level obstetricians with obstetric experience of ≤ 3 years. The definitions and recommendations of the 2015 Chinese experts’ consensus3 on the intrapartum management of FHR tracings were used at each included institute. Each reviewer determined the FHR pattern (Category I, II, III); FHR baseline (normal, tachycardia, bradycardia); decrease or absence of variability; presence of acceleration; presence of recurrent (present with 50% of contractions in 15 minutes), late, variable or prolonged deceleration; and presence of a sinusoidal pattern. Last, the six obstetricians attempted to predict whether neonatal acidemia and umbilical arterial pH < 7.1 would occur. Neonatal acidemia was defined as umbilical arterial (immediately after birth) pH < 7.20 and/or base excess (BE) < −12.00 mmol/L.12 Additionally, the sensitivity and specificity for umbilical arterial pH < 7.1 were calculated. Specificity = number of true negatives/(number of true negatives + number of false positives) × 100%. Sensitivity = number of true positives/(number of true positives + number of false negatives) × 100%.
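As a minimal illustration of the two formulas above, the following Python sketch computes sensitivity, specificity, and accuracy from confusion-matrix counts. The counts and the names `ConfusionCounts`, `sensitivity`, `specificity`, and `accuracy` are hypothetical and chosen only for illustration; they are not data from this study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for a binary prediction (here: 'pH < 7.1' is the positive class)."""
    tp: int  # predicted pH < 7.1, actual pH < 7.1
    fp: int  # predicted pH < 7.1, actual pH >= 7.1
    tn: int  # predicted pH >= 7.1, actual pH >= 7.1
    fn: int  # predicted pH >= 7.1, actual pH < 7.1


def sensitivity(c: ConfusionCounts) -> float:
    """True positives / (true positives + false negatives), in percent."""
    return 100.0 * c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negatives / (true negatives + false positives), in percent."""
    return 100.0 * c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Correct predictions / all predictions, in percent."""
    return 100.0 * (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Hypothetical counts for illustration only (not taken from the study).
example = ConfusionCounts(tp=3, fp=5, tn=85, fn=7)
print(f"sensitivity = {sensitivity(example):.2f}%")
print(f"specificity = {specificity(example):.2f}%")
print(f"accuracy    = {accuracy(example):.2f}%")
```

With a low prevalence of the positive class, accuracy is dominated by the true negatives, so a reviewer can reach high accuracy and specificity while detecting few of the acidemic cases.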

Statistical analyses

According to the recommendations of the Guidelines for Reporting Reliability and Agreement Studies (GRRAS), the agreement among observers was evaluated using the proportions of agreement (Pa) and the proportions of specific agreement (Pa for each category).13 Pa = 0.5 means that when one observer assigns a given classification to a tracing, there is a 50% probability that another observer will assign the same classification. If the lower limit of the 95% confidence interval (CI) of Pa is < 0.50, the agreement is considered poor. Kappa statistics (Light's kappa for n raters) and Gwet's AC1 statistics (the first-order agreement coefficient) were used to evaluate reliability. Kappa adjusts Pa for the agreement expected by chance, so the distribution of ratings across categories influences the results. When the prevalence of a certain rating is very high or very low, it is possible to obtain a high percentage of agreement together with a low kappa.14 Kappa values <0.20 were considered to indicate slight reliability; between 0.21 and 0.40, fair reliability; between 0.41 and 0.60, moderate reliability; between 0.61 and 0.80, substantial reliability; and values >0.80 indicated almost perfect reliability.15 However, Gwet's AC1 has been shown to provide a more stable inter-rater reliability coefficient than kappa.16 Further, Gwet's AC1 was also found to be less affected by prevalence and marginal probability than kappa, so we used this index for the inter-observer reliability analysis. AC1 redefines the probability of chance agreement: it assumes that raters make random judgments only when they are unsure how to rate a subject, and that only these random judgments lead to chance agreement.16–18 The predictive accuracy for umbilical artery pH < 7.1 was compared among groups using the χ2 test (or Fisher's exact test in case of small cell frequencies).
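The following Python sketch illustrates how these indices behave for two raters when one category is highly prevalent. It is not the `obs.agree` implementation used in the study. The ratings are hypothetical, the function names are chosen for illustration, and the formulas are the standard two-rater definitions of Cohen's kappa and Gwet's AC1, with Light's kappa taken as the mean of the pairwise Cohen's kappas.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def proportion_of_agreement(a, b):
    """Pa: share of tracings on which both raters chose the same category."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def specific_agreement(a, b, category):
    """Pa for one category: 2 * joint uses / (uses by rater a + uses by rater b)."""
    both = sum(x == category and y == category for x, y in zip(a, b))
    total = a.count(category) + b.count(category)
    return 2 * both / total if total else float("nan")


def cohen_kappa(a, b):
    """Chance agreement is the product of the two raters' marginal proportions.
    Undefined (division by zero) if both raters use a single category only."""
    n, pa = len(a), proportion_of_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    pe = sum((ca[k] / n) * (cb[k] / n) for k in set(a) | set(b))
    return (pa - pe) / (1 - pe)


def gwet_ac1(a, b):
    """Chance agreement is sum(pi_k * (1 - pi_k)) / (q - 1), where pi_k is the
    mean marginal proportion of category k. Requires q >= 2 observed categories."""
    n, pa = len(a), proportion_of_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    cats = set(a) | set(b)
    pi = {k: (ca[k] / n + cb[k] / n) / 2 for k in cats}
    pe = sum(p * (1 - p) for p in pi.values()) / (len(cats) - 1)
    return (pa - pe) / (1 - pe)


def light_kappa(raters):
    """Light's kappa for n raters: mean of all pairwise Cohen's kappas."""
    return mean(cohen_kappa(a, b) for a, b in combinations(raters, 2))


# Hypothetical ratings of variability for 100 tracings (illustration only):
# both raters say "normal" 95 times, but never agree on "minimal".
rater_a = ["normal"] * 90 + ["normal"] * 5 + ["minimal"] * 5
rater_b = ["normal"] * 90 + ["minimal"] * 5 + ["normal"] * 5

print(f"Pa            = {proportion_of_agreement(rater_a, rater_b):.2f}")
print(f"Pa (normal)   = {specific_agreement(rater_a, rater_b, 'normal'):.2f}")
print(f"Pa (minimal)  = {specific_agreement(rater_a, rater_b, 'minimal'):.2f}")
print(f"Cohen's kappa = {cohen_kappa(rater_a, rater_b):.2f}")
print(f"Gwet's AC1    = {gwet_ac1(rater_a, rater_b):.2f}")
```

In this constructed example the overall agreement is 0.90, kappa is slightly negative, and AC1 is about 0.89, which is the "high agreement but low kappa" situation described above. The specific agreement for the rare category is zero, which the overall Pa hides.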

All statistical analysis was performed using the R package obs.agree version 1.0.

Results

With the screening strategy, 370 cases and 812 cases were initially retrieved by Peking University Third Hospital and the First Affiliated Hospital of Sun Yat-sen University, respectively. Finally, a total of 100 cases from two hospitals with 50 cases per hospital were selected. The flowchart of the screening process of the cases included in the study is shown in Figure 1.

Figure 1.

Flowchart of screening process of the cases included in the study. A Flowchart of screening process in Peking University Third Hospital. B Flowchart of screening process in the First Affiliated Hospital of Sun Yat-sen University. FHR: Fetal heart rate.

The characteristics of the 100 cases submitted to the reviewers are presented in Table 1. Patients delivered at a mean term of 39.8 weeks of gestation. The mean duration between the beginning of abnormal EFM and delivery was 104.1 ± 43.9 minutes. Seventy-five patients had spontaneous vaginal delivery, and instrumental delivery was performed in 25 cases. Details of adverse neonatal outcome are as follows: the umbilical cord arterial pH was <7.1 in 10 cases, the Apgar score was ≤7 at 1 minute in one case, and 18 cases needed admission to the neonatal department.

Table 1.

Characteristics of the 100 cases submitted to the reviewers.

Characteristics Overall (n = 100)
Patients
 Mean age (mean ± SD), years 31.4 ± 3.7
 Advanced age (≥35 years), n (%) 20 (20)
 Multiparity, n (%) 14 (14)
 Gestational diabetes, n (%) 18 (18)
 Previous cesarean delivery, n (%) 3 (3)
 Preeclampsia, n (%) 3 (3)
 Mean duration of gestation (mean ± SD), weeks 39.8 ± 1.0
 Induction of labor, n (%) 28 (28)
 Duration between the beginning of abnormal EFM and delivery (mean ± SD), minutes 104.1 ± 43.9
 Meconium-stained amniotic fluid, n (%) 16 (16)
Delivery, n (%)
 Spontaneous vaginal delivery 75 (75)
 Instrumental delivery 25 (25)
Neonates
 Birthweight (mean ± SD), g 3333 ± 316
 Male, n (%) 54 (54)
Adverse neonatal outcome, n (%)
 1 minute Apgar score ≤7 1 (1)
 5 minute Apgar score ≤7 0 (0)
Fetal umbilical cord arterial pH (mean ± SD) 7.22 ± 0.1
 pH < 7.1, n (%) 10 (10)
 pH ≥ 7.1, n (%) 90 (90)
Admission to neonatal department, n (%) 18 (18)

Data are presented as mean ± SD or n (%).

SD: Standard deviation; EFM: Electronic fetal monitoring.

In total, 600 ratings (six obstetricians × 100 tracings) were made for each cardiotocography (CTG) feature. The analysis covered the FHR baseline, variability, accelerations, decelerations, sinusoidal pattern, pattern of FHR tracings, and prediction of neonatal acidemia and umbilical arterial pH <7.1.

The results of inter-observer agreement and reliability are displayed in Table 2. The inter-observer Pa values for the FHR parameters were as follows: baseline FHR (0.94), variability (0.88), acceleration (0.55), early deceleration (0.39), variable deceleration (0.57), late deceleration (0.77), prolonged deceleration (0.70), sinusoidal pattern (0.99), prediction of neonatal acidemia (0.61), whether umbilical arterial pH < 7.1 (0.92), and pattern of FHR tracings (1.00). Thus, good inter-observer agreement was found in the evaluation of all variables except early deceleration.

Table 2.

Inter-observer agreement and reliability among six reviewers for evaluation of basic FHR features, pattern, and prediction of neonatal acidemia and umbilical arterial pH < 7.1.

Total from all six reviewers

Variables Number of times rated (600 ratings) Pa (95% CI) Kappa (95% CI) AC1 (95% CI)
Baseline 0.94 (0.91,0.97) 0.46 (0.26,0.74) 0.94 (0.90,0.97)
 Bradycardia 1 0.00 (0.00,0.00)
 Normal 564 0.97 (0.95,0.98)
 Tachycardia 35 0.51 (0.27,0.69)
Variability 0.88 (0.84,0.93) 0.11 (0.09,0.17) 0.88 (0.83, 0.92)
 Absent 3 0.00 (0.00,0.00)
 Minimal 40 0.18 (0.08,0.28)
 Normal 557 0.94 (0.92,0.96)
Accelerations 0.55 (0.52,0.58) 0.09 (0.05,0.13) 0.17 (0.07, 0.26)
 No 212 0.36 (0.31,0.40)
 Yes 388 0.65 (0.60,0.69)
Early decelerations 0.39 (0.36,0.43) 0.09 (0.11,0.25) 0.10 (0.05,0.16)
 No 270 0.48 (0.43,0.53)
 Intermittent 189 0.31 (0.25,0.36)
 Recurrent 141 0.34 (0.26,0.42)
Variable decelerations 0.57 (0.54,0.62) 0.19 (0.14,0.34) 0.43 (0.36, 0.49)
 No 35 0.16 (0.08,0.22)
 Intermittent 204 0.44 (0.37,0.50)
 Recurrent 361 0.69 (0.64,0.74)
Late decelerations 0.77 (0.73,0.82) 0.10 (0.06,0.34) 0.74 (0.68, 0.80)
 No 520 0.87 (0.85,0.90)
 Intermittent 62 0.10 (0.05,0.16)
 Recurrent 18 0.11 (0.05,0.16)
Prolonged decelerations 0.70 (0.66,0.75) 0.38 (0.30,0.47) 0.44 (0.33,0.55)
 No 372 0.76 (0.71,0.81)
 Yes 228 0.61 (0.53,0.66)
Sinusoidal pattern 0.99 (0.98,1.00) N/A 0.99 (0.98,1.00)
 No 598 1.00 (0.99,1.00)
 Yes 2 0.00 (0.00,0.00)
Neonatal acidemia 0.61 (0.57,0.65) 0.23 (0.16,0.30) 0.25 (0.16, 0.33)
 No 358 0.67 (0.62,0.72)
 Yes 242 0.52 (0.43,0.58)
pH<7.1 0.92 (0.89,0.95) N/A 0.91 (0.87, 0.95)
 No 572 0.96 (0.94,0.97)
 Yes 28 0.11 (0.03,0.19)
Pattern of FHR tracings 1.00 (1.00,1.00) N/A 1.00 (1.00,1.00)
 I 0 0.00 (0.00,0.00)
 II 600 1.00 (1.00,1.00)
 III 0 0.00 (0.00,0.00)

N/A: Not applicable.

CI: Confidence interval; FHR: Fetal heart rate; Pa: Proportions of agreement.

Reliability was also good among most variables (AC1 > 0.40), except for acceleration, early deceleration, and prediction of neonatal acidemia, which had low AC1 values of 0.17, 0.10, and 0.25, respectively.

Table 3 shows the comparison of the agreement and reliability among the three groups of observers. In the junior group, the Pa values for FHR baseline, accelerations, and late decelerations were 0.96 (95% CI: 0.92–0.99), 0.89 (95% CI: 0.83–0.95), and 0.92 (95% CI: 0.86–0.97), respectively, which were higher than those of the other two groups. Agreement on sinusoidal pattern and pattern of FHR tracings was high and similar across the three groups; however, agreement on early decelerations was poor. In the senior group, the Pa values for variable decelerations, neonatal acidemia, and umbilical arterial pH < 7.1 were 0.60, 0.62, and 0.98, respectively, which were higher than those of the other two groups.

Table 3.

Inter-observer agreement and reliability between the same level reviewers for evaluation of basic FHR features, pattern, and prediction of neonatal acidemia and umbilical arterial pH<7.1

Junior level Medium level Senior level



Variables Number of times rated (200 ratings) Pa (95% CI) Kappa (95% CI) AC1 (95% CI) Number of times rated (200 ratings) Pa (95% CI) Kappa (95% CI) AC1 (95% CI) Number of times rated (200 ratings) Pa (95% CI) Kappa (95% CI) AC1 (95% CI)
Baseline 0.96 (0.92,0.99) 0.58 (0.00,0.90) 0.96 (0.92,1.00) 0.90 (0.84,0.96) 0.14 (0.03,0.64) 0.89 (0.83, 0.96) 0.94 (0.90,0.99) 0.54 (0.14,0.85) 0.94 (0.88, 0.99)
 Bradycardia 1 0.00 (0.00,0.00) 0 0.00 (0.00,0.00) 0 0.00 (0.00,0.00)
 Normal 190 0.98 (0.96,0.99) 188 0.95 (0.91,0.98) 186 0.97 (0.94,0.99)
 Tachycardia 9 0.67 (0.00,1.00) 12 0.17 (0.00,0.50) 14 0.57 (0.20,0.88)
Variability 0.90 (0.84,0.96) 0.00 (0.00,0.00) 0.89 (0.83,0.96) 0.94 (0.90,0.98) 0.24 (0.00,0.85) 0.94 (0.89, 0.99) 0.77 (0.69,0.85) 0.03 (0.13,0.24) 0.74 (0.64, 0.85)
 Absent 1 0.00 (0.00,0.00) 0 0.00 (0.00,0.00) 2 0.00 (0.00,0.00)
 Minimal 9 0.00 (0.00,0.00) 8 0.25 (0.00,0.67) 23 0.09 (0.00,0.26)
 Normal 190 0.95 (0.91,0.98) 192 0.97 (0.95,0.99) 175 0.87 (0.81,0.92)
Accelerations 0.89 (0.83,0.95) 0.22 (0.04,0.63) 0.87 (0.79, 0.95) 0.50 (0.41,0.60) 0.16 (0.05,0.30) 0.00 (−0.20,0.20) 0.35 (0.25,0.43) 0.07 (0.00,0.14) −0.30 (−0.49,−0.10)
 No 15 0.27 (0.00,0.57) 102 0.51 (0.39,0.63) 95 0.32 (0.19,0.44)
 Yes 185 0.94 (0.90,0.97) 98 0.49 (0.36,0.61) 105 0.38 (0.25,0.47)
Early decelerations 0.35 (0.25,0.45) 0.09 (0.10,0.40) 0.03 (−0.12,0.17) 0.46 (0.37,0.56) 0.12 (0.01,0.32) 0.27 (0.12, 0.42) 0.34 (0.26,0.43) 0.10 (0.01,0.31) 0.03 (−0.11, 0.17)
 No 62 0.29 (0.15,0.43) 125 0.59 (0.49,0.70) 83 0.31 (0.19,0.44)
 Intermittent 62 0.19 (0.07,0.33) 52 0.23 (0.08,0.39) 75 0.35 (0.21,0.47)
 Recurrent 76 0.53 (0.37,0.66) 23 0.26 (0.00,0.48) 42 0.38 (0.18,0.57)
Variable decelerations 0.52 (0.44,0.62) 0.14 (0.00,0.42) 0.34 (0.19, 0.48) 0.57 (0.47,0.66) 0.16 (0.01,0.37) 0.42 (0.28, 0.56) 0.60 (0.50,0.69) 0.19 (0.12,0.51) 0.48 (0.33, 0.62)
 No 17 0.24 (0.00,0.53) 6 0.00 (0.00,0.00) 12 0.17 (0.00,0.33)
 Intermittent 72 0.42 (0.27,0.56) 79 0.48 (0.33,0.61) 53 0.30 (0.14,0.45)
 Recurrent 111 0.63 (0.52,0.74) 115 0.66 (0.56,0.75) 135 0.76 (0.67,0.83)
Late decelerations 0.92 (0.86,0.97) 0.24 (0.01,0.74) 0.92 (0.86, 0.98) 0.68 (0.59,0.77) 0.00 (0.06,0.11) 0.63 (0.50, 0.75) 0.70 (0.61,0.79) 0.10 (0.01,0.24) 0.65 (0.53, 0.77)
 No 189 0.96 (0.93,0.98) 166 0.81 (0.74,0.87) 165 0.82 (0.76,0.89)
 Intermittent 9 0.22 (0.00,0.50) 28 0.07 (0.00,0.21) 25 0.16 (0.00,0.37)
 Recurrent 2 0.00 (0.00,0.00) 6 0.00 (0.00,0.00) 10 0.00 (0.00,0.00)
Prolonged decelerations 0.70 (0.60,0.78) 0.31 (0.12,0.49) 0.48 (0.30,0.67) 0.79 (0.70,0.86) 0.58 (0.42,0.73) 0.58 (0.42, 0.74) 0.77 (0.69,0.85) 0.47 (0.25,0.64) 0.60 (0.43, 0.76)
 No 140 0.79 (0.71,0.85) 95 0.78 (0.67,0.86) 137 0.83 (0.76,0.90)
 Yes 60 0.50 (0.31,0.64) 105 0.80 (0.71,0.87) 63 0.63 (0.48,0.76)
Sinusoidal pattern 1.00 (1.00,1.00) N/A 1.00 (1.00,1.00) 0.99 (0.97,1.00) N/A 0.99 (0.97, 1.00) 0.99 (0.97,1.00) N/A 0.99 (0.97, 1.00)
 No 200 1.00 (1.00,1.00) 199 0.99 (0.98,1.00) 199 0.99 (0.98,1.00)
 Yes 0 0.00 (0.00,0.00) 1 0.00 (0.00,0.00) 1 0.00 (0.00,0.00)
Neonatal acidemia 0.52 (0.43,0.62) 0.14 (0.02,0.28) 0.08 (−0.13,0.30) 0.52 (0.42,0.62) 0.18 (0.07,0.30) 0.06 (−0.14,0.27) 0.62 (0.53,0.72) 0.25 (0.08,0.43) 0.27 (0.07, 0.47)
 No 122 0.61 (0.50,0.70) 116 0.59 (0.48,0.69) 120 0.68 (0.58,0.78)
 Yes 78 0.38 (0.23,0.50) 84 0.43 (0.28,0.55) 80 0.53 (0.38,0.65)
pH<7.1 0.88 (0.81,0.94) 0.03 (0.00,0.10) 0.86 (0.78,0.95) 0.86 (0.79,0.93) 0.00 (0.00,0.00) 0.84 (0.75,0.93) 0.98 (0.96,1.00) N/A 0.98 (0.95, 1.00)
 No 188 0.94 (0.90,0.97) 186 0.92 (0.88,0.96) 198 0.99 (0.98,1.00)
 Yes 12 0.00 (0.00,0.00) 14 0.00 (0.00,0.00) 2 0.00 (0.00,0.00)
Pattern of FHR tracings 1.00 (1.00,1.00) N/A 1.00 (1.00,1.00) 1.00 (1.00,1.00) N/A 1.00 (1.00,1.00) 1.00 (1.00,1.00) N/A 1.00 (1.00,1.00)
 I 0 0.00 (0.00,0.00) 0 0.00 (0.00,0.00) 0 0.00 (0.00,0.00)
 II 200 1.00 (1.00,1.00) 200 1.00 (1.00,1.00) 200 1.00 (1.00,1.00)
 III 0 0.00 (0.00,0.00) 0 0.00 (0.00,0.00) 0 0.00 (0.00,0.00)

N/A: Not applicable.

CI: Confidence interval; FHR: Fetal heart rate; Pa: Proportions of agreement.

Agreement and reliability results among observers from the two hospitals are shown in Table 4. Good inter-observer agreement between the two hospitals was also found in the evaluation of most variables, with the exception of early decelerations, which had a low Pa value of 0.38. Reliability was almost perfect among most variables, but was poor for accelerations, early decelerations, and neonatal acidemia, with low AC1 values of 0.23, 0.09, and 0.14, respectively.

Table 4.

Inter-observer agreement and reliability between two different hospitals for evaluation of basic FHR features, pattern, and prediction of neonatal acidemia and umbilical arterial pH < 7.1.

Total from all six reviewers

Variables Number of times rated (600 ratings) Pa (95% CI) Kappa (95% CI) AC1 (95% CI)
Baseline 0.93 (0.91,0.96) 0.18 (0.09,0.26) 0.93 (0.90,0.96)
 Bradycardia 1 0.00 (0.00,0.00)
 Normal 564 0.96 (0.95,0.98)
 Tachycardia 35 0.46 (0.22,0.67)
Variability 0.87 (0.83,0.91) 0.05 (−0.04,0.20) 0.86 (0.82,0.90)
 Absent 3 0.00 (0.00,0.00)
 Minimal 40 0.10 (0.00,0.23)
 Normal 557 0.93 (0.91,0.95)
Accelerations 0.58 (0.52,0.63) 0.18 (0.10,0.27) 0.23 (0.11,0.35)
 No 212 0.41 (0.32,0.49)
 Yes 388 0.68 (0.62,0.73)
Early decelerations 0.38 (0.33,0.44) 0.05 (0.06,0.28) 0.09 (0.01,0.17)
 No 270 0.44 (0.36,0.51)
 Intermittent 189 0.26 (0.17,0.35)
 Recurrent 141 0.44 (0.34,0.55)
Variable decelerations 0.56 (0.50,0.62) 0.16 (0.13,0.37) 0.41 (0.33,0.49)
 No 35 0.17 (0.00,0.34)
 Intermittent 204 0.41 (0.32,0.49)
 Recurrent 361 0.69 (0.62,0.74)
Late decelerations 0.77 (0.72,0.81) 0.06 (−0.02,0.16) 0.74 (0.67,0.80)
 No 520 0.87 (0.84,0.90)
 Intermittent 62 0.13 (0.00,0.26)
 Recurrent 18 0.00 (0.00,0.00)
Prolonged decelerations 0.75 (0.70,0.80) 0.48 (0.37,0.57) 0.53 (0.44,0.63)
 No 372 0.80 (0.75,0.84)
 Yes 228 0.68 (0.60,0.74)
Sinusoidal pattern 0.99 (0.99,1.00) 0.00 (−0.01,0.00) 0.99 (0.98,1.00)
 No 598 1.00 (0.99,1.00)
 Yes 2 0.00 (0.00,0.00)
Neonatal acidemia 0.55 (0.50,0.61) 0.18 (0.10,0.26) 0.14 (0.02,0.26)
 No 358 0.63 (0.56,0.68)
 Yes 242 0.45 (0.37,0.53)
pH<7.1 0.91 (0.87,0.93) −0.02 (−0.05,−0.01) 0.90 (0.86,0.94)
 No 572 0.95 (0.93,0.97)
 Yes 28 0.00 (0.00,0.00)

CI: Confidence interval; FHR: Fetal heart rate; Pa: Proportions of agreement.

The predictive accuracy, sensitivity, and specificity of the three groups for the umbilical artery pH < 7.1 are listed in Table 5. The accuracy was similar among three groups (P = 0.285), whereas the specificity was higher in the senior group (98.85%) than the other two groups (93.68% and 92.53%) (P = 0.015).

Table 5.

Predictive accuracy of six reviewers for umbilical artery pH < 7.1.

Variables Junior levels Medium levels Senior levels χ2 P
Specificity (%) 93.68 (88.68–96.64) 92.53 (87.29–95.80) 98.85 (95.47–99.80) 8.338 0.015
Sensitivity (%) 0.00 (0.00–20.05) 5.00 (0.26–26.94) 0.00 (0.00–20.05) 1.851 1.000
Accuracy (%) 84.02 (77.90–88.73) 83.51 (77.36–88.29) 88.66 (83.13–92.60) 2.507 0.285

Data are presented as percentage (range).

Junior levels vs. senior levels P < 0.05.

Medium levels vs. senior levels P < 0.05.

Fisher's exact test.
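The χ2 comparison in Table 5 can be sketched in Python as follows. The cell counts below are not reported in the article. They are back-calculated from the reported percentages under the assumption that each group contributed 174 ratings of cases with pH ≥ 7.1 and 20 ratings of cases with pH < 7.1, and they should be read as an illustration only. Under that assumption, a Pearson χ2 test on the 3 × 2 tables gives values close to the reported statistics for specificity and accuracy.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_two_df(stat):
    """Upper-tail p-value of a chi-square variable with exactly 2 degrees of
    freedom, for which the survival function is exp(-x / 2)."""
    return math.exp(-stat / 2)


# ASSUMED counts, back-calculated from the percentages in Table 5
# (rows: junior, medium, senior).
specificity_table = [  # [true negatives, false positives] out of 174
    [163, 11],
    [161, 13],
    [172, 2],
]
accuracy_table = [  # [correct, incorrect] out of 194
    [163, 31],
    [162, 32],
    [172, 22],
]

for name, table in [("specificity", specificity_table), ("accuracy", accuracy_table)]:
    stat = chi_square_statistic(table)
    print(f"{name}: chi2 = {stat:.3f}, P = {p_value_two_df(stat):.3f}")
```

The closed-form p-value is valid only for 2 degrees of freedom, which is the case for a 3 × 2 table. Sensitivity is not included here because its cell counts are small and the table indicates that Fisher's exact test was used instead.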

Discussion

Statistically speaking, we found good inter-observer agreement in the evaluation of most basic CTG features and the FHR category, which was similar to the report by Rei et al.11 The pattern of FHR tracings was defined as category II by all reviewers, which represents perfect agreement. Good agreement was observed for CTG features that occur more frequently, such as baseline, variability, variable decelerations, late decelerations, prolonged decelerations, and sinusoidal pattern. However, early decelerations failed to achieve similar results, both across all reviewers and among reviewers of the same level. A possible reason is that early decelerations are considered less harmful, so clinicians may pay less attention to mastering their standardized interpretation; this should be improved in the future.

Although we found that there was good inter-observer agreement statistically (defined as Pa>0.5) in the evaluation of most basic CTG features, it was insufficient to meet the ideal clinical requirement of “no objection” interpretation of FHR tracings. More training in the standardized interpretation of intrapartum FHR tracings is needed for obstetricians at all skill and experience levels, because we found that inter-observer agreement and reliability were not strongly affected by clinicians’ years of practice. However, the specificity of the prediction for umbilical arterial pH < 7.1 was better for senior-level obstetricians than for their less experienced peers, which may be because predicting neonatal acidemia requires an overall evaluation. Regarding the prediction of umbilical arterial pH < 7.1, junior doctors may not take into account the tolerance of the fetus to acidosis because of their lack of experience.19 The evaluation depends not only on the FHR tracings but also on maternal complications and fetal factors. In our study, the reviewers were aware of the medical and obstetric complications of the pregnancies, which may have influenced their judgment; this may explain why senior-level obstetricians were better at predicting umbilical arterial pH.

Several studies8,10 have shown that CTG has a high specificity and limited sensitivity in the prediction of fetal hypoxia/acidosis, which is consistent with our findings. High specificity in our study means that the reviewers were able to correctly identify cases with “umbilical arterial pH≥7.1”. However, given that only 10% of the cases included in our study had umbilical arterial pH<7.1, the result has limited significance for predicting adverse neonatal outcomes such as neonatal acidemia. Further studies should increase the sample size of cases with adverse neonatal outcomes to confirm these results. Low sensitivity means that reviewers tended to misjudge cases with “umbilical arterial pH<7.1” as “umbilical arterial pH≥7.1”. In the study of Chauhan et al.,8 based on the ACOG guideline for interpreting EFM, the sensitivity of five clinicians for predicting abnormal outcomes (umbilical artery pH < 7.00) was also low, with a value of zero. Additionally, Santo et al.10 found that the sensitivity of tracings classified by observers as pathological or category III in predicting newborn acidemia was lower with the ACOG guideline than with the FIGO and NICE guidelines. The Chinese experts’ consensus3 on EFM, which is based on the ACOG guideline, and the ACOG guidelines5 both tend to classify more abnormal patterns in category II because of the more restrictive criteria for category III. Some acidemia cases are therefore classified in category II, which explains the tendency toward lower sensitivity and higher specificity with these guidelines.

The main strength of the study is that it involved clinicians working in different centers where the CTG guidelines were routinely used. The inclusion of reviewers with different years of clinical experience also contributes to greater generalizability of the results. As for the selection of tracings, only vaginal deliveries with continuous fetal monitoring until the end of delivery were included, so that the intrapartum EFM tracings used for prediction of umbilical artery pH and fetal acidemia would reflect real-life situations. Finally, we also used the AC1 index to represent inter-rater reliability. Kappa is known to be low when the prevalence of a given rating is very high, whereas AC1 has been found to be less affected by prevalence and marginal probability than kappa.

Our study also has some limitations. First is the retrospective nature of the study. Second, the sample sizes of a few subtypes of parameters, including minimal variability, recurrent late decelerations, and umbilical arterial pH < 7.1, were small, which likely reduced the detection efficiency of agreement. More cases with nonreassuring FHR tracings and adverse neonatal outcomes should be included in future studies. Third, the number of reviewers was small considering that we aimed to analyze differences in judgement among reviewers; it should also be increased in future research. In addition, although we did not know which patients had undesirable outcomes, we were aware that the number could be high. Thus, we did not adhere to the ACOG recommendation, in that we reinterpreted the FHR tracing knowing the outcomes.20

Conclusions

The results of the present study suggest that tracing analysis according to the Chinese experts’ consensus on EFM achieves good statistical agreement in the identification of Category II tracings and for most CTG features. However, this agreement was insufficient to meet the ideal clinical requirement of “no objection” interpretation of FHR tracings. More extensive training is required for standardized interpretation of intrapartum FHR tracings.

Funding

The study was supported by the National Natural Science Foundation of China (No. 81771606) and undergraduate course teaching reform project of Sun Yat-sen University, China (No.80000-16300046).

Author Contributions

Zhuyu Li and Zilian Wang conceived the study concept and design. Zhuyu Li, Yan Wang, Jian Cai, Hanqing Chen, Lixia Shen, Lian Chen and Shufang Li contributed to the performance of the research. Zhuyu Li, Peizhen Zhao and Haiyan Liu contributed to data analysis. Zhuyu Li and Yan Wang contributed to the writing of the first manuscript and all authors provided revision for the final manuscript.

Conflicts of Interest

None.

Editor Note

Zilian Wang is an Editorial Board Member of Maternal-Fetal Medicine. The article was subject to the journal's standard procedures, with peer review handled independently of this editor and their research groups.

References

  • [1] Williams KP, Galerneau F. Intrapartum fetal heart rate patterns in the prediction of neonatal acidemia. Am J Obstet Gynecol 2003;188(3):820–823. doi: 10.1067/mob.2003.183.
  • [2] Yang M, Stout MJ, López JD, et al. Association of fetal heart rate baseline change and neonatal outcomes. Am J Perinatol 2017;34(9):879–886. doi: 10.1055/s-0037-1600911.
  • [3] Chinese Society of Perinatal Medicine. Expert consensus on application of electronic fetal heart monitoring (In Chinese). Chin J Perinat Med 2015;18(7):486–490. doi: 10.3760/cma.j.issn.1007-9408.2015.07.002.
  • [4] Macones GA, Hankins GD, Spong CY, et al. The 2008 National Institute of Child Health and Human Development workshop report on electronic fetal monitoring: update on definitions, interpretation, and research guidelines. Obstet Gynecol 2008;112(3):661–666. doi: 10.1097/AOG.0b013e3181841395.
  • [5] ACOG Practice Bulletin No. 106: Intrapartum fetal heart rate monitoring: nomenclature, interpretation, and general management principles. Obstet Gynecol 2009;114(1):192–202. doi: 10.1097/AOG.0b013e3181aef106.
  • [6] Blix E, Sviggum O, Koss KS, et al. Inter-observer variation in assessment of 845 labour admission tests: comparison between midwives and obstetricians in the clinical setting and two experts. BJOG 2003;110(1):1–5.
  • [7] Sabiani L, Le DÛ R, Loundou A, et al. Intra- and interobserver agreement among obstetric experts in court regarding the review of abnormal fetal heart rate tracings and obstetrical management. Am J Obstet Gynecol 2015;213(6):856.e1–856.e8. doi: 10.1016/j.ajog.2015.08.066.
  • [8] Chauhan SP, Klauser CK, Woodring TC, et al. Intrapartum nonreassuring fetal heart rate tracing and prediction of adverse outcomes: interobserver variability. Am J Obstet Gynecol 2008;199(6):623.e1–623.e5. doi: 10.1016/j.ajog.2008.06.027.
  • [9] Figueras F, Albela S, Bonino S, et al. Visual analysis of antepartum fetal heart rate tracings: inter- and intra-observer agreement and impact of knowledge of neonatal outcome. J Perinat Med 2005;33(3):241–245. doi: 10.1515/JPM.2005.044.
  • [10] Santo S, Ayres-de-Campos D, Costa-Santos C, et al. Agreement and accuracy using the FIGO, ACOG and NICE cardiotocography interpretation guidelines. Acta Obstet Gynecol Scand 2017;96(2):166–175. doi: 10.1111/aogs.13064.
  • [11] Rei M, Tavares S, Pinto P, et al. Interobserver agreement in CTG interpretation using the 2015 FIGO guidelines for intrapartum fetal monitoring. Eur J Obstet Gynecol Reprod Biol 2016;205:27–31. doi: 10.1016/j.ejogrb.2016.08.017.
  • [12] Low JA, Lindsay BG, Derrick EJ. Threshold of metabolic acidosis associated with newborn complications. Am J Obstet Gynecol 1997;177(6):1391–1394. doi: 10.1016/s0002-9378(97)70080-2.
  • [13] Kottner J, Audigé L, Brorson S, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64(1):96–106. doi: 10.1016/j.jclinepi.2010.03.002.
  • [14] Feinstein AR, Cicchetti DV. High agreement but low kappa: I. The problems of two paradoxes. J Clin Epidemiol 1990;43(6):543–549. doi: 10.1016/0895-4356(90)90158-l.
  • [15] Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33(1):159–174.
  • [16] Wongpakaran N, Wongpakaran T, Wedding D, et al. A comparison of Cohen's Kappa and Gwet's AC1 when calculating inter-rater reliability coefficients: a study conducted with personality disorder samples. BMC Med Res Methodol 2013;13:61. doi: 10.1186/1471-2288-13-61.
  • [17] Gwet KL. Computing inter-rater reliability and its variance in the presence of high agreement. Br J Math Stat Psychol 2008;61(Pt 1):29–48. doi: 10.1348/000711006X126600.
  • [18] Gwet KL. Testing the difference of correlated agreement coefficients for statistical significance. Educ Psychol Meas 2016;76(4):609–637. doi: 10.1177/0013164415596420.
  • [19] Yohai D, Baumfeld Y, Zilberstein T, et al. Does gender of the fetus have any relation with fetal heart monitoring during the first and second stage of labor? J Matern Fetal Neonatal Med 2017;30(2):150–154. doi: 10.3109/14767058.2016.1168802.
  • [20] American College of Obstetricians and Gynecologists. Practice bulletin no. 116: Management of intrapartum fetal heart rate tracings. Obstet Gynecol 2010;116(5):1232–1240. doi: 10.1097/AOG.0b013e3182004fa9.

Articles from Maternal-Fetal Medicine are provided here courtesy of Wolters Kluwer Health
