PLOS One. 2022 Aug 22;17(8):e0273178. doi: 10.1371/journal.pone.0273178

Impact of labor characteristics on maternal and neonatal outcomes of labor: A machine-learning model

Sherif A Shazly 1, Bijan J Borah 1,2,3, Che G Ngufor 3,4, Vanessa E Torbenson 1, Regan N Theiler 1, Abimbola O Famuyide 1,*
Editor: Jonas Bianchi 5
PMCID: PMC9394788  PMID: 35994474

Abstract

Introduction

Since Friedman’s seminal publication on laboring women, numerous publications have sought to define normal labor progress. However, there is a paucity of data on contemporary labor cervicometry incorporating both maternal and neonatal outcomes. The objective of this study is to establish intrapartum prediction models of unfavorable labor outcomes using machine-learning algorithms.

Materials and methods

The Consortium on Safe Labor is a large database consisting of pregnancy and labor characteristics from 12 medical centers in the United States. Outcomes, including maternal and neonatal outcomes, were retrospectively collected. We defined the primary outcome as the composite of the following unfavorable outcomes: cesarean delivery in active labor, postpartum hemorrhage, intra-amniotic infection, shoulder dystocia, neonatal morbidity, and mortality. Clinical and obstetric parameters at admission and during labor progression were used to build machine-learning risk-prediction models based on the gradient boosting algorithm.

Results

Of 228,438 delivery episodes, 66,586 were eligible for this study. Mean maternal age was 26.95 ± 6.48 years, mean parity was 0.92 ± 1.23, and mean gestational age was 39.35 ± 1.13 weeks. Unfavorable labor outcome was reported in 14,439 (21.68%) deliveries. The area under the receiver operating characteristic curve (AUC) of prediction models increased from 0.75 (95% confidence interval, 0.75–0.75) at admission to 0.89 (95% confidence interval, 0.89–0.90) at a cervical dilation of 10 cm. Baseline labor risk score was above 35% in women with unfavorable outcomes and below 25% in women with favorable outcomes.

Conclusion

Labor risk score is a machine-learning–based score that provides an individualized and dynamic alternative to conventional labor charts. It predicts a composite of adverse birth, maternal, and neonatal outcomes as labor progresses. Therefore, it can be deployed in clinical practice to monitor labor progress in real time and support clinical decisions.

Introduction

Management recommendations for labor and delivery evolve constantly to accommodate evidence from the literature. A major conundrum every obstetrician faces in managing women in labor is weighing maternal and neonatal risks of delayed intervention against risks of unindicated cesarean delivery (CD). Although the incidence of CD has substantially increased in the past 3 decades [1], there has been no discernible decline in maternal or neonatal adverse outcomes [2]. Labor dystocia represents the most common indication for primary CD, but the diagnosis of labor dystocia lacks a consistent, evidence-based, and globally accepted definition. This may contribute, in part, to the increases in rates of CD [3].

One of the earliest studies to define normal labor progress was conducted in 1955 by Friedman [4]; based on observation of 500 women in labor, Friedman described the normal course of labor, known as the “Friedman curve”. The World Health Organization (WHO) relied on Friedman’s data in its construction of a labor partogram for managing labor and labor dystocia, particularly in low-resource countries. A recent Cochrane review failed to demonstrate a significant difference in the rate of CD among women who were or were not managed by the WHO partogram [5]. In 2002, Zhang et al [6] studied 1,329 term nulliparous parturients and suggested that the Friedman curve may not be reflective of contemporary labor progress patterns. To verify their hypothesis, Zhang and colleagues conducted a multicenter study that retrospectively collected clinical data, including pre-labor characteristics, intrapartum parameters, and maternal and neonatal outcomes of women who delivered at 1 of 12 studied clinical centers across the United States. This has become known as the “Consortium on Safe Labor” database [7]. This study created a new labor curve, which substantially modified the management of labor in the United States. However, the authors’ analysis specifically excluded maternal and fetal outcomes data; thus, questions have been raised about the impact of adopting the Zhang labor curve on CD rates as well as maternal and fetal outcomes [8]. The objective of this study is to establish an individualized labor chart, through a series of intrapartum prediction models, using machine-learning algorithms that incorporate data on CD and obstetric outcomes. This dynamic tool may facilitate patient counselling and decision making and reduce the rates of CD and of maternal and neonatal complications.

Materials and methods

The protocol of the current study was reviewed and approved by the NICHD Data and Specimen Hub (DASH) prior to acquisition of the anonymized Consortium on Safe Labor (CSL) database. The current study did not require patient contact or new data collection; thus, institutional review board approval and patient consent were not applicable. The original data were collected retrospectively by 19 hospitals. All hospitals contributing to this database obtained institutional review board approval for the primary study (Zhang et al 2010).

Study population

The Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) established the NICHD Data and Specimen Hub (DASH), a data-sharing resource that enables investigators to use de-identified data from NICHD-funded research studies for the purpose of research. A consortium of 12 clinical centers located in all 9 districts of the American College of Obstetricians and Gynecologists provided electronic obstetric, labor, and newborn data between 2002 and 2008, which created a large database known as the Consortium on Safe Labor database. This database was used by Zhang et al [7] to create the contemporary labor curves published in 2010. It includes 228,438 deliveries with a total of 779 antepartum, intrapartum, and postpartum variables. The de-identified version of this database was obtained with permission through a DASH data use agreement for the purpose of this study.

Study outcomes

The aim of the current study is to develop a series of intrapartum models that comprise baseline variables and dynamic (intrapartum) variables to predict the probability of unfavorable labor outcome (labor risk score [LRS]). Unfavorable labor outcome is defined as the occurrence of any of the following unfavorable outcomes: unsuccessful vaginal delivery (CD in active labor), postpartum hemorrhage (defined as estimated blood loss >1,000 mL) or need for transfusion of blood products, suspected or confirmed intra-amniotic infection (IAI), shoulder dystocia, admission to the neonatal intensive care unit (NICU), Apgar score below 7 at 5 minutes, umbilical arterial pH below 7.00, neonatal hypoxic-ischemic encephalopathy, neonatal ventilation use or continuous positive airway pressure therapy, neonatal intracranial hemorrhage, neonatal sepsis, or neonatal death. LRS is a term that describes the probability of unfavorable labor outcome, as calculated by the model.
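The composite definition above can be written as a single "any of" test. The following is a minimal sketch only: the field names are hypothetical (the database's actual variable names are not given here), and the default values assume that a missing measurement is treated as normal, which is an assumption of this sketch and not something the study states.

```python
from dataclasses import dataclass


@dataclass
class DeliveryOutcome:
    # Hypothetical field names; defaults assume "no event / normal value".
    cd_in_active_labor: bool = False
    estimated_blood_loss_ml: float = 0.0
    transfusion: bool = False
    intra_amniotic_infection: bool = False  # suspected or confirmed
    shoulder_dystocia: bool = False
    nicu_admission: bool = False
    apgar_5min: int = 10
    umbilical_arterial_ph: float = 7.25
    hypoxic_ischemic_encephalopathy: bool = False
    ventilation_or_cpap: bool = False
    intracranial_hemorrhage: bool = False
    neonatal_sepsis: bool = False
    neonatal_death: bool = False


def is_unfavorable(o: DeliveryOutcome) -> bool:
    """Return True if any component of the composite outcome occurred."""
    return any([
        o.cd_in_active_labor,
        o.estimated_blood_loss_ml > 1000 or o.transfusion,  # postpartum hemorrhage
        o.intra_amniotic_infection,
        o.shoulder_dystocia,
        o.nicu_admission,
        o.apgar_5min < 7,
        o.umbilical_arterial_ph < 7.00,
        o.hypoxic_ischemic_encephalopathy,
        o.ventilation_or_cpap,
        o.intracranial_hemorrhage,
        o.neonatal_sepsis,
        o.neonatal_death,
    ])
```

A delivery with no recorded events is labeled favorable, and a single event (for example, a 5-minute Apgar score of 6) is enough to label it unfavorable.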

To accommodate the objectives of this study, we excluded women with multifetal pregnancy, intrauterine fetal death, preterm labor (defined as birth at <37 weeks of gestation), or fetal anomalies; women who underwent elective CD, CD for failed induction, fetal malpresentation, cord prolapse, or active herpetic lesion, or CD performed prior to the onset of active labor (CD at cervical dilation of ≤5 cm); and women with 3 or more prior CDs. Women with inadequate documentation, defined as documentation of fewer than 2 cervical examinations, were also excluded from the study.

Prediction models

A set of prediction models was established to predict the primary outcome of this study. A baseline model was created using variables identified at the time of patient admission (baseline predictors). A series of intrapartum prediction models was set up to incorporate dynamic variables determined by pelvic examination starting at a cervical dilation of 4 cm and other parameters, including use of oxytocin to augment labor and meconium-stained amniotic fluid. These variables included current cervical dilation, cervical effacement (categorized as 0%–30%, 40%–50%, 60%–70%, or 80% or more), head station (categorized as –3, –2, –1, 0, +1, or +2), time interval between current and previous examinations, change in cervical dilation between current and previous examinations, and dilation delta (defined as change in cervical dilation from the previous examination divided by the time interval between the 2 examinations). Intrapartum variables that could not be linked to a particular cervical dilation (e.g., meconium-stained amniotic fluid) were incorporated into the 10-cm prediction model. Although intrapartum fetal heart rate monitoring was considered in the study protocol, it was not included in these models due to lack of documentation of this variable in the Consortium on Safe Labor database.
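As an illustration of the examination-to-examination variables defined above, the sketch below derives the time interval, the change in dilation, and the dilation delta from two consecutive examinations. The class, the field names, and the use of hours as the time unit are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CervicalExam:
    hours_since_admission: float  # hypothetical time stamp, in hours
    dilation_cm: float


def dynamic_features(previous: CervicalExam,
                     current: CervicalExam) -> Dict[str, Optional[float]]:
    """Derive interval, dilation change, and dilation delta for two exams."""
    interval_h = current.hours_since_admission - previous.hours_since_admission
    change_cm = current.dilation_cm - previous.dilation_cm
    # Dilation delta = change in dilation / time interval (cm per hour here).
    # It is left undefined (None) when the interval is not positive.
    delta = change_cm / interval_h if interval_h > 0 else None
    return {
        "current_dilation_cm": current.dilation_cm,
        "interval_h": interval_h,
        "dilation_change_cm": change_cm,
        "dilation_delta_cm_per_h": delta,
    }
```

For example, an examination at 4 cm followed 2 hours later by one at 6 cm gives a dilation change of 2 cm and a dilation delta of 1 cm per hour.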

Each intrapartum prediction model estimated the probability of unfavorable primary outcome (LRS) based on baseline predictors and dynamic labor variables, as well as the most recent LRS estimated using data captured up to the prior examination during labor progression.
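The chaining described above, in which each intrapartum model receives the most recent LRS as an input, can be sketched as follows. This is a schematic only: the models are represented as plain callables that return a probability, and the feature name `prior_lrs` and the function `run_lrs_chain` are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
Model = Callable[[Features], float]  # returns a probability in [0, 1]


def run_lrs_chain(
    baseline_model: Model,
    baseline_features: Features,
    stages: List[Tuple[str, Model, Features]],
) -> List[Tuple[str, float]]:
    """Score the baseline model, then each intrapartum model in order.

    Each stage is (label, model, dynamic_features). The stage model sees the
    baseline predictors, its own dynamic variables, and the previous LRS.
    """
    scores = [("baseline", baseline_model(baseline_features))]
    for label, model, dynamic in stages:
        features = {**baseline_features, **dynamic, "prior_lrs": scores[-1][1]}
        scores.append((label, model(features)))
    return scores
```

Because each stage only needs the previous score and the current examination, the sequence can be extended one examination at a time as labor progresses.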

Statistical analysis

All data analyses were performed using the R programming environment for statistical computing version 3.5.1 (R Development Core Team, 2018). We reported descriptive statistics of all covariates in the final sample: mean (standard deviation) was used to summarize continuous variables and counts (percentages) for categorical variables.

Intrapartum prediction models

Given that the progress of labor is affected by time-varying (or dynamic) confounders, methods appropriate for adjusting such dynamic confounders are needed to predict maternal and neonatal outcomes more accurately. Methods adopted by Zhang et al [9] were limited in capturing this dynamic aspect of the data. In this study, the use of machine-learning methods capable of capturing representative features from changing labor characteristics is proposed.

Existing analytic methods for labor progression have been based on traditional statistical approaches, which tend to make unrealistic assumptions regarding the functional form of the model and the distribution of variables. These assumptions are often not applicable in complex clinical situations such as the dynamic labor process. As a result, the models may not fit the data well and may not be generalizable. Machine-learning methods, on the other hand, can estimate complex relationships between clinical measurements with reasonable accuracy, thus producing robust and consistent estimates without making a priori assumptions. These advanced data analytic techniques have been repeatedly shown to produce strong results in many applications in computer sciences, bioinformatics, health care, and elsewhere [10–13]. Thus, in this study, we propose applying machine-learning methods to collectively analyze the patterns of changes in usual prenatal and intrapartum variables based on the large DASH database. Specifically, we implemented an incremental extreme gradient boosting (XGBoost) algorithm [14,15], where, starting from the baseline model, the dynamic labor variables (at cervical dilation of 4, 5, 6, 7, 8, 9, and 10 cm) are incrementally used to extend the knowledge of an existing XGBoost model.

The XGBoost algorithm [15] is a generalized implementation of the gradient boosting machine (GBM) technique [14] with several algorithmic enhancements designed to significantly improve prediction accuracy, training speed, and scalability. An important enhancement is the implementation of the least absolute shrinkage and selection operator (LASSO) and ridge regularization methods [16], which are techniques designed to prevent overfitting.
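For reference, the regularized objective that XGBoost minimizes can be written in its general form as below. The symbols are those of the general XGBoost formulation and do not come from this study: $l$ is the loss for one observation, $f_k$ is the $k$-th tree, $T$ is its number of leaves, and $w$ is its vector of leaf weights. The penalty weights $\gamma$, $\lambda$ (ridge), and $\alpha$ (LASSO) are tuning parameters; the values used in this study are not reported here.

```latex
\mathcal{L} \;=\; \sum_{i=1}^{n} l\bigl(y_i, \hat{y}_i\bigr) \;+\; \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) \;=\; \gamma T \;+\; \tfrac{1}{2}\,\lambda \lVert w \rVert_2^{2} \;+\; \alpha \lVert w \rVert_1
```

Larger values of $\lambda$ and $\alpha$ shrink the leaf weights toward zero, which limits how closely each added tree can fit noise in the training data.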

Handling intrapartum time-varying variables

Throughout the labor process, pelvic examination variables are repeatedly measured for each patient, and as such they are potentially correlated, which presents a major challenge for most machine-learning models [11]. Therefore, we aggregated the repeated observations for each patient prior to the current dilation to construct each intrapartum prediction model. Specifically, cross-sectional data for each intrapartum prediction model was created by aggregating dynamic variables to 3 variables: the frequency (count), the median, and the last observed value.
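The aggregation step can be illustrated with a short sketch that collapses a patient's repeated examinations into one row of count, median, and last observed value per variable. The function name and the dictionary-based record layout are assumptions for this example; the caller is assumed to pass only the examinations that precede the model's dilation threshold, in chronological order.

```python
from statistics import median
from typing import Dict, List, Optional, Sequence


def aggregate_patient(
    exams: List[Dict[str, Optional[float]]],
    variables: Sequence[str],
) -> Dict[str, float]:
    """Collapse repeated examinations into count, median, and last value."""
    row: Dict[str, float] = {}
    for name in variables:
        # Keep recorded values only, preserving chronological order.
        series = [exam[name] for exam in exams if exam.get(name) is not None]
        if not series:
            continue  # nothing recorded for this variable
        row[f"{name}_count"] = len(series)
        row[f"{name}_median"] = median(series)
        row[f"{name}_last"] = series[-1]
    return row
```

Each patient then contributes a single cross-sectional row to a given intrapartum model, regardless of how many examinations were recorded.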

We imputed missing values for variables with ≤30% missing observations using the random forest imputation method, missForest [17].
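The 30% threshold can be applied as a simple screening step before imputation. The sketch below shows only that screening, under the assumption that the threshold applies per variable; the imputation itself (missForest, an R package) is not reproduced, and the function name is hypothetical.

```python
from typing import Dict, List, Optional


def imputable_columns(
    rows: List[Dict[str, Optional[float]]],
    max_missing: float = 0.30,
) -> List[str]:
    """Return the variables whose fraction of missing values is <= max_missing."""
    if not rows:
        return []
    columns = sorted({key for row in rows for key in row})
    selected = []
    for col in columns:
        # A value counts as missing if the key is absent or the value is None.
        n_missing = sum(1 for row in rows if row.get(col) is None)
        if n_missing / len(rows) <= max_missing:
            selected.append(col)
    return selected
```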

Training and validation

The GBM model requires a number of tuning parameters to be set for optimal performance and to avoid overfitting. Consequently, we set up a grid of tuning-parameter combinations and selected the best combination by 10-fold cross validation. In 10-fold cross validation, the data are randomly partitioned into 10 mutually exclusive subsets (or folds); 9 folds are used for training and the held-out fold for testing the performance of the model, and the procedure is repeated until each fold has been used once for testing. We repeated the entire procedure 10 times, averaged the performances on all test folds, and computed confidence intervals. The workflow of the training and validation procedure is illustrated in the supplementary figure (S1 Fig).
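The repeated 10-fold procedure can be sketched as an index generator plus a summary of the per-fold scores. This is an illustrative sketch, not the study's R code: the function names are hypothetical, and the confidence interval uses a normal approximation, which is an assumption because the method used to compute the intervals is not described.

```python
import math
import random
from typing import Iterator, List, Sequence, Tuple


def repeated_kfold(
    n_samples: int, n_folds: int = 10, n_repeats: int = 10, seed: int = 0
) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # new random partition on every repeat
        folds = [indices[i::n_folds] for i in range(n_folds)]  # mutually exclusive
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield repeat, k, train_idx, test_idx


def mean_and_ci(values: Sequence[float], z: float = 1.96) -> Tuple[float, float, float]:
    """Mean of per-fold scores with a normal-approximation 95% interval (assumed)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    half_width = z * sd / math.sqrt(n)
    return mean, mean - half_width, mean + half_width
```

With the defaults, the generator produces 100 train/test splits (10 repeats of 10 folds); a metric such as AUC would be computed on each test fold and the 100 values passed to `mean_and_ci`.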

Results

Baseline and intrapartum dynamic characteristics

Out of 228,438 delivery episodes that compose the Consortium on Safe Labor database, 66,586 episodes were eligible for this study (S2 Fig). Mean maternal age at admission was 26.95±6.48 years, mean parity was 0.92±1.23, and mean pre-pregnancy body mass index (BMI) was 25.24±5.58 kg/m², with a mean weight gain of 14.71±5.92 kg during pregnancy. Race and ethnicity were diverse; 21,155 (31.8%) were white; 23,128 (34.7%) African-American; 14,862 (22.3%) Hispanic; 2,745 (4.1%) Asian/Pacific Islander; 193 (0.3%) multi-racial; 2,072 (3.1%) belonged to other unspecified races; and 2,431 (3.7%) were reported as unknown. Mean gestational age at admission to labor was 39.35±1.13 weeks of gestation. Medical complications of pregnancy included pregestational diabetes in 1,305 (2.0%); gestational diabetes in 1,041 (1.6%); gestational hypertension in 1,106 (1.7%); preeclampsia in 1,085 (1.6%); and chronic hypertension in 778 (1.2%). A prior CD was reported in 2,394 (3.6%) of the entire cohort. Labor was induced in 31,932 (48.0%) of these episodes. Detailed demographic and clinical characteristics of the study population are shown in Table 1.

Table 1. Characteristics of eligible patients.

Variables (a) | Patients With Favorable Outcomes (n = 52,147) | Patients With Unfavorable Outcomes (n = 14,439) | All Patients (N = 66,586) | P Value
Maternal age, years | 26.80±6.40 | 27.47±6.73 | 26.95±6.48 | < .001
Parity | 1.03±1.26 | 0.52±1.01 | 0.92±1.23 | < .001
History of macrosomia in previous pregnancies | 850 (1.6) | 115 (0.8) | 965 (1.4) | < .001
Prepregnancy BMI, kg/m² | 25.05±5.42 | 25.94±6.05 | 25.24±5.58 | < .001
Pregestational diabetes | 941 (1.8) | 364 (2.5) | 1,305 (2.0) | < .001
History of heart disease | 473 (0.9) | 127 (0.9) | 600 (0.9) | .757
Antenatal-positive GBS status | 10,852 (20.8) | 3,103 (21.5) | 13,955 (21.0) | .076
Smoking | 2,674 (5.1) | 651 (4.5) | 3,325 (5.0) | .003
Cerclage placement in current pregnancy | 111 (0.2) | 28 (0.2) | 139 (0.2) | .659
Gestational hypertension | 796 (1.5) | 310 (2.1) | 1,106 (1.7) | < .001
Preeclampsia | 711 (1.4) | 374 (2.6) | 1,085 (1.6) | < .001
Eclampsia | 31 (0.1) | 9 (0.1) | 40 (0.1) | .900
Superimposed preeclampsia | 364 (0.7) | 212 (1.5) | 576 (0.9) | < .001
Chronic hypertension | 549 (1.1) | 229 (1.6) | 778 (1.2) | < .001
Gestational diabetes | 725 (1.4) | 316 (2.2) | 1,041 (1.6) | < .001
Intrauterine growth restriction | 292 (0.6) | 79 (0.5) | 371 (0.6) | .855
Oligohydramnios | 967 (1.9) | 413 (2.9) | 1,380 (2.1) | < .001
Polyhydramnios | 74 (0.1) | 43 (0.3) | 117 (0.2) | < .001
Maternal weight on admission, kg | 81.43±16.29 | 84.00±17.79 | 81.99±16.66 | < .001
Gestational age on admission, weeks | 39.31±1.11 | 39.50±1.17 | 39.35±1.13 | < .001
Maternal ethnicity | | | | < .001
    White | 16,807 (32.2) | 4,348 (30.1) | 21,155 (31.8) |
    Black | 18,055 (34.6) | 5,073 (35.1) | 23,128 (34.7) |
    Hispanic | 11,707 (22.4) | 3,155 (21.9) | 14,862 (22.3) |
    Asian/Pacific Islander | 2,054 (3.9) | 691 (4.8) | 2,745 (4.1) |
    Multi-racial | 153 (0.3) | 40 (0.3) | 193 (0.3) |
    Other | 1,567 (3.0) | 505 (3.5) | 2,072 (3.1) |
    Unknown | 1,804 (3.5) | 627 (4.3) | 2,431 (3.7) |
Maternal height, m | 1.63±0.07 | 1.62±0.07 | 1.63±0.07 | < .001
Alcohol use | 1,134 (2.2) | 291 (2.0) | 1,425 (2.1) | .242
Weight change during pregnancy, kg | 14.47±5.82 | 15.58±6.20 | 14.71±5.92 | < .001
ECV in this pregnancy | 92 (0.2) | 16 (0.1) | 108 (0.2) | .083
Pre-pregnancy weight, kg | 66.95±15.68 | 68.39±17.27 | 67.26±16.05 | < .001
Fetal sex | | | | < .001
    Female | 26,164 (50.2) | 6,568 (45.5) | 32,732 (49.2) |
    Male | 25,932 (49.7) | 7,836 (54.3) | 33,768 (50.7) |
    Ambiguous | 1 (0.0) | 1 (0.0) | 2 (0.0) |
    Unknown | 50 (0.1) | 34 (0.2) | 84 (0.1) |
Previous CDs | | | | < .001
    0 | 50,683 (97.2) | 13,509 (93.6) | 64,192 (96.4) |
    1 | 1,420 (2.7) | 833 (5.8) | 2,253 (3.4) |
    2 | 39 (0.1) | 87 (0.6) | 126 (0.2) |
Induction of labor | 23,586 (45.2) | 8,346 (57.8) | 31,932 (48.0) | < .001
Meconium stained amniotic fluid | | | | < .001
    No | 47,375 (90.8) | 12,422 (86.0) | 59,797 (89.8) |
    Yes (unspecified) | 4,639 (8.9) | 1,954 (13.5) | 6,593 (9.9) |
    Thin | 81 (0.2) | 34 (0.2) | 115 (0.2) |
    Moderate | 1 (0.0) | 2 (0.0) | 3 (0.0) |
    Thick | 51 (0.1) | 27 (0.2) | 78 (0.1) |
Method of labor induction | | | |
    AROM | 1,292 (2.5) | 268 (1.9) | 1,560 (2.3) | < .001
    Prostaglandin E1 | 1,067 (2.0) | 719 (5.0) | 1,786 (2.7) | < .001
    Mechanical methods | 43 (0.1) | 41 (0.3) | 84 (0.1) | < .001
    Prostaglandin E2 | 412 (0.8) | 148 (1.0) | 560 (0.8) | .006
    Oxytocin | 12,427 (23.8) | 3,952 (27.4) | 16,379 (24.6) | < .001
Method of ROM | | | | < .001
    AROM | 30,380 (58.3) | 8,275 (57.3) | 38,655 (58.1) |
    SROM | 20,012 (38.4) | 5,713 (39.6) | 25,725 (38.6) |
    PROM | 14 (0.0) | 8 (0.1) | 22 (0.0) |
    Others | 356 (0.7) | 46 (0.3) | 402 (0.6) |
    Unknown | 1,385 (2.7) | 397 (2.7) | 1,782 (2.7) |

Abbreviations: AROM, artificial rupture of membranes; BMI, body mass index; CD, cesarean delivery; ECV, external cephalic version; GBS, group B streptococci; PROM, prelabor rupture of membranes; ROM, rupture of membranes; SROM, spontaneous rupture of membranes.

a Continuous variables are presented as means ± standard deviation; categorical variables are presented as number and percentages.

Primary and secondary outcome variables

Unfavorable labor outcomes, based on the study definition, were reported in 14,439 (21.68%) of total delivery episodes. Of all delivery episodes, 10,466 (15.7%) were intrapartum CDs; 2,395 (3.6%) were diagnosed with IAI; 1,261 (2.0%) had postpartum hemorrhage; and 3,743 (5.6%) of delivered neonates were admitted to the NICU. The incidences of neonatal sepsis and neonatal death were 880 (1.3%) and 49 (0.1%), respectively (S1 Table).

Predicting labor outcomes

On admission, the machine-learning–based prediction model predicted unfavorable labor outcome with a sensitivity of 0.69 (95% confidence interval [CI], 0.68–0.70) and a specificity of 0.68 (95% CI, 0.67–0.69); the area under the curve (AUC) was 0.75 (95% CI, 0.75–0.75) (Table 2). The highest contributing independent variable to this model was parity. Other contributing variables included prior CD, maternal age, maternal pre-pregnancy body mass index, height, gestational age at admission, absence of uterine contractions at admission, and maternal weight gain during pregnancy (Fig 1). The diagnostic performance of intrapartum prediction models trended up with advancement of cervical dilation; model sensitivity increased gradually from 0.70 (95% CI, 0.69–0.70) at 4 cm to 0.79 (95% CI, 0.78–0.80) at 10 cm. Similarly, model specificity rose from 0.72 (95% CI, 0.71–0.73) at 4 cm to 0.84 (95% CI, 0.83–0.85) at 10 cm (Table 2). As shown in Fig 2, the AUC of intrapartum prediction models at 4, 6, 8, and 10 cm reflected a similar trend (0.78 [95% CI, 0.77–0.78] at 4 cm; 0.89 [95% CI, 0.89–0.90] at 10 cm). The most influential variable in all intrapartum models was the prior risk score from the previous model. Other contributing factors to these models included cervical dilation at last examination, number of cervical examinations, current head station, cervical dilation change, current cervical dilation, and dilation delta. The spectrum of contributing factors and the magnitude of their contribution to baseline and intrapartum prediction models are shown in Fig 1.

Table 2. Diagnostic performance of machine-learning–based prediction models of unfavorable labor outcomes and intrapartum cesarean delivery at first stage of labor a.

Outcome | Cervical Dilation (in cm) | Error | AUC | Sensitivity | Specificity | PPV
Composite outcome (unfavorable labor outcomes) | Baseline | 0.31 (0.31–0.32) | 0.75 (0.75–0.75) | 0.69 (0.68–0.70) | 0.68 (0.67–0.69) | 0.42 (0.42–0.42)
 | 4 | 0.29 (0.29–0.30) | 0.78 (0.77–0.78) | 0.70 (0.69–0.70) | 0.72 (0.71–0.73) | 0.50 (0.49–0.51)
 | 5 | 0.28 (0.28–0.28) | 0.80 (0.80–0.80) | 0.70 (0.70–0.71) | 0.74 (0.73–0.75) | 0.52 (0.52–0.53)
 | 6 | 0.27 (0.26–0.27) | 0.81 (0.81–0.81) | 0.72 (0.70–0.73) | 0.75 (0.74–0.77) | 0.55 (0.54–0.55)
 | 7 | 0.25 (0.25–0.26) | 0.83 (0.82–0.83) | 0.73 (0.72–0.74) | 0.76 (0.75–0.77) | 0.56 (0.55–0.57)
 | 8 | 0.25 (0.24–0.25) | 0.84 (0.83–0.84) | 0.75 (0.74–0.76) | 0.75 (0.74–0.77) | 0.56 (0.54–0.57)
 | 9 | 0.24 (0.24–0.24) | 0.85 (0.84–0.85) | 0.76 (0.75–0.77) | 0.76 (0.76–0.77) | 0.57 (0.56–0.57)
 | 10 | 0.19 (0.18–0.19) | 0.89 (0.89–0.90) | 0.79 (0.78–0.80) | 0.84 (0.83–0.85) | 0.67 (0.66–0.68)
Intrapartum cesarean delivery | Baseline | 0.29 (0.29–0.30) | 0.78 (0.77–0.78) | 0.71 (0.70–0.72) | 0.70 (0.69–0.71) | 0.37 (0.36–0.37)
 | 4 | 0.27 (0.26–0.27) | 0.81 (0.81–0.82) | 0.72 (0.71–0.74) | 0.74 (0.73–0.75) | 0.46 (0.45–0.47)
 | 5 | 0.24 (0.24–0.24) | 0.84 (0.84–0.84) | 0.75 (0.75–0.76) | 0.76 (0.76–0.77) | 0.49 (0.48–0.49)
 | 6 | 0.23 (0.22–0.23) | 0.86 (0.85–0.86) | 0.76 (0.74–0.77) | 0.79 (0.78–0.79) | 0.52 (0.51–0.53)
 | 7 | 0.21 (0.21–0.22) | 0.87 (0.87–0.88) | 0.78 (0.78–0.79) | 0.79 (0.78–0.80) | 0.53 (0.52–0.54)
 | 8 | 0.20 (0.20–0.20) | 0.88 (0.88–0.89) | 0.78 (0.77–0.79) | 0.82 (0.81–0.83) | 0.56 (0.55–0.57)
 | 9 | 0.19 (0.18–0.19) | 0.90 (0.90–0.90) | 0.80 (0.79–0.80) | 0.83 (0.82–0.83) | 0.58 (0.57–0.59)
 | 10 | 0.12 (0.11–0.12) | 0.95 (0.95–0.95) | 0.87 (0.86–0.88) | 0.90 (0.89–0.91) | 0.72 (0.71–0.74)

Abbreviations: AUC, area under curve; PPV, positive predictive value.

a Values in parentheses represent 95% confidence intervals.
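For readers less familiar with the metrics reported in Table 2, the sketch below shows how sensitivity, specificity, and positive predictive value follow from binary labels and binary predictions. The function name is hypothetical, the probability threshold used to turn an LRS into a binary prediction is not specified in the text, and the "Error" column is left out because its exact definition is not given.

```python
from typing import Dict, Sequence


def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, and PPV from 0/1 labels and 0/1 predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # unfavorable outcomes correctly flagged
        "specificity": ratio(tn, tn + fp),  # favorable outcomes correctly cleared
        "ppv": ratio(tp, tp + fp),          # flagged cases that were unfavorable
    }
```

Unlike sensitivity and specificity, PPV depends on how common the outcome is, so PPV values are not directly comparable between outcomes with different incidence.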

Fig 1. Baseline and intrapartum predictors of composite unfavorable labor outcome and magnitude of contribution to prediction models.


A, Prediction model on admission. B, Prediction model at 4 cm cervical dilation. C, Prediction model at 6 cm cervical dilation. D, Prediction model at 8 cm cervical dilation. E, Prediction model at 10 cm cervical dilation.

Fig 2. Diagnostic performance of baseline and intrapartum prediction models for composite unfavorable labor outcome.


A, Area under curve (AUC) on admission. B, AUC at 4 cm cervical dilation. C, AUC at 6 cm cervical dilation. D, AUC at 8 cm cervical dilation. E, AUC at 10 cm cervical dilation. NICU indicates neonatal intensive care unit.

LRS was plotted against cervical dilation to demonstrate the LRS trend among women who had favorable versus unfavorable composite outcome (S3 Fig). Women with unfavorable composite outcome had a baseline LRS score above 35%. Their scores at 4 to 6 cm were between 45% and 50% and consistently trended up beyond 60% over increasing cervical dilation. Baseline LRS scores were below 25% among women with favorable composite outcome. The scores trended down from 23% at 4 cm, to 20% at 7 cm, to 15% at 10 cm. Similarly, risk of failed vaginal delivery trended up from 34% on admission to 72% at 10 cm in women delivering by intrapartum CD. In women who had successful vaginal delivery, the risk of failed vaginal delivery was below 20% and trended below 10% at 10 cm (S3 Fig).

Discussion

The Consortium on Safe Labor database is a multicenter observational database that comprises 228,438 deliveries. Utilizing this database, this study applied machine-learning algorithms to generate a series of prediction models that incorporate both static and dynamic predictors, including patient baseline characteristics, most recent clinical assessment, and cumulative labor progress from admission. These models may provide an alternative to current practice, which endorses the use of labor charts. In contrast to labor charts, which set constant margins for a safe labor course, machine-learning models promote individualization of clinical decisions using baseline and labor characteristics of each patient.

For several decades, Friedman’s sloping curve was cited as a reference of normal labor progress [18]. The terms “latent labor” and “active labor” were introduced in the literature to discriminate initial slow interval (<3–3.5 cm) from subsequent accelerated labor course. In 1972, Philpott and Castle [19,20] proposed the use of “alert lines” and “action lines” to facilitate management of labor through a prospective study of 624 Rhodesian African primigravida women and provided simplified directions to midwives in isolated areas. The WHO partogram was derived from these studies and has served as an important tool in managing labor, especially in low-resource countries.

Although the WHO partogram has been adopted globally to standardize labor care and prevent prolonged labor, its routine use has been questioned; a Cochrane review of 3 clinical trials (1,813 patients) comparing partogram to no partogram use did not reveal differences in CD rates, duration of first stage of labor, or Apgar score less than 7 at 5 minutes [8]. Despite increasing use of the partogram, the rate of CD has substantially increased in the past 3 decades, reaching 32% of total deliveries in the United States in 2017 [21]. This rising trend has not been associated with a concomitant decline in maternal or neonatal mortality [22]. Furthermore, the current rate of NICU admission among neonates delivered at term is notable, accounting for 4.6% of neonates delivered electively at 39 weeks of gestation [23]. The trend is increasing, and term neonates weighing at least 2,500 grams at birth may represent more than 50% of total NICU admissions [24].

Zhang et al [7] hypothesized that current recommendations of management of labor were based on Friedman’s study in the 1950s and may not reflect current populations. The new partogram differs from the WHO partogram; the 95th percentile line, which corresponds to the WHO action line, is an exponential-like stair line, which outlines a contemporary course of cervical dilation. Unlike the WHO partogram, which aims to prevent prolonged labor, Zhang et al proposed their partogram as a clinical tool to prevent premature CD without taking into account important maternal and neonatal outcomes. Thus, a secondary analysis of a prospective cohort study of 7,845 women with term low-risk pregnancy from 2010 through 2014 was conducted to assess maternal and neonatal outcomes after implementation of the Zhang labor curves. Rosenbloom et al [25] reported that the primary CD rate did not decline between 2010 and 2014 (15.8% vs 17.7%, P = 0.5). In addition, maternal and neonatal morbidity significantly increased in the same time frame. A multicenter, cluster-randomized, controlled trial (LaPS trial) was conducted in 14 clusters in Norway; 7 obstetric units were randomly assigned to intervention (managed by Zhang’s guidelines, n = 3,972) versus 7 units that were assigned to control (managed by the WHO partogram, n = 3,305). Again, the rate of intrapartum CD was not significantly different between the 2 groups [26].

In this study, we hypothesized that the challenges associated with the creation of labor charts are attributable to more than the index population. Labor is a complex physiologic process, and outcomes are likely to be influenced by several factors. These factors can either be identifiable (determined at baseline) or unknown, yet indirectly reflected in the labor course. Machine-learning algorithms have been increasingly used in data mining of large databases when the domain is poorly understood or when dynamic models are needed. Compared to conventional statistical methods, machine learning minimizes statistical assumptions and works by identifying hidden patterns within data and incorporating evolving risks during labor progression into outcome predictions. Therefore, the predictive power of these models is generally strong [27,28]. In this study, we used a large national database to create a series of dynamic prediction models, as an alternative to conventional labor charts, to predict labor outcomes. These models promote individualized assessment of labor progress based on patient characteristics and current labor patterns. They do not incorporate fixed definitions of latent labor, active labor, or rate of cervical dilation. Instead, an LRS graph can be used to determine the cumulative likelihood of safe labor, taking into account the likelihood of CD and any adverse maternal and neonatal outcomes. A patient’s baseline LRS, LRS trend over time, and LRS graph in relation to a reference LRS graph can improve the intrapartum decision-making process.

To our knowledge, this is the first study that implements machine-learning algorithms in labor management. The study is highly generalizable because it used a large national database with an ethnically diverse cohort of women that is not restricted by parity, previous CD, or certain maternal or neonatal outcomes. The study is limited by the retrospective nature of the collected data. Furthermore, the decision to perform CD obscures the outcomes of further expectant management. However, these limitations are inherent in all labor and delivery studies due to ethical concerns about maternal and fetal exposure to unjustifiable risks. Fetal heart rate monitoring was not included in this study due to lack of documentation. Therefore, fetal heart tracing should be interpreted independently, and response to abnormal findings should be made per protocol. Another potential limitation of our study is the definition of the composite outcome, which comprises heterogeneous adverse labor outcomes. However, the clinical rationale for constructing this composite outcome is that the occurrence of any of the events in the composite outcome would trigger ending labor and expediting delivery. In addition, a risk summarized in a single parameter, like the one defined through our composite outcome, can be easily interpreted by the obstetrician and the patient for counseling and decision-making purposes. Finally, the results of this study cannot be converted to a printed labor chart due to the complexity of machine-learning algorithms. However, a digital application is currently under development to facilitate clinical use of this developing tool.

Conclusions

In conclusion, utilization of machine-learning–based algorithms may provide a dynamic, cumulative, and individualized model for prediction of outcomes of vaginal delivery and facilitation of intrapartum decision making. LRS charts may be used as an efficient alternative to conventional labor charts. However, further prospective studies are warranted to assess outcomes of implementation of these models in labor units.

Supporting information

S1 Fig. Workflow for training and validation of the incremental machine learning model.

Each model (except baseline model) uses labor risk score (LRS) predictions from the previous model. Data were randomly divided into 10 equal and independent parts: The model was trained on 9 folds and validated on the last fold. The procedure was repeated until each fold was used once for validation. At each step, optimal tuning parameters of the model were selected, and performance was evaluated on the validation fold. Overall iterative process was repeated 10 times and performance results were averaged.

(GIF)
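The splitting scheme described in the S1 Fig caption can be sketched as follows. This is a minimal illustration only, not the authors' code: the function name `repeated_kfold`, its parameters, and the use of Python are assumptions, and model fitting and parameter tuning are deliberately left out.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, val_idx) for repeated k-fold cross-validation.

    Hypothetical helper: in each repeat the row indices are shuffled, split into
    k near-equal folds, and every fold is held out exactly once for validation.
    """
    rng = random.Random(seed)
    for rep in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Slicing with a stride of k gives k disjoint folds whose sizes differ by at most 1.
        folds = [idx[i::k] for i in range(k)]
        for held_out in range(k):
            val_idx = folds[held_out]
            train_idx = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
            yield rep, held_out, train_idx, val_idx
```

With the default arguments the generator produces 100 train/validation splits (10 folds in each of 10 repeats); a per-split performance metric would then be averaged over all of them.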

S2 Fig. Flow chart of study cohort.

(PNG)

S3 Fig. Trend of Labor Risk Score Over Labor Progress Among Women with Favourable and Unfavourable Labor Outcome.

A, Women with unfavourable (red line) versus favourable (green line) composite labor outcome. B, Women who had cesarean delivery (red line) versus vaginal delivery (green line).

(PNG)

S1 Table. Unfavourable labor outcomes among study population.

(DOCX)

Acknowledgments

The study was conducted using the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) Consortium on Safe Labor database.

Data Availability

Our data set belongs to the NICHD. We have attached a copy of the agreement we had with NICHD. To access these data sets, please reach out directly to the NICHD: NICHD DASH Administrator, https://dash.nichd.nih.gov. The data sets are titled “Consortium on Safe Labor”. I can confirm that the authors did not have special privileges not available to others who apply for access to the data.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. Hamilton BE, Martin JA, Osterman MJ, Driscoll AK, Rossen LM. Births: Provisional data for 2016. Vital Statistics Rapid Release. 2017;2.
  • 2. Caughey AB, Cahill AG, Guise J-M, Rouse DJ. Safe prevention of the primary cesarean delivery. American Journal of Obstetrics & Gynecology. 2014;210(3):179–93. doi: 10.1016/j.ajog.2014.01.026
  • 3. Neal JL, Ryan SL, Lowe NK, Schorn MN, Buxton M, Holley SL, et al. Labor dystocia: Uses of related nomenclature. Journal of Midwifery & Women’s Health. 2015;60(5):485–98. doi: 10.1111/jmwh.12355
  • 4. Friedman E. Graphic appraisal of labor, a study of 500 primigravidae. Bull Sloane Hosp. 1955;1:42–51.
  • 5. Lavender T, Hart A, Smyth R. Effect of partogram use on outcomes for women in spontaneous labor at term. The Cochrane Library. 2013. doi: 10.1002/14651858.CD005461.pub4
  • 6. Zhang J, Troendle JF, Yancey MK. Reassessing the labor curve in nulliparous women. American Journal of Obstetrics & Gynecology. 2002;187(4):824–8. doi: 10.1067/mob.2002.127142
  • 7. Zhang J, Landy HJ, Branch DW, Burkman R, Haberman S, Gregory KD, et al. Contemporary patterns of spontaneous labor with normal neonatal outcomes. Obstetrics and Gynecology. 2010;116(6):1281. doi: 10.1097/AOG.0b013e3181fdef6e
  • 8. Lavender T, Cuthbert A, Smyth RM. Effect of partograph use on outcomes for women in spontaneous labor at term and their babies. Cochrane Database of Systematic Reviews. 2018;(8). doi: 10.1002/14651858.CD005461.pub5
  • 9. Zhang J, Troendle J, Grantz KL, Reddy UM. Statistical aspects of modeling the labor curve. American Journal of Obstetrics & Gynecology. 2015;212(6):750.e1–e4. doi: 10.1016/j.ajog.2015.04.014
  • 10. Collins FS, Varmus H. A new initiative on precision medicine. New England Journal of Medicine. 2015;372(9):793–5. doi: 10.1056/NEJMp1500523
  • 11. Ngufor C, Van Houten H, Caffo BS, Shah ND, McCoy RG. Mixed effect machine learning: A framework for predicting longitudinal change in hemoglobin A1c. Journal of Biomedical Informatics. 2019;89:56–67. doi: 10.1016/j.jbi.2018.09.001
  • 12. Dallas KB, Rogo-Gupta L, Elliott C. Machine Learning Algorithms Successfully Identify the Quality of Lay-Person Directed Articles Online [37T]. Obstetrics & Gynecology. 2019;133:222S–3S.
  • 13. Emin EI, Emin E, Papalois A, Willmott F, Clarke S, Sideris M. Artificial Intelligence in Obstetrics and Gynaecology: Is This the Way Forward? In Vivo. 2019;33(5):1547–51. doi: 10.21873/invivo.11635
  • 14. Natekin A, Knoll A. Gradient boosting machines, a tutorial. Frontiers in Neurorobotics. 2013;7:21. doi: 10.3389/fnbot.2013.00021
  • 15. Chen T, He T, Benesty M, Khotilovich V, Tang Y. Xgboost: extreme gradient boosting. R package version 0.4-2. 2015:1–4.
  • 16. Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2005;67(2):301–20.
  • 17. Stekhoven DJ, Bühlmann P. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics. 2011;28(1):112–8. doi: 10.1093/bioinformatics/btr597
  • 18. Friedman EA. The graphic analysis of labor. American Journal of Obstetrics & Gynecology. 1954;68(6):1568–75. doi: 10.1016/0002-9378(54)90311-7
  • 19. Philpott R, Castle W. Cervicographs in the management of labor in primigravidae: I. The alert line for detecting abnormal labor. BJOG: An International Journal of Obstetrics & Gynaecology. 1972;79(7):592–8.
  • 20. Philpott R, Castle W. Cervicograph on the management of labor in primigravida II. The action line and treatment of abnormal labor. J Obstet Gynaecol Br Commonw. 1972;79:599–606. doi: 10.1111/j.1471-0528.1972.tb14208.x
  • 21. Hamilton BE, Osterman MJ, Driscoll AK, Rossen LM. Births: provisional data for 2017. 2018.
  • 22. Gregory KD, Jackson S, Korst L, Fridman M. Cesarean versus vaginal delivery: whose risks? Whose benefits? American Journal of Perinatology. 2012;29(1):7–18. doi: 10.1055/s-0031-1285829
  • 23. Clark SL, Miller DD, Belfort MA, Dildy GA, Frye DK, Meyers JA. Neonatal and maternal outcomes associated with elective term delivery. American Journal of Obstetrics & Gynecology. 2009;200(2):156.e1–e4. doi: 10.1016/j.ajog.2008.08.068
  • 24. Harrison W, Goodman D. Epidemiologic trends in neonatal intensive care, 2007–2012. JAMA Pediatrics. 2015;169(9):855–62. doi: 10.1001/jamapediatrics.2015.1305
  • 25. Rosenbloom JI, Stout MJ, Tuuli MG, Woolfolk CL, López JD, Macones GA, et al. New labor management guidelines and changes in cesarean delivery patterns. American Journal of Obstetrics & Gynecology. 2017;217(6):689.e1–e8. doi: 10.1016/j.ajog.2017.10.007
  • 26. Bernitz S, Dalbye R, Zhang J, Eggebø TM, Frøslie KF, Olsen IC, et al. The frequency of intrapartum caesarean section use with the WHO partograph versus Zhang’s guideline in the Labor Progression Study (LaPS): a multicentre, cluster-randomised controlled trial. The Lancet. 2019;393(10169):340–8. doi: 10.1016/S0140-6736(18)31991-3
  • 27. Mitchell TM. Machine learning. McGraw Hill; 1997.
  • 28. Gimovsky AC, Levine JT, Pham A, Dunn J, Zhou D, Peaceman AM. Pushing the bounds of second stage in term nulliparas with a predictive model. American Journal of Obstetrics & Gynecology MFM. 2019;1(3):100028. doi: 10.1016/j.ajogmf.2019.07.001

Decision Letter 0

Jonas Bianchi

27 Apr 2022

PONE-D-22-07903

Impact of Labour Characteristics on Maternal and Neonatal Outcomes of Labour: A Machine-Learning Model

PLOS ONE

Dear Dr. Famuyide,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 11 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Jonas Bianchi, DDD, MS, Ph.D

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Partly

Reviewer #3: Yes

Reviewer #4: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: No

Reviewer #3: Yes

Reviewer #4: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: The authors used machine learning models to predict adverse pregnancy outcomes during labor. Although their study is well presented and the statistical methods are rigorous and apparently adequate, I believe that the incorporation of several adverse outcomes that mix up maternal and fetal/neonatal events does not help physicians to interpret the provided results. Furthermore, the variables that were used as predictors are unclear, as the authors do not present them in their study. I believe that the authors should evaluate separately the adverse pregnancy outcomes that are related to maternal and fetal/neonatal adverse events and incorporate variables that are pathophysiologically correlated to these outcomes. Given the sample size of their population, this seems to be sufficiently powered and may provide truly intelligent and useful information for physicians.

Reviewer #3: Abstract:

Abstract is succinct, but I recommend adding that this is a gradient boosting machine learning model to be more specific.

Conclusion is very general, I suggest making it more relevant to the presented data.

Introduction:

Line 48- I would recommend rewording the sentence about Friedman as it was not a 'trial,' which implies a randomized controlled trial.

Materials and Methods:

Why did the authors define their composite this way? It seems very heterogeneous, as it includes labor outcomes, maternal morbidities, and neonatal morbidities. If the authors could imagine counseling a patient on these risks, would this type of composite information make it easier or more difficult to counsel someone? Could the authors predict these outcomes individually? Or group them by mother and baby? This may be more clinically relevant.

How was the composite calculated? Is this a linear calculation? Were variables weighted?

Why is IAI an outcome of interest? I would include this as a baseline characteristic because it develops during labor.

Why choose meconium as a baseline variable? This has not been used in the US clinically to make decisions for a few decades. Did it make it into the model? Is this to appeal to an international audience?

The authors should still have submitted this work to their IRB for exempt determination status.

Why include multiparous women? Most adverse labor outcomes happen in nulliparas and this might make your model more specific.

Did you include patients with and without epidural anesthesia?

Results:

Interesting analysis of accuracy with each advancement of cervical dilation. How did the authors account for time? How did they account for women with a repeat cervical exam of the same dilation as the one prior?

How many exams were recorded per patient, on average?

Need more discussion of the LRS score in the Methods section. What does this mean exactly? Does a labor risk score = risk of adverse labor outcome? This is a bit unclear.

Discussion:

How does one use this tool? Is it online? On paper? I see that the authors discuss a digital application development. Given that the authors discuss the WHO partogram in detail, will the authors also develop something that would be universally accessible? I.e., if OBGYN providers don't have smartphones or Wi-Fi to access an app, how can this model be useful?

Line 287- please cite: Gimovsky AC, Levine JT, Pham A, Dunn J, Zhou D, Peaceman AM. Pushing the bounds of second stage in term nulliparas with a predictive model. Am J Obstet Gynecol MFM. 2019 Aug;1(3):100028. doi: 10.1016/j.ajogmf.2019.07.001. Epub 2019 Jul 20. PMID: 33345792.

I recommend adding more information about gradient boosting, i.e., why choose this type of model, how does it compare to other machine learning models, etc.

Tables:

Are the headings correct in Table 1? I.e., did more patients have unfavorable (52,147) outcomes than favorable (14,439)? I think these columns might be mislabeled. As it reads now, you have a better outcome if you are older, less parous (I would change to median/standard error, as parity is an integer), or have diabetes, hypertension, preeclampsia, oligohydramnios, etc.

Minor: As PLOS ONE is an American journal, I suggest the use of American English spellings, i.e., "labor", not "labour"; "fetal", not "foetal"; etc.

The authors should review the manuscript for several typos and grammatical errors.

Reviewer #4: In this study, the authors evaluated the performance of an ML model in predicting labor outcome from data retrospectively analyzed from a large database. The subject is of interest, and I would like to congratulate the authors on their effort.

My comments are

1) The definition of unfavorable outcome was really heterogeneous. In other words, the authors included variables with different pathogenesis, such as emergency CS and postpartum hemorrhage, for which the constructed model may have a different performance. Since the database is relatively large, I strongly suggest constructing an individual model for each outcome variable.

2) The evaluation of the model according to cervical dilation is of interest, but clinically speaking it is only one of the variables that influence labor outcome. Do the authors have concomitant data on fetal head station, occiput position, and duration of labor? I guess that some of these data are present in the database and should be used.

3) Parity is a crucial point in predicting labor outcome, so different models should be used for nulliparous and parous women.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Reviewer #4: Yes: Giuseppe Rizzo

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Aug 22;17(8):e0273178. doi: 10.1371/journal.pone.0273178.r002

Author response to Decision Letter 0


29 Jun 2022

Review Comments to the Author

Reviewer #2: The authors used machine learning models to predict adverse pregnancy outcomes during labor. Although their study is well presented and the statistical methods are rigorous and apparently adequate, I believe that the incorporation of several adverse outcomes that mix up maternal and fetal/neonatal events does not help physicians to interpret the provided results. Furthermore, the variables that were used as predictors are unclear, as the authors do not present them in their study. I believe that the authors should evaluate separately the adverse pregnancy outcomes that are related to maternal and fetal/neonatal adverse events and incorporate variables that are pathophysiologically correlated to these outcomes. Given the sample size of their population, this seems to be sufficiently powered and may provide truly intelligent and useful information for physicians.

Response: Thank you for your valuable comments:

● We appreciate the reviewer’s comment about having multiple heterogeneous adverse labor outcomes included in our composite outcome. However, the clinical rationale for constructing this composite outcome is that the occurrence of any of its components would recommend ending labor. Summarizing the risk in a single parameter, like the one defined through our composite outcome, yields a value that can be easily interpreted by the obstetrician and the patient for counseling and decision-making purposes. Please note that a follow-up study is currently in progress to use this model to predict the probability of individual outcomes. Nevertheless, we acknowledge this as a potential limitation in the Discussion section of the manuscript. It is important, though, to emphasize that this information is descriptive of the model and should not be used to indicate a direct correlation between a variable and an outcome, which is not the intent of machine learning models.

● Regarding the variables used, all variables were plotted in Figure 1 according to the magnitude of their impact on the model outcome.

● Since these models were created using machine learning algorithms, selecting variables based on our knowledge of pathophysiological correlations would affect the performance of the algorithms. Technically, we do not intervene in variable selection, which allows machine learning to recognize hidden interactions between variables, some of which may not be known to affect the outcome. This is one of the key features of ML versus conventional statistics in establishing prediction models; it does not apply if the aim of the study is to investigate an association between one or more variables and an outcome.

Reviewer #3: Abstract:

Abstract is succinct, but I recommend adding that this is a gradient boosting machine learning model to be more specific.

Response: Thank you. We added model specification to the abstract (abstract: under materials and methods)

Conclusion is very general, I suggest making it more relevant to the presented data.

Response: We changed the conclusion to be more specific to the results of the study (abstract: under conclusion).

Introduction:

Line 48- I would recommend rewording the sentence about Friedman as it was not a 'trial,' which implies a randomized controlled trial.

Response: The word “trials” was corrected to “studies” (highlighted in yellow)

Materials and Methods:

Why did the authors define their composite this way? It seems very heterogeneous, as it includes labor outcomes, maternal morbidities, and neonatal morbidities. If the authors could imagine counseling a patient on these risks, would this type of composite information make it easier or more difficult to counsel someone? Could the authors predict these outcomes individually? Or group them by mother and baby? This may be more clinically relevant.

Response: We used a composite outcome of adverse labor events, occurrence of any of which would recommend against continuation of labor, to provide a single parameter that can be easily interpreted by the obstetrician and the patient for counseling and decision-making purposes. Each component of our composite outcome by itself can potentially inform against a decision to proceed with normal delivery. Thus, obstetricians and patients would be informed and be able to relate to the fact that there was a strong likelihood of a major adverse event if labor continues beyond a particular point. In summary, setting a single parameter was meant to be the trigger for potential intervention/counseling. However, an application is currently in progress to use this model to predict probability of individual outcomes.

How was the composite calculated? Is this a linear calculation? Were variables weighted?

Response: We did not weight the components in the definition of our composite outcome. The composite outcome was deemed to occur if any of the unfavorable labor outcomes described in the “Study Outcomes” of the manuscript occurred. As explained above, the rationale for forming the composite in this way was that the occurrence of any of those unfavorable outcomes would recommend against continuation of labor, and therefore we did not think providing differential weights for the components was relevant.
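The unweighted "any component" rule described in this response can be illustrated with a short sketch. It is an assumption-laden example rather than the authors' code: the component list follows the primary outcome definition given in the abstract, while the field names and the dictionary layout are hypothetical.

```python
# Hypothetical field names; the components follow the composite defined in the abstract.
UNFAVORABLE_COMPONENTS = (
    "cesarean_in_active_labor",
    "postpartum_hemorrhage",
    "intra_amniotic_infection",
    "shoulder_dystocia",
    "neonatal_morbidity",
    "mortality",
)


def composite_unfavorable(delivery):
    """Return True if any component outcome occurred in this delivery record.

    No weights are applied: one component is enough to flag the delivery,
    and a missing field is treated as "did not occur" (an assumption).
    """
    return any(bool(delivery.get(name, False)) for name in UNFAVORABLE_COMPONENTS)
```

For example, a record with only `shoulder_dystocia` set to true is flagged as unfavorable, exactly as a record with several components would be.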

Why is IAI an outcome of interest? I would include this as a baseline characteristic because it develops during labor.

Response: We consider IAI an outcome of interest rather than a baseline variable since women were included early in labor where no IAI had developed yet. IAI is a considerable event that usually develops late in labor and could likely present a complication of prolonged labor course. So, it was important for us to consider it as an outcome to balance against prolonged expectancy for the sake of achieving vaginal delivery as a sole target.

Why choose meconium as a baseline variable? This has not been used in the US clinically to make decisions for a few decades. Did it make it into the model? Is this to appeal to an international audience?

Response: Since we used machine learning algorithms to develop these models, we did not select the variables to be included in the model. Machine learning algorithms ideally work by feeding them all available variables, even those not apparently clinically relevant, since they can recognize hidden interactions among variables that collectively affect the outcome. Including a variable in the algorithm does not mean that the variable will affect the outcome; the algorithm determines whether it adds to model predictability and to what extent. This eliminates a source of bias that besets conventional statistical analysis. It is important to highlight that machine learning models are used to predict an outcome based on a set of variables and how they interact, and should not be used to establish an association or causality between a single variable and an outcome, which is better investigated using conventional statistics.

As for “meconium” specifically, it does not merely guide a decision and is considered in the context of other factors to decide if it, at all, would change the probability of an outcome. This hypothesis appears to be true in our data as the impact of meconium in our model appears to be only moderate as is shown in Figure 1-E.

The authors should still have submitted this work to their IRB for exempt determination status.

Response: Thanks for this comment. We submitted our work to the Mayo Clinic IRB for review, and because our dataset is completely deidentified, the Mayo IRB determined that our study does not require a review. We have included a sentence at the beginning of the Methods section to this effect. Please also see the attached memo from the Mayo IRB.

Why include multiparous women? Most adverse labor outcomes happen in nulliparas and this might make your model more specific.

Response: We included multipara to ensure generalizability of results. Multipara still would benefit from a chart or a score, which would ensure their labor progress and management is within the safe limit. In fact, current labor charts are used for both primigravida and multipara although they were created using data from primigravida, and there are no labor charts that are designed for higher parity.

Nulliparas are at higher risk of complications; the model recognizes this and treats parity as one of the variables when calculating the probability of an outcome. Therefore, including multiparas in the model adds to its generalizability and should not impair its capacity to calculate the probability of adverse outcomes in nulliparas.

Did you include patients with and without epidural anesthesia?

Response: Yes. Type of anesthesia was not used as a selection criterion.

Results:

Interesting analysis of accuracy with each advancement of cervical dilation. How did the authors account for time? How did they account for women with a repeat cervical exam of the same dilation as the one prior?

Response: Time was documented in the database and was calculated using time of birth as a 00:00 point. Thus, each cervical dilation/examination was documented against time of birth and time between exams was simple to calculate.

All repeat exams were considered new exams, and thus new observations in our data. Even when the dilation remained unchanged from the prior exam(s), other patient factors may have changed since the last exam, so the repeated exams automatically became new observations (new rows) in our data.

Because the incremental XGBoost model takes into account knowledge learned by the previous models, it may help correct for any correlation that exists between repeated observations of the same subject.
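The time handling and the one-row-per-exam layout described in this response can be sketched as follows. This is a hedged illustration, not the authors' pipeline: the function name `exams_to_rows`, the field names, and the input layout are assumptions, and only the two ideas stated above are shown (time measured relative to birth, and every repeat exam kept as its own observation).

```python
from datetime import datetime


def exams_to_rows(patient_id, birth_time, exams):
    """Build one observation (row) per cervical exam for a single patient.

    exams: list of (exam_time, dilation_cm) tuples (hypothetical layout).
    Birth is the zero point, so exams before birth get negative hours.
    """
    rows = []
    prev_time = None
    for exam_time, dilation_cm in sorted(exams):
        hours_from_birth = (exam_time - birth_time).total_seconds() / 3600.0
        hours_since_prev = (
            None if prev_time is None
            else (exam_time - prev_time).total_seconds() / 3600.0
        )
        # A repeat exam with unchanged dilation is still appended as a new row.
        rows.append({
            "patient_id": patient_id,
            "dilation_cm": dilation_cm,
            "hours_from_birth": hours_from_birth,
            "hours_since_prev_exam": hours_since_prev,
        })
        prev_time = exam_time
    return rows
```

In this layout, a woman examined twice at 4 cm contributes two rows that differ in their time fields, which is how an unchanged dilation can still carry new information.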

How many exams were recorded per patient, on average?

Response: Our data indicate an average of 4 exams. A few women had as many as 14 exams!

Need more discussion of the LRS score in the Methods section. What does this mean exactly? Does a labor risk score = risk of adverse labor outcome? This is a bit unclear.

Response: That is correct. So, LRS is just a term for ease of use, which indicates the score or the probability of labor adverse outcome, as calculated by the model. We modified the methodology section to clarify this point (highlighted in yellow: under study outcomes).

Discussion:

How does one use this tool? Is it online? On paper? I see that the authors discuss a digital application development. Given that the authors discuss the WHO partogram in detail, will the authors also develop something that would be universally accessible? I.e., if OBGYN providers don't have smartphones or Wi-Fi to access an app, how can this model be useful?

Response: Machine learning models typically use complex calculations to predict the probability of an outcome, so it is not feasible to apply them through simple equations. Therefore, a paper form is not well suited to them. Instead, these models are used through an application that takes simple inputs provided by the user and calculates the score. Such applications can be made available on smartphones and online. An alternative could be an offline application that would work on any PC, even one not connected to the internet. We are working to develop these additional components in the next few months.

Line 287- please cite: Gimovsky AC, Levine JT, Pham A, Dunn J, Zhou D, Peaceman AM. Pushing the bounds of second stage in term nulliparas with a predictive model. Am J Obstet Gynecol MFM. 2019 Aug;1(3):100028. doi: 10.1016/j.ajogmf.2019.07.001. Epub 2019 Jul 20. PMID: 33345792.

Response: Reference added (highlighted in yellow [27])

I recommend adding more information about gradient boosting, i.e., why choose this type of model, how does it compare to other machine learning models, etc.

Response: Thank you. We have added a brief description of the extreme gradient boosting algorithm used to develop our LRS incremental model.

Tables:

Are the headings correct in Table 1? I.e., did more patients have unfavorable (52,147) outcomes than favorable (14,439)? I think these columns might be mislabeled. As it reads now, you have a better outcome if you are older, less parous (I would change to median/standard error, as parity is an integer), or have diabetes, hypertension, preeclampsia, oligohydramnios, etc.

Response: Thanks for catching this, the columns were mislabeled and have been corrected.

Minor: As PLOS ONE is an American journal, I suggest the use of American English spellings, i.e., "labor", not "labour"; "fetal", not "foetal"; etc.

The authors should review the manuscript for several typos and grammatical errors.

Response: We have reviewed and revised the manuscript to use American English style.

Reviewer #4: In this study, the authors evaluated the performance of an ML model in predicting labor outcome from data retrospectively analyzed from a large database. The subject is of interest, and I would like to congratulate the authors on their effort.

My comments are

1) The definition of unfavorable outcome was really heterogeneous. In other words, the authors included variables with different pathogenesis, such as emergency CS and postpartum hemorrhage, for which the constructed model may have a different performance. Since the database is relatively large, I strongly suggest constructing an individual model for each outcome variable.

Response: Please see the responses to Reviewer #2 above on this point. In essence, we used a composite outcome of adverse labor events, which would recommend against continuation of labor, to provide a single parameter that can be easily interpreted by the obstetrician and the patient for counseling and decision-making purposes. However, a study is currently in progress to use this model to predict the probability of individual outcomes. Thus, obstetricians and patients would be able to recognize what the major concern would be if labor continues beyond a particular point. In summary, the single parameter was meant to be the trigger for potential intervention or counseling.

2) The evaluation of the model according to cervical dilation is of interest, but clinically speaking it is only one of the variables that influence labor outcome. Do the authors have concomitant data on fetal head station, occiput position, and duration of labor? I guess that some of these data are present in the database and should be used.

Response: Yes. We included all examination-related data as variables in the models; fetal head station, cervical effacement, and membrane status are all used to predict outcomes. However, we did not use head position, since it was not routinely commented on in the database during the first stage. Since occiput position is not routinely checked in the first stage of labor, it would be challenging to document for probability calculation if included in our model.

3) Parity is a crucial point in predicting labor outcome, so different models should be used for nulliparous and parous women.

Response: Parity is a major factor in predicting clinical outcomes in relation to labor course. We included parity as one of the variables, which means that it is considered, in association with other interacting clinical factors, in model prediction, and it contributes significantly to prediction, as shown in figure 1. Including parity as a variable also covers a wider range of parity without the need to split the database, since labor course may differ between a P1 and higher parities, or between low parity and high parity (e.g., P4, 5 or more).

Attachment

Submitted filename: Response to Reviewers_aof_edits.docx

Decision Letter 1

Jonas Bianchi

4 Aug 2022

Impact of Labor Characteristics on Maternal and Neonatal Outcomes of Labor: A Machine-Learning Model

PONE-D-22-07903R1

Dear Dr. Famuyide,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Jonas Bianchi, DDD, MS, Ph.D

Academic Editor

PLOS ONE

Acceptance letter

Jonas Bianchi

12 Aug 2022

PONE-D-22-07903R1

Impact of Labor Characteristics on Maternal and Neonatal Outcomes of Labor: A Machine-Learning Model

Dear Dr. Famuyide:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Jonas Bianchi

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Workflow for training and validation of the incremental machine learning model.

    Each model (except the baseline model) uses labor risk score (LRS) predictions from the previous model. Data were randomly divided into 10 equal and independent parts (folds). The model was trained on 9 folds and validated on the remaining fold. The procedure was repeated until each fold had been used once for validation. At each step, optimal tuning parameters of the model were selected, and performance was evaluated on the validation fold. The overall iterative process was repeated 10 times, and performance results were averaged.

    (GIF)
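The S1 Fig workflow can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, the data are synthetic, the learner is a simple rate-lookup stand-in rather than extreme gradient boosting, hyperparameter tuning is omitted, and the AUC is pooled over the held-out folds of each repeat. It is also an assumption that the score passed to the next stage is the out-of-fold prediction.

```python
import random
from statistics import mean


def kfold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k near-equal, disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_rate_model(rows, labels):
    """Stand-in learner (NOT gradient boosting): outcome rate per feature tuple.

    Returns a predict(row) function; unseen rows get the overall training rate.
    """
    overall = mean(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(tuple(row), []).append(y)
    rates = {key: mean(ys) for key, ys in groups.items()}
    return lambda row: rates.get(tuple(row), overall)


def out_of_fold_scores(rows, labels, k, rng, fit=fit_rate_model):
    """Train on k-1 folds and score the held-out fold, so each row is scored once."""
    scores = [None] * len(rows)
    for fold in kfold_indices(len(rows), k, rng):
        held_out = set(fold)
        train = [i for i in range(len(rows)) if i not in held_out]
        predict = fit([rows[i] for i in train], [labels[i] for i in train])
        for i in fold:
            scores[i] = predict(rows[i])
    return scores


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def run_chain(stage_rows, labels, k, rng):
    """One pass over all stages; returns the validation AUC of each stage."""
    previous, aucs = None, []
    for rows in stage_rows:
        if previous is not None:
            # Feed the previous stage's score forward, rounded so the
            # stand-in learner can group on it.
            rows = [list(r) + [round(p, 1)] for r, p in zip(rows, previous)]
        previous = out_of_fold_scores(rows, labels, k, rng)
        aucs.append(auc(previous, labels))
    return aucs


def repeated_cv(stage_rows, labels, k=10, repeats=10, seed=0):
    """Repeat the whole chained k-fold procedure and average AUC per stage."""
    rng = random.Random(seed)
    runs = [run_chain(stage_rows, labels, k, rng) for _ in range(repeats)]
    return [mean(col) for col in zip(*runs)]


# Synthetic toy data: 200 deliveries, 2 stages, one noisy binary feature each.
toy = random.Random(1)
labels = [toy.randint(0, 1) for _ in range(200)]
stage_rows = [[[y if toy.random() < 0.7 else 1 - y] for y in labels] for _ in range(2)]
mean_auc = repeated_cv(stage_rows, labels)
print(mean_auc)
```

Here each entry of `stage_rows` stands for one point in labor (for example, one centimeter of dilation), and `mean_auc` holds one averaged AUC per stage. Passing a different `fit` function to `out_of_fold_scores` would be the place to plug in a real gradient boosting learner.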

    S2 Fig. Flow chart of study cohort.

    (PNG)

    S3 Fig. Trend of Labor Risk Score Over Labor Progress Among Women with Favourable and Unfavourable Labor Outcome.

    A, Women with unfavourable (red line) versus favourable (green line) composite labor outcome. B, Women who had cesarean delivery (red line) versus vaginal delivery (green line).

    (PNG)

    S1 Table. Unfavourable labor outcomes among study population.

    (DOCX)

    Data Availability Statement

    Our data set belongs to the NICHD. We have attached a copy of the agreement we had with NICHD. To access these data sets, please reach out directly to the NICHD: NICHD DASH Administrator, https://dash.nichd.nih.gov. The data sets are titled “Consortium on Safe Labor” from the NICHD. I can confirm that the authors did not have special privileges not available to others who apply for access to the data.


    Articles from PLoS ONE are provided here courtesy of PLOS
