A machine learning pipeline to classify foetal heart rate deceleration with optimal feature set

Sahana Das; Sk Md Obaidullah; Mufti Mahmud; M Shamim Kaiser; Kaushik Roy; Chanchal Kumar Saha; Kaushik Goswami

doi:10.1038/s41598-023-27707-z

2023 Feb 13;13:2495. doi: 10.1038/s41598-023-27707-z

PMCID: PMC9925757 PMID: 36781920

Abstract

Assessment of deceleration through visual inspection and interpretation of patterns in Cardiotocography (CTG) is a commonly practised means of evaluating Foetal Heart Rate (FHR). The precision of deceleration classification relies on the accurate estimation of the corresponding event points (EP) from the FHR and the Uterine Contraction Pressure (UCP). This work proposes a deceleration classification pipeline and compares four machine learning (ML) models: Multilayer Perceptron (MLP), Random Forest (RF), Naïve Bayes (NB), and Simple Logistics Regression. Towards automated classification of deceleration from EP, the pipeline systematically compares three approaches to creating feature sets from the detected EP: (1) a novel fuzzy logic (FL)-based approach, (2) expert annotation by clinicians, and (3) calculation using the National Institute of Child Health and Human Development (NICHD) guidelines. The classification results were validated using several statistical measures, including the receiver operating characteristic (ROC) curve, intra-class correlation coefficient (ICC), Deming regression, and Bland-Altman plot. The highest classification accuracy (97.94%) was obtained with MLP when the EP was annotated with the proposed FL approach; by comparison, RF obtained 63.92% with the clinician-annotated EP. The results indicate that the FL-annotated feature set is the optimal one for classifying deceleration from FHR.

Subject terms: Computational science, Information technology

Introduction

Monitoring of labour is essential because the fetus may suffer from oxygen deficiency, which may ultimately lead to lifelong debility or even death. A major source of information about foetal health is Cardiotocography (CTG), which concurrently records the Foetal Heart Rate (FHR) and the mother’s Uterine Contraction Pressure (UCP). Physicians visually evaluate the patterns of these two signals and draw on their prior experience to assess the status of foetal health and to take appropriate action. Because physicians differ greatly in how they interpret the signals, there are at times false alarms that lead to unnecessary C-sections. On the other hand, significant, ominous patterns are sometimes overlooked, resulting in foetal compromise. Fifty per cent of birth-related brain damage is avoidable with accurate interpretation of CTG¹. The malpractice claims filed every year incur a huge legal cost². This is evident from statistics reported between 2005 and 2014: in the US, Obstetrics and Gynaecology claims had the second-highest average indemnity payment and the fifth-highest paid-to-closed ratio of all medical specialities². Of the four parameters of FHR, deceleration is the most complex to interpret. It is also central to the correct interpretation of CTG, and hence of the foetal status³. Emphasis is placed on the association between the physiology of deceleration and the patterns of FHR and UCP changes in order to identify the foetal status. Decelerations are generally not visible in antenatal CTG; if they are present, foetal health should be investigated further. Mild deceleration usually requires no intervention, but during labour, abrupt and frequent dips of FHR from the baseline with varying depth and duration may be ominous.

Standard guidelines for CTG interpretation put forward by the National Institute of Child Health and Human Development (NICHD), the International Federation of Gynaecology and Obstetrics (FIGO), the Royal College of Obstetricians and Gynaecologists (RCOG) and others classify deceleration based on the shape or the time of descent of the FHR⁴⁻⁶. Decelerations are categorised as ‘early’, ‘late’ and ‘variable’. These categories are based mainly on the temporal relationship between the deceleration and the corresponding uterine contraction, and on the duration of each. ‘Early’ decelerations are considered benign, while ‘late’ and ‘variable’ decelerations are considered ‘pathological’ and ‘suspicious’ respectively; hence these two types require careful attention to ensure good foetal health.

Despite the existence of several guidelines, disagreement arises in the classification of deceleration. A survey revealed that British practitioners considered ‘early’ deceleration the most common type, while the NICE guidelines of 2007 reported that ‘early’ decelerations are the rarest and ‘variable’ decelerations are the most common⁷. When classifying a deceleration, it is important to relate the deceleration nadir (i.e., the lowest point in the deceleration) to the peak of the contraction. According to the literature, an ‘early’ deceleration occurs when the two points match. This is not a very common phenomenon. A deceleration is classified as ‘late’ if it starts after the peak of the uterine contraction. The nadir is thus reached almost at the end of the contraction.

True ‘early’ decelerations, whose nadir coincides exactly with the peak of the contraction, are rare. It would be wrong to classify as ‘late’ those decelerations that start recovering immediately after the peak of the contraction. In such cases, hard classification boundaries are not appropriate. Fuzzy classification is thus more appropriate for such borderline cases.

Physiology of FHR deceleration

FHR deceleration is a transient drop in the heart rate below the baseline value by 15 bpm or more, lasting 15 s or longer. There is a temporal relationship between decelerations and uterine contractions, which in turn are linked with a rise in the internal pressure of the uterus and a decrease in maternal uterine artery blood flow. Even in normal labour, placental gas exchange is reduced. This leads to a fall in pH and oxygen tension and to an elevation of CO2 and base deficit.
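
The definition above can be read as a simple threshold rule on the FHR trace. The following is a minimal sketch rather than the authors' algorithm. It assumes a fixed, already-estimated baseline, a uniformly sampled signal and a hypothetical sampling rate `fs` (4 Hz here); the function name and its parameters are illustrative only.

```python
def find_decelerations(fhr, baseline, fs=4.0, min_drop=15.0, min_dur=15.0):
    """Return (start_s, end_s, depth_bpm) for each qualifying run below baseline.

    fhr      : list of FHR samples in bpm (assumed uniformly sampled)
    baseline : baseline FHR in bpm (assumed constant over the trace)
    fs       : samples per second (hypothetical default)
    """
    episodes = []
    start = None
    # A sentinel equal to the baseline closes a run that reaches the end of the trace.
    for i, value in enumerate(list(fhr) + [baseline]):
        below = value < baseline
        if below and start is None:
            start = i
        elif not below and start is not None:
            duration = (i - start) / fs              # time spent below the baseline, in s
            depth = baseline - min(fhr[start:i])     # distance of the nadir from the baseline
            if duration >= min_dur and depth >= min_drop:
                episodes.append((start / fs, i / fs, depth))
            start = None
    return episodes


# 10 s at baseline, a 20 s dip of 20 bpm, then 10 s at baseline
trace = [140] * 40 + [120] * 80 + [140] * 40
print(find_decelerations(trace, baseline=140))  # -> [(10.0, 30.0, 20)]
```

A dip that is too short (under 15 s) or too shallow (under 15 bpm) is not reported. Real traces would also need baseline estimation and artefact handling, which this sketch leaves out.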

For most fetuses, the placental oxygen capacity is enough to overcome the repeated reduction in oxygen supply during labour. However, for fetuses that are already vulnerable, this repeated hypoxia may become life-threatening. It was also found that there are times when even a normal fetus is not able to withstand uterine hyperstimulation⁸.

Asphyxia is the deficiency of oxygen which, if prolonged, leads to hypoxemia and subsequent metabolic acidosis or accumulation of the waste product in the blood. Most hypoxic episodes during labour are brief and benign, lasting less than 1 min. These are reflected by brief deceleration. However, if hypoxia is severe and lasts more than three minutes, the initial vagal bradycardia is sustained by myocardial hypoxia. Thus the depth of deceleration is associated with a reduction in uteroplacental blood flow⁹. Studies have shown that deep deceleration is associated with an intense lack of oxygen to the brain with a chance of neuronal injury if the hypoxemia lasts more than ten minutes. Whether decelerations of shorter duration are benign or not depends upon three factors:

Criticality of foetal health before labour
Pre-labour placental reserve of oxygen
Duration and frequency of deceleration

Different obstetric bodies, such as NICHD and FIGO, provide standard guidelines for the classification of deceleration based on its shape, timing and duration with respect to the uterine contraction. The details of this categorisation are shown in Table 1. An overview of the proposed work is shown in Fig. 1.

Figure 1.

Flow diagram depicting the overview of the proposed model.

Table 1.

Categorisation of the deceleration of FHR.

Type of deceleration	Stage of labour	Nadir of deceleration	Physiology	Clinical opinion
Early	1st or 2nd	Peak of uterine contraction	Head compression	Benign
Late	Any	> 30 s after the peak of the contraction	Foetal hypoxia	Pathological
Variable	Any	Variable	Cord compression	Suspicious/Pathological

Feature	Description	Membership function (mf)
Duration (T)	Duration of time FHR is below the baseline	Trapezoidal
Depth (N)	Distance of the nadir from the baseline	Trapezoidal
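
Both features use a trapezoidal membership function. As a minimal sketch of what such a function computes, the block below implements the generic trapezoid. The breakpoints in the usage lines are hypothetical, loosely borrowed from the interval bounds in the rule table; they are not the authors' parameters. Combining the two memberships with `min` is a common fuzzy AND and is likewise an assumption here.

```python
def trapmf(x, a, b, c, d):
    """Trapezoidal membership with a <= b <= c <= d.

    Returns 0 outside (a, d), 1 on [b, c], and a linear ramp on (a, b) and (c, d).
    """
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    if x <= c:
        return 1.0                 # plateau
    return (d - x) / (d - c)       # falling edge


# Hypothetical breakpoints: duration T in seconds, depth N in bpm
mu_duration = trapmf(14.25, 13.5, 15.0, 120.0, 360.0)  # halfway up the rising edge
mu_depth = trapmf(13.5, 12.0, 15.0, 100.0, 101.0)      # halfway up the rising edge
print(mu_duration, mu_depth, min(mu_duration, mu_depth))  # -> 0.5 0.5 0.5
```

A value on a ramp belongs only partially to the set, which is how borderline durations and depths avoid a hard cut-off.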

ANT									CON
T < 13.5	13.5 ≤ T < 15	15 ≤ T ≤ 120	120 < T ≤ 360	360 < T < 600	T ≥ 600	N < 12	12 ≤ N < 15	N ≥ 15	CON
✓	×	×	×	×	×	×	×	×	ND
×	×	×	×	×	×	✓	×	×	ND
×	✓	×	×	×	×	✓	×	×	ND
×	✓	×	×	×	×	×	✓	×	ND
×	✓	×	×	×	×	×	×	✓	ND
×	×	✓	×	×	×	✓	×	×	ND
×	×	✓	×	×	×	×	✓	×	ND
×	×	✓	×	×	×	×	×	✓	D
×	×	×	✓	×	×	✓	×	×	ND
×	×	×	✓	×	×	×	✓	×	ND
×	×	×	×	✓	×	✓	×	×	ND
×	×	×	×	✓	×	×	✓	×	PD
×	×	×	×	✓	×	×	×	✓	PD
×	×	×	×	×	✓	✓	×	×	ND
×	×	×	×	×	✓	×	✓	×	BC
×	×	×	×	×	✓	×	×	✓	BC
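
The rule table above can be read as a lookup from a duration interval and a depth interval to a consequent label. The sketch below is a crisp (non-fuzzy) reading of those rows and is not the authors' fuzzy inference. It assumes T is in seconds and N in bpm, keeps the labels ND, D, PD and BC exactly as printed, and returns `None` for a combination that has no row in the table. The function and variable names are hypothetical.

```python
def duration_bin(t):
    """Index of the duration column that T (assumed seconds) falls into."""
    if t < 13.5:
        return 0
    if t < 15:
        return 1
    if t <= 120:
        return 2
    if t <= 360:
        return 3
    if t < 600:
        return 4
    return 5


def depth_bin(n):
    """Index of the depth column that N (assumed bpm) falls into."""
    if n < 12:
        return 0
    if n < 15:
        return 1
    return 2


# (duration bin, depth bin) -> consequent, transcribed row by row from the table
RULES = {
    (1, 0): "ND", (1, 1): "ND", (1, 2): "ND",
    (2, 0): "ND", (2, 1): "ND", (2, 2): "D",
    (3, 0): "ND", (3, 1): "ND",
    (4, 0): "ND", (4, 1): "PD", (4, 2): "PD",
    (5, 0): "ND", (5, 1): "BC", (5, 2): "BC",
}


def consequent(t, n):
    """Crisp consequent for duration t and depth n, or None if no rule applies."""
    # The first two rows use a single antecedent: T < 13.5 alone, or N < 12 alone.
    if t < 13.5 or n < 12:
        return "ND"
    return RULES.get((duration_bin(t), depth_bin(n)))


print(consequent(60, 20))   # -> D
print(consequent(400, 13))  # -> PD
print(consequent(700, 20))  # -> BC
```

In a fuzzy system each rule would instead fire to a degree given by the membership values of its antecedents, so a borderline event can partially satisfy more than one rule.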

KMO measure of sampling adequacy		0.815
Bartlett’s test of sphericity	Approx. χ²	3234.897
	df	78
	Sig.	0.000
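
As a worked check on the degrees of freedom reported above, the standard form of Bartlett's statistic is given below. It assumes the test was run on the p = 13 event-point features listed in the component matrix (an inference from that table, not something the text states), with n the number of samples and R the correlation matrix of the features.

```latex
\chi^{2} = -\left[(n-1) - \frac{2p+5}{6}\right]\ln\lvert R\rvert,
\qquad
df = \frac{p(p-1)}{2} = \frac{13 \times 12}{2} = 78
```

Under that assumption the degrees of freedom match the reported value of 78.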

	Component
	1	2	3
U_st_time	0.983
D_n_time	0.983	-0.102
D_st_time	0.982	-0.102
D_e_time	0.982	-0.104
U_e_time	0.954
U_p_time	0.923
U_p_point	0.416	-0.391	0.362
D_st_point	0.198	0.945	0.104
Baseline	0.241	0.935	0.163
D_e_point	0.254	0.932	0.170
U_e_point		-0.436	0.694
U_st_point	-0.140	-0.449	0.587
D_n_point		0.485	0.566

Class	ICC	95% CI upper limit	95% CI lower limit
Early	0.988	0.981	0.985
Variable	0.984	0.953	0.962
Late	0.879	0.883	0.886

Statistical parameters of the classification for feature set S1
Classifier	TP	FP	Prec.	Rec.	F-S.	ROC	Sen.	Spec.	Class
Random Forest	0.949	0.017	0.974	0.949	0.961	0.998	0.949	0.983	Early
	0.973	0.017	0.973	0.973	0.973	0.996	0.973	0.983	Variable
	1.0	0.013	0.955	1.0	0.977	1.0	1.0	0.987	Late
MLP	0.974	0.017	0.974	0.974	0.974	0.999	0.974	0.983	Early
	0.973	0.013	0.973	0.973	0.973	0.993	0.973	0.987	Variable
	1.0	0	1.0	1.0	1.0	1.0	0.999	0.998	Late
Naïve Bayes	0.872	0.017	0.971	0.872	0.919	0.989	0.872	0.983	Early
	0.919	0.133	0.810	0.919	0.861	0.938	0.919	0.867	Variable
	0.857	0.026	0.900	0.857	0.878	0.977	0.857	0.974	Late
Simple Logistics	0.949	0.017	0.974	0.949	0.961	0.987	0.949	0.983	Early
	0.973	0.050	0.923	0.973	0.947	0.986	0.973	0.950	Variable
	0.905	0.013	0.950	0.905	0.927	0.986	0.905	0.987	Late
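
Several columns in the table above are tied together by standard identities: the F-score is the harmonic mean of precision and recall, and specificity is one minus the false-positive rate. The sketch below, with hypothetical helper names, re-derives two table entries from the others as a sanity check. It assumes the TP and FP columns are rates, as their values suggest.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def specificity_from_fp_rate(fp_rate):
    """Specificity (true-negative rate) is the complement of the false-positive rate."""
    return 1.0 - fp_rate


# Random Forest, class Early: Prec. 0.974, Rec. 0.949, FP 0.017
print(round(f_score(0.974, 0.949), 3))             # -> 0.961
print(round(specificity_from_fp_rate(0.017), 3))   # -> 0.983

# Naive Bayes, class Variable: Prec. 0.810, Rec. 0.919
print(round(f_score(0.810, 0.919), 3))             # -> 0.861
```

The zero guard matters for classes that are never predicted correctly, where precision and recall are both 0.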

Statistical parameters of all the classifiers for feature set S1
Classifier	Accuracy (%)	Kappa	RMSE	Avg. TP	Avg. FP	Avg. Prec.	Avg. Recall	Avg. F-Score
Random Forest	96.91	0.952	0.974	0.969	0.016	0.969	0.969	0.969
MLP	97.94	0.968	0.81	0.979	0.013	0.979	0.979	0.979
Naïve Bayes	88.66	0.824	0.240	0.887	0.063	0.894	0.887	0.888
Simple Logistics	94.85	0.92	0.177	0.948	0.029	0.949	0.948	0.948

Test Result Variable(s)	Area	Std. Error	Asymptotic Sig.	Asymptotic 95% CI lower bound	Asymptotic 95% CI upper bound
Visual	0.693	0.075	0.146	0.427	0.721
NICHD-based	0.574	0.085	0.579	0.526	0.861

	Intraclass Correlation	95% CI lower bound	95% CI upper bound
Single Measures	0.766	0.670	0.838
Average Measures	0.868	0.802	0.912

	Value	Lower bound 95% (Mean)	Upper bound 95% (Mean)
Intercept	0.018	-0.117	0.153
Slope coefficient	1.108	1.025	1.191
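
The intercept and slope above come from a Deming regression, which, unlike ordinary least squares, allows for measurement error in both of the methods being compared. The block below is a minimal sketch of the usual closed-form estimator. The function name is hypothetical, and the error-variance ratio `delta` (variance of the y errors over variance of the x errors) defaults to 1, which is an assumption since the ratio used in the study is not given here.

```python
from math import sqrt


def deming_fit(x, y, delta=1.0):
    """Return (intercept, slope) of the Deming regression of y on x.

    delta is the assumed ratio of the y-error variance to the x-error variance.
    Raises ZeroDivisionError if x and y have zero sample covariance.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    diff = syy - delta * sxx
    slope = (diff + sqrt(diff ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    return intercept, slope


# Points lying exactly on y = 2x + 1 recover that line (up to rounding)
intercept, slope = deming_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(round(intercept, 6), round(slope, 6))  # -> 1.0 2.0
```

In method-comparison work, an intercept interval that contains 0 is usually read as no constant difference between the methods, and a slope interval that excludes 1 as a proportional difference.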

Statistical parameters of the classification for feature set S2
Classifier	TP	FP	Prec.	Rec.	F-S.	ROC	Sen.	Spec.	Class
Random Forest	0.625	0.211	0.676	0.625	0.649	0.789	0.625	0.788	Early
	0.740	0.468	0.627	0.740	0.679	0.726	0.740	0.532	Variable
	0	0.011	0	0	0	0.464			Late
MLP	0.625	0.316	0.581	0.625	0.602	0.639	0.625	0.680	Early
	0.620	0.383	0.633	0.620	0.626	0.644	0.620	0.617	Variable
	0	0.056	0	0	0	0.483			Late
Naïve Bayes	0.450	0.193	0.621	0.450	0.522	0.646	0.48	0.785	Early
	0.660	0.511	0.579	0.660	0.617	0.591	0.66	0.49	Variable
	0	0.122	0	0	0	0.417			Late
Simple Logistics	0.600	0.263	0.615	0.600	0.608	0.668	0.66	0.737	Early
	0.600	0.404	0.612	0.600	0.606	0.598	0.61	0.61	Variable
	0	1.0	0	0	0	0.450			Late

Statistical parameters of all the classifiers for feature set S2
Classifier	Accuracy (%)	Kappa	RMSE	Avg. TP	Avg. FP	Avg. Prec.	Avg. Recall	Avg. F-Score
Random Forest	63.92	0.317	0.398	0.639	0.329	0.602	0.639	0.618
MLP	57.73	0.236	0.490	0.577	0.332	0.566	0.577	0.571
Naïve Bayes	52.58	0.162	0.169	0.526	0.352	0.554	0.526	0.533
Simple Logistics	55.67	0.218	0.544	0.557	0.324	0.569	0.557	0.563

Classifier	Hyperparameter	Values
Random Forest	Batch size	100
	Bag size	100
	Iterations	100
	Seed	1
MLP	Batch size	100
	Hidden layers	2
	Learning rate	0.4
Naïve Bayes	Batch size	100
Simple Logistics	Batch size	100
	Heuristic stop	50
	Max boosting iteration	400

Classifier	TP	FP	Prec.	Rec.	F-S.	ROC	Sen.	Spec.	Class
Random Forest	0.600	0.263	0.615	0.600	0.608	0.668	0.66	0.737	Early
	0.600	0.404	0.612	0.600	0.606	0.598	0.60	0.61	Variable
	0	1.0	0	0	0	0.450			Late
MLP	0.650	0.316	0.591	0.650	0.619	0.699	0.66	0.49	Early
	0.660	0.298	0.702	0.660	0.680	0.729	0.66	0.70	Variable
	0	0.067	0	0	0	0.656			Late
Naïve Bayes	0.450	0.193	0.621	0.450	0.522	0.646	0.45	0.80	Early
	0.660	0.511	0.579	0.660	0.617	0.591	0.66	0.49	Variable
	0	0.122	0	0	0	0.601			Late
Simple Logistics	0.525	0.263	0.583	0.525	0.553	0.659	0.58	0.73	Early
	0.660	0.532	0.569	0.660	0.611	0.602	0.66	0.46	Variable
	0	0.033	0	0	0	0.368			Late

Sample	Observed	Predicted Early	Predicted Late	Predicted Variable	Percent Correct
Training	Early	17	0	8	68.0%
	Late	9	0	7	0.0%
	Variable	9	0	21	70.0%
Testing	Early	9	0	5	64.3%
	Late	4	0	1	0.0%
	Variable	3	0	4	57.1%
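
The "Percent Correct" column above is the per-class recall, that is, the diagonal count divided by the row total. The sketch below recomputes it from the counts in the table and adds the overall accuracy for each sample. The overall figures are derived here and are not reported in the table; the helper names are hypothetical.

```python
# Confusion matrices copied from the table: observed class -> predicted counts
TRAINING = {
    "Early":    {"Early": 17, "Late": 0, "Variable": 8},
    "Late":     {"Early": 9,  "Late": 0, "Variable": 7},
    "Variable": {"Early": 9,  "Late": 0, "Variable": 21},
}
TESTING = {
    "Early":    {"Early": 9, "Late": 0, "Variable": 5},
    "Late":     {"Early": 4, "Late": 0, "Variable": 1},
    "Variable": {"Early": 3, "Late": 0, "Variable": 4},
}


def percent_correct(matrix):
    """Per-class percentage of observed cases that were predicted correctly."""
    return {cls: 100.0 * row[cls] / sum(row.values()) for cls, row in matrix.items()}


def overall_accuracy(matrix):
    """Percentage of all cases on the diagonal of the confusion matrix."""
    correct = sum(row[cls] for cls, row in matrix.items())
    total = sum(sum(row.values()) for row in matrix.values())
    return 100.0 * correct / total


print({c: round(v, 1) for c, v in percent_correct(TRAINING).items()})
# -> {'Early': 68.0, 'Late': 0.0, 'Variable': 70.0}
print({c: round(v, 1) for c, v in percent_correct(TESTING).items()})
# -> {'Early': 64.3, 'Late': 0.0, 'Variable': 57.1}
print(round(overall_accuracy(TRAINING), 1), round(overall_accuracy(TESTING), 1))
# -> 53.5 50.0
```

The empty "Late" column shows that this model never predicts the late class, which is why its percent correct is 0.0% in both samples.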

	(a) Label by clinicians		(b) Label using NICHD definition
		Area		Area
Classified as	Early	0.659	Early	0.653
	Late	0.561	Late	0.604
	Variable	0.704	Variable	0.621
