Abstract
During labor, the fetal heart rate (FHR) is monitored externally using Doppler ultrasound. This is done continuously, but for various reasons (e.g., fetal or maternal movements) the system does not record any samples for varying periods of time. In many settings, it would be quite beneficial to estimate the missing samples. In this paper, we propose a (deep) Gaussian process-based approach for the estimation of consecutively missing samples in FHR recordings. The method relies on similarities in the state space and on exploiting the concept of attractor manifolds. The proposed approach was tested on a short segment of a real FHR recording. The experimental results indicate that the proposed approach provides more reliable results than several interpolation methods that are commonly applied in the processing of FHR signals.
Index Terms: Fetal heart rate, deep Gaussian processes, attractor, state space, consecutively missed samples
I. Introduction
The fetus depends on the mother for placental exchange of oxygen and carbon dioxide, which relies on adequate maternal blood gas concentrations, uterine blood supply, placental transfer and fetal gas transport. During labor, disruption of any of these can cause fetal hypoxia, which, despite compensatory mechanisms, may lead to acidosis, which in turn may cause permanent brain damage or even death of the fetus [1]. The aim of fetal monitoring is to alert obstetricians to such risks so that they can intervene appropriately and in a timely manner [2]. The most widely adopted approach to fetal monitoring during labor is cardiotocography (CTG), in which both the fetal heart rate and the uterine activity (UA) signals are recorded and then visually inspected by clinicians [3]. Despite the availability of clinical guidelines for FHR evaluation from both the National Institute of Child Health and Human Development (NICHD) and the International Federation of Gynecology and Obstetrics (FIGO) [4], [5], the interpretation of FHR demonstrates high intra- and inter-observer variability due to the subjectivity of visual inspection and the complexity of CTG recordings [6]. Therefore, many efforts have been made toward automated or computerized analysis of FHR recordings. Computerized methods, equipped with data-driven machine learning techniques, are capable of extracting features and discovering patterns that cannot be seen or interpreted by the naked eye.
One of the hallmarks of FHR recordings obtained externally using Doppler ultrasound is the common occurrence of missing samples, as shown in Fig. 1, which are often caused by, e.g., fetal or maternal movements and misplaced electrodes. Empirically, the percentage of missing samples varies from 0–40% for external Doppler ultrasound measurements and from 0–10% for the internal direct fetal electrocardiogram (FECG). Although FECG can provide more accurate measurements, in practice the FHR recordings are usually obtained externally using Doppler ultrasound techniques because of their non-invasive nature. It is worth noting that there are still no guidelines on what percentage of missing samples disqualifies an FHR recording from visual inspection or from computerized analysis. Although a high percentage of missing samples can be tolerated by obstetricians because of the robustness of human visual perception, computerized analysis requires proper handling of such distortions. In [7], the authors showed that the values of various popular FHR features can change dramatically if the missing samples are not properly estimated or otherwise addressed.
Fig. 1.
A segment of a raw (unpreprocessed) FHR recording.
In the literature, the estimation of missing FHR samples is often addressed using methods based on sparse representation and dictionary learning, where the sparse coding step and the dictionary update step are applied in an alternating fashion until convergence [8], [9]. However, such methods usually exploit linear transformations, while the underlying relationship may be better modeled by non-linear transformations. More importantly, such methods are not able to provide estimation results within a probabilistic framework; instead, only point estimates of the missing samples are provided. In our previous work [10], we proposed a Gaussian process (GP)-based method for estimation of missing FHR samples, where the time instants of the missing samples were assumed to have a uniform distribution. However, in reality, the missing samples are more likely to occur in a consecutive manner, i.e., in the form of bursts. In Fig. 2, we present a histogram of the lengths of consecutively missing FHR samples obtained from an open access intrapartum cardiotocography database [11]. We observe that the most frequently occurring gap length is 5.
Fig. 2.
A histogram of the length of consecutively missing samples, i.e., gap length, in the open access intrapartum cardiotocography database described in [11].
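For readers who wish to reproduce such a histogram, the Python sketch below (our own illustration, assuming missing samples are encoded as zeros, which is how gaps appear in the raw traces; the function name gap_lengths is hypothetical) extracts the lengths of consecutive runs of missing samples from a single trace.

```python
import numpy as np

def gap_lengths(fhr, missing_value=0.0):
    """Lengths of consecutive runs of missing samples in an FHR trace.

    Samples equal to `missing_value` are treated as missing, which is how
    the gaps typically appear in the raw recordings.
    """
    missing = (np.asarray(fhr) == missing_value).astype(int)
    padded = np.concatenate(([0], missing, [0]))  # well-defined run starts/ends
    diff = np.diff(padded)
    starts = np.where(diff == 1)[0]
    ends = np.where(diff == -1)[0]
    return ends - starts

# Toy example: a trace with two gaps, of length 3 and 5.
fhr = np.array([140, 141, 0, 0, 0, 139, 138, 0, 0, 0, 0, 0, 137], dtype=float)
print(gap_lengths(fhr))  # -> [3 5]
```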
In this paper, we propose a (deep) GP-based method that is capable of utilizing attractor manifolds for the estimation of consecutively missing samples in FHR recordings. The underlying idea is that, by incorporating an attractor manifold, the proposed approach is able to exploit not only the correlation in time but also the similarity in state space, which benefits the estimation of consecutively missing samples in FHR recordings. We first validated the ability of the GP-based model to reconstruct attractors on a Lorenz system, where the ground truth attractor manifold is accessible. Then the GP-based model was tested on a short segment of a real FHR signal. The results indicate that the GP-based approach performs better than benchmark models that are commonly applied for gap treatment in automated FHR analysis.
II. Background
A. Open Access CTG Database
In this work, we utilized an open access intrapartum CTG database from the Czech Technical University in Prague and the University Hospital in Brno, which contains 552 CTG recordings in which both the FHR and UA signals are sampled at 4 Hz. The CTG recordings were carefully selected from 9164 recordings collected between 2010 and 2012 using various criteria, including singleton pregnancy, gestational age of more than 36 weeks, and no a priori known developmental defects. The details about this database are available in [11].
B. State Space Reconstruction
The concept of state space or phase space, a space in which all the possible states of a studied system are represented, is fundamental in dynamical system modeling. However, the phase space and the mathematical description of a dynamical system are often unknown. Many efforts have been made to develop attractor reconstruction methods in order to reconstruct the phase space of a system from observations. To that end, Takens’ theorem, proposed by Floris Takens in [12], is of great importance. It provides a theoretical guarantee that, generically, the information about the hidden states of a dynamical system can be reconstructed from a single observed variable of the system, provided the conditions stated in the theorem are satisfied. Next, we state the theorem:
Theorem 1 (Takens’ theorem):
Let $M$ be a compact manifold of (integer) dimension $d$. Then for generic pairs $(\phi, y)$, where
- $\phi: M \to M$ is a $C^2$-diffeomorphism of $M$ into itself,
- $y: M \to \mathbb{R}$ is a $C^2$-differentiable function,
the map $\Phi_{(\phi, y)}: M \to \mathbb{R}^{2d+1}$ given by
$$\Phi_{(\phi, y)}(s) = \big(y(s),\, y(\phi(s)),\, \ldots,\, y(\phi^{2d}(s))\big)$$
is an embedding of $M$ in $\mathbb{R}^{2d+1}$.
The most common choice of ϕ is a delay by a constant τ. A fundamental contribution of Takens’ theorem is the claim that, for a reliable reconstruction of a manifold of dimension d, it is sufficient to have a delay embedding of dimension E = 2d + 1. We point out that, in reality, the true manifold that is responsible for generating the data is usually latent. As a result, we do not have any knowledge about its true dimension d (or E = 2d + 1). Furthermore, theoretically speaking, τ is a free parameter that can be arbitrarily selected. Nevertheless, in practice, because the length of the time series, i.e., the number of observations, is finite, the value of τ affects the quality of the attractor reconstruction. If τ is too small, the coordinates of each embedding vector will be strongly correlated. On the other hand, if τ is too large, we will lose information about the underlying dynamical system. In practice, E and τ are often selected using the false nearest neighbours method [13] and the mutual information-based method described in [14], respectively, which are grid-search procedures rather than principled estimators.
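As an illustration (not part of the cited methods), the following Python sketch builds a delay embedding for given E and τ; the function name delay_embed and the backward-in-time coordinate ordering are our own conventions.

```python
import numpy as np

def delay_embed(x, E, tau):
    """Delay embedding of a 1-D series x: row t is (x[t], x[t-tau], ..., x[t-(E-1)*tau])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (E - 1) * tau
    if n <= 0:
        raise ValueError("time series too short for the requested embedding")
    # Column j holds the coordinate delayed by j*tau.
    return np.column_stack([x[(E - 1 - j) * tau:(E - 1 - j) * tau + n]
                            for j in range(E)])
```

With this helper, the initial reconstruction of Section III is simply M_init = delay_embed(x, E=10, tau=1), and each row corresponds to one point of the reconstructed attractor.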
C. Gaussian Processes
A GP is a stochastic process such that every finite collection of its random variables has a multivariate normal distribution [15]. Essentially, it extends the multivariate Gaussian distribution to infinite dimensionality. Therefore, a GP can be seen as a distribution over a real-valued function f(x), where x denotes the input, which is usually a vector. In the machine learning literature, GPs provide a powerful and flexible Bayesian non-parametric framework for modeling functions and mappings, and they have been successfully applied in both supervised and unsupervised learning [16]. Similar to a Gaussian distribution, a GP is completely specified by its mean function m(x) and covariance function $k_f(\mathbf{x}, \mathbf{x}')$, which are defined by $m(\mathbf{x}) = \mathbb{E}[f(\mathbf{x})]$ and $k_f(\mathbf{x}, \mathbf{x}') = \mathbb{E}[(f(\mathbf{x}) - m(\mathbf{x}))(f(\mathbf{x}') - m(\mathbf{x}'))]$. To reduce the number of hyperparameters, a GP is often assumed to have a zero mean. The GP framework frees us from assuming a specific analytical form of the latent function and provides principled uncertainty quantification within the Bayesian framework. With GPs, assumptions and prior knowledge can be conveniently encoded in their covariance functions. The generative process that is commonly used with GPs is as follows:
$$y = f(\mathbf{x}) + \epsilon, \qquad (1)$$
where $\epsilon \sim \mathcal{N}(0, \sigma^2)$ is additive white Gaussian noise.
Let $X = [\mathbf{x}_1, \ldots, \mathbf{x}_N]^{\top}$ be the collection of all input vectors and $\mathbf{y} = [y_1, \ldots, y_N]^{\top}$ be the corresponding outputs. The covariance matrix is denoted by $K_f$ and is constructed by evaluating the covariance function on X. Then the prior distribution of the latent $\mathbf{f} = [f(\mathbf{x}_1), \ldots, f(\mathbf{x}_N)]^{\top}$ is given by $p(\mathbf{f} \mid X, \theta) = \mathcal{N}(\mathbf{0}, K_f)$, where θ denotes the hyperparameters of the GP. Learning requires maximizing the log-likelihood, which under the additive Gaussian noise assumption can be derived as
$$\log p(\mathbf{y} \mid X, \theta, \sigma^2) = -\frac{1}{2}\, \mathbf{y}^{\top} \left(K_f + \sigma^2 I\right)^{-1} \mathbf{y} - \frac{1}{2} \log \left|K_f + \sigma^2 I\right| - \frac{N}{2} \log 2\pi, \qquad (2)$$
from which the hyperparameters θ and the noise variance σ² can be estimated from training data by maximizing the log-likelihood.
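As a minimal illustration of this training procedure (our own sketch, not the implementation used in the paper), the following Python code evaluates the negative of Eq. (2) for an RBF covariance and estimates the hyperparameters with a generic optimizer; the variable names and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, Xp, lengthscale, variance):
    """Squared-exponential (RBF) covariance between the rows of X and Xp."""
    d2 = np.sum((X[:, None, :] - Xp[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def neg_log_marginal_likelihood(log_params, X, y):
    """Negative of Eq. (2), parameterized in log-space to keep parameters positive."""
    lengthscale, variance, noise = np.exp(log_params)
    N = len(y)
    K = rbf_kernel(X, X, lengthscale, variance) + noise * np.eye(N)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * N * np.log(2 * np.pi)

# Toy data: hyperparameters are estimated by minimizing the negative log-likelihood.
np.random.seed(0)
X = np.linspace(0, 1, 50)[:, None]
y = np.sin(6 * X[:, 0]) + 0.1 * np.random.randn(50)
res = minimize(neg_log_marginal_likelihood, x0=np.zeros(3), args=(X, y))
print(np.exp(res.x))  # estimated lengthscale, signal variance, noise variance
```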
D. Deep Gaussian Processes
The joint Gaussianity enforced by the definition of a GP ensures the tractability of the framework. However, as a trade-off, the joint Gaussianity also limits the expressiveness of a GP; for example, a simple step function cannot be modeled well by a simple GP prior. It can be shown that a GP is closely related to a neural network [17]. With the development of deep neural networks, deep Gaussian processes (DGPs) emerged naturally. The concept of a DGP was first proposed in [18], where the main idea is that nonlinear mappings do not preserve Gaussianity. Therefore, by adopting a function composition, the expressiveness of the model is improved. As a trade-off, the marginal log-likelihood that is required for training is no longer tractable in the deep GP setting. In [18], the intractability was addressed by using variational inference. Many efforts have been made toward inference in DGPs; e.g., in [19], a doubly stochastic variational inference algorithm was proposed. In our work, we adopted the inference method from [18].
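To make the composition idea concrete, the short Python sketch below (purely illustrative; it is not the inference method of [18]) draws a sample path from one GP and then feeds it as the input of a second GP, so that the composed function is no longer Gaussian in the original input.

```python
import numpy as np

def rbf(a, b, ls=1.0, var=1.0):
    """RBF covariance between two sets of scalar inputs."""
    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

rng = np.random.default_rng(0)
t = np.linspace(-3, 3, 200)

# Layer 1: a sample path h(t) drawn from a GP prior over t.
K1 = rbf(t, t) + 1e-8 * np.eye(t.size)
h = rng.multivariate_normal(np.zeros(t.size), K1)

# Layer 2: a GP whose inputs are the outputs of layer 1. The composed
# function f(h(t)) is no longer Gaussian in t, which is the extra
# expressiveness (and the source of intractability) of a DGP.
K2 = rbf(h, h, ls=0.5) + 1e-8 * np.eye(h.size)
f = rng.multivariate_normal(np.zeros(h.size), K2)
```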
III. Model Description
Given an FHR recording x(t), we first construct an initial attractor reconstruction M_init using delay embedding, with τ = 1 (a delay of one sample) and a relatively large E, e.g., E = 10. The intuition is that a relatively large E ensures that the conditions of Takens’ theorem are satisfied, while τ = 1 is the minimum delay, which means that no information about the underlying dynamics is lost. Consequently, M_init is of high dimension E, and the variables from the different dimensions are highly correlated. The point of M_init corresponding to time instant t is the E-dimensional vector $\mathbf{m}(t) = [x(t), x(t-1), \ldots, x(t-(E-1))]^{\top}$.
Then we model the reconstructed manifold M as the output of a dynamically constrained DGP, as illustrated in Fig. 3, where the input Z is a function of t, i.e., Z is dynamically constrained. Essentially, we model M as a noisy nonlinear function of time, obtained as a composition of GP mappings driven by t. Since the FHR signal x(t) contains missing samples, there are three types of E-dimensional vectors (rows), or points, in M: fully observed vectors collected in M_fo, partially observed vectors collected in M_po, and entirely unobserved vectors collected in M_eu. We train the DGP using the pairs of fully observed vectors and their corresponding time instants, i.e., we set Y and t in Fig. 3 to M_fo and t_fo, respectively, for training.
Fig. 3.
A general form of dynamically constrained DGPs
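A small sketch of this bookkeeping step is given below (Python, our own illustration): it splits the rows of the embedding matrix into the fully observed, partially observed, and entirely unobserved sets, assuming missing samples in x(t) have been re-marked as NaN before the embedding is formed.

```python
import numpy as np

def partition_rows(M):
    """Split the rows of the embedding matrix M by their missingness.

    Missing coordinates are assumed to be marked as NaN. Returns index
    arrays for the fully observed (M_fo), partially observed (M_po) and
    entirely unobserved (M_eu) rows.
    """
    n_missing = np.isnan(M).sum(axis=1)
    fo = np.where(n_missing == 0)[0]
    po = np.where((n_missing > 0) & (n_missing < M.shape[1]))[0]
    eu = np.where(n_missing == M.shape[1])[0]
    return fo, po, eu

# The DGP is then trained only on the pairs (t_fo, M[fo]).
```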
Without loss of generality, we continue the discussion with a simpler example of a DGP, shown in Fig. 4.
Fig. 4.
A dynamically constrained DGP with three layers of mappings.
The generative process according to this DGP takes the form
$$Z = f_Z(t) + \epsilon_Z, \qquad X = f_X(Z) + \epsilon_X, \qquad Y = f_Y(X) + \epsilon_Y, \qquad (3)$$
where $\epsilon_Z$, $\epsilon_X$, and $\epsilon_Y$ are additive white Gaussian noise terms, and f_Y, f_X and f_Z are the latent mappings governed by three different GPs, i.e., $f_Y \sim \mathcal{GP}(0, k_Y(X, X'))$, $f_X \sim \mathcal{GP}(0, k_X(Z, Z'))$, and $f_Z \sim \mathcal{GP}(0, k_Z(t, t'))$. In this work, the covariance function of the GP for the dynamic layer, i.e., the first layer, is from the Matérn class, and the covariance functions of the GPs for the remaining layers are RBFs with automatic relevance determination. The dimensions of all layers, except for the dynamic layer, are set to E, although different configurations can be explored.
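For concreteness, the following Python sketch shows one possible implementation of the two kernel families mentioned above; the smoothness parameter of the Matérn kernel is not specified in the text, so ν = 3/2 is used here only as an assumed example, and the function names are our own.

```python
import numpy as np

def matern32(t, tp, ls=1.0, var=1.0):
    """Matérn covariance with nu = 3/2 (an assumed value) on scalar time inputs."""
    r = np.abs(t[:, None] - tp[None, :]) / ls
    return var * (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def rbf_ard(X, Xp, ls, var=1.0):
    """RBF covariance with automatic relevance determination: one lengthscale
    per input dimension, so weakly relevant dimensions can be switched off
    during training."""
    ls = np.asarray(ls, dtype=float)
    d2 = np.sum(((X[:, None, :] - Xp[None, :, :]) / ls) ** 2, axis=-1)
    return var * np.exp(-0.5 * d2)
```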
The learning requires maximization of the log-marginal likelihood,
$$\log p(Y \mid t) = \log \int p(Y \mid X)\, p(X \mid Z)\, p(Z \mid t)\, \mathrm{d}X\, \mathrm{d}Z. \qquad (4)$$
Because of the nonlinear mappings, Gaussianity is not preserved. Consequently, the log-likelihood in Eq. 4 is intractable. This difficulty can be addressed by introducing $N_p \le N$ inducing or pseudo input-output pairs for each intermediate (latent) layer, such that, in each layer, the inducing input-output pairs are mapped by the same latent mapping as in that layer. Specifically, inducing inputs $\tilde{X}$ with outputs $U_Y = f_Y(\tilde{X})$ are introduced for the first intermediate layer. Similarly, inducing inputs $\tilde{Z}$ with outputs $U_X = f_X(\tilde{Z})$ are introduced for the second intermediate layer. The joint probability density function (pdf) can be formulated accordingly:
$$p(Y, F_Y, U_Y, X, F_X, U_X, Z \mid t) = p(Y \mid F_Y)\, p(F_Y \mid U_Y, X)\, p(U_Y)\, p(X \mid F_X)\, p(F_X \mid U_X, Z)\, p(U_X)\, p(Z \mid t). \qquad (5)$$
We note that the intractability is caused by the marginalization of the latent variables, which corresponds to the terms $p(F_Y \mid U_Y, X)$ and $p(F_X \mid U_X, Z)$. With the inference method proposed in [18], these two terms cancel out and a variational lower bound on Eq. 4 can be obtained as
$$\log p(Y \mid t) \ge \mathcal{F}_{\mathrm{v}} = \mathbb{E}_{\mathcal{Q}}\!\left[\log \frac{p(Y, F_Y, U_Y, X, F_X, U_X, Z \mid t)}{\mathcal{Q}}\right], \qquad (6)$$
where $\mathcal{Q} = p(F_Y \mid U_Y, X)\, q(U_Y)\, q(X)\, p(F_X \mid U_X, Z)\, q(U_X)\, q(Z)$ is the variational distribution.
The detailed derivation of Eq. 6 can be found in [20] and [21]. This tractable lower bound can be used not only as the objective function for training, but also as guidance for setting the number of layers. A recent analytical study on setting the number of inducing points can be found in [22]. In our experiments, we adopted the DGP shown in Fig. 4, and the number of inducing points was $N_p = N$.
For prediction, the true posterior distributions $p(M_{po} \mid M_{fo})$ and $p(M_{eu} \mid M_{fo})$ are approximated by the Gaussian variational distributions $q(M_{po})$ and $q(M_{eu})$ within the variational framework. In our work, we use the means of $q(M_{po})$ and $q(M_{eu})$ as the corresponding point estimates, which can be regarded as MAP estimates since the mean of a Gaussian distribution is also its mode. The uncertainty in learning and prediction is embedded in the covariance, which can provide additional guidance in model selection. The specific derivations and computations can be found in [20]. Finally, for the missing samples of the time series x(t), the final estimates are averages over all the available estimates provided by the recovered $M_{eu}$ and $M_{po}$.
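The following Python sketch illustrates this last averaging step under our delay-embedding convention (τ = 1, backward-looking coordinates); the function name average_back and its arguments are hypothetical and only meant to show how each time instant collects up to E estimates.

```python
import numpy as np

def average_back(M_hat, row_times, n_samples, E):
    """Scatter delay-coordinate estimates back onto the time axis and average.

    Row k of M_hat estimates (x[t_k], x[t_k - 1], ..., x[t_k - E + 1]) with
    t_k = row_times[k] (tau = 1), so each time instant can receive up to E
    estimates; the returned value is their mean (NaN where none exist).
    """
    acc = np.zeros(n_samples)
    cnt = np.zeros(n_samples)
    for k, t in enumerate(row_times):
        for j in range(E):
            acc[t - j] += M_hat[k, j]
            cnt[t - j] += 1
    x_hat = np.full(n_samples, np.nan)
    x_hat[cnt > 0] = acc[cnt > 0] / cnt[cnt > 0]
    return x_hat
```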
IV. Experiments and Results
A. Synthetic Data
We first show that the GP-based method can provide a better attractor reconstruction than direct delay embedding on the well-known Lorenz system defined by Eq. 7, which was first studied by Edward Lorenz in modeling atmospheric convection. We recall that the Lorenz system is nonlinear, non-periodic, three-dimensional and deterministic,
$$\frac{\mathrm{d}x}{\mathrm{d}t} = a(y - x), \qquad \frac{\mathrm{d}y}{\mathrm{d}t} = x(c - z) - y, \qquad \frac{\mathrm{d}z}{\mathrm{d}t} = xy - bz. \qquad (7)$$
We generated the ground truth attractor (a set of solutions of Eq. 7) of length 369 with the classic set of parameter values a = 10, b = 8/3, and c = 28. Then x(t) was used to reconstruct the attractor using the dynamically constrained DGP and direct delay embedding, respectively. For direct delay embedding, E and τ were optimally set using the grid-search methods described in [13] and [14], respectively. The reconstruction results, the ground truth attractor, and x(t) are shown in Fig. 5, where we can see that the estimation results of the DGP are more topologically similar to the ground truth attractor than the results obtained by delay embedding.
Fig. 5.
The time series x(t) generated with Lorenz system (upper), the ground truth attractor (bottom left), reconstructed attractor using direct delay embedding (bottom middle) and reconstructed attractor using DGP (bottom right).
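A possible way to generate such a trajectory is sketched below in Python; the initial condition, integration interval and solver settings are our own assumptions, since they are not reported in the text, and only the classic parameter values are taken from the experiment.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, state, a=10.0, b=8.0 / 3.0, c=28.0):
    """Right-hand side of the Lorenz system in Eq. (7)."""
    x, y, z = state
    return [a * (y - x), x * (c - z) - y, x * y - b * z]

# 369 samples of the trajectory; the x(t) coordinate is the single
# observed variable used for the attractor reconstruction.
t_eval = np.linspace(0.0, 36.8, 369)
sol = solve_ivp(lorenz, (0.0, 36.8), [1.0, 1.0, 1.0], t_eval=t_eval, rtol=1e-8)
x = sol.y[0]          # observation used for delay embedding / DGP
attractor = sol.y.T   # ground-truth (x, y, z) trajectory
```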
B. Real Segment of FHR Recording
We selected a short FHR segment containing 360 samples, from which 10 consecutive samples were randomly selected to be missing by setting their values to zero. We then applied the GP-based approach to recover the values of the consecutively missing samples. The estimation results are shown in Fig. 6, where the estimation results provided by linear interpolation, cubic spline interpolation, and an autoregressive model are also included for benchmarking purposes. The order of the autoregressive model was selected with the Bayesian information criterion (BIC) from model orders between 2 and 10.
Fig. 6.
The segment of FHR recording adopted for experiments (upper), and the corresponding estimation results for different methods (bottom).
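One way the benchmark estimators could be implemented is sketched below in Python; the exact fitting and forecasting scheme of the autoregressive benchmark is not specified in the text, so the least-squares fit, one-sided forecasting and BIC formula used here are assumptions, and all function names are ours.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def fill_gap(x, gap, method="linear"):
    """Fill a consecutive gap (array of indices) with an interpolation benchmark."""
    obs = np.setdiff1d(np.arange(len(x)), gap)
    if method == "linear":
        return np.interp(gap, obs, x[obs])
    if method == "spline":
        return CubicSpline(obs, x[obs])(gap)
    raise ValueError(method)

def ar_forecast(x_past, gap_len, max_order=10):
    """Forecast the gap with an AR model whose order is chosen by BIC (2..max_order)."""
    n, best = len(x_past), None
    for p in range(2, max_order + 1):
        # Least-squares fit of x[t] = c + sum_i a_i * x[t - i].
        A = np.column_stack([np.ones(n - p)] +
                            [x_past[p - i:n - i] for i in range(1, p + 1)])
        coef = np.linalg.lstsq(A, x_past[p:], rcond=None)[0]
        rss = np.sum((A @ coef - x_past[p:]) ** 2)
        bic = (n - p) * np.log(rss / (n - p)) + (p + 1) * np.log(n - p)
        if best is None or bic < best[0]:
            best = (bic, coef, p)
    _, coef, p = best
    hist, out = list(x_past[-p:]), []
    for _ in range(gap_len):
        nxt = coef[0] + np.dot(coef[1:], hist[::-1])
        out.append(nxt)
        hist = hist[1:] + [nxt]
    return np.array(out)

def mse(estimate, truth):
    return np.mean((np.asarray(estimate) - np.asarray(truth)) ** 2)
```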
The performance of each estimation method is summarized in Table I, where the mean squared error (MSE) was adopted as the performance metric. The results clearly show that the proposed method outperformed its competitors.
Table I.
Comparisons of performance
estimation method | MSE [bpm²]
GP-based method | 0.36 |
Linear interpolation | 7.08 |
Cubic spline interpolation | 1.98 |
Autoregressive model | 58.83 |
V. Conclusion and Future Work
In this paper, we proposed a Bayesian approach, based on deep Gaussian processes, for the estimation of consecutively missing samples in FHR recordings. By way of attractor manifold learning, our approach has the capacity to utilize the similarity in state space instead of only considering the correlation in time. The experimental results on a real segment of an FHR recording showed that the estimates provided by this GP-based approach are closer to the ground truth and that the proposed method achieved a much better MSE than the benchmark methods. We point out that we modeled the dynamics of the attractor manifold as a function of time, which may not be suitable for every attractor manifold, e.g., manifolds involving sharp changes. In such cases, the resulting posteriors will have large variance and the GPs will have low signal-to-noise ratio (SNR). In future work, we plan to explore modeling the dynamics of the attractor manifold as functions of previously observed samples on the manifold, where the length of the history to be included must be determined carefully.
Acknowledgments
This work has been supported by NIH under Award 1RO1HD097188-01.
References
- [1]. Bobrow CS and Soothill PW, “Causes and consequences of fetal acidosis,” Archives of Disease in Childhood - Fetal and Neonatal Edition, vol. 80, no. 3, pp. F246–F249, 1999.
- [2]. Afors K and Chandraharan E, “Use of continuous electronic fetal monitoring in a preterm fetus: clinical dilemmas and recommendations for practice,” Journal of Pregnancy, vol. 2011, 2011.
- [3]. Georgoulas G, Karvelis P, Spilka J, Chudáček V, Stylios CD, and Lhotská L, “Investigating pH based evaluation of fetal heart rate (FHR) recordings,” Health and Technology, vol. 7, no. 2–3, pp. 241–254, 2017.
- [4]. Ayres-de Campos D, Spong CY, Chandraharan E, and F. I. F. M. E. C. Panel, “FIGO consensus guidelines on intrapartum fetal monitoring: Cardiotocography,” International Journal of Gynecology & Obstetrics, vol. 131, no. 1, pp. 13–24, 2015. [Online]. Available: 10.1016/j.ijgo.2015.06.020
- [5]. Macones GA, Hankins GD, Spong CY, Hauth J, and Moore T, “The 2008 National Institute of Child Health and Human Development workshop report on electronic fetal monitoring: Update on definitions, interpretation, and research guidelines,” Journal of Obstetric, Gynecologic, & Neonatal Nursing, vol. 37, no. 5, pp. 510–515, 2008.
- [6]. Georgieva A, Abry P, Chudáček V, Djurić PM, Frasch MG, Kok R, Lear CA, Lemmens SN, Nunes I, Papageorghiou AT et al., “Computer-based intrapartum fetal monitoring and beyond: A review of the 2nd workshop on signal processing and monitoring in labor (October 2017, Oxford, UK),” Acta Obstetricia et Gynecologica Scandinavica, vol. 98, no. 9, pp. 1207–1217, 2019.
- [7]. Spilka J, Chudáček V, Burša M, Zach L, Huptych M, Lhotská L, Janků P, and Hruban L, “Stability of variability features computed from fetal heart rate with artificially infused missing data,” in Computing in Cardiology (CinC). IEEE, 2012, pp. 917–920.
- [8]. Oikonomou VP, Spilka J, Stylios CD, and Lhotská L, “An adaptive method for the recovery of missing samples from FHR time series,” in CBMS, 2013, pp. 337–342.
- [9]. Barzideh F, Urdal J, Hussein K, Engan K, Skretting K, Mdoe P, Kamala B, and Brunner S, “Estimation of missing data in fetal heart rate signals using shift-invariant dictionary,” in 2018 26th European Signal Processing Conference (EUSIPCO). IEEE, 2018, pp. 762–766.
- [10]. Feng G, Quirk JG, and Djurić PM, “Recovery of missing samples in fetal heart rate recordings with Gaussian processes,” in 2017 25th European Signal Processing Conference (EUSIPCO). IEEE, 2017, pp. 261–265.
- [11]. Chudáček V, Spilka J, Burša M, Janků P, Hruban L, Huptych M, and Lhotská L, “Open access intrapartum CTG database,” BMC Pregnancy and Childbirth, vol. 14, no. 1, p. 16, 2014.
- [12]. Takens F, “Detecting strange attractors in turbulence,” in Dynamical Systems and Turbulence, Warwick 1980. Springer, 1981, pp. 366–381.
- [13]. Kennel MB, Brown R, and Abarbanel HD, “Determining embedding dimension for phase-space reconstruction using a geometrical construction,” Physical Review A, vol. 45, no. 6, p. 3403, 1992.
- [14]. Fraser AM and Swinney HL, “Independent coordinates for strange attractors from mutual information,” Physical Review A, vol. 33, no. 2, p. 1134, 1986.
- [15]. Rasmussen CE and Williams CKI, Gaussian Processes for Machine Learning. MIT Press, 2006.
- [16]. Feng G, Quirk JG, and Djurić PM, “Supervised and unsupervised learning of fetal heart rate tracings with deep Gaussian processes,” in 2018 14th Symposium on Neural Networks and Applications (NEUREL). IEEE, 2018, pp. 1–6.
- [17]. Duvenaud D, Rippel O, Adams R, and Ghahramani Z, “Avoiding pathologies in very deep networks,” in Artificial Intelligence and Statistics, 2014, pp. 202–210.
- [18]. Damianou A and Lawrence N, “Deep Gaussian processes,” in Artificial Intelligence and Statistics, 2013, pp. 207–215.
- [19]. Salimbeni H and Deisenroth M, “Doubly stochastic variational inference for deep Gaussian processes,” in Advances in Neural Information Processing Systems, 2017, pp. 4591–4602.
- [20]. Damianou A, “Deep Gaussian processes and variational propagation of uncertainty,” Ph.D. dissertation, University of Sheffield, 2015.
- [21]. Damianou AC, Titsias MK, and Lawrence ND, “Variational inference for latent variables and uncertain inputs in Gaussian processes,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 1425–1486, 2016.
- [22]. Burt DR, Rasmussen CE, and Van der Wilk M, “Rates of convergence for sparse variational Gaussian process regression,” arXiv preprint arXiv:1903.03571, 2019.