AMIA Annual Symposium Proceedings. 2014 Nov 14;2014:1748–1757.

Risk Prediction for Acute Hypotensive Patients by Using Gap Constrained Sequential Contrast Patterns

Shameek Ghosh 1, Mengling Feng 2,3, Hung Nguyen 4, Jinyan Li 1,4,*
PMCID: PMC4419954  PMID: 25954447

Abstract

The development of acute hypotension in a critical care patient causes decreased tissue perfusion, which can lead to multiple organ failures. Existing systems that employ population level prognostic scores to stratify the risks of critical care patients based on hypotensive episodes are suboptimal in predicting impending critical conditions, or in directing an effective goal-oriented therapy. In this work, we propose a sequential pattern mining approach which targets novel and informative sequential contrast patterns for the detection of hypotension episodes. Our results demonstrate the competitiveness of the approach, in terms of both prediction performance and knowledge interpretability. Hence, sequential patterns-based computational biomarkers can help comprehend unusual episodes in critical care patients ahead of time for early warning systems. Sequential patterns can thus aid in the development of a powerful critical care knowledge discovery framework for facilitating novel patient treatment plans.

Introduction

Critical care patients in an intensive care unit (ICU) may undergo dynamic and rapid physiological changes, subject to various biological conditions. There are multiple symptoms which require immediate attention in an ICU. Among these, acute hypotensive events (AHE) are of great importance1. An AHE is defined as a drastic drop in a patient's blood pressure that persists for an extended period of time. It can be caused by shock, and it may lead to multiple organ failures. As a result, changes in hemodynamic conditions need to be detected as early as possible, so that effective medical interventions can be staged. Subsequently, the effectiveness of a medical intervention in an ICU can be assessed by the associated risk of mortality and the medical costs involved1,2. Both these factors tend to rise with the passage of critical time. Hence, a medical intervention could be termed most effective if it has been staged pro-actively to prevent a shock, based on an early warning system. A pro-active intervention is contingent on obtaining clinical evidence of an impending event in a patient. Good clinical evidence should have two important properties: 1) predictive capability and 2) interpretability, i.e. it should be intelligible to a clinician, so that a quick decision can be made. Interpretable evidence is probably the first step that helps a clinician decide the next course of action. The path to a good treatment plan is thus largely dependent on a clinician’s strong understanding of the patient’s physiological state, from the time the treatment was initiated. Discovery of such simple yet effective clinical evidence requires the development of a powerful knowledge discovery framework which possesses the above properties and can process readily available streaming critical care data. Stream bed monitors, which continuously monitor physiological variables such as arterial blood pressure, heart rate, pulse and blood temperature, generally have embedded rules to predict critical events, but they are also known to generate a lot of false alarms3. Such devices seldom take into account the sequences of micro physiological events and numerous associations among physiological variables in the course of the patient’s ICU stay. Thus, a system that is capable of deriving events that are based on temporal relationships to predict future hemodynamic behavior can be highly beneficial to clinicians in various ways, such as 1) reducing ICU operational costs and increasing efficiency, 2) supporting the development of novel goal directed therapies and 3) reporting a patient’s state for scheduling additional services2.

Generally, multivariate analyses of existing data are very useful for ascertaining prognosis at population cohort levels and for holistically improving the efficiency of resource allocation in hospitals, but may not be of much help in the context of a specific ICU patient with a rapidly developing critical condition2. In contrast, individual goal directed therapies could be administered to an ICU patient, based on dynamic analyses of streaming physiological data, to help counter critical conditions in a shorter span of time4,5. An array of machine learning methods has been employed in the past to make individualized patient predictions. Although the theoretical novelties of these methods are substantial, their black-box nature impedes model interpretability6. As a result, significant results in machine learning may not be transformed into interpretable solutions, which could be analyzed by clinicians towards developing evidence-based therapies. In contrast, identifying interesting structures from continuous streams of physiological data may be useful in providing a dynamic view of the patient’s hemodynamic state7 and help develop a testable hypothesis towards initiating a medical intervention.

In this study, we propose to mine interesting minimal sequential contrast patterns from blood pressure time series, which can be used as novel computational biomarkers in a predictive model to stratify the risks of patients for the onset of a future AHE. These informative sequential patterns are extracted from a given patient population, who had historical occurrences of AHE and/or had been administered pressor medications to stabilize blood pressure levels.

The importance of this work lies in the fact that long-term physiological time series data in ICUs, which register episodes of sharp rises and falls in various sequences, are extremely hard to explore with general machine learning models. In contrast, sequential pattern mining methods can extract unique episodes of arbitrary length which may be over- or under-represented in medical time series data. These episodes may not just be employed to determine the onset of an AHE, but can also provide a clinical understanding of the types of episodes that are characteristic of specific critical conditions like AHE. The motivation for this study is thus driven by a requirement to generate novel medical insights in critical care, which is possible by the application of promising sequential pattern mining approaches to discover clinically relevant episodes connected by temporal relationships.

Our contributions in this work mainly involve - 1) the advancement of existing work in acute hypotension prediction using a sequential pattern mining approach, 2) a demonstration of the feasibility of extracting easily understandable distinguishing sequential patterns from physiological time series that could be effective in predicting AHE ahead of time, and 3) validation of sequential patterns using a large scale critical care database like MIMIC II9, with the intention of developing novel pattern mining methods for making predictions in ICUs.

The Blood Pressure (BP) Prediction Problem

The objective of the BP prediction problem is to predict the arterial blood pressure (ABP) of patients in an ICU. Knowing whether the ABP will lie in the normal, hypertensive (high BP) or hypotensive (low BP) regime in a future time window can be critical information. Typically, the ABP is a variable that registers strong correlation with the heart beat frequency within normal physiological limits and is measured in mmHg8. A derived variable known as the mean arterial pressure (MAP) is often used in medicine as the popular measure of blood pressure, and can be defined as

$$\mathrm{MAP} = \frac{2\,(\mathrm{diasPress}) + \mathrm{sysPress}}{3} \qquad (1)$$

In equation (1), sysPress (the systolic blood pressure) denotes the arterial blood pressure when the heart beats while pumping blood, whereas diasPress refers to blood pressure when the heart is at rest between beats.
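As a quick arithmetic check of equation (1), the following minimal sketch computes MAP from a systolic/diastolic pair. The function name and the example readings are illustrative assumptions, not values from the study.

```python
def mean_arterial_pressure(sys_press: float, dias_press: float) -> float:
    """Equation (1): MAP = (2 * diasPress + sysPress) / 3, in mmHg."""
    return (2.0 * dias_press + sys_press) / 3.0


# Hypothetical reading of 120/80 mmHg: (2 * 80 + 120) / 3 = 280 / 3
map_value = mean_arterial_pressure(sys_press=120.0, dias_press=80.0)
print(round(map_value, 1))
```

The diastolic pressure is weighted twice because, as the definition above implies, the heart spends more of each cycle at rest than in contraction.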

For learning BP behaviour, as shown in the MAP time series in Figure 1, the learning window is shown as beginning from the ICU admission to T0. Our aim is to predict the occurrence of an AHE in the prediction window demarcated by the period T0 to T0+1 hour.

Figure 1:


Mean Arterial Pressure time series with learning and prediction windows. T0 indicates the time point following which, an AHE prediction is to be made. Typically the window after T0, up to the patient’s discharge, is part of a future time window. For purposes of prediction, one may choose to predict an event (an AHE) in a prescribed prediction window (i.e. from T0 to T0+1 hour).

Towards this objective, a dataset for a large number of patients has been made available as part of MIMIC II - a multi-parameter intensive care monitoring database9. MIMIC II consists of approximately 30,000 patients with ABP waveforms in various contiguous segments ranging in minutes to hours to even days that were recorded for patient stays in the ICU. In the context of BP prediction, a subset of patient data with occurrences of acute hypotension was extracted from MIMIC-II and distributed as part of a set of challenges organized in 2009 by Physionet8. The details of the challenge datasets are provided in the following section.

Data Description

Acute Hypotensive Episode (AHE)

Given a MAP time series, an acute hypotensive episode (AHE) may be defined as a period of 30 minutes or more when 90% or more of the MAP measurements are below the 60 mmHg regime. For training purposes, MIMIC II waveform signals were divided into two major groups, H and C, which indicated an occurrence of AHE in the forecast window and no occurrence of AHE in the forecast window, respectively. The groups H and C were further subdivided into H1, H2 and C1, C2. The definitions for each sub-group are as follows.

  • H1: Patients receiving pressor medication.

  • H2: Patients not receiving pressor medication.

  • C1: Patients with no acute hypotensive episodes during entire hospital stay.

  • C2: Patients having AHE before or after the forecast window.
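The AHE definition given before the sub-group list can be sketched as a sliding-window check. This is only an illustration under stated assumptions: the MAP series is assumed to be sampled once per minute (so 30 samples span 30 minutes), only windows of exactly 30 samples are examined, and the function name `has_ahe` and its parameters are hypothetical.

```python
from typing import Sequence


def has_ahe(map_series: Sequence[float], window: int = 30,
            threshold: float = 60.0, min_percent: int = 90) -> bool:
    """Return True if some `window`-sample stretch has at least
    `min_percent` percent of its MAP values below `threshold` mmHg."""
    if len(map_series) < window:
        return False
    below = [1 if value < threshold else 0 for value in map_series]
    count = sum(below[:window])  # low readings in the first window
    if count * 100 >= min_percent * window:
        return True
    for end in range(window, len(below)):
        # Slide the window one sample to the right.
        count += below[end] - below[end - window]
        if count * 100 >= min_percent * window:
            return True
    return False


# Hypothetical series: 10 normal minutes followed by 30 hypotensive minutes.
print(has_ahe([80.0] * 10 + [50.0] * 30))
```

Integer arithmetic is used for the 90% comparison to avoid floating-point rounding at the boundary (27 of 30 samples).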

Vital signs time series variables like heart rate (HR) and mean, systolic, and diastolic ABP were available for all the patient records. Some of the records also included respiration rate and SpO2. For the purposes of this study, however, we consider only the MAP time series (mean ABP) for each patient, as done in earlier studies. Accordingly, the prediction tasks consisted of the following two events:

  • Event I: Patient risk classification between H1 and C1

  • Event II: Patient risk classification between H and C

Dataset specifications are as given in Figure 2.

Figure 2:


Dataset Specifications for the AHE prediction problem

Especially in the case of Event I, the objective of the classification is to distinguish between two groups of patients, in which both received pressor medications. According to Moody and Lehman (2009), these two sets (H1 and C1) represented the extremes of AHE-associated risks.

Related Work

In recent years, a number of methods and scores have been suggested for understanding and predicting patient hemodynamic behaviour7. The Physionet 2009 AHE detection challenge served to advance such studies further by providing a test-bench for the use of neural-network based multi-models, static rules, support vector machines, histograms and statistical indices10,11,12,13,14. Although these models extract informative features to construct off-line predictive models, they may be limited in their scope when analyzing real-time longitudinal medical data. Moreover, complete utilization of long range time series data may not be possible without powerful dimensionality reduction techniques. Sequences of unusual spikes or falls in the physiological variables, if detected, may go a long way towards understanding the critical blood pressure behaviour that tends to occur before the onset of critical conditions. In a related context, Wang et al.15,16 indicated that pattern extraction from medical data is particularly challenging due to their sparse and longitudinal nature. To address this, the authors proposed a geometric image based mapping framework to mine temporal event signatures from large scale heterogeneous records of hospital patients. Moreover, popular methods like motif mining in bioinformatics have also been applied to cardiovascular time series for classifying medical events17. Accordingly, recent attention on data mining in healthcare indicates a shift from traditional pattern mining problems to handling specific cases towards medical care improvements. Mining complex medical patterns could thus be more predictive of immediate outcomes, and may be able to report episodes soon after ICU admissions. From a clinical point of view, identifying such patterns is paramount for detecting and understanding the variability of a critical care patient’s physiological response to an event like hypotension, and thus for helping to determine the best treatment combinations.

Methodology

In the following sections we provide an overview of sequential pattern mining and subsequently describe the processes involved in the discovery of sequential contrast patterns that were over-represented in the positive samples and under-represented in the negative samples. Sequential patterns are generally described in the form of short sub-sequences. Depending on the constraints of the formulated problem, representative sequences of certain lengths may be grown, which may demonstrate extensive repetitive behavior in the concerned datasets. In this context, sequential pattern mining techniques can provide a flexible and fast way to deal with streaming physiological data to derive interesting patterns which may provide important insights in the form of a chain of episodes, which could be of concern.

Symbolic Data Transformation

With the explosion of streaming real-valued physiological time series data generated from stream bed monitors in ICUs, it becomes essential to transform such data into symbolic representations for large scale data mining operations. Symbolic sequences make it possible to use a variety of existing pattern mining algorithms that manipulate such representations efficiently. Pattern mining algorithms typically employ techniques related to hashing, Markov models, suffix trees, decision trees etc., which may be applied to symbolic sequences but not to real-valued continuous representations. In the last decade, the symbolic aggregate approximation (SAX) method has emerged as a popular technique which achieves efficient informative symbolization of large-scale time series data (e.g. more than a billion time points)18. SAX first transforms the real-valued time series into a piecewise aggregate approximation (PAA) representation19 and then converts the PAA series to a symbolic string. As claimed by the authors, the advantages of SAX are dimensionality reduction and lower bounding. Given the nature of physiological time series generated over a number of days, and their importance in determining critical conditions, SAX provides a suitable platform for creating efficient indexing and pattern mining algorithms for medical purposes. Figure 3 illustrates a real-valued time series being converted to a sequence like PQRQQPQ, where P, Q and R are the three regions demarcated by the X and Y cut-points on the vertical axis.

Figure 3:


SAX Approximation of a time series. The given time series is symbolically represented as PQRQQPQ, using P, Q and R which denote three equiprobable regions.

Before being discretized to a symbolic form, each patient's MAP time series goes through certain data transformation steps based on the SAX algorithm. The time series is first normalized to have a mean of 0 and a variance of 1. Since SAX assumes that normalized time series follow an approximately Gaussian distribution, breakpoints are selected such that the discrete symbols are equiprobable in the time series. For a normalized time series with five symbols, this means that the discrete regions are delimited by the cut-points [−inf, −0.84, −0.25, +0.25, +0.84, +inf]. SAX thus adopts a symbolic representation which characterizes the inherent properties of the time series data while keeping the distribution of symbols approximately equiprobable.
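The normalization, PAA and symbol assignment steps can be sketched as follows. This is a minimal illustration under assumptions rather than the implementation used in the study: the function names, the use of the population standard deviation, the equal-width framing, and the toy input are all hypothetical choices, and the series is assumed to be non-constant with at least as many samples as segments.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax_breakpoints(alphabet_size: int) -> list:
    """Cut-points that split a standard normal into equiprobable regions."""
    return [NormalDist().inv_cdf(i / alphabet_size)
            for i in range(1, alphabet_size)]


def sax(series, n_segments: int, alphabet: str = "ABCDE") -> str:
    """Z-normalize, reduce with PAA, then map each frame mean to a symbol."""
    mu, sigma = mean(series), pstdev(series)  # assumes sigma > 0
    z = [(value - mu) / sigma for value in series]
    n = len(z)
    # PAA: average the normalized values inside each of n_segments frames.
    paa = [mean(z[k * n // n_segments:(k + 1) * n // n_segments])
           for k in range(n_segments)]
    cuts = sax_breakpoints(len(alphabet))
    # bisect_right returns the index of the region that contains the value.
    return "".join(alphabet[bisect_right(cuts, value)] for value in paa)


print([round(c, 2) for c in sax_breakpoints(5)])
print(sax(list(range(1, 11)), n_segments=5))
```

With five symbols, the computed cut-points round to the values quoted above, and a steadily rising toy series maps to one symbol per region in ascending order.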

Contrast Patterns

Mining contrasting patterns between groups of data under various labels was initially introduced as Emerging patterns by Dong & Li20. It is a sub-field of pattern mining, which typically aims to discover statistically significant patterns based on principles of frequency support, in various kinds of data viz. transactional, sequences, time series etc. Given that sequences are an important representation for data, minimal distinguishing subsequences (MDS) were later proposed as sequential contrast patterns21. Typically, an MDS is a sequential pattern that does not have a subsequence, which itself is a sequential pattern satisfying algorithmic preconditions of minimum frequency support in the given data. In the following sections, we briefly describe the terminologies and processes associated with extracting MDS patterns.

Definitions and Terminologies

Subsequences

Let us define a set of items as I = {i1, i2, i3, …, in}, also known as the alphabet. A sequence S = e1-e2-…-en can be defined as an ordered collection of items belonging to I. A sequence A = ek1-ek2-…-ekm is said to be contained in B = e1-e2-…-en if 1 ≤ k1 < k2 < … < km ≤ n. This means that the order of the items of A is maintained in B, although the items need not be consecutive, i.e. occurrences of A in B may have gaps between the matched items. If A is contained in B, then A is a subsequence of B and B is a supersequence of A. For example, given the alphabet {X, Y, Z}, XY is a subsequence of XZYZ, but YX is not.

Let D = {D1, D2, …, Dn} be a set of sequences organized in a database. If a sequential pattern P is contained in q sequences of the database, then q is defined as the frequency support of P, denoted as freqsup.
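The containment relation and frequency support defined above can be sketched in a few lines. The function names are illustrative assumptions, and no gap constraint is applied at this stage.

```python
def is_subsequence(a: str, b: str) -> bool:
    """True if the items of `a` appear in `b` in the same order (gaps allowed)."""
    remaining = iter(b)
    # `item in remaining` consumes the iterator up to the first match,
    # so each later item must be found after the previous one.
    return all(item in remaining for item in a)


def freq_support(pattern: str, database) -> int:
    """Number of sequences in `database` that contain `pattern`."""
    return sum(1 for sequence in database if is_subsequence(pattern, sequence))


print(is_subsequence("XY", "XZYZ"))   # order X then Y is preserved
print(is_subsequence("YX", "XZYZ"))   # no X occurs after the Y
print(freq_support("XY", ["XZYZ", "YX", "XXY"]))
```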

Minimal Sequential Contrast Patterns

Given two sets of sequences D+ and D−, where D+ comprises sequences with a positive class label and D− comprises sequences with a negative class label, we need to find the set S of all subsequences of a given length L, such that each subsequence sk ∈ S has the following properties:

i) $\mathrm{freqsup}_{pos}(s_k) \geq \alpha$ (2)
ii) $\mathrm{freqsup}_{neg}(s_k) < \beta$ (3)

iii) There is no subsequence of sk that satisfies (2) and (3) [condition of minimality]

Here α is the minimum support required for D+ and β is the maximum support allowed for D−.

Thus given D+, D−, α and β, the MDS mining problem is concerned with finding all the minimal sequential patterns that are highly likely to occur in the positive set of samples but less likely to occur in the negative set.

Gap Constraint

While mining for a sequential pattern, it may not be necessary for the elements of the pattern to occur consecutively in a sequence. In such a scenario, defining a maximum gap constraint, denoted as g, allows the mining algorithm to search for occurrences in which consecutive elements of the pattern are separated by up to g intervening items. For example, if g=2, then XY occurs in XZY (one intervening item) but not in XZZZY (three intervening items).

Max-prefix

The max-prefix of a sequence A is the leading sequence of elements, without the final element of A. Thus the max-prefix of XZY is XZ.

Suppose D+ = {XYZY, YZXY, YYZY} and D− = {XZXZ, ZXXY} are two sets of sequences. Let us consider ZY as a sequential pattern. Then for g=1, ZY has a frequency support of 3 in D+ and 0 in D−. In this context, if the positive (α) and negative (β) thresholds are set to 2 and 1, then ZY may be accepted as a sequential contrast pattern.
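The worked example above can be reproduced with a small gap-constrained matcher. This is a sketch under the assumption, consistent with the examples in this section, that the gap counts the items lying between two matched elements; the function names are hypothetical.

```python
def matches_with_gap(pattern: str, sequence: str, g: int) -> bool:
    """True if `pattern` occurs in `sequence` with at most `g` items
    between each pair of consecutive matched elements."""
    if not pattern:
        return True
    # Positions where the pattern prefix matched so far can end.
    ends = {i for i, item in enumerate(sequence) if item == pattern[0]}
    for symbol in pattern[1:]:
        ends = {j for j, item in enumerate(sequence)
                if item == symbol and any(0 < j - i <= g + 1 for i in ends)}
        if not ends:
            return False
    return bool(ends)


def gap_support(pattern: str, database, g: int) -> int:
    """Frequency support of `pattern` under the maximum gap constraint."""
    return sum(1 for s in database if matches_with_gap(pattern, s, g))


d_pos = ["XYZY", "YZXY", "YYZY"]
d_neg = ["XZXZ", "ZXXY"]
print(gap_support("ZY", d_pos, g=1), gap_support("ZY", d_neg, g=1))
```

Tracking every possible end position, rather than only the first match, matters because a later occurrence of an element may satisfy the gap when an earlier one does not.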

Mining Minimal Contrast Sequences in Symbolic MAP Data

To mine minimal contrast subsequences we employ the ConSGapMiner21 algorithm, which solves the MDS mining problem with gap constraints. The algorithm applies a depth first search (DFS) technique to generate a set of distinguishing subsequences between two sets of sequences. Internally, the frequency support of each generated subsequence is computed for comparison with α and β, as per conditions (2) and (3) (the minimum positive and maximum negative support thresholds). After the set of all contrast subsequences is obtained, a post-processing step is employed to remove sequences that are non-minimal. These three stages of the algorithm are described next.

Generating Candidate Subsequences

Traditionally, every pattern mining algorithm begins with the candidate generation process. A candidate solution is one that can satisfy all the constraints posed by the concerned problem formulation. To generate candidate subsequences for MDS mining, DFS is performed to obtain a lexicographic sequence tree (LST) as shown in Figure 5. The DFS operation is a popular technique to grow rooted trees which have user-defined nodes. In the present context, each node in the LST (i.e. the DFS constructed tree) represents a sequence, along with its positive and negative frequency supports21. For example, in the LST in Figure 5, node AAC(2,1) represents the sequence AAC with a positive support of 2 and a negative support of 1. A child sequence may be grown by extending the parent sequence with a symbol from the alphabet, following a fixed lexicographic order. Thus, given the present LST, whose alphabet is defined as I = {A, B, C}, AAC has three child nodes: AACA, AACB and AACC. Subsequently each node’s supports are computed from the positive (D+) and negative (D−) sample sets.

Figure 5:


Part of a lexicographic sequence tree (LST), where the symbol alphabet consists of {A, B, C}.

Pruning non-minimal distinguishing subsequences

After a sequence node is generated, if it satisfies the conditions (2) and (3), then the sequence is not extended any further. This is based on the fact that a supersequence of a distinguishing contrast sequence cannot be minimal21. Testing the minimality condition on generated sequences allows us to restrict the generation of redundant sequential patterns or supersequences. Handling minimality thus turns out to be extremely important since it can be a major factor in speeding up the mining process.

Pruning infrequent max-prefix extensions

If the current node’s positive support is less than α, then we need not extend that node further. This is because, if a certain node fails to satisfy condition (2), its descendant nodes, which all have that node as a prefix, cannot be frequent either21.

Checking Gap Constraints

Given a pattern XY, a maximum gap constraint of g may be considered satisfied in a sequence P if some occurrence of X is followed by an occurrence of Y with at most g elements between them, i.e. Y lies within the next g+1 positions after X. For verifying if a generated candidate satisfies the gap constraint, a bitmap representation22 is employed.

As shown in Table 1, we check for the gap satisfaction of XY in XZXZY, when the maximum gap is set to 2. Initially, all the occurrences of X in the given sequence are set to 1 (as shown in Xindex). In the present case, these are positions 1 and 3 in the given sequence. Then, for each occurrence of X separately, the (g+1) bits following that occurrence are set to 1, as illustrated in rows 3 (given as 1X) and 4 (given as 2X); bits that would fall beyond the end of the sequence are dropped. The bit vectors in rows 3 and 4 are then combined by the logical OR operator in row 5. Finally, a logical AND operation is carried out on the bit vector in row 5 and the occurrence vector of Y in row 6 to obtain a final sequence of bits, as in row 7. An occurrence of 1 anywhere in the final bit vector indicates that the gap constraint of 2 has been satisfied.

Table 1.

Checking gap constraint satisfaction of XY in XZXZY. The existence of a ‘1’ in the last (7th) row denoted by AND indicates that a maximum gap of 2 is satisfied by XY.

Sequence     X  Z  X  Z  Y
Index        1  2  3  4  5
Xindex       1  0  1  0  0
1X           0  1  1  1  0
2X           0  0  0  1  1
1X (OR) 2X   0  1  1  1  1
Y            0  0  0  0  1
AND          0  0  0  0  1
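The rows of Table 1 can be reproduced with plain lists of bits. This is an illustrative sketch only: the function name is hypothetical, and a production implementation would more likely pack the bits into machine words.

```python
def bitmap_gap_check(sequence: str, first: str, second: str, g: int):
    """Return the intermediate bit vectors used to test whether `first`
    is followed by `second` within a maximum gap of `g`."""
    n = len(sequence)
    first_index = [1 if item == first else 0 for item in sequence]
    # One mask per occurrence of `first`: the next g+1 positions are set,
    # and positions beyond the end of the sequence are dropped.
    masks = []
    for pos, bit in enumerate(first_index):
        if bit:
            mask = [0] * n
            for k in range(pos + 1, min(pos + g + 2, n)):
                mask[k] = 1
            masks.append(mask)
    combined = [int(any(mask[i] for mask in masks)) for i in range(n)]  # OR
    second_index = [1 if item == second else 0 for item in sequence]
    result = [a & b for a, b in zip(combined, second_index)]           # AND
    return first_index, masks, combined, second_index, result


x_index, masks, ored, y_index, anded = bitmap_gap_check("XZXZY", "X", "Y", 2)
print(x_index, masks, ored, y_index, anded)
print("gap satisfied:", any(anded))
```

Here `masks[0]` and `masks[1]` correspond to the rows labelled 1X and 2X in Table 1.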

Post Processing

A post-processing step is finally applied to remove any sequence which turns out to be a supersequence of at least one shorter sequence in the resultant MDS set.
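The three stages (candidate generation with the two pruning rules, support checking under the gap constraint, and post-processing) can be put together in a compact sketch. It is a simplified illustration and not the ConSGapMiner implementation itself: supports are recomputed by direct matching instead of bitmaps, patterns are grown up to an assumed maximum length `max_len`, and all function names are hypothetical.

```python
def matches_with_gap(pattern, sequence, g):
    """Gap-constrained containment: at most g items between matched elements."""
    ends = {i for i, item in enumerate(sequence) if item == pattern[0]}
    for symbol in pattern[1:]:
        ends = {j for j, item in enumerate(sequence)
                if item == symbol and any(0 < j - i <= g + 1 for i in ends)}
    return bool(ends)


def gap_support(pattern, database, g):
    return sum(1 for s in database if matches_with_gap(pattern, s, g))


def is_subsequence(a, b):
    remaining = iter(b)
    return all(item in remaining for item in a)


def mine_mds(d_pos, d_neg, alphabet, g, alpha, beta, max_len):
    found = []

    def extend(prefix):
        for symbol in alphabet:              # lexicographic child order
            candidate = prefix + symbol
            if gap_support(candidate, d_pos, g) < alpha:
                continue                     # infrequent: prune the subtree
            if gap_support(candidate, d_neg, g) < beta:
                found.append(candidate)      # satisfies (2) and (3): stop here
                continue
            if len(candidate) < max_len:
                extend(candidate)            # depth-first growth

    extend("")
    # Post-processing: drop patterns that contain another found pattern.
    return sorted(p for p in found
                  if not any(q != p and is_subsequence(q, p) for q in found))


d_pos = ["XYZY", "YZXY", "YYZY"]
d_neg = ["XZXZ", "ZXXY"]
print(mine_mds(d_pos, d_neg, "XYZ", g=1, alpha=2, beta=1, max_len=3))
```

On the toy sets used earlier in this section, the sketch reports ZY together with two other length-2 patterns, since no single symbol is distinguishing on its own.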

A flowchart illustrating the data flow architecture of the system is given in Figure 6.

Figure 6:


The Data Flow Architecture for mining sequential contrast sequences using a depth first search method (DFS).

Results and Discussion

The methodology described above was employed to mine sequential contrast patterns for the Physionet 2009 AHE prediction challenge datasets. For this purpose, we employed the SAX converted truncated MAP time series data from the given observation windows, for each patient in the given AHE training datasets. For event I, the training set comprised H1 (for positive) and C1 (for negative), whereas for event II the whole dataset of size 60 was considered for training purposes.

In this context, an example training record like a40439 consisted of T0 indicated as 18.30 on 04/09/2008 (T0 was provided with each record). The time series data prior to T0 was thus used for training purposes (treated as the observation window). The extracted contrast sets for the two events were then applied to the test sets (as given in Figure 2). Each test sample comprised an unlabeled MAP time series segment. If a test sample matched any one of the sequences in the contrast set, it was predicted as positive (H1 for event I and H for event II). A number of simulations were carried out with various parameters, namely subsequence length (L), alphabet size (S) and maximum gap (G). A 2-fold cross validation (CV) was also performed using the larger training dataset consisting of 60 samples (30 H and 30 C), which reported a CV accuracy of 94.9%. In Table 2, the AHE test prediction accuracies are given for events I and II, when the gap constraint was set to 3. As seen, the best performances were achieved using a maximum gap of 3, subsequence length of 10 and alphabet of cardinality 5.
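The decision rule described above (predict positive when any pattern in the contrast set matches the test sample) can be sketched as follows. The pattern set and the symbolic test strings are toy values chosen for illustration, not patterns or records from the study, and the function names are hypothetical.

```python
def matches_with_gap(pattern, sequence, g):
    """Gap-constrained containment: at most g items between matched elements."""
    ends = {i for i, item in enumerate(sequence) if item == pattern[0]}
    for symbol in pattern[1:]:
        ends = {j for j, item in enumerate(sequence)
                if item == symbol and any(0 < j - i <= g + 1 for i in ends)}
    return bool(ends)


def predict_positive(sample, contrast_set, g):
    """A sample is labelled positive if any contrast pattern occurs in it."""
    return any(matches_with_gap(p, sample, g) for p in contrast_set)


def accuracy(samples, labels, contrast_set, g):
    """Fraction of samples whose predicted label equals the true label."""
    hits = sum(predict_positive(s, contrast_set, g) == label
               for s, label in zip(samples, labels))
    return hits / len(samples)


toy_patterns = ["YY", "YZ", "ZY"]                 # hypothetical contrast set
toy_samples = ["XYZY", "XZXZ", "ZXXY", "YYZY"]    # hypothetical SAX strings
toy_labels = [True, False, False, True]
print(accuracy(toy_samples, toy_labels, toy_patterns, g=1))
```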

Table 2.

AHE Test Prediction Classification Accuracies while varying S and L for events I and II. Best performances are recorded for G=3, S=5, L=10/11 for event I and II.

G=3     Event I                Event II
        S=3    S=4    S=5      S=3    S=4    S=5
L=5     5/10   5/10   7/10     19/40  19/40  32/40
L=6     5/10   5/10   7/10     19/40  19/40  32/40
L=7     5/10   5/10   7/10     19/40  19/40  32/40
L=8     5/10   7/10   7/10     23/40  23/40  32/40
L=9     5/10   7/10   9/10     23/40  25/40  33/40
L=10    5/10   7/10   10/10    25/40  32/40  36/40
L=11    5/10   7/10   10/10    25/40  32/40  36/40

In Table 3, we provide a comparison of our results with the reported results from the Physionet 2009 challenge. As seen, models employing neural networks (GRNN, RPS-NN) and kernel methods like SVM are heavily dependent on several parameters, and their performances can vary over wide ranges11,12,23. Most of the other methods employed rules based on simple averaging measures and still performed fairly well10,13. Moreover, hidden Markov models (HMM) for hypotension had reported a cross-validation accuracy close to 97%24, which compares well with our cross-validation results too. Given this context, our results are comparable to the Physionet 2009 challenge results. A general trend may be observed, where more informative sequences could be extracted as the maximum gap constraint is iteratively increased. This is demonstrated by Figure 7. The given results indicate that G=3 provides a good coverage relative to the length of the sequence.

Table 3:

A Comparison of classification methods employed for the AHE prediction problem. Sequential patterns report comparable accuracies against existing methods.

Method                               Event I   Event II
GRNN11                               10/10     37/40
5-min average of diastolic ABP10     10/10     37/40
MAP averaging Rule12                 10/10     36/40
5-min average of ABP10               10/10     36/40
Linear Regression10                  10/10     36/40
Median of MAP10                      10/10     34/40
NN with feature selection12          9/10      32/40
SVM13                                10/10     30/40
RPS-NN12                             2/10      25/40
Sequential Contrast Patterns         10/10     36/40

Figure 7:


Effect of parameters L and G on the performance (A) For Event I, (B) For Event II.

It may be observed that performances tend to improve with larger gap sizes. At the same time, setting G too large means that two consecutive symbols in a sequential pattern may be separated by as many as G time points in the underlying series. Extremely large gap sizes can thus impede a proper interpretation of contiguous events of importance. Finding an optimal value of G is therefore important to obtain meaningful predictions. More generally, finding interesting sequences is highly dependent on parameters like the number of symbols, subsequence length and gap size. Based on the results, for detecting differential blood pressure patterns, obtaining optimal gap sizes may be more effective in reporting important episodes. For larger cohorts, finding the optimal gap would thus be very important, given its dependency on the resolution of the time series (i.e. the sampling frequency). In addition, increasing S provides more discrete cut points for MAP, and enables the algorithm to capture patterns which characterize significant fluctuations in the BP. Thus, for cases with S=5, the algorithm is able to find more expressive patterns than for S=3. This also contributes to making improved predictions. Hence, selecting an alphabet size of 5 turned out to be an optimal choice, both in terms of the discretization of the blood pressure range and in keeping the algorithmic running costs within limits.

In contrast to our method, the 5 minute averaging measures are statistical features obtained from a 5 minute window immediately prior to the occurrence of an AHE. Thus, a major difference lies in the fact that our method considers a wider window prior to the onset of AHE, in comparison to just a 5 minute window for the averaging measures10. This also indicates that a method which is effective in predictions within a 1 hour window may be more suitable in a real time scenario than statistical measures obtained from a 5 minute time window (prior to AHE). In this context, better results from 5 minutes prior to an AHE may be due to temporal proximity to the onset of an AHE. Among the methods based on neural networks, GRNN and RPS-NN report 10/10 and 2/10 (for Event I) and 37/40 and 25/40 (for Event II), respectively. These methods tend to be strongly dependent on parameter tuning, as was also discussed by the authors11. The contrast mining method, on the other hand, helps to extract discretized sequential representations of the MAP time series, which provide the maximum support towards the occurrence of an AHE. These patterns are useful not only for predicting an AHE for an unknown record, but may also be employed for further clinical interpretation by domain experts.

Our results indicate that sequential contrast patterns are capable of extracting informative symbolic episodes, which may be employed for both AHE risk prediction and understanding of hemodynamic behaviour towards effective analyses of sequential episodes that may be indicative of medical symptoms.

Clinical Significance of Sequential Contrast Patterns

In contrast to traditional machine learning models, the purpose of pattern mining is to extract hidden and interpretable knowledge from large amounts of data. In a typical medical care environment, the consumers of an important medical insight are both clinicians and computational systems. Machine learning methods like neural networks, SVM etc. which are heavily dependent on parameters may use extracted patterns as inputs/features to fine tune models, but do not crawl large-scale medical databases towards clinical knowledge discovery. Some of the sequential contrast patterns from the contrast sets extracted by the MDS algorithm are given in Table 4 for each of the two events.

Table 4: Examples of sequential contrast patterns for AHE prediction events

Event I: DEDEDABCDC, DCEDCBCDCD, BCDCACDEDC, DCECACDEDE
Event II: CABAEECBCD, ABAEDBBBCA, ECBABABCD, ABBDECBCD
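
To illustrate how a pattern such as those in Table 4 can be compared between patient groups, the following is a minimal sketch that checks whether a given symbolic pattern occurs in a sequence under a gap constraint, and computes its support in two groups. It is not the MDS algorithm itself, which mines the distinguishing patterns; it only evaluates a pattern that is already known. The function names, the definition of the gap (at most `max_gap` skipped symbols between consecutive matched symbols), and the toy sequences are assumptions made for illustration and are not taken from the study data.

```python
from typing import Sequence


def occurs_with_gap(pattern: str, sequence: str, max_gap: int) -> bool:
    """Return True if `pattern` occurs in `sequence` as a subsequence in which
    at most `max_gap` symbols are skipped between consecutive matched symbols."""
    if not pattern:
        return True
    # Positions in `sequence` at which the matched prefix of `pattern` can end.
    ends = {i for i, s in enumerate(sequence) if s == pattern[0]}
    for symbol in pattern[1:]:
        next_ends = set()
        for e in ends:
            # The next symbol may sit at e+1 (gap 0) up to e+1+max_gap.
            for j in range(e + 1, min(e + 2 + max_gap, len(sequence))):
                if sequence[j] == symbol:
                    next_ends.add(j)
        ends = next_ends
        if not ends:
            return False
    return True


def support(pattern: str, sequences: Sequence[str], max_gap: int) -> float:
    """Fraction of sequences in which `pattern` occurs under the gap constraint."""
    if not sequences:
        return 0.0
    hits = sum(occurs_with_gap(pattern, s, max_gap) for s in sequences)
    return hits / len(sequences)


# Hypothetical toy groups (invented for this example, not patient records).
ahe_group = ["ABAEDBBBCA", "CABCAEDBBBCAD", "EEDDCCBBAA"]
non_ahe_group = ["CCDDCCDDCC", "DCDCDEDCDC"]

# A contrast pattern is one whose support is high in one group and low in the other.
ahe_support = support("ABAE", ahe_group, max_gap=1)
non_ahe_support = support("ABAE", non_ahe_group, max_gap=1)
```

In this sketch a pattern would count as a contrast pattern when `ahe_support` exceeds a chosen minimum threshold and `non_ahe_support` stays below a chosen maximum threshold; both thresholds are left to the user.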

Our approach was accordingly able to extract a sequential pattern such as ABAEDBBBCA, which was prominent in acute hypotensive patients. Since the range of mean arterial pressure (MAP) values was divided into 5 equiprobable regions (denoted A, B, C, D, E), this pattern indicates that a majority of the AHE patients record an episode in which the MAP value passes through the blood pressure regions in the order A → B → A → E → D → B → B → B → C → A. Extracting a pattern of this nature, which may occur regularly in AHE patients but not in patients without AHE, provides a sound basis for a medical analysis of the given train of events. If we could later derive the exact symptomatic signs displayed by a patient during this chain of events, we could establish a potential combination of observable physical indicators that precede an AHE. Traditional machine learning models such as SVMs and neural networks are typically unable to extract such patterns, which may lead to the discovery of a sequence of important stages or events that possibly lead to a critical condition. A higher frequency of complex contrast sequences in hypotensive than in normotensive patient groups may thus help a clinician develop a clinical hypothesis relating a succession of clinical events to an AHE, and extracting sequential patterns from hypotensive patient groups can inform decision-making in the diagnosis and investigation of AHEs. Moreover, the MDS method is flexible enough to accommodate clinician-defined constraints for detecting specific types of patterns in medical care databases. Future work in this area would concentrate on mining contrast patterns for larger populations with AHE.
Although earlier studies have reported good results using MAP, the development of robust real-time predictors using multivariate physiological data remains an open area. Additionally, a significant problem in predicting hypotensive events arises when the prediction window is placed further away from the observation window.
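
The mapping from MAP values to the symbols A to E discussed above can be sketched as follows. This is a minimal illustration that assumes a SAX-style scheme: the series is z-normalized and cut at the standard normal quantiles, so that each of the five regions is equally probable under a Gaussian assumption. The function names, the choice of A as the lowest region, the omission of any window averaging step, and the handling of a flat signal are assumptions for illustration and may differ from the preprocessing used in the study.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev
from typing import List, Sequence

ALPHABET = "ABCDE"  # assumption: A is the lowest MAP region, E the highest


def breakpoints(n_symbols: int) -> List[float]:
    """Cut points that split a standard normal into n_symbols equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / n_symbols) for k in range(1, n_symbols)]


def discretize(map_values: Sequence[float], alphabet: str = ALPHABET) -> str:
    """Convert a MAP series into a symbol string, one symbol per sample."""
    mu = mean(map_values)
    sigma = pstdev(map_values)
    if sigma == 0:
        # Flat signal: place every sample in the middle region (assumption).
        return alphabet[len(alphabet) // 2] * len(map_values)
    cuts = breakpoints(len(alphabet))
    # bisect_right gives the index of the region that each z-score falls into.
    return "".join(
        alphabet[bisect_right(cuts, (v - mu) / sigma)] for v in map_values
    )


# Hypothetical MAP readings in mmHg, invented for this example.
symbols = discretize([50, 60, 70, 80, 90])
```

A symbol string produced this way can then be searched for patterns such as those in Table 4.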

Conclusion

In this study, we present a novel application of contrast sequential pattern mining to predict acute hypotension in critical care patients. Mining sequential patterns has typically been a difficult problem for long streaming sequences. This study also aimed to introduce the extraction of contrast sequences for predicting critical conditions in ICUs. The importance of this work lies in introducing a powerful knowledge discovery process, based on sequential pattern mining, for analyzing time series data towards critical care predictions. In the future, we intend to enhance the existing framework for real-time analyses while considering temporal associations and computational speed, in the context of specific issues in intensive care units.

Acknowledgments

Shameek Ghosh would like to thank Dr. Qian Liu, Jing Ren and Renhua Song from the Advanced Analytics Institute, the University of Technology, Sydney for their comments about the manuscript.

References

  1. Roth RN, Idris AH, Fowler. Hypotension and shock. In: Kuehl AE, editor. Prehospital systems and medical oversight. 3rd ed. Dubuque: Kendall/Hunt; 2002.
  2. Lilly CM, Cody S, Zhao H, Landry K, Baker SP, McIlwaine J, et al. Hospital mortality, length of stay, and preventable complications among critically ill patients before and after tele-ICU reengineering of critical care processes. JAMA. 2011;305(21):2175–2183. doi: 10.1001/jama.2011.697.
  3. Pinsky MR. Hemodynamic evaluation and monitoring in the ICU. CHEST Journal. 2007;132(6):2020–2029. doi: 10.1378/chest.07-0073.
  4. Mayaud L, Lai PS, Clifford GD, Tarassenko L, Celi LA, Annane D. Dynamic Data During Hypotensive Episode Improves Mortality Predictions Among Patients With Sepsis and Hypotension*. Critical care medicine. 2013;41(4):954–962. doi: 10.1097/CCM.0b013e3182772adb.
  5. Ehlenbach WJ, Cooke CR. Making ICU Prognostication Patient Centered: Is There a Role for Dynamic Information?*. Critical care medicine. 2013;41(4):1136–1138. doi: 10.1097/CCM.0b013e31827c03eb.
  6. Hart A, Wyatt J. Evaluating black-boxes as medical decision aids: issues arising from a study of neural networks. Informatics for Health and Social Care. 1990;15(3):229–236. doi: 10.3109/14639239009025270.
  7. Truijen J, van Lieshout JJ, Wesselink WA, Westerhof BE. Noninvasive continuous hemodynamic monitoring. Journal of clinical monitoring and computing. 2012;26(4):267–278. doi: 10.1007/s10877-012-9375-8.
  8. Moody GB, Lehman LH. Predicting acute hypotensive episodes: The 10th annual physioNet/computers in cardiology challenge. Computers in Cardiology. 2009:541–544.
  9. Saeed M, Villarroel M, Reisner AT, Clifford G, Lehman LW, Moody G, Heldt T, Kyaw TH, Moody B, Mark RG. Multiparameter Intelligent Monitoring in Intensive Care II: a public-access intensive care unit database. Crit Care Med. 2011;39:952–960. doi: 10.1097/CCM.0b013e31820a92c6.
  10. Chen X, Xu D, Zhang G, Mukkamala R. Forecasting acute hypotensive episodes in intensive care patients based on a peripheral arterial blood pressure waveform. Computers in Cardiology. 2009:545–548.
  11. Henriques J, Rocha TR. Prediction of acute hypotensive episodes using neural network multi-models. Computers in Cardiology. 2009:549–552. doi: 10.1016/j.compbiomed.2011.07.006.
  12. Mneimneh MA, Povinelli RJ. A rule-based approach for the prediction of acute hypotensive episodes. Computers in Cardiology. 2009:557–560.
  13. Fournier PA, Roy JF. Acute hypotension episode prediction using information divergence for feature selection, and non-parametric methods for classification. Computers in Cardiology. 2009:625–628.
  14. Ho TCT, Chen X. Utilizing histogram to identify patients using pressors for acute hypotension. Computers in Cardiology. 2009:797–800.
  15. Wang F, Lee N, Hu J, Sun J, Ebadollahi S, Laine AF. A Framework for Mining Signatures from Event Sequences and Its Applications in Healthcare Data. IEEE Trans Pattern Anal Mach Intell. 2013;35(2):272–285. doi: 10.1109/TPAMI.2012.111.
  16. Wang F, Lee N, Hu J, Sun J, Ebadollahi S. Towards heterogeneous temporal clinical event pattern discovery: a convolutional approach; Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining; 2012. pp. 453–461.
  17. Syed Z, Stultz C, Kellis M, Indyk P, Guttag J. Motif discovery in physiological datasets: a methodology for inferring predictive elements. ACM Trans Knowl Discov Data. 2010;4(1):2. doi: 10.1145/1644873.1644875.
  18. Lin J, Keogh E, Lonardi S, Chiu B. A symbolic representation of time series, with implications for streaming algorithms; Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery; 2003. pp. 2–11.
  19. Keogh E, Chakrabarti K, Pazzani M, Mehrotra S. Dimensionality reduction for fast similarity search in large time series databases. Knowledge and information Systems. 2001;3(3):263–286.
  20. Dong G, Li J. Efficient mining of emerging patterns: Discovering trends and differences; Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining; 1999. pp. 43–52.
  21. Ji X, Bailey J, Dong G. Mining minimal distinguishing subsequence patterns with gap constraints. Knowledge and Information Systems. 2007;11(3):259–286.
  22. Ayres J, Flannick J, Gehrke J, Yiu T. Sequential pattern mining using a bitmap representation; Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining; 2002. pp. 429–435.
  23. Jousset F, Lemay M, Vesin JM. Computers in cardiology/physioNet challenge 2009: Predicting acute hypotensive episodes. Computers in Cardiology. 2009;36:637–640.
  24. Singh A, Tamminedi T, Yosiphon G, Ganguli A, Yadegar J. Hidden Markov Models for modeling blood pressure data to predict acute hypotension; IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP); IEEE; 2010. pp. 550–553.

Articles from AMIA Annual Symposium Proceedings are provided here courtesy of American Medical Informatics Association
