Author manuscript; available in PMC: 2024 Apr 1.
Published in final edited form as: J Biomed Inform. 2023 Mar 20;140:104344. doi: 10.1016/j.jbi.2023.104344

Discovering interpretable medical process models: A case study in trauma resuscitation

Keyi Li a,*, Ivan Marsic a, Aleksandra Sarcevic b, Sen Yang d, Travis M Sullivan c, Peyton E Tempel c, Zachary P Milestone c, Karen J O’Connell c, Randall S Burd c
PMCID: PMC10111432  NIHMSID: NIHMS1886015  PMID: 36940896

Abstract

Understanding the actual work (i.e., “work-as-done”) rather than theorized work (i.e., “work-as-imagined”) during complex medical processes is critical for developing approaches that improve patient outcomes. Although process mining has been used to discover process models from medical activity logs, it often omits critical steps or produces cluttered and unreadable models. In this paper, we introduce a Trace Alignment-based Process Discovery method called TAD Miner to build interpretable process models for complex medical processes. TAD Miner creates simple linear process models using a threshold metric that optimizes the consensus sequence to represent the backbone process, and then identifies both concurrent activities and uncommon-but-critical activities to represent the side branches. TAD Miner also identifies the locations of repeated activities, an essential feature for representing medical treatment steps. We conducted a study using activity logs of 308 pediatric trauma resuscitations to develop and evaluate TAD Miner. TAD Miner was used to discover process models for five resuscitation goals, including establishing intravenous (IV) access, administering non-invasive oxygenation, performing back assessment, administering blood transfusion, and performing intubation. We quantitatively evaluated the process models with several complexity and accuracy metrics, and performed qualitative evaluation with four medical experts to assess the accuracy and interpretability of the discovered models. Through these evaluations, we compared the performance of our method to that of two state-of-the-art process discovery algorithms: Inductive Miner and Split Miner. The process models discovered by TAD Miner had lower complexity and better interpretability than the state-of-the-art methods, and the fitness and precision of the models were comparable.
We used the TAD process models to identify (1) the errors and (2) the best locations for the tentative steps in knowledge-driven expert models. The knowledge-driven models were revised based on the modifications suggested by the discovered models. The improved modeling using TAD Miner may enhance understanding of complex medical processes.

Keywords: Process mining, Knowledge discovery, Resuscitation, Consensus sequence

1. Introduction

Evaluation and treatment processes in high-acuity medical settings are complex and often performed by multidisciplinary teams. While efficient communication and coordination among team members can prevent errors, these aspects of work may be challenging to achieve in time-critical and dynamic scenarios. Discovering meaningful process models from recorded data can help medical teams understand the actual work practices (i.e., “work-as-done”), to inform new approaches for team training and improving patient care. Process mining techniques have been widely used to automatically discover process models from activity logs obtained from reviewing the performed activities [1]. Given the dynamic and loosely structured nature of medical work, using traditional process mining methods for complex medical processes may produce cluttered, spaghetti-like models that include all possible paths and can be difficult to interpret. Most process mining algorithms have been optimized to identify the essential process and usually filter out uncommon steps as noise [2,3,4]. Applying these algorithms to a complex medical process may filter out uncommon-but-critical activities and pathways. A major challenge in generating data-driven medical process models is ensuring interpretability without losing critical information [5,6].

Trace alignment algorithms may be more suitable for analyzing complex and less structured processes. They generate readable, easy-to-interpret linear sequence of activities that are common across different process executions (“consensus sequence”) from the alignment matrix [7,8,9]. The consensus sequence can represent the backbone (i.e., essential activities) of a process. During a medical process, providers may perform the same activity multiple times for different purposes. Using the linear consensus sequence, we can accurately identify the location of repeated activities in the process, which is a required feature for analyzing medical processes. Despite the advantages of trace alignment algorithms, we still lack an approach for transferring an alignment into a process model. Current knowledge gaps in process discovery from trace alignment include the absence of several features: (1) a standard rule for selecting a frequency threshold to generate the most representative consensus sequence, (2) an approach for automatically identifying the concurrent activities of a process, and (3) an approach for including uncommon-but-critical steps into the process model.

To address these gaps, we introduce TAD Miner, a method for discovering models from complex processes by aligning their activity traces. As part of this method, we developed an algorithm for finding an optimal threshold for the frequency of activities that form the consensus sequence. Because collecting a large number of logs for complex medical processes is challenging, we also designed a statistical approach to determine if the log size is sufficient for meaningful process mining. TAD Miner identifies concurrent and uncommon activities in the log, ensuring the accuracy of the process model. We applied TAD Miner to real-world medical activity logs from 308 pediatric trauma resuscitations. Trauma resuscitation, the early evaluation and management of injured patients in the emergency department, is a time-critical process that requires the simultaneous pursuit of several resuscitation goals. TAD Miner discovered interpretable process models for five resuscitation goals, including establishing intravenous (IV) access, administering non-invasive oxygenation, performing back assessment, administering blood transfusion, and performing intubation. To evaluate TAD Miner, we compared the TAD models with models generated by two established process discovery methods: Inductive Miner (IM) and Split Miner (SM). Eight quality metrics were used to quantitatively compare the performance of these models. As a result, we could provide possible modifications for two resuscitation goals (blood transfusion and intubation) by identifying the differences between the knowledge-driven and discovered models. In addition, four medical experts qualitatively assessed the accuracy and interpretability of these models. The models discovered by TAD Miner had better interpretability than IM and SM models, while achieving comparable accuracy.

Our study provides an approach for discovering an accurate and interpretable process model of complex medical processes. Our contributions are threefold: (1) an automatic process discovery algorithm based on trace alignment to extract interpretable process models from activity logs; (2) an evaluation of the process models’ interpretability; and (3) knowledge discovery using data-driven process models that assist experts in better understanding complex medical processes.

2. Related work

2.1. Process discovery techniques

Heuristics Miner [2] and Fuzzy Miner [3] have been the most common automatic process discovery methods in medical domains [10]. Both algorithms build a directly-follows graph (DFG) of the process model based on dependencies between the activities. Heuristics Miner is robust to infrequent steps in a dataset, treating these steps as noise and preventing their visualization in the DFG. Fuzzy Miner provides tunable hyperparameters on the activity frequency or the edge frequency, allowing the generation of models at a desired level of abstraction. Users can prune the number of activities or edges to simplify the model. Despite these advantages, both algorithms can lose information when generating models for unstructured medical processes with considerable variability.

To avoid losing information while also retaining an interpretable structure, alternative process discovery methods that generate simple and precise models have been proposed. For example, the state-of-the-art methods like Inductive Miner [11] and Split Miner [12] have achieved low complexity, high fitness, and high precision. Inductive Miner and Split Miner build models using business process modeling notation (BPMN) that can express the precise semantics of the process [13]. Based on an initially created DFG, Inductive Miner first iteratively filters out the infrequent paths and identifies the cuts (i.e., control-flow dependencies such as parallelism, iteration, and sequential and exclusive dependency) in the DFG to achieve high fitness, and then converts the DFG into a BPMN model. Split Miner first identifies the concurrencies based on an initial DFG and then filters the invalid paths to achieve high accuracy and low complexity. The split and join gateways are then added to the filtered DFG to generate the BPMN model.

However, these existing methods treat repeated activities as equivalent and aggregate them into a single activity in the model. This aggregation introduces cycles in the model that may not exist in the actual workflow. In complex medical processes, an activity may be performed multiple times because of the need for repeated assessments, different clinical goals, or failed initialization. Simply merging the repeated activities may cause information loss. To address this limitation, different data preprocessing approaches have been introduced that can refine the activity log [14,15]. However, these preprocessing approaches often require additional effort to manually label the repeated activities or obtain context information for each activity in the log.

A process model can be more informative if activities are shown in a linear structure. Trace alignment algorithms can derive a consensus sequence without any cycles by aligning the activities of every trace from an activity log [16,17]. Common and uncommon activities in the consensus sequence can help distinguish between “routine” and “non-routine” steps while also avoiding the loss of critical information [9]. The activity logs derived from clinical processes are less structured and may be incomplete because some activities do not need to be performed in all cases. The trace alignment method aligns the common activities, which can fill in the missed or skipped parts and build a complete process model [16]. Because of these advantages, we adopted the trace alignment algorithm as a basis for developing our approach for the process model discovery.

In addition to process mining algorithms, there are temporal data mining methods such as time interval related patterns (TIRPs) mining [18] and Careflow mining [19] that have been frequently used for discovering the common procedures from medical events. The TIRPs mining algorithm discovers the frequent time interval patterns occurring inside each sequence in the dataset based on the temporal information of each event. Careflow mining algorithm derives multiple linear procedures of medical activities by accumulating the number of variant activities at each timestamp. It aims to identify the temporal phenotypes in a cohort from the electronic health records (EHR) data. Unlike these temporal data mining methods, our study focuses on discovering a process model for each clinical goal based on the order of the start point of each activity in the log. Since the time interval information for each activity is not used in our context, these temporal data mining methods are not applicable in our study.

2.2. Evaluating the interpretability of process models

Interpretability is defined as having evident reasoning about how and why a model makes predictions or decisions. People tend to understand and trust the models with fewer but more relevant features [20]. A process model is more interpretable when it: (1) has a simple structure, i.e., lower complexity and (2) allows people to identify the activities that are relevant to the process. The complexity metrics associated with the number of nodes and edges of the process models have been shown to affect the interpretability of the models [21]. Other evaluation metrics for assessing process models include fitness and precision. The fitness score represents the ability of a model to replay the processes in the activity log, while the precision score represents the ability of a model to produce the processes in the activity log [22,23]. In addition to these quantitative metrics, human interpretation and assessment of the models are also critical, especially in the healthcare domains [6].

3. Method

TAD Miner generates a process model from trace alignment by: (1) discovering the backbone process by identifying an optimal frequency threshold that selects the most representative consensus sequence, (2) validating the stability of the consensus sequence based on a random sampling approach, and (3) discovering the side branches of the process model by identifying concurrent and uncommon activities. TAD Miner uses DFG to represent the process model. We introduced new notations in the DFG to better represent the semantics of the process. We next define the terms and definitions used in this paper and then describe each aspect of the TAD Miner approach in detail.

3.1. Terms and definitions

We define the activity log $L$ as a collection of all process traces: $L = \{T_1, T_2, \ldots, T_m\}$. A trace $T_i$ records a series of activities in one process case and orders them based on their starting time (e.g., $T_i = \langle a_1, a_3, a_2, a_4, a_5, \ldots \rangle$, where $a_i$ is an activity type).

A DFG process model $\lambda = (A, E)$ describes the steps and dependencies of activities during a process. $A = \{a_1, a_2, \ldots, a_n\}$ is a set of activities that occurred in the activity log, where $n$ is the number of activity types. $E$ contains the directed edges that describe the dependencies between the activities.

We define $a_i \rightarrow a_j$ as a sequential dependency in which activity $a_j$ is performed after activity $a_i$. We define $a_i \parallel (a_j \rightarrow a_k \rightarrow \cdots \rightarrow a_e)$ as a parallel dependency in which activity $a_i$ is performed concurrently with the process fragment $a_j \rightarrow a_k \rightarrow \cdots \rightarrow a_e$.

3.2. Backbone process discovery

A backbone process highlights the main execution steps in the process. We first developed a metric for discovering the backbone from trace alignment results. Because the dataset may be small, we introduced a method to determine if the size of the dataset is sufficient by testing the backbone’s stability under smaller data subsets.

3.2.1. Trace alignment algorithm

Trace alignment algorithms aim to align the elements in an activity log $L$. These algorithms better explore loosely structured processes by aligning the same types of activities in the same columns [7]. Trace alignment algorithms take an activity log and return a 2-D alignment matrix $M_{m \times l}$, where the number of rows $m$ is the total number of process traces in $L$, and the number of columns $l$ is the number of activities in the aligned dataset. $M_{m \times l}$ contains one process trace instance $T$ per row. For a given column $r$, each row contains a single type of activity or a gap (empty cell). The number of rows in which this activity occurred in the given column represents the frequency of that activity $fr_{a_i}$ $(0 < r < l)$ and corresponds to the level of consensus in column $r$, i.e., the position $r$ in the process. If the activity did not occur in a given row, a gap is inserted to ensure the alignment of other columns. An optimal solution for $M_{m \times l}$ is obtained by iteratively minimizing the number of gaps and maximizing the number of activities in the columns. Consensus activities $a_i^{(c)}$ $(0 < i < n)$ are columns in $M_{m \times l}$ with activity frequency larger than a given threshold $t$ (Fig. 1). We define a consensus sequence $CS$ as a series of consensus activities (e.g., $\langle a_1^{(c)}, a_2^{(c)}, \ldots \rangle$). The consensus activities in $CS$ have sequential dependencies. By adding directed edges between them, the consensus activities construct the backbone of a process model (e.g., $a_1^{(c)} \rightarrow a_2^{(c)}$).

Fig. 1.

Example 2-D alignment matrix $M$ derived from five process traces. At the bottom, the consensus activities are selected by threshold $t$. The consensus activities constitute the backbone of the process: $a_1^{(c)} \rightarrow a_2^{(c)} \rightarrow a_2^{(c)} \rightarrow a_1^{(c)} \rightarrow a_3^{(c)} \rightarrow a_3^{(c)}$.
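The selection of consensus activities from an alignment matrix can be illustrated with a minimal Python sketch. It assumes that the matrix is stored as a list of rows, that gaps are encoded as `None`, and that the column frequency is the fraction of rows in which the activity occurs. The function name `consensus_sequence` and the toy matrix are hypothetical and do not reproduce the data in Fig. 1.

```python
from collections import Counter

GAP = None  # assumed marker for an empty cell in the alignment matrix


def consensus_sequence(matrix, t):
    """Return the consensus activities of an alignment matrix for threshold t.

    matrix: list of rows (one aligned trace per row), all of equal length.
    t: frequency threshold in [0, 1]; a column is kept if its frequency > t.
    """
    m = len(matrix)
    consensus = []
    for column in zip(*matrix):
        counts = Counter(cell for cell in column if cell is not GAP)
        if not counts:
            continue  # column holds only gaps
        activity, count = counts.most_common(1)[0]
        if count / m > t:
            consensus.append(activity)
    return consensus


# Hypothetical alignment of five traces over four columns (not the data of Fig. 1).
M = [
    ["a1", "a2", GAP, "a3"],
    ["a1", "a2", "a4", "a3"],
    ["a1", GAP, GAP, "a3"],
    ["a1", "a2", GAP, "a3"],
    [GAP, "a2", GAP, "a3"],
]

print(consensus_sequence(M, 0.5))  # ['a1', 'a2', 'a3']
```

In this toy matrix the column holding `a4` has a frequency of 0.2 and is dropped at $t = 0.5$, whereas a lower threshold such as $t = 0.1$ would keep it.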

In this work, we used an existing optimized trace alignment algorithm, process-oriented iterative multiple alignment (PIMA) [24], to identify the optimal solution of an alignment matrix. PIMA achieves a more precise alignment with a time complexity of $O(NL^2)$, which is a factor of $O(N)$ lower than the $O(N^2L^2)$ time complexity of the traditional trace alignment algorithm [16].

3.2.2. Threshold selection for consensus sequence

We aimed to extract a representative consensus sequence $CS = \langle a_i^{(c)}, a_j^{(c)}, \ldots, a_k^{(c)} \rangle$ from an alignment matrix. A frequency threshold $t$ controls the number of activities included in a $CS$. A higher threshold filters out infrequent activities, leading to a sequence with fewer activities. A lower threshold leads to more activities in the sequence and a potentially cluttered process model. We used the following approach to identify the optimal threshold.

We defined a consensus loss metric CL as the sum of distances between CS and each process trace Ti in the alignment matrix:

$CL = \sum_{i=1}^{m} ED(CS, T_i)$ (1)

where $ED$ is the edit distance (Levenshtein distance) [25] and $m$ is the number of traces (the number of rows of $M_{m \times l}$). We applied a grid search over the threshold (Algorithm 1) and selected the output sequence with minimal consensus loss.

Algorithm 1.

where q represents the increment of threshold t. Given the activity log L, we first generated the alignment matrix using the trace alignment algorithm. For each threshold, we derived the CS and calculated the consensus loss (summation of the edit distance) between CS and every trace T in L. Algorithm 1 returns the threshold that gives the minimal loss. This threshold derives a CS that is most similar to traces in the dataset.
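The grid search described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the original listing of Algorithm 1: gaps are encoded as `None`, the increment is expressed as `steps` (so that $q = 1/\text{steps}$), thresholds are handled as exact fractions to avoid floating-point comparisons, and ties are resolved in favor of the lowest threshold. The names `edit_distance`, `consensus_sequence`, `select_threshold`, and the toy matrix `M` are hypothetical.

```python
from fractions import Fraction

GAP = None  # assumed marker for an empty cell in the alignment matrix


def edit_distance(a, b):
    """Levenshtein distance between two activity sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def consensus_sequence(matrix, t):
    """Keep the activity of every column whose frequency exceeds t."""
    m = len(matrix)
    cs = []
    for column in zip(*matrix):
        cells = [c for c in column if c is not GAP]
        if cells and Fraction(len(cells), m) > t:
            cs.append(cells[0])  # assumes one activity type per column
    return cs


def select_threshold(matrix, steps=20):
    """Grid search over t = 0, 1/steps, ..., 1 for the minimal consensus loss."""
    traces = [[c for c in row if c is not GAP] for row in matrix]
    best = None
    for k in range(steps + 1):
        t = Fraction(k, steps)
        cs = consensus_sequence(matrix, t)
        loss = sum(edit_distance(cs, trace) for trace in traces)  # Eq. (1)
        if best is None or loss < best[1]:
            best = (t, loss, cs)
    return best


# Hypothetical alignment of five traces over four columns.
M = [
    ["a1", "a2", GAP, "a3"],
    ["a1", "a2", "a4", "a3"],
    ["a1", GAP, GAP, "a3"],
    ["a1", "a2", GAP, "a3"],
    [GAP, "a2", GAP, "a3"],
]

t, loss, cs = select_threshold(M)
print(t, loss, cs)  # 1/5 3 ['a1', 'a2', 'a3']
```

For this toy matrix the loss is 6 while the rare activity `a4` is still included, 3 once it is filtered out, and 9 or more when only `a3` or nothing remains.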

3.2.3. Consensus sequence stability

Manually coding activity logs from medical processes is labor-intensive and usually requires costly domain expertise. Because data collection is challenging, datasets are often small and lead to biased consensus sequences when analyzed. We designed an approach to evaluate the stability of the consensus sequences generated under different data sizes. If a CS generated from a random sample was the same or close to a CS generated from the whole dataset, we considered the sample size sufficient to produce a representative result. We used the edit distance to measure the similarity between consensus sequences [17].

If we randomly select a sample of $k$ traces from $m$ traces, the number of possible combinations $N$ will be $\binom{m}{k}$. When $N$ is large, it is challenging to generate $CS$ for all possible samples. For this reason, we needed an optimal number for repeatedly taking random samples from the whole dataset. We adopted the Cochran formula to calculate the optimal number of repetitions given a desired confidence level, precision, and estimated proportion of the samples present in the population:

$s = \frac{Z^2 p (1 - p)}{e^2}$ (2)

where:

  • $s$ is the number of sampling repetitions

  • $e$ is a given level of precision

  • $p$ is a given proportion of $N$

  • $Z$ is the z-score for the desired confidence level, found in the Z table [26]

For a small population size, we used a modified equation to calculate the optimal number of repetitions $s'$:

$s' = \frac{s}{1 + \frac{s - 1}{N}}$ (3)

where $s$ is the value obtained from Eq. (2) and $N$ is the number of possible sample combinations for a given $k$.

In our work, the population size $N$ was the number of possible choices for trace selections from the activity log. We assumed that half of these choices could generate a consensus sequence close to that derived from all data (i.e., $p = 0.5$). We used a 95% confidence level, which corresponds to a z-score of 1.96. We set the precision level $e$ to 0.05 based on sampling techniques from the literature [27,28].
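As a worked illustration of Eqs. (2) and (3) with the parameter values above, the following Python sketch computes the number of repetitions. The function names and the small population size of 1000 are hypothetical, and rounding up to the next integer is an assumption of this sketch.

```python
import math


def cochran_repetitions(z=1.96, p=0.5, e=0.05):
    """Eq. (2): number of repetitions for a large population."""
    return z ** 2 * p * (1 - p) / e ** 2


def finite_population_correction(s, population):
    """Eq. (3): adjusted number of repetitions for a small population."""
    return s / (1 + (s - 1) / population)


s = cochran_repetitions()
print(math.ceil(s))  # 385 (s is approximately 384.16)

# Hypothetical small population of 1000 possible samples.
s_small = finite_population_correction(s, population=1000)
print(math.ceil(s_small))  # 278
```

When the population is very large, the correction in Eq. (3) has almost no effect and the result stays at 385 repetitions.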

To ensure that the identified consensus sequence was stable with a smaller dataset, we first calculated the edit distances between the CSs generated by different sample sizes k and the CS generated by the whole dataset using the optimal threshold. For each sample size, we performed random sampling s times to avoid selection bias. A sufficient sample size k0 was found when the mean value of edit distances converged.
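The stability check can be sketched in Python as follows. The sketch assumes a callable `derive_cs` that maps a list of traces to a consensus sequence; in our method this would be trace alignment followed by threshold selection, but here a deliberately simple stand-in (`toy_consensus`) is used so that the example is self-contained. All names and the toy log are hypothetical.

```python
import random
from statistics import mean


def edit_distance(a, b):
    """Levenshtein distance between two activity sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def toy_consensus(traces):
    """Stand-in for alignment plus thresholding (NOT the TAD Miner procedure):
    keep activities present in more than half of the traces, ordered by the
    mean position of their first occurrence."""
    positions = {}
    for trace in traces:
        seen = set()
        for idx, act in enumerate(trace):
            if act not in seen:
                seen.add(act)
                positions.setdefault(act, []).append(idx)
    frequent = [a for a, pos in positions.items() if len(pos) > len(traces) / 2]
    return sorted(frequent, key=lambda a: (mean(positions[a]), a))


def stability_curve(log, derive_cs, sample_sizes, repetitions, seed=0):
    """Mean edit distance between sample-based and full-log consensus sequences."""
    rng = random.Random(seed)
    reference = derive_cs(log)
    curve = {}
    for k in sample_sizes:
        distances = [edit_distance(derive_cs(rng.sample(log, k)), reference)
                     for _ in range(repetitions)]
        curve[k] = mean(distances)
    return curve


# Hypothetical log: most traces follow a-b-c, a few deviate.
log = [["a", "b", "c"]] * 8 + [["a", "c"], ["b", "d", "c"]]
curve = stability_curve(log, toy_consensus, sample_sizes=[2, 5, 10], repetitions=50)
print(curve)
```

A sufficient sample size is then read off as the smallest $k$ beyond which the mean distance no longer changes appreciably.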

3.3. Side branch discovery

In addition to the backbone, our interpretable process model uses parallel branches to represent concurrent activities. Complex medical processes involve many activity sequences performed concurrently by team members. To model this parallelism, we leveraged the information from non-consensus activities. Although the non-consensus activities are usually infrequent or may represent process “noise,” some are frequent but dispersed within the traces and represent parallel activities. Some infrequent activities may also need to be included in the process model because they are critical to patient care. We defined three activity types (Fig. 2) and proposed Algorithm 2 to identify these activities:

  • Consensus activity $a_i^{(c)}$ $(0 < i < n)$: An activity in a consensus sequence (Algorithm 2, Step 3).

  • Common-but-dispersed activity $a_j^{(d)}$ $(0 < j < n)$: An activity dispersed across the columns of an alignment matrix that has a total occurrence frequency within a given range (Algorithm 2, Step 4 to Step 8).

  • Uncommon-but-critical activity $a_k^{(u)}$ $(0 < k < n)$: An infrequent activity defined as critical by domain experts (Algorithm 2, Step 9 to Step 10).

where $pos$ in Algorithm 2 records the column position of activity $a_i$ in $M_{m \times l}$. The common-but-dispersed activities $a_j^{(d)}$ were originally dispersed throughout many columns in the trace alignment. If the sum of the frequencies of an activity $a_j^{(d)}$ was within a custom-selected range, we defined it as a common-but-dispersed activity. In our experiments, we set this range as $[t, 1)$, where $t$ was the threshold found by Algorithm 1. A common-but-dispersed activity $a_j^{(d)}$ is spread across a fragment of the backbone process (e.g., $a_1^{(c)} \rightarrow a_2^{(c)} \rightarrow a_3^{(c)}$), meaning that $a_j^{(d)}$ is likely to be performed in parallel with those consensus activities (i.e., $a_j^{(d)} \parallel (a_1^{(c)} \rightarrow a_2^{(c)} \rightarrow a_3^{(c)})$). For many workflows, some infrequent activities that were included in neither $CS$ nor $S_d$ (the set of common-but-dispersed activities) may still be critical. These activities $a_k^{(u)}$ were defined as uncommon-but-critical activities and were specified by medical experts before model development.
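The three activity types can be illustrated with a short Python sketch. It is not the original listing of Algorithm 2. It assumes that gaps are encoded as `None`, that each column holds one activity type, that the frequency of a dispersed activity is summed over its non-consensus columns and divided by the number of traces, and that the position of a side activity is recorded as the span between its first and last column. The function name, the toy matrix, and the expert-provided set `critical` are hypothetical.

```python
GAP = None  # assumed marker for an empty cell in the alignment matrix


def classify_activities(matrix, t, critical):
    """Split aligned activities into consensus, common-but-dispersed,
    and uncommon-but-critical groups, each with column positions."""
    m = len(matrix)
    consensus, scattered = [], {}
    for pos, column in enumerate(zip(*matrix)):
        cells = [c for c in column if c is not GAP]
        if not cells:
            continue
        activity = cells[0]  # assumes one activity type per column
        if len(cells) / m > t:
            consensus.append((activity, pos))
        else:
            scattered.setdefault(activity, []).append((pos, len(cells)))

    dispersed, uncommon = [], []
    for activity, columns in scattered.items():
        total = sum(count for _, count in columns) / m
        span = (columns[0][0], columns[-1][0])  # first and last column
        if t <= total < 1:
            dispersed.append((activity, span))
        elif activity in critical:
            uncommon.append((activity, span))
    return consensus, dispersed, uncommon


# Hypothetical alignment of five traces over six columns.
M = [
    ["a1", "a4", "a2", GAP, GAP, "a3"],
    ["a1", GAP, "a2", "a4", GAP, "a3"],
    ["a1", GAP, "a2", "a4", GAP, "a3"],
    ["a1", GAP, "a2", GAP, "a5", "a3"],
    ["a1", GAP, GAP, GAP, GAP, "a3"],
]

consensus, dispersed, uncommon = classify_activities(M, t=0.5, critical={"a5"})
print(consensus)  # [('a1', 0), ('a2', 2), ('a3', 5)]
print(dispersed)  # [('a4', (1, 3))]
print(uncommon)   # [('a5', (4, 4))]
```

Here `a4` never reaches the threshold in a single column but occurs in three of five traces overall, so it is treated as common-but-dispersed, while `a5` is kept only because it is listed as critical.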

Fig. 2.

The alignment result. From the result, we found the consensus activities and the non-consensus activities (e.g., common-but-dispersed activity $a_4^{(d)}$ and uncommon-but-critical activity $a_5^{(u)}$).

The consensus activities constructed the backbone for the model (e.g., $a_1^{(c)} \rightarrow a_2^{(c)}$). Side branches containing common-but-dispersed activities and uncommon-but-critical activities were added to the backbone. We proposed an algorithm that takes the alignment matrix $M$, $CS$, $S_d$, and $S_u$, and returns a process model $\lambda$ (Algorithm 3, Fig. 3). In Step 1, we used directed edges to link the consensus activities and built the backbone of the process model. In Step 2 to Step 7, we iterated over the activities in $S_d = \{(a_j^{(d)}, pos)\}$ $(0 < j < n, 0 < pos < l)$ and $S_u = \{(a_k^{(u)}, pos)\}$ $(0 < k < n, 0 < pos < l)$. For the activities in $S_d$ and $S_u$, we found the nearest preceding and succeeding consensus activities based on their positions, and then added them as parallel side branches in between.

Fig. 3.

The procedure of process model construction. The backbone is constructed by consensus activities and side branches are constructed by non-consensus activities.
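The construction procedure can be sketched in Python as follows. This is an illustrative sketch rather than the original listing of Algorithm 3. It assumes that consensus activities are given as `(activity, column)` pairs, that side activities are given as `(activity, (first_column, last_column))` pairs, and that artificial start and end nodes bracket the backbone. The function name and the toy inputs are hypothetical.

```python
import bisect


def build_model(consensus, branches, n_cols):
    """Return the directed edges of a backbone with parallel side branches.

    consensus: list of (activity, column) pairs forming the backbone.
    branches: list of (activity, (first_column, last_column)) side activities.
    n_cols: number of columns in the alignment matrix.
    """
    start, end = ("start", -1), ("end", n_cols)
    backbone = [start] + sorted(consensus, key=lambda node: node[1]) + [end]
    edges = list(zip(backbone, backbone[1:]))  # link consecutive backbone nodes

    positions = [pos for _, pos in backbone]
    for activity, (first, last) in branches:
        # Nearest backbone node strictly before the first occurrence.
        pred = backbone[bisect.bisect_left(positions, first) - 1]
        # Nearest backbone node strictly after the last occurrence.
        succ = backbone[bisect.bisect_right(positions, last)]
        node = (activity, first)
        edges += [(pred, node), (node, succ)]
    return edges


# Hypothetical inputs: three consensus activities and two side activities.
consensus = [("a1", 0), ("a2", 2), ("a3", 5)]
branches = [("a4", (1, 3)), ("a5", (4, 4))]

for src, dst in build_model(consensus, branches, n_cols=6):
    print(src[0], "->", dst[0])
```

In this toy case `a4` spans the columns around `a2`, so it is attached between `a1` and `a3` as a branch parallel to `a2`, while `a5` is attached between `a2` and `a3`. Because every edge points from a lower to a higher column position, the resulting graph contains no cycles.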

3.4. Semantics of the process model discovered by TAD Miner

To aid medical experts in better understanding the model-derived steps, we used a modified Directly-Follows Graph (DFG) to represent the process model $\lambda$ discovered by TAD Miner (Fig. 3, right). We distinguished between the backbone and side-branch activity nodes using two fill colors and two outline styles. Grey nodes were used to represent the activities in the backbone (i.e., $a_i^{(c)}$), highlighting significant execution steps in the process. We then added the start and end nodes as the endpoints of the backbone. Next, side branches were added to the grey-node sequence. Because a traditional DFG cannot show concurrency within the process, we used white nodes with a solid-line border to represent concurrent activities (i.e., $a_j^{(d)}$). A concurrent activity that spans a backbone fragment represents an activity that is performed in parallel with that fragment (e.g., in Fig. 3, $a_4^{(d)} \parallel (a_1^{(c)} \rightarrow a_2^{(c)} \rightarrow a_3^{(c)})$). We used white nodes with a dashed-line border to represent infrequent activities (i.e., $a_k^{(u)}$). An infrequent activity spanning a backbone fragment will likely be performed in parallel with that fragment (e.g., in Fig. 3, $a_5^{(u)} \parallel (a_2^{(c)} \rightarrow a_3^{(c)})$). Some repeated activities may appear at different positions in the process model. We added an indexed suffix to distinguish between the repeated activities at different positions (e.g., in Fig. 3, an activity $a_2$ that appeared in two different positions is indexed as $a_2$-1 and $a_2$-2). A number (p) in parentheses shown in an activity node denotes this activity’s frequency. The DFG generated by TAD Miner is a directed acyclic graph (DAG), which has lower complexity and is easier to interpret.

3.5. Evaluation methods

3.5.1. Quantitative evaluation metrics

Based on the existing complexity measurements for process [29] and decision models [30], we adopted five metrics to evaluate our TAD Miner approach:

  • Number of Activities (NOA) [29]: the number of the activity nodes in a process model.

  • Number of Decisions (NOD) [30]: the number of the decision nodes in a process model. A model with a smaller NOD has fewer decisions and is less complex.

  • Cyclicity (CYC) [21]: the fraction of activities that belong to a cycle in a process model. A model with lower cyclicity has fewer and shorter loops, which makes it simpler and easier to follow.

  • Cyclomatic Complexity (CC) [30,31]: McCabe’s cyclomatic complexity is a widely used complexity metric. It is calculated as the number of edges ($E$) in a graph minus the number of nodes ($N$) plus two times the number of connected components $P$ (i.e., $CC = E - N + 2P$, which reduces to $CC = E - N + 2$ because our process model has one connected component). A higher cyclomatic complexity indicates that the model is more difficult to understand.

  • Sequentiality (SEQ) [30,32]: the number of parallel pathways in a model. A higher SEQ indicates that the model has more parallel pathways, making it look more like a parallel network than a sequence and resulting in a more complex model.

We also used three accuracy metrics, including fitness, precision, and F-score ($F\text{-score} = \frac{2 \times fitness \times precision}{fitness + precision}$), to evaluate the performance of the process models [22,23].
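Two of these metrics are simple enough to illustrate with a short Python sketch. It assumes that a model is given as a list of directed edges between named nodes and that the start and end nodes are not counted as activities. The function names and the toy edge list are hypothetical, and the fitness and precision values in the example are arbitrary rather than results from our experiments.

```python
def number_of_activities(edges, non_activity=("start", "end")):
    """NOA: number of activity nodes, excluding the start and end nodes."""
    nodes = {node for edge in edges for node in edge}
    return len(nodes - set(non_activity))


def cyclomatic_complexity(edges, components=1):
    """CC = E - N + 2P for a graph with P connected components."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * components


def f_score(fitness, precision):
    """Harmonic mean of fitness and precision."""
    if fitness + precision == 0:
        return 0.0
    return 2 * fitness * precision / (fitness + precision)


# Hypothetical model: a three-activity backbone with two side branches.
edges = [
    ("start", "a1"), ("a1", "a2"), ("a2", "a3"), ("a3", "end"),
    ("a1", "a4"), ("a4", "a3"),
    ("a2", "a5"), ("a5", "a3"),
]

print(number_of_activities(edges))   # 5
print(cyclomatic_complexity(edges))  # 3
print(f_score(0.5, 0.5))             # 0.5
```

The toy model has 8 edges and 7 nodes, so $CC = 8 - 7 + 2 = 3$. Each side branch adds one node and two edges and therefore raises the cyclomatic complexity by one.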

3.5.2. Evaluation of the process models for trauma resuscitation

To discover and evaluate the process models for pediatric trauma resuscitation, medical experts on our research team first identified the resuscitation goals and associated activities based on the Advanced Trauma Life Support (ATLS) protocol, a standardized evaluation and management procedure for injured patients [33,34]. The ATLS protocol is broadly categorized into the primary and secondary survey. The primary survey is designed to evaluate the patient’s airway, breathing, circulatory, and neurological systems for immediate life-threatening injuries, while the secondary survey assesses for other injuries. Although existing protocols provide algorithms to determine when to perform certain activities, little is known about the actual steps that providers take to perform these activities. Because providers may perform several interventions to evaluate and manage patients, medical experts on our team identified two multi-level major resuscitation goals, including (1) “intubation” and its subgoal “bilateral breath sounds assessment,” which aim to establish a patent airway and provide mechanical ventilatory support, and (2) “transfusion” and its subgoal “vital signs assessment,” which aim to restore blood volume and coagulation factors. Experts also identified three common goals that are routinely pursued during resuscitations, including (1) “intravenous (IV) access,” which aims to allow for fluid, blood product or medication infusion, (2) “non-invasive oxygenation,” which aims to improve tissue oxygenation with minimal intervention, and (3) “back assessment,” which aims to assess the potential spinal column and cord injuries. We designed an activity dictionary with more than 200 different resuscitation activity types. After identifying the resuscitation goals, we a priori identified the activities from our dictionary that are associated with these goals (Table 1).

Table 1.

Activity list for each resuscitation goal. (a) Associated activities for common resuscitation goals. (b) Associated activities for two multi-level major resuscitation goals and their subgoals.

(a)

| Common Resuscitation Goals | IV Access | Non-Invasive Oxygenation | Back Assessment |
|---|---|---|---|
| Associated Activities | IV placement | Bag ventilation | Roll patient to side (Log roll) |
| | IV placement confirmation | Passive oxygen applied | Visual inspection |
| | Drill to skin | Oxygen removed | Palpation (non-spine) |
| | | Oxygen preparation | C-spine |
| | | Oxygen held | T-spine |
| | | Oxygen | L-spine |
| | | | Rectal |

(b)

| Major Resuscitation Goals and Subgoals | Intubation | Intubation – Bilateral Breath Sounds Assessment | Transfusion | Transfusion – Vital Signs Assessment |
|---|---|---|---|---|
| Associated Activities | Airway Assessment | Endotracheal Tube Secured | Vital Signs Assessment | Heart Rate |
| | Breathing Assessment | Breathing Tube Depth from Lips - cm | Bleeding Identified | Peripheral Pulse Check |
| | Noninvasive Oxygenation | Chest X-Ray | Intravenous (IV) Access | Central Pulse Check |
| | Intravenous (IV) Access | Listen to Breath Sounds | Decision to Give Blood | Manual Blood Pressure |
| | Intubation Decision | | Blood Given | Automatic Blood Pressure |
| | Pre-oxygenation | | Crystalloid Given | |
| | Intubation Medicine | | External Hemorrhage Control | |
| | Intubation | | Vital Signs Reassessment | |
| | Bilateral Breath Sounds Assessment | | Vital Signs Reassessment after Blood | |
| | Readjust Tube | | | |
| | Ventilation | | | |

Medical researchers on our team then manually coded video recordings to create an activity log. To mitigate the possible coding inaccuracy and inconsistency, our team members only started coding the videos after they passed the inter-rater reliability test [35]. From the coded activity log, we extracted the process traces for each resuscitation goal based on the associated activities. We then applied our TAD Miner to discover the process models for each resuscitation goal. For each of the two major goals, the process models were separately discovered for the main goals and subgoals.

For qualitative evaluation, medical experts on our research team derived two knowledge-driven models for transfusion and intubation goals based on the established protocols [33,36]. The knowledge-driven models showed the theorized and optimal steps for performing these two interventions. By comparing the knowledge-driven models to the discovered models, we were able to suggest several modifications for the knowledge-driven models. We recruited four outside medical experts and asked them to evaluate the accuracy and interpretability of the models discovered through different methods using a survey approach.

To evaluate the ability of our approach to handle more complex processes, we conducted an additional case study on the entire primary survey from the ATLS protocol. The primary survey is longer and contains 61 activity types, compared to around 10 activity types within a single resuscitation goal.

4. Results from TAD Miner application in the trauma resuscitation domain

This study was approved by the Institutional Review Board at Children’s National Hospital in Washington, DC. Medical researchers in our team manually coded video recordings from 308 resuscitation cases to create an activity log using an activity dictionary with more than 200 different activity types. The resuscitation goals identified by the experts were not pursued in each of the 308 cases in the dataset. Only the cases in which the goals were pursued were included in the experiments.

4.1. Process models discovery

We applied PIMA to determine the base alignment matrix and used TAD Miner to discover the underlying models for three common goals: IV access, non-invasive oxygenation, and back assessment (Table 2). To discover the CS, we found the optimal threshold t by grid search between 0 and 1 with a step of 0.05. Only activities with a column frequency higher than t were included in the CS.

Table 2.

Dataset statistics for three common goals.

| Statistic | IV Access | Non-Invasive Oxygenation | Back Assessment |
|---|---|---|---|
| Num. of Traces | 187 | 252 | 267 |
| Num. of Activities | 365 | 2327 | 1919 |
| Num. of Act. Types | 3 | 6 | 7 |
| Length of the Longest Trace | 8 | 77 | 20 |
| Mean ± Std. val. of Trace Len. | 1.95 ± 1.40 | 9.23 ± 9.58 | 7.19 ± 2.62 |
| Mode val. of Trace Len. | 1 | 5 | 6 |

If t is close to 0, the CS will include activities regardless of their frequency in the alignment matrix, yielding a high sum of edit distances (i.e., consensus loss). As we raised the threshold, the consensus loss first dropped as infrequent activities were filtered out, and then increased because too few activities were left in the CS (Fig. 4). The consensus loss sometimes remained constant within certain threshold ranges (e.g., for t between 0.45 and 0.7 for the “IV access” goal) because the CS generated within these ranges remained the same.
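The threshold search described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the alignment matrix is a list of equal-length rows with "-" as the gap symbol, that the column frequency is the share of traces carrying the most common non-gap activity in that column, and that the consensus loss is the sum of Levenshtein distances between the CS and each gap-free trace. The function names are hypothetical.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol in the alignment matrix


def edit_distance(a, b):
    """Levenshtein distance between two activity sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def consensus_sequence(matrix, t):
    """Keep the dominant activity of each column whose frequency exceeds t."""
    n_traces = len(matrix)
    cs = []
    for column in zip(*matrix):
        counts = Counter(a for a in column if a != GAP)
        if not counts:
            continue
        activity, count = counts.most_common(1)[0]
        if count / n_traces > t:
            cs.append(activity)
    return cs


def consensus_loss(matrix, cs):
    """Sum of edit distances between the CS and every gap-free trace."""
    return sum(edit_distance(cs, [a for a in row if a != GAP]) for row in matrix)


def best_threshold(matrix, step=0.05):
    """Grid search over t in [0, 1]; ties resolve to the smallest t."""
    n_steps = int(round(1 / step))
    thresholds = [round(k * step, 2) for k in range(n_steps + 1)]
    losses = {t: consensus_loss(matrix, consensus_sequence(matrix, t))
              for t in thresholds}
    best_t = min(losses, key=losses.get)
    return best_t, consensus_sequence(matrix, best_t), losses
```

Calling `best_threshold(matrix)` returns the selected threshold, the corresponding CS, and the full loss curve, which can be plotted to reproduce the shape of a curve like those in Fig. 4, including the flat ranges where the CS does not change.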

Fig. 4.

Consensus loss for thresholds from 0 to 1 with a step of 0.05 (IV: IV access, NIO: non-invasive oxygenation, BK: back assessment).

To determine if our data size was sufficient to produce an unbiased CS, we randomly sampled traces from the whole dataset and tested the stability of the CS generated by these samples. Using the approach described in Section 3.2.3, we calculated the number of repetitions needed to avoid the selection bias. Because the number of combinations was very large (e.g., a random sampling of five traces from 187 “IV access” traces would result in C(187, 5) ≈ 1.81 × 10^9 combinations), we applied Eq. (2) to calculate the optimal number of repetitions. Given an estimated proportion p=0.5 and a confidence level of 95%, the optimal repetition number for all sample sizes was 385.
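The two numbers in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes that Eq. (2) is the standard Cochran sample-size formula for a large population and that the margin of error is 5%; the margin is not stated in this section, so treat it as an assumption. The function name is hypothetical.

```python
import math


def cochran_sample_size(p=0.5, z=1.96, margin=0.05):
    """Cochran's formula n = z^2 * p * (1 - p) / e^2, rounded up.

    z = 1.96 corresponds to a 95% confidence level; margin (e) = 0.05
    is an assumed value, not taken from the paper.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Number of ways to draw 5 traces from the 187 "IV access" traces.
n_combinations = math.comb(187, 5)

# Number of repetitions under the stated assumptions.
n_repetitions = cochran_sample_size()
```

Under these assumptions the formula gives 384.16, which rounds up to the 385 repetitions reported above, and the binomial coefficient evaluates to 1,805,568,402.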

For each of the three common goals, we derived the curve of the edit distance between the CS generated from different sample sizes and the CS generated from the entire data. The curve starts flattening when the sample size exceeds 50 (Fig. 5). This result shows that the sequence becomes stable when the sample size reaches 50 traces, suggesting that smaller subsets can also produce a similar CS to the one derived from the entire dataset.

Fig. 5.

Curve of the edit distance between the CS derived from different sample sizes and all data. For each sample size we repeated the selection 385 times. The curves show the average distances and the vertical bars show the standard deviations.
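The stability test can be sketched as a repeated-sampling loop. This is an illustrative outline only: `most_common_trace` and `mismatch` are placeholder stand-ins for the actual CS discovery and edit-distance functions, which are not reproduced here, and all names are hypothetical.

```python
import random
import statistics
from collections import Counter


def most_common_trace(traces):
    """Placeholder consensus: the most frequent trace (NOT the TAD Miner CS)."""
    return Counter(traces).most_common(1)[0][0]


def mismatch(seq_a, seq_b):
    """Placeholder distance: 0 if identical, 1 otherwise (NOT edit distance)."""
    return 0 if seq_a == seq_b else 1


def cs_stability(traces, consensus_fn, distance_fn, sample_sizes,
                 repetitions=385, seed=0):
    """For each sample size, draw `repetitions` random samples and measure
    how far each sample's consensus is from the full-data consensus.

    Returns {sample_size: (mean distance, population std. deviation)}.
    """
    rng = random.Random(seed)
    reference = consensus_fn(traces)
    result = {}
    for size in sample_sizes:
        distances = [
            distance_fn(consensus_fn(rng.sample(traces, size)), reference)
            for _ in range(repetitions)
        ]
        result[size] = (statistics.mean(distances),
                        statistics.pstdev(distances))
    return result
```

Passing the real consensus and edit-distance functions in place of the placeholders would yield the mean and standard deviation per sample size, which is the information a curve such as Fig. 5 displays.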

The backbone activities performed to achieve the resuscitation goals were represented through the CS with the minimal consensus loss. The threshold filtered out some activities from the activity log. For example, for the “IV access” goal, the “IV placement confirmation” and “drill to skin” activities were filtered out because they were either scattered or rarely occurred. We then added the common-but-dispersed or uncommon-but-critical activities to the backbone as parallel branches (Fig. 6).

Fig. 6.

Process models of three common resuscitation goals discovered by TAD Miner.

4.2. Comparison and evaluation of the TAD, IM, and SM models

We used Inductive Miner (IM, as implemented in the ProM software1) and Split Miner (SM, as implemented on the Apromore website2) as the baseline methods to generate process models for the three common resuscitation goals and for the two major resuscitation goals and their subgoals. We tuned the hyperparameters of these methods through grid search to identify the process models with the highest F-score. We then evaluated the process models discovered by TAD, IM and SM using quantitative and qualitative methods. To compute the accuracy metrics (i.e., fitness, precision, and F-score), we used the ProM tools. Originally, IM generated Petri-nets, and SM generated both DFG and BPMN models. To compute fitness and precision using the ProM measuring tools, we converted all the models into Petri-nets. Because the ProM software does not provide a calculation of complexity, we separately calculated these values using five complexity metrics (Section 3.5.1). To calculate the number of decisions (NOD), we considered the branching nodes in the models as decisions. To calculate the cyclomatic complexity (CC), we used the DFG format for SM and TAD models to ensure that only activity nodes were considered.

4.2.1. Quantitative results and comparison

We compared the models for all resuscitation goals from each approach using the complexity and accuracy metrics (Table 3). The entries marked with “-” indicate that the metric could not be calculated because (1) the conversion of IM models to DFG format is not supported and (2) the SM algorithm extensively uses XOR gates instead of AND gates to achieve high model accuracy. The models discovered by TAD Miner had the smallest number of decisions (NOD), cyclicity (CYC), cyclomatic complexity (CC) and sequentiality (SEQ) on all resuscitation goals. The TAD models had more activities (NOA) because repeated activities were represented as different nodes. The NOD for TAD models was smaller because we only introduced decision points for common-but-dispersed activities, while SM and IM identify decision nodes based on the split points given by direct activity transitions in the traces. Because TAD Miner identifies fewer branching nodes, users may find it easier to identify the critical decision points in TAD models. Compared to the SM models, the TAD models had smaller CC, resulting in better understandability. Compared to the IM models, the TAD models had smaller SEQ in five of the seven goals (G2, G3, G4, G5, G7), and the same SEQ for the remaining two goals (G1, G6). The IM models contained more parallel pathways, which means the model can only replay part of the traces, resulting in an underfit of the dataset. In four of the seven resuscitation goals (G1, G2, G3, G5), the SM models contained cycles, a feature that may decrease the interpretability related to activity order. For the accuracy metrics, the SM models achieved the best F-score in six of the seven goals (G1, G2, G3, G5, G6, G7), and IM achieved the best F-score in G4. Although TAD Miner did not perform significantly better on the accuracy comparisons, its performance was still comparable to that of the state-of-the-art methods. Because high fitness or high precision may respectively cause underfitting or overfitting of the activity log, obtaining comparable accuracy results is expected. Our goal has been to reduce the complexity while maintaining a balance between fitness and precision.

Table 3.

Quantitative comparison of Inductive Miner (IM), Split Miner (SM), and TAD Miner.

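Complexity metrics: NOA, NOD, CYC, CC, SEQ. Accuracy metrics: Fitness, Precision, F-score.

| Resuscitation Goal | Method | NOA | NOD | CYC | CC | SEQ | Fitness | Precision | F-score |
|---|---|---|---|---|---|---|---|---|---|
| G1. IV Access | IM | 3 | 3 | 0 | - | 2 | 0.71 | 0.86 | 0.78 |
| | SM | 3 | 3 | 0.67 | 7 | - | 0.93 | 0.99 | 0.96 |
| | TAD | 4 | 1 | 0 | 6 | 2 | 0.74 | 0.89 | 0.81 |
| G2. Non-Invasive Oxygenation | IM | 6 | 10 | 0 | - | 5 | 0.6 | 0.83 | 0.69 |
| | SM | 6 | 3 | 0.50 | 9 | - | 0.7 | 0.91 | 0.79 |
| | TAD | 12 | 2 | 0 | 8 | 4 | 0.61 | 0.87 | 0.72 |
| G3. Back Assessment | IM | 7 | 2 | 0 | - | 7 | 0.83 | 0.59 | 0.69 |
| | SM | 7 | 5 | 0.71 | 17 | - | 0.81 | 0.91 | 0.86 |
| | TAD | 8 | 2 | 0 | 8 | 2 | 0.82 | 0.75 | 0.78 |
| G4. Intubation | IM | 11 | 4 | 0 | - | 6 | 0.78 | 0.82 | 0.8 |
| | SM | 11 | 4 | 0 | 7 | - | 0.67 | 0.97 | 0.79 |
| | TAD | 11 | 2 | 0 | 7 | 4 | 0.84 | 0.67 | 0.75 |
| G5. Bilateral Breath Sounds Assessment | IM | 4 | 2 | 0 | - | 3 | 0.68 | 0.79 | 0.73 |
| | SM | 4 | 4 | 0.25 | 8 | - | 0.69 | 0.96 | 0.8 |
| | TAD | 5 | 2 | 0 | 5 | 2 | 0.71 | 0.85 | 0.77 |
| G6. Transfusion | IM | 9 | 7 | 0 | - | 0 | 0.8 | 0.92 | 0.85 |
| | SM | 9 | 6 | 0 | 18 | - | 0.87 | 0.91 | 0.88 |
| | TAD | 9 | 2 | 0 | 7 | 0 | 0.6 | 0.92 | 0.73 |
| G7. Vital Signs Assessment | IM | 5 | 3 | 0 | - | 4 | 0.92 | 0.74 | 0.82 |
| | SM | 5 | 3 | 0 | 9 | - | 0.78 | 0.97 | 0.86 |
| | TAD | 5 | 2 | 0 | 5 | 0 | 0.74 | 0.98 | 0.84 |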

4.2.2. Qualitative results and comparison

The list of differences between the discovered process models (Fig. 8, Fig. 9) and the knowledge-driven models (Fig. 7) was reviewed by the medical experts on our team. The experts validated the differences as suggested modifications to their original knowledge-driven models. Table 4 summarizes the differences between the knowledge-driven models for the two multi-level major resuscitation goals and the corresponding models discovered by different process mining methods, and the results of expert validations.

Fig. 8.

Data-driven models discovered by TAD Miner for (a) “intubation” and (b) “transfusion”.

Fig. 9.

Data-driven models of “intubation” goal and “transfusion” goal discovered by Inductive Miner (shown in (a), (b)) and Split Miner (shown in (c), (d)).

Fig. 7.

Knowledge-driven models for the (a) “intubation” and (b) “transfusion” goals. Some modifications (M #) are marked in red (details explained in Table 4).

Table 4.

Modifications of the knowledge-driven models suggested by the models discovered by different methods and validated by a medical expert.

| Resuscitation Goal | Category | Suggested Modification | IM | SM | TAD | Valid |
|---|---|---|---|---|---|---|
| Intubation | Activity Order | M1. Pre-Oxygenation is performed before IV Access. | N | Y | Y | Y |
| | | M2. Endotracheal Tube Secured is performed after Breathing Tube Depth from Lips-cm. | N | Y | Y | Y |
| | | M3. Airway Assessment is performed before Breathing Assessment. | Y | Y | Y | N |
| | Activity Concurrency | M4. Intubation Medicines is performed concurrently with Intubation. | Y | N | N | N |
| | | M5. Ventilation started concurrently with other activities from start. | N | N | Y | Y |
| | | M6. Pre-Oxygenation started concurrently with other activities from start. | Y | N | Y | Y |
| | | M7. Noninvasive Oxygenation started concurrently with other activities from start. | Y | N | Y | Y |
| | | M8. Listen to Breath Sounds is performed concurrently with all other activities in the subgoal. | Y | N | Y | Y |
| Transfusion | Activity Order | M9. Crystalloid Given is sometimes performed before IV Access. | Y | Y | Y | N |
| | | M10. Peripheral Pulse Check is performed after Heart Rate. | N | Y | Y | N |
| | | M11. Vital Signs Assessment is performed before IV Access. | Y | Y | Y | Y |
| | | M12. Bleeding Identification is performed from the start of the process. | Y | Y | Y | Y |
| | Activity Concurrency | M13. External Hemorrhage Control is performed concurrently from start point to Blood Given. | N | N | Y | Y |
| | | M14. Central Pulse Check is performed concurrently with Heart Rate and Blood Pressure. | Y | N | Y | Y |

The data-driven models generated by the three methods uncovered 14 suggested modifications, with 11 of the modifications uncovered by more than one method. Ten suggested modifications were confirmed as valid by medical experts, and four were judged as invalid (Table 4). Invalid modifications were caused by: (1) errors in manual coding of the dataset (M9); (2) preferred order of activity performance by providers at our research site (M3, M10), and (3) underfitting of the IM model (M4). Compared to other methods, TAD Miner uncovered more valid modifications. These modifications mainly referred to the activity order and concurrency. IM and SM could not identify the accurate sequential and concurrent relationships between activities. For example, IM did not discover M1 and M2, and SM did not discover M5, M13, and M14 (Fig. 7). IM usually generated models with more concurrencies, but sometimes missed the order information (Fig. 9 (a) and (b)). SM models had more XOR gates that might have hindered discovery of concurrent activities (Fig. 9 (c) and (d)). TAD Miner suggested more valid modifications because it separately discovers the backbone (i.e., sequential activity relations) and branches (i.e., concurrent activity relations). TAD models provided a representation of the processes that aided the modification of the knowledge-driven models. Medical experts modified their knowledge-driven models based on the suggestions discovered by the data-driven models.

To further evaluate the accuracy and interpretability of data-driven models discovered by the IM, SM, and TAD methods, we recruited four medical experts from the hospital’s emergency department (two pediatric emergency physicians and two nurses). These participants were not involved in the study design or in the development of the knowledge-driven models. We asked each participant to independently evaluate six data-driven process models using an electronic survey. The six models included the IM, SM, and TAD models for two multi-level resuscitation goals (“intubation” and “transfusion”) and their subgoals. To avoid evaluation bias, the process discovery methods were not indicated to the participants and each model was assigned a code name. For each goal, we asked the participants to first review the knowledge-driven model created by medical experts on our research team. The data-driven models were then presented in a random order for each goal. The visual attributes (colors, line properties, and workflow symbols) are integral parts of each modeling method. To ensure each model was viewed in its original format, we did not modify any visual attributes across the models to create a uniform look and feel. Any influence of the visual attributes on participants’ perceptions of the model’s interpretability and accuracy was considered acceptable. The participants expressed their preferences in the survey and evaluated each model’s accuracy based on the original models generated by each approach. For each model, we provided instructions on how to read it and explained the meaning of the notations to ensure the participants understood the model semantics:

  • TAD model: 1. The grey ovals represent the main process flow; 2. The side branches of white ovals do not mean the process will skip the grey ovals, but rather that the activity is a parallel activity and can occur at any time during that segment; 3. Dashed outline means that activity is infrequent in our dataset.

  • IM model: 1. Parallel gateway: all paths will be performed concurrently; 2. Exclusive gateway: either one of the paths will be performed.

  • SM model: 1. The shade of the nodes represents the frequency of that activity in our dataset. The darker the node is, the more frequently it occurred; 2. The weight of the edges represents the frequency of the steps. A heavier edge means that the step occurs more frequently; 3. Exclusive gateway: either one of the paths will be performed.

After reviewing each data-driven model, the participants answered two questions that assessed the accuracy (Q1) and interpretability (Q2) of the process models using a Likert scale (with 1 indicating “strongly disagree” and 5 “strongly agree”):

  • Q1. This data-driven model accurately represents the events of the resuscitation goal.

  • Q2. Using this data-driven model, I can identify uncertain positions of activities and errors in the order of activities in the knowledge-driven model.

A higher agreement level on both questions suggested better performance. We also asked the participants to identify the model that best represented the process steps for achieving the goals. Finally, we asked them to describe any shortcomings of each model using open-ended questions.

Among the six data-driven models, the models generated by TAD Miner received the highest level of agreement based on accuracy and interpretability (Table 5). The experts found that the TAD models better matched knowledge-driven models. When mismatches occurred, the TAD models provided a clear indication of the errors in the expert model.

Table 5.

Survey responses for two Likert-scale rating questions (1: strongly disagree, 2: disagree, 3: neutral, 4: agree, 5: strongly agree).

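| Resuscitation Goal | Participant | IM Q1 | IM Q2 | SM Q1 | SM Q2 | TAD Q1 | TAD Q2 |
|---|---|---|---|---|---|---|---|
| Intubation | P1 | 3 | 2 | 4 | 2 | 4 | 4 |
| | P2 | 2 | 4 | 3 | 4 | 4 | 4 |
| | P3 | 4 | 4 | 4 | 3 | 4 | 4 |
| | P4 | 4 | 3 | 2 | 4 | 2 | 3 |
| Transfusion | P1 | 4 | 3 | 3 | 2 | 4 | 4 |
| | P2 | 3 | 4 | 3 | 4 | 4 | 4 |
| | P3 | 5 | 4 | 3 | 3 | 4 | 4 |
| | P4 | 3 | 3 | 2 | 3 | 4 | 4 |
| Sum | | 28 | 27 | 24 | 25 | 30 | 31 |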

When asked which model best represented the steps of the treatment goal process, three participants preferred the TAD models (P1, P2, P4) and one participant chose the IM model (P3). The participants described the TAD model as “easiest to interpret visually” [P1], “it seemed to make the most sense for flow and content” [P2], and “it seems very intuitive” [P4]. P3 highly rated the IM models because these were “easiest to read and follow.” While the IM models might be more succinct than others, P4 explained that “there’re unexpected paths to skip too many things making it less than ideal.” When asked about the model shortcomings, P1 mentioned that SM models were “hard to read and hard to follow.” P2 and P4 identified several invalid process steps in each model (Table 4) and considered these invalid steps as shortcomings. For the TAD models, P3 stated: “weird that the main points are on the right (branches), while the ‘lesser’ points are on the left (backbones).” This impression can be explained by the rare occurrence of critical activities in our dataset, which made these activities appear in the branches of the models. Because we used the consensus activity sequence as the model backbone, the backbone activities may appear less important than the branch activities.

4.2.3. Case study on the primary survey process

We applied TAD Miner, IM, and SM to the entire primary survey process to test how they generalize to longer processes with a larger number of activity types (Fig. 10). For comparison, we also applied the three approaches to the back assessment goal, which is simpler and contains fewer activity types (Fig. 11). The three models of the entire primary survey process were longer, contained more branches and decision nodes, and were more difficult to interpret than the back assessment models. To manage this complexity, the entire primary survey process can be segmented into different stages by applying domain knowledge, e.g., identifying the associated activities of certain resuscitation goals and extracting relevant traces. The model discovery process can be independently applied to each stage using our method at a desired granularity. The finer-granularity models can then be recombined into a formal process model. Although all the methods generated complex models, the TAD model was more interpretable than the others because it was free of cycles and provided a linear backbone process.

Fig. 10.

Process models for the entire primary survey stage discovered by (a) TAD Miner, (b) Inductive Miner and (c) Split Miner. The processes are relatively long and have a large number of activity types. Red box 1 in (a) shows the IV placement activity performed for blood transfusion, and red box 2 shows the IV placement activity performed for medication administration.

Fig. 11.

Process models for the “back assessment” goal discovered by (a) TAD Miner, (b) Inductive Miner and (c) Split Miner.

5. Discussion

Our results showed that TAD Miner discovered models with lower complexity and better interpretability. The TAD model contains a backbone and branches but no cycles. Some cycles appeared in the models generated by SM and IM because they represented repeated activities as equivalent using a single node. The repeated activities occur during resuscitations because of clinical needs such as multiple attempts at performing interventions. Showing the entire process in a linear order is less confusing and may improve interpretability. For example, in the TAD model of the “back assessment” goal (Fig. 11(a)), “T-spine” was mainly performed after “visual inspection” and before “L-spine.” An additional side branch of “T-spine” (highlighted with the red box in Fig. 11) showed that “T-spine” could also occur at any time during the process. For longer processes like the entire primary survey phase of the resuscitation, the activities are more likely to occur multiple times for different goals. For example, the TAD model found the “IV placement” activity performed in multiple locations for different intentions, e.g., for blood transfusion or for medication administration (Fig. 10(a)). In contrast, the “IV placement” activity was shown in loops in both the IM model (Fig. 10(b)) and the SM model (Fig. 10(c)), which was not clinically meaningful. Compared to the existing label refining techniques for handling repeated activities [14,15], TAD Miner can automatically identify the repeated activities during process model discovery, eliminating the need for additional labeling and context-deriving efforts.

The TAD model showed the sequential relationships and represented repeated activities in a linear order. Process models showing both the linear order and parallelism of activities may better support understanding of complex medical processes. When comparing the primary survey model (Fig. 10) and back assessment model (Fig. 11) discovered by TAD Miner and those discovered by other algorithms, medical experts found that understanding the transitions and cycles in the SM models was more challenging. Experts also found it challenging to identify the activity order in the IM models because many pathways were shown in parallel.

Our results showed that the TAD models did not achieve better F-scores than the SM models (Table 3). The F-score is the harmonic mean of fitness and precision. The fitness and precision scores used in the process mining literature are defined differently from those used in the traditional machine learning domain. The fitness score measures the ability of a model to replay all traces in the activity log [23], while the precision score measures the ability of a model to generate only the traces in the log [22]. It is easier to quantify fitness because the size of an activity log is bounded. In contrast, quantifying precision is more challenging because the number of paths in a process model is unbounded [37]. If a model has loops, the number of paths will be infinite and result in zero precision based on the machine-learning definition of precision. In our study, we adopted the alignment-based precision metric [22] because it (1) can measure the conformance between the model and the activity log, (2) is well-established, and (3) is implemented in the framework of the ProM tool. This method calculates precision by first building a prefix automaton based on the optimal alignments between the traces in the activity log and the paths in the model. The automaton contains one state per unique prefix of each trace in the activity log. This method then determines which activities are allowed as the next activities by the process model for each state. The activities that are allowed by the process model but never observed in the activity log are called “escape activities” and count as imprecision of the process model. A process model with fewer “escape activities” better conforms to the activity log. Since the alignment-based precision metric is computed based on the traces’ prefix automaton, it does not cover all the paths that can be produced by the model. It is possible that some paths in the model do not occur in the activity log. These paths may make the model less precise and are not measured in the alignment-based precision. These are the limitations of using the F-score metric in the process mining context.
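The escape-activity idea can be illustrated with a simplified sketch. It is not the ProM implementation: it assumes every trace already fits the model, so the alignment step is skipped, it ignores end-of-trace states, and it represents the model as a hypothetical callable `allowed_next(prefix)` that returns the activities the model permits after a given prefix. Precision is then one minus the frequency-weighted share of allowed activities that were never observed.

```python
from collections import Counter, defaultdict


def escaping_precision(log, allowed_next):
    """Simplified escaping-edges precision over the log's prefix automaton.

    log:          list of traces (sequences of activity labels)
    allowed_next: hypothetical model interface; maps a prefix tuple to the
                  activities the model allows next
    """
    weight = Counter()           # number of traces passing through each prefix
    observed = defaultdict(set)  # activities seen in the log after each prefix
    for trace in log:
        for i in range(len(trace)):
            prefix = tuple(trace[:i])
            weight[prefix] += 1
            observed[prefix].add(trace[i])

    total_allowed = 0
    total_escaping = 0
    for prefix, w in weight.items():
        allowed = set(allowed_next(prefix))
        total_allowed += w * len(allowed)
        total_escaping += w * len(allowed - observed[prefix])
    if total_allowed == 0:
        return 1.0
    return 1 - total_escaping / total_allowed


def f_score(fitness, precision):
    """Harmonic mean of fitness and precision."""
    if fitness + precision == 0:
        return 0.0
    return 2 * fitness * precision / (fitness + precision)
```

As a check on the harmonic-mean definition, `f_score(0.74, 0.89)` rounds to 0.81, matching the TAD entry for G1 in Table 3.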

In addition, the F-score is computed from the entire activity log directly. While the process models generated by SM conform better to the observed traces, they often overfit the incomplete activity log (e.g., human errors like omitted treatment activities, the coders missed recording some activities, or some activities do not need to be performed in all cases). TAD Miner, on the other hand, aligns the common activities in the traces and fills in the missed or skipped parts of the traces to build a complete backbone process. The process models generated by IM and SM that conform to the incomplete traces could be misleading and hard to comprehend, as they may not represent a meaningful process. For example, if we replay the process models in Fig. 11 to obtain a trace, the IM and SM models could produce an extremely short trace (such as “Start -> C-Spine -> End”), which is not helpful for knowledge discovery. TAD Miner cannot generate this type of trivial trace. Fig. 11 shows that the TAD model includes only meaningful traces, with the backbone being: “Start -> Log Roll -> Visual Inspection -> T-Spine -> L-Spine -> Rectal -> End.” These meaningful models help to understand the process.

The modifications for the knowledge-driven model suggested by the discovered models (Table 4) were assessed by the medical experts from our team. Valid suggestions were adopted to modify the knowledge-driven model, and the invalid ones were rejected based on the experts’ practice. For example, the simultaneous administration of “intubation medications” while performing “intubation” identified as M4 in Table 4 is technically feasible but would be undesirable. Without allowing the paralytics and sedatives to reach therapeutic levels, the process of intubation would be difficult for the provider performing the task and uncomfortable for the patient. For this reason, it is not standard of care to do these steps simultaneously. This modification is considered invalid by our experts. Measuring the clinical outcomes based on the validations of some modifications presented in Table 4 would be difficult. For example, for M10 in Table 4, performing a peripheral pulse check simultaneously with or after measuring the heart rate has no clinical significance and depends on training and the clinical scenario in which it is first performed. We believe validations of some other modifications may be shown by clinical outcome improvement, which is a future task of our study.

Among the temporal data mining methods that have been used for discovering medical procedures, the time-interval-related patterns (TIRPs) mining algorithm [18] discovers frequent time interval patterns based on Allen’s temporal relations (i.e., before, meets, overlaps, contains, etc.) between the activities. Considering high-acuity, time-critical clinical scenarios where the start point of an activity is the most critical decision, our trace alignment-based algorithm focuses on discovering process models based on the start time of each activity. Compared to the Careflow mining algorithm [19] that identifies multiple linear procedures from a process dataset, TAD Miner identifies the backbone and concurrent activities, focusing on discovering one representative process model that better assists medical experts in understanding the overall workflow for a resuscitation goal.

6. Study limitations

6.1. Data quality

The credibility of the process models that rely on data can be affected by the quality of the dataset. Our team consisted of 20–25 medical researchers who manually coded the 308 resuscitation video recordings, using an activity dictionary with more than 200 different activity types. This process might have led to potential discrepancies and human bias in the dataset. To mitigate these issues, we used a pre-defined training process for any individual performing coding. Before coding, the researchers were required to complete five test videos that had already been coded by a designated “expert coder” (an individual who had been trained, passed the inter-rater reliability test, and had prior experience coding medical videos). A detailed activity dictionary was provided to new coders, describing the visual cues for each activity and the start and stop times to assign to each activity.

To ensure consistency and accuracy in the dataset, the new coder’s outputs were compared to the expert’s output using a Cohen’s Kappa test to assess the inter-rater reliability. A Kappa score was generated for both activity recognition (to ensure the new coder is identifying the correct activities) and timing (to ensure the new coder is selecting the correct start and stop times for each activity). Before being permitted to code new resuscitation videos, the new coders were required to achieve a Kappa score greater than 0.75 [38]. If they did not meet this threshold, they were given the opportunity to review the expert’s output and practice coding additional test videos until their scores met the cutoff values.
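For reference, Cohen's Kappa for two coders who label the same items can be computed as below. This is a generic sketch of the statistic, not the study's scoring script: how the study segmented the videos into comparable items, and how timing agreement was scored, is not described here, so the item-by-item label comparison is an assumption. The function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must label the same non-empty item set.")
    n = len(rater_a)

    # Observed agreement: share of items with identical labels.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[label] * counts_b[label] for label in labels) / n ** 2

    if p_expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)
```

A new coder's labels would be compared with the expert coder's labels for the same test video, and the resulting score checked against the 0.75 cutoff.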

6.2. Evaluation bias

The validation of the modifications suggested by the data-driven process models (Table 4) may be biased by the use of clinical experts at the research site and by the small sample size of clinical evaluators in this study. For example, the activity orders found by M3, M10, and M14 (Table 4) could be different at a separate institution or among different clinicians at the same institution. The activities could be performed simultaneously or in a different order. In future work, we will evaluate the data-driven and knowledge-driven models with a larger and more diverse group of clinical experts, including those from outside the research institution, and will incorporate data from other institutions.

7. Conclusion

In this work, we introduced TAD Miner for discovering complex medical process models. TAD Miner first uses an existing optimized trace alignment algorithm to construct interpretable process models. We introduced an approach to computing the threshold for selecting an optimal consensus sequence from the alignment and introduced a random sampling method to validate the sufficiency of small datasets. The optimal consensus sequence was found by maximizing the similarity between the consensus sequence and all sequences in the alignment matrix. The optimal consensus sequence represented the most typical execution of the process. The consensus sequence was then set as the backbone of the process model. Some non-consensus activities were then identified as common-but-dispersed (i.e., concurrent) and uncommon-but-critical activities and inserted as parallel branches in the model.

Based on the quantitative and qualitative evaluation, the process models discovered by TAD Miner outperformed the models generated by two other established methods: IM and SM. TAD Miner generated linearly structured models, which avoided the confusing cycles produced by other methods. Despite the complexity of medical cases, the models discovered by our method preserved both sequential order and concurrency for the treatment activities. Our model achieved informativeness without compromising interpretability.

TAD Miner provides insights into modeling complex medical processes using activity logs. When developing an accurate process model to guide the decision making for medical providers, differences between the data-driven model and the knowledge-driven model can be considered as suggestions to modify or correct the knowledge-driven model. The discovered model can help medical experts better understand actual practice and design an accurate framework for decision support. TAD Miner can be further improved in several ways. First, although our method discovers the decision points in a model, it needs to be augmented with the values of the context attributes that lead to different branches of each workflow. Second, the trace alignment mechanism makes it difficult to detect early exit points of a process. In the future, we will incorporate relevant context attributes into the models. Process models for different patient contexts can provide medical experts with more effective support during treatments that are targeted to specific contexts (e.g., different treatment steps for different vital signs). For future evaluations of our approach, we will compare its effectiveness with other modeling strategies in other practice settings and for different clinical processes.

Acknowledgements

This work is supported by the U.S. National Institutes of Health/National Library of Medicine under grant number R01LM011834.

Footnotes

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

