Discovering interpretable medical process models: A case study in trauma resuscitation

Keyi Li; Ivan Marsic; Aleksandra Sarcevic; Sen Yang; Travis M Sullivan; Peyton E Tempel; Zachary P Milestone; Karen J O’Connell; Randall S Burd

doi:10.1016/j.jbi.2023.104344

J Biomed Inform. Author manuscript; available in PMC: 2024 Apr 1.

Published in final edited form as: J Biomed Inform. 2023 Mar 20;140:104344. doi: 10.1016/j.jbi.2023.104344


PMCID: PMC10111432 NIHMSID: NIHMS1886015 PMID: 36940896

Abstract

Understanding the actual work (i.e., “work-as-done”) rather than theorized work (i.e., “work-as-imagined”) during complex medical processes is critical for developing approaches that improve patient outcomes. Although process mining has been used to discover process models from medical activity logs, it often omits critical steps or produces cluttered and unreadable models. In this paper, we introduce a Trace Alignment-based Process Discovery method called TAD Miner to build interpretable process models for complex medical processes. TAD Miner creates simple linear process models using a threshold metric that optimizes the consensus sequence to represent the backbone process, and then identifies both concurrent activities and uncommon-but-critical activities to represent the side branches. TAD Miner also identifies the locations of repeated activities, an essential feature for representing medical treatment steps. We conducted a study using activity logs of 308 pediatric trauma resuscitations to develop and evaluate TAD Miner. TAD Miner was used to discover process models for five resuscitation goals: establishing intravenous (IV) access, administering non-invasive oxygenation, performing back assessment, administering blood transfusion, and performing intubation. We quantitatively evaluated the process models with several complexity and accuracy metrics, and performed a qualitative evaluation with four medical experts to assess the accuracy and interpretability of the discovered models. Through these evaluations, we compared the performance of our method with that of two state-of-the-art process discovery algorithms: Inductive Miner and Split Miner. The process models discovered by TAD Miner had lower complexity and better interpretability than those of the state-of-the-art methods, while their fitness and precision were comparable.
We used the TAD process models to identify (1) the errors and (2) the best locations for the tentative steps in knowledge-driven expert models. The knowledge-driven models were revised based on the modifications suggested by the discovered models. The improved modeling using TAD Miner may enhance understanding of complex medical processes.

Keywords: Process mining, Knowledge discovery, Resuscitation, Consensus sequence

1. Introduction

Evaluation and treatment processes in high-acuity medical settings are complex and often performed by multidisciplinary teams. While efficient communication and coordination among team members can prevent errors, these aspects of work may be challenging to achieve in time-critical and dynamic scenarios. Discovering meaningful process models from recorded data can help medical teams understand the actual work practices (i.e., “work-as-done”) and inform new approaches for team training and improving patient care. Process mining techniques have been widely used to automatically discover process models from activity logs compiled by reviewing the performed activities [1]. Given the dynamic and loosely structured nature of medical work, applying traditional process mining methods to complex medical processes may produce cluttered, spaghetti-like models that include all possible paths and can be difficult to interpret. Most process mining algorithms have been optimized to identify the essential process and usually filter out uncommon steps as noise [2,3,4]. Applying these algorithms to a complex medical process may therefore filter out uncommon-but-critical activities and pathways. A major challenge in generating data-driven medical process models is ensuring interpretability without losing critical information [5,6].

Trace alignment algorithms may be more suitable for analyzing complex and less structured processes. From the alignment matrix, they generate a readable, easy-to-interpret linear sequence of the activities that are common across different process executions (the “consensus sequence”) [7,8,9]. The consensus sequence can represent the backbone (i.e., essential activities) of a process. During a medical process, providers may perform the same activity multiple times for different purposes. Because the consensus sequence is linear, it can accurately locate repeated activities in the process, which is a required feature for analyzing medical processes. Despite these advantages, we still lack an approach for transforming an alignment into a process model. Current knowledge gaps in process discovery from trace alignment include the absence of (1) a standard rule for selecting a frequency threshold to generate the most representative consensus sequence, (2) an approach for automatically identifying the concurrent activities of a process, and (3) an approach for including uncommon-but-critical steps in the process model.

To address these gaps, we introduce TAD Miner, a method for discovering models of complex processes by aligning their activity traces. As part of this method, we developed an algorithm for finding an optimal threshold for the frequency of activities that form the consensus sequence. Because collecting a large number of logs for complex medical processes is challenging, we also designed a statistical approach to determine whether the log size is sufficient for meaningful process mining. TAD Miner identifies concurrent and uncommon activities in the log, ensuring the accuracy of the process model. We applied TAD Miner to real-world medical activity logs from 308 pediatric trauma resuscitations. Trauma resuscitation, the early evaluation and management of injured patients in the emergency department, is a time-critical process that requires the simultaneous pursuit of several resuscitation goals. TAD Miner discovered interpretable process models for five resuscitation goals: establishing intravenous (IV) access, administering non-invasive oxygenation, performing back assessment, administering blood transfusion, and performing intubation. To evaluate TAD Miner, we compared the TAD models with models generated by two established process discovery methods, Inductive Miner (IM) and Split Miner (SM), using eight quality metrics. By identifying the differences between the knowledge-driven and discovered models, we also proposed possible modifications for two resuscitation goals (blood transfusion and intubation). In addition, four medical experts qualitatively assessed the accuracy and interpretability of these models. The models discovered by TAD Miner had better interpretability than the IM and SM models while achieving comparable accuracy.

Our study provides an approach for discovering an accurate and interpretable process model of complex medical processes. Our contributions are threefold: (1) an automatic process discovery algorithm based on trace alignment to extract interpretable process models from activity logs; (2) an evaluation of the process models’ interpretability; and (3) knowledge discovery using data-driven process models that assist experts in better understanding complex medical processes.

2. Related work

2.1. Process discovery techniques

Heuristics Miner [2] and Fuzzy Miner [3] have been the most common automatic process discovery methods in medical domains [10]. Both algorithms build a directly-follows graph (DFG) of the process model based on dependencies between the activities. Heuristics Miner is robust to infrequent steps in a dataset, treating these steps as noise and excluding them from the DFG. Fuzzy Miner provides tunable hyperparameters on activity frequency or edge frequency, allowing models to be generated at a desired level of abstraction; users can prune the number of activities or edges to simplify the model. Despite these advantages, both algorithms can lose information when generating models for unstructured medical processes with considerable variability.
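To make the DFG idea concrete, the sketch below counts directly-follows pairs in a toy log and then prunes edges below a count threshold. It is a minimal illustration under our own assumptions: the functions `build_dfg` and `prune_edges`, the toy activities, and the `min_count` rule are hypothetical and do not reproduce the actual dependency measures of Heuristics Miner or the abstraction rules of Fuzzy Miner.

```python
from collections import Counter


def build_dfg(log):
    """Count how often activity b directly follows activity a across all traces."""
    edges = Counter()
    for trace in log:
        # zip pairs each activity with the activity that immediately follows it
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges


def prune_edges(edges, min_count):
    """Keep only edges observed at least min_count times (hypothetical rule)."""
    return {edge: count for edge, count in edges.items() if count >= min_count}


# Hypothetical toy log: each inner list is one trace ordered by start time
log = [
    ["a1", "a2", "a3"],
    ["a1", "a3", "a2"],
    ["a1", "a2", "a3"],
]
dfg = build_dfg(log)
print(dict(dfg))                      # full DFG, including the a2 <-> a3 loop
print(prune_edges(dfg, min_count=2))  # simplified DFG
```

In this toy log, pruning with `min_count=2` keeps only a1 → a2 → a3 and drops the rarer ordering a1 → a3 → a2, illustrating how frequency filtering simplifies a model while discarding observed, if infrequent, behavior.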

To avoid losing information while retaining an interpretable structure, alternative process discovery methods that generate simple and precise models have been proposed. For example, state-of-the-art methods such as Inductive Miner [11] and Split Miner [12] have achieved low complexity, high fitness, and high precision. Inductive Miner and Split Miner build models using business process modeling notation (BPMN), which can express the precise semantics of the process [13]. Starting from an initial DFG, Inductive Miner iteratively filters out infrequent paths and identifies cuts (i.e., control-flow dependencies such as parallelism, iteration, and sequential and exclusive dependencies) in the DFG to achieve high fitness, and then converts the DFG into a BPMN model. Split Miner first identifies concurrency based on an initial DFG and then filters invalid paths to achieve high accuracy and low complexity. Split and join gateways are then added to the filtered DFG to generate the BPMN model.

However, these existing methods treat repeated activities as equivalent and aggregate them into a single activity in the model. This aggregation introduces cycles in the model that may not exist in the actual workflow. In complex medical processes, an activity may be performed multiple times because of the need for repeated assessments, different clinical goals, or failed initialization. Simply merging the repeated activities may cause information loss. To address this limitation, different data preprocessing approaches have been introduced that can refine the activity log [14,15]. However, these preprocessing approaches often require additional effort to manually label the repeated activities or obtain context information for each activity in the log.
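The following sketch illustrates why aggregation introduces cycles. It builds directly-follows edges for a single hypothetical trace in which an assessment is repeated, once with repeated activities merged into one node and once with each occurrence given its own indexed label, and checks both graphs for a cycle with a depth-first search. The activity names, the `index_repeats` option, and the occurrence-count labeling are illustrative assumptions; they are not the preprocessing approaches of [14,15].

```python
from collections import defaultdict


def dfg_edges(trace, index_repeats=False):
    """Directly-follows edges of one trace; optionally give each repeat its own node."""
    if index_repeats:
        seen = defaultdict(int)
        labeled = []
        for activity in trace:
            seen[activity] += 1
            labeled.append(f"{activity}_{seen[activity]}")  # e.g. assess_1, assess_2
        trace = labeled
    return set(zip(trace, trace[1:]))


def has_cycle(edges):
    """Depth-first search: reaching a node already on the current path means a cycle."""
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
    on_path, done = set(), set()

    def visit(node):
        on_path.add(node)
        for nxt in graph[node]:
            if nxt in on_path:
                return True
            if nxt not in done and visit(nxt):
                return True
        on_path.discard(node)
        done.add(node)
        return False

    return any(node not in done and visit(node) for node in list(graph))


# Hypothetical trace in which the assessment is repeated after IV placement
trace = ["assess", "IV", "assess", "transfuse"]
print(has_cycle(dfg_edges(trace)))                      # -> True (merged repeats)
print(has_cycle(dfg_edges(trace, index_repeats=True)))  # -> False (indexed repeats)
```

The merged graph contains the loop assess → IV → assess even though the trace never returns to an earlier step, whereas the indexed version stays acyclic.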

A process model can be more informative if activities are shown in a linear structure. Trace alignment algorithms can derive a consensus sequence without any cycles by aligning the activities of every trace in an activity log [16,17]. Common and uncommon activities in the consensus sequence can help distinguish between “routine” and “non-routine” steps while also avoiding the loss of critical information [9]. Activity logs derived from clinical processes are less structured and may be incomplete because some activities do not need to be performed in all cases. By aligning the common activities, trace alignment can fill in missed or skipped parts and build a complete process model [16]. Because of these advantages, we adopted trace alignment as the basis for our process discovery approach.

In addition to process mining algorithms, temporal data mining methods such as time interval related patterns (TIRPs) mining [18] and Careflow mining [19] have frequently been used to discover common procedures from medical events. The TIRPs mining algorithm discovers frequent time interval patterns occurring inside each sequence in the dataset based on the temporal information of each event. The Careflow mining algorithm derives multiple linear procedures of medical activities by accumulating the number of variant activities at each timestamp; it aims to identify temporal phenotypes in a cohort from electronic health record (EHR) data. Unlike these temporal data mining methods, our study focuses on discovering a process model for each clinical goal based on the order of the start time of each activity in the log. Because time interval information for each activity is not used in our context, these temporal data mining methods are not applicable to our study.

2.2. Evaluating the interpretability of process models

Interpretability is defined as having evident reasoning about how and why a model makes predictions or decisions. People tend to understand and trust models with fewer but more relevant features [20]. A process model is more interpretable when it (1) has a simple structure, i.e., lower complexity, and (2) allows people to identify the activities that are relevant to the process. Complexity metrics associated with the number of nodes and edges of process models have been shown to affect the interpretability of the models [21]. Other evaluation metrics for assessing process models include fitness and precision. The fitness score represents the ability of a model to replay the processes in the activity log, while the precision score represents the extent to which a model produces only the processes observed in the activity log [22,23]. In addition to these quantitative metrics, human interpretation and assessment of the models are also critical, especially in healthcare [6].

3. Method

TAD Miner generates a process model from trace alignment by (1) discovering the backbone process by identifying an optimal frequency threshold that selects the most representative consensus sequence, (2) validating the stability of the consensus sequence with a random sampling approach, and (3) discovering the side branches of the process model by identifying concurrent and uncommon activities. TAD Miner uses a DFG to represent the process model, and we introduced new notations in the DFG to better represent the semantics of the process. We next define the terms used in this paper and then describe each aspect of the TAD Miner approach in detail.

3.1. Terms and definitions

We define the activity log $L$ as a collection of all process traces: $L = \{T_{1}, T_{2}, \dots, T_{m}\}$ . A trace $T_{i}$ records a series of activities in one process case and orders them based on their starting time (e.g., $T_{i} = ⟨a_{1}, a_{3}, a_{2}, a_{4}, a_{5}, \dots⟩$ , where $a_{i}$ is an activity type).

A DFG process model $λ = (A, E)$ describes the steps and dependencies of activities during a process. $A = \{a_{1}, a_{2}, \dots, a_{n}\}$ is a set of activities that occurred in the activity log, where $n$ is the number of activity types. $E$ contains the directed edges that describe the dependencies between the activities.

We define $|a_{i} \to a_{j}|$ as a sequential dependency in which activity $a_{j}$ is performed after activity $a_{i}$. We define $a_{i} \land |a_{j} \to a_{k} \to \dots \to a_{e}|$ as a parallel dependency in which activity $a_{i}$ is performed concurrently with the process fragment $|a_{j} \to a_{k} \to \dots \to a_{e}|$.

3.2. Backbone process discovery

A backbone process highlights the main execution steps in the process. We first developed a metric for discovering the backbone from trace alignment results. Because the dataset may be small, we introduced a method to determine if the size of the dataset is sufficient by testing the backbone’s stability under smaller data subsets.

3.2.1. Trace alignment algorithm

Trace alignment algorithms aim to align the elements in an activity log $L$. These algorithms better explore loosely structured processes by aligning the same types of activities in the same columns [7]. Trace alignment algorithms take an activity log and return a 2-D alignment matrix $M_{m \times l}$, where the number of rows $m$ is the total number of process traces in $L$, and the number of columns $l$ is the number of activities in the aligned dataset. $M_{m \times l}$ contains one process trace instance $T$ per row. For a given column $r$, each row contains a single type of activity or a gap (empty cell). The number of rows in which this activity occurs in column $r$, taken as a fraction of all $m$ rows, is the frequency of that activity $f_{r}^{(a_{i})} (0 < r < l)$ and corresponds to the level of consensus in column $r$, i.e., at position $r$ in the process. If the activity did not occur in a given row, a gap is inserted to preserve the alignment of the other columns. An optimal solution for $M_{m \times l}$ is obtained by iteratively minimizing the number of gaps and maximizing the number of activities in the columns. Consensus activities $a_{i}^{(c)} (0 < i < n)$ are columns in $M_{m \times l}$ whose activity frequency exceeds a given threshold $t$ (Fig. 1). We define a consensus sequence $CS$ as a series of consensus activities (e.g., $⟨a_{1}^{(c)}, a_{2}^{(c)}, \dots⟩$). The consensus activities in $CS$ have sequential dependencies; by adding directed edges $\to$ between them, the consensus activities form the backbone of a process model (e.g., $|a_{1}^{(c)} \to a_{2}^{(c)} \to \dots|$).
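A minimal sketch of reading consensus activities off an alignment matrix follows. It assumes the matrix has already been computed (for example, by PIMA), encodes gaps as `None`, takes each column's most common activity, and keeps it when its frequency, expressed as a fraction of the $m$ rows, exceeds $t$. The function names and the toy matrix are hypothetical.

```python
from collections import Counter

GAP = None  # an empty cell in the alignment matrix


def column_frequencies(matrix):
    """Most common activity of each column and its frequency as a fraction of all rows."""
    m = len(matrix)
    result = []
    for column in zip(*matrix):
        counts = Counter(cell for cell in column if cell is not GAP)
        if counts:
            activity, count = counts.most_common(1)[0]
            result.append((activity, count / m))
        else:
            result.append((GAP, 0.0))
    return result


def consensus_sequence(matrix, t):
    """Consensus activities: columns whose activity frequency is larger than t."""
    return [a for a, f in column_frequencies(matrix) if a is not GAP and f > t]


# Hypothetical 4 x 4 alignment matrix (rows are aligned traces)
M = [
    ["a1", "a2", GAP, "a3"],
    ["a1", "a2", "a4", "a3"],
    ["a1", GAP, GAP, "a3"],
    ["a1", "a2", GAP, GAP],
]
print(column_frequencies(M))
print(consensus_sequence(M, t=0.5))  # -> ['a1', 'a2', 'a3']
```

With $t = 0.5$, the column holding `a4` (frequency 0.25) is excluded, giving the backbone $|a_{1} \to a_{2} \to a_{3}|$.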

Fig. 1. — Example 2-D alignment matrix $M$ derived from five process traces. At the bottom, the consensus activities are selected by threshold $t$ . The consensus activities constitute the backbone of the process: $|a_{1}^{(c)} \to a_{2}^{(c)} \to a_{2}^{(c)} \to a_{1}^{(c)} \to a_{3}^{(c)} \to a_{3}^{(c)}|$ .

In this work, we used an existing optimized trace alignment algorithm, process-oriented iterative multiple alignment (PIMA) [24], to identify the optimal solution of an alignment matrix. PIMA achieves a more precise alignment with a time complexity of $O(NL^{2})$, a factor of $N$ lower than the $O(N^{2}L^{2})$ time complexity of the traditional trace alignment algorithm [16].

3.2.2. Threshold selection for consensus sequence

We aimed to extract a representative consensus sequence $C S = ⟨a_{i}^{(c)}, a_{j}^{(c)}, \dots, a_{k}^{(c)}⟩$ from an alignment matrix. A frequency threshold $t$ controls the number of activities included in a $C S$ . A higher threshold filters out infrequent activities, leading to a sequence with fewer activities. A lower threshold leads to more activities in the sequence and a potentially cluttered process model. We used the following approach to identify the optimal threshold.

We defined a consensus loss metric $C L$ as the sum of distances between $C S$ and each process trace $T_{i}$ in the alignment matrix:

$$CL = \sum_{i=1}^{m} ED(CS, T_{i}) \qquad (1)$$

where $ED$ is the edit distance (Levenshtein distance) [25] and $m$ is the number of traces (the number of rows of $M_{m \times l}$). We performed a grid search over the threshold (Algorithm 1) and selected the consensus sequence with the minimal consensus loss.


In Algorithm 1, $q$ represents the increment of threshold $t$. Given the activity log $L$, we first generated the alignment matrix using the trace alignment algorithm. For each threshold, we derived the $CS$ and calculated the consensus loss (the sum of the edit distances) between $CS$ and every trace $T$ in $L$. Algorithm 1 returns the threshold that gives the minimal loss; this threshold yields the $CS$ that is most similar to the traces in the dataset.
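The grid search described for Algorithm 1 can be sketched as follows, assuming a precomputed alignment matrix with `None` for gaps, traces recovered by dropping the gaps from each row, and a grid over $t \in [0, 1)$ with step $q$. The helper names are hypothetical, and the consensus rule (keep each column's most common activity when its row fraction exceeds $t$) is a simplifying assumption.

```python
from collections import Counter

GAP = None  # an empty cell in the alignment matrix


def levenshtein(s, t):
    """Edit distance between two activity sequences (dynamic programming)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,              # delete a
                            curr[j - 1] + 1,          # insert b
                            prev[j - 1] + (a != b)))  # substitute a -> b
        prev = curr
    return prev[-1]


def consensus_sequence(matrix, t):
    """Keep each column's most common activity when its row fraction exceeds t."""
    cs = []
    for column in zip(*matrix):
        counts = Counter(cell for cell in column if cell is not GAP)
        if counts:
            activity, count = counts.most_common(1)[0]
            if count / len(matrix) > t:
                cs.append(activity)
    return cs


def consensus_loss(matrix, cs):
    """Eq. (1): sum of edit distances between CS and every trace (row without gaps)."""
    traces = [[cell for cell in row if cell is not GAP] for row in matrix]
    return sum(levenshtein(cs, trace) for trace in traces)


def select_threshold(matrix, q=0.1):
    """Grid search over t = 0, q, 2q, ... < 1; returns (t, CS, loss) with minimal loss."""
    best = None
    for step in range(int(round(1 / q))):
        t = step * q
        cs = consensus_sequence(matrix, t)
        loss = consensus_loss(matrix, cs)
        if best is None or loss < best[2]:  # ties keep the smaller threshold
            best = (t, cs, loss)
    return best


M = [  # hypothetical alignment matrix
    ["a1", "a2", GAP, "a3"],
    ["a1", "a2", "a4", "a3"],
    ["a1", GAP, GAP, "a3"],
    ["a1", "a2", GAP, GAP],
]
t, cs, loss = select_threshold(M, q=0.1)
print(round(t, 2), cs, loss)  # -> 0.3 ['a1', 'a2', 'a3'] 3
```

On this toy matrix, thresholds from 0.3 to 0.7 all give the minimal loss of 3; the sketch keeps the first such threshold, which is one possible tie-breaking choice and not necessarily the one used by TAD Miner.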

3.2.3. Consensus sequence stability

Manually coding activity logs from medical processes is labor-intensive and usually requires costly domain expertise. Because data collection is challenging, datasets are often small, and analyzing them can produce biased consensus sequences. We designed an approach to evaluate the stability of the consensus sequences generated under different data sizes. If a $CS$ generated from a random sample was the same as or close to the $CS$ generated from the whole dataset, we considered the sample size sufficient to produce a representative result. We used the edit distance to measure the similarity between consensus sequences [17].

If we randomly select a sample of $k$ traces from $m$ traces, the number of possible combinations is $N = \binom{m}{k}$. When $N$ is large, it is challenging to generate a $CS$ for every possible sample. For this reason, we needed an optimal number of repetitions for drawing random samples from the whole dataset. We adopted the Cochran formula to calculate this number given a desired confidence level, precision, and estimated proportion of the samples present in the population:

$$s = \frac{Z^{2} p (1 - p)}{e^{2}} \qquad (2)$$

where:

$s$ is the number of repetitions to take samples
$e$ is a given level of precision
$p$ is a given proportion of $N$
$Z$ is the z-score for the chosen confidence level, found in the $Z$ table [26]

For a small population size, we used a modified equation (Cochran's finite population correction) to calculate the optimal number of repetitions $s^{'}$:

$$s^{'} = \frac{s}{1 + \frac{s - 1}{N}} \qquad (3)$$

where $N$ is the number of possible sample combinations for a given $k$ .

In our work, the population size $N$ was the number of possible choices of trace samples from the activity log. We assumed that half of these choices could generate a consensus sequence close to that derived from all data (i.e., $p = 0.5$). We set the confidence level to 95%, corresponding to a $z$-score of 1.96, and set the precision level $e$ to 0.05 based on sampling techniques from the literature [27,28].
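The arithmetic of Eqs. (2) and (3) with the settings above ($Z = 1.96$, $p = 0.5$, $e = 0.05$) can be checked with a few lines of Python. The log size `m = 10` and sample size `k = 5` are hypothetical values chosen only to make $N$ small enough for the correction to matter, and rounding $s^{'}$ up to an integer is our assumption.

```python
import math


def cochran_repetitions(z=1.96, p=0.5, e=0.05):
    """Eq. (2): s = Z^2 * p * (1 - p) / e^2."""
    return z ** 2 * p * (1 - p) / e ** 2


def corrected_repetitions(s, N):
    """Eq. (3): finite-population correction s' = s / (1 + (s - 1) / N)."""
    return s / (1 + (s - 1) / N)


m, k = 10, 5          # hypothetical number of traces and sample size
N = math.comb(m, k)   # number of distinct samples of k traces out of m
s = cochran_repetitions()
s_prime = math.ceil(corrected_repetitions(s, N))  # rounding up is an assumption
print(N, round(s, 2), s_prime)  # -> 252 384.16 153
```

Eq. (2) gives $s = 1.96^{2} \times 0.25 / 0.05^{2} \approx 384.16$; when $N$ is very large, Eq. (3) leaves $s^{'}$ close to this value, whereas for $N = \binom{10}{5} = 252$ it drops to about 152.4.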

To verify that the identified consensus sequence was stable with a smaller dataset, we calculated the edit distances between the $CS$s generated from samples of different sizes $k$ and the $CS$ generated from the whole dataset, using the optimal threshold. For each sample size, we performed random sampling $s$ times to avoid selection bias. A sufficient sample size $k_{0}$ was found when the mean value of the edit distances converged.
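The stability check can be sketched as a loop over sample sizes. For simplicity, this sketch reads each sample's $CS$ from the corresponding rows of the full alignment matrix rather than from a fresh alignment of the sampled traces, and it declares convergence at the smallest $k$ whose mean edit distance stays within a tolerance `tol` of the means for all larger sizes. These choices, the helper names, the toy matrix, `reps` (standing in for $s$), and the fixed random seed are assumptions for illustration.

```python
import random
from collections import Counter
from statistics import mean

GAP = None  # an empty cell in the alignment matrix


def levenshtein(s, t):
    """Edit distance between two activity sequences."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = curr
    return prev[-1]


def consensus_sequence(rows, t):
    """Keep each column's most common activity when its row fraction exceeds t."""
    cs = []
    for column in zip(*rows):
        counts = Counter(cell for cell in column if cell is not GAP)
        if counts:
            activity, count = counts.most_common(1)[0]
            if count / len(rows) > t:
                cs.append(activity)
    return cs


def mean_distance_by_size(matrix, t, sizes, reps, seed=0):
    """Mean edit distance between sample CSs and the full-data CS for each size k."""
    rng = random.Random(seed)
    full_cs = consensus_sequence(matrix, t)
    means = {}
    for k in sizes:
        distances = [levenshtein(consensus_sequence(rng.sample(matrix, k), t), full_cs)
                     for _ in range(reps)]
        means[k] = mean(distances)
    return means


def sufficient_size(means, tol=0.1):
    """Smallest k whose mean stays within tol of the means of all larger sizes."""
    sizes = sorted(means)
    for i, k in enumerate(sizes):
        if all(abs(means[j] - means[k]) < tol for j in sizes[i:]):
            return k
    return None


M = [  # 20 hypothetical aligned traces (a 4-row pattern repeated 5 times)
    ["a1", "a2", GAP, "a3"],
    ["a1", "a2", "a4", "a3"],
    ["a1", GAP, GAP, "a3"],
    ["a1", "a2", GAP, GAP],
] * 5
means = mean_distance_by_size(M, t=0.5, sizes=range(2, 21, 2), reps=200)
k0 = sufficient_size(means)
print(k0, means[k0])
```

Because a sample containing every row reproduces the full-data $CS$ exactly under this simplification, the mean distance at the largest size is always 0; if each sample is re-aligned instead, the curve need not reach 0.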

3.3. Side branch discovery

In addition to the backbone, our interpretable process model uses parallel branches to represent concurrent activities. Complex medical processes involve many activity sequences performed concurrently by team members. To model this parallelism, we leveraged the information from non-consensus activities. Although the non-consensus activities are usually infrequent or may represent process “noise,” some are frequent but dispersed within the traces and represent parallel activities. Some infrequent activities may also need to be included in the process model because they are critical to patient care. We defined three activity types (Fig. 2) and proposed Algorithm 2 to identify these activities:

Consensus activity $a_{i}^{(c)} (0 < i < n)$ : An activity in a consensus sequence (Algorithm 2, Step 3).
Common-but-dispersed activity $a_{j}^{(d)} (0 < j < n)$ : An activity dispersed across the columns of an alignment matrix that has a total occurrence frequency within a given range (Algorithm 2, Step 4 to Step 8).
Uncommon-but-critical activity $a_{k}^{(u)} (0 < k < n)$ : An infrequent activity defined as critical by domain experts (Algorithm 2, Step 9 to Step 10).

In Algorithm 2, $pos$ records the column position of activity $a_{i}$ in $M_{m \times l}$. The common-but-dispersed activities $a_{j}^{(d)}$ are originally dispersed throughout many columns of the trace alignment. If the summed frequency of an activity $a_{j}^{(d)}$ across these columns was within a custom-selected range, we defined it as a common-but-dispersed activity and added it to the set $S^{d}$. In our experiments, we set this range to $[t, 1)$, where $t$ was the threshold found by Algorithm 1. A common-but-dispersed activity $a_{j}^{(d)}$ is spread across a fragment of the backbone process (e.g., $|a_{1}^{(c)} \to a_{2}^{(c)} \to a_{3}^{(c)}|$), meaning that $a_{j}^{(d)}$ is likely to be performed in parallel with those consensus activities (i.e., $a_{j}^{(d)} \land |a_{1}^{(c)} \to a_{2}^{(c)} \to a_{3}^{(c)}|$). In many workflows, some infrequent activities that are included in neither $CS$ nor $S^{d}$ may still be critical. These activities $a_{k}^{(u)}$ were defined as uncommon-but-critical activities, forming the set $S^{u}$, and were specified by medical experts before model development.
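A minimal sketch of this classification is shown below. It assumes that a column is a consensus column when its most common activity exceeds $t$, that an activity's summed frequency is accumulated only over non-consensus columns, and that the expert list of critical activities is given as a set; these interpretations and all names are hypothetical rather than taken from Algorithm 2.

```python
from collections import Counter, defaultdict

GAP = None  # an empty cell in the alignment matrix


def classify_activities(matrix, t, critical):
    """Split an alignment into consensus columns, S_d and S_u.

    S_d and S_u map each activity to the column positions where it occurs.
    `critical` is a hypothetical expert-provided set of critical activities.
    """
    m = len(matrix)
    consensus = []                # (column, activity) pairs of the backbone
    summed = defaultdict(float)   # frequency summed over non-consensus columns
    positions = defaultdict(list)
    for pos, column in enumerate(zip(*matrix)):
        counts = Counter(cell for cell in column if cell is not GAP)
        if not counts:
            continue
        activity, count = counts.most_common(1)[0]
        if count / m > t:
            consensus.append((pos, activity))
            continue
        for act, c in counts.items():
            summed[act] += c / m
            positions[act].append(pos)
    s_d = {a: positions[a] for a in summed if t <= summed[a] < 1}
    backbone = {a for _, a in consensus}
    s_u = {a: positions[a] for a in summed
           if a in critical and a not in s_d and a not in backbone}
    return consensus, s_d, s_u


M = [  # hypothetical alignment in the spirit of Fig. 2
    ["a1", "a4", "a2", GAP, "a3", GAP, GAP],
    ["a1", GAP, "a2", "a4", "a3", GAP, GAP],
    ["a1", GAP, "a2", GAP, "a3", "a5", GAP],
    ["a1", GAP, "a2", GAP, "a3", GAP, "a4"],
]
consensus, s_d, s_u = classify_activities(M, t=0.5, critical={"a5"})
print(consensus)  # -> [(0, 'a1'), (2, 'a2'), (4, 'a3')]
print(s_d)        # -> {'a4': [1, 3, 6]}
print(s_u)        # -> {'a5': [5]}
```

In the toy matrix, `a4` appears in three non-consensus columns with a summed frequency of 0.75, which falls in $[0.5, 1)$, so it joins $S^{d}$; `a5` is too infrequent but is kept in $S^{u}$ because it is on the expert list.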

Fig. 2. — The alignment result. From the result, we found the consensus activities and the non-consensus activities (e.g., common-but-dispersed activity $a_{4}^{(d)}$ and uncommon-but-critical activity $a_{5}^{(u)}$ ).

The consensus activities constructed the backbone of the model (e.g., $|a_{1}^{(c)} \to a_{2}^{(c)} \to \dots|$). Side branches containing common-but-dispersed activities and uncommon-but-critical activities were added to the backbone. We proposed an algorithm that takes the alignment matrix $M$, $CS$, $S^{d}$, and $S^{u}$ as input and returns a process model $λ$ (Algorithm 3, Fig. 3). In Step 1, we used directed edges $\to$ to link the consensus activities and build the backbone of the process model. In Steps 2 to 7, we iterated over the activities in $S^{d} = ⟨a_{j}^{(d), pos}⟩ (0 < j < n, 0 < pos < l)$ and $S^{u} = ⟨a_{k}^{(u), pos}⟩ (0 < k < n, 0 < pos < l)$. For each activity in $S^{d}$ and $S^{u}$, we found the nearest preceding and succeeding consensus activities based on its positions and then added the activity as a parallel side branch between them.
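The construction step can be sketched as below. The sketch attaches each side-branch activity between the last consensus column before its first position and the first consensus column after its last position, using the start and end nodes when no such column exists, and it indexes repeated consensus activities by occurrence (e.g., `a2_1`, `a2_2`). The function name, node labels, and data layout are hypothetical rather than the authors' Algorithm 3.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def build_model(consensus, s_d, s_u):
    """Return (edges, kinds) of a DFG with a linear backbone and parallel side branches.

    consensus: list of (column, activity) in column order.
    s_d, s_u: {activity: [column positions]} for dispersed / critical activities.
    """
    total = Counter(a for _, a in consensus)
    seen = Counter()
    labels = []
    for _, a in consensus:
        seen[a] += 1
        # repeated activities get an index suffix, e.g. a2_1 and a2_2
        labels.append(f"{a}_{seen[a]}" if total[a] > 1 else a)
    nodes = ["start"] + labels + ["end"]
    kinds = {label: "backbone" for label in labels}
    edges = set(zip(nodes, nodes[1:]))  # Step 1: start -> backbone -> end
    cols = [c for c, _ in consensus]
    for kind, branch in (("dispersed", s_d), ("critical", s_u)):
        for act, pos in branch.items():
            # last consensus node before the first position (or start)
            pred = nodes[bisect_left(cols, min(pos))]
            # first consensus node after the last position (or end)
            succ = nodes[bisect_right(cols, max(pos)) + 1]
            edges |= {(pred, act), (act, succ)}
            kinds[act] = kind
    return edges, kinds


edges, kinds = build_model(
    [(0, "a1"), (2, "a2"), (4, "a3")], {"a4": [1, 3, 6]}, {"a5": [5]}
)
print(sorted(edges))
print(kinds)
```

For the toy input, `a4` spans from `a1` to the end node, i.e., it runs in parallel with $|a_{2} \to a_{3}|$, and `a5` attaches between `a3` and the end node.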

Fig. 3. — The procedure of process model construction. The backbone is constructed by consensus activities and side branches are constructed by non-consensus activities.

3.4. Semantics of the process model discovered by TAD Miner

To help medical experts better understand the model-derived steps, we used a modified Directly-Follows Graph (DFG) to represent the process model $λ$ discovered by TAD Miner (Fig. 3, right). We distinguished between backbone and side-branch activity nodes using two fill colors and two outline styles. Grey nodes represent the activities in the backbone (i.e., $a_{i}^{(c)}$), highlighting significant execution steps in the process. We then added start and end nodes as the endpoints of the backbone. Next, side branches were added to the grey-node sequence. Because a traditional DFG cannot show concurrency within the process, we used white nodes with a solid-line border to represent concurrent activities (i.e., $a_{j}^{(d)}$). A concurrent activity that spans a backbone fragment represents an activity that is performed in parallel with that fragment (e.g., in Fig. 3, $a_{4}^{(d)} \land |a_{1}^{(c)} \to a_{2}^{(c)} \to a_{3}^{(c)}|$). We used white nodes with a dashed-line border to represent infrequent activities (i.e., $a_{k}^{(u)}$). An infrequent activity spanning a backbone fragment is likely performed in parallel with that fragment (e.g., in Fig. 3, $a_{5}^{(u)} \land |a_{2}^{(c)} \to a_{3}^{(c)}|$). Some repeated activities may appear at different positions in the process model, so we added an indexed suffix to distinguish between them (e.g., in Fig. 3, activity $a_{2}$, which appears at two different positions, is indexed as $a_{2\_1}$ and $a_{2\_2}$). A number $(p)$ in parentheses in an activity node denotes the activity’s frequency. The DFG generated by TAD Miner is a directed acyclic graph (DAG), which has lower complexity and is easier to interpret.

3.5. Evaluation methods

3.5.1. Quantitative evaluation metrics

Based on existing complexity measures for process models [29] and decision models [30], we adopted five metrics to evaluate our TAD Miner approach:

Number of Activities (NOA) [29]: the number of the activity nodes in a process model.
Number of Decisions (NOD) [30]: the number of the decision nodes in a process model. A model with a smaller NOD has fewer decisions and is less complex.
Cyclicity (CYC) [21]: the fraction of activities that belong to a cycle in a process model. A model with lower cyclicity has fewer and shorter loops, which makes it simpler and easier to follow.
Cyclomatic Complexity (CC) [30,31]: McCabe’s cyclomatic complexity (CC) is a widely used complexity metric. It is calculated as the number of edges ($E$) in a graph minus the number of nodes ($N$) plus two times the number of connected components (i.e., $CC = E - N + 2$, because our process models have one connected component). A higher cyclomatic complexity indicates that the model is more difficult to understand.
Sequentiality (SEQ) [30,32]: the number of parallel pathways in a model. A higher SEQ indicates that the model has more parallel pathways, making it look more like a parallel network than a sequence and resulting in a more complex model.

We also used three accuracy metrics (fitness, precision, and F-score, where $F\text{-}score = \frac{2 \times fitness \times precision}{fitness + precision}$) to evaluate the performance of the process models [22,23].
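Several of these metrics are simple functions of the graph. The sketch below computes NOA, CC, and CYC for a model given as a set of directed edges, together with the F-score formula. It assumes that start and end nodes count toward $N$ in CC but not toward NOA, and it uses a reachability test to decide whether an activity lies on a cycle; these conventions and the example graphs are our assumptions, and NOD and SEQ are omitted because they depend on how decision nodes and parallel pathways are counted.

```python
from collections import defaultdict


def complexity_metrics(edges, activities):
    """NOA, cyclomatic complexity (one connected component) and cyclicity of a DFG."""
    nodes = {n for edge in edges for n in edge}
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)

    def on_cycle(v):
        # v lies on a cycle if v can be reached again from its own successors
        stack, seen = list(succ[v]), set()
        while stack:
            u = stack.pop()
            if u == v:
                return True
            if u not in seen:
                seen.add(u)
                stack.extend(succ[u])
        return False

    return {
        "NOA": len(activities),
        "CC": len(edges) - len(nodes) + 2,  # E - N + 2P with P = 1
        "CYC": sum(on_cycle(a) for a in activities) / len(activities),
    }


def f_score(fitness, precision):
    """Harmonic mean of fitness and precision."""
    return 2 * fitness * precision / (fitness + precision)


# Hypothetical TAD-style model: backbone start -> a1 -> a2 -> a3 -> end plus two branches
tad_edges = {("start", "a1"), ("a1", "a2"), ("a2", "a3"), ("a3", "end"),
             ("a1", "a4"), ("a4", "end"), ("a3", "a5"), ("a5", "end")}
print(complexity_metrics(tad_edges, ["a1", "a2", "a3", "a4", "a5"]))
print(complexity_metrics({("a", "b"), ("b", "a"), ("b", "c")}, ["a", "b", "c"]))
print(f_score(0.9, 0.6))
```

The acyclic TAD-style example has CYC = 0 and $CC = 8 - 7 + 2 = 3$, while the small graph with the loop a → b → a has two of its three activities on a cycle.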

3.5.2. Evaluation of the process models for trauma resuscitation

To discover and evaluate the process models for pediatric trauma resuscitation, medical experts on our research team first identified the resuscitation goals and associated activities based on the Advanced Trauma Life Support (ATLS) protocol, a standardized evaluation and management procedure for injured patients [33,34]. The ATLS protocol is broadly divided into the primary and secondary surveys. The primary survey is designed to evaluate the patient’s airway, breathing, circulatory, and neurological systems for immediate life-threatening injuries, while the secondary survey assesses for other injuries. Although existing protocols provide algorithms to determine when to perform certain activities, little is known about the actual steps that providers take to perform these activities. Because providers may perform several interventions to evaluate and manage patients, medical experts on our team identified two multi-level major resuscitation goals: (1) “intubation” and its subgoal “bilateral breath sounds assessment,” which aim to establish a patent airway and provide mechanical ventilatory support, and (2) “transfusion” and its subgoal “vital signs assessment,” which aim to restore blood volume and coagulation factors. The experts also identified three common goals that are routinely pursued during resuscitations: (1) “intravenous (IV) access,” which aims to allow for fluid, blood product, or medication infusion, (2) “non-invasive oxygenation,” which aims to improve tissue oxygenation with minimal intervention, and (3) “back assessment,” which aims to assess for potential spinal column and spinal cord injuries. We designed an activity dictionary with more than 200 different resuscitation activity types. After identifying the resuscitation goals, we identified a priori the activities from our dictionary that are associated with these goals (Table 1).

Table 1.

Activity list for each resuscitation goal. (a) Associated activities for common resuscitation goals. (b) Associated activities for two multi-level major resuscitation goals and their subgoals.

(a)
Common Resuscitation Goals	IV Access	Non-Invasive Oxygenation	Back Assessment
Associated Activities	IV placement	Bag ventilation	Roll patient to side (Log roll)
	IV placement confirmation	Passive oxygen applied	Visual inspection
	Drill to skin	Oxygen removed	Palpation (non-spine)
		Oxygen preparation	C-spine
		Oxygen held	T-spine
		Oxygen	L-spine
			Rectal

(b)
Major Resuscitation Goals and Subgoals	Intubation	Intubation – Bilateral Breath Sounds Assessment	Transfusion	Transfusion – Vital Signs Assessment
Associated Activities	Airway Assessment	Endotracheal Tube Secured	Vital Signs Assessment	Heart Rate
	Breathing Assessment	Breathing Tube Depth from Lips - cm	Bleeding Identified	Peripheral Pulse Check
	Noninvasive Oxygenation	Chest X-Ray	Intravenous (IV) Access	Central Pulse Check
	Intravenous (IV) Access	Listen to Breath Sounds	Decision to Give Blood	Manual Blood Pressure
	Intubation Decision		Blood Given	Automatic Blood Pressure
	Pre-oxygenation		Crystalloid Given
	Intubation Medicine		External Hemorrhage Control
	Intubation		Vital Signs Reassessment
	Bilateral Breath Sounds Assessment		Vital Signs Reassessment after Blood
	Readjust Tube
	Ventilation

Medical researchers on our team then manually coded video recordings to create an activity log. To mitigate possible coding inaccuracies and inconsistencies, team members began coding videos only after passing an inter-rater reliability test [35]. From the coded activity log, we extracted the process traces for each resuscitation goal based on its associated activities. We then applied TAD Miner to discover the process model for each resuscitation goal. For each of the two multi-level major goals, separate process models were discovered for the main goal and its subgoal.
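The per-goal trace extraction can be sketched as a simple projection of each coded case onto a goal's activity set. This is a minimal sketch under assumptions: the log format (a list of `(start_time, activity)` pairs per case), the ordering by start time, and the names `GOAL_ACTIVITIES` and `extract_goal_traces` are hypothetical; only the "IV access" activity labels are taken from Table 1(a).

```python
# Activity set for one goal, transcribed from Table 1(a); other goals would be added the same way
GOAL_ACTIVITIES = {
    "IV access": {"IV placement", "IV placement confirmation", "Drill to skin"},
}

def extract_goal_traces(case_logs, goal_activities):
    """Project each coded case onto one goal's activities.

    case_logs: {case_id: [(start_time_seconds, activity_label), ...]} (hypothetical format)
    Returns {case_id: [activity, ...]} ordered by start time. Cases in which the goal
    was never pursued (no matching activity) are dropped.
    """
    traces = {}
    for case_id, events in case_logs.items():
        ordered = sorted(events, key=lambda e: e[0])  # order events by start time
        trace = [activity for _, activity in ordered if activity in goal_activities]
        if trace:
            traces[case_id] = trace
    return traces

logs = {
    "case-001": [(95, "IV placement confirmation"), (40, "IV placement"),
                 (10, "Roll patient to side (Log roll)")],
    "case-002": [(12, "Roll patient to side (Log roll)"), (30, "Visual inspection")],
}
print(extract_goal_traces(logs, GOAL_ACTIVITIES["IV access"]))
# {'case-001': ['IV placement', 'IV placement confirmation']}
```

Dropping empty projections mirrors the study's choice to include only the cases in which a goal was actually pursued.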

For qualitative evaluation, medical experts on our research team derived two knowledge-driven models for the transfusion and intubation goals based on established protocols [33,36]. The knowledge-driven models showed the theorized, optimal steps for performing these two interventions. By comparing the knowledge-driven models with the discovered models, we identified several suggested modifications to the knowledge-driven models. We also recruited four medical experts from outside our research team and asked them, through a survey, to evaluate the accuracy and interpretability of the models discovered by the different methods.

To evaluate the ability of our approach to handle more complex processes, we conducted an additional case study on the entire primary survey of the ATLS protocol. The primary survey is longer and contains 61 activity types, compared with about 10 activity types within a single resuscitation goal.

4. Results from TAD Miner application in the trauma resuscitation domain

This study was approved by the Institutional Review Board at Children’s National Hospital in Washington, DC. Medical researchers on our team manually coded video recordings from 308 resuscitation cases to create an activity log using an activity dictionary with more than 200 different activity types. Not every resuscitation goal identified by the experts was pursued in each of the 308 cases, so for each goal we included only the cases in which that goal was pursued.

4.1. Process models discovery

We applied PIMA to determine the base alignment matrix and used TAD Miner to discover the underlying models for the three common goals: IV access, non-invasive oxygenation, and back assessment (Table 2). To discover the $CS$, we found the optimal threshold $t$ by a grid search between 0 and 1 with a step of 0.05. Only activities with a column frequency higher than $t$ were included in the $CS$.

Table 2.

Dataset statistics for three common goals.

	IV Access	Non-Invasive Oxygenation	Back Assessment
Number of traces	187	252	267
Number of activities	365	2327	1919
Number of activity types	3	6	7
Length of the longest trace	8	77	20
Trace length, mean ± SD	1.95 ± 1.40	9.23 ± 9.58	7.19 ± 2.62
Trace length, mode	1	5	6

If $t$ is close to 0, the $CS$ includes activities regardless of their frequency in the alignment matrix, yielding a high sum of edit distances (i.e., consensus loss). As we raised the threshold, the consensus loss first dropped as infrequent activities were filtered out, and then increased because too few activities were left in the $CS$ (Fig. 4). The consensus loss sometimes remained constant within certain threshold ranges (e.g., for $t$ between 0.45 and 0.7 for the “IV access” goal) because the $CS$ generated within these ranges was identical.
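The threshold search described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the TAD Miner implementation: the alignment matrix is given directly as equal-length rows with `"-"` for gaps (PIMA itself is not reproduced); "column frequency" is assumed to be the share of all traces whose entry in a column equals that column's most common activity; and the consensus loss is taken as the sum of Levenshtein distances between the $CS$ and each gap-free trace. All function names are hypothetical.

```python
from collections import Counter

GAP = "-"  # gap symbol in the alignment matrix (assumed)

def column_profile(alignment):
    """Return (most common activity, frequency) per column; frequency is relative to all rows."""
    n_rows = len(alignment)
    profile = []
    for column in zip(*alignment):
        counts = Counter(a for a in column if a != GAP)
        if counts:
            activity, count = counts.most_common(1)[0]
            profile.append((activity, count / n_rows))
        else:
            profile.append((None, 0.0))
    return profile

def consensus_sequence(alignment, t):
    """Keep each column's dominant activity only if its frequency exceeds t."""
    return [a for a, freq in column_profile(alignment) if a is not None and freq > t]

def edit_distance(x, y):
    """Levenshtein distance between two activity sequences."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, 1):
        cur = [i]
        for j, b in enumerate(y, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

def consensus_loss(alignment, cs):
    """Sum of edit distances between the CS and every trace (gaps removed)."""
    traces = [[a for a in row if a != GAP] for row in alignment]
    return sum(edit_distance(cs, trace) for trace in traces)

def best_threshold(alignment, step=0.05):
    """Grid-search t over [0, 1]; ties are broken in favor of the smallest t."""
    grid = [round(i * step, 2) for i in range(round(1 / step) + 1)]
    losses = {t: consensus_loss(alignment, consensus_sequence(alignment, t)) for t in grid}
    best_t = min(grid, key=lambda t: (losses[t], t))
    return best_t, consensus_sequence(alignment, best_t), losses

alignment = [
    ["A", "B", "-", "C"],
    ["A", "B", "X", "C"],
    ["A", "-", "-", "C"],
    ["A", "B", "-", "-"],
]
t, cs, losses = best_threshold(alignment)
print(t, cs, losses[0.0], losses[t])  # 0.25 ['A', 'B', 'C'] 5 3
```

In this toy alignment, every $t$ from 0.25 to 0.70 yields the same $CS$ and hence the same loss, which mirrors the plateaus described above; at $t = 0$ the rare activity `X` enters the $CS$ and the loss rises.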

Fig. 4. — Consensus loss for thresholds from 0 to 1 with a step of 0.05 (IV: IV access; NIO: non-invasive oxygenation; BK: back assessment).

To determine whether our data size was sufficient to produce an unbiased $CS$, we randomly sampled traces from the whole dataset and tested the stability of the $CS$ generated from these samples. Using the approach described in Section 3.2.3, we calculated the number of repetitions needed to avoid selection bias. Because the number of possible samples was very large (e.g., randomly sampling five of the 187 “IV access” traces yields $\binom{187}{5} \approx 1.81 \times 10^{9}$ combinations), we applied Eq. (2) to calculate the optimal number of repetitions. Given an estimated proportion $p = 0.5$ and a confidence level of 95%, the optimal number of repetitions for all sample sizes was 385.
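The numbers in this paragraph can be checked directly. The sketch below assumes that Eq. (2) is Cochran's sample-size formula $n = z^2 p(1-p)/e^2$ with a margin of error $e = 0.05$; the margin is not stated in this section, and 0.05 is simply the value that reproduces 385. The optional finite-population correction and the function name `repetitions_needed` are likewise illustrative assumptions.

```python
import math
from statistics import NormalDist

def repetitions_needed(p=0.5, confidence=0.95, margin=0.05, population=None):
    """Cochran's sample size (assumed form of Eq. (2)), optionally with a finite-population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value, about 1.96
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)  # negligible when the population is huge
    return math.ceil(n0)

combinations = math.comb(187, 5)  # ways to draw 5 of the 187 "IV access" traces
print(combinations)  # 1805568402
print(repetitions_needed())  # 385
print(repetitions_needed(population=combinations))  # 385
```

Because the number of possible samples is in the billions, the finite-population correction does not change the result, which is consistent with using the same repetition count for every sample size.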

For each of the three common goals, we derived the curve of the edit distance between the $CS$ generated from samples of different sizes and the $CS$ generated from the entire dataset. The curve starts to flatten when the sample size exceeds 50 (Fig. 5), suggesting that the sequence becomes stable at about 50 traces and that smaller subsets can produce a $CS$ similar to the one derived from the entire dataset.
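The resampling experiment can be outlined as follows. This is a sketch, not the study's code: `consensus_fn` stands in for the full PIMA alignment plus threshold search, and the `modal_trace` stand-in used here (the most frequent complete trace) is a hypothetical simplification chosen only to keep the example self-contained. The default of 385 repetitions follows the text.

```python
import random
import statistics
from collections import Counter

def edit_distance(x, y):
    """Levenshtein distance between two activity sequences."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, 1):
        cur = [i]
        for j, b in enumerate(y, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

def modal_trace(traces):
    """Hypothetical stand-in for the CS: the most frequent complete trace."""
    return list(Counter(map(tuple, traces)).most_common(1)[0][0])

def stability_curve(traces, sample_sizes, consensus_fn=modal_trace, reps=385, seed=0):
    """Mean and SD of the distance between sample-based and full-data consensus sequences."""
    rng = random.Random(seed)
    reference = consensus_fn(traces)
    curve = {}
    for k in sample_sizes:
        dists = [edit_distance(consensus_fn(rng.sample(traces, k)), reference)
                 for _ in range(reps)]
        curve[k] = (statistics.mean(dists), statistics.pstdev(dists))
    return curve

traces = [["A", "B", "C"]] * 6 + [["A", "C"]] * 3 + [["B"]]
for k, (mean, sd) in stability_curve(traces, [2, 5, 10]).items():
    print(k, round(mean, 2), round(sd, 2))
```

With real data, plotting the mean (with SD bars) against the sample size and looking for the point where the curve flattens reproduces the reading of Fig. 5; when the sample is the whole dataset, the distance is necessarily zero.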

Fig. 5. — Curve of the edit distance between the $CS$ derived from different sample sizes and the $CS$ derived from all data. For each sample size, we repeated the selection 385 times. The curves show the average distances, and the vertical bars show the standard deviations.

The backbone activities performed to achieve the resuscitation goals were represented by the $CS$ with the minimal consensus loss. The threshold kept some activities out of this backbone. For example, for the “IV access” goal, the “IV placement confirmation” and “drill to skin” activities were filtered out because they were either scattered or rarely occurred. We then added the common-but-dispersed and uncommon-but-critical activities to the backbone as parallel branches (Fig. 6).
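One way to picture the branch step is to record, for every non-backbone activity, the backbone segment(s) in which it was observed in the alignment. The heuristic below is hypothetical and is not the rule TAD Miner uses to classify common-but-dispersed versus uncommon-but-critical activities (that rule is defined earlier in the paper and is not reproduced here). It assumes the backbone is given as a mapping from alignment column to consensus activity; `attach_branches` is an illustrative name.

```python
GAP = "-"  # gap symbol in the alignment matrix (assumed)

def attach_branches(alignment, backbone):
    """Hypothetical heuristic: place each non-backbone activity on the backbone segment(s) where it occurs.

    backbone maps alignment column index -> consensus activity.
    Returns {activity: (first_segment, last_segment, support)}, where segment s means
    "after s backbone steps" and support is the share of traces containing the activity.
    """
    cs_cols = sorted(backbone)
    spans, rows_with = {}, {}
    for r, row in enumerate(alignment):
        for c, activity in enumerate(row):
            if activity == GAP or backbone.get(c) == activity:
                continue  # gaps and backbone occurrences are not branch material
            segment = sum(1 for k in cs_cols if k < c)  # backbone steps preceding column c
            lo, hi = spans.get(activity, (segment, segment))
            spans[activity] = (min(lo, segment), max(hi, segment))
            rows_with.setdefault(activity, set()).add(r)
    n = len(alignment)
    return {a: (lo, hi, len(rows_with[a]) / n) for a, (lo, hi) in spans.items()}

alignment = [
    ["A", "B", "-", "C"],
    ["A", "B", "X", "C"],
    ["A", "-", "-", "C"],
    ["A", "B", "-", "-"],
]
backbone = {0: "A", 1: "B", 3: "C"}
print(attach_branches(alignment, backbone))  # {'X': (2, 2, 0.25)}
```

The result reads as: `X` runs in parallel with the backbone after its second step (`B`) and before its third (`C`), and appears in 25% of traces. A low support value of this kind is the sort of information a model could flag as infrequent; the actual cutoff used for that purpose is not specified here.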

Fig. 6. — Process models of three common resuscitation goals discovered by TAD Miner.

4.2. Comparison and evaluation of the TAD, IM, and SM models

We used Inductive Miner (IM, as implemented in the ProM software¹) and Split Miner (SM, as implemented on the Apromore website²) as baseline methods to generate process models for the three common resuscitation goals and for the two major resuscitation goals and their subgoals. We tuned the hyperparameters of these methods through grid search to identify the process models with the highest F-score. We then evaluated the process models discovered by TAD, IM, and SM using quantitative and qualitative methods. To compute the accuracy metrics (i.e., fitness, precision, and F-score), we used the ProM tools. IM natively generates Petri nets, whereas SM generates both DFG and BPMN models; to compute fitness and precision with the ProM measuring tools, we converted all models into Petri nets. Because the ProM software does not compute complexity, we separately calculated five complexity metrics (Section 3.5.1). To calculate the number of decisions (NOD), we counted the branching nodes in the models as decisions. To calculate the cyclomatic complexity (CC), we used the DFG format for the SM and TAD models to ensure that only activity nodes were considered.
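For readers who want to reproduce complexity figures of this kind, the sketch below computes two of them on a directly-follows graph (DFG). It assumes that CYC is the fraction of activity nodes lying on a cycle and that CC is McCabe's cyclomatic number $E - N + 2P$ for a graph with $P$ connected components; the exact definitions in Section 3.5.1 may differ, and the function names and example graph are hypothetical.

```python
from collections import defaultdict

def successors(edges):
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
    return adj

def on_cycle(adj, node):
    """True if node can reach itself through at least one edge (self-loops included)."""
    seen, stack = set(), [node]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w == node:
                return True
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False

def cyclicity(activity_nodes, edges):
    """Fraction of activity nodes that lie on at least one cycle (assumed CYC definition)."""
    adj = successors(edges)
    return sum(on_cycle(adj, a) for a in activity_nodes) / len(activity_nodes)

def cyclomatic_complexity(nodes, edges, components=1):
    """McCabe's cyclomatic number E - N + 2P (assumed CC definition)."""
    return len(edges) - len(nodes) + 2 * components

activities = ["Log roll", "T-spine", "L-spine"]
nodes = ["start", *activities, "end"]
edges = [("start", "Log roll"), ("Log roll", "T-spine"), ("T-spine", "L-spine"),
         ("L-spine", "T-spine"), ("T-spine", "end")]
print(round(cyclicity(activities, edges), 2), cyclomatic_complexity(nodes, edges))  # 0.67 2
```

Computing CC on the DFG rather than on a Petri net avoids counting places and silent transitions, which is the motivation the text gives for using the DFG format.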

4.2.1. Quantitative results and comparison

We compared the models for all resuscitation goals from each approach using the complexity and accuracy metrics (Table 3). Entries marked with “–” indicate that the metric could not be calculated because (1) converting IM models to DFG format is not supported, and (2) the SM algorithm extensively uses XOR gates instead of AND gates to achieve high model accuracy. On every resuscitation goal, the models discovered by TAD Miner had the smallest or tied-for-smallest number of decisions (NOD), cyclicity (CYC), cyclomatic complexity (CC), and sequentiality (SEQ). The TAD models had as many or more activities (NOA) because repeated activities were represented as different nodes. The NOD of the TAD models was smaller because we only introduced decision points for common-but-dispersed activities, whereas SM and IM identify decision nodes based on the split points given by direct activity transitions in the traces. Because TAD Miner identifies fewer branching nodes, users may find it easier to identify the critical decision points in TAD models. Compared with the SM models, the TAD models had smaller CC on six of the seven goals and equal CC on G4, suggesting better understandability. Compared with the IM models, the TAD models had smaller SEQ in five of the seven goals (G2, G3, G4, G5, G7) and the same SEQ for the remaining two goals (G1, G6). The IM models contained more parallel pathways, which means the model can replay only part of the traces, resulting in underfitting of the dataset. In four of the seven resuscitation goals (G1, G2, G3, G5), the SM models contained cycles, a feature that may decrease interpretability with respect to activity order. For the accuracy metrics, the SM models achieved the best F-score in six of the seven goals (G1, G2, G3, G5, G6, G7), and IM achieved the best F-score in G4. Although TAD Miner did not outperform the baselines on the accuracy comparisons, its performance was comparable to that of the state-of-the-art methods. Because maximizing fitness alone can cause underfitting, and maximizing precision alone can cause overfitting, of the activity log, comparable accuracy results are expected. Our goal has been to reduce complexity while maintaining a balance between fitness and precision.

Table 3.

Quantitative comparison of Inductive Miner (IM), Split Miner (SM), and TAD Miner.

Resuscitation Goal	Method	Complexity					Accuracy
		NOA	NOD	CYC	CC	SEQ	Fitness	Precision	F-score
G1. IV Access	IM	3	3	0	–	2	0.71	0.86	0.78
	SM	3	3	0.67	7	–	0.93	0.99	0.96
	TAD	4	1	0	6	2	0.74	0.89	0.81
G2. Non-Invasive Oxygenation	IM	6	10	0	–	5	0.6	0.83	0.69
	SM	6	3	0.50	9	–	0.7	0.91	0.79
	TAD	12	2	0	8	4	0.61	0.87	0.72
G3. Back Assessment	IM	7	2	0	–	7	0.83	0.59	0.69
	SM	7	5	0.71	17	–	0.81	0.91	0.86
	TAD	8	2	0	8	2	0.82	0.75	0.78
G4. Intubation	IM	11	4	0	–	6	0.78	0.82	0.8
	SM	11	4	0	7	–	0.67	0.97	0.79
	TAD	11	2	0	7	4	0.84	0.67	0.75
G5. Bilateral Breath Sounds Assessment	IM	4	2	0	–	3	0.68	0.79	0.73
	SM	4	4	0.25	8	–	0.69	0.96	0.8
	TAD	5	2	0	5	2	0.71	0.85	0.77
G6. Transfusion	IM	9	7	0	–	0	0.8	0.92	0.85
	SM	9	6	0	18	–	0.87	0.91	0.88
	TAD	9	2	0	7	0	0.6	0.92	0.73
G7. Vital Signs Assessment	IM	5	3	0	–	4	0.92	0.74	0.82
	SM	5	3	0	9	–	0.78	0.97	0.86
	TAD	5	2	0	5	0	0.74	0.98	0.84

4.2.2. Qualitative results and comparison

The medical experts on our team reviewed the list of differences between the discovered process models (Fig. 8, Fig. 9) and the knowledge-driven models (Fig. 7) and judged whether each difference was a valid modification to their original knowledge-driven models. Table 4 summarizes the differences between the knowledge-driven models for the two multi-level major resuscitation goals and the corresponding models discovered by the different process mining methods, together with the results of the expert validation.

Fig. 7. — Knowledge-driven models for the (a) “intubation” and (b) “transfusion” goals. Some modifications (M#) are marked in red (details are given in Table 4).

Table 4.

Modifications of the knowledge-driven models suggested by the models discovered by different methods and validated by a medical expert.

Resuscitation Goal	Suggested Modifications		Methods			Valid
			IM	SM	TAD
Intubation	Activity Order	M1. Pre-Oxygenation is performed before IV Access.	N	Y	Y	Y
		M2. Endotracheal Tube Secured is performed after Breathing Tube Depth from Lips-cm.	N	Y	Y	Y
		M3. Airway Assessment is performed before Breathing Assessment.	Y	Y	Y	N
	Activity Concurrency	M4. Intubation Medicine is performed concurrently with Intubation.	Y	N	N	N
		M5. Ventilation starts at the beginning of the process, concurrently with other activities.	N	N	Y	Y
		M6. Pre-Oxygenation starts at the beginning of the process, concurrently with other activities.	Y	N	Y	Y
		M7. Noninvasive Oxygenation starts at the beginning of the process, concurrently with other activities.	Y	N	Y	Y
		M8. Listen to Breath Sounds is performed concurrently with all other activities in the subgoal.	Y	N	Y	Y
Transfusion	Activity Order	M9. Crystalloid Given is sometimes performed before IV Access.	Y	Y	Y	N
		M10. Peripheral Pulse Check is performed after Heart Rate.	N	Y	Y	N
		M11. Vital Signs Assessment is performed before IV Access.	Y	Y	Y	Y
		M12. Bleeding Identification is performed from the start of the process.	Y	Y	Y	Y
	Activity Concurrency	M13. External Hemorrhage Control is performed concurrently from the start of the process until Blood Given.	N	N	Y	Y
		M14. Central Pulse Check is performed concurrently with Heart Rate and Blood Pressure.	Y	N	Y	Y

The data-driven models generated by the three methods uncovered 14 suggested modifications, 11 of which were uncovered by more than one method. Medical experts confirmed ten suggested modifications as valid and judged four as invalid (Table 4). The invalid modifications were caused by (1) errors in manual coding of the dataset (M9), (2) the preferred order of activity performance by providers at our research site (M3, M10), and (3) underfitting of the IM model (M4). Compared with the other methods, TAD Miner uncovered more valid modifications: all ten, versus six for IM and four for SM. These modifications mainly concerned activity order and concurrency. IM and SM did not always identify the correct sequential and concurrent relationships between activities. For example, IM did not discover M1 and M2, and SM did not discover M5, M13, and M14 (Fig. 7). IM usually generated models with more concurrency but sometimes missed order information (Fig. 9 (a) and (b)). SM models had more XOR gates, which might have hindered the discovery of concurrent activities (Fig. 9 (c) and (d)). TAD Miner suggested more valid modifications because it separately discovers the backbone (i.e., sequential activity relations) and the branches (i.e., concurrent activity relations). The TAD models thus provided a representation of the processes that aided the modification of the knowledge-driven models. Medical experts modified their knowledge-driven models based on the suggestions discovered by the data-driven models.
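The counts in this paragraph follow directly from Table 4. The snippet below transcribes the table's Y/N columns (the string encoding and the names `TABLE4` and `tally` are ours) and recomputes the number of valid modifications, the number found by more than one method, and the valid modifications found by each method.

```python
# Flags per modification, transcribed from Table 4: IM, SM, TAD, expert-validated
TABLE4 = {
    "M1": "NYYY", "M2": "NYYY", "M3": "YYYN", "M4": "YNNN",
    "M5": "NNYY", "M6": "YNYY", "M7": "YNYY", "M8": "YNYY",
    "M9": "YYYN", "M10": "NYYN", "M11": "YYYY", "M12": "YYYY",
    "M13": "NNYY", "M14": "YNYY",
}
METHODS = ("IM", "SM", "TAD")

def tally(table):
    """Return (#valid, #found by more than one method, valid found per method)."""
    valid = {m for m, flags in table.items() if flags[3] == "Y"}
    multi = {m for m, flags in table.items() if flags[:3].count("Y") > 1}
    per_method = {
        name: sum(1 for m in valid if table[m][i] == "Y")
        for i, name in enumerate(METHODS)
    }
    return len(valid), len(multi), per_method

print(tally(TABLE4))  # (10, 11, {'IM': 6, 'SM': 4, 'TAD': 10})
```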

To further evaluate the accuracy and interpretability of the data-driven models discovered by the IM, SM, and TAD methods, we recruited four medical experts from the hospital’s emergency department (two pediatric emergency physicians and two nurses). These participants were not involved in the study design or in the development of the knowledge-driven models. We asked each participant to independently evaluate six data-driven process models using an electronic survey. The six models included the IM, SM, and TAD models for the two multi-level resuscitation goals (“intubation” and “transfusion”) and their subgoals. To avoid evaluation bias, participants were not told which process discovery method produced each model, and each model was assigned a code name. For each goal, we asked the participants to first review the knowledge-driven model created by medical experts on our research team; the data-driven models were then presented in random order. Because the visual attributes (colors, line properties, and workflow symbols) are integral parts of each modeling method, we showed each model in its original format and did not alter any visual attributes to give the models a uniform look and feel. We considered any influence of the visual attributes on participants’ perceptions of the models’ interpretability and accuracy acceptable. The participants expressed their preferences in the survey and evaluated each model’s accuracy based on the original models generated by each approach. For each model, we provided instructions on how to read it and explained the meaning of the notations to ensure that the participants understood the model semantics:

TAD model: 1. The grey ovals represent the main process flow; 2. The side branches of white ovals do not mean that the process will skip the grey ovals, but rather that the activity is a parallel activity and can occur at any time during that segment; 3. A dashed outline means that the activity is infrequent in our dataset.
IM model: 1. Parallel gateway: all paths will be performed concurrently; 2. Exclusive gateway: either one of the paths will be performed.
SM model: 1. The shade of a node represents the frequency of that activity in our dataset: the darker the node, the more frequently the activity occurred; 2. The weight of an edge represents the frequency of that step: a heavier edge means that the step occurs more frequently; 3. Exclusive gateway: either one of the paths will be performed.

After reviewing each data-driven model, the participants answered two questions assessing the accuracy (Q1) and interpretability (Q2) of the process model on a 5-point Likert scale (1 = “strongly disagree,” 5 = “strongly agree”):

Q1. This data-driven model accurately represents the events of the resuscitation goal.
Q2. Using this data-driven model, I can identify uncertain positions of activities and errors in the order of activities in the knowledge-driven model.

A higher agreement level on both questions suggested better performance. We also asked the participants to identify the model that best represented the process steps for achieving the goals. Finally, we asked them to describe any shortcomings of each model using open-ended questions.

Among the six data-driven models, the models generated by TAD Miner received the highest level of agreement based on accuracy and interpretability (Table 5). The experts found that the TAD models better matched knowledge-driven models. When mismatches occurred, the TAD models provided a clear indication of the errors in the expert model.

Table 5.

Survey responses for two Likert-scale rating questions (1: strongly disagree, 2: disagree, 3: neutral, 4: agree, 5: strongly agree).

Resuscitation Goal	Participant	IM		SM		TAD
		Q1	Q2	Q1	Q2	Q1	Q2
Intubation	P1	3	2	4	2	4	4
	P2	2	4	3	4	4	4
	P3	4	4	4	3	4	4
	P4	4	3	2	4	2	3
Transfusion	P1	4	3	3	2	4	4
	P2	3	4	3	4	4	4
	P3	5	4	3	3	4	4
	P4	3	3	2	3	4	4
	sum	28	27	24	25	30	31
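The “sum” row of Table 5 can be recomputed from the individual ratings; the sketch below also reports per-question means. With only eight ratings per cell and ordinal Likert data, these figures are descriptive and are not a significance test. The data structure and the name `summarize` are ours.

```python
# (Q1, Q2) ratings transcribed from Table 5: intubation P1-P4, then transfusion P1-P4
RATINGS = {
    "IM": [(3, 2), (2, 4), (4, 4), (4, 3), (4, 3), (3, 4), (5, 4), (3, 3)],
    "SM": [(4, 2), (3, 4), (4, 3), (2, 4), (3, 2), (3, 4), (3, 3), (2, 3)],
    "TAD": [(4, 4), (4, 4), (4, 4), (2, 3), (4, 4), (4, 4), (4, 4), (4, 4)],
}

def summarize(ratings):
    """Per method: (sum of Q1, sum of Q2, mean of Q1, mean of Q2)."""
    out = {}
    for method, pairs in ratings.items():
        q1, q2 = zip(*pairs)
        out[method] = (sum(q1), sum(q2), sum(q1) / len(q1), sum(q2) / len(q2))
    return out

for method, row in summarize(RATINGS).items():
    print(method, *row)
# IM 28 27 3.5 3.375
# SM 24 25 3.0 3.125
# TAD 30 31 3.75 3.875
```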

When asked which model best represented the steps of the treatment goal process, three participants preferred the TAD models (P1, P2, P4) and one participant chose the IM model (P3). The participants described the TAD model as “easiest to interpret visually” [P1], “it seemed to make the most sense for flow and content” [P2], and “it seems very intuitive” [P4]. P3 highly rated the IM models because these were “easiest to read and follow.” While the IM models might be more succinct than others, P4 explained that “there’re unexpected paths to skip too many things making it less than ideal.” When asked about the model shortcomings, P1 mentioned that SM models were “hard to read and hard to follow.” P2 and P4 identified several invalid process steps in each model (Table 4) and considered these invalid steps as shortcomings. For the TAD models, P3 stated: “weird that the main points are on the right (branches), while the ‘lesser’ points are on the left (backbones).” This impression can be explained by the rare occurrence of critical activities in our dataset, which made these activities appear in the branches of the models. Because we used the consensus activity sequence as the model backbone, the backbone activities may appear less important than the branch activities.

4.2.3. Case study on the primary survey process

We applied TAD Miner, IM, and SM to the entire primary survey process to test how they generalize to longer processes with a larger number of activity types (Fig. 10). For comparison, we also applied the three approaches to the back assessment goal, which is simpler and contains fewer activity types (Fig. 11). The three models of the entire primary survey process were longer, contained more branches and decision nodes, and were more difficult to interpret than the back assessment models. To manage this complexity, the entire primary survey process can be segmented into stages using domain knowledge, e.g., by identifying the activities associated with particular resuscitation goals and extracting the relevant traces. Our method can then be applied independently to each stage at the desired granularity, and the finer-granularity models can be recombined into a formal process model. Although all the methods generated complex models, the TAD model was more interpretable than the others because it was free of cycles and provided a linear backbone process.

Fig. 10. — Process models for the entire primary survey stage discovered by (a) TAD Miner, (b) Inductive Miner and (c) Split Miner. The processes are relatively long and have a large number of activity types. Red box 1 in (a) shows the IV placement activity performed for blood transfusion, and red box 2 shows the IV placement activity performed for medication administration.

Fig. 11. — Process models for the “back assessment” goal discovered by (a) TAD Miner, (b) Inductive Miner and (c) Split Miner.

5. Discussion

Our results showed that TAD Miner discovered models with lower complexity and better interpretability. The TAD model contains a backbone and branches but no cycles. Some cycles appeared in the models generated by SM and IM because they represented all occurrences of a repeated activity with a single node. Repeated activities occur during resuscitations because of clinical needs such as multiple attempts at performing interventions. Showing the entire process in a linear order is less confusing and may improve interpretability. For example, in the TAD model of the “back assessment” goal (Fig. 11(a)), “T-spine” was mainly performed after “visual inspection” and before “L-spine.” An additional side branch of “T-spine” (highlighted with the red box in Fig. 11) showed that “T-spine” could also occur at any time during the process. In longer processes such as the entire primary survey phase of the resuscitation, activities are more likely to occur multiple times for different goals. For example, the TAD model placed the “IV placement” activity at multiple locations for different intentions, e.g., for blood transfusion or for medication administration (Fig. 10(a)). In contrast, the “IV placement” activity was shown in loops in both the IM model (Fig. 10(b)) and the SM model (Fig. 10(c)), which was not clinically meaningful. Compared with existing label refining techniques for handling repeated activities [14,15], TAD Miner automatically identifies repeated activities during process model discovery, eliminating the need for additional labeling and context-deriving efforts.
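Why a repeated activity produces a loop when it is represented by a single node can be shown in a few lines. In the sketch below, every occurrence of a label maps to one node of a directly-follows graph, so a trace that revisits “T-spine” creates a cycle; giving the k-th occurrence its own node keeps the graph acyclic. This occurrence-indexing relabel is a hypothetical illustration, closer in spirit to label refinement than to TAD Miner's alignment-based placement of repeats, and the function names and trace are ours.

```python
from collections import Counter

def dfg_edges(trace):
    """Directly-follows edges when every occurrence of a label maps to one node."""
    return set(zip(trace, trace[1:]))

def index_repeats(trace):
    """Give the k-th occurrence of an activity its own node, e.g. 'T-spine#2' (hypothetical relabel)."""
    seen = Counter()
    relabeled = []
    for activity in trace:
        seen[activity] += 1
        relabeled.append(activity if seen[activity] == 1 else f"{activity}#{seen[activity]}")
    return relabeled

def has_cycle(edges):
    """Kahn's algorithm: the graph is cyclic if some node never reaches in-degree 0."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    for _, target in edges:
        indegree[target] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for source, target in edges:
            if source == node:
                indegree[target] -= 1
                if indegree[target] == 0:
                    ready.append(target)
    return visited < len(nodes)

trace = ["Log roll", "Visual inspection", "T-spine", "L-spine", "T-spine", "Rectal"]
print(has_cycle(dfg_edges(trace)))                 # True: T-spine -> L-spine -> T-spine
print(has_cycle(dfg_edges(index_repeats(trace))))  # False
```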

The TAD model showed the sequential relationships and represented repeated activities in a linear order. Process models showing both the linear order and parallelism of activities may better support understanding of complex medical processes. When comparing the primary survey model (Fig. 10) and back assessment model (Fig. 11) discovered by TAD Miner and those discovered by other algorithms, medical experts found that understanding the transitions and cycles in the SM models was more challenging. Experts also found it challenging to identify the activity order in the IM models because many pathways were shown in parallel.

Our results showed that the TAD models did not achieve better F-scores than the SM models (Table 3). The F-score is the harmonic mean of fitness and precision. The fitness and precision scores used in the process mining literature are defined differently from those used in the traditional machine learning domain. The fitness score measures the ability of a model to replay all traces in the activity log [23], while the precision score measures the ability of a model to generate only the traces in the log [22]. Fitness is easier to quantify because the size of an activity log is bounded. In contrast, quantifying precision is more challenging because the number of paths in a process model can be unbounded [37]: if a model has loops, the number of paths is infinite, which would result in zero precision under the machine-learning definition. In our study, we adopted the alignment-based precision metric [22] because it (1) can measure the conformance between the model and the activity log, (2) is well-established, and (3) is implemented in the ProM framework. This method first builds a prefix automaton based on the optimal alignments between the traces in the activity log and the paths in the model; the automaton contains one state per unique prefix of each trace in the log. For each state, the method then determines which activities the process model allows next. Activities that the model allows but that are never observed in the log are called “escape activities” and count as imprecision of the process model, so a model with fewer escape activities conforms better to the log. Because the alignment-based precision metric is computed over the traces’ prefix automaton, it does not cover all the paths that the model can produce, and some paths in the model may never occur in the activity log. Such paths may make the model less precise but are not measured by alignment-based precision. These are the limitations of using the F-score metric in the process mining context.
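The two quantities discussed above can be made concrete. The sketch computes the F-score as a harmonic mean (reproducing the G1 TAD entry of Table 3 from its fitness and precision) and a simplified escaping-edges precision over the log's prefix automaton, with each state weighted by how often it is visited. It assumes every trace replays exactly on the model, so the alignment step of [22] is skipped, and it represents the model by a hypothetical `allowed_next(prefix)` function; it illustrates the idea and is not the ProM implementation.

```python
def f_score(fitness, precision):
    """Harmonic mean of fitness and precision."""
    return 2 * fitness * precision / (fitness + precision)

def escaping_precision(log, allowed_next):
    """Simplified escaping-edges precision over the log's prefix automaton.

    Assumes every trace replays exactly on the model (no alignment step).
    allowed_next(prefix) is a hypothetical stand-in returning the set of
    activities the model enables after `prefix`.
    """
    visits, observed = {}, {}
    for trace in log:
        for i, activity in enumerate(trace):
            prefix = tuple(trace[:i])  # one automaton state per unique prefix
            visits[prefix] = visits.get(prefix, 0) + 1
            observed.setdefault(prefix, set()).add(activity)
    escaping = enabled_total = 0
    for prefix, weight in visits.items():
        enabled = allowed_next(prefix)
        escaping += weight * len(enabled - observed[prefix])  # allowed but never seen
        enabled_total += weight * len(enabled)
    return 1 - escaping / enabled_total if enabled_total else 1.0

def allowed_next(prefix):
    # Hypothetical model: "A", then any one of "B", "C", or "D"
    return {(): {"A"}, ("A",): {"B", "C", "D"}}.get(prefix, set())

log = [["A", "B"], ["A", "B"], ["A", "C"]]
print(round(f_score(0.74, 0.89), 2))          # 0.81, the G1 TAD entry in Table 3
print(escaping_precision(log, allowed_next))  # 0.75: "D" is allowed but never observed
```

The toy model allows an unobserved continuation `D`, which is exactly what the metric penalizes; paths that never arise from any log prefix would not be examined at all, which is the limitation noted above.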

In addition, the F-score is computed directly from the entire activity log. Although the process models generated by SM conform better to the observed traces, they often overfit an incomplete activity log (incomplete because of, e.g., human errors such as omitted treatment activities, activities the coders failed to record, or activities that need not be performed in every case). TAD Miner, in contrast, aligns the common activities in the traces and fills in the missed or skipped parts of the traces to build a complete backbone process. Process models generated by IM and SM that conform to incomplete traces can be misleading and hard to comprehend because they may not represent a meaningful process. For example, replaying the IM and SM models in Fig. 11 could produce an extremely short trace (such as “Start -> C-Spine -> End”), which is not helpful for knowledge discovery. The TAD model cannot produce such a trivial trace: Fig. 11(a) shows that it includes only meaningful traces, with the backbone being “Start -> Log Roll -> Visual Inspection -> T-Spine -> L-Spine -> Rectal -> End.” Such meaningful models help users understand the process.

The modifications for the knowledge-driven model suggested by the discovered models (Table 4) were assessed by the medical experts from our team. Valid suggestions were adopted to modify the knowledge-driven model, and the invalid ones were rejected based on the experts’ practice. For example, the simultaneous administration of “intubation medications” while performing “intubation” identified as M4 in Table 4 is technically feasible but would be undesirable. Without allowing the paralytics and sedatives to reach therapeutic levels, the process of intubation would be difficult for the provider performing the task and uncomfortable for the patient. For this reason, it is not standard of care to do these steps simultaneously. This modification is considered invalid by our experts. Measuring the clinical outcomes based on the validations of some modifications presented in Table 4 would be difficult. For example, for M10 in Table 4, performing a peripheral pulse check simultaneously with or after measuring the heart rate has no clinical significance and depends on training and the clinical scenario in which it is first performed. We believe validations of some other modifications may be shown by clinical outcome improvement, which is a future task of our study.

Among the temporal data mining methods that have been used to discover medical procedures, the time interval-related patterns (TIRPs) mining algorithm [18] discovers frequent time-interval patterns based on Allen’s temporal relations (e.g., before, meets, overlaps, contains) between activities. Because in high-acuity, time-critical clinical scenarios the start of an activity is the most critical decision, our trace alignment-based algorithm discovers process models based on the start time of each activity. Compared with the Careflow mining algorithm [19], which identifies multiple linear procedures from a process dataset, TAD Miner identifies the backbone and concurrent activities, focusing on discovering one representative process model that better helps medical experts understand the overall workflow for a resuscitation goal.
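To make the contrast with start-time-based discovery concrete, the sketch below classifies a pair of activity intervals into one of Allen's 13 relations. It is a generic implementation of Allen's interval algebra, not the TIRP mining algorithm of [18], and the interval timings are hypothetical. Reducing each activity to its start time keeps only the order of the start points, so, for example, “overlaps” and “contains” both collapse to “a starts before b.”

```python
def allen_relation(a, b):
    """Allen relation of interval a to interval b; each is (start, end) with start < end."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if e2 < s1:
        return "after"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    # The intervals now properly overlap and share no endpoint
    if s1 < s2:
        return "overlaps" if e1 < e2 else "contains"
    return "during" if e1 < e2 else "overlapped-by"

# Hypothetical activity timings in seconds from patient arrival
print(allen_relation((30, 120), (60, 90)))  # contains
print(allen_relation((30, 70), (60, 90)))   # overlaps
```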

6. Study limitations

6.1. Data quality

The credibility of data-driven process models depends on the quality of the dataset. Our team consisted of 20–25 medical researchers who manually coded the 308 resuscitation video recordings, using an activity dictionary with more than 200 different activity types. This process may have introduced discrepancies and human bias into the dataset. To mitigate these issues, we used a pre-defined training process for any individual performing coding. Before coding, the researchers were required to complete five test videos that had already been coded by a designated “expert coder” (an individual who had been trained, passed the inter-rater reliability test, and had prior experience coding medical videos). A detailed activity dictionary was provided to new coders, describing the visual cues for each activity and the start and stop times to assign to each activity.

To ensure consistency and accuracy in the dataset, the new coder’s outputs were compared to the expert’s output using a Cohen’s Kappa test to assess the inter-rater reliability. A Kappa score was generated for both activity recognition (to ensure the new coder is identifying the correct activities) and timing (to ensure the new coder is selecting the correct start and stop times for each activity). Before being permitted to code new resuscitation videos, the new coders were required to achieve a Kappa score greater than 0.75 [38]. If they did not meet this threshold, they were given the opportunity to review the expert’s output and practice coding additional test videos until their scores met the cutoff values.
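For reference, Cohen's kappa compares the observed agreement $p_o$ with the agreement $p_e$ expected by chance, $\kappa = (p_o - p_e)/(1 - p_e)$. The sketch assumes each coder's output has already been reduced to one categorical label per shared unit (here, hypothetical per-interval Y/N labels); how the study discretized activities and timings for its kappa computation is not specified here, and the function and variable names are ours.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same units."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / n ** 2
    if expected == 1:
        return 1.0  # convention: both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

expert = list("YYNYNYYNYY")   # hypothetical expert-coder labels
trainee = list("YYNNNYYNYY")  # hypothetical new-coder labels (one disagreement)
kappa = cohens_kappa(expert, trainee)
print(round(kappa, 3), kappa > 0.75)  # 0.783 True
```

Here $p_o = 0.9$ and $p_e = (7 \cdot 6 + 3 \cdot 4)/100 = 0.54$, so a single disagreement in ten units still clears the 0.75 cutoff.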

6.2. Evaluation bias

The validation of the modifications suggested by the data-driven process models (Table 4) may be biased by the use of clinical experts from the research site and by the small number of clinical evaluators in this study. For example, the activity orders found by M3, M10, and M14 (Table 4) could differ at a separate institution or among different clinicians at the same institution, where the activities could be performed simultaneously or in a different order. In future work, we will evaluate the data-driven and knowledge-driven models with a larger and more diverse group of clinical experts, including experts from outside the research institution, and incorporate data from other institutions.

7. Conclusion

In this work, we introduced TAD Miner for discovering complex medical process models. TAD Miner first uses an existing optimized trace alignment algorithm to construct interpretable process models. We introduced an approach to computing the threshold for selecting an optimal consensus sequence from the alignment and introduced a random sampling method to validate the sufficiency of small datasets. The optimal consensus sequence was found by maximizing the similarity between the consensus sequence and all sequences in the alignment matrix. The optimal consensus sequence represented the most typical execution of the process. The consensus sequence was then set as the backbone of the process model. Some non-consensus activities were then identified as common-but-dispersed (i.e., concurrent) and uncommon-but-critical activities and inserted as parallel branches in the model.

Based on the quantitative and qualitative evaluation, the process models discovered by TAD Miner outperformed the models generated by two established methods, IM and SM, in complexity and interpretability while achieving comparable fitness and precision. TAD Miner generated linearly structured models, avoiding the confusing cycles produced by the other methods. Despite the complexity of medical cases, the models discovered by our method preserved both the sequential order and the concurrency of the treatment activities. Our models achieved informativeness without compromising interpretability.

TAD Miner provides insights into modeling complex medical processes from activity logs. When developing an accurate process model to guide decision making by medical providers, differences between the data-driven model and the knowledge-driven model can be treated as suggestions for modifying or correcting the knowledge-driven model. The discovered model can help medical experts better understand actual practice and design an accurate framework for decision support. TAD Miner can be further improved in several ways. First, although our method discovers the decision points in a model, it needs to be augmented with the values of the context attributes that lead to the different branches of each workflow. Second, the trace alignment mechanism makes it difficult to detect early exit points of a process. In the future, we will incorporate relevant context attributes into the models. Process models for different patient contexts can give medical experts more effective support during treatments targeted to specific contexts (e.g., different treatment steps for different vital signs). In future evaluations, we will compare the effectiveness of our approach with other modeling strategies in other practice settings and for different clinical processes.
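One simple way to surface differences between a data-driven and a knowledge-driven model is to list activities present in only one model and pairs of shared activities whose relative order disagrees. The Python sketch below assumes each model has been reduced to a single linear sequence of activity labels in which no activity repeats. The function `order_differences` and the activity labels in the example are hypothetical; this is not the comparison procedure used in the study, where differences were reviewed by clinical experts.

```python
from itertools import combinations


def order_differences(data_driven, knowledge_driven):
    """Report candidate modifications to a knowledge-driven sequence.

    Assumes each activity appears at most once in each sequence.
    """
    pos_d = {act: i for i, act in enumerate(data_driven)}
    pos_k = {act: i for i, act in enumerate(knowledge_driven)}
    # Shared activities, listed in the knowledge-driven order.
    shared = [act for act in knowledge_driven if act in pos_d]
    # A pair is reordered if the two sequences disagree on which activity comes first.
    reordered = [
        (a, b)
        for a, b in combinations(shared, 2)
        if (pos_k[a] < pos_k[b]) != (pos_d[a] < pos_d[b])
    ]
    return {
        "only_in_data": [act for act in data_driven if act not in pos_k],
        "only_in_knowledge": [act for act in knowledge_driven if act not in pos_d],
        "reordered_pairs": reordered,
    }


if __name__ == "__main__":
    # Hypothetical sequences, for illustration only.
    expert = ["airway", "breathing", "circulation", "disability"]
    observed = ["airway", "circulation", "breathing", "exposure"]
    print(order_differences(observed, expert))
```

Each reported pair or one-sided activity is only a candidate for expert review; a pairwise check like this cannot distinguish a genuine reordering from activities that are often performed concurrently.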

Acknowledgements

This work is supported by the U.S. National Institutes of Health/National Library of Medicine under grant number R01LM011834.

Footnotes

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

https://www.promtools.org/.

https://apromore.com/.

References

[1] Van Der Aalst W, Process mining, Communications of the ACM 55 (8) (2012) 76–83, 10.1145/2240236.2240257.
[2] Weijters A, van Der Aalst WM, De Medeiros AA, Process mining with the heuristics miner-algorithm, Technische Universiteit Eindhoven, Tech. Rep. WP, Vol. 166. July 2017 (2006), 1–34.
[3] Günther CW, Van Der Aalst WM, Fuzzy mining–adaptive process simplification based on multi-perspective metrics, In International conference on business process management. 328–343 (2007), 10.1007/978-3-540-75183-0_24.
[4] Fahland D, Van Der Aalst WM, Simplifying discovered process models in a controlled manner, Information Systems 38 (4) (2013) 585–605, 10.1016/j.is.2012.07.004.
[5] Martin N, De Weerdt J, Fernández-Llatas C, Gal A, Gatta R, Ibáñez G, Johnson O, Mannhardt F, Marco-Ruiz L, Mertens S, Recommendations for enhancing the usability and understandability of process mining in healthcare, Artificial Intelligence in Medicine 109 (2020), 101962, 10.1016/j.artmed.2020.101962.
[6] Munoz-Gama J, Martin N, Fernandez-Llatas C, Johnson OA, Sepúlveda M, Helm E, Galvez-Yanjari V, Rojas E, Martinez-Millana A, Aloini D, Process mining for healthcare: Characteristics and challenges, Journal of Biomedical Informatics 127 (2022), 103994, 10.1016/j.jbi.2022.103994.
[7] Jagadeesh Chandra Bose R, van der Aalst W, Trace alignment in process mining: opportunities for process diagnostics, In International Conference on Business Process Management. 227–242 (2010).
[8] Yang S, Zhou M, Chen S, Dong X, Ahmed O, Burd RS, Marsic I, Medical workflow modeling using alignment-guided state-splitting HMM, in: 2017 IEEE international conference on healthcare informatics (ICHI), 2017, pp. 144–153, 10.1109/ICHI.2017.66.
[9] Bose RJC, van der Aalst WM, Analysis of Patient Treatment Procedures, In Business Process Management Workshops 1 (2011) 165–166, 10.1007/978-3-642-28108-2_17.
[10] Rojas E, Munoz-Gama J, Sepúlveda M, Capurro D, Process mining in healthcare: A literature review, Journal of Biomedical Informatics 61 (2016) 224–236, 10.1016/j.jbi.2016.04.007.
[11] Leemans SJ, Fahland D, Van Der Aalst WM, Discovering block-structured process models from event logs containing infrequent behaviour, In International conference on business process management. 66–78 (2013), 10.1007/978-3-319-06257-0_6.
[12] Augusto A, Conforti R, Dumas M, La Rosa M, Polyvyanyy A, Split miner: automated discovery of accurate and simple business process models from event logs, Knowledge and Information Systems 59 (2) (2019) 251–284, 10.1007/s10115-018-1214-x.
[13] White SA, Introduction to BPMN, IBM Corporation 2 (2004).
[14] Alharbi A, Bulpitt A, Johnson O, Improving pattern detection in healthcare process mining using an interval-based event selection method, In International conference on business process management. 88–105 (2017), 10.1007/978-3-319-65015-9_6.
[15] Lu X, Fahland D, van den Biggelaar FJ, van der Aalst WM, Handling duplicated tasks in process discovery by refining event labels, In International Conference on Business Process Management. 90–107 (2016), 10.1007/978-3-319-45348-4_6.
[16] Bose RJC, van der Aalst WM, Process diagnostics using trace alignment: opportunities, issues, and challenges, Information Systems 37 (2) (2012) 117–141, 10.1016/j.is.2011.08.003.
[17] Bouarfa L, Dankelman J, Workflow mining and outlier detection from clinical activity logs, Journal of Biomedical Informatics 45 (6) (2012) 1185–1190, 10.1016/j.jbi.2012.08.003.
[18] Moskovitch R, Shahar Y, Medical temporal-knowledge discovery via temporal abstraction, in: AMIA annual symposium proceedings, 2009, p. 452.
[19] Dagliati A, Sacchi L, Zambelli A, Tibollo V, Pavesi L, Holmes JH, Bellazzi R, Temporal electronic phenotyping by mining careflows of breast cancer patients, Journal of Biomedical Informatics 66 (2017) 136–147, 10.1016/j.jbi.2016.12.012.
[20] Angelov PP, Soares EA, Jiang R, Arnold NI, Atkinson PM, Explainable artificial intelligence: an analytical review, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 11 (5) (2021) e1424.
[21] Mendling J, Metrics for business process models, in: Metrics for Process Models, Springer, 2008, pp. 103–133.
[22] Adriansyah A, Munoz-Gama J, Carmona J, van Dongen BF, van der Aalst WM, Alignment based precision checking, In International conference on business process management. 137–149 (2012), 10.1007/978-3-642-36285-9_15.
[23] Adriansyah A, van Dongen BF, van der Aalst WM, Conformance checking using cost-based fitness analysis, in: 2011 IEEE 15th international enterprise distributed object computing conference, 2011, pp. 55–64, 10.1109/EDOC.2011.12.
[24] Chen S, Yang S, Zhou M, Burd R, Marsic I, Process-oriented iterative multiple alignment for medical process mining, in: 2017 IEEE international conference on data mining workshops (ICDMW), 2017, pp. 438–445, 10.1109/ICDMW.2017.63.
[25] Levenshtein VI, Binary codes capable of correcting deletions, insertions, and reversals, In Soviet physics doklady. (1966) 707–710.
[26] Beyer WH, Handbook of tables for probability and statistics, CRC Press, 2019.
[27] Cochran WG, Sampling techniques, John Wiley & Sons, 2007.
[28] Israel GD, Determining sample size, Fact Sheet PEOD-6, Gainesville: University of Florida, 1992.
[29] Cardoso J, Mendling J, Neumann G, Reijers HA, A discourse on complexity of process models, In International Conference on Business Process Management. 117–128 (2006), 10.1007/11837862_13.
[30] Hasić F, Vanthienen J, Complexity metrics for DMN decision models, Computer Standards & Interfaces 65 (2019) 15–37, 10.1016/j.csi.2019.01.006.
[31] McCabe TJ, A complexity measure, IEEE Transactions on Software Engineering SE-2 (4) (1976) 308–320, 10.1109/TSE.1976.233837.
[32] Polančič G, Cegnar B, Complexity metrics for process models–A systematic literature review, Computer Standards & Interfaces 51 (2017) 104–117, 10.1016/j.csi.2016.12.003.
[33] Carter EA, Waterhouse LJ, Kovler ML, Fritzeen J, Burd RS, Adherence to ATLS primary and secondary surveys during pediatric trauma resuscitation, Resuscitation 84 (1) (2013) 66–71, 10.1016/j.resuscitation.2011.10.032.
[34] ATLS Subcommittee, International ATLS Working Group, Advanced trauma life support (ATLS®): the ninth edition, in: The journal of trauma and acute care surgery, 2013, pp. 1363–1366, 10.1097/TA.0b013e31828b82f5.
[35] Hallgren KA, Computing inter-rater reliability for observational data: an overview and tutorial, Tutorials in Quantitative Methods for Psychology 8 (1) (2012) 23, 10.20982/tqmp.08.1.p023.
[36] O’Connell KJ, Yang S, Cheng M, Sandler AB, Cochrane NH, Yang J, Webman RB, Marsic I, Burd R, Process conformance is associated with successful first intubation attempt and lower odds of adverse events in a paediatric emergency setting, Emergency Medicine Journal 36 (9) (2019) 520–528, 10.1136/emermed-2018-208133.
[37] Syring AF, Tax N, van der Aalst WM, Evaluating conformance measures in process mining using conformance propositions, in: Transactions on Petri Nets and Other Models of Concurrency XIV, Springer, 2019, pp. 192–221.
[38] Fleiss JL, Measuring nominal scale agreement among many raters, Psychological Bulletin 76 (5) (1971) 378, 10.1007/978-3-662-60651-3_8.

