BMC Medical Informatics and Decision Making. 2025 Aug 14;25:306. doi: 10.1186/s12911-025-02967-z

Process mining in healthcare: a tertiary study

Adauto Santos, Gislaine Camila Lapasini Leal, Renato Balancieri
PMCID: PMC12355893  PMID: 40814102

Abstract

Business processes in healthcare are complex and multidisciplinary, involving various professional profiles and different healthcare structures, and each medical treatment may require distinct clinical pathways. Process mining can assist in discovering trajectories, verifying compliance, and enabling an understanding of the involvement of different organizational aspects. The main goal of this study is to provide a comprehensive overview of the application of process mining in healthcare. To this end, a tertiary review was conducted, gathering 18 secondary reviews that addressed different aspects, such as the objectives of process mining in healthcare, types of activities and perspectives, available resources, primary medical specialties, types of medical processes, and limitations and challenges. The study reveals that process discovery is the most common activity and that control flow is the most used perspective. The Heuristics Miner and Fuzzy Miner algorithms were the most relevant, and oncology was the medical specialty in which process mining was most used. Process mining has proven to be an effective tool for analyzing healthcare workflows, improving understanding of clinical guidelines and protocols, and supporting decision-making. However, it is necessary to deal with noisy or missing data and establish visualization mechanisms that ensure clarity in data presentation.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12911-025-02967-z.

Keywords: Process mining, Healthcare process, Knowledge discovery, Tertiary review, Clinical pathway

Introduction

Health is a fundamental right and an instrument of citizenship. In 1946, the World Health Organization (WHO) defined health as a state of complete physical, mental, and social well-being, not merely the absence of disease or infirmity [1]. Considering the different forms of public and private funding, countries in the Organisation for Economic Co-operation and Development (OECD) allocated, on average, 8.8% of GDP to healthcare expenditures in 2019. Projections indicate that this proportion could reach 10.8% by 2040 [2].

Given the rising healthcare costs, techniques that enable the creation of effective strategies, assist in decision-making, and transform raw data into relevant information must be adopted. Process mining can be used for this purpose. According to Van Der Aalst [3], process mining bridges the gap between traditional process analysis and data-driven analysis techniques, such as machine learning. This technique generally involves extracting knowledge from event logs, allowing for the automatic generation of process models. This facilitates knowledge discovery, conformance analysis, and process improvement.

Business processes in healthcare are complex, dynamic, and multidisciplinary, involving various professional profiles working in different healthcare structures and providing different levels of care. Furthermore, healthcare encompasses different medical specialties, each with its own characteristics. These particularities make healthcare a promising field for the use of process mining. In the secondary review conducted by Santos Garcia et al. [4], it is noted that healthcare is indeed one of the main areas of interest for applying this technique.

In recent years, various initiatives have promoted process mining in healthcare, including the conduct of systematic reviews dedicated to this topic. These reviews address specific uses, such as care for frail elderly patients, as discussed by Farid et al. [5], disease trajectories, as explored by Kusuma et al. [6], and applications in various medical specialties, as detailed by Grüger et al. [7], Kusuma et al. [8], and Kurniati et al. [9]. Additionally, reviews were also found that broadly analyzed the application of process mining in healthcare, considering the main aspects involved in the development of these projects, as addressed by Erdogan and Tarhan [10], Rojas et al. [11], and Batista and Solanas [12].

Therefore, the objective of this study is to analyze the application of process mining in healthcare and to identify practices, techniques, and challenges from the perspective of researchers and healthcare professionals. To this end, we conducted a tertiary review of the existing literature, which provides a comprehensive view of the knowledge in a specific field by compiling and analyzing secondary reviews.

Our study aims to answer the following research questions: How are process mining techniques applied in healthcare? (RQ1); What are the types and perspectives of process mining? (RQ2); What are the main algorithms, supporting techniques, tools, and methodologies used for process mining? (RQ3); In which medical specialties is process mining used? (RQ4); What types of medical processes is process mining applied to? (RQ5); What are the limitations, challenges, and future directions of process mining applied to healthcare? (RQ6).

We investigated the existence of other tertiary reviews that might be related to our study. In this process, we identified the works of Ghasemi and Amyot [13] and Tozzi et al. [14]. In Ghasemi and Amyot’s study [13], a systematic literature review was conducted to identify publication trends up to 2016, particularly in the healthcare field. In total, three systematic reviews were found where process mining was applied in a healthcare context. In Tozzi et al. [14], 25 systematic reviews related to using Artificial Intelligence in pediatric oncology were analyzed. This study found no work applying process mining techniques for this purpose.

Thus, our work distinguishes itself by conducting a comprehensive, detailed, and up-to-date tertiary review. By providing a consolidated view of these aspects, our study aims to map the main objectives, classify the types of activities and perspectives, address technical aspects, verify the medical specialties and medical processes in which process mining is utilized, as well as list limitations, challenges, and future directions for research and development. Although its conclusions may partly reflect the existing state of the art, this characteristic is inherent to the nature of a tertiary review, whose primary objective is to organize and synthesize previously published information. In this way, the work highlights patterns, gaps, and opportunities that can guide more in-depth investigations in the field.

The document is organized as follows: Section 2 presents the theoretical foundation of process mining. Section 3 highlights the methodological aspects of the work. Section 4 answers the research questions based on the identified systematic reviews. Section 5 discusses the results obtained to deepen the understanding of the application of process mining in healthcare. Section 6 addresses the main limitations and threats to the study’s validity. Section 7 presents the final considerations.

Literature review

This section provides an overview of process mining, covering the types of activities, perspectives, and main algorithms. Additionally, it discusses the application of this technique in healthcare, highlighting how it assists in decision-making. Finally, it presents related works.

Process mining

Process mining is a research area located at the intersection of machine learning, data mining, and process modeling and analysis. This area provides techniques, methodologies, algorithms, and tools to understand processes based on the data present in information systems [15]. According to Van Der Aalst [3], process mining aims to monitor and improve processes by extracting knowledge from event logs in information systems.

According to Van Der Aalst [15], process mining has three types of activities: discovery, conformance checking, and enhancement. In discovery, a process model is created from the event logs in order to understand a specific process. Typically, this type of activity relies on the event log alone, without additional prior knowledge [12]. In conformance checking, an existing process model is compared with the behavior recorded in the event logs to verify whether real-world activities comply with the proposed model. In healthcare, this model may be a clinical guideline or a model established by medical specialists. Finally, in enhancement, the goal is to extend or improve an existing process model using the information in the log records.
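To make the discovery activity more concrete, the following minimal Python sketch counts directly-follows relations in a small event log, which is the raw material that many discovery algorithms start from. The log, the case identifiers, and the activity names are hypothetical and are not taken from any of the reviewed studies.

```python
from collections import Counter

# Hypothetical event log: each case (patient episode) is an ordered list of activities.
event_log = {
    "case_1": ["triage", "consultation", "lab_test", "discharge"],
    "case_2": ["triage", "consultation", "discharge"],
    "case_3": ["triage", "lab_test", "consultation", "discharge"],
}


def directly_follows(log):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in log.values():
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


dfg = directly_follows(event_log)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

The resulting counts can be read as a weighted graph of the process. Real discovery algorithms add further steps on top of this, such as filtering and the construction of a formal model.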

According to Van Der Aalst [15], different perspectives can be considered in addition to the three types of mining activities. These perspectives include control flow, organizational, case, and temporal perspectives, representing different viewpoints from which processes can be analyzed. The control-flow perspective aims to characterize the possible paths a process can follow. The organizational perspective focuses on the structure of the organization, classifying the different actors according to their roles and organizational units. The case perspective centers on the properties of the cases. For example, a case may be characterized by its path through the process or by the actors involved.

Additionally, a case may be characterized by the values of the data elements involved in the process, such as the number of requests made. Finally, the temporal perspective analyzes the frequency of events and the duration of processes. The objective is to monitor resource utilization, measure service time, identify bottlenecks, and predict processing time.
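As an illustration of the temporal perspective, the sketch below derives the duration of each case from timestamped events and averages it, in the spirit of a length-of-stay measurement. The events, case identifiers, and timestamps are invented for the example.

```python
from datetime import datetime
from statistics import mean

# Hypothetical timestamped events: (case id, activity, ISO timestamp).
events = [
    ("p1", "admission", "2024-01-01T08:00"),
    ("p1", "surgery", "2024-01-02T10:00"),
    ("p1", "discharge", "2024-01-04T08:00"),
    ("p2", "admission", "2024-01-03T09:00"),
    ("p2", "discharge", "2024-01-04T09:00"),
]


def case_durations_hours(rows):
    """Return the hours elapsed between the first and last event of each case."""
    stamps = {}
    for case, _activity, ts in rows:
        stamps.setdefault(case, []).append(datetime.fromisoformat(ts))
    return {c: (max(t) - min(t)).total_seconds() / 3600 for c, t in stamps.items()}


durations = case_durations_hours(events)
average_stay = mean(durations.values())
print(durations, average_stay)
```

The same grouping by case could be extended to waiting times between consecutive activities, which is one way bottlenecks are usually located.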

In addition to the types of tasks and perspectives proposed by Van Der Aalst [15], process mining can be categorized based on the types of event data (prospective or retrospective) and the types of models (normative or descriptive) [16]. Prospective data refers to ongoing or unfinished cases. Therefore, the resulting model from mining can influence the outcome of these cases. In contrast, retrospective data refers to information from already completed cases; thus, the mining result serves to understand the process. Regarding model types, the normative model specifies how those involved should execute the process. On the other hand, the descriptive model aims to demonstrate how the process is carried out in practice. For instance, the descriptive model can be obtained through process discovery, while the normative model is used for conformance checking.
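The relationship between a normative model and observed behavior can be sketched as a very simple conformance check. Here the normative model is reduced to a set of allowed transitions, which is an assumption made for brevity; established conformance techniques such as token replay or alignments are considerably richer. All activity names are hypothetical.

```python
# Hypothetical normative model: the transitions a clinical guideline allows.
allowed = {
    ("triage", "consultation"),
    ("consultation", "lab_test"),
    ("lab_test", "discharge"),
    ("consultation", "discharge"),
}

# Hypothetical observed traces (the descriptive side).
traces = [
    ["triage", "consultation", "lab_test", "discharge"],
    ["triage", "consultation", "discharge"],
    ["triage", "lab_test", "consultation", "discharge"],
]


def deviations(trace, model):
    """Return the observed transitions that the normative model does not allow."""
    return [(a, b) for a, b in zip(trace, trace[1:]) if (a, b) not in model]


report = [deviations(t, allowed) for t in traces]
conforming_share = sum(1 for d in report if not d) / len(traces)
print(report, conforming_share)
```

In this toy log, the third trace deviates because the laboratory test precedes the consultation, so two of the three traces conform.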

In process mining, algorithms play a fundamental role, as they are responsible for processing event log files to produce the graphs used in discovery, conformance checking, and process improvement activities. The Alpha Miner algorithm, for example, aims to produce a Petri Net based on a simple event log, allowing for process discovery through the resulting graph. Alpha Miner can be considered a pioneering algorithm in process mining, and many of its ideas have been incorporated into more modern techniques. According to Van Der Aalst [15], the Alpha Miner algorithm has difficulty with noise and infrequent behavior and may produce complex graphs. However, it serves as a good introduction to process mining.
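The first step of Alpha Miner, deriving ordering relations between activities from the log, can be sketched as follows. This is only the footprint computation and not the full algorithm, which goes on to construct the places of the Petri Net; the single-letter activity names are placeholders.

```python
def footprint(traces):
    """Derive the ordering relations that Alpha Miner builds on from a set of traces."""
    follows = {(a, b) for t in traces for a, b in zip(t, t[1:])}
    activities = sorted({a for t in traces for a in t})
    relations = {}
    for a in activities:
        for b in activities:
            if (a, b) in follows and (b, a) not in follows:
                relations[(a, b)] = "->"  # causality: a is followed by b, never the reverse
            elif (a, b) in follows and (b, a) in follows:
                relations[(a, b)] = "||"  # parallel: both orders are observed
            elif (b, a) in follows:
                relations[(a, b)] = "<-"  # reverse causality
            else:
                relations[(a, b)] = "#"   # choice: never directly adjacent
    return relations


log = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "e", "d"]]
fp = footprint(log)
print(fp[("a", "b")], fp[("b", "c")], fp[("b", "e")])
```

Because every observed adjacency is taken at face value, a single erroneous event can change a relation, which is one way to see why the algorithm is sensitive to noise.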

In the real world, it is rare for logs to be complete and entirely free of errors (noise). Additionally, the different stages of a process may exhibit varying frequency levels. The Heuristics Miner algorithm was proposed to address these issues of noise and infrequent activities in the resulting graphs. This algorithm considers the order of events in a given case and establishes causal dependencies between activities. In this way, infrequent activities are removed from the models, making the resulting graph more robust and understandable [15].
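The causal dependencies mentioned above are commonly quantified with a dependency measure, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), where |a>b| is the number of times a is directly followed by b. The sketch below applies it to a hypothetical log containing one noisy case; the threshold of 0.7 is an assumed value chosen for illustration, and the sketch filters dependencies between activities rather than implementing the complete Heuristics Miner.

```python
from collections import Counter


def dependency(traces):
    """Return directly-follows counts and a function computing the dependency measure."""
    counts = Counter((a, b) for t in traces for a, b in zip(t, t[1:]))

    def measure(a, b):
        ab, ba = counts[(a, b)], counts[(b, a)]
        return (ab - ba) / (ab + ba + 1)

    return counts, measure


# Hypothetical log: nine cases follow a, b, c; one noisy case records b before a.
log = [["a", "b", "c"]] * 9 + [["b", "a", "c"]]
counts, dep = dependency(log)

threshold = 0.7  # assumed cut-off, not a value from the reviewed studies
kept = {pair for pair in counts if dep(*pair) >= threshold}
print(sorted(kept))
```

With this threshold, the dependencies introduced by the noisy case fall below the cut-off and are dropped, while the two frequent dependencies remain.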

Despite efforts to improve the quality of the resulting process model, it is still possible that the result is a “Spaghetti Process”, as indicated by Van Der Aalst [15], where the graph is difficult to understand. The Fuzzy Miner algorithm was developed to address this issue, as proposed by Günther and Van Der Aalst [17]. This technique aims to simplify the model representation by considering the correlation and significance of the graph components. In summary, highly significant behaviors are maintained, less significant but highly correlated behaviors are aggregated, and less significant and poorly correlated behaviors are removed from the graph. The proprietary software DISCO is based on Fuzzy Miner [18].
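The keep, aggregate, or remove rule summarized above can be sketched in a few lines. This is a deliberately simplified reading: the significance and correlation values are supplied by hand here, whereas Fuzzy Miner derives them from metrics computed on the log, and both cut-off values and all activity names are assumptions.

```python
def simplify(significance, correlation, sig_cut=0.5, corr_cut=0.6):
    """Classify each activity as 'keep', 'aggregate', or 'remove'."""
    decision = {}
    for node, sig in significance.items():
        if sig >= sig_cut:
            decision[node] = "keep"
            continue
        # Strongest correlation between this node and any neighbour.
        strongest = max(
            (c for pair, c in correlation.items() if node in pair), default=0.0
        )
        decision[node] = "aggregate" if strongest >= corr_cut else "remove"
    return decision


# Hypothetical significance per activity and correlation per pair of activities.
significance = {"admission": 0.9, "vitals_check": 0.2, "note_update": 0.3, "misc_call": 0.1}
correlation = {
    ("vitals_check", "note_update"): 0.8,
    ("admission", "misc_call"): 0.1,
}
result = simplify(significance, correlation)
print(result)
```

In this example the two weakly significant but strongly correlated activities would be merged into a cluster, while the isolated low-significance activity is dropped from the graph.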

While traditional algorithms, such as Alpha Miner, tend to produce a Petri Net to represent the process, inductive algorithms produce an equivalent representation in the form of a process tree. Among these, the Inductive Miner, proposed by Leemans [19], stands out: it recursively divides the event log into smaller sublogs until a base case is reached and uses each sublog to build part of the process tree. Inductive algorithms are becoming one of the leading approaches in process mining because, in addition to handling noise and infrequent behaviors, they scale to large models, allowing for the discovery of a much broader class of processes than traditional algorithms [15].
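The recursive divide-and-conquer idea can be illustrated with a heavily reduced sketch. It handles only one of the ways an inductive algorithm can split a log, the exclusive-choice split, found here as groups of activities that never occur in the same trace. Anything else falls back to a catch-all "flower" node. The nested tuples stand in for a process tree, and the activity names are hypothetical.

```python
def components(traces):
    """Group activities that are connected through directly-follows pairs."""
    parent = {a: a for t in traces for a in t}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for t in traces:
        for a, b in zip(t, t[1:]):
            parent[find(a)] = find(b)
    groups = {}
    for a in parent:
        groups.setdefault(find(a), set()).add(a)
    return list(groups.values())


def mine(traces):
    """Return a nested tuple that stands in for a process tree."""
    activities = {a for t in traces for a in t}
    if len(activities) == 1 and all(len(t) == 1 for t in traces):
        return next(iter(activities))  # base case: a single activity
    parts = components(traces)
    if len(parts) > 1:  # exclusive-choice split: recurse on one sublog per group
        sublogs = [[t for t in traces if set(t) <= p] for p in parts]
        return ("xor", *sorted((mine(s) for s in sublogs), key=str))
    return ("flower", *sorted(activities))  # fallback: no split handled by this sketch


tree = mine([["x_ray"], ["blood_test"], ["x_ray"]])
print(tree)
```

A fuller implementation would also look for sequence, parallel, and loop splits before resorting to the fallback.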

Process mining in healthcare

Business processes in healthcare are complex, inconsistent, and multidisciplinary, involving various professional profiles (doctors, technicians, administrators, etc.) working in different healthcare structures (clinics, hospitals, laboratories, etc.) and offering different levels of care (primary, secondary, tertiary, emergency, and urgent care). Furthermore, the healthcare field encompasses various medical specialties, as described by Mans et al. [16] and Homayounfar [20].

Information systems applied to healthcare cover a wide range of components, from the infrastructure for communication and integration between healthcare devices to the information systems where the data from medical and organizational processes are stored. Some of the main purposes of these systems include recording and monitoring patient conditions, managing data flows, and controlling financial aspects [20]. In addition to the complexity of healthcare processes and the scope of systems, it is important to highlight that medical treatment can exhibit high variability. This is due to each patient’s response to the same treatment, the specific clinical condition of each individual, and the discovery of new medications or medical guidelines.

According to Santos Garcia et al. [4], healthcare is one of the main areas where process mining can be applied. Using patient event logs enables the management of process models, evaluation of compliance with clinical guidelines and protocols, identification of areas for process improvement, and overall decision-making. According to Mans et al. [16], process mining in healthcare allows for answering typical data mining questions, such as: What happened? (e.g., What is the usual treatment for patients with a specific type of disease?); Why did it happen? (e.g., Why was the quality of service not met?); What will happen? (e.g., What is the estimated cost of treatment?); What is the best that can happen? (e.g., How can the workload be redistributed among professionals?). Formulating and answering these questions can provide valuable insights into the analyzed activities.

Methodology

The research was conducted using the guidelines from Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [21]. Table 1 presents the research questions defined for this study.

Table 1.

Research questions defined for the study

| Research question | Objective |
| --- | --- |
| RQ1: How are process mining techniques applied in healthcare? | Understand the main objectives for which mining techniques are applied in healthcare. |
| RQ2: What are the types and perspectives of process mining? | Categorize the mining projects, as proposed by Van Der Aalst [15]. |
| RQ3: What are the main algorithms, supporting techniques, tools, and methodologies used for process mining? | Identify the main approaches for project development, considering different stages. |
| RQ4: In which medical specialties is process mining used? | Identify the healthcare fields with the most significant interest in applying process mining. |
| RQ5: What types of medical processes is process mining applied to? | Analyze whether mining is applied more in the medical or administrative context. |
| RQ6: What are the limitations, challenges, and future directions of process mining applied to healthcare? | Understand the obstacles and opportunities for process mining in healthcare. |

Source: The authors

Search process

The search string was composed of three parts. The first part refers to process mining and the types of activities defined by Van Der Aalst [15], as initial tests showed that the term “process mining” was sometimes not directly mentioned, with more common references to types of activities such as “process discovery” or “conformance checking”. The second part of the string was dedicated to the search for systematic reviews, using different combinations to cover various possibilities. The third part focused on finding papers in the healthcare field, employing broad terms to avoid inadvertently excluding any relevant papers.

After identifying the most relevant terms and an initial phase of search testing, calibration, and deliberation among the authors, the search string was established, as presented in Listing 1.

Listing 1.

Query string

We used the following search sources: IEEE, PubMed, and ACM. Additionally, we adjusted the search string according to the particularities and limitations of each search tool. It is worth noting that no date filter was applied. We conducted the query on 04/02/2024, and Table 2 shows the initial results obtained from each source.

Table 2.

Summary of initial findings from the search platforms

| Research source | Returned results |
| --- | --- |
| PubMed | 178 |
| ACM | 54 |
| IEEEXplore | 18 |
| Total | 250 |

Source: The authors

Selection process

The inclusion criteria were:

  • (IC1) Studies and papers that use process mining in the healthcare field;

  • (IC2) Studies that answer the proposed research questions.

The exclusion criteria were:

  • (EC1) Duplicate studies and papers;

  • (EC2) Studies and papers where it is not possible to verify the use of process mining techniques;

  • (EC3) Studies and papers not related to healthcare;

  • (EC4) Papers not written in English;

  • (EC5) Calls or abstracts for conferences and events;

  • (EC6) Papers inaccessible through digital or paid means;

  • (EC7) Studies and papers that are not Systematic Reviews or Mappings.

The search in the research sources returned 250 studies. Using the StArt tool, we eliminated two duplicate papers, leaving 248 studies. Next, we reviewed the titles and abstracts to check adherence to the inclusion and exclusion criteria, reducing the total to 21 studies. After a complete reading of these 21 studies, eight were removed, leaving 13. Starting from these studies, we applied a snowballing process that identified nine more, for a total of 22. Finally, a thorough analysis of these studies identified 18 secondary reviews in which process mining was applied in healthcare. Figure 1 shows the results obtained from the selection process.
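The counts reported for the selection process can be checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text and in Table 2; the variable names are ours.

```python
# Results returned per source, as reported in Table 2.
returned = {"PubMed": 178, "ACM": 54, "IEEEXplore": 18}

identified = sum(returned.values())        # studies returned by the search
after_duplicates = identified - 2          # two duplicates eliminated
after_screening = 21                       # after title and abstract review
after_full_text = after_screening - 8      # eight removed after complete reading
after_snowballing = after_full_text + 9    # nine added through snowballing
included = 18                              # secondary reviews in the final set
removed_in_final_analysis = after_snowballing - included

print(identified, after_duplicates, after_full_text, after_snowballing,
      removed_in_final_analysis)
```

The last value shows that four of the 22 candidate studies were set aside during the final analysis.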

Fig. 1.

Paper selection process flowchart. Source: The authors

It is important to note that some systematic reviews, as observed in the works of Iachecen et al. [22] and Oliart et al. [23], were excluded from the final version of this study. Although they are relevant reviews, they do not directly address process mining, which justifies their exclusion. However, these works illustrate the growing interest in the topic, even in areas not directly related to process mining. The review by Rojas et al. [24] was removed because it was an initial work, followed by the publication by Rojas et al. [11], which addressed the topic in greater depth.

Data extraction

The systematic reviews examined in this study reveal a wide range of topics of interest in process mining applied to healthcare. In total, 18 systematic reviews were identified, divided into ten thematic reviews and eight general (or panoramic) reviews, covering the period from 2014 to 2023. The thematic systematic reviews analyzed studies in which process mining was applied to a specific healthcare topic, such as in the study by Yang and Su [25], which identified applications of process mining in clinical pathways. The general systematic reviews, on the other hand, aimed to broadly analyze the application of process mining in healthcare, addressing various aspects to be considered during project development, as observed in Rojas et al. [11] and Guzzo et al. [26].

Table 3 presents the main characteristics of the obtained reviews, and the next section discusses the results in detail.

Table 3.

Overview of the secondary reviews identified

| ID | Citation and title | Focus | Year | Papers | Period | RQ1 | RQ2 | RQ3 | RQ4 | RQ5 | RQ6 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SR1 | [25] Process Mining for Clinical Pathway | Clinical pathways | 2014 | 37 | 2004–2013 | Yes | Yes | No | Yes | No | Yes |
| SR2 | [27] Process Mining for Clinical Processes: A Comparative Analysis of Four Australian Hospitals | General | 2015 | 28 | 2007–2012 | Yes | Yes | Yes | No | No | No |
| SR3 | [10] Process Mining for Healthcare Process Analytics | General | 2016 | 11 | 2008–2015 | Yes | Yes | Yes | No | No | No |
| SR4 | [9] Process Mining in Oncology: A Literature Review | Oncology | 2016 | 37 | 2008–2016 | Yes | Yes | Yes | Yes | No | Yes |
| SR5 | [11] Process mining in healthcare: A literature review | General | 2016 | 74 | 2005–2016 | No | Yes | Yes | Yes | No | Yes |
| SR6 | [28] Systematic Mapping of Process Mining Studies in Healthcare | General | 2018 | 172 | 2005–2017 | Yes | Yes | Yes | Yes | No | Yes |
| SR7 | [8] Process Mining in Cardiology: A Literature Review | Cardiology | 2018 | 32 | 2008–2017 | Yes | Yes | Yes | Yes | Yes | Yes |
| SR8 | [12] Process Mining in Healthcare: A Systematic Review | General | 2018 | 55 | 2008–2018 | Yes | Yes | Yes | Yes | Yes | Yes |
| SR9 | [29] Process Mining in Primary Care: A Literature Review | Primary care | 2018 | 7 | 2013–2017 | No | No | No | Yes | No | No |
| SR10 | [5] Process Mining in Frail Elderly Care: A Literature Review | Frail elderly | 2019 | 8 | 2011–2018 | No | No | Yes | Yes | Yes | Yes |
| SR11 | [30] Towards the Use of Standardized Terms in Clinical Case Studies for Process Mining in Healthcare | Information standardization | 2020 | 38 | 2016–2018¹ | No | Yes | Yes | Yes | No | Yes |
| SR12 | [7] Process mining for case acquisition in oncology: a systematic literature review | Oncology | 2020 | 55 | 2008–2019 | No | Yes | Yes | Yes | Yes | Yes |
| SR13 | [6] Process Mining of Disease Trajectories: A Literature Review | Disease trajectory | 2021 | 4 | 2019–2020 | No | Yes | Yes | Yes | No | No |
| SR14 | [31] Opportunities and challenges for applying process mining in healthcare: a systematic mapping study | General | 2022 | 270 | 2002–2019 | Yes | Yes | Yes | No | No | Yes |
| SR15 | [32] Process mining in healthcare–An updated perspective on the state of the art | General | 2022 | 263 | 2005–2021 | Yes | Yes | No | No | Yes | Yes |
| SR16 | [33] A literature review on the analysis of symptom-based clinical pathways: Time for a different approach? | Respiratory problems | 2022 | 1 | 2012–2021 | Yes | No | Yes | Yes | No | No |
| SR17 | [26] Process mining applications in the healthcare domain: a comprehensive review | General | 2022 | 172 | 2010–2021 | Yes | Yes | Yes | No | Yes | Yes |
| SR18 | [34] Process mining and data mining applications in the domain of chronic diseases: A systematic review | Chronic diseases | 2023 | 13 | 2017–2021 | No | No | Yes | Yes | No | Yes |

Source: The authors

¹ It is a continuation of [11]

Results and discussion

The systematic reviews analyzed cover a wide range of topics related to the application of process mining in healthcare. A total of 18 systematic reviews were identified, of which ten are thematic: Yang and Su [25], Kurniati et al. [9], Kusuma et al. [8], Williams et al. [29], Farid et al. [5], Helm et al. [30], Grüger et al. [7], Kusuma et al. [6], Gunatilleke et al. [33], and Chen et al. [34]. The remaining eight reviews are general or panoramic: Partington et al. [27], Erdogan and Tarhan [10], Rojas et al. [11], Erdogan and Tarhan [28], Batista and Solanas [12], Dallagassa et al. [31], De Roock and Martin [32], and Guzzo et al. [26]. The panoramic reviews identified multiple objectives for the application of process mining, highlighting critical aspects for project development and the necessary components to ensure the effective delivery of results. In contrast, the thematic reviews applied process mining to specific areas of healthcare, such as cardiology, oncology, and clinical care pathways. These reviews analyzed the unique aspects of each field and how implementing process mining techniques can improve outcomes and efficiency in these specialties.

The following subsections address each of the outlined research questions, providing an in-depth analysis of the included studies. They highlight observed trends, identify existing gaps, and explore future opportunities for the application of process mining in healthcare. This approach allows for a detailed understanding of the advances and challenges in this field, offering valuable insights for the development and implementation of process mining techniques in various healthcare settings.

RQ1: how are process mining techniques applied in healthcare?

Batista and Solanas [12] identified that, among the 55 analyzed papers, the objectives of process mining in healthcare were divided as follows: comparison between different healthcare processes in 21 papers; exploring healthcare processes to understand workflow in 19 papers; validating the use of mining for healthcare in 17 papers; proposing methodologies for mining in 7 papers; discussing KPIs (key performance indicators) in 6 papers; conducting systematic reviews related to process mining in healthcare in 6 papers; defining reference models for healthcare data in 4 papers; and finally, three papers were dedicated to process visualization.

Erdogan and Tarhan [28] divided the 172 analyzed papers into nine types of objectives in their systematic review: method, process variant analysis, empirical results, process, tool, outlier detection, predictive monitoring, metrics, and model. In total, 68 studies proposed new methods for process mining in healthcare, including new data visualization techniques, combining mining with other techniques, and new approaches to data quality assessment. In 53 studies, process mining was applied to process variant analysis. In 47 studies, process mining techniques, tools, or methodologies were applied to obtain empirical results. In 44 papers, steps were defined for analysis using mining techniques. Additionally, 17 papers proposed new tools implemented in healthcare; of these, 11 papers demonstrated the developed solution and its applicability using hospital data process logs. Outlier detection and predictive monitoring were addressed in 10 papers each. Finally, eight papers described new metrics for the application of mining, while four papers proposed new models for representing healthcare data or processes.

According to Dallagassa et al. [31], among the 270 analyzed studies, the main observed objectives were pre-processing information in 17 papers, predictive analysis for case, disease, and care management in 14 papers, and algorithm development in eight papers. It is worth noting that 11 systematic or narrative reviews and books on process mining were found, an increase over the number reported by Batista and Solanas [12].

The systematic review by Kusuma et al. [8] focused on the application of process mining in cardiology. Of the 32 analyzed studies, 7 proposed new approaches for extracting process models from event logs, 4 focused on predicting and recommending clinical pathways, 3 analyzed deviation detection in clinical pathways, and 1 developed a new methodology called Interactive Pattern Recognition (IPR).

In the work by Kurniati et al. [9], the application of process mining in cancer treatments was analyzed. Among the analyzed works, seven papers addressed issues with data heterogeneity and seven used declarative process mining and its improvements. The verification of common pathways was discussed in four papers, and issues related to process characteristics and the quality of records were discussed in three. Finally, process anomaly analysis and exception verification were addressed by one paper each.

The systematic review by Erdogan and Tarhan [10] aimed to apply process mining to verify process compliance, selecting 11 specific studies from an initial total of 50 studies. Of the selected studies, six addressed pathway analysis methods, two conducted mining tool analysis, and two combined process mining with other techniques, while one paper each covered process mining technique comparison, mining methods, and similarity metrics.

De Roock and Martin [32] indicated that of the 263 analyzed papers, 156 focused on applying process mining algorithms, while 25 were dedicated to developing these algorithms. Additionally, in 23 studies, both the development and immediate application of the algorithms were carried out to evaluate their effectiveness. The remaining papers addressed a variety of objectives, including conceptual works and literature reviews.

Another relevant aspect observed in the study by De Roock and Martin [32] was the analysis of KPIs applied to process mining. These KPIs were divided into four categories: time, clinical, financial, and resources. Time-related KPIs, observed in 80 papers, include metrics such as length of stay, waiting time, and time between tasks. Clinical KPIs, analyzed in 50 papers, cover monitoring patient vital signs, assessing health parameters, and the quality of healthcare services provided. Financial KPIs, investigated in 9 papers, are used for patient cost billing and comparison among patients with similar characteristics. Lastly, resource-related KPIs, addressed in 8 papers, analyze aspects such as the availability of professionals and the use of care resources.

Table 4 highlights the three main objectives cited by the secondary reviews, showcasing the critical process mining initiatives in healthcare and relating them to the specific focus of each systematic review. The studies by Yang and Su [25], Partington et al. [27], Rojas et al. [11], Williams et al. [29], Farid et al. [5], Helm et al. [30], Grüger et al. [7], Kusuma et al. [6], Gunatilleke et al. [33], and Chen et al. [34] were excluded from this analysis (RQ1) as they either did not group the primary studies analyzed or did not statistically present the main identified objectives.

Table 4.

Main objectives of the secondary reviews

| Secondary review | Objectives |
| --- | --- |
| (SR3) Erdogan and Tarhan [10] | Pathway analysis methods; mining tool analysis; combination of process mining with other techniques. |
| (SR4) Kurniati et al. [9] | Addressing data heterogeneity issues; declarative process mining; verification of common pathways. |
| (SR6) Erdogan and Tarhan [28] | Proposition of new methods; process variation analysis; obtaining empirical results. |
| (SR7) Kusuma et al. [8] | Proposition of new approaches for process model extraction; prediction or recommendation of clinical pathways; deviation detection analysis in clinical pathways. |
| (SR8) Batista and Solanas [12] | Comparison between different healthcare processes; exploring healthcare processes; validating the use of mining in healthcare. |
| (SR14) Dallagassa et al. [31] | Pre-processing of information; predictive analysis for case management; algorithm and methodology development. |
| (SR15) De Roock and Martin [32] | Key performance indicator (KPI) analysis; application of process mining algorithms; algorithm development. |
| (SR17) Guzzo et al. [26] | Process analysis; pre-processing; simulation. |

Source: The authors

RQ2: what are the types and perspectives of process mining?

In the context of process mining, Van Der Aalst [15] identifies three main types of activities: discovery, conformance checking, and enhancement. Furthermore, he distinguishes four types of perspectives in process analysis: control flow, organizational, case, and time.

Considering the systematic reviews by De Roock and Martin [32], Batista and Solanas [12], Erdogan and Tarhan [28], Partington et al. [27], Guzzo et al. [26], and Dallagassa et al. [31], it was observed that process discovery was the type of activity in which process mining was most applied. Only in Rojas et al. [11] was conformance checking identified as the most relevant activity. For example, in De Roock and Martin’s [32] review, it was found that of the 263 papers analyzed, the majority used process mining for process discovery, totaling 213 papers (81% of the total), while conformance checking and enhancement accounted for 82 papers (31.2%) and 39 papers (14.8%), respectively. It is worth noting that many studies showed an intersection between the different types of activities.
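The intersection between activity types can be read directly from the counts reported for De Roock and Martin [32]. The following minimal sketch only recomputes the shares quoted above; the variable names are illustrative and not taken from any of the reviews.

```python
# Counts reported for De Roock and Martin [32] (263 papers in total).
total_papers = 263
papers_by_type = {"discovery": 213, "conformance": 82, "enhancement": 39}

# Share of papers per activity type, rounded to one decimal place.
shares = {t: round(100 * n / total_papers, 1) for t, n in papers_by_type.items()}

# The per-type counts add up to more than the number of papers,
# so some papers were necessarily counted under more than one type.
surplus = sum(papers_by_type.values()) - total_papers

print(shares)
print(surplus)
```

Because the shares are computed against the same total, they do not need to sum to 100%; the surplus is the number of additional type assignments beyond one per paper.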

In the thematic reviews by Kurniati et al. [9], Yang and Su [25], Kusuma et al. [6], and Grüger et al. [7], it was also observed that discovery was the primary type of activity. Only in the study by Helm et al. [30] was conformance checking the most commonly used type. For example, in the review by Grüger et al. [7], the most frequently used process mining type was process discovery, with 48 papers. Conformance checking was used in 13 papers, and process enhancement was mentioned in six papers.

According to Grüger et al. [7], this preference can be explained by the fact that the other types of mining (conformance checking and enhancement) depend on a pre-existing process model. Thus, the most intuitive approach is to first obtain the paths through process discovery algorithms and then apply conformance checking and enhancement techniques. This facilitates continuous analysis and improvement of processes based on the mined data.
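The dependency of conformance checking on a previously discovered model can be illustrated with a deliberately simplified sketch. It assumes a toy event log with hypothetical activity names and uses a directly-follows graph as the "model"; the techniques reported in the reviews (for example, log replay on Petri nets) are considerably richer.

```python
from collections import Counter

# Toy event log: one list of activities per case (hypothetical data).
event_log = [
    ["triage", "consultation", "lab test", "discharge"],
    ["triage", "consultation", "discharge"],
    ["triage", "lab test", "consultation", "discharge"],
]

def discover_dfg(log):
    """Discovery step: count how often activity a is directly followed by b."""
    dfg = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

def deviations(trace, dfg):
    """Conformance step: list the steps of a trace never seen in the model."""
    return [(a, b) for a, b in zip(trace, trace[1:]) if (a, b) not in dfg]

model = discover_dfg(event_log)          # a model must exist first ...
new_case = ["triage", "discharge"]       # ... before a case can be checked
print(deviations(new_case, model))
```

The second function cannot be called without the output of the first, which mirrors the ordering argument made by Grüger et al. [7].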

We observed that, regarding perspectives, control flow was the most used in all reviews addressing this research question, both in systematic and thematic reviews. This preference, especially in healthcare contexts, can be attributed to the control flow’s ability to characterize medical treatment paths, focusing on patient care. This contrasts with other perspectives that emphasize organizational or temporal aspects, or attributes of the elements involved in the processes.

Table 5 presents the distribution of types and perspectives addressed in the secondary reviews. The reviews by Williams et al. [29], Farid et al. [5], Gunatilleke et al. [33], and Chen et al. [34] were excluded from the table as they did not specify these aspects in the analyzed studies. For the review by Erdogan and Tarhan [10], types were not recorded, as its analysis was based only on the 11 focused studies (out of 50) and its primary focus was conformance checking. The table adheres to the categorization provided by each author and includes types and perspectives beyond those strictly proposed by Van Der Aalst [15].

Table 5.

Types and perspectives identified

Types and perspectives
(SR1) [25] Discovery (19) Conformance (13) Enhancement (5)
(SR2) [27] Control flow (25) Discovery (23) Case (7) Conformance (6) Organizational (3) Enhancement (1)
(SR4) [9] Control flow (36) Discovery (35) Conformance (27) Performance (27) Enhancement (8) Organizational (5)
(SR5) [11] Control flow (45) Conformance (16) Time (10) Organizational (9)
(SR6) [28] Discovery (156) Conformance (53) Performance (22) Enhancement (21)
(SR7) [8] Control flow (31) Performance (15)
(SR8) [12] Discovery (33) Control flow (27) Performance (18) Time (14) Conformance (12) Organizational (7) Enhancement (3)
(SR11) [30] Control flow (30) Conformance (5) Organizational (2) Performance (1)
(SR12) [7]¹ Discovery (48) Conformance (13) Enhancement (6) Control flow (48%) Time (23%) Case (17%) Organizational (11%)
(SR13) [6] Discovery (3) Conformance (2)
(SR14) [31] Discovery (96) Conformance (47) Time (38) Organizational (24)
(SR15) [32] Discovery (213) Control flow (179) Conformance (82) Time (71) Enhancement (39) Case (27) Organizational (24)
(SR17) [26] Discovery (72) Conformance (23)

Source: The authors

1 In the study (SR12) [7], the results related to control flow, time, case, and organizational aspects were presented as percentages

RQ3: what are the main algorithms, supporting techniques, tools, and methodologies used for process mining?

In process mining applied to healthcare, combining algorithms, support techniques, tools, and methodologies is essential for extracting and analyzing data, optimizing workflows, and improving clinical decisions. The main components used for these purposes are presented below.

Algorithms

In process mining, algorithms are essential, as they are responsible for processing event log files and producing graphs for discovery, conformance checking, and process improvement activities.

In systematic reviews (especially in systematic mapping reviews), the main algorithms identified were Alpha Miner, Heuristics Miner, Fuzzy Miner, and Inductive Miner. According to Dallagassa et al. [31], the preference for Fuzzy Miner is due to its ease in discovering healthcare models, while the preference for Heuristics Miner and Alpha Miner is because both are supported by the traditional ProM tool, which is discussed later in this section. Process mining algorithms have evolved, acquiring new features to handle noise and infrequent behavior and to improve the visualization of generated graphs, avoiding the “Spaghetti Effect”. This evolution is evident in the systematic reviews, in which Fuzzy Miner and Inductive Miner have gained increasing popularity. Fuzzy Miner was the most used algorithm according to the studies by Dallagassa et al. [31] and Helm et al. [30], and Inductive Miner and its variations were highly relevant in the review by Guzzo et al. [26]. Helm et al. [30] also reported that Inductive Miner was gaining traction, although they did not state how many papers used this algorithm.
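How an algorithm can tolerate noise and infrequent behavior may be illustrated with the dependency measure commonly associated with Heuristics Miner, dep(a, b) = (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), where |a>b| is the number of times a is directly followed by b. The sketch below is a simplified illustration under stated assumptions: the event log and the 0.7 threshold are hypothetical, and the code is not the ProM implementation.

```python
from collections import Counter

def directly_follows(log):
    """Count direct successions a > b over all traces."""
    counts = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

def dependency(counts, a, b):
    """Dependency measure: close to 1 when a is reliably followed by b."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)

# Hypothetical log: nine regular cases and one case with swapped steps (noise).
log = [["admission", "exam", "treatment"]] * 9 + [["admission", "treatment", "exam"]]
counts = directly_follows(log)

THRESHOLD = 0.7  # illustrative value, chosen only for this example
kept = sorted(pair for pair in list(counts) if dependency(counts, *pair) >= THRESHOLD)
print(kept)
```

In this toy log the edges produced by the single swapped case fall below the threshold and are left out of the model, which is the kind of behavior that keeps discovered graphs readable.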

Table 6 highlights the most frequently used algorithms in the analyzed systematic reviews, with a primary focus on process discovery activities.

Table 6.

Distribution of the main algorithms used

Paper | Heuristics Miner | Fuzzy Miner | Alpha Miner | Inductive Miner | Trace Clustering
(SR5) [11] | 19 (25.7%) | 15 (20.3%) | 3 (4.1%) | 1 (1.4%) | 8 (10.8%)
(SR6) [28] | 39 (22.7%) | 28 (16.3%) | 12 (7%) | 4 (2.3%) | 7 (4.1%)
(SR8) [12] | 14 (25.5%) | 8 (14.5%) | 8 (14.5%) | 2 (3.6%) | 1 (1.8%)
(SR11) [30] | 2 (5.3%) | 11 (28.9%) | N/A | N/A | 5 (13.2%)
(SR13) [6] | 1 (25%) | 1 (25%) | N/A | N/A | N/A
(SR14) [31] | 39 (14.4%) | 49 (18.1%) | 12 (4.4%) | 10 (3.7%) | N/A
(SR15) [32] | 6 (2.3%) | 3 (1.1%) | N/A | 3 (1.1%) | 5 (1.9%)
(SR17) [26] | 36 (20.9%) | 33 (19.2%) | 8 (4.7%) | 22 (12.8%) | N/A
(SR18) [34] | N/A | N/A | N/A | 3 (27.3%) | N/A

Source: The authors

In the analyzed studies, in addition to the previously mentioned algorithms (Alpha Miner, Heuristics Miner, Fuzzy Miner, and Inductive Miner), other algorithms were identified. For example, the systematic review by Kusuma et al. [6] also reported the use of software developed by private companies and proprietary meta-heuristic optimization algorithms. Chen et al. [34] found that, in the analysis of chronic diseases, process mining is often combined with clustering techniques, Markov models, and traditional statistical methods, such as Fisher’s exact test and Cox regression.

In the study by Guzzo et al. [26], 13 distinct algorithms were identified among the 172 studies analyzed. Among them, the PALIA algorithm [35] was used in 14 studies. PALIA is an activity-based mining algorithm where the graph is constructed considering the parallelism between processes, resulting in a Timed Parallel Automaton (TPA). This TPA undergoes combinations and elimination of duplicate and unused components, resulting in an adjusted TPA. PALIA is part of the PALIA suite of solutions, which will be detailed later in this section.

In the analyzed systematic reviews, mining algorithms are associated mainly with discovery activities, but the use of techniques for process conformance checking is also observed. In the study by Erdogan and Tarhan [28], the most relevant conformance techniques include: LTL Checker (7 papers), log replay with Petri nets (5 papers), and Conformance Checker (4 papers), with 16 papers proposing new techniques. In the review by Guzzo et al. [26], log replay with Petri nets was used in 13 papers, while LTL-based techniques (Linear Temporal Logic) and trace alignment were employed in seven papers each.
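To give an idea of what an LTL-based check expresses, the sketch below evaluates one temporal rule of the "response" kind, G(trigger -> F response), over finished traces. The function name, the rule, and the activity names are assumptions made for illustration; this is not the interface of the LTL Checker reported in the reviews.

```python
def always_eventually(trace, trigger, response):
    """True when every `trigger` is followed later in the trace by `response`."""
    waiting = False
    for activity in trace:
        if activity == trigger:
            waiting = True        # an obligation is opened
        elif activity == response:
            waiting = False       # all open obligations are fulfilled
    return not waiting

# Hypothetical cases: every biopsy should eventually get a pathology report.
cases = {
    "c1": ["biopsy", "pathology report", "surgery"],
    "c2": ["biopsy", "surgery"],
    "c3": ["consultation", "discharge"],
}
violations = [
    case_id
    for case_id, trace in cases.items()
    if not always_eventually(trace, "biopsy", "pathology report")
]
print(violations)
```

Case c3 satisfies the rule vacuously because the trigger never occurs, which is the usual reading of such constraints.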

Regarding process improvement, only the systematic review by Erdogan and Tarhan [28] mentioned the use of specific techniques for this type of mining. It was identified that simulation is the most widely used technique, present in five papers, followed by Declare Repair in three papers and performance analysis with Petri nets in two papers. Additionally, three papers proposed new techniques for process improvement, highlighting the innovation and continuous improvement in this field.

Support techniques for process mining in healthcare

In the selected studies, the authors sought to identify techniques that could be integrated into process mining. The main tasks identified include preprocessing, standardization, clustering, filtering, and anonymization. These complementary tasks help prepare and refine the data, enabling more accurate and effective analysis of processes while addressing issues such as data quality and privacy concerns when using process mining in healthcare contexts.

Preprocessing and standardization

Data preprocessing is a fundamental step in process mining in healthcare due to the data’s diverse origins and frequent incompleteness. In the study by Batista and Solanas [12], it was observed that seven papers applied preprocessing, with some also focusing on the integration between management systems and the standardization of information. Partington et al. [27] found that, among 28 studies, 15 performed preprocessing before mining, highlighting the importance of this step for data quality.

According to Guzzo et al. [26], preprocessing was divided into two main purposes: the extraction and preparation of the event log, which involves selecting relevant attributes (seven papers) and improving the quality of the event log (eight papers). The latter includes techniques to remove inconsistent data, correct missing data, and handle outliers, ensuring that the logs are more accurate and reliable for process mining.
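A minimal sketch of the second purpose, improving event log quality, is shown below. The record layout (`case`, `activity`, `time`) and the sample events are hypothetical, and the policy of setting aside events without a timestamp, rather than imputing them, is an assumption made for the example.

```python
from datetime import datetime

# Hypothetical raw events extracted from a hospital information system.
raw_events = [
    {"case": "p1", "activity": "admission", "time": "2024-03-01T08:00"},
    {"case": "p1", "activity": "lab test", "time": None},                 # missing timestamp
    {"case": "p1", "activity": "discharge", "time": "2024-03-02T10:30"},
    {"case": "p1", "activity": "discharge", "time": "2024-03-02T10:30"},  # exact duplicate
    {"case": "p2", "activity": "discharge", "time": "2024-03-05T09:00"},
    {"case": "p2", "activity": "admission", "time": "2024-03-04T14:15"},  # out of order
]

def clean(events):
    """Set aside events without a timestamp, drop duplicates, sort per case."""
    seen, kept, dropped = set(), [], []
    for e in events:
        key = (e["case"], e["activity"], e["time"])
        if e["time"] is None:
            dropped.append(e)              # reported, not silently discarded
        elif key not in seen:
            seen.add(key)
            kept.append({**e, "time": datetime.fromisoformat(e["time"])})
    kept.sort(key=lambda e: (e["case"], e["time"]))
    return kept, dropped

clean_log, rejected = clean(raw_events)
print([(e["case"], e["activity"]) for e in clean_log])
print(len(rejected))
```

Returning the rejected events separately keeps the cleaning step auditable, which matters when domain experts later need to judge whether the removed records were noise or clinically meaningful.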

Clustering and filtering

Clustering can reduce the complexity of models and prevent the so-called “Spaghetti Processes”, where the resulting graph is correct but overly complex and challenging for stakeholders to interpret. Some authors consider clustering an integral part of preprocessing, whereas other systematic reviews dedicate separate sections to this task, treating it as a process mining algorithm. This demonstrates different approaches to how clustering is applied in process mining.

Batista and Solanas [12] observed that five papers included clustering activities, while three others performed initial filtering for events with specific activities to reduce model complexity. Erdogan and Tarhan [28] identified 16 clustering algorithms in 21 papers, with Trace Clustering (seven papers), k-means (five papers), and Hierarchical Clustering (four papers) standing out. Rojas et al. [11] observed Trace Clustering in eight papers. Guzzo et al. [26] identified 11 papers that applied clustering with mining algorithms, including Topic Modeling [36] (five papers) and Latent Dirichlet Allocation (LDA) [37] in one paper, both unsupervised learning algorithms applied for clustering medical records.

Two potentially conflicting goals must be balanced when processing data for process mining in healthcare. On the one hand, clustering and filtering aim to make the data more concise and comprehensible, simplifying the analysis and facilitating the identification of general patterns by mining algorithms. On the other hand, preserving infrequent paths can be essential to uncover valuable insights, such as identifying rare diseases, new diagnostic pathways, or therapeutic alternatives. Thus, developers are responsible for ensuring an appropriate balance when applying these techniques.
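One way to respect both concerns is to separate frequent and infrequent trace variants instead of discarding the latter. The sketch below assumes a hypothetical log and an illustrative 10% threshold; it is a simple variant filter, not one of the clustering algorithms listed above.

```python
from collections import Counter

def split_variants(log, min_share=0.1):
    """Split trace variants into frequent and rare ones by relative frequency."""
    variants = Counter(tuple(trace) for trace in log)
    total = sum(variants.values())
    frequent = {v: n for v, n in variants.items() if n / total >= min_share}
    rare = {v: n for v, n in variants.items() if n / total < min_share}
    return frequent, rare

# Hypothetical log: 18 standard cases and two unusual pathways.
log = (
    [["consultation", "imaging", "treatment"]] * 18
    + [["consultation", "genetic test", "treatment"]]
    + [["consultation", "imaging", "second opinion", "treatment"]]
)
frequent, rare = split_variants(log)
print(len(frequent), len(rare))
```

The frequent variants can feed a readable overview model, while the rare ones remain available for separate review by clinicians rather than being treated as noise by default.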

Anonymization

Responsibility and data protection have become increasingly relevant with the emergence of specific legislation to guide the accountability of individuals and organizations. Examples include the GDPR (General Data Protection Regulation) in the European Union, the LGPD (General Data Protection Law) in Brazil, and HIPAA (Health Insurance Portability and Accountability Act) in the United States, which specifically regulates the protection of health data. These laws establish clear guidelines for the handling and protection of personal data, reflecting the growing importance of privacy and information security.

Despite the relevance of this issue, especially in a sensitive area like healthcare, only two systematic reviews addressed data protection and anonymization. In the study by Batista and Solanas [12], health data anonymization (or pseudo-anonymization) was highlighted, with eight out of 55 analyzed papers applying process mining for this purpose. In the review by Guzzo et al. [26], only one out of 172 examined studies addressed the issue, highlighting a significant gap in the literature on data protection in healthcare.
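As a point of reference, pseudonymization of case identifiers in an event log can be sketched with a keyed hash from the Python standard library. The key, field names, and identifiers below are hypothetical. The sketch replaces direct identifiers only; it does not by itself amount to anonymization or to compliance with GDPR, LGPD, or HIPAA, since timestamps and rare pathways may still allow re-identification.

```python
import hashlib
import hmac

def pseudonymize(patient_id, secret_key):
    """Replace an identifier with a keyed, non-reversible pseudonym."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

SECRET_KEY = b"demo-key-do-not-reuse"  # hypothetical; keep real keys out of the code

events = [
    {"case": "patient-001", "activity": "admission"},
    {"case": "patient-001", "activity": "discharge"},
    {"case": "patient-002", "activity": "admission"},
]
safe_events = [{**e, "case": pseudonymize(e["case"], SECRET_KEY)} for e in events]
print(safe_events)
```

The same patient always maps to the same pseudonym, so traces stay intact for mining, while the original identifier cannot be recomputed without the key.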

Tools and methodologies

Tools

Several tools were identified in the analyzed systematic reviews, many of which are not exclusively focused on process mining, such as Gephi, MATLAB, and C. The diversity of tools is essential to cover all stages of process mining, from data collection and preparation to analysis and visualization of results. The most notable tools were ProM, Disco, PMApp, and the R language, emphasizing the pMineR and bupaR packages, which facilitate various activities in healthcare process analysis.

ProM is an open-source and extensible tool widely used in process mining activities due to its support for various plugins [38]. Batista and Solanas [12] observed that, out of the 55 analyzed papers, 20 used ProM. Rojas et al. [11] reported that 31 of the 74 papers used the tool. Kusuma et al. [8] observed its use in 7 of the 32 papers, being combined with WEKA or RapidMiner in some cases. Kurniati et al. [9] identified the use of ProM in 26 of the 37 papers. In the review by Helm et al. [30], 18 of the 38 studies used the tool. Finally, Guzzo et al. [26] reported that 75 of the 172 papers used ProM, highlighting its popularity and relevance.

Another relevant tool identified in the systematic reviews is Disco [18], developed by Fluxicon and known for its user-friendly interface designed for corporate use. Its use was highlighted in several studies: Grüger et al. [7] identified six papers, Rojas et al. [11] found eight, Helm et al. [30] reported 11, and Guzzo et al. [26] mentioned 35 papers that used Disco. The tool was also cited in studies by Batista and Solanas [12], Kusuma et al. [8], and Chen et al. [34].

Figure 2 shows the distribution of the use of ProM and Disco tools based on the number of primary studies mentioned in the analyzed secondary reviews. This illustration highlights the popularity of both tools in the process mining field, allowing visualization of how they are adopted in different studies and contexts addressed by the reviews.

Fig. 2.

Tool usage distribution. Source: The authors

In addition to the ProM and Disco tools, programming languages such as R and Python stand out in the application of process mining. In the reviews by Chen et al. [34], Grüger et al. [7], and Guzzo et al. [26], 9, 7, and 7 papers, respectively, were identified that used these languages, with particular emphasis on the pMineR and bupaR packages, which facilitate the adoption of process mining in R. Additionally, Guzzo et al. [26] reported the use of Python in 3 studies, primarily with the pm4py library [39].

In the healthcare context, the tools from the PALIA suite, such as PALIA-ER and the PALIA ILS Web Tool, are noteworthy. PALIA-ER [40] is focused on process mining analysis based on emergency room scenarios and was cited in two studies by Chen et al. [34] and Helm et al. [30]. The PALIA ILS Web Tool applies process mining to data from indoor location systems via sensing [41]. In Guzzo et al. [26], the PALIA suite was mentioned in six papers without distinguishing between the tools.

Methodologies

Based on the reviews conducted in this study, some methodologies were mentioned, such as L*, PM2, and PDM. On the other hand, in the review by Guzzo et al. [26], it was observed that projects may follow only the basic activity flow, such as data collection, event log preparation, process discovery, and process analysis (CEDA), without necessarily relying on existing methodologies.

The L* Life Cycle Model, proposed by Van Der Aalst [15], divides the process mining project into five stages: (1) mining project planning, (2) information extraction from management systems to create log files, (3) establishing the control-flow model, linking it to log records, (4) creating the integrated process model, where different perspectives (organizational, case, and time) are combined to gain insights, and (5) operational support, where the outputs are linked to activities such as detection, prediction, and recommendation. For example, process improvement recommendations can be emailed to professionals directly involved in the activities.

Another relevant process mining methodology is PM2 [42], which aims to support projects using process mining by promoting iterative analysis and the adoption of evolving insights. The methodology is divided into six stages: planning, extraction, data processing, mining, analysis of the obtained data, and finally, evaluation. Van Der Aalst [15] considers this methodology an L* Life Cycle refinement.

The Process Diagnostic Method (PDM) [43] aims to provide a methodology for diagnosing processes based on process mining, quickly offering an overview of processes without needing prior and specific domain knowledge. The methodology is divided into six stages: log preparation, log inspection, control-flow analysis, performance analysis, role analysis, and result transfer. In the healthcare context, Rebuge and Ferreira [44] extended this methodology to provide healthcare organizations with tools to perform business process analyses using process mining, which they applied in a public hospital in Portugal.

Among the systematic reviews that comprehensively analyzed process mining, the L* Life Cycle Model was used in three papers in the review by Guzzo et al. [26] and in two papers in the review by Rojas et al. [11]. In the systematic reviews focused on Oncology, L* was used in four papers in the review by Grüger et al. [7] and in one paper in the review by Kurniati et al. [9]. The PM2 methodology was identified in nine papers in the review by Guzzo et al. [26]. On the other hand, the PDM methodology, specifically adapted for the healthcare context, was not identified in any of the analyzed studies. Thus, it can be deduced that many process mining projects followed only the basic stages: data collection, event log preparation, process discovery, and process analysis. Guzzo et al. [26] identified 99 studies using this basic approach, while Grüger et al. [7] cited 29 papers as not using any known methodology.

RQ4: in which medical specialties is process mining used?

The reviewed studies highlight the application of process mining across several medical specialties, with Oncology emerging as the most frequently addressed area, followed by Cardiology and emergency care. Reviews such as those by Batista and Solanas [12], Rojas et al. [11], and Erdogan and Tarhan [28] identified Oncology as the specialty with the highest number of publications. Other areas, such as Surgery and Cardiology, also received significant attention. Thematic studies, such as those by Kusuma et al. [8], Helm et al. [30], and Chen et al. [34], explored specific topics within these specialties, including cardiovascular diseases, term standardization, and chronic diseases. In reviews on Oncology, such as those by Kurniati et al. [9] and Grüger et al. [7], gynecological cancer emerged as the main topic of interest. Finally, Gunatilleke et al. [33] explored the application of process mining in the study of symptoms such as shortness of breath, highlighting the wide range of applications of this technique in healthcare.

According to Erdogan and Tarhan [28], the significant interest in applying process mining in Oncology is likely because this area has well-defined stages and requires adherence to strict medical protocols. Furthermore, according to Kurniati et al. [9], understanding the best care pathways can help identify opportunities for improving the quality of treatments for cancer patients. Cancer is a complex and multifaceted disease that presents substantial challenges in prevention, early detection, treatment, and management [34].

Table 7 presents the distribution of medical specialties addressed in secondary reviews. Only specialties and subspecialties were included, excluding diseases that can be treated by various specialties or aspects related to assistive use. The reviews by Yang and Su [25], Kusuma et al. [6], Partington et al. [27], Dallagassa et al. [31], and De Roock and Martin [32] were not included in Table 7, as they did not specify the specialties in which process mining was applied, focusing only on aspects related to diseases or assistive use.

Table 7.

Division of specialties among selected papers

Specialty (SR3) [10] (SR4) [9] (SR5) [11] (SR6) [28] (SR7) [8] (SR8) [12] (SR9) [29] (SR10) [5] (SR11) [30] (SR12) [7] (SR16) [33] (SR17) [26] (SR18) [34]
Oncology X X X X X X X X X X
Cardiology X X X X X X X X
Emergency X X X X X X
Dentistry X X X X X X
Urology X X X X X X
Surgery X X X X X
Radiology X X X X
Endocrinology X X X
Pediatrics X X X
Pulmonology X X X
Anesthesiology X X
Neurology X X
Ophthalmology X X
ICU (Intensive Care Unit) X X
General Practice X
Nursing X
Gastroenterology X
Gynecology X
Family Physician X

Source: The authors

RQ5: what types of medical processes is process mining applied to?

As described by Lenz and Reichert [45], healthcare processes can be divided into medical treatment processes and organizational processes. Medical treatment processes consist of direct medical care and attention to the patient, such as elective consultations, test collection, and performing procedures. Meanwhile, organizational processes focus on administrative and operational activities that support the functioning of institutions.

This study found that medical treatment processes stood out significantly with a higher number of published papers, even considering the multidisciplinary nature of healthcare, where understanding organizational structures is also relevant to conducting activities. In the systematic mapping reviews, only in the review by Batista and Solanas [12] were organizational processes more prevalent (18 organizational, 15 medical treatment, and eight both). In all other reviews, medical treatment processes stood out. For example, in the review by De Roock and Martin [32], when considering the papers that use process mining exclusively for a specific purpose (medical or organizational), it was observed that 63.9% of the works were applied to medical treatment. In contrast, only 7.6% were applied exclusively to organizational activities. In thematic reviews, this difference is also significant; for example, in Kusuma et al. [8], 75% of the papers applied process mining to medical treatment, while in Grüger et al. [7], 89.1% of the papers were directed at medical treatment processes.

In Guzzo et al. [26], the concepts related to medical and organizational processes were expanded, resulting in new subdivisions. Medical processes were classified into clinical pathways, patient behavior, and medical procedures, while organizational processes were divided into staff interactions, human movements, and staff tasks. Patient pathways, which comprise the sequence of stages a patient goes through within the healthcare facility, were classified as both medical and organizational at the same time and were addressed by 70 papers. A total of 61 papers related to medical processes were observed, divided into clinical pathways (49 papers), patient behavior (seven papers), and medical procedures (five papers). Finally, 21 papers referring to organizational processes were observed, divided into human movements (nine papers), staff tasks (seven papers), and staff interactions (five papers).

In the study on process mining applied to frail elderly patients conducted by Farid et al. [5], it was found that the type of process to be analyzed (medical treatment or organizational) depends on the data available to professionals. Among the eight studies analyzed, one used electronic health records to investigate clinical treatment, while another addressed the organizational process by considering collaboration between different types of professionals. The remaining six papers analyzed the daily behavior of elderly individuals using data collected by smart environments, scenario simulations, or requests for mobile services.

Table 8 presents the distribution of medical process types (clinical and administrative) identified in secondary reviews. The reviews by Yang and Su [25], Partington et al. [27], Rojas et al. [11], Williams et al. [29], Farid et al. [5], Helm et al. [30], Kusuma et al. [6], Gunatilleke et al. [33], and Chen et al. [34] were not included in Table 8 as they did not specify the types of medical processes based on the primary studies analyzed.

Table 8.

Distribution among the types of medical processes

Paper | Clinical | Administrative | Both | Undefined | Total
(SR7) [8] | 24 (75%) | 7 (21.9%) | 1 (3.1%) | 0 (0%) | 32
(SR8) [12] | 15 (27.3%) | 18 (32.7%) | 8 (14.5%) | 14 (25.5%) | 55
(SR10) [5] | 1 (12.5%) | 1 (12.5%) | 0 (0%) | 6 (75%) | 8
(SR12) [7] | 49 (89.1%) | 3 (5.5%) | 3 (5.5%) | 0 (0%) | 55
(SR15) [32] | 168 (63.9%) | 20 (7.6%) | 40 (15.2%) | 35 (13.3%) | 263
(SR17) [26] | 61 (35.5%) | 21 (12.2%) | 70 (40.7%) | 20 (11.6%) | 172

Source: The authors

RQ6: what are the limitations, challenges, and future directions of process mining applied to healthcare?

Below, we present the main limitations, challenges, and future directions in applying process mining in healthcare. This analysis is organized into four subsections, addressing data limitations, technical limitations, team-related limitations, and difficulties in process visualization to provide a comprehensive overview of the obstacles faced and opportunities for future advancements.

Data limitations

Establishing log files for process mining in healthcare is challenging, as it involves capturing clinical and health information from multiple data sources, which may contain noise or be incomplete, compromising the quality of process mining [12]. Additionally, the large volume of data generated in healthcare is a significant concern: measures need to be adopted to avoid overloading storage and processing systems without compromising the extraction of valuable insights [31].

In the study conducted by Erdogan and Tarhan [28], data handling for process mining was highlighted as a critical point, as 47 out of the 172 studies analyzed focused on the different aspects of data handling, such as extraction, removal of noisy data, preprocessing, among others. The study also identified that dealing with large volumes of complex healthcare data can present performance efficiency challenges. To address these challenges, three authors suggested using plugins, while five authors proposed new techniques or process mining tools to tackle performance issues or summarize healthcare data.

In the study by Kusuma et al. [8], data limitations were identified in 13 papers, with the main problems being the lack of timestamp information and data quality issues. The review focused on process mining applied to cardiology and identified opportunities for data improvement in 12 different studies, which suggested enhancements related to patients’ health parameters, analysis of cardiovascular disease diagnoses, and the inclusion of data from other medical departments.

In the study by Kurniati et al. [9], data limitations were identified in 10 papers, with the main issues related to data access, quality, missing attributes, and inadequate levels of detail. Of the analyzed works, nine proposed improvements in the information’s quality, dimensionality, and complexity. In the systematic review by Grüger et al. [7], proposals for data quality indicators were also observed in 3 papers, highlighting the importance of this aspect for the application of supervised learning techniques.

In the study by Farid et al. [5], which focused on process mining applied to frail elderly patients, much of the data was obtained through sensor nodes in smart environments. A significant limitation identified was that these data could be incomplete or inconsistent. The authors suggested applying preprocessing techniques to the extracted data to address this issue.

In the systematic review by Helm et al. [30], 38 studies were analyzed to establish a standard approach to improve the quality of event logs. Based on the reviewed studies, clinical terms and aspects were identified and described in three categories: the location where healthcare professionals and patients meet to provide medical services (encounter environment), the clinical specialty, and the medical diagnosis performed. The information was then correlated with standard clinical descriptors and codes found in SNOMED CT and ICD-10, aiming to improve accuracy and comparability between future case studies in healthcare process mining.
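The idea of correlating local terms with standard codes can be sketched as a lookup applied to event attributes. The mapping below is a small illustrative table assembled for this example (the local labels are hypothetical, and only three ICD-10 categories are used); it is not the procedure or the term list of Helm et al. [30], and a real project would rely on a maintained terminology service.

```python
# Illustrative mapping from free-text labels to ICD-10 categories.
ICD10_BY_LOCAL_LABEL = {
    "heart attack": "I21",       # acute myocardial infarction
    "acute mi": "I21",
    "breast cancer": "C50",      # malignant neoplasm of breast
    "type 2 diabetes": "E11",    # type 2 diabetes mellitus
}

def standardize(events):
    """Attach an ICD-10 code to each event and collect labels left unmapped."""
    coded, unmapped = [], set()
    for e in events:
        label = e["diagnosis"].strip().lower()
        code = ICD10_BY_LOCAL_LABEL.get(label)
        if code is None:
            unmapped.add(label)   # left for manual review by a domain expert
        coded.append({**e, "icd10": code})
    return coded, unmapped

events = [
    {"case": "p1", "diagnosis": "Heart attack"},
    {"case": "p2", "diagnosis": " acute MI "},
    {"case": "p3", "diagnosis": "chest discomfort"},
]
coded, unmapped = standardize(events)
print([e["icd10"] for e in coded], unmapped)
```

Two differently worded labels end up with the same code, which is what makes event logs from different sites comparable; labels without a match are surfaced instead of being guessed.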

Technical limitations

The application of process mining algorithms faces significant technical challenges, such as integration with legacy systems, the need for a robust IT infrastructure, ensuring scalability, and the constant optimization of solution performance in real-world environments. Additionally, the algorithms must handle the high variability of healthcare processes and dynamically recognize patterns [31].

In the review by Kusuma et al. [8], technical limitations were identified in 13 papers, including limitations in processing power, high memory consumption, and bias associated with mining algorithms. Nineteen studies were observed proposing improvements to mitigate these limitations, such as using different algorithms, improving patient classification, checking compliance with clinical guidelines, and developing prototypes for validation.

In the study by Kurniati et al. [9], technical limitations were identified in 13 papers, especially in studies that implemented functionalities available in other tools. On the other hand, 20 papers proposed technical improvements, such as implementing clustering to visualize process variations, optimizing the techniques’ performance, and developing new mining techniques to obtain high-level, comprehensible information.

In Farid et al. [5], technical limitations were related to obtaining information from sensor nodes, with data quality issues related to granularity identified in six of the eight papers analyzed.

Finally, an interesting technical aspect addressed by Erdogan and Tarhan [28] was the diversity of process mining techniques and software providers. The mapping identified 68 techniques and 17 newly proposed tools, specifically in healthcare. While this variety is positive, it may hinder implementation for potential users. The authors pointed out a gap that could be filled by studies that evaluate and compare the different available tools and solutions.

Team limitations

In healthcare, collaboration with specialist professionals is essential during process mining activities. These professionals bring expert knowledge of clinical workflows, medical terminology, and patient care specifics, which is crucial for defining the dataset and correctly interpreting the generated insights.

The study by De Roock and Martin [32] thoroughly analyzed the involvement of healthcare professionals in the different stages of process mining. The validation of the insights obtained by the mining algorithms was the activity with the highest participation from professionals, observed in 61 of the 263 papers analyzed. Next, interactive analysis, presented in 38 papers, involved iterative reviews during the result analysis process. Professionals also played a significant role in the data phase, with 35 papers reporting participation in data extraction and 30 in data preparation. Additionally, in 33 papers, professionals helped identify issues during mining.
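To put these counts in proportion, the short calculation below expresses each one as a share of the 263 papers. The counts come from the paragraph above; the stage labels used as dictionary keys are our own shorthand. The shares are not expected to sum to 100%, since a paper may presumably report more than one form of involvement.

```python
TOTAL_PAPERS = 263

# Number of papers reporting each form of professional involvement
participation = {
    "validation of insights": 61,
    "interactive analysis": 38,
    "data extraction": 35,
    "issue identification": 33,
    "data preparation": 30,
}


def share(count: int, total: int = TOTAL_PAPERS) -> float:
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {stage: share(n) for stage, n in participation.items()}
for stage, pct in shares.items():
    print(f"{stage}: {pct}%")
```

Even the most frequent form of involvement, validation of insights, appears in fewer than a quarter of the analyzed papers (about 23.2%).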

Kusuma et al. [8] observed that 14 studies needed medical experts to assist in the analysis or verification of results, and specifically recommended including cardiology specialists, cardiology being the focus of that review. Finally, Farid et al. [5] identified one study in which the assistance of a medical specialist was necessary to address the challenges of process mining.

Difficulties in process visualization

The visualization of processes generated by process mining algorithms in healthcare faces significant challenges due to the complex and variable nature of medical treatments, where different therapies may be adopted to treat patients with the same disease [12]. Additionally, the visualizations must be accessible and useful for healthcare professionals with varying familiarity with data tools. Rojas et al. [11] highlight that one of the limitations of current process mining tools is the lack of adequate visualization of process models and the results obtained, especially in complex and unstructured processes, which are common in healthcare.

Erdogan and Tarhan [28] identified several works that proposed techniques to improve data visualization, such as performance summaries (two papers), filtering of patient groups (four papers), and comparison between different populations (four papers). Additionally, three papers proposed new modeling notations.

The study by Dallagassa et al. [31] observed that approaches related to patient trajectories were relatively limited, lacking process models that represented the patient’s complete journey. This view is corroborated by Chen et al. [34], who observed that many studies prioritized the development of technical methods over final medical outcomes, resulting in limited clinical interpretation.

Limitations and threats to validity

The main identified threats that could compromise the validity of the results were classified as internal, construct, and external.

Internal threats

  • Methodological quality of the secondary reviews: In this study, we did not establish exclusion criteria based on methodological quality, such as the application of the DARE (Database of Abstracts of Reviews of Effects) criteria, which are commonly used in tertiary reviews [46, 47]. However, we thoroughly analyzed the papers in the final version, ensuring that the selected reviews could address the research questions.

  • Study selection: To ensure that we did not inadvertently exclude secondary studies during the search, we opted to use a broad search string that was not strictly limited to process mining terms, such as “process discovery”, “conformance checking”, or “process enhancement”. This approach retrieved more studies, increasing the analysis volume while ensuring the inclusion of relevant studies. Snowballing was also adopted, which allowed us to identify, during reading, relevant studies that were not captured in the initial search.

Construct threats

  • Dependence on aggregated data: The tertiary review aggregates and synthesizes the results of secondary reviews, making the analysis susceptible to the interpretations and potential biases of the authors of those reviews. This approach can result in the loss of detailed information from the primary studies, making it difficult to identify important nuances. Various secondary reviews addressing different perspectives were included to mitigate this dependence, balancing biases and enriching the quality of the collected information.

  • Variability in definitions and terminology: The lack of standardization in the definitions and terminology used in the secondary reviews can introduce inconsistencies and challenges in interpreting the results. To mitigate this threat, the authors defined key terms and concepts at the beginning of the study and adopted standardized terminology, enabling the reproducibility of the studies.

External threats

  • Heterogeneity among studies: The studies included in this review address different purposes and aspects of process mining, being classified into broad and thematic reviews. This categorization was considered during the analysis, as each approach could directly influence the answers to the research questions. Broad reviews widely explore the technical and theoretical aspects of process mining, offering a comprehensive and detailed view. In contrast, thematic reviews focus on the main aspects of process mining applied to specific areas such as oncology, cardiology, and care for frail elderly patients.

  • Temporal synchrony: A variation in the data collection period was observed in this study, which may negatively impact the results. The 18 analyzed systematic reviews were published between 2014 and 2023, with research protocols covering 2002 to 2021. The average coverage period was 10 to 11 years, with a higher concentration of data between 2011 and 2015. Considering the temporal context of the included studies was essential to mitigate the impact of temporal synchrony when interpreting the results, highlighting the technological evolutions and methodological changes that occurred over time. This contributed to a more balanced and relevant analysis, considering the potential influences of different periods on the results obtained.

Conclusions

Business processes in healthcare are complex, dynamic, and multidisciplinary, involving a wide range of professionals, different healthcare structures, and various levels of care. Each medical specialty has its own complexities, and each treatment or patient may require unique clinical pathways. Process mining emerges as a valuable tool for understanding process models, assessing compliance with clinical guidelines and protocols, identifying opportunities for improvement, and supporting decision-making. This tertiary review was conducted with the aim of deepening the understanding of the application of process mining in healthcare.

The review mapped the leading practices, classifying the types and perspectives of process mining and addressing algorithms, support techniques, tools, and methodologies. It also investigated the medical specialties and the most common process types in these approaches. Limitations, challenges, and future directions were assessed, providing valuable insights for researchers and healthcare professionals. The 18 systematic reviews analyzed, divided between 10 thematic reviews (focusing on areas such as cardiology and oncology) and eight broad reviews (focusing on general project aspects), revealed that process mining is often applied to improve the quality and efficiency of clinical processes, as well as to promote the development of new techniques and continuous improvements.

Process discovery is the most common activity in the field and is essential for other activities, such as conformance checking and enhancement, which rely on an initial model obtained in this phase. Among the perspectives of process mining, control flow was the most used, aiming for a precise characterization of the possible paths in healthcare processes. Medical treatment processes were widely studied, surpassing organizational processes, which remain relevant despite being less frequent due to the multidisciplinary nature of healthcare.
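The dependence of conformance checking on a previously discovered model can be sketched with a deliberately simple example. The code below discovers a directly-follows graph from a toy log and then checks a new trace against it. This is a simplification for illustration, not one of the algorithms covered by the reviews: the activity names and traces are hypothetical, and practical conformance checking typically relies on techniques such as token replay or alignments.

```python
from collections import Counter


def discover_dfg(traces):
    """Discovery step: count directly-follows pairs (a, b) over all traces."""
    dfg = Counter()
    for trace in traces:
        dfg.update(zip(trace, trace[1:]))
    return dfg


def deviations(trace, dfg):
    """Conformance step: directly-follows pairs of `trace` absent from the model."""
    return [pair for pair in zip(trace, trace[1:]) if pair not in dfg]


# Toy event log, one list of activities per patient case (hypothetical data)
log = [
    ["Admission", "Triage", "Lab test", "Treatment", "Discharge"],
    ["Admission", "Triage", "Treatment", "Discharge"],
    ["Admission", "Triage", "Lab test", "Treatment", "Discharge"],
]
dfg = discover_dfg(log)

# A new case that skips triage
new_trace = ["Admission", "Treatment", "Discharge"]
missing = deviations(new_trace, dfg)
print(missing)
```

The check only makes sense once `dfg` exists, which mirrors the point above: the model obtained in the discovery phase is the reference against which later cases are compared.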

Oncology stood out as the most studied medical specialty, possibly due to the complexity of cancer and the need for strict protocols. The most used algorithms were Alpha Miner, Heuristics Miner, Fuzzy Miner, and Inductive Miner, each offering specific features to address challenges in the field, such as noisy data and the need for better visualizations of the obtained models. Fuzzy Miner, for example, was valued for its ease in discovering healthcare models, while Heuristics Miner and Alpha Miner are well supported by the ProM tool, which is the most widely used in the area.
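One way to see how an algorithm such as Heuristics Miner copes with noisy data is its dependency measure, which for two distinct activities a and b is (|a>b| − |b>a|) / (|a>b| + |b>a| + 1), where |a>b| counts how often b directly follows a. The sketch below computes this measure on a toy log and keeps only relations above a threshold. The log, the activity names, and the threshold value of 0.7 are assumptions made for illustration, and the code covers only this one ingredient of the algorithm.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity b directly follows activity a."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def dependency(counts, a, b):
    """Heuristics Miner dependency measure for a != b; values lie in (-1, 1)."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)


# Toy log: nine cases follow the usual order, one noisy case swaps two steps
log = [["Consultation", "Exam", "Surgery"]] * 9 + [["Exam", "Consultation", "Surgery"]]
counts = directly_follows(log)

THRESHOLD = 0.7  # hypothetical cut-off, chosen for this example only

strong = dependency(counts, "Consultation", "Exam")  # (9 - 1) / (9 + 1 + 1)
weak = dependency(counts, "Exam", "Consultation")    # (1 - 9) / (1 + 9 + 1)
kept = sorted(pair for pair in counts if dependency(counts, *pair) >= THRESHOLD)
print(kept)
```

The relations introduced only by the single swapped case fall below the threshold and are dropped, while the two frequent relations are kept. Choosing the threshold remains a judgment call: a high value suppresses noise but can also hide rare pathways that are clinically meaningful.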

The capture of data from multiple sources, often noisy or incomplete, along with the occurrence of infrequent behaviors, presents significant challenges for process mining. These challenges drive the evolution of algorithms, among which the growing interest in Fuzzy Miner and Inductive Miner stands out, as observed in the systematic reviews analyzed. Despite advancements, adherence to established process mining methodologies remains low, with few studies utilizing well-known methodological approaches. In addition to the main algorithms, other complementary techniques are applied. However, specific standards for pre-processing and standardizing event log data are still lacking, requiring designers to develop their own solutions.
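In the absence of such standards, pre-processing is usually written ad hoc. The following sketch shows one possible shape of it, under stated assumptions: events are dictionaries with hypothetical `case_id`, `activity`, and `timestamp` fields, events without a timestamp are discarded, and trace variants seen in fewer than `min_cases` cases are treated as infrequent behavior. None of these choices comes from the reviewed studies.

```python
from collections import Counter, defaultdict


def build_traces(events):
    """Group events by case, drop events lacking a timestamp, order by time."""
    by_case = defaultdict(list)
    for event in events:
        if event.get("timestamp") is None:
            continue  # incomplete record: cannot be placed in the trace
        by_case[event["case_id"]].append(event)
    return {
        case: tuple(e["activity"] for e in sorted(evts, key=lambda e: e["timestamp"]))
        for case, evts in by_case.items()
    }


def frequent_variants(traces, min_cases):
    """Keep trace variants observed in at least `min_cases` cases."""
    counts = Counter(traces.values())
    return {variant: n for variant, n in counts.items() if n >= min_cases}


# Toy data (hypothetical); ISO-formatted timestamps sort correctly as strings
raw = [
    ("c1", "Admission", "2024-01-01T08:00"),
    ("c1", "Treatment", "2024-01-01T09:00"),
    ("c1", "Discharge", "2024-01-01T12:00"),
    ("c2", "Treatment", "2024-01-02T10:00"),  # recorded out of order
    ("c2", "Admission", "2024-01-02T08:30"),
    ("c2", "Discharge", "2024-01-02T15:00"),
    ("c3", "Admission", "2024-01-03T07:45"),
    ("c3", "Treatment", None),                # missing timestamp
    ("c3", "Discharge", "2024-01-03T08:15"),
]
events = [{"case_id": c, "activity": a, "timestamp": t} for c, a, t in raw]

traces = build_traces(events)
kept = frequent_variants(traces, min_cases=2)
print(kept)
```

Both decisions are lossy. Dropping the untimed event turns case c3 into a shorter, apparently unusual trace, and the variant filter then removes it altogether, so a data quality problem can silently become a modeling decision. This is one reason why shared pre-processing conventions would help comparability.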

These challenges highlight the importance of algorithmic advancements and reinforce the need for well-defined methodologies to ensure consistency and reproducibility in process mining applications. The studies analyzed indicate that, despite technological progress, adherence to standardized approaches remains low, which suggests an opportunity for future work to bridge this gap by developing more structured frameworks.

The participation of healthcare professionals is essential due to their deep knowledge of clinical workflows and medical terminology. However, differences in familiarity with data tools and the lack of clear visualizations of process models can make it difficult to interpret the results, reinforcing the importance of developing accessible and comprehensible data presentations.

Thus, new research directions in process mining could explore adopting complementary techniques, such as developing solutions that enable visual analysis of processes. One example would be the creation of decision-making dashboards specifically tailored for process mining in healthcare. Additionally, mechanisms that integrate the outputs of process mining algorithms with large language models (LLMs) could enhance the exploration and interpretation of healthcare information.

Furthermore, establishing a comprehensive methodology encompassing the definition of a conceptual framework for healthcare treatments and the collection, processing, and integration of generated data could facilitate natural language-based consultations, enabling healthcare professionals to access patient information more efficiently.

In future work, we suggest conducting a detailed analysis of the professional and academic profiles of the authors of the included reviews, to better understand their areas of expertise and how these perspectives might influence the direction of the research. Specifically, we propose categorizing the involvement of healthcare professionals in the reviewed studies (e.g., as co-authors, validators, or data providers) and analyzing how their participation affects the focus of the research. For instance, healthcare professionals may direct investigations toward clinical applications, such as specific medical specialties or disease studies. In contrast, computer science professionals might prioritize technical aspects, such as algorithm performance and computational efficiency. By establishing these relationships, we aim to identify whether a balanced collaboration between different expertise areas leads to more integrative and broadly applicable findings. Additionally, we propose a more detailed investigation of the data sources used in the analyzed studies, assessing their quality, diversity, and representativeness in clinical and organizational contexts to identify opportunities to enhance process mining applications.

In summary, this tertiary review provides a structured and comprehensive overview of process mining applications in healthcare, identifying key trends, challenges, and gaps in the field. While acknowledging the inherent limitations of tertiary reviews, we emphasize the importance of integrating healthcare professionals, refining methodologies, and incorporating AI-driven techniques to enhance process mining applications. By addressing these aspects, future research can contribute to more effective, transparent, and data-driven decision-making in healthcare.

Paper contributions

This paper significantly contributes to process mining in healthcare by consolidating and analyzing the state of the art of systematic reviews in the area. It provides a comprehensive overview of practices, challenges, and advancements in process mining, highlighting the importance of specific techniques for clinical and organizational contexts. By exploring the temporal synchrony of studies, the heterogeneity among approaches, and the reliance on aggregated data, the paper identifies limitations and suggests strategies to mitigate them.

Moreover, the paper emphasizes the need for greater involvement of healthcare professionals in validating and interpreting process mining results, promoting more effective use of these technologies to improve care quality. The practical recommendations and mapping of future directions provide a solid foundation for researchers and healthcare professionals seeking to apply process mining more effectively and innovatively, contributing to the continuous evolution of healthcare practices.

Notably, as is inherent to tertiary reviews, this work is limited to synthesizing previously published information, which may restrict the analysis to already studied issues and leave gaps regarding emerging or insufficiently explored topics. Nevertheless, these limitations do not diminish its relevance but highlight the importance of primary and secondary investigations to complement and expand the presented findings.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1 (36.1KB, docx)

Acknowledgements

Not applicable.

Authors' contributions

The authors of this article (AS, GCLL, and RB) contributed to conducting this tertiary review, considering aspects related to the conception, planning, acquisition, analysis, and interpretation of the results. Additionally, all authors participated in the drafting and critical review of the manuscript, approved the final submitted version, and agreed to take responsibility for all aspects of the work.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Data availability

No datasets were generated or analysed during the current study.

Declarations

Ethics approval and consent to participate

Not applicable. This study did not involve human participants, animals, or sensitive data requiring ethical approval.

Consent for publication

Not applicable. No individual personal data is included in this manuscript.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Gislaine Camila Lapasini Leal and Renato Balancieri contributed equally to this work.

References

  • 1. Constitution of the World Health Organization. https://www.who.int/about/governance/constitution. Accessed 6 Jul 2024.
  • 2. OECD. Estudos da OCDE Sobre Os Sistemas de Saúde: Brasil 2021. 2021. p. 221. 10.1787/f2b7ee85-pt.
  • 3. Van Der Aalst W. Process mining: overview and opportunities. ACM Trans Manage Inf Syst (TMIS). 2012;3(2):1–17.
  • 4. Santos Garcia C, Meincheim A, Junior ERF, Dallagassa MR, Sato DMV, Carvalho DR, Santos EAP, Scalabrin EE. Process mining techniques and applications–a systematic mapping study. Expert Syst Appl. 2019;133:260–95.
  • 5. Farid NF, De Kamps M, Johnson OA. Process mining in frail elderly care: a literature review. In: Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies-Volume 5: HEALTHINF. SciTePress, Science and Technology Publications; 2019. vol. 5, pp. 332–39.
  • 6. Kusuma G, Kurniati A, McInerney CD, Hall M, Gale CP, Johnson O. Process mining of disease trajectories in MIMIC-III: a case study. In: International Conference on Process Mining. Springer; 2020. pp. 305–16.
  • 7. Grüger J, Bergmann R, Kazik Y, Kuhn M. Process mining for case acquisition in oncology: a systematic literature review. LWDA. 2020;2738:162–73.
  • 8. Kusuma GP, Hall M, Gale CP, Johnson OA. Process mining in cardiology: a literature review. Int J Biosci Biochem Bioinf. 2018;8(4):226–36.
  • 9. Kurniati AP, Johnson O, Hogg D, Hall G. Process mining in oncology: a literature review. In: 2016 6th International Conference on Information Communication and Management (ICICM). IEEE; 2016. pp. 291–97.
  • 10. Erdoğan T, Tarhan A. Process mining for healthcare process analytics. In: 2016 Joint Conference of the International Workshop on Software Measurement and the International Conference on Software Process and Product Measurement (IWSM-MENSURA). IEEE; 2016. p. 125–30.
  • 11. Rojas E, Munoz-Gama J, Sepúlveda M, Capurro D. Process mining in healthcare: a literature review. J Biomed Inf. 2016;61:224–36.
  • 12. Batista E, Solanas A. Process mining in healthcare: a systematic review. In: 2018 9th International Conference on Information, Intelligence, Systems and Applications (IISA). IEEE; 2018. p. 1–6.
  • 13. Ghasemi M, Amyot D. Process mining in healthcare: a systematised literature review. Int J Electron Healthcare. 2016;9(1):60–88.
  • 14. Tozzi AE, Fabozzi F, Eckley M, Croci I, Dell’Anna VA, Colantonio E, Mastronuzzi A. Gaps and opportunities of artificial intelligence applications for pediatric oncology in European research: a systematic review of reviews and a bibliometric analysis. Front Oncol. 2022;12:905770.
  • 15. Van Der Aalst W. Process Mining: Data Science in Action. Berlin, Heidelberg: Springer; 2016.
  • 16. Mans RS, Aalst WM, Vanwersch RJ. Process Mining in Healthcare: Evaluating and Exploiting Operational Healthcare Processes. Heidelberg: Springer; 2015.
  • 17. Günther CW, Van Der Aalst WM. Fuzzy mining–adaptive process simplification based on multi-perspective metrics. In: International Conference on Business Process Management. Springer; 2007. p. 328–43.
  • 18. Günther CW, Rozinat A. Disco: discover your processes. In: Demonstration Track of the 10th International Conference on Business Process Management, BPM Demos 2012. 2012. p. 40–44. CEUR-WS.org
  • 19. Leemans SJ. Robust Process Mining with Guarantees. In: BPM (Dissertation/Demos/Industry). Springer; 2018. p. 46–50.
  • 20. Homayounfar P. Process mining challenges in hospital information systems. In: 2012 Federated Conference on Computer Science and Information Systems (FedCSIS). IEEE; 2012. p. 1135–40.
  • 21. Moher D, Shamseer L, Clarke M, Ghersi D, Liberati A, Petticrew M, Shekelle P, Stewart LA, PRISMA-P Group. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev. 2015;4:1–9.
  • 22. Iachecen F, Dallagassa MR, Portela Santos EA, Carvalho DR, Ioshii SO. Is it possible to automate the discovery of process maps for the time-driven activity-based costing method? A systematic review. BMC Health Serv Res. 2023;23(1):1408.
  • 23. Oliart E, Rojas E, Capurro D. Are we ready for conformance checking in healthcare? Measuring adherence to clinical guidelines: a scoping systematic literature review. J Biomed Inf. 2022;130:104076.
  • 24. Rojas E, Arias M, Sepúlveda M. Clinical processes and its data, what can we do with them. In: Proceedings of the International Conference on Health Informatics (HEALTHINF 2015). Lisbon, Portugal; 2015, p. 12–15.
  • 25. Yang W, Su Q. Process mining for clinical pathway: literature review and future directions. In: 2014 11th International Conference on Service Systems and Service Management (ICSSSM). IEEE; 2014. p. 1–5.
  • 26. Guzzo A, Rullo A, Vocaturo E. Process mining applications in the healthcare domain: a comprehensive review. Wiley Interdiscip Rev Data Min Knowl Discovery. 2022;12(2):1442.
  • 27. Partington A, Wynn M, Suriadi S, Ouyang C, Karnon J. Process mining for clinical processes: a comparative analysis of four Australian hospitals. ACM Trans Manage Inf Syst (TMIS). 2015;5(4):1–18.
  • 28. Erdogan TG, Tarhan A. Systematic mapping of process mining studies in healthcare. IEEE Access. 2018;6:24543–67.
  • 29. Williams R, Rojas E, Peek N, Johnson OA. Process mining in primary care: a literature review. Build Continents Knowl Oceans Data Future Co-Created eHealth. 2018;247:376–80.
  • 30. Helm E, Lin AM, Baumgartner D, Lin AC, Küng J. Towards the use of standardized terms in clinical case studies for process mining in healthcare. Int J Environ Res Public Health. 2020;17(4):1348.
  • 31. Dallagassa MR, Santos Garcia C, Scalabrin EE, Ioshii SO, Carvalho DR. Opportunities and challenges for applying process mining in healthcare: a systematic mapping study. J Ambient Intell Hum Comput. 2022;13:1–18.
  • 32. De Roock E, Martin N. Process mining in healthcare–an updated perspective on the state of the art. J Biomed Inf. 2022;127:103995.
  • 33. Gunatilleke NJ, Fleuriot J, Anand A. A literature review on the analysis of symptom-based clinical pathways: time for a different approach? PLOS Digital Health. 2022;1(5):0000042.
  • 34. Chen K, Abtahi F, Carrero J-J, Fernandez-Llatas C, Seoane F. Process mining and data mining applications in the domain of chronic diseases: a systematic review. Artif Intell Med. 2023;144:102645.
  • 35. Fernández-Llatas C, Meneu T, Benedi JM, Traver V. Activity-based process mining for clinical pathways computer aided design. In: 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology. IEEE; 2010. p. 6178–81.
  • 36. Blei D, Carin L, Dunson D. Probabilistic topic models. IEEE Signal Process Mag. 2010;27(6):55–65.
  • 37. Blei DM, Ng AY, Jordan MI. Latent dirichlet allocation. J Mach Learn Res. 2003;3(Jan):993–1022.
  • 38. ProM Tools–ProM has moved to GitHub, ask Eric Verbeek for details. https://promtools.org/. Accessed 7 Jul 2024.
  • 39. Berti A, Van Zelst SJ, Aalst W. Process mining for Python (PM4Py): bridging the gap between process-and data science. 2019. arXiv preprint arXiv:1905.06169.
  • 40. Rojas E, Fernández-Llatas C, Traver V, Munoz-Gama J, Sepúlveda M, Herskovic V, Capurro D. PALIA-ER: bringing question-driven process mining closer to the emergency room. In: BPM (Demos). 2017.
  • 41. Fernandez-Llatas C, Lizondo A, Monton E, Benedi J-M, Traver V. Process mining methodology for health process tracking using real-time indoor location systems. Sensors. 2015;15(12):29821–40.
  • 42. Van Eck ML, Lu X, Leemans SJ, Van Der Aalst WM. PM²: a process mining project methodology. In: International Conference on Advanced Information Systems Engineering. Springer; 2015. pp. 297–313.
  • 43. Bozkaya M, Gabriels J, Werf JM. Process diagnostics: a method based on process mining. In: 2009 International Conference on Information, Process, and Knowledge Management. IEEE; 2009. p. 22–27.
  • 44. Rebuge Á, Ferreira DR. Business process analysis in healthcare environments: a methodology based on process mining. Inform Syst. 2012;37(2):99–116.
  • 45. Lenz R, Reichert M. IT support for healthcare processes–premises, challenges, perspectives. Data Knowl Eng. 2007;61(1):39–58.
  • 46. Kotti Z, Galanopoulou R, Spinellis D. Machine learning for software engineering: a tertiary study. ACM Comput Surv. 2023;55(12):1–39.
  • 47. Kitchenham B, Brereton OP, Budgen D, Turner M, Bailey J, Linkman S. Systematic literature reviews in software engineering–a systematic literature review. Inf Software Technol. 2009;51(1):7–15.
