Author manuscript; available in PMC: 2025 Nov 1.
Published in final edited form as: Int J Med Inform. 2024 Aug 5;191:105588. doi: 10.1016/j.ijmedinf.2024.105588

Development and validation of a Multi-Causal investigation and discovery framework for knowledge harmonization (MINDMerge): A case study with acute kidney injury risk factor discovery using electronic medical records

Mingyang Zhang a,b,1, Xiangzhou Zhang a,c,1, Mingyang Dai a,d, Lijuan Wu e,f, Kang Liu a,b, Hongnian Wang a,b, Weiqi Chen g,*, Mei Liu h,*, Yong Hu a,c,*
PMCID: PMC12512021  NIHMSID: NIHMS2115673  PMID: 39128399

Abstract

Objective:

Accurate diagnoses and personalized treatments in medicine rely on identifying causality. However, existing causal discovery algorithms often yield inconsistent results due to distinct learning mechanisms. To address this challenge, we introduce MINDMerge, a multi-causal investigation and discovery framework designed to synthesize causal graphs from various algorithms.

Methods:

MINDMerge integrates five causal models to reconcile inconsistencies arising from different algorithms. Employing credibility weighting and a novel cycle-breaking mechanism in causal networks, we initially developed and tested MINDMerge using three synthetic networks. Subsequently, we validated its effectiveness in discovering risk factors and predicting acute kidney injury (AKI) using two electronic medical records (EMR) datasets, the eICU Collaborative Research Database and the MIMIC-III Database. Causal reasoning was employed to analyze the relationships between risk factors and AKI. The identified causal risk factors of AKI were used to build a prediction model, which was evaluated using the area under the receiver operating characteristic curve (AUC) and recall.

Results:

Synthetic data experiments demonstrated that MINDMerge significantly outperformed other causal models in capturing the ground-truth network structure. Application of MINDMerge to real-world data revealed direct connections of pulmonary disease, hypertension, diabetes, x-ray assessment, and BUN with AKI. With the identified variables, AKI risk can be inferred at the individual level based on established BNs and prior information. Compared against existing benchmark models, MINDMerge maintained a higher AUC for AKI prediction in both internal (AUC: 0.832) and external network validations (AUC: 0.861).

Conclusion:

MINDMerge can identify causal risk factors of AKI, serving as a valuable diagnostic tool for clinical decision-making and facilitating effective intervention.

Keywords: Causal discovery, Causal network synthesis, Network cycle-breaking, Acute kidney injury, Risk factor identification

1. Introduction

Epidemiologic studies aim to scrutinize disease patterns within specific groups to infer potential causes and consequences. However, establishing definitive causality from an association between exposure risk factors and a disease is challenging as correlations do not imply causation. Solely relying on correlations identified through statistical or machine learning (ML) approaches is inadequate for informed decision making in healthcare. Causal discovery holds promise in uncovering crucial insights for medical diagnosis [1], and its growing relevance within healthcare presents a transformative pathway for advancing precision medicine and personalized diagnosis. Bayesian networks (BNs), providing a formal probabilistic framework to model and infer causal relationships among variables [2], offer a robust approach for classification. BNs have widespread utility across scientific domains, including biomedicine [3,4], and their application in healthcare could revolutionize medical decision-making by fostering a deeper understanding of causal relationships.

Causal discovery using electronic medical records (EMRs) is challenging due to the inherent complexity and limitations of observational data. Over the years, various causal discovery methods have shown promise and can be classified into three types: constraint-based [5,6], score-based [7], and hybrid algorithms [8,9]. Constraint-based algorithms assess network structures through conditional independence tests (e.g., chi-squared tests or mutual information tests). However, their computational complexity increases exponentially with the number of nodes, leading to reduced efficiency and reliability. Score-based algorithms employ scoring functions (e.g., BIC, BDeu, K2) and search methods (e.g., hill-climbing (HC) or tabu search algorithms) to identify underlying network structures. Yet, obtaining an optimal network structure within a large structural space remains challenging. Hybrid approaches, such as H2PC [9] and MMHC [10], combine the strengths of constraint-based and score-based algorithms. However, the selection of an appropriate causal algorithm remains an ongoing challenge. Comparative assessments of the accuracy and efficiency of different BNs are presented by Scutari et al. [11] and Hussung et al. [12]. The general consensus of these studies is that no single causal method exhibits significantly superior performance in structure reconstruction. Moreover, individual causal algorithms often suffer from local optimality and limited generalization. Applying different causal algorithms to the same dataset frequently yields different results, contributing to inconsistent knowledge.

To improve the accuracy and efficiency of causal structure search, several scholars have adopted intelligent optimization and ensemble methods. Evolutionary algorithms (e.g., ant colony [13,14], particle swarm [15–17], genetic algorithm [18,19]) and combinatorial optimization algorithms (e.g., Information Flow (IF) Theory [20,21], Breeding Swarm [16], Glowworm swarm optimization algorithm [22], Greedy Equivalence Search [23]) have been proposed for learning BNs. However, limitations are also evident. Firstly, most of these methods rely on a single causal discovery method, potentially compromising the accuracy of discovered knowledge due to algorithmic bias. Secondly, optimization-based methods often suffer from reduced applicability owing to algorithm complexity and extensive search time. Some ensemble learning approaches for causal discovery (e.g., MIC (maximum information coefficient) [24], perturbed features [25], causal IF theory [26], causal strength scoring matrix [27], and weighted adjacency matrix of local data slices [28]) may overlook algorithm bias and the impact of data perturbation on causal algorithms. Most importantly, the potential existence of cyclic structures is often disregarded when synthesizing network knowledge acquired from different sources. This oversight may result in erroneous inferences.

In summary, data-driven causal discovery faces two critical challenges: 1) the diversity of causal discovery algorithms, and 2) the presence of cyclic structures in synthesized causal networks. To address these challenges, we propose MINDMerge, a multi-causal investigation and discovery framework aimed at harmonizing networks derived from multiple causal discovery algorithms to enhance accuracy, robustness, and generalization. Our evaluation of this framework spans synthetic and real-world datasets, investigating causal relationships among clinical variables and acute kidney injury (AKI). AKI is a prevalent complication among critically ill patients [29,30], associated with significantly high mortality and morbidity rates [31]. Globally, an estimated 13.3 million individuals suffer from AKI annually, with 85 % in developing countries [32], and over 70 % of cases undiagnosed within 3 days [33]. Additionally, AKI is prevalent among patients hospitalized for COVID-19 [34,35]. However, the etiology of AKI is complex, posing challenges in prediction and risk factor discovery. Some studies employ machine learning algorithms [36–38] to analyze electronic health record (EHR) data and uncover patterns related to AKI occurrence. While the methodologies employed in these studies provide valuable insights into AKI risk prediction and factor discovery, they may be limited by their reliance on correlation-based analyses and may not establish causal relationships. A robust causal network model, such as MINDMerge, can elucidate potential AKI risk factors, supporting clinical decisions and enabling effective interventions.

2. Material and methods

The proposed MINDMerge framework was initially tuned and evaluated using synthetic data before its application to EMR data for causal relationship discovery and AKI risk prediction. In this study, BN learning from observational data operated on the following assumptions:

  • Strong causality between variables without symmetry.

  • Data completeness – If the dataset is complete and the observations correctly reflect the true state of the variables, the causal network learned from the data will accurately reflect the true causal relationships.

  • Causal sufficiency – It was assumed that all relevant variables were observed without any hidden variables.

These assumptions aimed to simplify the reasoning and interpretation of the BN model. Deviations from these assumptions could potentially complicate the establishment of causality.

2.1. Synthetic data

Synthetic data plays an important role in evaluating the efficacy of methods. We developed and evaluated the proposed MINDMerge framework using three widely used synthetic datasets – Cancer [39], Child [40], Alarm [41] – representing varying sizes and distinct structures for BN structure learning. Experiments were conducted on each dataset under different settings with sample sizes ranging from 500 to 5000. The Cancer dataset comprises 5 nodes, 4 arcs, and 10 parameters, representing a small network used in clinical diagnosis. The Child dataset, designed for referrals of newborn babies with congenital heart disease in London, comprises 20 nodes, 25 arcs, and 230 parameters, making it a medium-sized network. On the other hand, the Alarm dataset, based on an alarm message system for emergency patient monitoring, includes 37 nodes, 46 arcs, and 509 parameters, representing a large network. Synthetic data in the form of RDA files was downloaded from the Bayesian network Repository and the ‘rbn’ function was used to generate samples of desired sizes. Source data is available at https://www.bnlearn.com/bnrepository.

2.2. Real-world data

The proposed MINDMerge framework was also validated using two de-identified EMR datasets, both of which comply with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule, a US federal law designed to protect patient health information. Access permissions were obtained for research purposes after the researchers completed the MIT CITI training modules, which are specifically designed for the secondary use of de-identified data.

  • The eICU Collaborative Research Database (eICU) is a large multi-center critical care database made available by Philips Healthcare in partnership with the MIT Laboratory for Computational Physiology [42]. It contains over 200,000 distinct ICU admissions between 2014 and 2015 across 208 US hospitals. We collected 92 variables from the eICU dataset, encompassing demographic information, vital signs, laboratory tests, comorbidities and procedures, based on clinician recommendations.

  • The Medical Information Mart for Intensive Care III (MIMIC-III) comprises de-identified health-related data associated with over 40,000 patients admitted to critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. For our analysis, we collected 66 variables from the MIMIC-III dataset, guided by clinician recommendations. The dataset served as an independent external validation set.

2.2.1. Description of study cohort

The study included adult patients (age ≥ 18) admitted to the hospital for a minimum of 2 days. AKI was defined according to the KDIGO (Kidney Disease Improving Global Outcomes) guidelines as meeting any of the following criteria:

  • An increase in serum creatinine (SCr) levels by ≥ 0.3 mg/dL (26.5 µmol/L) within 48 h OR

  • An increase of > 1.5 times the baseline SCr level within seven days OR

  • Urine output below 0.5 mL/kg/h for more than 6 h.
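The three criteria above can be read as a simple logical OR. The sketch below is illustrative only: the function name and its three pre-computed inputs are assumptions introduced here, not the authors' extraction code, and the thresholds are copied from the list above.

```python
def meets_aki_criteria(scr_rise_48h_mg_dl, scr_ratio_7d, hours_urine_below_0_5):
    """Return True if any of the three criteria listed above is met.

    All inputs are hypothetical, pre-computed summaries for one patient:
      scr_rise_48h_mg_dl    -- largest SCr increase within 48 h (mg/dL)
      scr_ratio_7d          -- peak SCr within seven days divided by baseline SCr
      hours_urine_below_0_5 -- consecutive hours with urine output < 0.5 mL/kg/h
    """
    return (
        scr_rise_48h_mg_dl >= 0.3        # absolute SCr rise within 48 h
        or scr_ratio_7d > 1.5            # relative SCr rise within seven days
        or hours_urine_below_0_5 > 6     # sustained low urine output
    )
```

For example, `meets_aki_criteria(0.1, 1.6, 0)` is true because the relative SCr criterion alone is satisfied.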

2.2.2. Data processing

For each patient, we collected information on clinical variables up to 48 h before the prediction point (Supplementary Fig. S1A). For patients with AKI, the prediction point was 48 h before onset. For non-AKI patients, we set the prediction point to 48 h before the last SCr measurement. For clinical variables with longitudinal measures, we used only their most recent value. SCr and eGFR were not included as variables because they were used to determine AKI occurrence. The data extraction flowchart can be found in Supplementary Fig. S1B.

We processed datasets from four hospitals in the eICU database, forming a final cohort of 9,688 patients: eICU-1 (2,477 patients; AKI incidence of 27.6 %), eICU-2 (4,280 patients; AKI incidence of 29.06 %), eICU-3 (1,630 patients; AKI incidence of 33.37 %), and eICU-4 (1,301 patients; AKI incidence of 32.2 %). Additionally, a cohort from the MIMIC-III database was extracted and processed, comprising 15,298 patients with an AKI incidence rate of 30.76 %. For experiments, eICU-1 served as the derivation cohort, while eICU-2 served as the external network validation cohort. The remaining cohorts – eICU-3, eICU-4, and MIMIC-III – were utilized for external geographical validation to assess generalizability.

We adopted Li et al.’s approach [43] for handling missing data. Among vital signs and laboratory tests, variables with a missing rate exceeding 75 % were excluded, while the remaining variables were imputed using the predictive mean matching (pmm) method with Multiple Imputation by Chained Equations (MICE). Although other methods such as KNN and Random Forest were considered, the pmm method showed the least discrepancy between the distributions of the original and the imputed data according to the Kolmogorov-Smirnov test (Supplementary Table S1 and Fig. S2).
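The exclusion rule in this step is easy to state in code. The following is a minimal sketch under assumptions made here: the data are held as a dictionary of column lists, missing values are encoded as `None`, and the function name is hypothetical. The imputation step itself (pmm with MICE) is not shown.

```python
def drop_sparse_variables(table, max_missing_rate=0.75):
    """Keep only variables whose missing rate does not exceed max_missing_rate.

    `table` maps a variable name to its list of values; None marks a
    missing value (an encoding assumed for this sketch).
    """
    kept = {}
    for name, values in table.items():
        missing_rate = sum(v is None for v in values) / len(values)
        if missing_rate <= max_missing_rate:   # excluded only when rate > 75 %
            kept[name] = values
    return kept
```

A variable missing in exactly 75 % of patients is retained under this reading, since the text excludes only rates exceeding 75 %.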

Continuous variables were discretized, with discretization thresholds chosen based on existing knowledge (MSD MANUAL Professional Version). The continuous features were divided into different categories as shown in Supplementary Table S2. We then took the intersection of three feature selection methods (the Chi-square (Chi2) method with a significance level of p < 0.05, Boruta, and Random Forest) to exclude irrelevant variables, resulting in 26 variables (Supplementary Fig. S3 and Table S3). A causal network structure was then generated based on these features. Baseline descriptive characteristics of other cohorts are shown in Supplementary Tables S4–S7.
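Threshold-based discretization of this kind can be sketched with the standard library. The cutpoints in the usage note are purely illustrative placeholders; the thresholds actually used are those in Supplementary Table S2, which is not reproduced here.

```python
from bisect import bisect_right

def discretize(value, cutpoints):
    """Map a continuous value to a category index.

    `cutpoints` must be sorted in ascending order. Index 0 means the value
    lies below the first cutpoint; index len(cutpoints) means it is at or
    above the last one.
    """
    return bisect_right(cutpoints, value)
```

With hypothetical cutpoints `[18.5, 25.0, 30.0]`, a value of 17.0 falls in category 0 and a value of 27.0 in category 2.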

2.3. MINDMerge framework

2.3.1. Causal Bayesian network

Constraint-based methods were excluded because they can leave undirected edges in the learned structure. We employed five causal discovery algorithms: three score-based algorithms (BIC [44,45], BDeu [46,47] and BDla [48,49]) with tabu search, and two hybrid algorithms (MMHC [8,10] and H2PC [9,50]). Detailed descriptions of each method can be found in Supplementary method 1, and their key properties are summarized in Table 1. These algorithms excel in identifying specific data patterns, ensuring efficiency and ease of implementation while achieving a balanced trade-off between model fitting and complexity. However, each one on its own ignores the diverse knowledge inherent in the data. By combining different causal algorithms, we can mitigate reliance on a single method and capture different information from the data, resulting in more robust knowledge.

Table 1.

Description of the key properties of baseline model.

Structure learning | Algorithm | Description
Score-based | BIC | Overfitting can be avoided by selecting models based on their log-likelihood and a penalty term for model complexity.
Score-based | BDeu | The prior of the parameters is assumed to follow a Dirichlet distribution, which is used as the basis for finding the optimal structure; the node order does not need to be obtained in advance. The key properties of BDeu arise from its uniform prior over the parameters of each local distribution in the network, which makes structure learning computationally efficient; it does not require the elicitation of prior knowledge from experts; and it satisfies score equivalence.
Score-based | BDla | BDla can avoid the need to set free parameters and produce better results when the parameter space of the model is complex (a mix of uniform and skewed parameter distributions).
Hybrid-based | MMHC | MMHC is capable of dealing with thousands of nodes in reasonable time. First, the MMPC (max–min parents and children) algorithm is used to determine the parent and child node sets of each node, so as to construct the network structure framework. The resulting framework is then searched and scored according to the K2 search strategy to obtain the optimal network structure.
Hybrid-based | H2PC | It first reconstructs the skeleton of a Bayesian network by integrating multiple PC algorithms to identify the parents and children set of each variable, and then performs a greedy hill-climbing search to filter and orient the edges.

Algorithm ensembles can be approached in two ways. Homogeneous ensembles utilize similar models, but may not capture the diverse and complex relationships in data, thus limiting generalizability. Conversely, heterogeneous ensembles leverage multiple algorithms to capture patterns from diverse perspectives to enhance robustness. However, interpreting results from such a diverse set of models poses challenges due to their complexity. In this study, we sought to leverage the strengths of both homogeneous and heterogeneous ensembles within a hybrid framework for harmonizing causal networks. This approach allows synthesis of the best aspects of each method to attain a more comprehensive understanding of causal relationships.

2.3.2. Network fusion strategy

We designed a credibility-weighted fusion strategy, CW, to evaluate the final credibility score of each edge. A higher score denotes higher credibility, indicating a greater likelihood of the edge’s existence. Subsequently, edges with weak credibility were filtered out based on the threshold θ to derive a transition matrix. The credibility weights are defined as follows:

$$CW = \sum_{i=1}^{n} \frac{N_{node}}{N_i} M_i \circ \frac{\sum_{i=1}^{n} M_i}{n}$$

where $n$ is the number of causal algorithms involved in the ensemble, $N_{node}$ is the number of nodes in the dataset, $N_i$ is the number of arcs learned by causal algorithm $i$ (the larger the value of $N_{node}/N_i$, the larger the weight and the more significant the causality of the arcs), $M_i$ is the 0–1 adjacency matrix of the BN structure learned by algorithm $i$, and $\circ$ is the Hadamard product.
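One way to read this definition is as the element-wise product of two matrices: the sum of the adjacency matrices weighted by $N_{node}/N_i$, and the fraction of algorithms that report each edge. The sketch below follows that reading. It is an assumption about how the terms combine, not the authors' implementation, and the function name is hypothetical.

```python
def credibility_weights(adjacency_matrices):
    """Credibility score per edge from a list of 0/1 adjacency matrices.

    Assumed reading: CW = (sum_i (N_node / N_i) * M_i) ∘ (sum_i M_i / n),
    where N_node is the number of nodes and N_i the arc count of graph i.
    """
    n = len(adjacency_matrices)
    d = len(adjacency_matrices[0])            # N_node
    weighted_sum = [[0.0] * d for _ in range(d)]
    counts = [[0] * d for _ in range(d)]
    for M in adjacency_matrices:
        n_arcs = sum(map(sum, M))             # N_i
        if n_arcs == 0:
            continue                          # an empty graph contributes nothing
        weight = d / n_arcs                   # sparser graphs get larger weights
        for r in range(d):
            for c in range(d):
                weighted_sum[r][c] += weight * M[r][c]
                counts[r][c] += M[r][c]
    # Hadamard product of the weighted sum and the edge frequency
    return [[weighted_sum[r][c] * counts[r][c] / n for c in range(d)]
            for r in range(d)]
```

Edges found by many algorithms, and by algorithms that return sparse graphs, receive the highest scores. The subsequent filtering by θ is omitted because the scaling of the scores before thresholding is not specified in this section.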

When merging causal networks, a circular structure can emerge, causing causal conflicts. Since BNs are DAGs, it is crucial to derive an acyclic weighted adjacency matrix W that approximates the cyclic matrix W0. However, manual deletion of edges in large-scale datasets is infeasible. To solve this problem, we referred to the NOTEARS algorithm introduced by Zheng et al. [51]. NOTEARS works by converting the standard combinatorial optimization problem into a continuous regularization problem on matrices, and then solving it using a numerical optimization algorithm. Thus

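$$\min_{W \in \mathbb{R}^{d \times d}} F(W) \ \ \text{subject to} \ \ G(W) \in \mathrm{DAGs} \quad \Longleftrightarrow \quad \min_{W \in \mathbb{R}^{d \times d}} F(W) \ \ \text{subject to} \ \ h(W) = 0 \qquad (1)$$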

where $F(W)$ is a score function on $\mathbb{R}^{d \times d}$ and $G(W)$ is a directed acyclic graph generated by the weighted adjacency matrix $W$. Zheng et al. confirmed the existence of such a smooth function $h$: a matrix $W \in \mathbb{R}^{d \times d}$ is a DAG if and only if $h(W) = \mathrm{tr}(e^{W \circ W}) - d = 0$, where $d$ is the dimension of matrix $W$, $\mathrm{tr}(\cdot)$ is the trace of a matrix, $\circ$ is the Hadamard product, and $e^{W}$ is the matrix exponential of $W$.
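As a small worked example of this characterization (added here for illustration, not taken from the original text), consider a two-node graph with weight $a$ on the edge $1 \to 2$ and weight $b$ on the edge $2 \to 1$:

$$W = \begin{pmatrix} 0 & a \\ b & 0 \end{pmatrix}, \qquad W \circ W = \begin{pmatrix} 0 & a^2 \\ b^2 & 0 \end{pmatrix}.$$

The eigenvalues of $W \circ W$ are $\pm |ab|$, so

$$h(W) = \mathrm{tr}\left(e^{W \circ W}\right) - 2 = e^{|ab|} + e^{-|ab|} - 2 = 2\cosh(ab) - 2.$$

This is zero exactly when $ab = 0$, that is, when at least one of the two edges is absent and the graph is acyclic. Any two-node cycle ($a \neq 0$ and $b \neq 0$) gives $h(W) > 0$.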

Given that the loss function of NOTEARS is designed for linear structural equation models (SEM), it is essential to redefine the loss function appropriately for the distinct context of our study. We therefore propose a novel cycle-breaking algorithm termed CBAMN (Cycle-Breaking Algorithm based on Modified NOTEARS). CBAMN aims to identify acyclic solutions that closely resemble the cyclic solutions, obtained through an iterative optimization process. We redefined the loss function as:

$$F(W) = \frac{\lambda_1}{2} \lVert W_0 - W \rVert_F^2 + \lambda_2 \lVert W \rVert_1 \ \ \text{subject to} \ \ h(W) = 0, \qquad (2)$$

where $\lVert \cdot \rVert_F$ is the Frobenius norm, $W_0$ is the matrix containing a cycle, and $W$ is the final ideal acyclic matrix. The primary modification involves the incorporation of a cyclicity penalty term, which penalizes cyclic structures within the learned causal graph. This penalty incentivizes the algorithm to prioritize acyclic solutions while discouraging the formation of cyclic dependencies. Additionally, we introduce a regularization term to promote sparsity in the learned graph, facilitating simpler and more interpretable causal models. The first term, $\lVert W_0 - W \rVert_F^2$, corresponds to the least squares loss of matrix differences, with the degree of approximation between the two matrices adjusted via the parameter $\lambda_1 > 0$. The second term is the penalty term: to obtain an ideal matrix, the sum of all weights must be minimized, and the sparsity of the matrix can be adjusted by the penalty parameter $\lambda_2 > 0$. Similar to NOTEARS, we utilized the augmented Lagrange algorithm [51,52] to solve equation (2) by converting the objective function into a dual problem, based on the given objective function and constraint conditions:

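$$L_\rho(W, \alpha) = \underset{W^* \in \mathbb{R}^{d \times d}}{\arg\min} \ F(W) + \frac{\rho}{2} \lvert h(W) \rvert^2 + \alpha h(W) \qquad (3)$$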

where $\alpha$ is the Lagrange multiplier and $\rho$ is the penalty parameter. We can therefore perform dual gradient ascent to update $\alpha = \alpha + \rho h(W^*)$, where $W^*$ is the local minimizer matrix of the current iteration. The optimization process of CBAMN is shown in Algorithm 1.
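To make the quantities in equations (2) and (3) concrete, the sketch below evaluates them for small matrices in pure Python. It assumes the reading $F(W) = \frac{\lambda_1}{2}\lVert W_0 - W\rVert_F^2 + \lambda_2\lVert W\rVert_1$ and approximates the matrix exponential in $h(W)$ with a truncated power series. All function names are hypothetical, and no optimizer is included, so this is not the CBAMN implementation.

```python
import math

def matmul(A, B):
    d = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

def h(W, terms=30):
    """Acyclicity measure tr(exp(W∘W)) - d, via sum_{k>=1} tr((W∘W)^k) / k!."""
    d = len(W)
    A = [[w * w for w in row] for row in W]      # Hadamard square W∘W
    power, factorial, total = A, 1.0, 0.0
    for k in range(1, terms + 1):
        factorial *= k
        total += sum(power[i][i] for i in range(d)) / factorial
        power = matmul(power, A)
    return total

def loss(W, W0, lam1=1.0, lam2=0.5):
    """F(W): closeness to the cyclic matrix W0 plus an L1 sparsity penalty."""
    d = len(W)
    frob_sq = sum((W0[i][j] - W[i][j]) ** 2 for i in range(d) for j in range(d))
    l1 = sum(abs(W[i][j]) for i in range(d) for j in range(d))
    return 0.5 * lam1 * frob_sq + lam2 * l1

def augmented_lagrangian(W, W0, rho, alpha, lam1=1.0, lam2=0.5):
    """Value of F(W) + (rho/2) * h(W)^2 + alpha * h(W) at a given W."""
    hw = h(W)
    return loss(W, W0, lam1, lam2) + 0.5 * rho * hw ** 2 + alpha * hw

def dual_update(alpha, rho, W_star):
    """Dual ascent step: alpha + rho * h(W*)."""
    return alpha + rho * h(W_star)
```

For a two-node cycle `W0 = [[0, 0.9], [0.4, 0]]`, dropping the weaker edge gives an acyclic candidate with `h` equal to zero, so both penalty terms vanish and the augmented Lagrangian reduces to the loss itself.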

Algorithm 1:

CBAMN Algorithm

Input: degree of decay c ∈ (0, 1), growth rate of penalty r > 1, constraint term coefficient α_0, optimize error rate ε > 0, dual function L_ρ(W, α)
Output: A directed acyclic graph W*
for t = 0, 1, 2, … do:
    W_{t+1} = argmin_W L_ρ(W, α_t)
    if h(W_{t+1}) ≥ c × h(W_t) then
        ρ = rρ
        continue
    end if
    α_{t+1} = α_t + ρ h(W_{t+1})
    if h(W_{t+1}) < ε then
        break
    end if
end for
Output W*

2.3.3. Experimental framework

To evaluate the proposed MINDMerge framework, we employed eleven baseline algorithms. These included five single causal discovery algorithms (BDeu, BDla, BIC, H2PC, MMHC), their corresponding homogeneous ensemble versions (BDeu-en, BDla-en, BIC-en, H2PC-en, MMHC-en), and a heterogeneous ensemble algorithm combining all five causal algorithms (Het-en). Additionally, we compared MINDMerge against a voting-based network fusion strategy (Voting-N), which picks edges identified by the majority of the algorithms. Fig. 1 provides an overview of the MINDMerge framework. Training data was extracted via bootstrapping from the original dataset. Phase 1 involved causal discovery on these bootstrapped samples to generate Phase 1 ensemble results. Phase 2 ensemble processing consolidated the Phase 1 results into a final output. Finally, given that the final output may be a cyclic graph, it is converted into an acyclic graph using CBAMN (Algorithm 1). Supplementary Fig. S4 provides an example illustrating the algorithm.
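The voting-based comparison strategy can be sketched in a few lines. This is a minimal illustration that assumes "majority" means strictly more than half of the input graphs and that each graph is given as a set of directed `(parent, child)` pairs. The function name is hypothetical.

```python
from collections import Counter

def majority_vote_edges(edge_sets):
    """Keep directed edges that appear in more than half of the input graphs."""
    counts = Counter(edge for edges in edge_sets for edge in set(edges))
    return {edge for edge, k in counts.items() if k > len(edge_sets) / 2}
```

Unlike credibility weighting, this rule treats every algorithm equally regardless of how many arcs it returns.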

Fig. 1.


Overview of the MINDMerge framework. MINDMerge (Multi-Causal Investigation and Discovery Framework for Knowledge Harmonization), CBAMN (novel cycle-breaking algorithm based on modified NOTEARS).

The overall dataset was split into training, internal validation (20 % of patients), and external validation (including network validation and geographic validation) sets. Bootstrapping (N=50) was performed on the training data. Following this, each causal discovery method was applied across all bootstrap samples. We investigated credibility-weighted ensemble methods, generating weighted matrices EBN1-EBN50 derived from five different causal discovery methods on each bootstrap sample.
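The resampling step can be sketched as follows. This is a minimal illustration assuming that each bootstrap sample has the same size as the training data and is drawn with replacement. The function name and the fixed seed are choices made for this example only.

```python
import random

def bootstrap_samples(rows, n_boot=50, seed=0):
    """Draw n_boot resamples of the rows, each with replacement and of equal size."""
    rng = random.Random(seed)                 # fixed seed for reproducibility
    return [rng.choices(rows, k=len(rows)) for _ in range(n_boot)]
```

Each of the 50 resamples would then be passed to every causal discovery method, giving one learned structure per method and sample.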

Subsequently, multiple BN structures were integrated using credibility weighting, and edges with weak reliability were filtered out based on a predetermined threshold, resulting in a transition matrix. Finally, a cycle-breaking algorithm was applied to the transition matrix to obtain the adjacency matrix of the DAG.

For parameter optimization, we performed parameter tuning on synthetic data before deploying MINDMerge with the identified optimal parameters on real-world EMR data for AKI risk prediction. The final learned network structure was externally validated. Additionally, to assess the portability of MINDMerge, we performed external geographic validation.

2.3.4. Model evaluation

Using synthetic data, we directly compared the learned network with the ground-truth network using Recall and F1-score. TP refers to the number of edges that exist in both the ground-truth and the learned networks, FP refers to the number of edges that exist in the learned network but not in the ground-truth network, and FN refers to the number of edges that exist in the ground-truth network but not in the learned network. We repeated the experiment 10 times and reported the average performance. Achieving comparable prediction accuracy indicates the model’s ability to capture the causal relationships between variables.

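$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{F1-score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}$$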
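These metrics can be computed directly from edge sets. The sketch below assumes that edges are compared as directed `(parent, child)` pairs, so a reversed edge counts as both a false positive and a false negative. That convention is an assumption, since the treatment of reversed edges is not stated here, and the function name is hypothetical.

```python
def structure_metrics(true_edges, learned_edges):
    """Return (recall, precision, f1) for a learned edge set against the ground truth."""
    true_edges, learned_edges = set(true_edges), set(learned_edges)
    tp = len(true_edges & learned_edges)      # edges in both networks
    fp = len(learned_edges - true_edges)      # learned but not in ground truth
    fn = len(true_edges - learned_edges)      # in ground truth but not learned
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = recall + precision
    f1 = 2 * recall * precision / denom if denom else 0.0
    return recall, precision, f1
```

For a ground truth of three edges and a learned network that recovers two of them while adding two spurious edges, this gives a Recall of 2/3, a Precision of 1/2, and an F1-score of 4/7.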

In the real-world data application, we compared the prediction performance of various classification models – CART, C5.0, Random Forest (RF), XGBoost, Logistic Regression (LR), Support Vector Machine (SVM), ANN, and Naïve Bayes (NB) – using the area under the receiver operating characteristic curve (AUC) and Recall metrics. The performance evaluation involved 5-fold cross-validation, and we determined the optimal threshold using the Youden index.
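The Youden index selects the threshold that maximizes J = sensitivity + specificity − 1. The sketch below is a minimal illustration that assumes a case is predicted positive when its score is at or above the threshold, that candidate thresholds are the observed scores, and that both classes are present. The function name is hypothetical.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    `labels` are 0/1; a case is predicted positive when score >= threshold.
    Ties are resolved in favor of the lowest threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / positives + tn / negatives - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

With scores `[0.2, 0.3, 0.6, 0.7, 0.9]` and labels `[0, 0, 1, 0, 1]`, the selected threshold is 0.6, where sensitivity is 1 and specificity is 2/3.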

3. Results

3.1. Experiments on synthetic datasets

To strike a balance between achieving a network closer to the ground-truth (i.e., high Recall) and maintaining optimal sparsity, we set hyperparameters λ1 = 1, λ2 = 0.5, and the threshold θ = 0.85. Supplementary Fig. S5 and Table S8 provide an in-depth sensitivity analysis of these parameters.

Across the Cancer, Child, and Alarm datasets, we compared MINDMerge with different causal algorithms at varying sample sizes (500, 1 k, 2 k, 5 k). Table 2 presents the outcomes, highlighting that distinct causal methods yield varied learning accuracies for the same dataset. Ensemble algorithms exhibited superior performance compared to single causal algorithms, with MINDMerge consistently demonstrating better Recall than other ensemble approaches. Moreover, our observations indicate that learning efficiency is constrained at smaller sample sizes and progressively improves as the sample size increases. The significance analysis is shown in Supplementary Fig. S6–S9. Fig. 2 illustrates the cosine similarity and Recall metrics for networks at a sample size of 5000. Notably, algorithmic bias is evident among different causal methods (Fig. 2A–C). MINDMerge exhibits the highest Recall (1, 0.96, 0.811 for Cancer, Child and Alarm datasets respectively) (Fig. 2D).

Table 2.

Structure learning results of MINDMerge ensemble algorithm and other algorithms under different dataset sizes (best performances are highlighted in bold).

Cancer
Method | Recall (500) | F1 (500) | Recall (1 K) | F1 (1 K) | Recall (2 K) | F1 (2 K) | Recall (5 K) | F1 (5 K)
BDeu | 0.6(0.129) | 0.732(0.09) | 0.7(0.158) | 0.804(0.112) | 0.775(0.142) | 0.814(0.115) | 0.925(0.169) | 0.925(0.169)
BDla | 0.25(0.289) | 0.276(0.311) | 0.425(0.237) | 0.485(0.285) | 0.275(0.079) | 0.297(0.099) | 0.8(0.329) | 0.797(0.334)
BIC | 0.325(0.169) | 0.508(0.131) | 0.6(0.175) | 0.729(0.168) | 0.65(0.129) | 0.744(0.136) | 0.9(0.129) | 0.932(0.093)
H2PC | 0.35(0.129) | 0.495(0.152) | 0.625(0.177) | 0.748(0.171) | 0.65(0.129) | 0.725(0.153) | 0.925(0.121) | 0.936(0.108)
MMHC | 0.35(0.129) | 0.5(0.145) | 0.575(0.169) | 0.729(0.168) | 0.65(0.129) | 0.735(0.145) | 0.875(0.132) | 0.896(0.116)
BDeu-en | 0.75(0.204) | 0.753(0.135) | 0.8(0.197) | 0.793(0.192) | 0.8(0.197) | 0.807(0.177) | 0.95(0.158) | 0.95(0.158)
BDla-en | 0.425(0.265) | 0.425(0.279) | 0.6(0.269) | 0.577(0.245) | 0.4(0.175) | 0.395(0.172) | 0.75(0.264) | 0.721(0.267)
BIC-en | 0.6(0.175) | 0.677(0.157) | 0.75(0.167) | 0.786(0.127) | 0.825(0.121) | 0.835(0.102) | 0.975(0.079) | 0.964(0.083)
H2PC-en | 0.6(0.211) | 0.656(0.183) | 0.7(0.105) | 0.758(0.101) | 0.775(0.142) | 0.814(0.115) | 0.975(0.079) | 0.975(0.079)
MMHC-en | 0.6(0.175) | 0.662(0.15) | 0.675(0.121) | 0.731(0.1) | 0.8(0.158) | 0.828(0.129) | 0.975(0.079) | 0.975(0.079)
Het-en | 0.45(0.158) | 0.596(0.152) | 0.625(0.177) | 0.748(0.171) | 0.675(0.121) | 0.721(0.13) | 0.925(0.121) | 0.946(0.091)
Voting-N | 0.3(0.197) | 0.533(0.143) | 0.5(0.236) | 0.701(0.146) | 0.675(0.121) | 0.78(0.108) | 0.875(0.132) | 0.929(0.075)
MINDMerge | 0.825(0.169) | 0.66(0.149) | 0.825(0.121) | 0.727(0.077) | 0.85(0.129) | 0.781(0.063) | 1(0) | 0.954(0.071)

Child
Method | Recall (500) | F1 (500) | Recall (1 K) | F1 (1 K) | Recall (2 K) | F1 (2 K) | Recall (5 K) | F1 (5 K)
BDeu | 0.676(0.072) | 0.746(0.079) | 0.74(0.047) | 0.782(0.044) | 0.748(0.092) | 0.765(0.092) | 0.784(0.087) | 0.793(0.096)
BDla | 0.6(0.105) | 0.653(0.119) | 0.708(0.057) | 0.733(0.059) | 0.82(0.087) | 0.824(0.091) | 0.808(0.045) | 0.790(0.056)
BIC | 0.66(0.069) | 0.738(0.072) | 0.736(0.054) | 0.784(0.051) | 0.748(0.073) | 0.757(0.077) | 0.788(0.09) | 0.793(0.096)
H2PC | 0.584(0.057) | 0.679(0.06) | 0.7(0.043) | 0.764(0.043) | 0.736(0.054) | 0.784(0.044) | 0.86(0.028) | 0.869(0.021)
MMHC | 0.496(0.051) | 0.608(0.059) | 0.592(0.037) | 0.7(0.041) | 0.628(0.038) | 0.728(0.034) | 0.6(0.044) | 0.695(0.051)
BDeu-en | 0.74(0.043) | 0.754(0.051) | 0.764(0.058) | 0.769(0.063) | 0.82(0.078) | 0.805(0.086) | 0.776(0.063) | 0.761(0.068)
BDla-en | 0.724(0.051) | 0.689(0.051) | 0.788(0.078) | 0.774(0.069) | 0.84(0.027) | 0.806(0.028) | 0.848(0.032) | 0.824(0.038)
BIC-en | 0.692(0.063) | 0.729(0.065) | 0.788(0.05) | 0.797(0.061) | 0.824(0.078) | 0.803(0.079) | 0.788(0.065) | 0.778(0.07)
H2PC-en | 0.608(0.082) | 0.682(0.082) | 0.692(0.046) | 0.76(0.047) | 0.732(0.042) | 0.758(0.041) | 0.824(0.021) | 0.841(0.02)
MMHC-en | 0.528(0.059) | 0.634(0.065) | 0.616(0.043) | 0.713(0.051) | 0.66(0.043) | 0.75(0.047) | 0.628(0.05) | 0.702(0.053)
Het-en | 0.64(0.068) | 0.709(0.073) | 0.692(0.038) | 0.769(0.038) | 0.792(0.067) | 0.822(0.069) | 0.836(0.048) | 0.853(0.049)
Voting-N | 0.652(0.053) | 0.759(0.05) | 0.736(0.034) | 0.825(0.033) | 0.876(0.035) | 0.908(0.026) | 0.928(0.017) | 0.943(0.009)
MINDMerge | 0.824(0.034) | 0.79(0.038) | 0.86(0.051) | 0.83(0.061) | 0.948(0.019) | 0.917(0.033) | 0.96(0) | 0.958(0.006)

Alarm
Method | Recall (500) | F1 (500) | Recall (1 K) | F1 (1 K) | Recall (2 K) | F1 (2 K) | Recall (5 K) | F1 (5 K)
BDeu | 0.641(0.106) | 0.602(0.111) | 0.661(0.067) | 0.639(0.065) | 0.696(0.047) | 0.659(0.047) | 0.717(0.067) | 0.686(0.077)
BDla | 0.646(0.057) | 0.587(0.063) | 0.685(0.048) | 0.65(0.057) | 0.698(0.093) | 0.656(0.093) | 0.761(0.086) | 0.728(0.088)
BIC | 0.5(0.066) | 0.543(0.072) | 0.55(0.032) | 0.574(0.038) | 0.635(0.054) | 0.643(0.064) | 0.693(0.042) | 0.698(0.046)
H2PC | 0.393(0.046) | 0.519(0.051) | 0.459(0.075) | 0.574(0.082) | 0.554(0.043) | 0.662(0.052) | 0.663(0.06) | 0.738(0.068)
MMHC | 0.354(0.053) | 0.484(0.066) | 0.487(0.068) | 0.606(0.077) | 0.524(0.022) | 0.644(0.029) | 0.635(0.034) | 0.726(0.04)
BDeu-en | 0.65(0.11) | 0.548(0.102) | 0.722(0.061) | 0.633(0.054) | 0.713(0.06) | 0.633(0.065) | 0.752(0.059) | 0.689(0.068)
BDla-en | 0.62(0.076) | 0.511(0.065) | 0.652(0.072) | 0.568(0.06) | 0.661(0.045) | 0.567(0.047) | 0.693(0.076) | 0.621(0.072)
BIC-en | 0.502(0.045) | 0.511(0.044) | 0.602(0.047) | 0.594(0.04) | 0.643(0.031) | 0.628(0.034) | 0.665(0.034) | 0.641(0.033)
H2PC-en | 0.502(0.071) | 0.615(0.076) | 0.537(0.051) | 0.631(0.056) | 0.589(0.042) | 0.68(0.044) | 0.676(0.042) | 0.737(0.05)
MMHC-en | 0.459(0.068) | 0.573(0.072) | 0.515(0.045) | 0.617(0.047) | 0.574(0.055) | 0.677(0.061) | 0.641(0.029) | 0.721(0.033)
Het-en | 0.517(0.062) | 0.619(0.066) | 0.7(0.05) | 0.698(0.052) | 0.724(0.053) | 0.712(0.06) | 0.77(0.044) | 0.807(0.048)
Voting-N | 0.507(0.047) | 0.625(0.051) | 0.635(0.032) | 0.712(0.031) | 0.663(0.031) | 0.731(0.04) | 0.729(0.051) | 0.799(0.053)
MINDMerge | 0.691(0.059) | 0.632(0.053) | 0.735(0.025) | 0.675(0.019) | 0.746(0.041) | 0.674(0.046) | 0.811(0.051) | 0.762(0.056)

Fig. 2. Similarity and Recall of different causal networks.

Panels A, B, and C show the similarity of different causal discovery methods on the three datasets (i.e., Cancer, Child, and Alarm); panel D shows the Recall on the three datasets.

To evaluate the performance of the proposed cycle-breaking algorithm CBAMN in the MINDMerge framework, we compared it to other cycle-removing algorithms such as colored-DFS [15,53] and MIGGA [54] (Supplementary Table S9). As network size increased, CBAMN exhibited a more pronounced cycle-breaking effect, maintaining higher Recall and F1 compared to other methods.

3.2. Experiments on real-world datasets

Table 3 summarizes the characteristics of the real-world study cohort. Fig. 3 illustrates the AKI incidence and crude odds ratio (cOR) by age, sex, and BMI in both the derivation and external network validation cohorts from the eICU database. With age divided into five bins, AKI incidence increased with age and then plateaued. Similarly, with BMI divided into four bins, AKI incidence rose with BMI, increasing significantly from 19.5 % to 31.8 % in the derivation cohort and from 20.3 % to 34.2 % in the validation cohort. Male patients exhibited a relatively higher AKI risk than female patients (28.5 % vs. 26.6 % and 31.1 % vs. 26.2 % in the derivation and validation cohorts, respectively). Further details regarding AKI incidence and cOR in the geographical validation cohort can be found in Supplementary Fig. S10.
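The crude odds ratio mentioned above can be computed from a 2×2 table or directly from two incidences. The following is a minimal sketch under the usual textbook definitions, with a Woolf (log-based) 95 % confidence interval; the function names are hypothetical, and the only values taken from the text are the male and female AKI incidences in the derivation cohort (28.5 % and 26.6 %).

```python
import math


def crude_odds_ratio(exposed_cases, exposed_noncases,
                     reference_cases, reference_noncases, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval."""
    a, b, c, d = exposed_cases, exposed_noncases, reference_cases, reference_noncases
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


def odds_ratio_from_incidence(p_exposed, p_reference):
    """Odds ratio implied by two incidence proportions (no interval)."""
    return (p_exposed / (1 - p_exposed)) / (p_reference / (1 - p_reference))


# Male vs. female AKI incidence in the derivation cohort, as reported above.
cor_sex = odds_ratio_from_incidence(0.285, 0.266)
```

Because only incidences are quoted in the text, the second function gives a point estimate only; a confidence interval requires the underlying cell counts.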

Table 3.

Patient characteristics of the derivation cohort and external validation cohort.

Characteristics Derivation cohort (eICU-1) External network validation (eICU-2) External geographical validation (eICU-3) External geographical validation (eICU-4) External geographical validation (MIMIC-III)
Number of samples (% AKI) 2477 (27.6 %) 4280 (29.06 %) 1630 (33.37 %) 1301 (32.2 %) 15298 (30.76 %)
Age (AKI%)
18–35 45 (6.6 %) 47 (3.8 %) 19 (3.5 %) 14 (3.3 %) 145 (3.1 %)
36–50 88 (12.9 %) 125 (10.0 %) 43 (7.9 %) 38 (9.1 %) 490 (10.4 %)
51–65 231 (33.8 %) 407 (32.7 %) 142 (26.1 %) 124 (29.6 %) 1294 (27.5 %)
66–80 228 (33.3 %) 461 (37.1 %) 212 (39.0 %) 178 (42.5 %) 1836 (39.0 %)
>80 92 (13.5 %) 204 (16.4 %) 128 (23.5 %) 65 (15.5 %) 941 (20.0 %)
Male (%) 376 (55.0 %) 773 (62.1 %) 326 (59.9 %) 248 (59.2 %) 2769 (58.8 %)
Race (%)
Caucasian 453 (66.2 %) 1139 (91.6 %) 117 (21.5 %) 301 (71.8 %) 3839 (81.6 %)
African American 226 (33.0 %) 72 (5.8 %) 86 (15.8 %) 114 (27.2 %) 436 (9.3 %)
Hispanic NA 2 (0.2 %) 335 (61.6 %) 2 (0.5 %) 152 (3.2 %)
Asian 5 (0.7 %) 28 (2.3 %) 6 (1.1 %) 2 (0.5 %) 279 (5.9 %)
Native American NA 3 (0.2 %) NA NA NA
BMI (%)
<18.5 22 (3.2 %) 35 (2.8 %) 15 (2.8 %) 11 (2.6 %) 85 (1.8 %)
18.5–24.9 176 (25.7 %) 327 (26.3 %) 170 (31.3 %) 93 (22.2 %) 1318 (28.0 %)
25–29.9 187 (27.3 %) 348 (28.0 %) 169 (31.1 %) 119 (28.4 %) 1593 (33.9 %)
>29.9 299 (43.7 %) 534 (42.9 %) 190 (34.9 %) 196 (46.8 %) 1710 (36.3 %)

Fig. 3. AKI incidence and cOR with varied Age, Sex and BMI in derivation cohort and external network validation cohort.

age1:18–35, age2:36–50, age3:51–65, age4:66–80, age5:>80; BMI: body mass index, bmi1:<18.5, bmi2:18.5–24.9, bmi3:25–29.9, bmi4:>29.9; sex1: male, sex2: female. * with statistical significance p<0.05.
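The bins in the legend above can be expressed as a simple lookup. This is a minimal sketch, assuming integer ages of at least 18 and BMI cut points at 18.5, 25, and 30 (the legend lists 24.9 and 29.9 as upper limits, so the treatment of values between 29.9 and 30 is an assumption); the constant and function names are hypothetical.

```python
import bisect

# Lower bounds of age2..age5 and bmi2..bmi4 (assumed cut points).
AGE_EDGES = [36, 51, 66, 81]
AGE_LABELS = ["age1", "age2", "age3", "age4", "age5"]
BMI_EDGES = [18.5, 25.0, 30.0]
BMI_LABELS = ["bmi1", "bmi2", "bmi3", "bmi4"]


def age_bin(age_years):
    """Map an adult age in years to the age1..age5 labels of Fig. 3."""
    return AGE_LABELS[bisect.bisect_right(AGE_EDGES, age_years)]


def bmi_bin(bmi):
    """Map a BMI value to the bmi1..bmi4 labels of Fig. 3."""
    return BMI_LABELS[bisect.bisect_right(BMI_EDGES, bmi)]
```

For example, a 68-year-old patient falls in `age4` (66–80), and a BMI of 18.5 falls in `bmi2`.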

When applying MINDMerge to real-world EMR data, we imposed constraints based on domain knowledge and temporal considerations to ensure the reliability of the derived causal network. For instance, no variable can be a cause of demographic variables such as age, sex, and race. Consequently, all edges directed from any variable into “age”, “sex”, or “race” were removed. Additionally, AKI, as the outcome variable, cannot be a cause of any other variable. Following the experimental setup, the causal network was learned from a training dataset (80 % of patients) and is presented in Fig. 4. Here, clinical variables serve as nodes, and directed edges represent causal relationships. The causal graph highlights pulmonary disease, hypertension, diabetes, and BUN as direct parent nodes contributing to AKI, representing direct relationships. Other variables such as age, sex, PaCO2, HCO3, coronary artery disease, and stroke were identified as indirect risk factors associated with AKI, consistent with previous reports [29,33,55,56]. We have also explained the corresponding dependencies (Supplementary Tables SA).
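The two constraints described above amount to a blacklist of forbidden directed edges. A minimal sketch follows, assuming edges are stored as (cause, effect) pairs; the function names, variable names, and toy edge list are hypothetical and do not reproduce the authors' implementation.

```python
def build_blacklist(variables, demographics=("age", "sex", "race"), outcome="AKI"):
    """Forbidden edges: anything into a demographic variable, anything out of the outcome."""
    blacklist = set()
    for variable in variables:
        for demographic in demographics:
            if variable != demographic:
                blacklist.add((variable, demographic))  # no cause of age/sex/race
        if variable != outcome:
            blacklist.add((outcome, variable))  # the outcome causes nothing
    return blacklist


def apply_constraints(edges, blacklist):
    """Drop every learned edge that violates the blacklist."""
    return [edge for edge in edges if edge not in blacklist]


# Toy example (hypothetical edges).
variables = ["age", "sex", "race", "BUN", "hypertension", "AKI"]
learned = [("age", "BUN"), ("BUN", "AKI"), ("AKI", "hypertension"), ("BUN", "age")]
constrained = apply_constraints(learned, build_blacklist(variables))
```

In this toy case the edges AKI → hypertension and BUN → age are removed, while age → BUN and BUN → AKI are retained.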

Fig. 4.

The causal structure of AKI learned using MINDMerge ensemble learning. Purple nodes refer to demographic information and vital signs; the red node refers to AKI onset (No/Yes); cyan, yellow, and blue nodes refer to laboratory tests, comorbidities, and procedures, respectively. BMI: body mass index, CAD: coronary artery disease, BUN: Blood Urea Nitrogen, AF: Atrial Fibrillation, HD: Heart Disease, TC: Tissue culture, RDW: Red cell Distribution Width, T_Bilirubin: Total Bilirubin.

AKI risk can be inferred using the BNs (Fig. 4). The probabilistic dependencies between AKI and its risk factors are illustrated in Supplementary Fig. S11A. Leveraging Bayes’ theorem allows an intervention plan to be designed based on these inferred probabilities. Consider a hospitalized patient aged 68, without hypertension or diabetes and with a normal BUN (blood urea nitrogen) level: the initial probability of this patient developing AKI within 48 h is 33 %. If monitoring then shows the BUN level rising beyond the normal range, the patient’s probability of developing AKI increases from 33 % to 51 %. Conversely, before abnormal BUN levels appear, detecting urinary tract obstruction and identifying and treating potential AKI-associated factors through X-ray and CT scans can reduce the AKI risk from 33 % to 10 %, as demonstrated in Supplementary Fig. S11BD. This reasoning suggests that diagnosing AKI solely on the basis of creatinine or urine volume may involve a time lag.
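The probability updates quoted above can be read in the odds form of Bayes’ theorem, where posterior odds equal prior odds multiplied by a likelihood ratio. The sketch below only back-calculates the ratio implied by the reported 33 % and 51 %; it does not reproduce the BN’s conditional probability tables, and the function names are hypothetical.

```python
def to_odds(probability):
    """Convert a probability in (0, 1) to odds."""
    return probability / (1 - probability)


def to_probability(odds):
    """Convert odds back to a probability."""
    return odds / (1 + odds)


def implied_likelihood_ratio(prior, posterior):
    """Likelihood ratio that moves `prior` to `posterior` under Bayes' theorem."""
    return to_odds(posterior) / to_odds(prior)


def update(prior, likelihood_ratio):
    """Posterior probability after applying a likelihood ratio to a prior."""
    return to_probability(to_odds(prior) * likelihood_ratio)


# Values reported in the text: 33 % before and 51 % after an abnormal BUN.
lr_abnormal_bun = implied_likelihood_ratio(0.33, 0.51)
```

Under this reading, an abnormal BUN roughly doubles the odds of AKI for the example patient, whereas the reduction from 33 % to 10 % corresponds to a ratio below one.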

BNs not only enable causal inference but also facilitate accurate prediction of AKI risk. Leveraging MINDMerge, we constructed an AKI prediction model using Bayesian causal network classification (Supplementary method 2). Fig. 5 illustrates the average performance and confidence intervals from 5-fold cross-validation on the test dataset. Notably, MINDMerge achieved the highest AUC among the compared models. To assess statistical differences in performance, the ScottKnottESD [57,58] statistical rank test (SK-test) was used. As shown in Supplementary Fig. S12, MINDMerge differed significantly in AUC from the single causal algorithms but not from the other ensemble algorithms, and no statistically significant difference was observed in Recall. In contrast, compared with other machine learning models, MINDMerge showed statistically significant improvements in both AUC and Recall. MINDMerge and XGBoost achieved the highest AUC and Recall, performing equally well. Furthermore, we selected the top-5 models for a comprehensive analysis using the test set. Supplementary Table S10 and Fig. S13 present additional performance indicators such as Precision, Recall, and AUPRC, together with decision curves and calibration curves.
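For reference, the two headline metrics can be computed without external libraries. The sketch below uses the rank-based (Mann-Whitney) definition of AUC and the standard definition of Recall; the toy labels, scores, and the 0.5 decision threshold are hypothetical and unrelated to the study data.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def recall(y_true, y_pred):
    """Fraction of true positive cases that were predicted positive."""
    true_positives = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    false_negatives = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


# Toy example (hypothetical labels and predicted AKI probabilities).
labels = [1, 0, 1, 0, 1]
predicted_risk = [0.9, 0.4, 0.35, 0.2, 0.8]
auc_value = roc_auc(labels, predicted_risk)
recall_value = recall(labels, [1 if s >= 0.5 else 0 for s in predicted_risk])
```

In a cross-validation setting, these functions would be applied to each held-out fold and the results averaged.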

Fig. 5.

Forest plot of the prediction performance of different models.

To assess the robustness of the causal network, eICU-2 was used as an external validation dataset. Supplementary Table S11 provides the AUCs and other performance indices for the top-5 models. The model derived from eICU-1 achieved an AUC of 0.861, which is 0.007 lower than that of the best-performing XGBoost model, with a Recall of 0.744. When MINDMerge was applied to eICU-2, it achieved an AUC of 0.848 and a Recall of 0.748. These results demonstrate that MINDMerge has excellent predictive performance on the external validation dataset. To further verify the portability of MINDMerge, external geographical validation was performed on datasets of different sizes (eICU-3: 1630, eICU-4: 1301, MIMIC-III: 15298). Supplementary Table S12, Table S13, and Fig. S14 show that MINDMerge has a higher AUC (0.842, 0.810, and 0.730, respectively) than other causal and machine learning models. Supplementary Table S14, Fig. S15, Fig. S16, and Fig. S17 present additional performance indicators such as Precision, Recall, and AUPRC, as well as decision curves and calibration curves.

4. Discussion

Causal learning is crucial in healthcare for unveiling the underlying relationships and mechanisms that drive medical conditions and treatment outcomes. However, a single causal model may not fully capture the complexity of the relevant factors. The integration of multiple models through BN fusion offers a more comprehensive approach to causal inference, which is essential for informed and effective decision-making. Despite the efficiency of existing BN fusion methods (e.g., Puerta’s [23] and others’ [24–28]), they face challenges such as algorithm diversity, local optima, and cyclic structures. To mitigate these issues, we developed MINDMerge, a novel causal network fusion framework that harmonizes knowledge across algorithms. MINDMerge integrates learning results from different causal discovery algorithms, facilitating the construction of more accurate network structures across varying sample sizes. Experimental results on both synthetic and real-world data demonstrate that MINDMerge outperforms other causal algorithms and effectively captures true causal relationships between variables. It also has significant advantages over conventional machine learning by revealing the causality underlying complex diseases and facilitating the discovery of new risk factors.
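To make the idea of fusing graphs from several algorithms concrete, the sketch below implements a generic weighted edge vote: each algorithm contributes its weight to every edge it proposes, and an edge is kept when its share of the total weight exceeds a threshold. This is an illustration of the general technique only and is not the MINDMerge credibility-weighting scheme; the function name, weights, and toy graphs are hypothetical.

```python
from collections import defaultdict


def fuse_by_weighted_vote(graphs, weights, threshold=0.5):
    """Keep directed edges whose weighted support share is strictly above `threshold`."""
    total_weight = sum(weights)
    support = defaultdict(float)
    for edges, weight in zip(graphs, weights):
        for edge in set(edges):  # each algorithm votes once per edge
            support[edge] += weight
    return sorted(
        edge for edge, value in support.items()
        if value / total_weight > threshold
    )


# Toy example: three algorithms with hypothetical credibility weights.
graphs = [
    [("A", "B"), ("B", "C")],
    [("A", "B"), ("C", "B")],
    [("A", "B"), ("B", "C"), ("C", "A")],
]
fused = fuse_by_weighted_vote(graphs, weights=[0.5, 0.3, 0.2])
```

With acyclic input graphs and a threshold of at least 0.5, an edge and its reverse cannot both be kept, but longer cycles can still arise in the fused graph, which is why a separate cycle-breaking step is needed.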

Although knowledge mining methods are adept at revealing factors with high predictive power, their identification does not necessarily imply causal influence [36]. In our work, five critical risk factors were identified as directly related to AKI, namely pulmonary disease, hypertension, diabetes, X-ray, and BUN. Significant interactions between the kidneys and lungs exist in both physiological and pathological conditions. Clinical observations have revealed kidney involvement in chronic respiratory disease. Studies have reported that the incidence of renal failure is higher in patients with concomitant pulmonary disease, especially chronic obstructive pulmonary disease (COPD) [59]. Additionally, pulmonary infections, inflammatory responses, and certain treatment modalities such as mechanical ventilation can also contribute to AKI [60]. Lun et al. [61] and Cai et al. [55] confirmed hypertension as an independent risk factor for AKI through meta-analysis. Persistent hypertension can increase glomerular capsule pressure, leading to glomerular fibrosis and renal arteriosclerosis, which can cause ischemia of the renal parenchyma and renal failure [62]. Similarly, patients with diabetes are more likely to develop AKI than those without diabetes, and multiple studies have demonstrated that diabetes is an independent risk factor for AKI [55,63,64]. In diabetic states, glucose metabolism mainly takes place in the kidneys, which increases the glycemic load on the kidneys and may cause kidney damage. Elevated BUN levels can signal early AKI, preceding increases in serum creatinine and offering an initial alert for renal impairment [65]. Therefore, monitoring BUN aids in timely AKI detection and intervention, potentially enhancing patient outcomes. However, BUN has limitations as a diagnostic tool because it is influenced by other factors, leading to potential false positives or negatives. Thus, integrating additional biomarkers can refine and expedite AKI diagnosis [66]. Moreover, the use of X-ray [67] and CT imaging offers unique insights into intrarenal hemodynamics and function [68]. These imaging modalities, often employed for assessing abdominal region issues and lung infections, provide crucial diagnostic information potentially associated with kidney damage.

While machine learning models like Extreme Gradient Boosting (XGB) excel in outcome prediction, they lack inherent insights into causal mechanisms. Our BN fusion approach, in contrast, uncovers causal relationships between variables, enabling the identification of modifiable risk factors and potential intervention targets. This provides invaluable clinical interpretability and enhances the efficacy of clinical interventions by tailoring them to specific patient populations. The risk factors identified in our study can provide actionable strategies for the prevention and management of AKI. Recognizing these risk factors allows clinicians to identify high-risk patients, monitor them more closely, and implement preventive strategies earlier. For instance, patients with pulmonary disease, hypertension, or diabetes could benefit from more rigorous control of their underlying conditions to potentially reduce the risk of AKI. Additionally, early detection and prompt intervention are critical for patients exposed to X-ray or with elevated BUN levels to prevent further kidney damage. Finally, educating patients about the signs and symptoms of AKI empowers them to seek medical attention promptly.

Fig. 4 illustrates BUN’s direct and indirect effects on AKI through the chain “Age → BUN → AKI”. AKI risk increases with age [37]. Urea is the metabolic product of protein decomposition in the human body, and increases in BUN levels can be caused by various factors, including age. Generally, BUN levels tend to increase with age, which is often attributed to a gradual age-related decline in kidney function that leads to abnormal renal function [29] and subsequently elevates the risk of AKI. The causal chain “PaCO2 → HCO3 → Pulmonary Disease → AKI” involves PaCO2 and HCO3, which are mainly used to assess the body’s acid-base balance and respiratory function and are closely associated with pulmonary disease. Avoiding acid-base disturbances may be beneficial for the management of lung disease and kidney function [69], which may help reduce the risk of AKI. We also observed that pulmonary disease may cause multiple complications that may indirectly or directly contribute to AKI. Studies have indicated a higher prevalence of comorbidities such as diabetes (12.2 % vs. 4.6 %), heart disease (15.0 % vs. 7.7 %), and hypertension (38.8 % vs. 22.8 %) in individuals with abnormal lung conditions [70]. Participants with restrictive or obstructive lung function had 1.49-fold and 1.42-fold higher atrial fibrillation risks, respectively [71]. Conditions like asthma, characterized by impaired lung function [72], have emerged as potential biomarkers for lung cancer development [73]. Therefore, good cardiopulmonary function stands as a cornerstone of human health, supporting the normal functioning of all bodily organs and, in some instances, reducing the risk of AKI.

This study has several limitations. Firstly, to reduce data complexity, mitigate the influence of outliers, and enhance model interpretability, we discretized the continuous variables. Employing different imputation and discretization methods for continuous data might affect the results obtained. Secondly, setting the threshold for the causal discovery phase is difficult; an inappropriate threshold can significantly impact the accuracy of structure learning. Achieving a balance between network complexity and accuracy required parameter selection based on experiential knowledge. Thirdly, our study was based on cross-sectional data, whereas in real healthcare settings, data are collected as time series. Incorporating time-series data necessitates alternative causal analysis methods, such as Granger causality. Fourthly, our BN fusion approach, aimed at enhancing model accuracy and robustness, lacks a direct comparison with existing BN fusion methods because of the varied assumptions and applicability of different techniques. Lastly, the limited scope of clinical data variables, particularly the lack of medication factors, represents a significant limitation of this study. Medications, including nephrotoxic agents, play a crucial role in the development of AKI, and their omission could influence the results of the model, potentially underestimating the true complexity of the causal pathways involved.

Nonetheless, this study holds significant implications. Firstly, the developed causal network harmonization framework presents a novel approach to structural learning, with potential for adaptation to knowledge fusion across diverse datasets, such as in a federated learning setting. Secondly, compared with individual or ensemble causal discovery algorithms, MINDMerge has stronger generalizability and can maintain higher accuracy across different datasets regardless of sample size. Thirdly, distinct from conventional machine learning algorithms, MINDMerge not only achieves high performance but also provides inference and interpretation.

5. Conclusion

This study introduces a novel causal network harmonization framework that effectively integrates causal graphs from different causal discovery algorithms. To address the cyclic structure problem in causal network fusion, we developed a novel cycle breaking algorithm. Experimental results showed that our model significantly outperformed baseline models on both synthetic data and real-world data. The model has the potential to guide the development of personalized treatment strategies.

6. Summary Table

Problem or Issue: The diversity of causal discovery algorithms and the presence of cyclic structures in synthesized causal networks.

What is Already Known: No single causal method exhibits significantly superior performance in structure reconstruction. The existence of cyclic structures is often disregarded when synthesizing network knowledge from different sources, which may result in erroneous inferences.

What this Paper Adds: We proposed a novel network harmonization framework that effectively integrates causal graphs from different causal algorithms. To address the cyclic structure in network fusion, we developed a novel cycle-breaking algorithm. Experimental results showed that our model significantly outperformed baseline models on both synthetic data and real-world data. The model has the potential to guide the development of personalized treatment strategies.

Supplementary Material

Appendix A. Supplementary material

Code for the main experiments is available at https://github.com/zmy-web/CD-code.

Supplementary material is available at the end of this article.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.ijmedinf.2024.105588.

Acknowledgements

This work was supported by the Major Research Plan of the National Natural Science Foundation of China (Key Program, Grant No. 91746204), the Science and Technology Development in Guangdong Province (Major Projects of Advanced and Key Techniques Innovation, Grant No. 2017B030308008), Guangdong Engineering Technology Research Center for Big Data Precision Healthcare (Grant No. 603141789047), the National Natural Science Foundation of China (Grant No. 72371116). WC is supported by the Guangzhou Science and Technology Plan Project (Grant No. 2023A04J0360). ML is supported by the NIH/NIDDK under award number R01DK116986 and NSF Smart and Connected Health award 2014554.

Footnotes

CRediT authorship contribution statement

Mingyang Zhang: Writing – original draft, Visualization, Methodology, Investigation, Formal analysis, Conceptualization. Xiangzhou Zhang: Writing – review & editing, Validation, Supervision, Resources. Mingyang Dai: Writing – original draft, Visualization, Methodology, Investigation, Formal analysis, Conceptualization. Lijuan Wu: Writing – review & editing. Kang Liu: Writing – review & editing. Hongnian Wang: Resources, Data curation. Weiqi Chen: Writing – review & editing, Validation, Supervision, Project administration, Funding acquisition. Mei Liu: Writing – review & editing, Validation, Project administration, Investigation, Funding acquisition. Yong Hu: Writing – review & editing, Validation, Project administration, Investigation, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  • [1]. Richens JG, Lee CM, Johri S, Improving the accuracy of medical diagnosis with causal machine learning, Nature Communications. 11 (2020) 3923.
  • [2]. Wang J, Zhou J, Chen X, Bayesian Causal Network for Discrete Variables, in: Data-Driven Fault Detection and Reasoning for Industrial Monitoring, Springer, Singapore, 2022, pp. 233–249.
  • [3]. Yu KX, Cui ZH, Sui X, Qiu X, Zhang JF, Biological Network Inference With GRASP: A Bayesian Network Structure Learning Method Using Adaptive Sequential Monte Carlo, Frontiers in Genetics. 12 (2021).
  • [4]. Liu EZ, Li J, Kinnebrew GH, Zhang PY, Zhang Y, Cheng LJ, et al., A Fast and Furious Bayesian Network and Its Application of Identifying Colon Cancer to Liver Metastasis Gene Regulatory Networks, IEEE/ACM Transactions on Computational Biology and Bioinformatics. 18 (2021) 1325–1335.
  • [5]. Cheng J, Bell DA, Liu W, Learning belief networks from data: An information theory based approach, Proceedings of the sixth international conference on Information and knowledge management, 1997, pp. 325–331.
  • [6]. Tsagris M, Bayesian Network Learning with the PC Algorithm: An Improved and Correct Variation, Applied Artificial Intelligence. 33 (2019) 101–123.
  • [7]. Behjati S, Beigy H, Improved K2 algorithm for Bayesian network structure learning, Engineering Applications of Artificial Intelligence. 91 (2020).
  • [8]. Tsamardinos I, Brown LE, Aliferis CF, The max-min hill-climbing Bayesian network structure learning algorithm, Machine Learning. 65 (2006) 31–78.
  • [9]. Gasse M, Aussem A, Elghazel H, A hybrid algorithm for Bayesian network structure learning with application to multi-label learning, Expert Systems with Applications. 41 (2014) 6755–6772.
  • [10]. Song W, Qiu L, Qing J, Zhi W, Zha Z, Hu X, et al., Using Bayesian network model with MMHC algorithm to detect risk factors for stroke, Mathematical Biosciences and Engineering. 19 (2022) 13660–13674.
  • [11]. Scutari M, Graafland CE, Gutiérrez JM, Who learns better Bayesian network structures: Accuracy and speed of structure learning algorithms, International Journal of Approximate Reasoning. 115 (2019) 235–253.
  • [12]. Hussung S, Mahmud S, Sampath A, Wu M, Guo P, Wang J, Evaluation of data-driven causality discovery approaches among dominant climate modes, UMBC Faculty Collection. (2019).
  • [13]. Liu T, Duan G, Task allocation optimization model in mechanical product development based on Bayesian network and ant colony algorithm, Journal of Supercomputing. 77 (2021) 13963–13991.
  • [14]. Asghari K, Masdari M, Gharehchopogh FS, Saneifard R, A fixed structure learning automata-based optimization algorithm for structure learning of Bayesian networks, Expert Systems. 38 (2021).
  • [15]. Gheisari S, Meybodi MR, BNC-PSO: structure learning of Bayesian networks by Particle Swarm Optimization, Information Sciences. 348 (2016) 272–289.
  • [16]. Khanteymoori AR, Olyaee MH, Abbaszadeh O, Valian M, A novel method for Bayesian networks structure learning based on Breeding Swarm algorithm, Soft Computing. 22 (2018) 3049–3060.
  • [17]. Wang JY, Liu SY, A novel discrete particle swarm optimization algorithm for solving bayesian network structures learning problem, International Journal of Computer Mathematics. 96 (2019) 2423–2440.
  • [18]. Xiao Y, Wang DY, Gao Y, A Mobile Bayesian Network Structure Learning Method Using Genetic Incremental K2 Algorithm and Random Attribute Order Technology, Scientific Programming. 2021 (2021).
  • [19]. Zhang LW, Predictive Analysis of Machine Learning Error Classification Based on Bayesian Network, Wireless Personal Communications. 127 (2022) 615–634.
  • [20]. Li M, Liu K, Causality-based attribute weighting via information flow and genetic algorithm for naive Bayes classifier, IEEE Access. 7 (2019) 150630–150641.
  • [21]. Li M, Zhang R, Hong M, Bai C, Improved structure learning algorithm of Bayesian network based on information flow, Systems Engineering & Electronics. 40 (2018) 1385–1390.
  • [22]. Wang X, Ren H, Guo X, A novel discrete firefly algorithm for Bayesian network structure learning, Knowledge-Based Systems. 242 (2022).
  • [23]. Puerta JM, Aledo JA, Gámez JA, Laborda JD, Efficient and accurate structural fusion of Bayesian networks, Information Fusion. 66 (2021) 155–169.
  • [24]. Wang S, Qin B, Bayesian Network Structure Learning by Ensemble Learning and Feedback Strategy, Chinese Journal of Computers. 44 (2021) 1051–1063.
  • [25]. Sinha M, Tadepalli P, Ramsey SA, Voting-based integration algorithm improves causal network learning from interventional and observational data: An application to cell signaling network inference, PLoS One. 16 (2021).
  • [26]. Li M, Zhang R, Liu K, A new ensemble learning algorithm combined with causal analysis for bayesian network structural learning, Symmetry. 12 (2020) 2054.
  • [27]. Cai Q, Chen X, Bayesian Network Structure Merging Algorithm Based on Scoring Function, Computer Engineering and Application. 55 (2019) 147–152.
  • [28]. Tang Y, Wang J, Mai N, Altintas I, PEnBayes: A Multi-Layered Ensemble Approach for Learning Bayesian Network Structure from Big Data, Sensors. 19 (2019).
  • [29]. Kellum JA, Lameire N, Aspelin P, Barsoum RS, Burdmann EA, Goldstein SL, et al., Kidney Disease: Improving Global Outcomes (KDIGO) Acute Kidney Injury Work Group. KDIGO Clinical Practice Guideline for Acute Kidney Injury. (2012).
  • [30]. Song X, Yu ASL, Kellum JA, Waitman LR, Matheny ME, Simpson SQ, et al., Cross-site transportability of an explainable artificial intelligence model for acute kidney injury prediction, Nature Communications. 11 (2020).
  • [31]. Wilson FP, Martin M, Yamamoto Y, Partridge C, Moreira E, Arora T, et al., Electronic health record alerts for acute kidney injury: multicenter, randomized clinical trial, BMJ-British Medical Journal. 372 (2021).
  • [32]. Raimann JG, Riella MC, Levin NW, International Society of Nephrology’s 0by25 initiative (zero preventable deaths from acute kidney injury by 2025): focus on diagnosis of acute kidney injury in low-income countries, Clin Kidney J. 11 (2018) 12–19.
  • [33]. Yang L, Xing G, Wang L, Wu Y, Li S, Xu G, et al., Acute kidney injury in China: a cross-sectional survey, Lancet. 386 (2015) 1465–1471.
  • [34]. Cheng YC, Luo R, Wang K, Zhang M, Wang ZX, Dong L, et al., Kidney disease is associated with in-hospital death of patients with COVID-19, Kidney International. 97 (2020) 829–838.
  • [35]. Adamczak M, Surma S, Wiecek A, Acute kidney injury in patients with COVID-19: Epidemiology, pathogenesis and treatment, Advances in Clinical and Experimental Medicine. 31 (2022) 317–326.
  • [36]. Wu L, Hu Y, Yuan B, Zhang X, Chen W, Liu K, et al., Which risk predictors are more likely to indicate severe AKI in hospitalized patients? Int J Med Inform. 143 (2020) 104270.
  • [37]. Wu L, Hu Y, Zhang X, Zhang J, Liu M, Development of a knowledge mining approach to uncover heterogeneous risk predictors of acute kidney injury across age groups, Int J Med Inform. 158 (2021) 104661.
  • [38]. Liu K, Yuan B, Zhang X, Chen W, Patel LP, Hu Y, et al., Characterizing the temporal changes in association between modifiable risk factors and acute kidney injury with multi-view analysis, Int J Med Inform. 163 (2022) 104785.
  • [39]. Kevin B Korb AEN. Bayesian Artificial Intelligence. 2010: 2nd edition, Section … CRC Press,.
  • [40]. Spiegelhalter DJ, Dawid AP, Lauritzen SL, Cowell RG, Bayesian Analysis in Expert Systems, Statistical Science. 8 (1993) 219–247.
  • [41]. Shortliffe EH, Beinlich IA, Suermondt HJ, Chavez RM, Cooper GF, The ALARM Monitoring System: A Case Study with two Probabilistic Inference Techniques for Belief Networks, Springer, Berlin Heidelberg, 1989.
  • [42]. Pollard TJ, Johnson AEW, Raffa JD, Celi LA, Mark RG, Badawi O, The eICU Collaborative Research Database, a freely available multi-center database for critical care research, Sci Data. 5 (2018) 180178.
  • [43]. Li J, Yan XS, Chaudhary D, Avula V, Mudiganti S, Husby H, et al., Imputation of missing values for electronic health record laboratory data, npj Digital Medicine. 4 (2021) 147.
  • [44]. Schwarz G, Estimating the Dimension of a Model, The Annals of Statistics. 6 (1978) 461–464.
  • [45]. Ajmal HB, Madden MG, Dynamic Bayesian Network Learning to Infer Sparse Models From Time Series Gene Expression Data, IEEE/ACM Transactions on Computational Biology and Bioinformatics. 19 (2022) 2794–2805.
  • [46]. Heckerman D, Geiger D, Chickering DM, Learning Bayesian networks: The combination of knowledge and statistical data, Machine Learning. 20 (1995) 197–243.
  • [47].Azzimonti L, Corani G, Scutari M, A Bayesian hierarchical score for structure learning from related data sets, International Journal of Approximate Reasoning. 142 (2022) 248–265. [Google Scholar]
  • [48].Cano A, Gómez-Olmedo M, Masegosa AR, Moral S, Locally averaged Bayesian Dirichlet metrics for learning the structure and the parameters of Bayesian networks, International Journal of Approximate Reasoning. 54 (2013) 526–540. [Google Scholar]
  • [49].Rusek J, Firek K, Slowik L, Extracting structure of Bayesian network from data in predicting the damage of prefabricated reinforced concrete buildings in mining areas, Eksploatacja I Niezawodnosc-Maintenance and Reliability. 22 (2020) 658–666. [Google Scholar]
  • [50].Gasse M, Aussem A, Elghazel H. An experimental comparison of hybrid algorithms for Bayesian network structure learning. Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2012, Bristol, UK, September 24-28, 2012. Proceedings, Part I 23: Springer; 2012. p. 58–73. [Google Scholar]
  • [51].Zheng X, Aragam B, Ravikumar P, Xing EP, Continuous Optimization for Structure Learning. Adv Neur In, DAGs with NO TEARS, 2018, p. 31. [Google Scholar]
  • [52].Wang H, Li J, Zhu G, A Data Feature Extraction Method Based on the NOTEARS Causal Inference Algorithm, Applied Sciences. 13 (2023) 8438. [Google Scholar]
  • [53].Gheisari S, Meybodi MR, A new reasoning and learning model for Cognitive Wireless Sensor Networks based on Bayesian networks and learning automata cooperation, Computer Networks. 124 (2017) 11–26. [Google Scholar]
  • [54].Yan K, Fang W, Lu H, Zhang X, Sun J, Wu X, Mutual Information-Guided GA for Bayesian Network Structure Learning, IEEE Transactions on Knowledge and Data Engineering. 1–16 (2022). [Google Scholar]
  • [55].Cai XY, Wu GM, Zhang J, Yang LC, Risk Factors for Acute Kidney Injury in Adult Patients With COVID-19: A Systematic Review and Meta-Analysis, Frontiers in Medicine. 8 (2021).
  • [56].Liu YF, Zhang Z, Pan XL, Xing GL, Zhang Y, Liu ZS, et al., The chronic kidney disease and acute kidney injury involvement in COVID-19 pandemic: A systematic review and meta-analysis, Plos One. 16 (2021).
  • [57].Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K, The impact of automated parameter optimization on defect prediction models, IEEE Transactions on Software Engineering. 45 (2018) 683–711.
  • [58].Balogun AO, Basri S, Mahamad S, Abdulkadir SJ, Capretz LF, Imam AA, et al., Empirical analysis of rank aggregation-based multi-filter feature selection methods in software defect prediction, Electronics. 10 (2021) 179.
  • [59].Sorino C, Scichilone N, Pedone C, Negri S, Visca D, Spanevello A, When kidneys and lungs suffer together, J Nephrol. 32 (2019) 699–707.
  • [60].Alge J, Dolan K, Angelo J, Thadani S, Virk M, Akcan AA, Two to Tango: Kidney-Lung Interaction in Acute Kidney Injury and Acute Respiratory Distress Syndrome, Front Pediatr. 9 (2021) 744110.
  • [61].Lun Z, Mai Z, Liu L, Chen G, Li H, Ying M, et al., Hypertension as a Risk Factor for Contrast-Associated Acute Kidney Injury: A Meta-Analysis Including 2,830,338 Patients, Kidney Blood Press Res. 46 (2021) 670–692.
  • [62].Nie S, Feng Z, Tang L, Wang X, He Y, Fang J, et al., Risk Factor Analysis for AKI Including Laboratory Indicators: a Nationwide Multicenter Study of Hospitalized Patients, Kidney Blood Press Res. 42 (2017) 761–773.
  • [63].Wang YJ, Li JH, Guan Y, Xie QH, Hao CM, Wang ZX, Diabetes mellitus is a risk factor of acute kidney injury in liver transplantation patients, Hepatobiliary & Pancreatic Diseases International. 20 (2021) 215–221.
  • [64].Shen X, Lv KM, Hou BC, Ao QG, Zhao JH, Yang G, et al., Impact of Diabetes on the Recurrence and Prognosis of Acute Kidney Injury in Older Male Patients: A 10-Year Retrospective Cohort Study, Diabetes Therapy. 13 (2022) 1907–1920.
  • [65].Zachariah S, Kumar K, Lee SWH, Choon WY, Naeem S, Leong C, Chapter 7 - Interpretation of Laboratory Data and General Physical Examination by Pharmacists, in: Thomas D (Ed.), Clinical Pharmacy Education, Practice and Research, Elsevier, 2019, pp. 91–108.
  • [66].Edelstein CL. Chapter Six - Biomarkers in Acute Kidney Injury. In: Edelstein CL, editor. Biomarkers of Kidney Disease (Second Edition): Academic Press; 2017. p. 241–315.
  • [67].Boyacioglu H, Yola BB, Karaman C, Karaman O, Atar N, Yola ML, A novel electrochemical kidney injury molecule-1 (KIM-1) immunosensor based covalent organic frameworks-gold nanoparticles composite and porous NiCo2S4@CeO2 microspheres: The monitoring of acute kidney injury, Applied Surface Science. 578 (2022).
  • [68].Lerman LO, Rodriguez-Porcel M, Romero JC, The development of x-ray imaging to study renal function, Kidney International. 55 (1999) 400–416.
  • [69].Yagi K, Fujii T, Management of acute metabolic acidosis in the ICU: sodium bicarbonate and renal replacement therapy, Critical Care. 25 (2021).
  • [70].Mannino DM, McBurnie MA, Tan W, Kocabas A, Anto J, Vollmer WM, et al., Restricted spirometry in the Burden of Lung Disease Study, International Journal of Tuberculosis and Lung Disease. 16 (2012) 1405–1411.
  • [71].Lee SN, Ko SH, Her SH, Han K, Moon D, Kim SK, et al., Association between lung function and the risk of atrial fibrillation in a nationwide population cohort study, Scientific Reports. 12 (2022).
  • [72].Kim MA, Shin SW, Park JS, Uh ST, Chang HS, Bae DJ, et al., Clinical Characteristics of Exacerbation-Prone Adult Asthmatics Identified by Cluster Analysis, Allergy Asthma & Immunology Research. 9 (2017) 483–490.
  • [73].Lee HW, Lee HJ, Lee JK, Park TY, Heo EY, Kim DK, Rapid FEV1 Decline and Lung Cancer Incidence in South Korea, Chest. 162 (2022) 466–474.
