AI-aided general clinical diagnoses verified by third-parties with dynamic uncertain causality graph extended to also include classification

Zhan Zhang; Yang Jiao; Mingxia Zhang; Bing Wei; Xiao Liu; Juan Zhao; Fengwei Tian; Jie Hu; Qin Zhang

doi:10.1007/s10462-021-10109-w

2022 Jan 29;55(6):4485–4521. doi: 10.1007/s10462-021-10109-w



PMCID: PMC8800413 PMID: 35125607

Abstract

Artificial intelligence (AI)-aided general clinical diagnosis is helpful to primary clinicians. Machine learning approaches have problems of generalization, interpretability, etc. The Dynamic Uncertain Causality Graph (DUCG), based on uncertain causal knowledge provided by clinical experts, does not have these problems. This paper extends DUCG to include the representation and inference algorithm for non-causal classification relationships. As a part of general clinical diagnoses, six knowledge bases corresponding to six chief complaints (arthralgia, dyspnea, cough and expectoration, epistaxis, fever with rash and abdominal pain) were constructed by building the subgraphs relevant to each chief complaint separately and synthesizing them into the knowledge base of that chief complaint. A subgraph represents the variables and causalities related to a single disease that may cause the chief complaint, regardless of which hospital department the disease belongs to. Verified independently by two groups of third-party hospitals, the total diagnostic precisions of the six knowledge bases ranged from 96.5% to 100%, and the precision for every disease was no less than 80%.

Keywords: Clinical diagnosis, Classification, Generalization, Causality, Uncertainty, Probabilistic reasoning

Introduction

AI-aided clinical diagnosis can help clinicians working at primary hospitals and clinics to avoid or reduce misdiagnoses and missed diagnoses. Machine learning (ML) models based on processed big data are well known, e.g., the convolutional neural network (CNN), deep neural network (DNN), recurrent neural network (RNN) and Bayesian network (BN) (Fukushima and Neocognitron 1982; Lo et al. 1995; Russakovsky et al. 2015; Szegedy et al. 2015; Brosch et al. 2016; Shin et al. 2016; Duraisamy and Emperumal 2017; Bardou et al. 2018; Christodoulidis et al. 2017; Lin et al. 2018; Er et al. 2016; Ceccon et al. 2014), etc. However, most of them are applied to image and speech recognition. AI-aided general clinical diagnosis is really needed in practice but is relatively rare. Wu et al. (2018) and Liang et al. (2019) report two deep learning models that can perform general clinical diagnoses. Although some comparisons between these models and clinicians have been made, it is not clear whether the models achieve the same precision in different application scenarios as they achieved on the testing dataset; this is called the generalization problem. Real-world (primary-level) application sites are not qualified to judge the precision, because they lack the medical examination facilities, professional knowledge and experience. It is reasonable to doubt the generalization ability of the two models, because the essence of deep learning is to establish a nonlinear mapping between the input (combinations of variable states, including unknown states) and the output (diseases) by adjusting the structure and parameters of the neural network. When the sample space of the actual application scenario differs from that of the dataset, which is common, the precision may drop, leading to the generalization problem.

In general clinical diagnoses, there are at least 10,000 input variables. Each variable has at least 3 states: negative, positive and unknown. Thus, the number of state combinations of the input variables is at least 3^10,000 ≈ 1.6 × 10⁴⁷⁷¹, a huge number. The training and testing datasets cover only a small part of these state combinations, which is called the training and testing sample space (TTSS). The real application sample spaces (RASSs) are usually different from the TTSS, and different application scenarios may have different RASSs. Thus, the mapping in the TTSS may be different from that in the RASSs. Whether a model trained on the TTSS can be applied in different RASSs needs to be verified. In fact, our experience is that the diagnostic precision drops significantly in real applications.
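The size of this state space can be checked with a few lines of arithmetic. The sketch below works with base-10 logarithms rather than printing the full integer; the variable names are illustrative only.

```python
import math

n_variables = 10_000        # lower bound on input variables quoted in the text
states_per_variable = 3     # negative, positive, unknown

# log10(3^10000) = 10000 * log10(3); split into exponent and mantissa.
log10_count = n_variables * math.log10(states_per_variable)
exponent = math.floor(log10_count)
mantissa = 10 ** (log10_count - exponent)

print(f"3^{n_variables} is about {mantissa:.1f} x 10^{exponent}")
```

Working in logarithms avoids converting a 4,772-digit integer to text, which recent Python versions restrict by default.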

Moreover, how to ensure that the model can diagnose rare diseases is another problem. Common diseases are relatively easy to diagnose even for primary clinicians, whereas rare diseases are what really need to be differentially diagnosed by AI-aided models. This means that we need not only high total precision but also high precision for each disease, including rare diseases. Note that common diseases are the majority in the training and testing datasets and rare diseases may be marginalized in ML models, while high precision can still be achieved on the testing dataset because of the high proportion of common diseases. For the example of arthralgia shown in Table 4 in this paper, five common diseases (Gout, SLE, Osteoarthritis, RA and Trauma) account for 95.8% of the case records in Group 1, which implies that once these five diseases are correctly diagnosed, the total diagnostic precision will be 95.8%, even if the diagnoses for the other 18 diseases are all incorrect. In practice, correct diagnoses for the other 18 diseases are really needed.

Table 4.

The precisions of the third-party verifications for arthralgia, in which the diseases with “*” are not included in Group 2

Disease	Total number of cases in Group 1; Group 2	Randomly selected and tested cases in Group 1; Group 2	Correct diagnoses in Group 1; Group 2	Precision in Group 1; Group 2 (%)
Gout	1129; 1733	10; 10	10; 10	100; 100
SLE	808; 1861	10; 10	10; 10	100; 100
PsA	14; 488	10; 10	10; 10	100; 100
Polymyositis	5; 184	5; 10	5; 10	100; 100
Sjögren's syndrome	95; 452	10; 10	10; 10	100; 100
Osteoarthritis	1388; 2586	10; 10	10; 10	100; 100
RA	2282; 3999	10; 10	10; 10	100; 100
Reactive arthritis	30; 76	10; 10	10; 10	100; 100
TB	67; 2074	10; 10	9; 10	90; 100
AS	44; 339	10; 10	10; 10	100; 100
AOSD	4; 80	4; 10	4; 10	100; 100
Infectious arthritis	5; 54	5; 10	5; 10	100; 100
SSc	9; 161	9; 10	9; 10	100; 100
Pseudogout	0; 2	0; 2	; 2	; 100
Brucellosis	1; 0	1; 0	1;	100;
Lyme disease	0; 0	0; 0	;	;
Sub-total	5881; 14,089	114; 132	113; 132	99.12; 100
Trauma*	876;	10;	10;	100;
RPC*	0;	0;	;	;
PMR*	0;	0;	;	;
Vasculitis*	0;	0;	;	;
Sarcoidosis*	0;	0;	;	;
Sports injury*	5;	5;	5;	100;
Rheumatic fever*	4;	4;	4;	100;
Total	6766; 14,089	133; 132	132; 132	99.25; 100


SLE: systemic lupus erythematosus; RA: Rheumatoid arthritis; TB: Tuberculosis; PsA: psoriatic arthritis; AS: ankylosing spondylitis; AOSD: Adult Still's disease; SSc: systemic sclerosis
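The 95.8% share of the five common diseases quoted for arthralgia can be re-derived from the Group 1 case counts in Table 4. The following sketch simply re-enters those counts; the dictionary and variable names are illustrative.

```python
# Total number of Group 1 cases per disease, copied from Table 4.
group1_cases = {
    "Gout": 1129, "SLE": 808, "PsA": 14, "Polymyositis": 5,
    "Sjogren's syndrome": 95, "Osteoarthritis": 1388, "RA": 2282,
    "Reactive arthritis": 30, "TB": 67, "AS": 44, "AOSD": 4,
    "Infectious arthritis": 5, "SSc": 9, "Pseudogout": 0,
    "Brucellosis": 1, "Lyme disease": 0, "Trauma": 876, "RPC": 0,
    "PMR": 0, "Vasculitis": 0, "Sarcoidosis": 0, "Sports injury": 5,
    "Rheumatic fever": 4,
}
common = ["Gout", "SLE", "Osteoarthritis", "RA", "Trauma"]

total = sum(group1_cases.values())
common_total = sum(group1_cases[name] for name in common)
share = common_total / total

# A model that is right on the five common diseases and wrong on all
# others would still reach this total precision on Group 1.
print(f"{common_total}/{total} = {100 * share:.1f}%")
```

This is why the paper reports precision per disease in addition to the total: a high total alone says little about the 18 less frequent diseases.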

Furthermore, because of the black-box problem of deep learning models, the two models described in Wu et al. (2018) and Liang et al. (2019) lack interpretability.

To solve these problems, a model based on domain knowledge/causality is needed, because domain knowledge is invariant across application scenarios, which makes such a model essentially different from ML models based on big data.

DUCG developed in recent years is such a model (Zhang 2012, 2015a, b; Zhang et al. 2014, 2018; Zhang and Geng 2015; Zhang and Zhang 2016; Zhang and Yao 2018) and has achieved promising application results for fault diagnoses of large, complex industrial systems (Zhang and Yao 2018; Zhang et al. 2018; Dong et al. 2014a, 2018; Qu et al. 2015; Zhao et al. 2014; Geng and Zhang 2014) and general clinical diagnoses (Dong et al. 2014b; Hao et al. 2017; Fan et al. 2018; Jiao et al. 2020; Ning et al. 2020; Zhang et al. 2021).

Note that the existing DUCG model is entirely based on causalities. However, in many practical cases, non-causal knowledge representations and the associated probabilistic reasoning are needed. For example, when representing an uncertain causal relationship between a disease and a routine blood test, it is desirable to use the routine blood test as an inspection-type variable and to use the results of the test as its consequential variables. However, there is no direct causal relationship between the disease and the routine blood test itself, because the test is not a consequence of the disease. What actually exists is the uncertain causal relationship between the disease and the test results, i.e., the indicators. On the other hand, such causalities cannot be represented intuitively without the routine blood test variable, where the test is an action to find the consequences/indicators of diseases. In the hierarchical domain knowledge representation, the action is actually a classifier between the disease and the indicators. To solve this problem, the classification-type (C-type) variable, along with its unit matrix I drawn as its input directed arc, is introduced as illustrated in Figs. 1 and 2.

Fig. 1 — The case without C-type variable

It is proved in Sect. 3 that, in the sense of inference, the DUCG without C-type variables is equivalent to that with them. The former is derived from the latter and is what is actually used in the invisible DUCG inference, because it is obviously easier to compute, while the latter remains as the visible knowledge base for better DUCG construction and interpretability.

Six DUCG knowledge bases including C-type variables for clinical diagnoses were constructed by clinical experts at Peking Union Medical College Hospital, Beijing Hospital, Xuanwu Hospital and Youan Hospital of Capital Medical University, Beijing, China. The diagnostic precisions were verified by two groups of third-party hospitals. Group 1 was Suining Central Hospital, Sichuan, China, which has a history of more than 100 years. Group 2 comprised six hospitals officially organized as a whole by the Chongqing Science and Technology Commission: West-South Hospital, Daping Hospital, The Second Affiliated Hospital of Chongqing Medical University, Chongqing Tumor Hospital, Chongqing Traditional Chinese Medicine Hospital (CTCMH) and Wanzhou Central Hospital, Chongqing, China, with CTCMH as the leading unit. All of these are Grade IIIA hospitals (the highest grade in China) located in southwest China, far from Beijing where the knowledge bases were constructed. The verification results of the two groups are close to each other. Therefore, the generalization ability of DUCG was verified, which means that DUCG-aided general clinical diagnoses can be applied in any application scenario without the generalization problem that usually exists in ML models.

Section 2 introduces DUCG briefly. Section 3 presents the C-type variable methodology. Section 4 applies the C-type variable methodology to the diagnoses of six chief complaints. Two groups of third-party verifications were made. Section 5 summarizes this paper.

Brief Introduction to DUCG

DUCG is a newly developed model that can explicitly and graphically represent causalities with uncertainties and perform probabilistic reasoning. In clinical diagnoses, it can easily represent various complex and uncertain causalities between diseases (root causes) and risk factors, symptoms, signs, image findings, laboratory results, etc., namely the observations or evidence. Conditional on the evidence collected for each patient, DUCG calculates the conditional probabilities of the possible diseases found, and thus performs intelligent diagnoses with clear causal and mathematical meanings (Zhang et al. 2021). Because the primary clinicians, rather than DUCG, take responsibility for the diagnoses, DUCG's strong interpretability in its knowledge bases, diagnostic results and computation process is very important.

DUCG is composed of two sub-models: single-valued DUCG (S-DUCG) and multivalued DUCG (M-DUCG). Single-valued means that only the causes of the true state of a child variable can be specified, while the false state is the complement of the true state. Multivalued means that the causes of every state of a variable can be specified separately (Zhang 2012). In this paper, only M-DUCG is addressed and it is therefore abbreviated as DUCG. Figure 3 is an illustrative DUCG. The symbols are described in Table 1. The basic idea of the DUCG model is shown in Fig. 4.

Table 1.

Graphical Symbols Used in DUCG


Fig. 4 — The basic idea of M-DUCG model (abbreviated as DUCG in this paper), in which V ∈ {B, D, G, X, C, SX, BX, RG}

For simplicity, the subscript j_i in Fig. 3 is abbreviated as j. The rectangular node B_n is the basic or root cause event variable, without any input, and B_nj is state j of B_n. The circular node X_n is the result event variable, X_nj is state j of X_n, and X_n can be both the cause/input and the consequence/output of other nodes. The pentagonal node D_n is the default or unknown cause event of X_n or X_nj, without any input, and its occurrence probability is defined as 1. The hexagonal node SX_n is a special X-type event variable, and SX_nj is state j of SX_n. When SX_nj occurs, where j ≠ 0 and 0 indicates the normal state, a particular disease or variable state must be true with a certain confidence θ; SX_nj is therefore called a gold criterion in clinical diagnosis. The double-circle node BX_n is a B&X-type variable with both B and X properties. Its state division and definition are exactly the same as those of B_n; only the state probability distribution of BX_n may differ from that of B_n (affected by the associated risk factors). The logic gate variable G_n represents the various state combinations of its input variables, which are connected to it by a directed arc of a dedicated type (see Table 1 for the symbol). The double-line logic gate SG_n represents the various state combinations of the associated risk factors (such as age, gender, etc., represented as X-type variables), which change the state probability distribution of B_n into that of BX_n. The output of SG_n is BX_n, through a double-line directed arc that scales the state probabilities of B_n up or down to those of BX_n according to the combination of risk factors. The reversal logic gate RG_n represents that the input of RG_n may cause some combinations of outputs.
The single-line directed arc represents the causality matrix F_n;i = (r_n;i/r_n)A_n;i, where the element A_nk;ij of the matrix A_n;i is the virtual random event that the parent event V_ij (V ∈ {B, X, D, G, BX, SX}) directly causes the child event X_nk (including SX_nk). r_n;i > 0 is the strength of the causal relationship between V_i and X_n, and $r_{n} \equiv \sum_{i} r_{n;i}$. A dashed directed arc is the conditional form of the corresponding solid arc, conditional on the condition event Z_n;i, where n indexes the child/output and i indexes the parent/input. When Z_n;i is true, the dashed arc becomes the corresponding solid arc; otherwise, the dashed arc is eliminated.

In DUCG, an upper-case letter represents an event or event variable and the corresponding lower-case letter represents its probability, i.e., b_nj = Pr{B_nj}, bx_nj = Pr{BX_nj}, x_nj = Pr{X_nj}, sx_nj = Pr{SX_nj}, g_nj = Pr{G_nj}, rg_nj = Pr{RG_nj}, d_n = Pr{D_n} ≡ 1, z_n;i = Pr{Z_n;i}, f_nk;ij = Pr{F_nk;ij} = (r_n;i/r_n)a_nk;ij, a_nk;ij = Pr{A_nk;ij}, f_n;i = Pr{F_n;i}, a_n;i = Pr{A_n;i}, etc. The indices before “;” are for the child and the indices after “;” are for the parent. The {a-, b-, r-}-type parameters are usually given by domain experts based on statistics or their experience. Note that the main formulas of DUCG have the form of a numerator divided by a denominator (see Zhang et al. (2021) for details). Therefore, the results are sensitive only to the relative values of the parameters, not to their absolute values, which makes the parameters easy for clinical experts to provide.

The variable index is inside the symbol without the letter of the variable type. The symbol shape represents the variable type. State index 0 denotes the normal/negative state, while the other states indicate abnormal/positive states. Moreover, V_nj ∈ {X_nj, SX_nj, RG_nj}, j ≠ 0, is assigned with attention parameter ε_nj ≥ 1 that quantifies the attention of domain experts to explain the cause of V_nj. If no cause can be found, a virtual D_n drawn as dashed pentagon will be assigned as the default cause of V_nj according to the DUCG simplification rule 10 listed in the Appendix of Zhang et al. (2021), and a_nj;nD between V_nj and the virtual D_n is defined as a_nj;nD = 1/ε_nj, in which the index D indicates the invariable state of D_n. In this case, V_nj is called the isolated evidence. Also, 0 < θ_nj ≤ 1 is assigned to SX_nj to quantify the confidence that the specific disease does exist given SX_nj, where j ≠ 0. Ref. (Zhang et al. 2021) gives more details.

As shown in Fig. 4, the above events and probabilities satisfy Eqs. (1) and (2) respectively:

$$X_{nk} = \sum_{i} F_{nk;ij} V_{ij} = \sum_{i} (r_{n;i}/r_{n}) A_{nk;ij} V_{ij} \tag{1}$$

$$x_{nk} = \sum_{i} f_{nk;ij} v_{ij} = \sum_{i} (r_{n;i}/r_{n}) a_{nk;ij} v_{ij} \tag{2}$$

Here, F_n;i ≡ (r_n;i/r_n)A_n;i and f_n;i ≡ (r_n;i/r_n)a_n;i; F_nk;ij ≡ (r_n;i/r_n)A_nk;ij, f_nk;ij ≡ (r_n;i/r_n)a_nk;ij and a_nk;ij = Pr{A_nk;ij}, where F_nk;ij, f_nk;ij, A_nk;ij and a_nk;ij are members of F_n;i, f_n;i, A_n;i and a_n;i respectively. In the case of only one input to X_n, F_n;i = A_n;i and f_n;i = a_n;i.
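As a numerical illustration of Eq. (2), the sketch below computes the state probabilities of a child variable from two parents, weighting each parent's a-type matrix by r_n;i/r_n and summing over parents and parent states. The function name and all numbers are hypothetical; the matrices are assumed to be fully specified, with each column summing to 1.

```python
def child_state_probs(parents):
    """Evaluate Eq. (2) for one child variable X_n.

    parents: list of (r, a, v) tuples, one per parent i, where
      r is the causal strength r_n;i (> 0),
      a[k][j] is a_nk;ij (child state k, parent state j),
      v[j] is the probability of parent state j.
    """
    r_total = sum(r for r, _, _ in parents)      # r_n = sum_i r_n;i
    n_states = len(parents[0][1])
    x = [0.0] * n_states
    for r, a, v in parents:
        weight = r / r_total                     # r_n;i / r_n
        for k in range(n_states):
            x[k] += weight * sum(a[k][j] * v[j] for j in range(len(v)))
    return x

# Hypothetical two-parent example with two-state variables.
parent_1 = (1.0, [[0.9, 0.2], [0.1, 0.8]], [0.7, 0.3])
parent_2 = (3.0, [[1.0, 0.4], [0.0, 0.6]], [0.5, 0.5])
x_n = child_state_probs([parent_1, parent_2])
print(x_n, sum(x_n))
```

Because the weights r_n;i/r_n sum to 1 over the parents, the resulting state probabilities sum to 1 whenever the columns of each a-type matrix and each parent distribution do.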

Equation (1) can be applied repeatedly until the expression becomes a sum-of-products composed of {BX-, D-, A-, r-}-type events and parameters, which is the event expanding process. The probability of the expression can then be calculated by replacing the upper-case letters with the corresponding lower-case letters, as illustrated in Eqs. (1) and (2). The state probability distribution of BX_k can be calculated from $bx_{km} = sa_{km;kj} b_{km}$, where sa_km;kj is the zoom factor transforming b_km to bx_km (see (7) in Zhang et al. (2021) for details). Then, BX-type variables can be treated as root causes/diseases.

The evidence can be written as $E = \bigcap_{i} X_{ij}$. The diagnostic inference is to calculate the conditional probability Pr{BX_kj|E} = Pr{BX_kjE}/Pr{E}, BX_kj ∈ S_H, where S_H is the set of possible diseases conditional on E. To do so, E is expanded as a sum-of-products composed of {BX-, D-, A-, r-}-type events and parameters, applying logic computations such as absorption and exclusion together with the r-type parameter calculation.
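The ratio form Pr{BX_kj|E} = Pr{BX_kjE}/Pr{E} can be illustrated with a deliberately simplified sketch. It assumes that the hypotheses in S_H are mutually exclusive and that, under each hypothesis, Pr{BX_kjE} is just the prior times a product of f-type parameters; the full DUCG expansion with absorption and exclusion is not implemented here. The function name, disease labels and numbers are hypothetical.

```python
from math import prod

def rank_hypotheses(prior, likelihood):
    """Return Pr{H|E} for each hypothesis H under the simplifying
    assumptions stated above (not the full DUCG algorithm).

    prior: {hypothesis: bx value}
    likelihood: {hypothesis: [f value for each piece of evidence]}
    """
    joint = {h: prior[h] * prod(likelihood[h]) for h in prior}  # Pr{H E}
    pr_e = sum(joint.values())                                  # Pr{E}
    if pr_e == 0:
        raise ValueError("no hypothesis explains the evidence")
    return {h: joint[h] / pr_e for h in joint}

prior = {"disease A": 0.02, "disease B": 0.01}
likelihood = {"disease A": [0.8, 0.1], "disease B": [0.3, 0.6]}
posterior = rank_hypotheses(prior, likelihood)

# Scaling every prior by the same factor leaves the result unchanged,
# which is the sense in which only relative parameter values matter.
scaled = rank_hypotheses({h: 10 * p for h, p in prior.items()}, likelihood)
print(posterior, scaled)
```

Since numerator and denominator are scaled together, a common factor in the parameters cancels, which is consistent with the remark that experts only need to get relative values right.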

In general, Eq. (3) is satisfied, in which “1” denotes complete set.

$$\sum_{k} A_{nk;ij} = 1; \quad \sum_{j} B_{ij} = 1 \tag{3}$$

Based on Eq. (3), we have the following theorem expressed as Eq. (4).

Theorem 1

$$\sum_{k} X_{nk} = \sum_{k} \sum_{i,j} F_{nk;ij} V_{ij} = \sum_{i,j} (r_{n;i}/r_{n}) \Big(\sum_{k} A_{nk;ij}\Big) V_{ij} = 1 \tag{4}$$

This means that the causality chains in DUCG are self-relied. Therefore, we do not need to specify all parameters in a_n;i. For example, we may have Eq. (5).

$$a_{5;3} = \Pr\{A_{5;3}\} = \begin{pmatrix} a_{5,0;3,0} & a_{5,0;3,1} & a_{5,0;3,2} \\ a_{5,1;3,0} & a_{5,1;3,1} & a_{5,1;3,2} \\ a_{5,2;3,0} & a_{5,2;3,1} & a_{5,2;3,2} \end{pmatrix} = \begin{pmatrix} - & - & - \\ - & - & 0.9 \\ - & 0.2 & - \end{pmatrix} \tag{5}$$

This means that we can specify only the parameters of concern. In other words, for a variable whose state is normal (indexed by 0), we do not care about the causality and probability related to this state. What we are interested in is the causality between abnormal states. For example, a certain disease B_ij (j ≠ 0) causes a certain abnormal state X_nk (k ≠ 0), where X_n may represent a medical check result. We also do not care about the unconditional probability b_i0 (i.e., without disease). That is to say, b_i0, a_n0;ij and a_nk;i0 in the {a-, b-}-type matrices do not need to be given. Usually, we express b_i0, a_n0;ij and a_nk;i0 as “ − ” or blank, which is equivalent to the null set in expanding E.
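One way to hold a partially specified a-type matrix such as Eq. (5) is a sparse mapping that stores only the entries given by the experts. This is a minimal sketch of that idea, not the storage format of any DUCG software; the helper names are hypothetical.

```python
# Only the specified entries of a_5;3 from Eq. (5): (k, j) -> a_{5,k;3,j}.
a_5_3 = {(1, 2): 0.9, (2, 1): 0.2}

def a_param(matrix, k, j):
    """Return a_nk;ij, or None for an unspecified entry (the "-" cells)."""
    return matrix.get((k, j))

def contribution(matrix, k, j, v_j):
    """Term a_nk;ij * v_ij in an expansion; an unspecified entry behaves
    like the null set and therefore contributes nothing."""
    a = a_param(matrix, k, j)
    return 0.0 if a is None else a * v_j

print(a_param(a_5_3, 1, 2), a_param(a_5_3, 0, 0))
print(contribution(a_5_3, 2, 1, 0.5), contribution(a_5_3, 0, 1, 0.5))
```

Keeping "unspecified" distinct from a stored zero preserves the difference between a causality the experts chose not to describe and one they explicitly ruled out.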

The DUCG diagnostic inference is to calculate the probability distribution of BX_i as affected by the risk factors observed for a patient and then to calculate Pr{BX_kj|E} = Pr{BX_kjE}/Pr{E}, j ≠ 0, in which BX_kj ∈ S_H is composed of the abnormal states of BX-type variables. S_H is the set of possible diseases conditional on E, and is found by the logical expanding and simplification of DUCG. The appendix in Zhang et al. (2021) lists the DUCG simplification rules. The detailed inference algorithm can be found in the DUCG literature (Zhang 2012; Zhang and Zhang 2016; Zhang et al. 2021).

Introducing C-type Variables to Extend DUCG to Include Classification Relationship

The basic idea

Consider Fig. 5, where B₁ represents pituitary prolactin adenoma, X₂ indicates whether thyroid function is normal, X₃ indicates whether TSH (Thyroid Stimulating Hormone) is low, and X₄ indicates whether FT3 (free triiodothyronine) is low.

In Fig. 5, the hierarchy and relationships are clearly represented. It also embodies the medical knowledge of the disease, that is, pituitary prolactinoma (B₁) may cause abnormal thyroid function (X₂), and these abnormalities are manifested as TSH (X₃) and FT3 (X₄). However, problems are exposed when assigning values to the a-type matrices for each directed arc. Since A_2;1 is the causal event matrix between pituitary prolactinoma B₁ and thyroid function X₂, a_2,1;1,1 should be the probability of thyroid dysfunction caused by pituitary prolactinoma. Since A_3;2 is the causal event matrix representing the causality from thyroid function X₂ to TSH (X₃), a_3,1;2,1 should be the probability that thyroid dysfunction (X_2,1) triggers low TSH (X_3,1). Similarly, a_4,1;2,1 should be the probability that thyroid dysfunction (X_2,1) triggers low FT3 (X_4,1). But this is obviously wrong, because the real causal relationship is that X_3,1 and X_4,1 are the causes of X_2,1, not the opposite. At the same time, there is no direct causal relationship between B₁ and X₂; B₁ is related to X₂ only indirectly, through X₃ and X₄, and the direction is opposite to that drawn. With the representation in Fig. 5, the inference results of DUCG and the diagnoses of clinical experts will be inconsistent, because the knowledge of the clinical experts is actually as shown in Fig. 6. In other words, Fig. 5 is incorrect. This example illustrates how easily such a mistake can occur without classification variables.

Fig. 6 — The actual causal relationship about pituitary prolactinoma

To solve this problem, we introduce C-type variable along with I matrix as follows:

Definition 1

The state partition of the classification variable C_n is identical to that of its parent variable i, F_n;i is fixed as a unit matrix $I_{n;i} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & \vdots \\ \vdots & \vdots & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{pmatrix}$, and F_m;n is actually the causality between cause variable i and consequence variable m.

Equivalently, f_n;i = I_n;i, because f_n;i = Pr{F_n;i} = Pr{I_n;i} = I_n;i. Note that “1” in DUCG stands for both numerical one and complete set. With this definition, Fig. 6 can be better represented as Fig. 7.

In Fig. 7, according to Definition 1, f_2;1 = I_2;1, and f_3;2 and f_4;2 are equal to f_3;1 and f_4;1 in Fig. 6 respectively.

Theorem 2

In the sense of inference, the DUCG with C-type variable along with its corresponding I matrix is equivalent to the DUCG without C-type variables.

Theorem 2 constitutes the inference algorithm of the DUCG with C-type variables: we use the C-type variables along with their I matrices to construct the DUCG with C-type variables, while the corresponding DUCG without C-type variables is what is actually used in the DUCG inference. The latter is obtained from the former by (1) eliminating the C-type variables along with their I directed arcs and (2) connecting the cause of each C-type variable directly to its consequences, i.e., by simplifying Fig. 7 to Fig. 6. The inference equivalence is proved as follows:

Proof

First, we prove a simple case, i.e., that Figs. 6 and 7 are equivalent in inference. For this, we only need to prove that Pr{B₁X₃X₄} in Fig. 6 and in Fig. 7 are equal. According to Fig. 6 and Eq. (1), we have

$$\begin{aligned} \Pr\{B_{1} X_{3} X_{4}\} &= \Pr\{B_{1} (F_{3;1} B_{1} \cdot F_{4;1} B_{1})\} \\ &= \Pr\{(F_{3;1} * F_{4;1}) B_{1}\} \\ &= (f_{3;1} * f_{4;1}) b_{1} \end{aligned} \tag{6}$$

Here the operator “*” multiplies the corresponding elements of the two matrices, as defined in Corollary 151 in Zhang et al. (2014). According to Fig. 7 and Eq. (1), we have

$$\begin{aligned} \Pr\{B_{1} X_{3} X_{4}\} &= \Pr\{B_{1} (F_{3;2} C_{2} \cdot F_{4;2} C_{2})\} \\ &= \Pr\{B_{1} (F_{3;2} * F_{4;2}) C_{2}\} \\ &= \Pr\{B_{1} (F_{3;2} * F_{4;2}) I_{2;1} B_{1}\} \\ &= \Pr\{(F_{3;2} * F_{4;2}) B_{1}\} \\ &= (f_{3;2} * f_{4;2}) b_{1} \\ &= (f_{3;1} * f_{4;1}) b_{1} \end{aligned} \tag{7}$$

The last step in Eq. (7) holds because f_3;2 in Fig. 7 equals f_3;1 in Fig. 6, and f_4;2 in Fig. 7 equals f_4;1 in Fig. 6. Thus, Eq. (7) equals Eq. (6).

Obviously, the above proof also applies when the number of child variables of B₁ in Fig. 6 and of C₂ in Fig. 7 is increased, which covers all cases of Theorem 2. ■
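The equality of Eqs. (6) and (7) can also be checked numerically for individual state combinations. The sketch below evaluates Pr{B_1j X_3k X_4k'} once with B₁ as the direct parent and once through a C-type variable whose input arc is the unit matrix; the f-type matrices and the prior b₁ are hypothetical two-state values.

```python
def joint_without_c(f3, f4, b1, k3, k4, j):
    """Fig. 6 style: B1 is the direct parent of X3 and X4."""
    return f3[k3][j] * f4[k4][j] * b1[j]

def joint_with_c(f3, f4, b1, k3, k4, j):
    """Fig. 7 style: B1 -> C2 through the unit matrix, then C2 -> X3, X4.
    The sum runs over the states m of C2; the unit matrix keeps m == j."""
    n = len(b1)
    unit = [[1.0 if row == col else 0.0 for col in range(n)] for row in range(n)]
    return sum(f3[k3][m] * f4[k4][m] * unit[m][j] for m in range(n)) * b1[j]

# Hypothetical parameters: f[k][j] with child state k and parent state j.
f3 = [[0.7, 0.1], [0.3, 0.9]]
f4 = [[0.8, 0.4], [0.2, 0.6]]
b1 = [0.95, 0.05]

for k3 in range(2):
    for k4 in range(2):
        for j in range(2):
            print(k3, k4, j,
                  joint_without_c(f3, f4, b1, k3, k4, j),
                  joint_with_c(f3, f4, b1, k3, k4, j))
```

The two columns agree for every state combination because multiplying by the unit matrix only renames the parent state.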

According to Theorem 2, we can use Fig. 7 to express the hierarchical medical knowledge in the DUCG editor, automatically convert Fig. 7 into Fig. 6 for the invisible inference, and perform the inference according to Fig. 6.

More details are addressed as follows.
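The conversion from the visible graph to the one used for inference can be sketched as a small graph rewrite. The representation below (a dictionary from (parent, child) pairs to arc labels, with C-type nodes recognized by a name starting with "C") is an assumption made for illustration and is not the data model of the DUCG editor. It also assumes that every C-type node has exactly one parent and that no two C-type nodes share both parent and child.

```python
def eliminate_c_nodes(edges):
    """Remove C-type nodes and reconnect their parent to their children.

    edges: dict mapping (parent, child) -> arc label.
    Returns a new dict without C-type nodes or unit-matrix arcs.
    """
    def is_c(name):
        return name.startswith("C")

    # Each C-type node is assumed to have exactly one parent.
    parent_of_c = {child: parent for (parent, child) in edges if is_c(child)}

    def real_cause(node):
        while is_c(node):          # follow chains of C-type nodes upward
            node = parent_of_c[node]
        return node

    rewired = {}
    for (parent, child), label in edges.items():
        if is_c(child):
            continue               # drop the unit-matrix arc into a C node
        rewired[(real_cause(parent), child)] = label
    return rewired

# Topology described for Fig. 7: B1 -> C2 -> {X3, X4}.
with_c = {("B1", "C2"): "I_2;1", ("C2", "X3"): "F_3;2", ("C2", "X4"): "F_4;2"}
without_c = eliminate_c_nodes(with_c)
print(without_c)
```

The arc labels are carried over unchanged, reflecting that f_3;2 and f_4;2 in Fig. 7 take the same values as f_3;1 and f_4;1 in Fig. 6.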

Single parent

In Fig. 8, C₃ has more than one parent, where the real causalities that we want to represent are as shown in Fig. 9. However, Fig. 8 may cause some trouble.

Suppose evidence E = X_5,1X_6,2, and f_5;3 and f_6;3 are given as follows:

$f_{5;3} = \begin{pmatrix} - & - \\ - & f_{5,1;3,1} \end{pmatrix}$, $f_{6;3} = \begin{pmatrix} - & - \\ - & f_{6,1;3,1} \end{pmatrix}$.

Based on Fig. 8, we have f_3;1 = I_3;1 and f_3;2 = I_3;2 as defined. According to Eq. (1), we have

$$\begin{aligned} \Pr\{E\} &= \Pr\{X_{5,1} X_{6,2}\} \\ &= \Pr\{F_{5,1;3,1} C_{3,1} \cdot F_{6,2;3,1} C_{3,1}\} \\ &= \Pr\{(F_{5,1;3,1} * F_{6,2;3,1}) C_{3,1}\} \\ &= \Pr\Big\{(F_{5,1;3,1} * F_{6,2;3,1}) \Big(\frac{r_{3;1}}{r_{3}} I_{3,1;1} B_{1} + \frac{r_{3;2}}{r_{3}} I_{3,1;2} B_{2}\Big)\Big\} \\ &= \Pr\Big\{(F_{5,1;3,1} * F_{6,2;3,1}) \Big(\frac{r_{3;1}}{r_{3}} B_{1,1} + \frac{r_{3;2}}{r_{3}} B_{2,1}\Big)\Big\} \\ &= (f_{5,1;3,1} * f_{6,2;3,1}) \Big(\frac{r_{3;1}}{r_{3}} b_{1,1} + \frac{r_{3;2}}{r_{3}} b_{2,1}\Big) \end{aligned} \tag{8}$$

However, based on Fig. 9, we have

$$\begin{aligned} \Pr\{E\} &= \Pr\{X_{5,1} X_{6,2}\} \\ &= \Pr\{(F_{5,1;1} B_{1} + F_{5,1;2} B_{2}) (F_{6,1;1} B_{1} + F_{6,1;2} B_{2})\} \\ &= \Pr\{F_{5,1;1} B_{1} F_{6,1;1} B_{1} + F_{5,1;1} B_{1} F_{6,1;2} B_{2} \\ &\qquad + F_{5,1;2} B_{2} F_{6,1;1} B_{1} + F_{5,1;2} B_{2} F_{6,1;2} B_{2}\} \\ &= \Pr\{(F_{5,1;1} * F_{6,1;1}) B_{1} + F_{5,1;1} B_{1} F_{6,1;2} B_{2} \\ &\qquad + F_{5,1;2} B_{2} F_{6,1;1} B_{1} + (F_{5,1;2} * F_{6,1;2}) B_{2}\} \\ &= (f_{5,1;1} * f_{6,1;1}) b_{1} + f_{5,1;1} b_{1} f_{6,1;2} b_{2} \\ &\qquad + f_{5,1;2} b_{2} f_{6,1;1} b_{1} + (f_{5,1;2} * f_{6,1;2}) b_{2} \end{aligned} \tag{9}$$

Equation (9) is not equal to Eq. (8). To solve this problem, we have the following definition:

Definition 2

Each C-type variable can have only one parent variable, while different C-type variables may be the same in content.

Thus, Fig. 8 is changed to Fig. 10, in which C₃ = C₄. As defined, f_5;3, f_5;4, f_6;3 and f_6;4 in Fig. 10 are equal to f_5;1, f_5;2, f_6;1 and f_6;2 in Fig. 9 respectively.

Based on Fig. 10, we have Eq. (10).

$$\begin{aligned} \Pr\{E\} &= \Pr\{X_{5,1} X_{6,2}\} \\ &= \Pr\{(F_{5,1;3} C_{3} + F_{5,1;4} C_{4}) (F_{6,1;3} C_{3} + F_{6,1;4} C_{4})\} \\ &= \Pr\{F_{5,1;3} C_{3} F_{6,1;3} C_{3} + F_{5,1;3} C_{3} F_{6,1;4} C_{4} \\ &\qquad + F_{5,1;4} C_{4} F_{6,1;3} C_{3} + F_{5,1;4} C_{4} F_{6,1;4} C_{4}\} \\ &= \Pr\{(F_{5,1;3} * F_{6,1;3}) C_{3} + F_{5,1;3} C_{3} F_{6,1;4} C_{4} \\ &\qquad + F_{5,1;4} C_{4} F_{6,1;3} C_{3} + (F_{5,1;4} * F_{6,1;4}) C_{4}\} \\ &= \Pr\{(F_{5,1;3} * F_{6,1;3}) I_{3;1} B_{1} + F_{5,1;3} I_{3;1} B_{1} F_{6,1;4} I_{4;2} B_{2} \\ &\qquad + F_{5,1;4} I_{4;2} B_{2} F_{6,1;3} I_{3;1} B_{1} + (F_{5,1;4} * F_{6,1;4}) I_{4;2} B_{2}\} \\ &= \Pr\{(F_{5,1;3} * F_{6,1;3}) B_{1} + F_{5,1;3} B_{1} F_{6,1;4} B_{2} \\ &\qquad + F_{5,1;4} B_{2} F_{6,1;3} B_{1} + (F_{5,1;4} * F_{6,1;4}) B_{2}\} \\ &= (f_{5,1;3} * f_{6,1;3}) b_{1} + f_{5,1;3} b_{1} f_{6,1;4} b_{2} \\ &\qquad + f_{5,1;4} b_{2} f_{6,1;3} b_{1} + (f_{5,1;4} * f_{6,1;4}) b_{2} \\ &= (f_{5,1;1} * f_{6,1;1}) b_{1} + f_{5,1;1} b_{1} f_{6,1;2} b_{2} \\ &\qquad + f_{5,1;2} b_{2} f_{6,1;1} b_{1} + (f_{5,1;2} * f_{6,1;2}) b_{2} \end{aligned} \tag{10}$$

It is seen that Eq. (10) equals Eq. (9), which means that Fig. 10 is equivalent to Fig. 9 in the sense of inference. In conclusion, Fig. 8 is not allowed and Fig. 10 should be used.
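The change from Fig. 8 to Fig. 10 amounts to giving each parent its own copy of the C-type variable. The following sketch performs that split on a plain edge set; the function name, the edge representation and the way fresh node names are supplied are assumptions for illustration. Only the topology is handled; the outgoing parameters of each copy still have to be set to those of its own parent, as stated for Fig. 10.

```python
def split_c_node(edges, c_node, fresh_names):
    """Enforce Definition 2 for one C-type node.

    edges: set of (parent, child) pairs.
    c_node: name of a C-type node that may have several parents.
    fresh_names: names to use for the additional copies.
    """
    parents = sorted(p for p, c in edges if c == c_node)
    children = sorted(c for p, c in edges if p == c_node)
    if len(parents) <= 1:
        return set(edges)                      # already single-parent
    if len(fresh_names) < len(parents) - 1:
        raise ValueError("not enough fresh names for the copies")

    result = {e for e in edges if c_node not in e}
    copies = [c_node] + list(fresh_names[:len(parents) - 1])
    for parent, copy in zip(parents, copies):
        result.add((parent, copy))             # one unit-matrix arc per copy
        result.update((copy, child) for child in children)
    return result

# Topology described for Fig. 8: B1 and B2 both feed C3, which feeds X5, X6.
fig8 = {("B1", "C3"), ("B2", "C3"), ("C3", "X5"), ("C3", "X6")}
fig10 = split_c_node(fig8, "C3", ["C4"])
print(sorted(fig10))
```

After the split, C3 and C4 have the same content but each has a single parent, which is the form whose inference result matches Eq. (9).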

Figure 11 shows another case that cannot be represented by one C-type variable. According to Definition 2, the corresponding DUCG with C-type variables should be as shown in Fig. 12. It is easy to prove that Figs. 11 and 12 are equivalent to each other in inference.

Fig. 11 — The causalities between causes and consequences/indicators

Fig. 12 — The corresponding DUCG with C-type variables but different indicators

Normalizing paths

In practice, the repeated paths shown in Fig. 13 are possible. These repeated paths can be merged, that is, Fig. 13 can be calculated according to Fig. 14.

Figure 14 merges C₈ and C₉ in Fig. 13 into C₁₀. The calculation of the merged parameters is as follows:

First, I_8;4 and I_9;4 in Fig. 13 are merged as I_10;4 in Fig. 14. Next, F_5;10 in Fig. 14 is equal to the sum of F_5;8 = (r_5;8/r₅)A_5;8 and F_5;9 = (r_5;9/r₅)A_5;9 in Fig. 13 as shown in Eq. (11).

$$\begin{aligned} F_{5;10} &= F_{5;8} + F_{5;9} \\ f_{5;10} &= f_{5;8} + f_{5;9} \end{aligned} \tag{11}$$

Theorem 3

If a group of C-type variables share the same child variable and the same parent variable, the group can be merged into a single C-type variable along with a single I matrix. The merged F-type variable, which is the only output of the merged C-type variable, is the sum of the F-type variables that are the outputs of the C-type variables in the group.

Proof

Suppose the group of C-type variables is C_i, i ∈ S_C. They share a child variable X_n and a parent variable V_m. Let C_j be the merged C-type variable, j ∉ S_C, and let F_n;j be the merged F-type variable, i.e., the single output directed arc of the merged C-type variable. According to Eq. (1) and based on the original group of C-type variables, we have

$$X_{n} = \sum_{i \in S_{C}} F_{n;i} C_{i} = \sum_{i \in S_{C}} F_{n;i} I_{i;m} V_{m} = \Big(\sum_{i \in S_{C}} F_{n;i}\Big) V_{m} \tag{12}$$

Also, according to Eq. (1) but based on the merged C-type variable, we have

$$X_{n} = F_{n;j} C_{j} = F_{n;j} I_{j;m} V_{m} = F_{n;j} V_{m} \tag{13}$$

Setting Eq. (12) equal to Eq. (13), we have

$$F_{n;j} = \sum_{i \in S_{C}} F_{n;i} \tag{14}$$

Of course, the merged DUCG with C-type variable can be replaced in inference by the one without C-type variable.
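At the level of probabilities, the merge of Theorem 3 is an element-wise sum of the f-type matrices on the repeated paths, as in Eq. (11). A minimal sketch, with a hypothetical function name and hypothetical values for f_5;8 and f_5;9 (unspecified entries are written as 0.0 here):

```python
def merge_c_arcs(f_matrices):
    """Sum the f-type matrices of C-type variables that share the same
    parent and the same child; all matrices must have the same shape."""
    rows, cols = len(f_matrices[0]), len(f_matrices[0][0])
    return [[sum(f[k][j] for f in f_matrices) for j in range(cols)]
            for k in range(rows)]

# f = (r_n;i / r_n) a, so each matrix already carries its own weight.
f_5_8 = [[0.0, 0.0], [0.0, 0.3]]
f_5_9 = [[0.0, 0.0], [0.0, 0.2]]
f_5_10 = merge_c_arcs([f_5_8, f_5_9])
print(f_5_10)
```

Because each f-type matrix is already weighted by r_n;i/r_n, the sum needs no further normalization.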

The Third-Party Verifications

To verify the diagnostic precisions and generalization ability of DUCG, we constructed six DUCG knowledge bases according to six chief complaints respectively, in which the C-type variables were used.

Construction of DUCG with C-type variables

The construction steps are as follows.

Step 1 Determine the diseases that may cause the chief complaint across hospital departments. The diseases are therefore not limited to a specific hospital department, and triage may not be necessary, although a DUCG triage methodology has been presented in Bu et al. (2020).

Step 2 Construct the subgraph for every disease determined in step 1 as illustrated in Figs. 15 and 16 in which the symbols are described in Table 2. In subgraphs, the interpretability of DUCG knowledge bases is well demonstrated.

Fig. 15 — The subgraph with C-type variables for Lyme disease under chief complaint arthralgia

Fig. 16 — The subgraph with C-type variables for polymyositis under chief complaint arthralgia

Table 2.

Descriptions of the symbols in Figs. 15 and 16

Symbol	Variable	n	Description
	B_n and BX_n	3	Lyme disease
		11	Polymyositis
	X_n	7	Erythema migrans
		12	ECG shows cardiac block
		16	Radiculopathy
		17	Experience of field travelling
		36	ESR
		37	CRP
		38	Sex
		40	Conjunctivitis ANA
		58	Conjunctivitis ANA
		60	RF
		62	WBC
		70	HGB
		85	Skin rash
		89	Splenomegaly
		91	Arthralgia (acute or chronic)
		92	Arthralgia (large or small joint)
		93	Arthralgia (axis or peripheral)
		94	Arthralgia (self-limited or aggravating)
		95	CSF-WBC
		96	CSF-P
		99	CSF-PRO
		100	Abnormal ultrasonocardiography
		101	Headache
		102	Nausea
		103	Vomit
		104	Mental disorders
		105	Facial palsy
		106	Meningeal irritation sign
		110	Lymphadenectasis
		111	Hepatomegaly
		140	Chest CT shows interstitial pneumonia
		144	Testis swelling
		145	Borrelia burgdorferi-IgG
		146	Fever
		147	Cerebellar ataxia
		149	AST or ALT
		150	TBIL
		151	DBIL
		158	Myalgia
		161	Dysphagia
		162	Myasthenia
		165	Facet joint of hand pathological change
		172	Arthralgia (quantity)
		175	Limbs proximal myasthenia
		176	Weight loss
		178	Electromyogram shows myogenic muscular atrophy
		179	CK
		180	Dyspnea
		209	Anorexia
	C_n	29	Symptom
		30	Sign
		31	Brucella culture
		32	Other imaging tests
		33	CT
		34	Blood biochemical test
		35	Anti-MCV antibody
		71	PLT
		72	CT shows sacroiliac joint injury
		73	ECG
		74	Ultrasonocardiography
		75	Rheumatic test
		76	Blood RT
		77	CSF RT
		78	CSF biochemical test
		79	Virus and infection related test
		80	Autoimmune antibody test
	SX_n	13	Muscle biopsy shows myositis

Step 3 Synthesize the subgraphs under the same chief complaint into one DUCG by fusing the variables that are the same in different subgraphs. For example, the synthesized arthralgia DUCG is shown in Fig. 17.

Fig. 17 — The DUCG including 23 diseases that may cause arthralgia
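The fusion in Step 3 can be pictured as a set union over subgraphs. The sketch below is only an illustration under simplifying assumptions: a subgraph is reduced to a set of variable names and a set of (parent, child) arcs, the `synthesize` helper is hypothetical, and the arcs listed are invented for the example rather than read from Figs. 15 and 16. The DUCG parameters attached to the arcs are ignored.

```python
def synthesize(subgraphs):
    """Fuse subgraphs by taking the union of their variables and arcs.

    Variables with the same name in different subgraphs become one node.
    """
    variables, arcs = set(), set()
    for graph in subgraphs:
        variables |= graph["variables"]
        arcs |= graph["arcs"]
    return {"variables": variables, "arcs": arcs}

# Variable names follow Table 2 (B3: Lyme disease, B11: polymyositis,
# X7: erythema migrans, X91: arthralgia, X158: myalgia).
# The arcs themselves are illustrative assumptions.
lyme = {
    "variables": {"B3", "X7", "X91"},
    "arcs": {("B3", "X7"), ("B3", "X91")},
}
polymyositis = {
    "variables": {"B11", "X158", "X91"},
    "arcs": {("B11", "X158"), ("B11", "X91")},
}

merged = synthesize([lyme, polymyositis])
parents_of_x91 = {parent for parent, child in merged["arcs"] if child == "X91"}
print(sorted(merged["variables"]), sorted(parents_of_x91))
```

After fusion the shared variable X91 appears once and has both diseases as parents, which is the effect described in Step 3.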

Verifications, precisions and comparisons

After the DUCG construction, we carefully tested its correctness using the case records in the hospital information systems (HISs) of the knowledge base constructors' hospitals, as illustrated in Zhang et al. (2021). Then, two groups of third-party verifications of the six DUCG knowledge bases were performed independently to verify the generalization ability and diagnostic precision of DUCG. The verifications by Group 1 cover more diseases than those by Group 2, because Group 2 performed its verifications earlier, when fewer diseases were considered. However, the diseases in Group 2 are all included in Group 1, so the results of the two groups can be compared on the same scale. The verifications were performed as follows:

Under each chief complaint, search the cases recorded in the HISs of the third-party hospitals for each disease.
From all the cases found for each disease, randomly select no more than 10 cases for testing.
Check each selected case record to ensure that it is of high quality; otherwise, discard the case and make a new selection.
Manually input the evidence found in the tested case record into the DUCG cloud platform developed to implement the DUCG methodology.
Run the DUCG diagnosis function on the platform to find the possible diseases and rank them according to their conditional probabilities.
Compare the diagnosed diseases with the tested case record. If the diagnosed diseases with significant conditional probabilities cover the diseases in the record, and the clinical experts confirm that the diseases not in the record (if any) are also reasonable, label this tested case as “correct”; otherwise, label it as “incorrect.” In fact, because of the uncertain quality, norms and format of the records, it was not easy to judge the correctness. In confusing cases, discussions with clinical experts were the final means of making judgements.
Calculate the precision for each disease as the number of correct cases divided by the total number of tested cases of the disease.
Calculate the total precision for the DUCG of the chief complaint as the total number of correct cases divided by the total number of tested cases under the chief complaint.
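The sampling and precision steps above can be sketched as follows. This is a minimal illustration, not the platform's code: the function names are hypothetical, the quality check and the expert judgement are not modelled, and the example counts are the Group 1 figures for four diseases taken from Table 6.

```python
import random

MAX_CASES_PER_DISEASE = 10  # cap stated in the verification procedure

def select_cases(case_ids, rng, limit=MAX_CASES_PER_DISEASE):
    """Randomly pick at most `limit` cases; take all when fewer exist."""
    if len(case_ids) <= limit:
        return list(case_ids)
    return rng.sample(case_ids, limit)

def precision(correct, tested):
    """Percentage of correct cases, or None when nothing was tested."""
    if tested == 0:
        return None
    return 100.0 * correct / tested

def total_precision(results):
    """Pool (correct, tested) counts over all diseases of a chief complaint."""
    correct = sum(c for c, _ in results.values())
    tested = sum(t for _, t in results.values())
    return precision(correct, tested)

# (correct, tested) for Group 1, copied from Table 6.
results = {
    "Asthma": (8, 10),
    "Bronchitis": (9, 10),
    "Obesity": (5, 5),
    "Laryngospasm": (0, 0),  # no case found, so no precision
}

sampled = select_cases(list(range(2294)), random.Random(0))
per_disease = {name: precision(c, t) for name, (c, t) in results.items()}
print(len(sampled), per_disease, total_precision(results))
```

A disease with no tested case returns `None` and contributes nothing to the pooled total, which matches how such diseases are left blank in the tables.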

As an example, the arthralgia DUCG verified in Group 1 is shown in Fig. 17. A total of 23 diseases are listed in Table 3, including the 16 diseases in Group 2. The verification results are shown in Tables 4 and 5. The results for the other five chief complaints are given in Tables 6, 7, 8, 9 and 10, respectively, in the Appendix. The total precisions from the two groups are listed and compared in Table 5. Note that the precisions from Group 2 are all 100%.

Table 3.

The 23 diseases that may cause arthralgia, in which the diseases with “*” are not included in Group 2

Variable index	Disease	Abbreviation
1	Pseudogout
2	Reactive arthritis
3	Lyme disease
4	Rheumatoid arthritis	RA
5	Gout
6	Adult Still's disease	AOSD
7	Systemic lupus erythematosus	SLE
8	Sjögren's syndrome	SS
9	Osteoarthritis	OA
10	Ankylosing spondylitis	AS
11	Polymyositis
12	Infectious arthritis
13	Systemic sclerosis	SSc
14	Psoriatic arthritis	PsA
15	Brucellosis
16	Tuberculosis	TB
39	Trauma*
40	Relapsing polychondritis*	RPC
41	Polymyalgia arteritica*	PMR
42	Vasculitis*
43	Sarcoidosis*
44	Sports injury*
46	Rheumatic fever*

Table 5.

The precisions of the third-party verifications for the six chief complaints, wherein the diseases in Group 2 are covered in Group 1

Chief complaint	Number of diseases in Group 1; Group 2	Number of total cases recorded in Group 1; Group 2	Randomly tested cases in Group 1; Group 2	Total precision in Group 1; Group 2 (%)	The lowest precision of a disease in Group 1; Group 2 (%)
arthralgia	23; 16	6766; 14,089	133; 132	99.25; 100	90; 100
dyspnea	28; 28	25,959; 65,834	202; 216	96.53; 100	80; 100
cough and expectoration	32; 28	102,935; 62,250	220; 223	99.55; 100	90; 100
epistaxis	24; 19	2033; 5913	137; 131	97.81; 100	90; 100
fever with rash	59; 17	13,290; 7935	386; 94	99.48; 100	90; 100
abdominal pain	99; 44	29,085; 35,631	612; 383	98.37; 100	83; 100

Table 6.

Diagnostic precisions of DUCG for dyspnea, in which no more than 10 cases are randomly selected for each disease and “*” indicates diseases only in Group 1

Disease	Total cases in Group 1; Group 2	Randomly selected and tested cases in Group 1; Group 2	Correct diagnoses in Group 1; Group 2	Precision in Group 1; Group 2 (%)
Carbon monoxide poisoning	58; 128	10; 10	10; 10	100; 100
Metabolic acidosis	9; 13	9; 10	9; 10	100; 100
HCM	10; 100	10; 10	9; 10	90; 100
Pulmonary infection	10; 18,421	10; 10	9; 10	90; 100
PAH	559; 70	10; 10	10; 10	100; 100
Interstitial lung disease	296; 0	10; 0	10;	100;
Pulmonary alveolar proteinosis	1; 2	1; 2	1; 2	100; 100
PE	101; 1080	10; 10	10; 10	100; 100
Heart failure	429; 2108	10; 10	9; 10	90; 100
HPS	0; 3	0; 3	; 3	; 100
DCM	151; 1330	10; 10	9; 10	90; 100
Anemia	3871; 3710	10; 10	10; 10	100; 100
Renal failure	1099; 2065	10; 10	10; 10	100; 100
Constrictive pericarditis	7; 67	7; 10	7; 10	100; 100
Pericardial effusion	300; 185	10; 10	10; 10	100; 100
Hemochromatosis	0; 1	0; 1	; 1	; 100
End-stage tumor	9; 220	9; 10	9; 10	100; 100
COPD	13,900; 12,872	10; 10	10; 10	100; 100
Laryngospasm	0; 0	0; 0	;	;
Foreign body in air passage	40; 146	10; 10	10; 10	100; 100
Obesity	5; 3	5; 3	5; 3	100; 100
Scoliosis	9; 306	9; 10	9; 10	100; 100
Pleural effusion	1469; 11,366	10; 10	10; 10	100; 100
Asthma	2294; 2409	10; 10	8; 10	80; 100
Bronchitis	1330; 9066	10; 10	9; 10	90; 100
Guillain–Barre syndrome	0; 7	0; 7	; 7	; 100
Myasthenia gravis	2; 156	2; 10	2; 10	100; 100
Psychology	0; 0	0; 0	;	;
Total	25,959; 65,834	202; 216	195; 216	96.53; 100

COPD: chronic obstructive pulmonary disease; HCM: hypertrophic cardiomyopathy; PAH: pulmonary artery hypertension; PE: pulmonary embolism; DCM: dilated cardiomyopathy; HPS: hepatopulmonary syndrome

Table 7.

Diagnostic precisions of DUCG for cough and expectoration, in which no more than 10 cases are randomly selected for each disease and “*” indicates diseases only in Group 1

Disease	Total cases in Group 1; Group 2	Randomly selected and tested cases in Group 1; Group 2	Correct diagnoses in Group 1; Group 2	Precision in Group 1; Group 2 (%)
Subacute thyroiditis	3; 33	3; 10	3; 10	100; 100
Pulmonary tuberculosis	1342; 1501	10; 10	10; 10	100; 100
Pneumothorax	951; 673	10; 10	10; 10	100; 100
Pulmonary abscess	453; 65	10; 10	10; 10	100; 100
Acute bronchitis	123; 1426	10; 10	10; 10	100; 100
Chronic bronchitis	2656; 5024	10; 10	10; 10	100; 100
Primary bronchogenic carcinoma	1023; 7430	10; 10	10; 10	100; 100
Bronchiectasis	6361; 2806	10; 10	10; 10	100; 100
COPD	129; 3245	10; 10	10; 10	100; 100
Pulmonary thromboembolism	18; 11	10; 10	10; 10	100; 100
Chronic pulmonary heart disease	6053; 2442	10; 10	10; 10	100; 100
Sarcoidosis	5; 675	5; 5	5; 5	100; 100
CVA	1; 276	1; 10	1; 10	100; 100
Upper respiratory tract infection	577; 4697	10; 10	10; 10	100; 100
Pneumonia	78,352; 16,721	10; 10	10; 10	100; 100
Heart failure	350; 6427	10; 10	10; 10	100; 100
Pleural effusion	37; 4428	10; 10	10; 10	100; 100
Bronchial asthma	4235; 2831	10; 10	9; 10	90; 100
Pericardial diseases	39; 50	10; 10	10; 10	100; 100
Vocal cord dysfunction syndrome	0; 286	0; 10	0; 10	0; 100
Nasal polyp	3; 1190	3; 10	3; 10	100; 100
IPF	49; 8	10; 8	10; 8	100; 100
Upper airway cough syndrome	0; 5	0; 5	0; 5	0; 100
Psychogenic cough	0; 0	0; 0	0; 0	0; 0
Tracheal collapse syndrome	0; 0	0; 0	0; 0	0; 0
EB	3; 0	3; 0	3; 0	100; 0
Reflux esophagitis	41; 0	10; 0	10; 0	100; 0
Diaphragmatic abnormalities	0; 0	0; 0	0; 0	0; 0
Sub-Total	102,804; 62,250	195; 223	194; 223	99.49; 100
Tracheobronchial foreign body*	109;	10;	10;	100;
Mediastinal lesions*	17;	10;	10;	100;
Diffuse interstitial lung disease*	2;	2;	2;	100;
Coronavirus disease 2019*	3;	3;	3;	100;
Total	102,935; 62,250	220; 223	219; 223	99.55; 100

COPD: chronic obstructive pulmonary disease; CVA: cough variant asthma; IPF: idiopathic pulmonary fibrosis; EB: eosinophilic bronchitis

Table 8.

Diagnostic precisions of DUCG for epistaxis, in which “*” indicates only in Group 1

Disease	Total cases in Group 1; Group 2	Randomly selected and tested cases in Group 1; Group 2	Correct diagnoses in Group 1; Group 2	Precision in Group 1; Group 2 (%)
Malignant tumor of nasal cavity and paranasal sinus	7; 109	7; 10	7; 10	100; 100
Hemorrhagic nasal polyps	3; 94	3; 10	3; 10	100; 100
Nasal bone fracture	136; 26	10; 10	10; 10	100; 100
Fungal maxillary sinusitis	10; 26	10; 10	10; 10	100; 100
Acute leukemia	34; 631	10; 10	10; 10	100; 100
Inverting papilloma	15; 24	10; 10	10; 10	100; 100
Epistaxis	1089; 436	10; 10	10; 10	100; 100
Deviation of nasal septum	572; 870	10; 10	10; 10	100; 100
Nasal angioma	14; 80	10; 10	10; 10	100; 100
ITP	31; 562	10; 10	9; 10	90; 100
Maxillary sinus carcinoma	4; 138	4; 10	4; 10	100; 100
Nasopharyngeal carcinoma	85; 2906	10; 10	10; 10	100; 100
Ethmoid sinus fracture	0; 4	0; 4	; 4	; 100
Ethmoid sinus carcinoma	0; 3	0; 3	; 3	; 100
Atrophic rhinitis	8; 2	8; 2	8; 2	100; 100
HT	0; 2	0; 2	; 2	; 100
Foreign body in nasal cavity	3; 0	3; 0	3;	100;
Nasopharyngeal angiofibroma	1; 0	1; 0	1;	100;
Frontal sinus fracture	0; 0	0; 0	;	;
Sub-Total	2012; 5913	116; 131	115; 131	99.14; 100
AA*	8;	8;	8;	100;
MDS*	2;	2;	2;	100;
Hemophilia*	1;	1;	1;	100;
Hepatopathy*	10;	10;	8;	80;
Leptospirosis*	0;	0;	;	;
Total	2033; 5913	137; 131	134; 131	97.81; 100

ITP: Idiopathic thrombocytopenic purpura; HT: Hemorrhagic telangiectasia; AA: aplastic anemia; MDS: myelodysplastic syndrome

Table 9.

Diagnostic precisions of DUCG for fever with rash, in which “*” indicates only in Group 1

Disease	Total cases in Group 1; Group 2	Randomly selected and tested cases in Group 1; Group 2	Correct diagnoses in Group 1; Group 2	Precision in Group 1; Group 2 (%)
Exanthema subitum	102; 44	10; 10	10; 10	100; 100
Hand foot mouth disease	200; 24	10; 10	10; 10	100; 100
Varicella	642; 93	10; 10	10; 10	100; 100
Infectious mononucleosis	8; 1019	8; 8	8; 8	100; 100
Herpes zoster	2348; 6652	10; 10	10; 10	100; 100
Measles	309; 69	10; 10	10; 10	100; 100
Dengue fever	2; 8	2; 8	2; 8	100; 100
Rubella	7; 7	7; 7	7; 7	100; 100
Herpetic angina	23; 6	10; 6	10; 6	100; 100
Scarlet fever	16; 6	10; 6	10; 6	100; 100
Typhoid fever	0; 6	0; 6	; 6	; 100
Tsutsugamushi disease	0; 1	0; 1	; 1	; 100
Hemorrhagic fever with renal syndrome	1; 0	1; 0	1;	100;
Epidemic cerebrospinal meningitis	0; 0	0; 0	;	;
Epidemic typhus	0; 0	0; 0	;	;
Endemic typhus	0; 0	0; 0	;	;
Paratyphoid fever	0; 0	0; 0	;	;
Sub-Total	3658; 7935	88; 94	88; 94	100; 100
HIV*	0;	0;	;	;
Systemic lupus erythematosus*	122;	10;	10;	100;
AOSD*	1;	1;	1;	100;
PM*	1;	1;	1;	100;
SS*	13;	10;	10;	100;
Erysipelas*	192;	10;	10;	100;
Rheumatic fever*	5;	5;	5;	100;
ANCA associated vasculitis*	0;	0;	;	;
Polyarteritis nodosa*	0;	0;	;	;
Nodular panniculitis*	8;	8;	8;	100;
APS*	3;	3;	3;	100;
Infective endocarditis*	1;	1;	1;	100;
Anthrax*	0;	0;	;	;
Contact dermatitis*	93;	10;	10;	100;
Melanoma*	91;	10;	10;	100;
Urticaria*	257;	10;	10;	100;
Drug induced rash*	142;	10;	10;	100;
Stevens Johnson syndrome*	0;	0;	;	;
Anaphylactoid purpura*	508;	10;	10;	100;
Vitiligo*	27;	10;	10;	100;
Scleroderma*	8;	8;	8;	100;
Furuncle*	380;	10;	10;	100;
Condyloma acuminatum*	13;	10;	10;	100;
Dermatophytosis*	124;	10;	10;	100;
Prurigo*	59;	10;	10;	100;
Syphilis*	9;	9;	9;	100;
Erythroderma*	36;	10;	10;	100;
Molluscum contagiosum*	17;	10;	10;	100;
Seborrheic dermatitis*	5919;	10;	10;	100;
Folliculitis*	35;	10;	9;	90;
Genital herpes*	6;	6;	6;	100;
Impetigo*	51;	10;	10;	100;
Eczema*	794;	10;	10;	100;
Cutaneous squamous cell carcinoma*	7;	7;	7;	100;
Psoriasis*	238;	10;	10;	100;
Wart*	288;	10;	9;	90;
Behcet syndrome*	7;	7;	7;	100;
Acne*	67;	10;	10;	100;
Amyloidosis cutis*	46;	10;	10;	100;
Typhus fever*	0;	0;	;	;
Carbuncle*	62;	10;	10;	100;
Cirrhosis*	2;	2;	2;	100;
Total	13,290; 7935	386; 94	384; 94	99.48; 100

AOSD: Adult still's disease; PM: Polymyositis; SS: Sjögren's syndrome; APS: antiphospholipid syndrome

Table 10.

Diagnostic precisions of DUCG for abdominal pain, in which “*” indicates only in Group 1

Disease	Total cases in Group 1; Group 2	Randomly selected and tested cases in Group 1; Group 2	Correct diagnoses in Group 1; Group 2	Precision in Group 1; Group 2 (%)
Crohn's disease	8; 97	8; 10	7; 10	87.5; 100
Ulcerative colitis	125; 84	10; 10	9; 10	90; 100
Tuberculous peritonitis	10; 12	10; 10	9; 10	90; 100
Chronic pancreatitis	174; 12	10; 10	9; 10	90; 100
Acute pancreatitis	2124; 700	10; 10	10; 10	100; 100
Chronic cholecystitis	357; 615	10; 10	10; 10	100; 100
Acute cholecystitis	784; 342	10; 10	10; 10	100; 100
Anaphylactoid purpura	322; 461	10; 10	10; 10	100; 100
Ectopic pregnancy	461; 371	10; 10	10; 10	100; 100
Urinary calculi	2888; 503	10; 10	10; 10	100; 100
Angina pectoris	3; 522	3; 10	3; 10	100; 100
Renal failure	210; 897	10; 10	10; 10	100; 100
Chronic gastritis	3020; 1197	10; 10	10; 10	100; 100
Acute gastritis	457; 162	10; 10	10; 10	100; 100
Hepatitis	249; 410	10; 10	10; 10	100; 100
Saturnism	0; 10	0; 10	; 10	; 100
Gastric volvulus	2; 15	2; 10	2; 10	100; 100
Liver abscess	302; 74	10; 10	10; 10	100; 100
Gastrointestinal neurosis	137; 11	10; 10	9; 10	90; 100
Reflux esophagitis	933; 36	10; 10	10; 10	100; 100
Intestines and stomach cramps	2; 231	2; 10	2; 10	100; 100
Gastric or duodenal ulcer	255; 4893	10; 10	10; 10	100; 100
Pancreatic carcinoma	359; 85	10; 10	10; 10	100; 100
Herpes zoster	19; 232	10; 10	10; 10	100; 100
Liver carcinoma	443; 5900	10; 10	10; 10	100; 100
Hepatic rupture	49; 24	10; 10	10; 10	100; 100
Splenic rupture	311; 46	10; 10	10; 10	100; 100
Abdominal wall abscess	7; 21	7; 10	7; 10	100; 100
Hepatolithiasis	1242; 111	10; 10	10; 10	100; 100
Gastric carcinoma	297; 85	10; 10	10; 10	100; 100
Intraperitoneal tumor rupture	11; 12	10; 10	9; 10	90; 100
Myocardial infarction	65; 2185	10; 10	10; 10	100; 100
Gastrointestinal perforation	585; 121	10; 10	10; 10	100; 100
Appendicitis	1545; 10,368	10; 10	10; 10	100; 100
Enteritis	400; 4738	10; 10	10; 10	100; 100
Ischemic bowel disease	47; 25	4; 10	4; 10	100; 100
Diabetic ketoacidosis	102; 9	10; 9	10; 9	100; 100
Autoimmune pancreatitis	1; 6	1; 6	1; 6	100; 100
Eosinophilic gastroenteritis	5; 4	5; 4	5; 4	100; 100
Acute hemorrhagic necrotizing enteritis	0; 2	0; 2	; 2	; 100
Intestinal obstruction	2038; 1	10; 1	10; 1	100; 100
Porphyria	0; 1	0; 1	; 1	; 100
Pulmonary infection	1; 0	1; 0	1;	100;
Bile duct infection	3; 0	3; 0	3;	100;
Sub-Total	20,353; 35,631	346; 383	340; 383	98.27; 100
Fatty liver*	5107;	10;	10;	100;
Intussusception*	1091;	10;	10;	100;
Pelvic inflammatory disease*	818;	10;	9;	90;
Colorectal cancer*	503;	10;	10;	100;
Endometriosis*	234;	10;	9;	90;
Cirrhosis*	176;	10;	10;	100;
Carcinoma of gallbladder*	160;	10;	10;	100;
Cholangiocarcinoma*	126;	10;	10;	100;
Torsion of ovary and oviduct*	125;	10;	10;	100;
Oviduct ovarian abscess*	79;	10;	10;	100;
Acute cholecystitis with gangrene and perforation*	45;	10;	9;	90;
Dissection of aorta*	43;	10;	10;	100;
Acute pyelonephritis*	27;	10;	10;	100;
Systemic lupus erythematosus*	26;	10;	10;	100;
Renal infarction*	21;	10;	10;	100;
Small intestine tumor*	18;	10;	10;	100;
Portal vein thrombosis*	17;	10;	10;	100;
Splenic infarction*	17;	10;	10;	100;
Placental abruption*	17;	10;	10;	100;
Acute cystitis*	16;	10;	10;	100;
Acute urinary retention*	9;	9;	9;	100;
Intestinal tuberculosis*	6;	6;	5;	83;
Diverticular disease of colon*	6;	6;	6;	100;
Rupture of abdominal aortic aneurysm*	6;	6;	6;	100;
Spermatic cord torsion*	6;	6;	6;	100;
Dialysis related peritonitis*	4;	4;	4;	100;
Gallstone*	3;	3;	3;	100;
Pulmonary embolism*	3;	3;	3;	100;
Ovarian hyperstimulation syndrome*	3;	3;	3;	100;
Rupture of ovarian cyst*	3;	3;	3;	100;
Gastrointestinal neuroendocrine neoplasm*	3;	3;	3;	100;
Behcet's disease*	2;	2;	2;	100;
Splenic abscess*	2;	2;	2;	100;
Hepatic hydatid disease*	2;	2;	2;	100;
Chronic adrenocortical hypofunction*	2;	2;	2;	100;
Small intestinal diverticulum*	1;	1;	1;	100;
Acute adrenocortical hypofunction*	1;	1;	1;	100;
Antiphospholipid syndrome*	1;	1;	1;	100;
Costochondritis*	1;	1;	1;	100;
Gastrointestinal stromal tumor*	1;	1;	1;	100;
Spontaneous bacterial peritonitis*	1;	1;	1;	100;
Other abdominal tuberculosis*	0;	0;	;	;
Gastrointestinal diverticulum*	0;	0;	;	;
Esophageal diverticulum*	0;	0;	;	;
Takayasu arteritis*	0;	0;	;	;
Celiac artery compression syndrome*	0;	0;	;	;
Wilson's disease*	0;	0;	;	;
Gangrenous pyoderma*	0;	0;	;	;
Acute bacillary dysentery*	0;	0;	;	;
Polyarteritis nodosa*	0;	0;	;	;
Thallium poisoning*	0;	0;	;	;
Eosinophilic granulomatous polyangiitis*	0;	0;	;	;
Rheumatoid vasculitis*	0;	0;	;	;
Lactose intolerance*	0;	0;	;	;
Pseudomembranous enteritis*	0;	0;	;	;
Total	29,085; 35,631	612; 383	602; 383	98.37; 100

It can be seen that the total precisions of the six DUCGs from the two groups are very close to each other and no less than 96.5%, and that the precision for every disease is no less than 80%. The precision difference between the two groups is no more than |96.53 − 100|% = 3.47%. The mean precision difference over the six chief complaints is:

\frac{|99.12 - 100| + |96.53 - 100| + |99.49 - 100| + |99.14 - 100| + |100 - 100| + |98.27 - 100|}{6} = 1.24\%
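The arithmetic can be reproduced directly. In this short check the six Group 1 values are copied from the expression above, and their order is assumed to follow the chief complaints in Table 5. The values 99.49, 99.14, 100 and 98.27 match the Sub-Total rows of Tables 7 to 10, that is, the precisions over the diseases covered by both groups.

```python
# Group 1 precisions as they appear in the mean-difference expression.
group1 = [99.12, 96.53, 99.49, 99.14, 100.0, 98.27]
# Group 2 precisions are all 100% according to Table 5.
group2 = [100.0] * len(group1)

diffs = [abs(a - b) for a, b in zip(group1, group2)]
mean_diff = sum(diffs) / len(diffs)
max_diff = max(diffs)

print(round(mean_diff, 2), round(max_diff, 2))
```

Rounded to two decimals, the mean difference is 1.24 and the largest single difference is 3.47, in agreement with the text.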

Verification discussions

For some relatively rare diseases, the case records were less than 10. In such cases, all the qualified records were selected. If there was no case found, the precision of this disease could not be calculated and was not considered in the precision calculations.

We believe that testing no more than 10 randomly selected cases per disease is sufficient for the verifications, because 10 cases can cover most knowledge points related to a disease. If the knowledge base is correct, the test results will be correct regardless of how many cases are tested. Given the total number of cases, if we raised the number of tested cases per disease, only the tested cases of common diseases would increase, and their results would likely be correct, while the tested cases of rare diseases could not increase because of the lack of cases. This would lead to an improperly high total precision. The scientific way to perform the verification is to keep the numbers of tested cases as equal as possible across all diseases. As a balance, we chose to test no more than 10 cases per disease.
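The effect of the cap on the pooled precision can be illustrated with a toy calculation. The numbers below are entirely hypothetical and are not taken from the verification data: a common disease with 1000 records that is always diagnosed correctly, and a rare disease with 5 records of which 80% are diagnosed correctly.

```python
def pooled_precision(diseases, cap):
    """Total precision when at most `cap` cases are tested per disease.

    `diseases` is a list of (available_cases, per_case_correct_rate).
    """
    correct = 0.0
    tested = 0
    for available, rate in diseases:
        n = min(available, cap)  # rare diseases cannot reach the cap
        tested += n
        correct += n * rate
    return 100.0 * correct / tested

# Hypothetical example: (available records, correct rate).
diseases = [(1000, 1.0), (5, 0.8)]

with_cap_10 = pooled_precision(diseases, cap=10)
with_cap_100 = pooled_precision(diseases, cap=100)
print(round(with_cap_10, 2), round(with_cap_100, 2))
```

Raising the cap from 10 to 100 lifts the pooled figure from about 93.33% to about 99.05% without any change in the per-disease behaviour, which is the inflation the paragraph above warns against.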

A so-called “rare” disease is one that is rare under the chief complaint. That a disease is rare under one chief complaint does not mean that it is also rare under other chief complaints.

It is easy to understand that only the discharged patient case records meet the high-quality requirement (the recorded information was sufficient and diagnosis was correct) for the third-party verifications. We did not use the outpatient case records for verifications, because it was hard to judge whether the outpatient diagnoses were correct or not. In general, the case record for a discharged patient contains more medical information than the case record of an outpatient. How to verify the diagnostic precision of DUCG conditional on less information for an outpatient is another issue and will be addressed elsewhere.

Summary and discussions

The C-type variables are used only in the DUCG construction. Without C-type variables, it is hard to organize and interpret the DUCG knowledge base well, and mistakes occur easily. The inference is based on the DUCG without C-type variables, which is automatically generated from the DUCG with C-type variables and is invisible.

Two groups of independent verifications of the six DUCG knowledge bases corresponding to six chief complaints verify that DUCG has strong generalization ability, which means that DUCG can be applied in any real application scenario with almost the same precision. This is because of the knowledge invariance.

The diagnostic interpretability of DUCG is provided by the generated sub-DUCG for each possible disease. A sub-DUCG is for a possible disease, in which all the evidences and causalities including the connected state-known variables and the isolated state-abnormal variables to this possible disease are displayed to the users in a graphical manner with text. More details can be found in Zhang et al. (2021).

DUCG does not deal with AI-aided medical image examination or medical sound recognition, which can be handled by ML models. Hence, the relationship between DUCG and ML is one of cooperation.

In real applications, the AI-aided system should be able to recommend next medical checks based on the known information to collect further information for more accurate diagnosis. This will be discussed in another paper.

Acknowledgements

This research was supported by Institute for Guo Qiang, Tsinghua University (project number: 2020QG0001), Chongqing Science and Technology Commission (project number: cstc2018jscx-mszdx0106), and The Rockefeller-Endowed China Medical Board (Open Competition Program, grant number: 20-384).

Appendix

The diagnostic results of the five chief complaints (dyspnea, cough and expectoration, epistaxis, fever with rash, abdominal pain) are shown in Tables 6, 7, 8, 9 and 10.

Footnotes

Corollary 15: $A_{n k_{n} ; i} V_{i} A_{m k_{m} ; i} V_{i} = (A_{n k_{n} ; i} * A_{m k_{m} ; i}) V_{i}$ , in which $(A_{n k_{n} ; i} * A_{m k_{m} ; i}) \equiv (\begin{matrix} A_{n k_{n} ; i 1} A_{m k_{m} ; i 1} & A_{n k_{n} ; i 2} A_{m k_{m} ; i 2} & \dots & A_{n k_{n} ; i j} A_{m k_{m} ; i j} & \dots & A_{n k_{n} ; i J} A_{m k_{m} ; i J} \end{matrix})$

Correspondingly,

$a_{n k_{n} ; i} * a_{m k_{m} ; i} \equiv (\begin{matrix} a_{n k_{n} ; i 1} a_{m k_{m} ; i 1} & a_{n k_{n} ; i 2} a_{m k_{m} ; i 2} & \dots & a_{n k_{n} ; i j} a_{m k_{m} ; i j} & \dots & a_{n k_{n} ; i J} a_{m k_{m} ; i J} \end{matrix})$

where, “*” is an AND/multiplication matrix operator specially defined in DUCG.

In format, the * operator is similar to the Hadamard product.
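As a rough illustration of the format only, the * operation on two row vectors of equal length is an element-by-element product. The helper name `star` and the numbers are hypothetical, and the sketch does not cover the event-level meaning the operator has in DUCG.

```python
def star(a, b):
    """Element-wise product of two row vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("row vectors must have the same length")
    return [x * y for x, y in zip(a, b)]

# Illustrative parameter rows indexed by the states j = 1..J of V_i.
a_n = [0.2, 0.5, 0.0]
a_m = [0.5, 0.4, 0.9]

product = star(a_n, a_m)
print(product)
```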

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Yang Jiao, Mingxia Zhang, Bing Wei, Xiao Liu, Juan Zhao, Fengwei Tian, Jie Hu have contributed equally to this work.

Contributor Information

Zhan Zhang, Email: zhangzhan19@mails.tsinghua.edu.cn.

Yang Jiao, Email: peterpumch@163.com.

Mingxia Zhang, Email: xwyyzmx@sina.com.

Bing Wei, Email: yinan0721@sina.com.

Xiao Liu, Email: 392528423@qq.com.

Juan Zhao, Email: zhaojuanof241@163.com.

Fengwei Tian, Email: gcp666@126.com.

Jie Hu, Email: futhrew@qq.com.

Qin Zhang, Email: qinzhang@tsinghua.edu.cn.

References

Bardou D, Zhang K, Ahmad SM. Classification of breast cancer based on histology images using convolutional neural networks. IEEE Access. 2018;6:24680–24693.
Brosch T, et al. Deep 3D convolutional encoder networks with shortcuts for multiscale feature integration applied to multiple sclerosis lesion segmentation. IEEE Trans Med Imaging. 2016;35(5):1229–1239. doi: 10.1109/TMI.2016.2528821.
Bu X, Lu L, Zhang Z, Zhang Q, Yan Z. A general outpatient triage system based on dynamic uncertain causality graph. IEEE Access. 2020. doi: 10.1109/ACCESS.2020.2995087.
Ceccon S, Garwayheath DF, Crabb DP, et al. Exploring early glaucoma and the visual field test: classification and clustering using Bayesian networks. IEEE J Biomed Health Inform. 2014;18(3):1008–1014. doi: 10.1109/JBHI.2013.2289367.
Christodoulidis S, Anthimopoulos M, Ebner L, Chresti A, Mougiakakou S. Multisource transfer learning with convolutional neural networks for lung pattern analysis. IEEE J Biomed Health Inform. 2017;21(1):76–84. doi: 10.1109/JBHI.2016.2636929.
Dong C, Zhang Q, Geng S. A modeling and probabilistic reasoning method of dynamic uncertain causality graph for industrial fault diagnosis. Int J Autom Comput. 2014;11(3):288–298. doi: 10.1007/s11633-014-0791-8.
Dong C, Wang Y, Zhang Q, Wang N. The methodology of dynamic uncertain causality graph for intelligent diagnosis of vertigo. Comput Methods Programs Biomed. 2014;113:62–174. doi: 10.1016/j.cmpb.2013.10.002.
Dong C, Zhao Y, Zhang Q. Cubic causality modeling and uncertain inference method for dynamic fault diagnosis. J Tsinghua Univ (Sci Technol). 2018;58(7):614–622.
Duraisamy S, Emperumal S. Computer-aided mammogram diagnosis system using deep learning convolutional fully complex-valued relaxation neural network classifier. IET Computer Vision. 2017;11(8):656–662. doi: 10.1049/iet-cvi.2016.0425.
Er O, Cetin O, Bascil MS, Temurtas F. A comparative study on Parkinson's disease diagnosis using neural networks and artificial immune system. J Med Imaging Health Inf. 2016;1:264–268. doi: 10.1166/jmihi.2016.1606.
Fan Y, Zhang Z, Jing Z, Wang Y, Liu Z, Guo M, Wang R, Feng M. Diagnostic value of dynamic uncertain causality graph DUCG in sellar region disease. Chinese J Minimal Invasive Neurosurg. 2018;06:249–253.
Fukushima K, Miyake S. Neocognitron: a self-organizing neural network model for a mechanism of visual pattern recognition. In: Competition and Cooperation in Neural Nets. Berlin, Heidelberg: Springer; 1982.
Geng S, Zhang Q. Calculation method to diagnose integrated causes of faults in process systems by means of dynamic uncertain causality graph. In: Proceedings of the 2014 Asia-Pacific Computer Science and Application Conference (CSAC 2014), Shanghai, China; 2014. pp 306–311.
Hao S, Geng S, Fan L, Chen J, Zhang Q, Li L. Intelligent diagnosis of jaundice with dynamic uncertain causality graph model. J Zhejiang Univ-Sci B (Biomed Biotechnol). 2017;18(5):393–401. doi: 10.1631/jzus.B1600273.
Jiao Y, Zhang Z, Zhang T, Shi W, Zhu Y, Hu J, Zhang Q. Development of an artificial intelligence diagnostic model based on dynamic uncertain causality graph for the differential diagnosis of dyspnea. Front Med. 2020;14:488–497. doi: 10.1007/s11684-020-0762-0.
Liang H, Tsui BY, Ni H, Calentim CCS, Baxter SL, Liu G, et al. Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence. Nat Med. 2019. doi: 10.1038/s41591-018-0335-9.
Lin Z, Huang Y, Wang J. RNN-SM: fast steganalysis of VoIP streams using recurrent neural network. IEEE Trans Inf Forensics Secur. 2018;13(7):1854–1868. doi: 10.1109/TIFS.2018.2806741.
Lo SB, Lou SA, Lin JS, et al. Artificial convolution neural network techniques and applications for lung nodule detection. IEEE Trans Med Imaging. 1995;14(4):711. doi: 10.1109/42.476112.
Ning D, Zhang Z, Qiu K, Lu L, Zhang Q, Zhu Y, Wang R. Efficacy of intelligent diagnosis with a dynamic uncertain causality graph model for rare disorders of sex development. Front Med. 2020;14:498–505. doi: 10.1007/s11684-020-0791-8.
Qu Y, Zhang Q, Zhu X. Application of dynamic uncertain causality graph to dynamic fault diagnosis in chemical processes. CAAI Trans Intell Syst. 2015;10(3):354–361.
Russakovsky O, Deng J, Su H, et al. ImageNet large scale visual recognition challenge. Int J Comput Vision. 2015;115(3):211–252. doi: 10.1007/s11263-015-0816-y.
Shin H-C, et al. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans Med Imaging. 2016;35(5):1285–1294. doi: 10.1109/TMI.2016.2528162.
Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision. Computer Sci. 2015:2818–2826.
Wu J, Liu X, Zhang X, He Z, Lv P. Master clinical medical knowledge at certified-doctor-level with deep learning model. Nat Commun. 2018. doi: 10.1038/s41467-018-06799-6.
Yao Q, Zhang Q, Liu P, Yang P. Application of dynamic uncertain causality graph in spacecraft fault diagnosis: prediction. Int Core J Eng. 2017;3(1):113–119.
Zhang Q. Dynamic uncertain causality graph for knowledge representation and reasoning: discrete DAG cases. J Comput Sci Technol. 2012;27(1):1–23. doi: 10.1007/s11390-012-1202-7.
Zhang Q. Dynamic uncertain causality graph for knowledge representation and probabilistic reasoning: directed cyclic graph and joint probability distribution. IEEE Trans Neural Netw Learn Syst. 2015;26(7):1503–1517. doi: 10.1109/TNNLS.2015.2402162.
Zhang Q. Increase safety and availability of nuclear power plants by means of DUCG. China Nuclear Power. 2018;11(1):59–68.
Zhang Q, Geng S. Dynamic uncertain causality graph applied to dynamic fault diagnosis of large and complex systems. IEEE Trans Rel. 2015;64(3):910–927. doi: 10.1109/TR.2015.2416332.
Zhang Q, Yao Q. Dynamic uncertain causality graph for knowledge representation and reasoning: utilization of statistical data and domain knowledge in complex cases. IEEE Trans Neural Netw Learn Syst. 2018;29(5):1637–1651. doi: 10.1109/TNNLS.2017.2673243.
Zhang Q, Zhang Z. Dynamic uncertain causality graph applied to dynamic fault diagnoses and predictions with negative feedbacks. IEEE Trans Rel. 2016;65(2):1030–1044. doi: 10.1109/TR.2015.2503759.
Zhang Q, Dong C, Cui Y, Yang Z. Dynamic uncertain causality graph for knowledge representation and probabilistic reasoning: statistics base, matrix and fault diagnosis. IEEE Trans Neural Netw Learn Syst. 2014;25(4):645–663. doi: 10.1109/TNNLS.2013.2279320.
Zhang Q, Qiu K, Zhang Z. Calculate joint probability distribution of steady directed cyclic graph with local data and domain causal knowledge. China Commun. 2018;15(7):146–155. doi: 10.1109/CC.2018.8424610.
Zhang Q, Bu X, Zhang Z, Zhang M, Hu J. Dynamic uncertain causality graph for computer-aided general clinical diagnoses with nasal obstruction as illustration. Artif Intell Rev. 2021;54:27–61. doi: 10.1007/s10462-020-09871-0.
Zhang Q. Dynamic uncertain causality graph for knowledge representation and probabilistic reasoning: continuous variable, uncertain evidence and failure forecast. IEEE Trans Syst Man Cybern. 2015;45(7):990–1003.
Zhao Y, Zhang Q, Dong C. Application of DUCG in fault diagnosis of nuclear power plant secondary loop. Autom Sci Technol. 2014;48:496–501.
