Health Information Science and Systems. 2023 Jan 18;11(1):5. doi: 10.1007/s13755-022-00207-6

Meta-path guided graph attention network for explainable herb recommendation

Yuanyuan Jin 1,2, Wendi Ji 2, Yao Shi 1, Xiaoling Wang 2,3, Xiaochun Yang 4
PMCID: PMC9847457  PMID: 36660407

Abstract

Traditional Chinese Medicine (TCM) has been widely adopted in clinical practice in East Asia for thousands of years. Today, TCM still plays a critical role in Chinese society and receives increasing attention worldwide. Existing herb recommenders learn the complex relations between symptoms and herbs by mining TCM prescriptions. Given a set of symptoms, they provide a set of herbs together with explanations drawn from TCM theory. However, the foundation of TCM is Yinyangism (i.e. the combination of Five Phases theory with Yin-yang theory), which is very different from the philosophy of modern medicine. Recommending herbs from the perspective of TCM theory alone largely keeps TCM apart from modern medical treatment. As TCM and modern medicine share a common view at the molecular level, it is necessary to integrate the ancient practice of TCM with the standards of modern medicine. In this paper, we explore the underlying action mechanisms of herbs from both TCM and modern medicine, and propose a Meta-path guided Graph Attention Network (MGAT) to provide explainable herb recommendations. Technically, to translate TCM from an experience-based medicine into an evidence-based medicine system, we incorporate the pharmacology knowledge of modern Chinese medicine with TCM knowledge. We design a meta-path guided information propagation scheme based on the extended knowledge graph, which combines information propagation with the decision process. This scheme adopts meta-paths (predefined relation sequences) to guide neighbor selection during propagation. Furthermore, an attention mechanism is used in aggregation to distinguish the salience of different paths connecting a symptom with a herb. In this way, our model can distill the long-range semantics along meta-paths and generate fine-grained explanations. We conduct extensive experiments on a public TCM dataset, demonstrating performance comparable to state-of-the-art herb recommendation models together with strong explainability.

Keywords: Graph neural network, Herb recommendation, Meta-path, Explainable recommendation

Introduction

Traditional Chinese Medicine (TCM) has maintained the health of the Chinese people for at least 3000 years [1]. Through long development and innovation, TCM theory has formed a unique academic system [2]. Currently, as an alternative to modern medicine, traditional herbal medicine still plays a critical role in treating COVID-19 [3]. Existing herb recommendation systems [4, 5] focus on mining clinical experience from prescriptions and provide herbs given the symptoms. Wang et al. [4] propose a topic model to depict the TCM therapeutic process connecting symptoms and herbs, which can generate explanations based on latent TCM concepts. Jin et al. [5] design a syndrome-aware GCN method to mimic the clinical practice of TCM doctors. However, TCM and modern medicine are based on different academic systems. Recommending herbs based only on TCM Yinyangism gives little clue about the underlying action mechanisms of herbs at the molecular level.

As TCM and modern medicine have a common view at the molecular level, great efforts have been paid to TCM’s modernization, which tries to determine the relationship between herb ingredients and their target proteins. Xue et al. [6] build TCMID comprising TCM formulae, herbs, ingredients, and the targets and diseases. Ru et al. [7] construct TCMSP providing quantitative TCM ingredients, ADME-related properties, targets, and diseases, to accelerate the drug discovery from herbal medicines. Zhang et al. [1] construct TCM-Mesh, which records TCM-related information, like compounds and genes, to serve for network pharmacology analysis. Wu et al. [8] propose SymMap, which integrates TCM with modern medicine through both internal molecular mechanism and external symptom mapping. Based on the above integrative databases and TCM prescriptions, given a set of symptoms, explainable recommenders can provide a herb set and generate corresponding reasons from both TCM and modern medicine views.

Following the taxonomy in [9], meta-path based recommendation methods [10, 11] and GCN based recommenders [12–15] can both provide explainable recommendations to some extent. Meta-path based approaches define a meta-path as an ordered sequence of relations between two types of nodes on a knowledge graph, such as symptoms and herbs. Wang et al. [10], Sun et al. [11] and Zhang et al. [16] adopt path instances connecting user-item pairs to characterize user preferences towards items. Specifically, an LSTM generates each path’s representation, and a pooling operator then discriminates the saliency of different paths. In this way, important paths can be regarded as recommendation reasons. However, when multiple nodes of the same type link to a node of another type, a.k.a. the tree structure [17], path-based methods split them into multiple paths, and the tree structure is lost. More recently, knowledge-based GNN models use an attention-based information aggregation scheme to integrate high-order neighbors from the KG into representations [18]. However, all nodes in the sub-graph connecting the target user and the recommended item are used to generate reasons. Thus the explanations rely on the quality of the knowledge and may suffer from noisy connections. Considering the strong expressiveness of GNN models and the high interpretability of path-based methods, some studies [19–23] focus on integrating information propagation with the decision process. [19] adopts meta-paths to aggregate multi-hop neighbors by transferring information only between the two end nodes of a meta-path, while discarding the middle nodes. [20] and [22] only provide path-level explanations without node-level reasons. Although [21] learns both path-level and node-level attentions, the explicit paths connecting the target user to the recommended item cannot be directly generated.

From the above literature survey, we find that the explainability of meta-path-based GNN models is not fully exploited. In this paper, we integrate the TCM KG in [4] and SymMap [8] to build a new knowledge graph, TCM-MM, which includes both TCM theory and modern medicine standards. As shown in Fig. 1, our KG contains the TCM diagnosis and treatment process: determining the cause (syndromes) according to symptoms, then deciding the treatment methods based on the cause, and finally selecting proper herbs as a prescription. As Fig. 1 demonstrates, in the entity sequence (呕吐, Yang deficiency of spleen and kidney, warming middle-jiao to dispel cold, dried ginger), the symptom 呕吐 (vomiting) is caused by the syndrome “Yang deficiency of spleen and kidney”, the treatment method “warming middle-jiao to dispel cold” can relieve “Yang deficiency of spleen and kidney”, and the herb “dried ginger” has the effect “warming middle-jiao to dispel cold”. Besides, (dried ginger, SMIT00507, emesis, 呕吐) shows the underlying action mechanism by which the herb “dried ginger” can cure the symptom 呕吐: dried ginger contains the chemical compound SMIT00507. Specifically, we align the symptoms S and herbs H in prescriptions with the TCM-MM entities by name.

Fig. 1. The schematic diagram of the integrated knowledge graph TCM-MM

In addition, the mapping between TCM symptoms and modern medicine symptoms, and the herb-ingredient relations, are also included in TCM-MM. Based on TCM-MM, we propose a new model, the Meta-path guided Graph Attention Network (MGAT), to address the explainable herb recommendation task. First, we design several meta-paths according to the TCM diagnosis process and the pharmacology analysis of herbs. Second, recursive information propagation is performed along the predefined meta-paths. Further, attention-based aggregation learns attention weights among multiple neighbors to discriminate their importance during training. In this way, long-range path saliency can be inferred by accumulating multi-hop attention weights, and the key paths with high weights are selected to generate recommendation explanations. Compared with existing meta-path based GNN models, MGAT can generate fine-grained recommendation reasons: it identifies the critical path instances for each meta-path, which directly explain the correlations between symptoms and herbs.

The contributions of this work are summarized as follows:

  • We highlight the importance of interpreting the underlying action mechanisms of herbs and build an integrated KG to align the TCM theory and pharmacology analysis of modern medicine to some extent.

  • We develop a new method MGAT, which guides the information propagation by predefined meta-paths, and further selects the long-range path instances with high attention weights to generate fine-grained explanations.

  • We conduct extensive experiments on a public TCM dataset, demonstrating the effectiveness of MGAT and its interpretability in understanding the underlying action mechanism of herbs.

Related work

Herb recommendation

The development of TCM prescription mining contains three stages [5]: (1) traditional frequency statistic and data mining methods, such as association analysis and classification methods; (2) topic models; (3) graph model-based methods. Here we mainly introduce the recently commonly used second and third stages.

Topic models treat TCM prescriptions as documents and the included herbs and symptoms as words. The learned intermediate topics linking symptoms and herbs can be regarded as recommendation reasons. [24–26] regard TCM concepts, including “pathogenesis”, “herb roles”, “diagnosis”, and “treatment”, as latent topics to link symptoms and herbs. Chen et al. [27] and Wang et al. [4] integrate TCM domain knowledge into topic models to depict herb compatibility patterns. In particular, Wang et al. [4] extract entity embeddings from TCM-Mesh [1], which contains herb-related genes, chemicals, and diseases. By using the entity representations as the initial embeddings of the topic model, the proposed model takes modern medical knowledge into account. However, initial KG embeddings cannot support explicit interpretability, and the model can only provide explanations based on the latent TCM topics.

The complex relations among TCM entities can be described in graph form. Recently, some studies have leveraged deep learning techniques for graph-based prescription mining. Li et al. [28] devise an attentional Seq2Seq [29] model to mimic the prescription generation process. Li et al. [30] utilize the BRNN [31] framework to learn text representations of herb words for the treatment complement task. [32, 33] design a meta-path based autoencoder to represent the TCM heterogeneous information network. More recently, Jin et al. [5] propose a syndrome-aware herb recommendation method, which employs GNNs on multiple graphs to capture the complex relations among symptoms and herbs in TCM prescriptions. [34] and [35] further incorporate TCM knowledge in an implicit way, such as TransE or multi-hot vectors, to enrich the semantics of entity representations and improve recommendation accuracy. Although the above studies treat the TCM therapeutic process carefully, they lack explicit interpretability with respect to TCM domain knowledge and the related modern medical knowledge.

KG-based recommender systems

The knowledge graph is the common resource to support the explainable recommendation. Thus, to improve the interpretability of the herb recommendation approaches, we will survey the KG-based recommendation methods in this section.

Embedding-based methods

[36–39] employ KG embedding techniques, such as TransE [40], TransR [41] and TransH [42], to regularize the representation learning of the recommendation task. Specifically, CKE [36] utilizes TransR to extract KG embeddings and then integrates them into the matrix factorization (MF) framework to address the item recommendation task. KTUP [37] devises a multi-task framework, which jointly trains a TransH-based recommendation task and a KG completion task to improve the performance of both tasks. DKN [38] designs multiple channels and an alignment of words and entities to learn both semantic-aspect and knowledge-aspect representations of news. MKR [39] proposes a cross and compress module to link the recommendation task and the KG embedding task, which shares latent features and captures high-order interactions between item and entity features. Although the above methods benefit from the regularization effect of knowledge-aware embeddings, they fail to capture the high-order connectivity between users and items explicitly. Thus they cannot demonstrate the underlying reasons why a certain item is appropriate for the target user.

Path-based methods

As the above embedding-based approaches integrate the knowledge graph information in a rather implicit way, some studies [10, 11, 22, 43–45] leverage meta-paths, where each meta-path defines a sequence of entities and relations between users and items on KG, to explore KG in a more natural and intuitive way. For instance, PER [43] first generates meta-path-based latent features and then employs matrix factorization techniques on the features to learn the latent representations of users and items. FMG [44] devises a “matrix factorization (MF) + factorization machine (FM)” framework, which first applies MF on the meta-graph based similarities and then adopts FM with Group lasso (FMG) to automatically select significant meta-graph based features. HERec [45] first proposes a random walk strategy based on meta-paths to distill the node embeddings containing rich semantics. Next, the learned embeddings are incorporated into a matrix factorization model for the rating prediction task. However, the above methods only utilize meta-paths to update the user-item similarity, thus lack the ability to reason on paths.

To directly utilize meta-paths for explanation, recent studies focus on learning a representation for each path. RKGE [11] and KPRN [10] leverage LSTM modules to compute each path instance’s embedding and merge them into the final prediction score. MCRec [22] employs a CNN to obtain the meta-path based context and learns an effective interaction model among users, items, and the meta-path based context. TEM [46] combines tree-based models and embedding-based models: it first adopts a GBDT to select cross features as decision paths and then incorporates the cross features into an embedding model for collaborative filtering. Although these studies can provide meta-path level or instance-level explanations, they divide the many-to-one structure, in which multiple entities connect to another entity under a certain relation, into separate paths. In this way, the tree structure cannot be fully preserved [17].

Recently, RippleNet [47] benefits from both the embedding-based and path-based methods, which automatically discovers possible paths between items by iteratively propagating users’ preferences in the KG. However, without the regularization of predefined rules, the explanation quality may be degraded by the noisy information.

GNN-based methods

Recent studies have paid much attention to graph neural networks (GNNs) [12–14, 48], which employ an information aggregation process to introduce high-order neighbors into representations for recommendation. KGCN [14] aggregates neighborhood information selectively and biasedly, where neighbors are weighted by scores given the relation and the specific user. KGAT [12] applies an attentive embedding propagation layer to discriminate the importance of the neighbors on the collaborative knowledge graph. CKAN [13] adopts two different propagation strategies to encode both collaborative signals and knowledge associations. MVIN [48] proposes a multi-view network from user and entity angles, which adopts a mixing layer to capture the mixed GCN information from the various layer-wise neighborhood features. Although the above methods can provide a sub-graph with attention weights as the recommendation reason, they may suffer from noisy information by considering all nodes in the sub-graph for explanation [20].

To address the aforementioned weak point, [19–21, 23, 49] combine GNN models and path-based methods to introduce more regularization into GNN-based recommendation approaches. In particular, HAN [19] defines the head and tail nodes connected by a meta-path instance as neighbors and then aggregates the meta-path based neighbors with hierarchical node-level and semantic-level attentions. Similar to [19], [50] and [51] construct the neighbor set according to pre-defined meta-paths. [50] samples neighbors based on meta-paths for each node. [51] decomposes the heterogeneous graph into multiple meta-path based sub-graphs, where each sub-graph contains all neighbor pairs connected by a certain meta-path. The above three studies only consider the two end nodes of meta-paths and ignore the middle nodes. MEIRec [23] proposes a meta-path guided information aggregation strategy, which selects different-step neighbors of an object following meta-paths, and employs LSTM and CNN as the aggregation functions for users’ neighbors and queries’ neighbors, respectively. RGRec [20] and GEMS [49] construct the adjacency matrix following meta-paths and meta-structures, respectively. As for model interpretability, they both adopt semantic-level attention to distinguish the saliency of multiple meta-graphs (meta-structures). NIRec [21] captures the interactive patterns between each pair of nodes through their metapath-guided neighborhoods with node-level and semantic-level attention mechanisms. However, to the best of our knowledge, they fail to preserve the exact relational dependency in paths with both node-level and semantic-level attention, which affects the granularity of interpretability to some extent. In this paper, we propose a meta-path guided graph attention network, aiming at comparable performance and finer-grained interpretability.

Task formulation

In this section, we first define the symptom-herb interaction data, the integrated knowledge graph TCM-MM, and introduce how to merge them as the inputs of our model. Then the meta-path definition and task formulation are given based on the input data.

Symptom-herb bipartite graph

We construct the symptom-herb interaction graph following [5]. Taking a prescription $p = \langle sc = \{s_1, s_2, \ldots, s_k\},\ hc = \{h_1, h_2, \ldots, h_n\} \rangle$ as an instance, we link each symptom $s$ in the symptom set $sc$ to every herb $h$ in the herb set $hc$. The graph is defined as $\{(s, h, y_{sh}) \mid s \in S, h \in H\}$, where $S$ and $H$ separately denote the symptom and herb collections, and a link $y_{sh} = 1$ indicates that symptom $s$ and herb $h$ occurred in the same prescription; otherwise $y_{sh} = 0$.
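
The construction above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `build_interactions`, the helper `y`, and the toy prescriptions are all hypothetical.

```python
from itertools import product

def build_interactions(prescriptions):
    """Return the set of (symptom, herb) pairs for which y_sh = 1.

    Each prescription is a (symptom_set, herb_set) pair; every symptom is
    linked to every herb that appears in the same prescription.
    """
    pairs = set()
    for symptoms, herbs in prescriptions:
        pairs.update(product(symptoms, herbs))
    return pairs

# Hypothetical toy prescriptions (not taken from the dataset).
toy_prescriptions = [
    ({"vomiting", "cold limbs"}, {"dried ginger", "licorice"}),
    ({"vomiting"}, {"pinellia"}),
]
interactions = build_interactions(toy_prescriptions)

def y(s, h):
    """Interaction label y_sh: 1 if s and h co-occur in a prescription, else 0."""
    return 1 if (s, h) in interactions else 0
```

Storing only the positive pairs keeps the bipartite graph sparse; any pair that is absent from the set is implicitly labeled 0.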

Knowledge graph

The knowledge graph is a commonly adopted information storage and retrieval tool, which organizes real-world facts, such as entity attributes and commonsense knowledge, into triples [45]. Each triple is defined as $(q, r, t)$, which denotes that there exists a relation $r$ from the head entity $q$ to the tail entity $t$. In the TCM domain, for instance, (dried ginger, effect, warming middle-jiao to dispel cold) describes that the herb dried ginger has the effect of warming middle-jiao to dispel cold; (dried ginger, ingredient, SMIT00507) means that the herb dried ginger contains the ingredient SMIT00507. Although there are some integrated databases containing both TCM and the related modern medicine knowledge [7, 8], they omit the TCM diagnosis process. Thus, we build an integrated KG, TCM-MM, in this paper, which is represented as $\{(q, r, t) \mid q, t \in V', r \in R'\}$, where $V'$ and $R'$ are its entity and relation sets.

Following [12], we combine the symptom-herb bipartite graph and TCM-MM together. In particular, we transform the link $y_{sh} = 1$ into the triple $(s, \mathrm{interact}, h)$. The extended KG $G$ is represented by $\{(q, r, t) \mid q, t \in V, r \in R\}$, where $V = V' \cup S \cup H$ and $R = R' \cup \{\mathrm{interact}\}$, with $V'$ and $R'$ denoting the entity and relation sets of TCM-MM. Notably, for each relation $r$, there is an inverse relation $r^{-1}$ in $R$. For example, for triple (dried ginger, ingredient, SMIT00507), there is a symmetrical triple (SMIT00507, ingredient$^{-1}$, dried ginger). Through this manner, the side information can be considered to assist the prescription mining.
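
As a minimal sketch of this merging step, the snippet below adds `interact` triples to a toy KG and then materializes an inverse triple for every relation. The function name `extend_kg` and the `"^-1"` suffix used to name inverse relations are assumptions made for illustration.

```python
def extend_kg(kg_triples, interactions):
    """Merge KG triples with interaction pairs and add inverse triples.

    kg_triples   : iterable of (head, relation, tail)
    interactions : iterable of (symptom, herb) pairs with y_sh = 1
    """
    triples = set(kg_triples)
    # Each positive interaction becomes a triple (s, interact, h).
    triples.update((s, "interact", h) for s, h in interactions)
    # For every relation r, add r^-1 pointing from tail back to head.
    inverse = {(t, r + "^-1", q) for q, r, t in triples}
    return triples | inverse

# Toy inputs based on the examples in the text.
toy_kg = {("dried ginger", "ingredient", "SMIT00507")}
toy_interactions = {("呕吐", "dried ginger")}
G = extend_kg(toy_kg, toy_interactions)
```

With inverse triples present, information can later flow in both directions along every edge of the extended graph.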

Meta-path

Based on the extended KG $G$, we explore the high-order connections between symptoms and herbs to endow our model with the reasoning ability. Technically, we adopt the meta-path to formulate the multi-hop relations in $G$. The entity type set of $V$ is represented by $A$. A meta-path $M$ is a sequence of entity types and relations, defined as $A_1 \xrightarrow{R_1} A_2 \xrightarrow{R_2} \cdots \xrightarrow{R_{L-1}} A_L$, which indicates that there is a composite relation $R_1 \circ R_2 \circ \cdots \circ R_{L-1}$ between entity types $A_1$ and $A_L$. In this paper, to depict the link connecting TCM symptoms and herbs, we define several meta-paths from symptoms to herbs as follows:

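  • $M_1 = \text{TCM symptom} \xrightarrow{\text{syndrome}} \text{syndrome} \xrightarrow{\text{treatment}} \text{function} \xrightarrow{\text{effect}^{-1}} \text{herb}$. A meta-path instance $m_1$ of $M_1$ is an entity sequence following the schema $M_1$. For example, $m_1$ = 呕吐 $\xrightarrow{\text{syndrome}}$ Yang deficiency of spleen and kidney $\xrightarrow{\text{treatment}}$ warming middle-jiao to dispel cold $\xrightarrow{\text{effect}^{-1}}$ dried ginger.

  • $M_2 = \text{herb} \xrightarrow{\text{ingredient}} \text{ingredient} \xrightarrow{\text{chemical-MMsymptom}} \text{MM symptom} \xrightarrow{\text{MMsymptom-TCMsymptom}} \text{TCM symptom}$. A corresponding meta-path instance is $m_2$ = dried ginger $\xrightarrow{\text{ingredient}}$ SMIT00507 $\xrightarrow{\text{chemical-MMsymptom}}$ emesis $\xrightarrow{\text{MMsymptom-TCMsymptom}}$ 呕吐.

  • $M_3 = \text{TCM symptom} \xrightarrow{\text{remedy}} \text{herb}$. A related meta-path instance of $M_3$ is $m_3$ = 呕吐 $\xrightarrow{\text{remedy}}$ dried ginger.

  • $M_4 = \text{TCM symptom} \xrightarrow{\text{interact}} \text{herb}$. A meta-path instance of $M_4$ is $m_4$ = 呕吐 $\xrightarrow{\text{interact}}$ dried ginger.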

The above meta-path instances from the same TCM symptom 呕吐 to the same herb “dried ginger” demonstrate the possibility of employing the multiple composite semantics to do explanations. In particular, M3 is extracted from the TCM-MM KG, which indicates that a certain herb can relieve a certain symptom. M4 is extracted from the prescriptions, which indicates that in a prescription, each symptom interacts with each herb. M3 can be adopted to explain the formulation pattern among the interaction pairs from M4.
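
To make the notion of a meta-path instance concrete, the following sketch enumerates all entity sequences that start at a given node and follow a given relation sequence. The function `find_instances`, the relation-only encoding of a meta-path, and the toy triples are illustrative assumptions; entity-type checks are omitted because each relation here fixes the type of its tail.

```python
from collections import defaultdict

def find_instances(triples, start, relations):
    """Enumerate entity paths from `start` that follow `relations` hop by hop."""
    index = defaultdict(list)
    for q, r, t in triples:
        index[(q, r)].append(t)
    paths = [[start]]
    for r in relations:
        # Extend every partial path by all tails reachable through relation r.
        paths = [p + [t] for p in paths for t in sorted(index[(p[-1], r)])]
    return paths

# Relation sequences for M1 and M3 as defined above.
M1 = ["syndrome", "treatment", "effect^-1"]
M3 = ["remedy"]

toy_triples = {
    ("呕吐", "syndrome", "Yang deficiency of spleen and kidney"),
    ("Yang deficiency of spleen and kidney", "treatment",
     "warming middle-jiao to dispel cold"),
    ("warming middle-jiao to dispel cold", "effect^-1", "dried ginger"),
    ("呕吐", "remedy", "dried ginger"),
}
m1_instances = find_instances(toy_triples, "呕吐", M1)
m3_instances = find_instances(toy_triples, "呕吐", M3)
```

On this toy graph, both meta-paths yield one instance ending at dried ginger, which mirrors the observation that several composite semantics can connect the same symptom-herb pair.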

Task description

Similar to [5], the explainable herb recommendation task aims to generate a herb set for the given symptom set with explanations. Each prescription is represented by $p = \langle sc = \{s_1, s_2, \ldots, s_k\},\ hc = \{h_1, h_2, \ldots, h_n\} \rangle$, where $sc$ is the symptom set and $hc$ is the herb set. Given a symptom set $sc$, we need to compute a probability vector $\hat{y}_{sc}$ for all candidate herbs, where every dimension $i$ in $\hat{y}_{sc}$ indicates the probability that herb $h_i$ is appropriate to cure $sc$. The input and output are described as,

  • Input: Herbs $H$, Symptoms $S$, Prescriptions $P$, extended KG $G$, meta-path set $M$.

  • Output: A learned function $\hat{y}_{sc} = f(sc, H \mid \theta, P, G, M)$, which computes the probability vector $\hat{y}_{sc}$ for all herbs $H$ for the symptom set $sc$.

Methodology

In this section, we describe the proposed MGAT model in detail. It consists of two parts: (1) the attentive embedding layer and (2) the prediction layer. The attentive embedding layer shown in Fig. 2 employs the GAT technique. Specifically, for information propagation, it transfers neighbor information along the meta-paths predefined in Sect. “Task formulation”. For information aggregation, we utilize the attention mechanism to discriminate the saliency of multiple paths. After obtaining the representations of KG entities, the design of the prediction layer is consistent with [5]: multiple symptom embeddings are merged into the syndrome representation, and the syndrome embedding interacts with herb embeddings to output the prediction scores. In this way, our model is endowed with a path-based reasoning ability.

Fig. 2. Illustration of attentive embedding layer in the proposed MGAT model

Attentive embedding layer

The embedding layer focuses on obtaining comprehensive representations to encode rich semantics of meta-paths for KG entities. As Fig. 2 demonstrates, there are multiple instances connecting the symptom $s_i$ and herb $h_2$ under meta-path $M_1$. By borrowing the idea of graph attention network [52], we utilize the attention mechanism to distinguish the different degrees of contribution for multiple paths. In particular, for each meta-path $M_i$, we construct a sub-graph $G_{M_i}$, where each path connecting a symptom and a herb follows the definition of $M_i$. Taking the sub-graph $G_{M_1}$ in Fig. 2 as an example, path $\langle s_i, syn_1, fun_1, h_1 \rangle$ is one instance of meta-path $\text{TCM symptom} \xrightarrow{\text{syndrome}} \text{syndrome} \xrightarrow{\text{treatment}} \text{function} \xrightarrow{\text{effect}^{-1}} \text{herb}$; that is, each hop in the sub-graph is under the control of the corresponding meta-path. To learn the entity embeddings of multiple sub-graphs, first we apply a GAT (Graph Attention Network) to encode each sub-graph, and then the multiple representations are merged into the fused embeddings.

Entity representation guided by single meta-path

First, we introduce the GAT design on the single meta-path. As Fig. 2 shows, there are multiple neighbors linking to the same node, which is the aforementioned tree structure. For instance, $syn_1$ and $syn_2$ are both syndromes causing symptom $s_i$. To capture the tree structure and discriminate the different importance of multiple paths linking each symptom-herb pair, we adopt the attention mechanism in the information propagation process.

  • Attentive Information Propagation

    In this section, we first illustrate how we determine neighbors along meta-path $M_i$ for each node. According to the network structure of $G_{M_1}$ in Fig. 2, the multi-hop neighbor sets of the target node $s_i$ are $N^{1}_{M_1}(s_i) = \{syn_1, syn_2\}$, $N^{2}_{M_1}(s_i) = \{fun_1, fun_2, fun_3\}$, and $N^{3}_{M_1}(s_i) = \{h_1, h_2, h_3\}$. To obtain the representation for $s_i$, first the embeddings of $h_1$ and $h_2$ constitute the representation of $fun_1$, $h_2$ contributes to $fun_2$, and $h_3$ contributes to $fun_3$. Second, $fun_1$ and $fun_2$ are merged into $syn_1$’s embeddings, and $fun_3$’s embedding propagates to $syn_2$. Finally, the embeddings of $syn_1$ and $syn_2$ contribute to $s_i$’s representation. To quantify the importance distribution for each hop, we leverage the attention mechanism similar to [12]. For the target node $q$, we set the one-hop neighbor triples of $q$ under meta-path $M$ as $N_M(q) = \{t \mid (q, r, t) \in G_M\}$. Given a triple $(q, r, t)$, the weight of the information $t$ passes to $q$ is defined as follows,
    $$\alpha^{M}_{q,t} = (W^{M} e^{0}_{t})^{T} \tanh(W^{M} e^{0}_{q} + e_{M}) \tag{1}$$
    where $e^{0}_{t}$ and $e^{0}_{q}$ are initial embeddings for $t$ and $q$ for all meta-paths in $M$, respectively. $e_{M}$ is the relation embedding for meta-path $M$, and $W^{M}$ is the weight matrix. Hereafter, we normalize the weight by softmax function:
    $$\alpha^{M}_{q,t} = \frac{\exp(\alpha^{M}_{q,t})}{\sum_{t' \in N_M(q)} \exp(\alpha^{M}_{q,t'})} \tag{2}$$
    Up to now, the merged one-hop neighbor message for $q$ can be represented by,
    $$e^{0}_{N_M(q)} = \sum_{t \in N_M(q)} \alpha^{M}_{q,t}\, e^{0}_{t} \tag{3}$$
    Through this manner, the learned attention weights can help to regularize the propagation process at a more fine-grained level, and further endow our model with higher interpretability.
  • Information Aggregation

    After constructing the first-order neighbor information $e^{0}_{N_M(q)}$, this part aims to update the embedding for the target node. Here we utilize the Bi-Interaction Aggregator in [12], which takes two kinds of feature interactions into consideration between $e^{0}_{q}$ and $e^{0}_{N_M(q)}$. Specifically, the first layer representation for $q$ is formulated as,
    $$e^{M,1}_{q} = \mathrm{LeakyReLU}\big(W^{M,1}_{add}(e^{0}_{q} + e^{0}_{N_M(q)})\big) + \mathrm{LeakyReLU}\big(W^{M,1}_{dot}(e^{0}_{q} \odot e^{0}_{N_M(q)})\big) \tag{4}$$
    wherein $W^{M,1}_{add}$ and $W^{M,1}_{dot}$ are the first-layer weight matrices, and $\odot$ indicates the element-wise product. Further, we can extend the one-layer propagation to multiple hops. In particular, in the $l$-th layer, the representation for $q$ is recursively defined as follows,
    $$e^{M,l}_{q} = \mathrm{LeakyReLU}\big(W^{M,l}_{add}(e^{M,l-1}_{q} + e^{l-1}_{N_M(q)})\big) + \mathrm{LeakyReLU}\big(W^{M,l}_{dot}(e^{M,l-1}_{q} \odot e^{l-1}_{N_M(q)})\big) \tag{5}$$
    $$e^{l-1}_{N_M(q)} = \sum_{t \in N_M(q)} \alpha^{M}_{q,t}\, e^{M,l-1}_{t} \tag{6}$$
    wherein $e^{M,l-1}_{t}$ is the representation of entity $t$ encoding its neighbor information from the previous $(l-1)$ hops. Further, it contributes to constructing the $(l-1)$-hop neighbor message for the target entity $q$. Finally, the neighbor message $e^{l-1}_{N_M(q)}$ and the $(l-1)$-hop representation $e^{M,l-1}_{q}$ interact to update entity $q$’s embedding. As a result, the high-order semantics in the meta-path can be captured through the recursive propagation and aggregation process.
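
One propagation and aggregation step (Eqs. 1 to 4) can be sketched in plain Python as follows. This is a didactic sketch under stated assumptions, not the authors' implementation: the helper names, the LeakyReLU slope of 0.01, the identity weight matrices, and the two-dimensional toy embeddings are all hypothetical, and a real model would learn the matrices and use a tensor library.

```python
import math

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def leaky_relu(v, slope=0.01):  # slope value is an assumption
    return [x if x > 0 else slope * x for x in v]

def attention_weights(e_q, neighbors, W, e_M):
    """Eqs. 1-2: raw scores (W e_t)^T tanh(W e_q + e_M), then softmax."""
    query = [math.tanh(a + b) for a, b in zip(matvec(W, e_q), e_M)]
    scores = [dot(matvec(W, e_t), query) for e_t in neighbors]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

def propagate(e_q, neighbors, W, e_M, W_add, W_dot):
    """Eq. 3 (weighted neighbor message) followed by Eq. 4 (Bi-Interaction)."""
    alpha = attention_weights(e_q, neighbors, W, e_M)
    dim = len(e_q)
    msg = [sum(a * e_t[i] for a, e_t in zip(alpha, neighbors)) for i in range(dim)]
    added = leaky_relu(matvec(W_add, [x + m for x, m in zip(e_q, msg)]))
    mixed = leaky_relu(matvec(W_dot, [x * m for x, m in zip(e_q, msg)]))
    return [a + b for a, b in zip(added, mixed)], alpha

# Toy setup: symptom s_i with two syndrome neighbors, identity weights.
I2 = [[1.0, 0.0], [0.0, 1.0]]
e_si = [0.5, -0.2]
syndrome_embs = [[1.0, 0.0], [0.0, 1.0]]
e_si_next, alpha = propagate(e_si, syndrome_embs, I2, [0.0, 0.0], I2, I2)
```

Applying `propagate` repeatedly, with each neighbor's embedding replaced by its own previous-layer output, gives the recursion of Eqs. 5 and 6. The returned `alpha` values are the per-hop weights that later serve as explanation signals.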

Note that, when selecting paths from $M_1$, $M_2$ and $M_3$ to explain why a herb is recommended for a certain symptom, there may be multiple paths for $M_1$ and $M_2$ linking the symptom-herb pair, while there is only one path for $M_3$, if it exists. Thus, we only apply the attention mechanism on $G_{M_1}$ and $G_{M_2}$. For the information propagation formulations of $M_3$ and $M_4$, we simply remove the attention weight $\alpha$ from Eq. 6. Through the above propagation, for each entity $q$ in $G_M$, there are multiple representations, $\{e^{M,1}_{q}, e^{M,2}_{q}, \ldots, e^{M,L}_{q}\}$, where $L$ is the length of meta-path $M$. Referencing [53], we concatenate the multiple embeddings to construct the final representation $e^{M,*}_{q}$, which is formulated as,

$$e^{M,*}_{q} = e^{0}_{q} \,\Vert\, e^{M,1}_{q} \,\Vert\, \cdots \,\Vert\, e^{M,L}_{q} \tag{7}$$

where $\Vert$ denotes the concatenation operation. In this way, compared with only adopting the last layer’s embedding, the semantics hidden in all layers are captured comprehensively.

Multi-view representation combination

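Up to now, we have extracted multiple representations for each entity $q$ from various meta-paths. Here we combine the multiple embeddings by concatenation. Writing $e^{M_i,*}_{q}$ for the final representation of $q$ under meta-path $M_i$, we aggregate the various embeddings for each entity $q$ as follows,

$$e_{q} = e^{M_1,*}_{q} \,\Vert\, e^{M_2,*}_{q} \,\Vert\, e^{M_3,*}_{q} \,\Vert\, e^{M_4,*}_{q} \tag{8}$$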

Note that besides concatenation, other aggregators can also be applied, such as average, attention-weighted sum, etc., and we compare different merge operations in Sect. “Influence of hyperparameters (RQ4)”.

At this point, we have learned the multi-view representations for all entity nodes. The attentive embedding layer benefits from both the recursive information propagation and the decision process. By guiding the neighbor propagation via predefined meta-paths, the graph neural network can accurately capture the rich semantics while preserving the tree structure in the KG. Furthermore, with the help of the attention mechanism, every hop in the path instances is assigned an attention weight. In this way, the weight of each path can be derived for recommendation explanation.
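
The text does not spell out here how per-hop weights are combined into a path weight. The sketch below assumes, purely for illustration, that a path's weight is the product of the attention weights along its hops; the function names and the toy weights for the sub-graph of Fig. 2 are hypothetical.

```python
from math import prod

def path_score(path, alpha):
    """Multiply the attention weights of consecutive hops (assumed rule)."""
    return prod(alpha[(q, t)] for q, t in zip(path, path[1:]))

def rank_paths(paths, alpha):
    """Sort path instances from most to least salient."""
    return sorted(paths, key=lambda p: path_score(p, alpha), reverse=True)

# Hypothetical normalized weights alpha[(target, neighbor)].
alpha = {
    ("s_i", "syn1"): 0.7, ("s_i", "syn2"): 0.3,
    ("syn1", "fun1"): 0.6, ("syn1", "fun2"): 0.4, ("syn2", "fun3"): 1.0,
    ("fun1", "h1"): 0.5, ("fun1", "h2"): 0.5,
    ("fun2", "h2"): 1.0, ("fun3", "h3"): 1.0,
}
paths = [
    ["s_i", "syn1", "fun1", "h1"],
    ["s_i", "syn1", "fun1", "h2"],
    ["s_i", "syn1", "fun2", "h2"],
    ["s_i", "syn2", "fun3", "h3"],
]
ranked = rank_paths(paths, alpha)
```

Because each hop's weights are softmax-normalized over the neighbors of one node, the products over all paths leaving a symptom sum to one under this assumed rule, so they can be read as a distribution over explanations.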

Model prediction

After obtaining the multi-view representations for all nodes in $G$, we next build the syndrome representation for each symptom set $sc$ according to [5]. First, the embeddings of all symptoms in $sc$ are stacked into a matrix $e_{sc} \in \mathbb{R}^{|sc| \times d}$, where $d$ is the embedding dimension. Next we aggregate $e_{sc}$ into the syndrome representation by average pooling followed by a single layer MLP transformation, which is formulated as,

$$e^{syndrome}_{sc} = \mathrm{ReLU}\big(W_{mlp} \cdot \mathrm{avg}(e_{sc}) + b_{mlp}\big) \tag{9}$$

where avg is the mean operation, $W_{mlp}$ and $b_{mlp}$ are the weight matrix and bias of the MLP network, and $e^{syndrome}_{sc}$ is the induced syndrome embedding for $sc$. As a result, benefiting from the nonlinearity in the MLP, our model can capture the complex correlation among symptoms and learn a comprehensive syndrome representation.
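
Eq. 9 amounts to a column-wise mean followed by one affine layer and a ReLU. A minimal sketch, with a hypothetical function name and toy values chosen only to make the arithmetic easy to follow:

```python
def syndrome_embedding(symptom_embs, W, b):
    """Average the symptom embeddings, then apply ReLU(W @ avg + b)."""
    n = len(symptom_embs)
    avg = [sum(col) / n for col in zip(*symptom_embs)]
    return [
        max(0.0, sum(w * x for w, x in zip(row, avg)) + b_i)
        for row, b_i in zip(W, b)
    ]

# Two symptoms with 2-dimensional embeddings; identity weights, zero bias.
toy_embs = [[1.0, 2.0], [3.0, -6.0]]
W_mlp = [[1.0, 0.0], [0.0, 1.0]]
b_mlp = [0.0, 0.0]
e_syndrome = syndrome_embedding(toy_embs, W_mlp, b_mlp)
```

Here the average is `[2.0, -2.0]`, and the ReLU zeroes the negative component, so the second dimension contributes nothing to later herb scores.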

Training and inference

For the herb recommendation task, given a symptom set, our goal is to suggest a set of corresponding herbs. To evaluate the quality of the recommended herbs, we adopt the multi-label loss of [5], which also takes the label imbalance problem in the TCM prescription data [5] into account. Technically, the recommended herb set and the ground truth herb set are represented as multi-hot vectors, in which the entry of each herb in the set equals 1 and all other entries equal 0. To measure the distance between these two vectors, the loss function is formulated as

$$L = \arg\min_{\Theta} \sum_{(sc,hc) \in P} \mathrm{WMSE}\left(\mathbf{hc}, f(sc,H)\right) + \lambda_{\Theta} \lVert \Theta \rVert_2^2$$

$$f(sc,H) = \mathrm{sigmoid}\left(e_{sc}^{syndrome} \cdot (e_H)^{T}\right) \tag{10}$$

where the operator "·" indicates the dot product. $e_H \in \mathbb{R}^{N \times d}$ denotes the learned embeddings for all herbs, where $N$ is the number of herbs and $d$ is the embedding dimension. $\mathbf{hc}$ is the multi-hot vector for the ground truth herb set $hc$, and $f(sc,H)$ is the output probability vector over all herbs $H$. $\lambda_{\Theta}$ controls the L2 regularization strength. WMSE [54] (weighted mean square error) is defined by

$$\mathrm{WMSE}\left(\mathbf{hc}, f(sc,H)\right) = \sum_{i=1}^{N} w_i \left(\mathbf{hc}_i - f(sc,H)_i\right)^2 \tag{11}$$

where $\mathbf{hc}$ and $f(sc,H)$ have the same dimension $N$. The weight $w_i$ relieves the label imbalance problem by assigning different weights to herbs according to their occurrence frequencies:

$$w_i = \frac{\max_k \mathrm{freq}(k)}{\mathrm{freq}(i)} \tag{12}$$

where freq(i) denotes the frequency of herb $i$ in the prescriptions. The numerator is the largest frequency among all herbs appearing in the prescriptions, so the most frequent herb receives weight 1 and rarer herbs receive larger weights. We employ Adam [55] as the optimizer and update the model parameters in mini-batches.
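The scoring function and the frequency-weighted loss of Eqs. 10 to 12 can be sketched for a single prescription as follows. This is a minimal illustration rather than the training code: the helper names and the toy frequencies are hypothetical, and the L2 regularization term and the Adam update are omitted.

```python
import math

def herb_weights(freq):
    """Eq. 12: w_i = max_k freq(k) / freq(i), so rarer herbs get larger weights."""
    top = max(freq)
    return [top / f for f in freq]

def predict(e_syndrome, herb_embs):
    """Eq. 10: sigmoid of the dot product between the syndrome and each herb embedding."""
    probs = []
    for e_h in herb_embs:
        score = sum(a * b for a, b in zip(e_syndrome, e_h))
        probs.append(1.0 / (1.0 + math.exp(-score)))
    return probs

def wmse(target, probs, weights):
    """Eq. 11: weighted squared error between the multi-hot target and the predictions."""
    return sum(w * (t - p) ** 2 for w, t, p in zip(weights, target, probs))

# Toy example with three herbs and made-up occurrence counts
freq = [100, 50, 10]
weights = herb_weights(freq)         # [1.0, 2.0, 10.0]
target = [1, 0, 1]                   # multi-hot ground truth
probs = [0.9, 0.2, 0.5]              # hypothetical model output
loss = wmse(target, probs, weights)  # 0.01 + 0.08 + 2.5, about 2.59
print(weights, round(loss, 4))
```

In this toy case the error on the rare third herb dominates the loss, which is the intended effect of the weighting.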

Inference

To be consistent with the setting in [5], given the symptom set $sc$, we regard the top $k$ herbs with the highest probabilities in $f(sc,H)$ as the recommended herb set.
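The top-k selection can be written in a few lines. The sketch below assumes the probability vector is a plain list indexed by herb id; the function name is hypothetical.

```python
def recommend_top_k(probs, k):
    """Return the indices of the k herbs with the highest predicted probabilities."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:k]

print(recommend_top_k([0.1, 0.8, 0.3, 0.6], k=2))  # [1, 3]
```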

Experiments

In this section, we evaluate our proposed MGAT based on the publicly available TCM prescriptions in [26] and the integrated dataset TCM-MM. There are several questions to answer:

RQ1: Can our proposed model outperform the state-of-the-art herb recommendation approaches?

RQ2: Can our proposed model achieve comparable performance with the state-of-the-art graph neural network-based recommendation approaches?

RQ3: How does each meta-path perform?

RQ4: How does our model performance react to hyper-parameter settings (such as regularization strength)?

RQ5: Can our proposed MGAT provide explainable herb recommendation?

Dataset

The experimental data sets include the TCM prescription data and the KG data. To be consistent with [5], we adopt the same TCM prescription data set as in [4]. As shown in Table 1, the prescription dataset contains 26,360 prescriptions, divided into 22,917 for training and 3443 for testing. The statistics of TCM-MM are given in the TCM-MM row of Table 2. After combining the training prescriptions and the TCM-MM KG into G, we select the KG triples that match our predefined meta-paths to form Gmeta-paths. Notably, the triples of GM4 are exactly the training set in Table 1. The relation number counts one-way relations only; we add a reverse relation for each of them when operating GAT on the graphs.
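Adding a reverse relation for every one-way relation can be sketched as below. The triple layout `(head, relation, tail)` and the `_rev` suffix are assumptions for illustration; the relation name `remedy` is taken from the path examples in the explainability section.

```python
def add_reverse_relations(triples):
    """For every (head, relation, tail) triple, also emit (tail, relation_rev, head)."""
    out = list(triples)
    for head, rel, tail in triples:
        out.append((tail, rel + "_rev", head))
    return out

triples = [("呕吐", "remedy", "dried ginger")]
for t in add_reverse_relations(triples):
    print(t)
```

With this convention, a graph with r one-way relation types is processed with 2r relation types, so messages can flow in both directions along each edge.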

Table 1.

Statistics of the prescription data sets

Dataset #prescriptions #symptoms #herbs
All 26,360 360 753
Train 22,917 360 753
Test 3443 254 558

Table 2.

Statistics of KG

Dataset Subset #triples #entities #relations #included symptoms #included herbs
TCM-MM All 123,358 37,114 16 360 753
G All 146,275 37,114 17 360 753
Gmeta-paths GM1 2941 1063 3 293 377
Gmeta-paths GM2 18,301 3938 3 114 342
Gmeta-paths GM3 1303 672 1 166 506
Gmeta-paths GM4 22,917 1113 1 360 753
Gmeta-paths All 45,462 4986 8 360 753

Evaluation

Given a symptom set, our proposed model suggests a herb set to cure the symptoms. To evaluate the performance of our approach, we adopt the following three measures [4, 53] commonly used in recommender systems, Precision at rank K (Precision@K), Recall at rank K (Recall@K), and Normalized Discounted Cumulative Gain at rank K (NDCG@K). Specifically, we adopt Precision@5 to decide the optimal parameters. We truncate the ranked list at 20 for all three measures and report the average metrics for all prescriptions in the test set.
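For reference, the three measures can be computed per prescription as in the sketch below. It assumes binary relevance (a herb is either in the ground truth set or not) and the common log2 discount for NDCG; the exact NDCG variant used in the experiments is not stated in this section, so treat that part as an assumption.

```python
import math

def precision_at_k(ranked, truth, k):
    """Fraction of the top-k recommended herbs that are in the ground truth set."""
    return sum(1 for h in ranked[:k] if h in truth) / k

def recall_at_k(ranked, truth, k):
    """Fraction of the ground truth herbs that appear in the top-k list."""
    return sum(1 for h in ranked[:k] if h in truth) / len(truth)

def ndcg_at_k(ranked, truth, k):
    """Binary-relevance NDCG: DCG of the list divided by the DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(i + 2) for i, h in enumerate(ranked[:k]) if h in truth)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(truth))))
    return dcg / ideal

# Toy example with hypothetical herb ids
ranked = ["a", "b", "c", "d"]
truth = {"a", "c", "e"}
print(precision_at_k(ranked, truth, 4))  # 0.5
print(recall_at_k(ranked, truth, 4))
print(ndcg_at_k(ranked, truth, 4))
```

The reported numbers would then be the averages of these per-prescription values over the test set.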

Baselines

To demonstrate the effectiveness, we compare our proposed MGAT with topic-model (HC-KGETM), embedding-based (CKE), path-based (RKGE), and graph neural network-based (SMGCN, RGCN, and KGAT) methods. In particular, SMGCN is conducted only on the prescription dataset P. HC-KGETM employs pre-trained KG embeddings for initialization and trains the model on the prescriptions P. CKE, RGCN, and KGAT are based on dataset G. Besides, RKGE and our MGAT model are operated on Gmeta-paths.

  • HC-KGETM [4] integrates the pre-trained TransE embeddings from a TCM knowledge graph into the topic model to consider both the co-occurrence information in TCM prescriptions and comprehensive semantic relatedness of symptoms and herbs from the knowledge graph. Compared with the KG in HC-KGETM, our TCM-MM contains a more complete mapping between TCM and modern medicine, supporting better model interpretability.

  • CKE [36] is a representative embedding-based method, which introduces the TransR [41] loss as regularization to enhance matrix factorization.

  • RKGE [11] leverages the LSTM modules to compute each path instance’s embedding and merge them into the final prediction score.

  • SMGCN [5] is the state-of-the-art herb recommendation approach. It constructs multiple graphs and applies GCNs on them to learn the TCM entity embedding. Besides, it aggregates the symptom set to make the syndrome-aware herb recommendation.

  • RGCN [56] devises a relational graph convolutional network for knowledge base embedding. In our implementation, we replace our attentive embedding layer with its R-GCN layer design. In each R-GCN layer, the neighbor information is propagated according to different relation type channels.

  • KGAT [12] is a state-of-the-art GNN-based recommender. It combines KG with the user-item graph and applies the attentive neighborhood aggregation mechanism on the integrated graph to learn user and item representations.

The GNN-based methods (SMGCN, RGCN, KGAT, and our proposed MGAT) use the MLP layer to generate the syndrome representation, as described in Sect. “Model prediction”, and employ the multi-label loss. The other baselines do not adopt the MLP layer: HC-KGETM employs log loss, CKE uses BPR loss, and RKGE uses BCE loss.

Parameter settings

The HC-KGETM method is implemented in Java, and the RKGE approach is based on PyTorch. The other comparative methods and our model use TensorFlow. For the topic model HC-KGETM, we follow the parameter settings in [4]. We use a grid search strategy to determine the optimal learning rate lr, the regularization coefficient λ and the dropout ratio. In particular, lr is varied in $\{10^{-5}, 10^{-4}, 10^{-3}\}$, λ is tuned in $\{0, 10^{-6}, 10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}\}$, and the dropout ratio is searched in $\{0, 0.1, \ldots, 0.8\}$. We use the Xavier initializer [57] and the Adam optimizer [55] to train models with a batch size of 1024.
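The grid search over these three ranges can be sketched with `itertools.product`. The `evaluate` callback is hypothetical: in practice it would train a model with the given settings and return the validation Precision@5, which is the criterion named in the Evaluation section. The `toy_score` function below is only a stand-in so that the sketch runs.

```python
from itertools import product

learning_rates = [1e-5, 1e-4, 1e-3]
lambdas = [0.0, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2]
dropouts = [round(0.1 * i, 1) for i in range(9)]  # 0.0, 0.1, ..., 0.8

def grid_search(evaluate):
    """Try every (lr, lambda, dropout) combination and keep the best-scoring one."""
    best_cfg, best_score = None, float("-inf")
    for lr, lam, dropout in product(learning_rates, lambdas, dropouts):
        score = evaluate(lr, lam, dropout)
        if score > best_score:
            best_cfg, best_score = (lr, lam, dropout), score
    return best_cfg, best_score

def toy_score(lr, lam, dropout):
    """Stand-in for 'train and return Precision@5'; not a real experiment."""
    return -abs(dropout - 0.2) - abs(lr - 1e-4) - lam

print(len(learning_rates) * len(lambdas) * len(dropouts))  # 162
print(grid_search(toy_score)[0])
```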

For embedding-based and GNN-based methods, the embedding size equals 64. The path-based baseline RKGE employs the entity embeddings generated by HC-KGETM as initial embeddings, where the input HC-KGETM embeddings have dimension 40 and the hidden dimension is tuned in {8,16,32,64,128}. The layer depth for SMGCN, RGCN and KGAT is fixed at 3. For our proposed MGAT model, there are four sub-graphs, {GM1, GM2, GM3, GM4}. For each sub-graph, the layer depth equals the length of its corresponding meta-path, which guarantees that the information propagation covers the whole meta-path. The optimal parameter settings are recorded in Table 3.

Table 3.

Optimal parameters of comparative models

Approaches Best parameter settings
HC-KGETM α = 0.05 βs = βh = 0.01 γ = 1
CKE emb_dim=64 lr=1e-4 λ=1e-5
RKGE input_dim=40 hidden_dim=16 lr=1e-4 dropout=0.25
SMGCN emb_dim=64 lr=2e-4 dropout=0.0 λ=7e-3 xs=5 xh=40
RGCN emb_dim=64 lr=4e-3 dropout=0.1 λ=0.0
KGAT emb_dim=64 lr=6e-4 dropout=0.2 λ=1e-5
MGAT emb_dim=64 lr=7e-4 layer=[[64,32,16], [64,32,16], [64], [64]] dropout=0.2 λ=3e-2

Performance comparison

In this section, we first report the performance of all the methods. Next, we conduct the ablation analysis to explore the effectiveness of each meta-path. Then we explore the influence of hyperparameters. Last, we introduce our explanation function for the explainable recommendation.

Overall comparison (RQ1 & RQ2)

The overall performance is presented in Table 4. We have the following observations:

  • MGAT consistently outperforms all comparative methods on all metrics. In particular, it outperforms the topic model HC-KGETM by 6.06% on p@10 and by 7.16% on r@10. Compared with the state-of-the-art KG-based GNN recommender KGAT, MGAT improves p@10 by 1.37% and r@10 by 2.08%. We conduct paired t-tests between KGAT and MGAT, which show that the improvements of MGAT over KGAT are statistically significant. By adopting the predefined meta-paths to guide the GAT propagation, MGAT can preserve the tree structure in the KG and accurately capture the rich semantics in meta-paths.

  • The GNN-based methods achieve better performance than HC-KGETM, CKE and RKGE, verifying that graph neural networks are superior in modeling the complex relations and rich semantics in TCM prescriptions and the related KG. Further, our proposed MGAT performs slightly better than KGAT, which indicates that guiding the GAT propagation by meta-paths can reduce noise and improve the recommendation accuracy.

Table 4.

The overall performance comparison

Methods p@5 p@10 p@20 r@5 r@10 r@20 ndcg@5 ndcg@10 ndcg@20
HC-KGETM 0.2783 0.2197 0.1626 0.1959 0.3072 0.4523 0.3717 0.4491 0.5501
CKE 0.2692 0.2161 0.1609 0.1901 0.3063 0.4509 0.3636 0.4443 0.5490
RKGE 0.2719 0.2159 0.1584 0.1928 0.3028 0.4401 0.3681 0.4462 0.5469
RGCN 0.2910 0.2302 0.1682 0.2079 0.3250 0.4667 0.3901 0.4684 0.5691
SMGCN 0.2928 0.2295 0.1683 0.2076 0.3245 0.4689 0.3923 0.4687 0.5716
KGAT 0.2926 0.2299 0.1683 0.2083 0.3225 0.4693 0.3927 0.4691 0.5721
MGAT 0.2941* 0.2330* 0.1699* 0.2105* 0.3292* 0.4743* 0.3935* 0.4717* 0.5733*
%Improv. over HC-KGETM 5.66% 6.06% 4.48% 7.46% 7.16% 4.87% 5.86% 5.03% 4.22%
%Improv. over KGAT 0.50% 1.37% 0.97% 1.05% 2.08% 1.08% 0.19% 0.55% 0.21%

The bold font indicates the best results

The second best results are underlined. p@k and r@k are short for precision@k and recall@k

* indicates that the improvement is statistically significant under the paired t-test
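The improvement rows of Table 4 presumably follow the usual relative-gain formula, (MGAT score - baseline score) / baseline score. A minimal sketch using the r@10 column is given below; because the table entries are rounded to four decimals, percentages recomputed this way for other columns can differ slightly from the reported ones in the last digit.

```python
def relative_improvement(ours, baseline):
    """Percentage gain of `ours` over `baseline`."""
    return (ours - baseline) / baseline * 100.0

# r@10 from Table 4: MGAT vs. HC-KGETM, and MGAT vs. KGAT
print(round(relative_improvement(0.3292, 0.3072), 2))  # 7.16
print(round(relative_improvement(0.3292, 0.3225), 2))  # 2.08
```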

Ablation analysis (RQ3)

To verify the impact of each meta-path and of the attention mechanism, we first remove the attention mechanism from MGAT to obtain the sub-model MGCN. Further, as shown in Table 2, the meta-path networks GM1, GM2 and GM3 do not cover all symptoms and herbs in the TCM prescriptions P. To guarantee that all symptom and herb embeddings are updated, we add GM4 to each of them, yielding GM1+M4, GM2+M4 and GM3+M4. We summarize the experimental results in Table 5 and have the following findings:

  • For MGCN, the three variants GM1+M4, GM2+M4 and GM3+M4 slightly outperform the baseline KGAT on almost all top-10 metrics. This demonstrates the rationality and effectiveness of integrating information propagation with the decision process.

  • Among the three variants, adding M1, M2, or M3 on top of the prescriptions M4 achieves comparable performance, which suggests that the predefined M1, M2, and M3 contribute almost equally to MGCN. MGCN with all four meta-paths is superior to the three variants, which indicates that considering these meta-paths simultaneously helps improve the performance to some extent.

  • MGAT performs slightly better than MGCN, which illustrates the effectiveness of the attention mechanism.

Table 5.

Effect of each meta-path and attention mechanism

Approaches p@10 r@10 ndcg@10
KGAT 0.2299 0.3225 0.4691
MGCN (GM1+M4) 0.2303 0.3252 0.4704
MGCN (GM2+M4) 0.2305 0.3265 0.4672
MGCN (GM3+M4) 0.2306 0.3250 0.4682
MGCN 0.2313 0.3271 0.4715
MGAT 0.2330 0.3292 0.4717

The bold font indicates the best results

Influence of hyperparameters (RQ4)

In this section, we will discuss the critical factors in detail.

  • Effect of Regularization.

Neural networks easily overfit the training data. The classic remedies are a regularization term and neuron dropout. In this paper, the dropout ratio is the fraction of neurons removed during training. Specifically, we adopt the message dropout as in [5]. Besides, λ controls the regularization strength on the parameters. Fig. 3 reports the influence of λ and Fig. 4 shows the influence of the dropout ratio. We find that MGAT performs slightly better when λ equals 3e-2. A lower λ might not be sufficient to prevent overfitting, while a higher λ may cause under-fitting and hurt the performance. The dropout ratio shows a similar trend, with a ratio of 0.2 giving the best performance.

  • Effect of different attention network implementations.

To explore the impact of the attention network, we compare the node-level attention implementation of HAN [19] with our TransR attention defined in Eq. 1. The HAN attention is formulated in Eq. 13, where $W_{HAN,M}$ is the attention weight vector for meta-path $M$, and $\Vert$ indicates the concatenation operation. Table 6 shows the experimental results. We have the following observations:

Fig. 3. Performance w.r.t. different λ on MGAT

Fig. 4. Performance w.r.t. different dropout ratios on MGAT

Table 6.

Comparison of different attention network implementations

Approaches p@10 r@10 ndcg@10
MGCN 0.2313 0.3271 0.4715
MGAT (HAN attention) 0.2319 0.3282 0.4683
MGAT (TransR attention) 0.2330 0.3292 0.4717

The bold font indicates the best results

(1) MGAT (HAN attention) and MGAT (TransR attention) are slightly superior to MGCN in terms of p@10 and r@10, which illustrates the effectiveness of the attention mechanism to some extent.

(2) Furthermore, MGAT (TransR attention) slightly outperforms MGAT (HAN attention), which verifies that the TransR attention is more suitable to model the TCM KG for herb recommendation.

$$\alpha_{q,t}^{M} = \mathrm{softmax}\left(\tanh\left(W_{HAN,M}^{T} \cdot [e_q^{0} \,\Vert\, e_t^{0}]\right)\right) \tag{13}$$

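The HAN-style node-level attention of Eq. 13 can be sketched as follows. The sketch assumes that the softmax normalizes over the neighbors of node q selected by meta-path M, as in HAN [19]; the function name and the toy inputs are hypothetical.

```python
import math

def han_attention(e_q, neighbor_embs, w):
    """Attention weights of node q over its meta-path neighbors.

    e_q: embedding of q; neighbor_embs: one embedding per neighbor t;
    w: attention weight vector of length len(e_q) + len(e_t).
    """
    scores = []
    for e_t in neighbor_embs:
        concat = e_q + e_t  # list concatenation plays the role of [e_q || e_t]
        scores.append(math.tanh(sum(wi * xi for wi, xi in zip(w, concat))))
    # Softmax over the neighbors, shifted by the max for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

alphas = han_attention([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [0.1, 0.2, 0.3, 0.4])
print(alphas)
```

The weights sum to one over the neighbors, so they can be read as the share of attention each hop receives.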
  • Effect of different combination implementations for multiple meta-paths.

In addition to the simple concatenation described in Sect. 4.1.2, we also explore other merge methods, namely average sum (avg_sum), attention-weighted sum (att_sum) and attention-weighted concatenation (att_concat), to combine the multi-view representations. Table 7 shows the experimental results, where concat indicates the operation defined in Eq. 8. In att_concat and att_sum, the attention weights are calculated following the semantic-level attention in HAN [19], which is defined as

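$$w_{M_i} = \frac{1}{|V|} \sum_{q \in V} z^{T} \cdot \tanh\left(W_{att} \cdot e_q^{M_i} + b_{att}\right) \tag{14}$$

where $V = \{V_S \cup V_H\}$, $W_{att} \in \mathbb{R}^{d \times d}$, $b_{att} \in \mathbb{R}^{d}$ and $z \in \mathbb{R}^{d}$, and $e_q^{M_i}$ is the representation of node $q$ under meta-path $M_i$. After obtaining the importance of each meta-path, we normalize them with the softmax function,

$$\beta_{M_i} = \frac{\exp(w_{M_i})}{\sum_{j=1}^{|M|} \exp(w_{M_j})} \tag{15}$$

For att_concat, the merge operation is defined as follows, where $\boldsymbol{\beta}_{M_i} \in \mathbb{R}^{d}$ is $\beta_{M_i}$ repeated $d$ times and $\odot$ indicates element-wise multiplication,

$$e_q = [\boldsymbol{\beta}_{M_1} \odot e_q^{M_1}] \,\Vert\, [\boldsymbol{\beta}_{M_2} \odot e_q^{M_2}] \,\Vert\, [\boldsymbol{\beta}_{M_3} \odot e_q^{M_3}] \,\Vert\, [\boldsymbol{\beta}_{M_4} \odot e_q^{M_4}] \tag{16}$$

For att_sum, $e_q = \sum_{i=1}^{|M|} \beta_{M_i} e_q^{M_i}$. For avg_sum, we simply remove the coefficient $\beta_{M_i}$ and average the sum.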
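The four merge variants differ only in how the per-meta-path views of a node are combined. The sketch below illustrates them on toy vectors; it takes the raw importances of Eq. 14 as given inputs instead of learning them, and all function names are chosen for illustration.

```python
import math

def softmax(xs):
    """Eq. 15: normalize the meta-path importances into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def concat(views):
    """Plain concatenation of the per-meta-path views."""
    return [x for v in views for x in v]

def att_concat(views, betas):
    """Eq. 16: scale each view by its weight, then concatenate."""
    return [b * x for v, b in zip(views, betas) for x in v]

def avg_sum(views):
    """Element-wise mean of the views."""
    return [sum(col) / len(views) for col in zip(*views)]

def att_sum(views, betas):
    """Element-wise weighted sum of the views."""
    return [sum(b * x for b, x in zip(betas, col)) for col in zip(*views)]

views = [[1.0, 2.0], [3.0, 4.0]]  # two meta-path views of one node, d = 2
betas = softmax([0.0, 0.0])       # equal importances give [0.5, 0.5]
print(concat(views))              # [1.0, 2.0, 3.0, 4.0]
print(att_concat(views, betas))   # [0.5, 1.0, 1.5, 2.0]
print(avg_sum(views))             # [2.0, 3.0]
print(att_sum(views, betas))      # [2.0, 3.0]
```

The two concatenation variants keep a dimension of d per meta-path, while the two sum variants collapse the views into a single d-dimensional vector; only the attention-weighted variants need the extra parameters of Eq. 14.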

Table 7.

Comparison of different merge implementations

Approaches p@10 r@10 ndcg@10
concat 0.2330 0.3292 0.4717
att_concat 0.2322 0.3292 0.4703
avg_sum 0.2295 0.3229 0.4688
att_sum 0.2290 0.3223 0.4689

The bold font indicates the best results

As Table 7 shows, for both the concat and avg_sum operations, adding the attention weights does not bring an apparent performance improvement. Further, concat and att_concat show a slight improvement over avg_sum and att_sum. Thus, the attention mechanism appears to help little in merging the representations from different meta-paths, and we choose the simple concatenation operation because it involves no additional parameters to learn.

Explainability of MGAT (RQ5)

In this section, we conduct case studies on recommendation cases to analyze the effectiveness and explainability of our MGAT model. Fig. 5 presents one prescription instance from the test dataset; in the Herb Set column, the bold red font marks the hit herbs. Based on TCM knowledge, the missing herbs can also serve as substitutes for the ground truth herbs, which supports the claim that MGAT can generate accurate herb recommendations. Besides, with the help of the attention mechanism, we can reason on paths to infer the connection between TCM symptom-herb pairs for explanations. For each path, we first define the explanation function to compute its recommendation score. Taking the path in Fig. 6b, dried ginger $\xrightarrow{0.056}$ SMIT00507 $\xrightarrow{0.14}$ emesis $\xrightarrow{0.027}$ 呕吐 (vomiting), as an instance, the explanation score is formulated as follows, where the attention weights of all hops are multiplied together:

$$\mathrm{path\_score} = 0.056 \times 0.14 \times 0.027 \approx 0.02\% \tag{17}$$
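The explanation score is simply the product of the hop-level attention weights, so scoring and ranking path instances takes only a few lines. The sketch below uses the three dried ginger paths reported in this section; the path labels and the function names are chosen for illustration.

```python
from math import prod

def path_score(hop_weights):
    """Multiply the attention weights of all hops along one path instance."""
    return prod(hop_weights)

def top_paths(paths, n):
    """paths: list of (label, hop_weights); return the n highest-scoring ones."""
    return sorted(paths, key=lambda p: path_score(p[1]), reverse=True)[:n]

dried_ginger_paths = [
    ("ingredient path via SMIT00507", [0.056, 0.14, 0.027]),
    ("TCM theory path via Yang deficiency of spleen & stomach", [0.029, 0.144, 0.054]),
    ("ingredient path via SMIT03300", [0.058, 0.079, 0.045]),
]
for label, weights in top_paths(dried_ginger_paths, n=3):
    print(f"{label}: {path_score(weights) * 100:.2f}%")
```

All three scores round to 0.02%, so the ranking among them is decided by differences in the third significant digit.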

With this function, for each meta-path we can rank all path instances connecting a TCM symptom and a recommended herb by their path scores and select the top paths as explanations. Following this process, we select several paths for the <呕吐-dried ginger> and <呕吐-pinellia ternata> pairs, presented in Figs. 6 and 7 respectively. For the <呕吐-dried ginger> pair, we can generate the following explanation texts:

  • Herb dried ginger is recommended for the TCM symptom <呕吐> because the syndrome causing <呕吐> can be Yang deficiency of spleen & stomach, one treatment for Yang deficiency of spleen & stomach is warming middle-jiao to dispel cold, and dried ginger has the above function. (0.029 × 0.144 × 0.054 ≈ 0.02%)

  • Herb dried ginger is recommended for the TCM symptom <呕吐> because it contains a kind of ingredient SMIT03300. This ingredient SMIT03300 can cure gagging. The corresponding TCM symptom for gagging is <呕吐>. (0.058 × 0.079 × 0.045 ≈ 0.02%)

In addition, for the pair <呕吐-pinellia ternata>, the explanations are:

  • Herb pinellia ternata is recommended for the TCM symptom <呕吐> because the syndrome causing <呕吐> can be deficiency of stomach qi, one treatment for deficiency of stomach qi is downbear counterflow and check vomiting, and pinellia ternata has the above function. (0.027 × 0.236 × 0.108 ≈ 0.07%)

  • Herb pinellia ternata is recommended for the TCM symptom <呕吐> because it contains a kind of ingredient SMIT03300. This ingredient SMIT03300 can cure gagging. The corresponding TCM symptom for gagging is <呕吐>. (0.108 × 0.079 × 0.045 ≈ 0.04%)

Besides, under meta-path M3, there also exist (呕吐 remedy dried ginger) and (呕吐 remedy pinellia ternata) paths. From the above explanations, we can observe that based on the modern medicine aspect, herb dried ginger and pinellia ternata both contain the chemical ingredient SMIT03300 to relieve the TCM symptom 呕吐. Besides, from the TCM view, dried ginger and pinellia ternata can both treat the deficiency of stomach related syndromes.

Fig. 5. The herb recommendation case

Fig. 6. Explainable path examples for <呕吐-dried ginger> pair

Fig. 7. Explainable path examples for <呕吐-pinellia ternata> pair

The above analysis shows that the explainability study can uncover the underlying activation mechanism of herbs from both the TCM and modern medicine aspects and shed light on the herb compatibility study from a modern medicine view. We therefore conclude that our proposed MGAT achieves both effective and explainable recommendation.

Conclusion and future work

In this paper, we aim to offer explainable herb recommendation based on an integrated KG containing both TCM and modern medicine information. Specifically, we devise a Meta-path guided Graph Attention Network that combines information propagation with the decision process. For information propagation, the predefined meta-paths determine the neighbors of each node. For information aggregation, the attention mechanism is employed to distinguish the saliency of different paths linking each symptom-herb pair. In this way, our proposed model can capture the long-range semantics along meta-paths and offer fine-grained explanations with the help of the learned attention weights, so that the underlying activation mechanism of herbs can be discovered from multiple aspects. In future applications, through the TCM theory meta-path, our model can help TCM doctors discover prescribing experience from the prescriptions of famous TCM experts. Through the modern medicine meta-path, our approach can reveal the essential compounds to TCM researchers and accelerate drug discovery from herbal medicines to some extent.

Funding

This work was supported by National Natural Science Foundation of China (Grant No. 61972155), Science and Technology Commission of Shanghai Municipality (Grant No. 20DZ1100300), Special assistance for Chinese postdoctoral staff (Grant No. 2022TQ0297).

Declarations

Conflict of interest

The authors report no conflict of interest.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Zhang RZ, Yu SJ, Bai H, Ning K. TCM-Mesh: the database and analytical system for network pharmacology analysis for TCM preparations. Sci Rep. 2017;7(1):1–14. doi: 10.1038/s41598-017-03039-7.
  • 2.Cheung F. TCM: made in China. Nature. 2011;480(7378):S82–S83. doi: 10.1038/480S82a.
  • 3.Ren JL, Zhang AH, Wang XJ. Traditional Chinese medicine for COVID-19 treatment. Pharmacol Res. 2020;155:104743. doi: 10.1016/j.phrs.2020.104743.
  • 4.Wang X, Zhang Y, Wang X, Chen JA. Knowledge graph enhanced topic modeling approach for herb recommendation. In: DASFAA; 2019. p. 709–24.
  • 5.Jin Y, Zhang W, He X, Wang X, Wang X. Syndrome-aware herb recommendation with multi-graph convolution network. In: ICDE. IEEE; 2020. p. 145–56.
  • 6.Xue R, Fang Z, Zhang M, Yi Z, Wen C, Shi T. TCMID: traditional Chinese medicine integrative database for herb molecular mechanism analysis. Nucleic Acids Res. 2012;41(D1):D1089–D1095. doi: 10.1093/nar/gks1100.
  • 7.Ru J, Li P, Wang J, Zhou W, Li B, Huang C, et al. TCMSP: a database of systems pharmacology for drug discovery from herbal medicines. J Cheminform. 2014;6(1):1–6. doi: 10.1186/1758-2946-6-13.
  • 8.Wu Y, Zhang F, Yang K, Fang S, Bu D, Li H, et al. SymMap: an integrative database of traditional Chinese medicine enhanced by symptom mapping. Nucleic Acids Res. 2019;47(D1):D1110–D1117. doi: 10.1093/nar/gky1021.
  • 9.Zhang Y, Tiňo P, Leonardis A, Tang K. A survey on neural network interpretability; 2020. arXiv preprint arXiv:2012.14261.
  • 10.Wang X, Wang D, Xu C, He X, Cao Y, Chua TS. Explainable reasoning over knowledge graphs for recommendation. In: AAAI; 2019. p. 5329–36.
  • 11.Sun Z, Yang J, Zhang J, Bozzon A, Huang LK, Xu C. Recurrent knowledge graph embedding for effective recommendation. In: RecSys; 2018. p. 297–305.
  • 12.Wang X, He X, Cao Y, Liu M, Chua TS. KGAT: knowledge graph attention network for recommendation. In: SIGKDD; 2019. p. 950–8.
  • 13.Wang Z, Lin G, Tan H, Chen Q, Liu X. CKAN: collaborative knowledge-aware attentive network for recommender systems. In: SIGIR; 2020. p. 219–28.
  • 14.Wang H, Zhao M, Xie X, Li W, Guo M. Knowledge graph convolutional networks for recommender systems. In: WWW; 2019. p. 3307–13.
  • 15.Wang H, Leskovec J, Zhang F, Zhao M, Li W, Zhang M, et al. Knowledge-aware graph neural networks with label smoothness regularization for recommender systems. In: SIGKDD; 2019. p. 968–77.
  • 16.Zhang J, Gao M, Yu J, Yang L, Wang Z, Xiong Q. Path-based reasoning over heterogeneous networks for recommendation via bidirectional modeling; 2020. arXiv preprint arXiv:2008.04185.
  • 17.Qiao Z, Wang P, Fu Y, Du Y, Wang P, Zhou Y. Tree structure-aware graph representation learning via integrated hierarchical aggregation and relational metric learning; 2020. arXiv preprint arXiv:2008.10003.
  • 18.Wang X, Huang T, Wang D, Yuan Y, Liu Z, He X, et al. Learning intents behind interactions with knowledge graph for recommendation; 2021. arXiv preprint arXiv:2102.07057.
  • 19.Wang X, Ji H, Cui P, Yu P, Shi C, Wang B, et al. Heterogeneous graph attention network. In: WWW; 2019. p. 2022–32.
  • 20.Lyu X, Li G, Huang J, Hu W. Rule-guided graph neural networks for recommender systems. In: The semantic web - ISWC 2020 - 19th international semantic web conference, Athens, Greece, November 2–6, 2020, Proceedings, Part I. vol. 12506 of Lecture Notes in Computer Science. Springer; 2020. p. 384–401.
  • 21.Jin J, Qin J, Fang Y, Du K, Zhang W, Yu Y, et al. An efficient neighborhood-based interaction model for recommendation on heterogeneous graph. In: Gupta R, Liu Y, Tang J, Prakash BA, editors., et al., KDD. ACM; 2020. p. 75–84.
  • 22.Hu B, Shi C, Zhao WX, Yu PS. Leveraging meta-path based context for top-n recommendation with a neural co-attention model. In: SIGKDD; 2018;1:1531–40.
  • 23.Fan S, Shi C, Hu L, Zhu J, Ma B, Han X, et al. Metapath-guided heterogeneous graph neural network for intent recommendation. In: SIGKDD; 2019. p. 2478–86.
  • 24.Lin F, Xiahou J, Xu Z. TCM clinic records data mining approaches based on weighted-LDA and multi-relationship LDA model. Multimed Tools Appl. 2016;75(22):14203–14232. doi: 10.1007/s11042-016-3363-9.
  • 25.Ji W, Zhang Y, Wang X, Zhou Y. Latent semantic diagnosis in traditional Chinese medicine. WWW. 2017;20(5):1071–87.
  • 26.Yao L, Zhang Y, Wei B, Zhang W, Jin Z. A topic modeling approach for traditional Chinese medicine prescriptions. TKDE. 2018;30(6):1007–1021.
  • 27.Chen X, Ruan C, Zhang Y, Chen H. Heterogeneous information network based clustering for categorizations of traditional Chinese medicine formula. In: BIBM. Cham: Springer; 2018. p. 839–46.
  • 28.Li W, Yang Z, Sun X. Exploration on generating traditional chinese medicine prescription from symptoms with an end-to-end method; 2018. arXiv preprint arXiv:1801.09030.
  • 29.Zhang Y, Yu M, Li N, Yu C, Cui J, Yu D. Seq2Seq attentional Siamese neural networks for text-dependent speaker verification. In: ICASSP; 2019. p. 6131–5.
  • 30.Li W, Yang Z. Distributed representation for traditional Chinese medicine herb via deep learning models; 2017. arXiv preprint arXiv:1711.01701.
  • 31.Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process. 1997;45(11):2673–2681. doi: 10.1109/78.650093.
  • 32.Ruan C, Ma J, Wang Y, Zhang Y, Yang Y. Discovering regularities from traditional Chinese medicine prescriptions via bipartite embedding model. In: IJCAI; 2019. p. 3346–52.
  • 33.Ruan C, Wang Y, Zhang Y, Yang Y. Exploring regularity in traditional Chinese medicine clinical data using heterogeneous weighted networks embedding. In: DASFAA; 2019. p. 310–3.
  • 34.Yang Y, Rao Y, Yu M, Kang Y. Multi-layer information fusion based on graph convolutional network for knowledge-driven herb recommendation. Neural Netw. 2022;146:1–10. doi: 10.1016/j.neunet.2021.11.010.
  • 35.Jin Y, Ji W, Zhang W, He X, Wang X, Wang X. A KG-enhanced multi-graph neural network for attentive herb recommendation. IEEE/ACM Trans Comput Biol Bioinform. 2021. doi: 10.1109/TCBB.2021.3115489.
  • 36.Zhang F, Yuan NJ, Lian D, Xie X, Ma WY. Collaborative knowledge base embedding for recommender systems. In: SIGKDD; 2016. p. 353–62.
  • 37.Cao Y, Wang X, He X, Hu Z, Chua TS. Unifying knowledge graph learning and recommendation: towards a better understanding of user preferences. In: WWW; 2019. p. 151–61.
  • 38.Wang H, Zhang F, Xie X, Guo M. DKN: deep knowledge-aware network for news recommendation. In: Champin P, Gandon FL, Lalmas M, Ipeirotis PG, editors. WWW. ACM; 2018. p. 1835–44.
  • 39.Wang H, Zhang F, Zhao M, Li W, Xie X, Guo M, et al. Multi-task feature learning for knowledge graph enhanced recommendation. In: Liu L, White RW, Mantrach A, Silvestri F, McAuley JJ, Baeza-Yates R, et al., editors. WWW. ACM; 2019. p. 2000–10.
  • 40.Bordes A, Usunier N, García-Durán A, Weston J, Yakhnenko O. Translating embeddings for modeling multi-relational data. In: Advances in neural information processing systems 26: 27th annual conference on neural information processing systems 2013. Proceedings of a meeting held December 5–8, 2013, Lake Tahoe, Nevada, United States; 2013. p. 2787–95.
  • 41.Lin Y, Liu Z, Sun M, Liu Y, Zhu X. Learning entity and relation embeddings for knowledge graph completion. In: Bonet B, Koenig S, editors. AAAI. AAAI Press; 2015. p. 2181–2187.
  • 42.Wang Z, Li J. Text-enhanced representation learning for knowledge graph. In: Kambhampati S, editor. IJCAI. New York: IJCAI/AAAI Press; 2016. p. 1293–1299.
  • 43.Yu X, Ren X, Sun Y, Gu Q, Sturt B, Khandelwal U, et al. Personalized entity recommendation: a heterogeneous information network approach. In: WSDM; 2014. p. 283–92.
  • 44.Zhao H, Yao Q, Li J, Song Y, Lee DL. Meta-graph based recommendation fusion over heterogeneous information networks. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining; 2017. p. 635–44.
  • 45.Shi C, Hu B, Zhao WX, Yu PS. Heterogeneous information network embedding for recommendation. TKDE. 2018;31(2):357–370.
  • 46.Wang X, He X, Feng F, Nie L, Chua TS. TEM: tree-enhanced embedding model for explainable recommendation. In: WWW; 2018. p. 1543–52.
  • 47.Wang H, Zhang F, Wang J, Zhao M, Li W, Xie X, et al. RippleNet: Propagating user preferences on the knowledge graph for recommender systems. In: CIKM; 2018. p. 417–26.
  • 48.Tai CY, Wu MR, Chu YW, Chu SY, Ku LW. MVIN: learning multiview items for recommendation. In: SIGIR; 2020. p. 99–108.
  • 49.Han Z, Xu F, Shi J, Shang Y, Ma H, Hui P et al. Genetic meta-structure search for recommendation on heterogeneous information network. In: CIKM; 2020. p. 455–64.
  • 50.Liang X, Ma Y, Cheng G, Fan C, Yang Y, Liu Z. Meta-path-based heterogeneous graph neural networks in academic network. Int J Mach Learn Cybern. 2022;13(6):1553–1569. doi: 10.1007/s13042-021-01465-8.
  • 51.Zhao X, Zhao X, Yin M. Heterogeneous graph attention network based on meta-paths for lncRNA-disease association prediction. Brief Bioinform. 2022;23(1):bbab407. doi: 10.1093/bib/bbab407.
  • 52.Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y. Graph attention networks. 2017. arXiv preprint arXiv:1710.10903.
  • 53.Wang X, He X, Wang M, Feng F, Chua T. Neural Graph Collaborative Filtering. In: SIGIR; 2019. p. 165–74.
  • 54.Hu H, He X. Sets2sets: Learning from sequential sets with neural networks. In: SIGKDD; 2019. p. 1491–9.
  • 55.Kingma DP, Ba J. Adam: a method for stochastic optimization. In: ICLR; 2015.
  • 56.Schlichtkrull M, Kipf TN, Bloem P, van den Berg R, Titov I, Welling M. Modeling relational data with graph convolutional networks. Cham: Springer; 2018.
  • 57.Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. In: AISTATS; 2010. p. 249–256.

Articles from Health Information Science and Systems are provided here courtesy of Springer
