Combining entity co-occurrence with specialized word embeddings to measure entity relation in Alzheimer’s disease

Go Eun Heo; Qing Xie; Min Song; Jeong-Hoon Lee

doi:10.1186/s12911-019-0934-5

BMC Med Inform Decis Mak. 2019 Dec 5;19(Suppl 5):240. doi: 10.1186/s12911-019-0934-5

PMCID: PMC6894106 PMID: 31801521

Abstract

Background

Extracting useful information from biomedical literature plays an important role in the development of modern medicine. In natural language processing, there have been rigorous attempts to automatically find meaningful relationships between entities using co-occurrence-based methods. It has become increasingly important to understand whether a relationship exists between any two entities extracted from a large number of texts and, if so, how strong it is. One key approach is to measure the semantic similarity and relatedness between two entities.

Methods

We propose a hybrid ranking method that combines a co-occurrence approach, considering both direct and indirect entity pair relationships, with specialized word embeddings to measure the relatedness of two entities.

Results

We evaluate the proposed ranking method comparatively with other well-known methods such as co-occurrence, Word2Vec, COALS (Correlated Occurrence Analog to Lexical Semantics), and random indexing by calculating top-ranked entities related to Alzheimer’s disease. In addition, we analyze gene, pathway, and gene–phenotype relationships. Overall, the proposed method tends to find more hidden relationships than the other methods.

Conclusion

Our proposed method is able to select more useful related entities that not only co-occur frequently with the target entity but also have more indirect relations to it. In pathway analysis, our proposed method shows superior performance at identifying (functional) cross clustering and higher-level pathways. In phenotype analysis, our proposed method has an advantage in identifying the common genotypes related to phenotypes in the biological literature.

Keywords: Information extraction, Semantic relatedness, Ranking algorithm, Knowledge discovery, Alzheimer’s disease

Background

With the recent exponential growth of the biomedical literature, extracting useful information from it has come to play an important role in the development of modern medicine. In the biomedical domain, information extraction (IE) focuses mainly on automatically identifying entities and their relationships in biomedical literature, as an aspect of natural language processing (NLP). Traditionally, biomedical relationships between entities have been detected with co-occurrence methods, which assume that if two entities appear in the same sentence, paragraph, or abstract, they are relevant to each other and helpful for biomedical knowledge discovery such as gene–gene interaction and gene–drug association. However, co-occurrence methods generate many false positive relations, since they do not consider contextual information in a specific text [1].

In addition to simple co-occurrence-based approaches to measuring the relationship between entities, rule-based methods using syntactic patterns [2–5] and machine learning methods [6, 7] have been proposed to tackle this false positive issue. Measures of semantic similarity and relatedness have been developed to identify ontological relationships between two entities, using resources such as WordNet [8] and UMLS (Unified Medical Language System) [9]. Recently, models of semantic word representations, or word embeddings, have been developed that construct semantic spaces from large-scale corpora. This line of research adopts deep learning approaches [10–16] such as Word2Vec [17] to automatically learn optimal feature representations. However, these studies focus only on learning word embeddings by maximizing raw-text probability, which does not perfectly capture both similarity and relatedness [18].

As indicated by previous studies [18–21], incorporating two or more knowledge sources (e.g. thesaurus, ontology, and corpus) into word embedding approaches can produce better rankings of the relationships between two entities. The present paper was motivated by the concept of utilizing knowledge sources to enrich word embeddings. To the best of our knowledge, no attempt has previously been made to combine word embeddings based on multiple knowledge resources with the co-occurrence of entity pairs, while classifying the type of relation by reflecting contextual information in biomedical literature. Moreover, no previous study considers both direct and indirect relationships of entity pairs when calculating their co-occurrence.

Therefore, in this study, we propose a hybrid semantic relatedness algorithm for biological knowledge discovery. Our proposed method combines co-occurrence between entities with specialized word embeddings [18], which capture both the similarity and the relatedness of words by learning from a corpus and a thesaurus, to calculate the semantic similarity of two entities. In the proposed method, we also consider both direct and indirect scores for each entity pair so as to find more complex relationships, covering not only explicit but also hidden ones. We select Alzheimer’s disease (AD) as a case study for analysis and evaluation. AD is a degenerative brain disorder whose cause is hard to diagnose accurately. As the number of AD patients has increased, researchers have striven, by means of medical experiments and literature analysis, to understand the disease’s pathophysiology so as to improve its diagnosis and treatment. For entity extraction, we used two approaches, PKDE4J [22] and SemRep [23]. PKDE4J is an integrated system designed to extract entities and relations from unstructured biomedical text corpora, whereas SemRep, a UMLS-based entity and relation extraction application, identifies semantic relationships in biomedical literature. To evaluate the performance of the proposed method, we compared it with several well-accepted techniques, namely co-occurrence, Word2Vec [17], COALS (Correlated Occurrence Analog to Lexical Semantics) [24], and random indexing (RI) [25]. In addition, to evaluate the usefulness of the proposed method for other types of knowledge discovery, we conducted two further analyses: 1) pathway analysis on the Reactome Pathway database [26] and 2) gene–phenotype relationship analysis on OMIM (Online Mendelian Inheritance in Man) [27]. Overall, the proposed method is able to identify more related genes for pathways than the other methods by differentiating rankings for each gene. The proposed method also finds genes such as APOE, which is strongly associated with familial early-onset AD and coronary heart disease [28], through analyses of AD-related genes and the gene–phenotype relationship.

Methods

The present study comprises four steps: data collection, entity relation extraction, semantic relatedness scoring calculation, and evaluation. For semantic relatedness scoring, we consider both direct and indirect connections; for evaluation, we employ four kinds of analyses, namely algorithm comparison, AD-related gene analysis, pathway analysis, and gene–phenotype relation analysis. Figure 1 illustrates the overall design of this study. A detailed description of the proposed approach is provided in the subsequent sections.

Fig. 1 — Overview of the proposed approach

Data collection

Using ‘Alzheimer disease’ or ‘Alzheimer’s disease’ as search terms, we retrieved 118,167 abstracts from PubMed, a search engine indexing more than 29 million citations for biomedical literature from MEDLINE. The exact query formulation is “Alzheimer disease [Title/Abstract] OR Alzheimer’s disease [Title/Abstract]”.
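The paper does not say which interface was used to run this query. As a minimal sketch, assuming retrieval goes through the NCBI E-utilities ESearch endpoint, the query string above could be turned into a request URL as follows. The helper name `build_esearch_url` and the paging values are illustrative assumptions, and no network request is made here.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (assumed retrieval route, not stated in the paper).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Query formulation quoted in the text.
QUERY = "Alzheimer disease[Title/Abstract] OR Alzheimer's disease[Title/Abstract]"


def build_esearch_url(term, retmax=100, retstart=0):
    """Build an ESearch URL that returns PubMed IDs matching `term`.

    `retmax` and `retstart` page through the result set; the defaults
    here are illustrative values only.
    """
    params = {
        "db": "pubmed",
        "term": term,
        "retmax": retmax,
        "retstart": retstart,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_esearch_url(QUERY)
print(url)
```

Fetching the abstracts themselves would need a second step (for example EFetch on the returned IDs), which is omitted from this sketch.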

We did not limit publication by year, so as to get as much data as possible for our analysis. Figure 2 shows the distribution of the number of papers by publication year from 1990 to January 2019.

Fig. 2 — Number of papers by publication year

Entity relation extraction

For PKDE4J [22], the algorithm used for entity relation extraction identifies the verb located between two entities in a sentence and captures relational characteristics. To reduce unnecessary indirect connections, we selected entities by type. Since we focus on Alzheimer’s disease, we limited the entity types to gene, drug, and disease. Thus, for entity extraction, we used the following dictionaries: a gene dictionary collected from UniProt [29], MeSH (Medical Subject Headings) for diseases [30], KEGG (Kyoto Encyclopedia of Genes and Genomes) for genetics [31], and DrugBank for drugs [32]. We used the same data collection as the input for SemRep [23]. As output, we extracted 969,341 entity relations using PKDE4J and 630,054 entity relations using SemRep.

Semantic relatedness scoring calculation

We considered both direct and indirect scoring for each entity pair. For the direct score, after extracting the relations of an entity pair, we looked for the same entity pair appearing with different relation types within one abstract. An example is shown below: the first column is the PMID (PubMed unique identifier), the second column is the sentence location in that abstract, and the remaining columns give the two entities with their types and the relation type:

19395124 | 8 | MCI | DISEASE | depression | DISEASE | CO-OCCUR |

19395124 | 17 | MCI | DISEASE | depression | DISEASE | RESULT_OF |
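To make the record layout concrete, here is a minimal Python sketch that parses pipe-delimited records of this shape and groups the relation types found for one entity pair within one abstract. The field names and the helpers `parse_record` and `relations_by_abstract` are illustrative assumptions, not part of PKDE4J.

```python
from collections import defaultdict

# The two example records, written as pipe-delimited strings.
RECORDS = [
    "19395124 | 8 | MCI | DISEASE | depression | DISEASE | CO-OCCUR |",
    "19395124 | 17 | MCI | DISEASE | depression | DISEASE | RESULT_OF |",
]


def parse_record(line):
    """Split one record into named fields (field names are assumptions)."""
    fields = [f.strip() for f in line.strip().rstrip(".").split("|")]
    fields = [f for f in fields if f]  # drop the empty field after the trailing pipe
    pmid, sentence, ent1, type1, ent2, type2, relation = fields
    return {
        "pmid": pmid.replace(",", ""),  # tolerate thousands separators in the PMID
        "sentence": int(sentence),
        "pair": (ent1, ent2),
        "types": (type1, type2),
        "relation": relation,
    }


def relations_by_abstract(lines):
    """Group (sentence, relation) tuples by (PMID, entity pair)."""
    grouped = defaultdict(list)
    for line in lines:
        rec = parse_record(line)
        grouped[(rec["pmid"], rec["pair"])].append((rec["sentence"], rec["relation"]))
    return dict(grouped)


grouped = relations_by_abstract(RECORDS)
for key, relations in grouped.items():
    print(key, relations)
```

Here the pair (MCI, depression) appears twice in abstract 19395124, once per relation type, which is the situation the direct score is concerned with.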

Next, we considered only the co-occurrence frequency of entity pairs. There are two kinds of direct relations: 1) co-occurrence of an entity pair within one abstract with a frequency greater than one, denoted ‘sum_same’ in Table 1, and 2) one-time co-occurrence of an entity pair within one abstract, denoted ‘sum_different’ in Table 1. If an entity pair co-occurs only once in each abstract, its co-occurrence count equals the number of abstracts in which it appears. Biomedical literature, like any other literature, has a skewed distribution; in other words, much research tends to follow popular diseases, drugs, and genes. Because of this tendency, it is hard to identify new relations using the co-occurrence method alone. Thus, we aim to find less visible information in biological texts. If an entity pair co-occurs across several abstracts, the relation is more popular, and we can infer that the pair is well known. We give such pairs a low weight, while assigning a higher weight to entity pairs found repeatedly in the same abstract. Table 1 presents pseudocode for our algorithm.
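The two direct counts can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pseudocode: it assumes ‘sum_same’ accumulates the within-abstract frequencies of abstracts where a pair occurs more than once, that ‘sum_different’ counts the abstracts where it occurs exactly once, and that the two are combined linearly. The weights `w_same` and `w_different` are hypothetical values chosen only so that the same-abstract count weighs more.

```python
from collections import Counter


def direct_counts(occurrences):
    """Tally sum_same and sum_different for every entity pair.

    `occurrences` is an iterable of (pmid, entity_a, entity_b) tuples,
    one tuple per co-occurrence found in an abstract.
    """
    per_abstract = Counter()
    for pmid, a, b in occurrences:
        pair = tuple(sorted((a, b)))  # treat (A, B) and (B, A) as the same pair
        per_abstract[(pmid, pair)] += 1

    totals = {}
    for (_pmid, pair), n in per_abstract.items():
        entry = totals.setdefault(pair, {"sum_same": 0, "sum_different": 0})
        if n > 1:
            entry["sum_same"] += n        # repeated within one abstract
        else:
            entry["sum_different"] += 1   # seen once in this abstract
    return totals


def direct_score(counts, w_same=2.0, w_different=1.0):
    """Combine the two counts; both weights are hypothetical."""
    return w_same * counts["sum_same"] + w_different * counts["sum_different"]


occurrences = [
    ("1", "MCI", "depression"),
    ("1", "MCI", "depression"),
    ("2", "MCI", "depression"),
    ("3", "tau", "APP"),
]
totals = direct_counts(occurrences)
scores = {pair: direct_score(c) for pair, c in totals.items()}
print(scores)
```

With these toy inputs the pair that recurs inside abstract 1 ends up with a higher score than the pair seen only once, which mirrors the weighting rationale described above.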

Table 1. Pseudocode for our algorithm

Entity A	Co-occurrence (A, B)	relatedness (A, B)	Middle word B	Co-occurrence (B, C)	Relatedness (B, C)	Entity C	Score X
Alzheimer’s disease	1750	0.434575	PSEN1	1562	0.712967	APP	1874.1596
Alzheimer’s disease	692	0.398862	BACE1	1294	0.774334	APP	1278.0003
Alzheimer’s disease	3470	0.546675	amyloid beta	652	0.706621	APP	2357.6781
Alzheimer’s disease	471	0.449012	PSEN2	648	0.703564	APP	667.3944
Alzheimer’s disease	5107	0.464037	tau	526	0.628522	APP	2700.4406
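The rows above are numerically consistent with Score X being the sum of the two co-occurrence counts, each weighted by its relatedness value. The following sketch checks that reading against the tabulated values. The formula is inferred from the numbers in the table rather than quoted from the text, and the function name `indirect_score` is illustrative.

```python
# (co-occurrence(A, B), relatedness(A, B), middle word B,
#  co-occurrence(B, C), relatedness(B, C), reported Score X)
ROWS = [
    (1750, 0.434575, "PSEN1", 1562, 0.712967, 1874.1596),
    (692, 0.398862, "BACE1", 1294, 0.774334, 1278.0003),
    (3470, 0.546675, "amyloid beta", 652, 0.706621, 2357.6781),
    (471, 0.449012, "PSEN2", 648, 0.703564, 667.3944),
    (5107, 0.464037, "tau", 526, 0.628522, 2700.4406),
]


def indirect_score(cooc_ab, rel_ab, cooc_bc, rel_bc):
    """Inferred form: relatedness-weighted sum of the two co-occurrence counts."""
    return cooc_ab * rel_ab + cooc_bc * rel_bc


# Largest gap between the recomputed and the reported scores.
max_abs_diff = max(
    abs(indirect_score(cab, rab, cbc, rbc) - reported)
    for cab, rab, _middle, cbc, rbc, reported in ROWS
)
print(max_abs_diff)
```

The small remaining differences are what one would expect if the relatedness values in the table were rounded to six decimal places.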

No	Entity A	Entity C	Proposed	Co-occurrence	Word2Vec	COALS	Random indexing
1	Alzheimer’s disease	TAU	1	0.6181	0.607	0.6424	0.6188
2	Alzheimer’s disease	MCI	0.99	1	0.7571	0.1408	0.084
3	Alzheimer’s disease	Memory	0.9873	0.5843	0.6139	0.6618	0.6395
4	Alzheimer’s disease	Parkinson’s disease	0.935	0.5004	0.8704	1	1
5	Alzheimer’s disease	CSF	0.9072	0.4738	0.5717	0.1133	0.0547
6	Alzheimer’s disease	APP	0.9062	0.6204	0.5685	0.3317	0.2876
7	Alzheimer’s disease	APOE	0.8879	0.4328	0.606	0.1214	0.0633
8	Alzheimer’s disease	Neurodegenerative diseases	0.8689	0.4348	0.7678	0.11	0.0512
9	Alzheimer’s disease	Impairment	0.8035	0.1258	0.7224	0.9951	0.9948
10	Alzheimer’s disease	Amyloid beta	0.8024	0.4199	0.615	0.0777	0.0661
11	Alzheimer’s disease	Cognitive impairment	0.8002	0.1237	0.7617	0.1019	0.0426
12	Alzheimer’s disease	Neurodegeneration	0.7984	0.1464	0.7375	0.233	0.1823
13	Alzheimer’s disease	Neurodegenerative disorders	0.7863	0.2935	0.767	0.1521	0.0961
14	Alzheimer’s disease	Depression	0.7827	0.241	0.6844	0.4013	0.3617
15	Alzheimer’s disease	Oxidative stress	0.782	0.2512	0.6038	0.1084	0.0495
16	Alzheimer’s disease	Hippocampus	0.7794	0.0856	0.6091	0.1553	0.0995
17	Alzheimer’s disease	Vascular dementia	0.7683	0.3273	0.7796	0.6845	0.6636
18	Alzheimer’s disease	Patients	0.7589	0.016	0.9726	0.1235	0.0661
19	Alzheimer’s disease	Neurofibrillary tangles	0.7448	0.3975	0.6191	0.1553	0.0995
20	Alzheimer’s disease	MRI	0.7405	0.1315	0.5843	0.1472	0.0909

Entity A	Proposed
Alzheimer’s disease	[1] TAU [2] MCI [3] Memory [4] Parkinson’s disease [5] CSF [6] APP [7] APOE [8] Neurodegenerative diseases [9] Impairment [10] Amyloid beta [11] Cognitive impairment [12] Neurodegeneration [13] Neurodegenerative disorders [14] Depression [15] Oxidative stress [16] Hippocampus [17] Vascular dementia [18] Patients [19] Neurofibrillary tangles [20] MRI
Entity A	Co-occurrence
Alzheimer’s disease	[1] MCI [2] APP [3] TAU [4] Memory [5] Parkinson’s disease [6] CSF [7] Neurodegenerative diseases [8] APOE [9] Amyloid beta [10] Neurofibrillary tangles [11] Vascular dementia [12] Neurodegenerative disorders [13] Senile plaques [14] Oxidative stress [15] Neurodegenerative disorder [16] Depression [17] PD [18] PSEN1 [19] Amyloid plaques [20] Neurodegenerative disease
Entity A	Word2Vec
Alzheimer’s disease	[1] Asymptomatic Alzheimer’s disease [2] Alzheimer’s disease pathophysiology [3] Alzheimer’s disease neuropathology [4] Sporadic Alzheimer’s disease [5] Alzheimer’s disease patients [6] Early Alzheimer’s disease [7] Depression in Alzheimer’s disease [8] Late-onset Alzheimer’s disease [9] Incipient Alzheimer’s disease [10] Sporadic Alzheimer’s disease patients [11] Asymptomatic Alzheimer’s disease [12] Preclinical Alzheimer’s disease [13] Alzheimer’s disease dementia [14] Prodromal Alzheimer’s disease [15] Severe Alzheimer’s disease [16] Typical Alzheimer’s disease [17] Mild Alzheimer’s disease [18] Alzheimer’s disease with diabetes [19] Presenile Alzheimer’s disease [20] Familial Alzheimer’s disease
Entity A	COALS
Alzheimer’s disease	[1] Parkinson’s disease [2] Impairment [3] Vascular dementia [4] Memory [5] TAU [6] Neuronal [7] Increased [8] Dementias [9] Mild cognitive impairment [10] Huntington’s disease [11] Depression [12] Diabetes [13] Schizophrenia [14] Stroke [15] APP [16] Accumulation [17] Cancer [18] Down syndrome [19] Neurodegeneration [20] Caregivers
Entity A	Random indexing
Alzheimer’s disease	[1] Parkinson’s disease [2] Memory [3] TAU [4] Neuronal [5] Increased [6] Dementias [7] Mild cognitive impairment [8] Impairment [9] Vascular dementia [10] Biomarkers [11] Huntington’s disease [12] Depression [13] Diabetes [14] Schizophrenia [15] Stroke [16] APP [17] Accumulation [18] Cancer [19] Downs syndrome [20] Neurodegeneration

No	Entity A	Entity C	Ranking score	Co-occurrence	Word2Vec	COALS	Random indexing
1	Alzheimer’s disease	Patients	1	1	0.5444	0.0413	0.0206
2	Alzheimer’s disease	Disease	0.9065	0.1167	0.98	0.8615	0.8585
3	Alzheimer’s disease	Brain	0.5989	0.023	0.6085	0.1052	0.0859
4	Alzheimer’s disease	Dementia	0.5902	0.1187	0.733	0.6591	0.6518
5	Alzheimer’s disease	Impaired cognition	0.5013	0.0637	0.6737	0.0413	0.0206
6	Alzheimer’s disease	Therapeutic procedure	0.4828	0.0512	0.6606	0.0413	0.0206
7	Alzheimer’s disease	Neurodegenerative disorders	0.4371	0.0444	0.7707	0.0413	0.0206
8	Alzheimer’s disease	Persons	0.408	0.0879	0.6314	0.0413	0.0206
9	Alzheimer’s disease	Amyloid	0.4018	0.0357	0.6118	0.1159	0.0968
10	Alzheimer’s disease	Pharmaceutical preparations	0.394	0.03	0.7011	0.0413	0.0206
11	Alzheimer’s disease	APP gene	0.3845	0.0098	0.5346	0.0413	0.0206
12	Alzheimer’s disease	Amyloid beta-protein precursor	0.3751	0.0211	0.3492	0.0413	0.0206
13	Alzheimer’s disease	Functional disorder	0.3746	0.0354	0.7702	0.0413	0.0206
14	Alzheimer’s disease	Apolipoprotein E	0.3707	0.0292	0.5656	0.052	0.0315
15	Alzheimer’s disease	Parkinson’s disease	0.3702	0.0052	0.9802	0.0413	0.0206
16	Alzheimer’s disease	Population group	0.3692	0.0464	0.629	0.0413	0.0206
17	Alzheimer’s disease	Pathogenesis	0.3685	0.0405	0.7519	0.0413	0.0206
18	Alzheimer’s disease	Dementia, vascular	0.3552	0.0095	0.7878	0.6494	0.5067
19	Alzheimer’s disease	Nerve Degeneration	0.3546	0.0282	0.712	0.0413	0.0206
20	Alzheimer’s disease	Entire hippocampus	0.3541	0.0038	0.5782	0.0413	0.0206

Entity A	Proposed
Alzheimer’s disease	[1] Patients [2] Disease [3] Brain [4] Dementia [5] Impaired cognition [6] Therapeutic procedure [7] Neurodegenerative disorders [8] Persons [9] Amyloid [10] Pharmaceutical preparations [11] APP gene [12] Amyloid beta-protein precursor [13] Functional disorder [14] Apolipoprotein E [15] Parkinson’s disease [16] Population group [17] Pathogenesis [18] Dementia, vascular [19] Nerve degeneration [20] Entire hippocampus
Entity A	Co-occurrence
Alzheimer’s disease	[1] Patients [2] Dementia [3] Disease [4] Persons [5] Individual [6] Impaired cognition [7] Therapeutic procedure [8] Population group [9] Neurodegenerative disorders [10] Elderly [11] Pathogenesis [12] Amyloid [13] Senile plaques [14] Functional disorder [15] Pharmaceutical preparations [16] Apolipoprotein E [17] Nerve degeneration [18] Brain [19] Participant [20] Woman
Entity A	Word2Vec
Alzheimer’s disease	[1] Tangier disease [2] Lyme disease [3] Binswanger disease [4] Parkinson’s disease [5] Disease [6] Huntington’s disease [7] Autosomal recessive juvenile Parkinson’s disease [8] Disease progression [9] Alzheimer’s disease, late onset [10] Alzheimer’s disease, early onset [11] Familial Alzheimer’s disease [12] Progressive disease [13] Alzheimer’s disease assessment scale [14] Genetic predisposition to disease [15] Chronic disease [16] Pick disease of the brain [17] Disease model [18] Chronic obstructive airway disease [19] Psychic disease [20] Motor neurone disease
Entity A	COALS
Alzheimer’s disease	[1] Response [2] Patients [3] Impaired cognition [4] Therapeutic procedure [5] Neurodegenerative disorders [6] Persons [7] Pharmaceutical preparations [8] APP gene [9] Amyloid beta-protein precursor [10] Functional disorder [11] Parkinson’s disease [12] Population group [13] Pathogenesis [14] Nerve degeneration [15] Entire hippocampus [16] Memory impairment [17] Senile plaques [18] Proteins [19] Nervous system disorder [20] Genes
Entity A	Random indexing
Alzheimer’s disease	[1] Response [2] Inhibitors [3] Cohort [4] Tomography [5] Disease [6] Donepezil [7] Neuroimaging [8] Receptor [9] Dementia [10] Dementia, vascular [11] Presenilin [12] Network [13] Presenilin-1 [14] Follow-up [15] Sex [16] DNA [17] Rivastigmine [18] Investigation [19] Density [20] Hyperphosphorylation

Pair rank	Co-occurrence	COALS	Random indexing	Word2Vec	Proposed
1–10	1	0	0	0	1
11–100	2	2	2	0	1
101–500	4	1	1	0	5
501–2000	6	6	6	1	5
2000–3999	2	5	5	6	4
4000–5999	3	2	2	7	2
6000–9000	1	1	1	5	1

Method name	Precision	Recall	F-Measure
Proposed	68.97%	63.16%	65.94%
Word2Vec	5.74%	5.26%	5.30%
Co-occurrence	68.97%	63.16%	65.94%
COALS	51.72%	47.37%	49.45%
RI	45.98%	42.10%	43.95%
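As a worked check, assuming the F-Measure column is the balanced F1 score (the harmonic mean of precision and recall), the values for the Proposed, COALS, and RI rows can be recomputed from the other two columns. The function name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F-measure %) taken from the table rows.
REPORTED = {
    "Proposed": (68.97, 63.16, 65.94),
    "COALS": (51.72, 47.37, 49.45),
    "RI": (45.98, 42.10, 43.95),
}

recomputed = {name: f_measure(p, r) for name, (p, r, _f) in REPORTED.items()}
for name, value in recomputed.items():
    print(f"{name}: recomputed {value:.2f}%, reported {REPORTED[name][2]:.2f}%")
```

Because the formula is scale-invariant, it gives the same result whether precision and recall are expressed as fractions or as percentages.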

Pathway name	Entities found	Entities ratio	Entities p-value	Entities FDR*	Reactions found	Reactions ratio
Proposed
Nuclear signaling by ERBB4	3 / 35	0.002	5.37e-05	0.012	3 / 22	0.002
Signaling by interleukins	7 / 640	0.046	2.60e-04	0.019	6 / 491	0.041
MECP2 regulates transcription of neuronal ligands	3 / 61	0.004	2.74e-04	0.019	3 / 37	0.003
Signaling by receptor tyrosine kinases	2 / 13	9.25e-04	3.41e-04	0.019	2 / 8	6.68e-04
NRIF signals cell death from the nucleus	6 / 521	0.037	5.89e-04	0.023	71 / 633	0.053
Co-occurrence
MECP2 regulates transcription of neuronal ligands	2 / 13	9.25e-04	3.18e-04	0.036	2 / 8	6.68e-04
RUNX1 and FOXP3 control the development of regulatory T lymphocytes (Tregs)	2 / 17	0.001	5.41e-04	0.036	2 / 20	0.002
NRIF signals cell death from the nucleus	2 / 18	0.001	6.06e-04	0.036	4 / 7	5.84e-04
Amyloid fiber formation	3 / 88	0.006	7.15e-04	0.036	16 / 33	0.003
Neurodegenerative diseases	2 / 30	0.002	0.002	0.056	2 / 22	0.002
COALS
Plasma lipoprotein assembly	3 / 30	0.002	4.18e-05	0.01	8 / 19	0.002
MECP2 regulates transcription of neuronal ligands	2 / 13	9.25e-04	3.91e-04	0.042	2 / 8	6.68e-04
HDL assembly	2 / 18	0.001	7.44e-04	0.042	7 / 9	7.51e-04
NRIF signals cell death from the nucleus	2 / 18	0.001	7.44e-04	0.042	4 / 7	5.84e-04
Amyloid fiber formation	3 / 88	0.006	9.67e-04	0.044	16 / 33	0.003
Word2Vec
Transfer of LPS from LBP carrier to CD14	1 / 3	2.13e-04	0.005	0.075	2 / 2	1.67e-04
NTF3 activates NTRK2 (TRKB) signaling	1 / 4	2.85e-04	0.006	0.075	3 / 3	2.50e-04
NTF4 activates NTRK2 (TRKB) signaling	1 / 4	2.85e-04	0.006	0.075	3 / 3	2.50e-04
BDNF activates NTRK2 (TRKB) signaling	1 / 4	2.85e-04	0.006	0.075	3 / 3	2.50e-04
Defective GSS causes glutathione synthetase deficiency (GSS deficiency)	1 / 4	2.85e-04	0.006	0.075	1 / 1	8.34e-05

Extraction	Gene	Proposed rank	Co-occurrence rank	Word2Vec rank	COALS rank	Random indexing rank	Pathway
PKDE4J	*TNF	361	742	4876	548	851	57
	*EGFR	1468	2063	5550	2391	3381	43
	*GSK3B	990	1374	2341	2119	2349	39
	*CREB1	6298	8240	6792	6468	7460	39
	IL1B	2401	1292	3451	3043	3733	37
	*SRC	3854	8230	3984	4257	5257	36
	*ATF4	2890	6965	4204	3430	4538	32
	MYC	6531	4909	4957	6688	6578	32
	*IGF1	446	463	3410	1940	2040	30
	IGF1R	1868	1397	3697	519	1119	29
SemRep	MAP2K1	3466	3462	3433	1.5	1.2	80
	PRKACB	3952	2131	3444	1.5	1.2	67
	*MAPK8	253	1004	3410	1.5	1.2	59
	*TNF	731	3564	2460	1.5	1.2	57
	JUN	2428	1804	2799	1.5	1.2	49
	*TP53	399	724	2492	1.5	1.2	48
	*IL6	925	1012	2816	1.5	1.2	43
	EGFR	3201	3032	2497	1.5	1.2	43
	*BAX	2099	3643	2780	1.5	1.2	41
	IL1B	2264	1910	2246	1.5	1.2	37

Entity A	Co-occurrences of (A, B)	Relatedness	Intermediate entity B	Co-occurrences of (B, C)	Relatedness	Entity C
Alzheimer’s disease	1128	0.65167	Diabetes	14	0.75549	Coronary artery disease
Alzheimer’s disease	409	0.61262	Hypertension	13	0.78521	Coronary artery disease
Alzheimer’s disease	687	0.46660	Cholesterol	10	0.54389	Coronary artery disease
Alzheimer’s disease	1789	0.49688	APOE	9	0.55493	Coronary artery disease
Alzheimer’s disease	452	0.68793	Atherosclerosis	7	0.81259	Coronary artery disease
Alzheimer’s disease	437	0.65669	Type 2 diabetes	7	0.74681	Coronary artery disease
Alzheimer’s disease	1166	0.73817	Schizophrenia	6	0.66643	Coronary artery disease
Alzheimer’s disease	408	0.66002	Diabetes mellitus	6	0.75417	Coronary artery disease
Alzheimer’s disease	28	0.64372	Atrial fibrillation	6	0.70951	Coronary artery disease
Alzheimer’s disease	147	0.69426	Bipolar disorder	4	0.66286	Coronary artery disease
Alzheimer’s disease	49	0.66373	Heart failure	4	0.78867	Coronary artery disease
Alzheimer’s disease	27	0.52386	PON1	4	0.59528	Coronary artery disease
Alzheimer’s disease	4135	0.97112	Parkinson’s disease	3	0.84996	Coronary artery disease
Alzheimer’s disease	1992	0.58868	Depression	3	0.55049	Coronary artery disease
Alzheimer’s disease	357	0.63473	Obesity	3	0.72262	Coronary artery disease
Alzheimer’s disease	221	0.50534	APOE4	3	0.54360	Coronary artery disease
Alzheimer’s disease	165	0.73497	Osteoporosis	3	0.75669	Coronary artery disease
Alzheimer’s disease	58	0.64033	Genome-wide association study	3	0.65748	Coronary artery disease
Alzheimer’s disease	4414	0.66884	Mild cognitive impairment	2	0.61373	Coronary artery disease

Entity A	Entity C (Gene Only)	Phenotype	Phenotype-Related Gene	Common Phenotype Genes
Alzheimer’s disease	CD36	Platelet glycoprotein IV deficiency	CD36	17
		Macrothrombocytopenia	–
		Coronary heart disease	CD36
		Malaria, cerebral	ACKR1, FCGR2A, FCGR2B, FCGR2A, FCGR2B, CR1, GYPC, CISH, GYPB, GYPA, TNF, HBB, TIRAP, NOS2A, SLC4A1, ICAM1, G6PD, CD36
Alzheimer’s disease	IL10	Graft-versus-host disease	IL10	13
		HIV-1	CXCR1, CX3CR1, TLR3, HLAC, CXCL12, IFNG, IL4R, CCL3L1, CCL2, CCL11, CCL3, CD209, KIR3DL1, IL10
		Rheumatoid arthritis	–
Alzheimer’s disease	ABCA1	HDL deficiency	APOA1, ABCA1	2
		Tangier disease	ABCA1
		Coronary artery disease, familial	LDLR, ABCA1
