Health Information Science and Systems. 2018 Nov 28;7(1):1. doi: 10.1007/s13755-018-0062-0

Neural networks for mining the associations between diseases and symptoms in clinical notes

Setu Shah 1, Xiao Luo 1, Saravanan Kanakasabai 2, Ricardo Tuason 2, Gregory Klopper 2
PMCID: PMC6261925  PMID: 30588291

Abstract

Analyzing the narrative clinical notes in Electronic Health Records (EHRs) is challenging because of their unstructured nature. Mining the associations between the clinical concepts within the clinical notes can support physicians in making decisions and provide researchers with evidence about disease development and treatment. In this paper, in order to model and analyze disease and symptom relationships in the clinical notes, we present a concept association mining framework that is based on word embedding learned through neural networks. The approach is tested using 154,738 clinical notes from 500 patients, which are extracted from the Indiana University Health’s Electronic Health Records system. All patients are diagnosed with more than one type of disease. The results show that this concept association mining framework can identify related diseases and symptoms. We also propose a method to visualize a patient’s diseases and related symptoms in chronological order. This visualization can provide physicians with an overview of the medical history of a patient and support decision making. The presented approach can also be expanded to analyze the associations of other clinical concepts, such as social history, family history, and medications.

Keywords: Neural networks, Natural language processing, Concept association mining, Clinical notes, Electronic health records

Introduction

Electronic Health Record (EHR) systems have recently been adopted by many countries [1]. In the United States, there are three stages of meaningful use of EHRs. Stage III aims to improve population health outcomes, improve clinical outcomes, and gain more robust research data on health systems. Figure 1 shows the patient-centralized functional modules in a typical EHR system. Many of these modules, such as medication and diagnosis, contain structured data, which consist of defined data types and are often ready to use in data mining applications. The encounter notes (also called clinical notes) are a major component of the EHR and include large amounts of unstructured data. These unstructured data are mostly text written or dictated by physicians or nurses, and they are arguably an important part of the patient’s medical record. Sondhi et al. [2] demonstrated the importance of mining the clinical notes by detecting the symptoms of congestive heart failure (CHF). Previous studies [3, 4] also demonstrated the need for extracting clinical signs and symptoms from patient medical records and further analyzing their associations with specific diseases.

Fig. 1. Major functional modules in a typical EHR system

In this paper, we study the following questions: How to extract the relationships between symptoms and diseases from real-world clinical notes? How to track the diseases and symptoms of a patient over time? To the best of our knowledge, no previous work has systematically studied these questions.

To this end, we present a concept association mining model based on neural networks to reveal the disease and symptom relationships buried in the clinical notes. The neural network implemented in this research is based on the Skip-Gram model of Word2Vec [5]. The inputs to the neural network are the clinical notes containing the extracted disease and symptom concepts. After training, the network yields distributed vector representations of the disease and symptom concepts, and the similarities between the concepts can be calculated through distance measures. This concept association mining model can support many different applications. One example is tracking the development of diseases, along with the related symptoms, over time.

We evaluate the proposed concept association mining model on a data cohort extracted from Indiana University Health’s Electronic Health Record system. This data cohort contains the records of 500 patients, with 154,738 clinical notes in total. An example of a clinical note is shown in Fig. 2. It consists of sentences and unstructured text that describe details of a patient’s clinical status or improvements in clinical conditions during a hospitalization or over the course of outpatient care.

Fig. 2. Example of clinical notes in the EHR

The remaining sections are organized as follows: the related work in the area is presented in Sect. 2; the methodology of the study is presented in Sect. 3; Sect. 4 describes the data and study cohort; the results of the experiment are presented in Sect. 5; and Sect. 6 presents the conclusion and potential future work.

Related work

Natural Language Processing (NLP) and text mining techniques have been used to understand the associations between medical terms. Most previous research relied on existing ontologies such as MeSH or WordNet to identify the associations between different medical terms, such as different diseases [6–8]. However, ontologies present the entity and attribute relationships in a hierarchy and do not emphasize the co-occurrences of the terms in the content of the text. In addition, ontologies often do not include the different representations of the same medical term. For example, ‘Type II Diabetes’ and ‘DM2’ both represent ‘Type 2 Diabetes Mellitus’. Different representations of the same terminology occur quite often in the clinical notes within Electronic Health Records (EHRs), because physicians have their own preferences when recording notes.

In recent years, the distributed representation of words, also called word embedding, has gained a lot of interest in the research areas of text mining, natural language processing and health informatics [5, 9, 10]. Word embedding emphasizes the co-occurrences of the words based on the content of a given text document collection. There are different ways to generate word embedding, including probabilistic models [11] and dimensionality reduction on the word co-occurrence matrix [12]. Neural networks are a more recent technique for generating word embedding. They have been studied for biomedical text classification and clustering [10, 13], where the word is the basic unit of the text documents and the word embedding is learned through neural networks. However, in the biomedical domain, clinical or medical concepts often contain more than one word. In particular, it is very hard to describe symptoms using one word. Hence, it is necessary to analyze the associations between disease and symptom concepts based on their representations as more than one word. To the best of the authors’ knowledge, there is no existing research on generating clinical concept embedding through the use of neural networks in the area of Natural Language Processing (NLP).

Methods

In this study, we first extract the disease and symptom terms from the clinical notes, and then generate concept embedding by training the neural network. Finally, concept association mining is performed by calculating the cosine similarity between the disease and symptom concepts using the concept embedding.

Disease and symptom concepts extraction

To identify the disease and symptom terms in a given clinical note, we use UMLS MetaMap [14], a natural language processing tool that uses various sources, such as the UMLS Metathesaurus [15], to categorize the phrases or terms in the text into different semantic types. In this research, the terms or phrases that are mapped to ‘Disease or Syndrome’ or ‘Neoplastic Process’ are extracted as disease concepts, whereas those categorized as ‘Sign or Symptom’ are extracted as symptom concepts. Figure 3 provides an example of categorizing terms into semantic types using UMLS MetaMap. In this case, ‘fever’ and ‘cough’ are categorized and extracted as symptoms, and ‘Community-Acquired Pneumonia’ is categorized and extracted as a disease. It is worth mentioning that UMLS MetaMap might not recognize all of the diseases, especially those written as abbreviations.

Fig. 3. Semantic mapping using MetaMap
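The filtering rule described above can be sketched in a few lines of Python. This is only an illustration: the `(phrase, semantic type)` pairs are a hypothetical, simplified stand-in for MetaMap output, not its real output format, and the function name `split_concepts` is made up for this example.

```python
# Semantic types named in the text as marking diseases and symptoms.
DISEASE_TYPES = {"Disease or Syndrome", "Neoplastic Process"}
SYMPTOM_TYPES = {"Sign or Symptom"}


def split_concepts(mapped_phrases):
    """Split (phrase, semantic_type) pairs into disease and symptom lists."""
    diseases, symptoms = [], []
    for phrase, semantic_type in mapped_phrases:
        if semantic_type in DISEASE_TYPES:
            diseases.append(phrase)
        elif semantic_type in SYMPTOM_TYPES:
            symptoms.append(phrase)
        # Phrases of any other semantic type are not extracted as concepts.
    return diseases, symptoms


# Hypothetical mapping result for one sentence of a clinical note.
mapped = [
    ("fever", "Sign or Symptom"),
    ("cough", "Sign or Symptom"),
    ("Community-Acquired Pneumonia", "Disease or Syndrome"),
    ("chest x-ray", "Diagnostic Procedure"),
]
diseases, symptoms = split_concepts(mapped)
print(diseases)
print(symptoms)
```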

Generating disease and symptom concept embedding

Word embedding based on neural networks was first proposed by Mikolov et al. [5] in 2013. It has drawn attention in different research areas, such as natural language processing, text mining and health informatics [10, 16]. Two models, Continuous Bag of Words (CBOW) and Skip-Gram, have been introduced. Based on our previous research [17], the Skip-Gram model works better than the CBOW model for generating concept embedding, so we employ the Skip-Gram model in this study. The neural network architecture of the Skip-Gram model is shown in Fig. 4; it is a standard three-layer neural network. The inputs to the neural network are words represented as vectors. The length of the vector (N) is the number of distinct words in the text document collection. In this case, N is the total number of distinct words and concepts in all clinical notes. The element in the vector corresponding to the input word is set to 1, and the remaining elements are set to 0. The output vectors are evaluated against the defined context words (target outputs) of an input word. The context words are the words that occur within a specific window of the input word in a sentence.

Fig. 4. Skip-Gram model
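As a small illustration of the input encoding and the context window described above, the following sketch builds a one-hot vector and lists the (input word, context word) pairs for one toy sentence. The sentence, the window size of 1 and the helper names are assumptions chosen for readability; they are not taken from the study.

```python
def one_hot(index, size):
    """Return a vector of length `size` with a 1 at `index` and 0 elsewhere."""
    vec = [0] * size
    vec[index] = 1
    return vec


def skipgram_pairs(tokens, window):
    """List (input word, context word) pairs within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


sentence = ["patient", "reports", "fever", "and", "cough"]
# The vocabulary maps each distinct token to a position in the one-hot vector.
vocab = {tok: idx for idx, tok in enumerate(sorted(set(sentence)))}
pairs = skipgram_pairs(sentence, window=1)
x = one_hot(vocab["fever"], len(vocab))
print(x)
print(pairs)
```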

After weight initialization, training the neural network consists of the following steps: calculating the hidden activation, generating the outputs, calculating the softmax of the outputs, calculating the sum of errors and updating the neuron weights. These steps repeat until the changes in the neuron weights are minimal or the specified maximum number of iterations is reached.

  • Multiply an input (x) by the input-hidden weights (A) to obtain the hidden activation: $h = x^{T} A$

  • Multiply the hidden activation by the hidden-output weights (B) to obtain the outputs: $y = h^{T} B$

  • Calculate the softmax of the outputs:

    $$z_j = \frac{e^{y_j}}{\sum_{i=1}^{N} e^{y_i}}, \quad j = 1, \ldots, N$$

  • Calculate the element-wise sum of errors, using the differences between the softmax of the outputs (z) and the target outputs (o):

    $$E = \sum_{c=1}^{C} (z - o_c)$$

    where C is the number of target outputs and $o_c$ is the c-th target output.

  • Update the neuron weights using E.
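The steps above can be sketched as one training step in plain Python. The text does not spell out the weight update, so the rule used here (a plain gradient step with a hypothetical learning rate `lr`) is an assumption, as are the toy sizes and the function names; it is a sketch of the standard Skip-Gram update, not the authors' implementation.

```python
import math
import random


def softmax(y):
    """Numerically stable softmax of a list of scores."""
    m = max(y)
    exps = [math.exp(v - m) for v in y]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(A, B, center, contexts, lr=0.05):
    """One update for an input word index and its context word indices."""
    N, D = len(A), len(A[0])
    h = A[center][:]  # h = x^T A selects one row of A for a one-hot x
    y = [sum(h[d] * B[d][j] for d in range(D)) for j in range(N)]  # y = h^T B
    z = softmax(y)
    # E = sum over the C targets of (z - o_c), where each o_c is one-hot.
    E = [len(contexts) * z[j] for j in range(N)]
    for c in contexts:
        E[c] -= 1.0
    # Assumed update rule: gradient step on B, then on the input row of A.
    grad_h = [sum(E[j] * B[d][j] for j in range(N)) for d in range(D)]
    for d in range(D):
        for j in range(N):
            B[d][j] -= lr * h[d] * E[j]
    for d in range(D):
        A[center][d] -= lr * grad_h[d]
    return z


random.seed(0)
N, D = 5, 3  # toy vocabulary size and number of hidden neurons
A = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(N)]
B = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(D)]
before = train_step(A, B, center=2, contexts=[1, 3])
for _ in range(200):
    after = train_step(A, B, center=2, contexts=[1, 3])
print(before[1] + before[3], after[1] + after[3])
```

Repeating the step on the same example should move probability mass toward the two context words, which is the behavior the stopping criteria in the text rely on.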

After the training process, the weights between the input and the hidden layer are taken as the word embedding. The word embedding preserves the distances between words, so that words with semantic and syntactic associations in the raw clinical notes are close to one another.

In this study, we generate word and concept embedding by training the neural network on all clinical notes in the study cohort. The words and the disease and symptom concepts are input into the neural network to generate the concept embedding for diseases and symptoms. Each concept is treated as an individual unit, in the same way as a word. For example, the concept ‘lung cancer’ is treated as a single input unit in the Skip-Gram model, and its vector is calculated as part of the training process. To generate the word and concept embedding, we set the number of context words and concepts to 15 and the number of hidden neurons (dimensions) to 300.
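One simple way to make a multi-word concept behave as a single input unit is to merge its words into one token before training. The sketch below does this with an underscore; the joining character, the function name `merge_concepts` and the example note are assumptions for illustration, and the source does not state how the units were actually formed.

```python
def merge_concepts(tokens, concepts):
    """Replace each multi-word concept in a token list with one joined token."""
    # Try longer concepts first so that a longer phrase wins over a shorter one.
    concept_tokens = sorted((c.split() for c in concepts), key=len, reverse=True)
    merged, i = [], 0
    while i < len(tokens):
        for words in concept_tokens:
            if tokens[i:i + len(words)] == words:
                merged.append("_".join(words))
                i += len(words)
                break
        else:
            # No concept starts at this position, so keep the plain word.
            merged.append(tokens[i])
            i += 1
    return merged


note = "patient with lung cancer reports chest pain and cough".split()
units = merge_concepts(note, ["lung cancer", "chest pain", "cough"])
print(units)
```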

Concept association mining

Since each concept is represented by a vector ConceptV, the association scores between the concepts can be calculated through a distance measure. In this study, the cosine measure in Eq. 2 is used to calculate the association scores; it is the cosine similarity, so a higher score indicates a stronger association. Given m concepts (where m is the total number of symptom and disease concepts), the association scores can be stored in an association matrix S, where each entry $s_{i,j}$ represents the association score between concepts $ConceptV_i$ and $ConceptV_j$.

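$$S = (s_{i,j}) \in \mathbb{R}^{m \times m} \qquad (1)$$

$$s_{i,j} = \frac{\mathrm{ConceptV}_i \cdot \mathrm{ConceptV}_j}{\|\mathrm{ConceptV}_i\|_2 \, \|\mathrm{ConceptV}_j\|_2} \qquad (2)$$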
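Eqs. 1 and 2 can be illustrated with a short sketch that builds the association matrix from a list of concept vectors. The three two-dimensional vectors are toy values chosen so the scores are easy to check by hand; real concept vectors in this study have 300 dimensions.

```python
import math


def cosine(u, v):
    """Cosine score of Eq. 2: dot product divided by the product of L2 norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association_matrix(vectors):
    """Build the m x m matrix S of Eq. 1 from m concept vectors."""
    m = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(m)] for i in range(m)]


concept_vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]  # toy vectors
S = association_matrix(concept_vectors)
for row in S:
    print([round(s, 3) for s in row])
```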

Since the disease and symptom concepts are all extracted from the clinical notes, three concept association matrices can be generated to capture the disease associations, the symptom associations, and the associations between diseases and symptoms. Table 1 shows the 3 most relevant diseases for given diseases based on their association scores, whereas Table 2 shows the 3 most relevant symptoms for given symptoms based on their association scores.

Table 1. Examples of diseases and associated diseases based on the association scores

| Disease | Associated disease | Association score |
| --- | --- | --- |
| Chronic obstructive pulmonary disease (COPD) | Severe chronic obstructive pulmonary disease | 0.518 |
| | Pulmonary disease obstructive | 0.496 |
| | Chronic obstructive lung disease | 0.471 |
| Diabetes mellitus type 2 | Diabetes type 2 | 0.629 |
| | Diabetes mellitus type ii | 0.582 |
| | Diabetes type 2 on insulin | 0.570 |
| Breast cancer | Breast ductal carcinoma | 0.642 |
| | Breast cancer female | 0.626 |
| | Invasive ductal carcinoma breast | 0.622 |
| Coronary artery disease (CAD) | Coronary disease | 0.662 |
| | Peripheral arterial disease | 0.538 |
| | Coronary artery disease with myocardial infarction | 0.534 |

Table 2. Examples of symptoms and associated symptoms based on the association scores

| Symptom | Associated symptom | Association score |
| --- | --- | --- |
| Vertigo | Dizziness | 0.461 |
| | Lightheadedness | 0.415 |
| | Headaches | 0.370 |
| Chronic pain | Chronic back pain | 0.627 |
| | Back abdominal pain | 0.517 |
| | Intractable pain | 0.487 |
| Swollen legs | Cramps in legs | 0.312 |
| | Swelling of legs | 0.209 |
| | Swollen feet | 0.302 |
| Breast pain | Groin pain | 0.635 |
| | Rib pain | 0.627 |
| | Flank pain | 0.604 |
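Rankings like those in Tables 1 and 2 can be read off a row of the association matrix by sorting it. The sketch below shows one way to do so; the concept names `c0` to `c4` and the scores are made-up toy values, not results from the study, and `top_associations` is a hypothetical helper name.

```python
def top_associations(S, names, query, k=3):
    """Return the k concepts with the highest association score to `query`."""
    q = names.index(query)
    # Skip the query itself, whose score with itself is always 1.
    scored = [(names[j], S[q][j]) for j in range(len(names)) if j != q]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


names = ["c0", "c1", "c2", "c3", "c4"]  # placeholder concept names
S = [  # toy symmetric association matrix
    [1.0, 0.2, 0.7, 0.5, 0.1],
    [0.2, 1.0, 0.3, 0.4, 0.6],
    [0.7, 0.3, 1.0, 0.2, 0.5],
    [0.5, 0.4, 0.2, 1.0, 0.3],
    [0.1, 0.6, 0.5, 0.3, 1.0],
]
top3 = top_associations(S, names, "c0")
print(top3)
```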

k-means clustering

To group disease and symptom concepts together, we use the k-means clustering algorithm. The k-means clustering algorithm [18] is straightforward to implement and can be applied to large, high-dimensional data sets. It has been used successfully in various application domains, such as text mining and computer vision [6, 8]. The algorithm assigns each data point to one of a predefined number of clusters, with the aim of minimizing the sum of distances from each point to the center of its cluster. Given a set of d-dimensional sample data (input vectors) $x = \{x_1, x_2, \ldots, x_n\}$ and a set of k initialized d-dimensional centers $C = \{C_1, C_2, \ldots, C_k\}$, the algorithm is summarized as follows:

  • Assignment to clusters: Assign each data point $x_i$ to the cluster $C_j$ whose center has the minimum Euclidean distance to $x_i$ among all the cluster centers.
    $$C_j = \{x_i : \|x_i - C_j\| \le \|x_i - C_l\|, \ \forall l, \ 1 \le l \le k\} \qquad (3)$$
  • Update cluster centers: Set the new center $\mu_j$ of each cluster to the mean of all data points that belong to that cluster.
    $$\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i, \quad \forall j \qquad (4)$$
  • Repeat the previous two steps until convergence.
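The two alternating steps of Eqs. 3 and 4 can be sketched in plain Python as follows. The toy points, the initial centers, the iteration cap and the handling of empty clusters are assumptions made for this illustration; the study itself uses k = 50 on much higher-dimensional data.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Basic k-means: alternate assignment (Eq. 3) and center update (Eq. 4)."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: index of the nearest center by Euclidean distance.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center becomes the mean of its assigned points.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # assumption: keep an empty cluster's center
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return labels, centers


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centers = kmeans(points, centers=[[0.0, 0.0], [10.0, 10.0]])
print(labels, centers)
```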

k-means is applied to the matrix S from Eq. 1. We use k = 50, based on previous experiments performed as part of [17] and on the internal clustering evaluation metrics. Different values of k usually produce different results, with larger clusters breaking into smaller clusters as k increases.

Study cohort

For this study, Institutional Review Board (IRB) approval was obtained to extract Electronic Health Records from the Indiana University Health (IU Health) Electronic Health Record (EHR) system. The medical records of 500 adult patients for the years 2003–2017 are included in this study. Some patients have over 10 years of medical history. After analysis, we find that all 500 patients have more than one diagnosis and more than one clinical note. Table 3 lists the most frequent diagnoses as extracted from the diagnosis module of the EHR system. In this cohort, some patients have hypertension and/or hyperlipidemia. We also notice that ‘cough’ and ‘shortness of breath’ are both found in the diagnosis module with associated ICD codes. However, UMLS MetaMap [14] maps them to ‘Sign or Symptom’, so those terms in the clinical notes are treated as symptoms in this study.

Table 3. Top 10 most frequent diagnoses in the study cohort

| Diagnosis in EHR chart | Patient count |
| --- | --- |
| Essential (primary) hypertension | 298 |
| Hyperlipidemia unspecified | 239 |
| Unspecified essential hypertension | 209 |
| Atherosclerotic heart disease of native coronary artery without angina pectoris | 201 |
| Other and unspecified hyperlipidemia | 186 |
| Heart failure unspecified | 172 |
| Type 2 diabetes mellitus without complications | 154 |
| Cough | 153 |
| Shortness of breath | 153 |
| Congestive heart failure unspecified | 137 |

There are 154,738 clinical notes in total, which include both inpatient and outpatient clinical notes. Figure 5 shows the distribution of the clinical notes for each patient over the 15-year period of medical history. The majority of the patients have fewer than 305 clinical notes, while some patients have more than 805 clinical notes.

Fig. 5. Number of clinical notes per patient

Experimental results and discussion

Disease and symptom concept embedding evaluation

To evaluate the concept embedding generated for the diseases and symptoms extracted from the clinical notes, instead of employing annotators to evaluate each pair of diseases and symptoms, we use the k-means clustering algorithm to group the diseases and symptoms, and evaluate the groups by investigating the closest concepts within them. The k-means clustering algorithm [18] has been used successfully in various application domains, such as text mining and computer vision [6, 8]. The aim of the k-means algorithm here is to minimize the sum of distances from each concept within a cluster to the cluster center. We choose k = 50 and generate 50 clusters of diseases and 50 clusters of symptoms. We manually investigate the disease concepts within those clusters and find that some of the clusters contain diseases that are highly related, the same disease at different stages (such as different stages of chronic kidney disease), or different representations of the same disease. Table 4 shows some of the diseases in selected clusters among the 50 disease clusters. Cluster 5 contains diseases related to type 2 diabetes, although hypertension is also included. We find that this is because some patients in the study cohort have both type 2 diabetes and hypertension. Cluster 6 covers various related heart diseases. The literature shows that ischemic cardiomyopathy, atrial fibrillation and aortic stenosis are causes of, or conditions associated with, dilated cardiomyopathy [19]. Cluster 14 is about different types of heart failure. However, hypoxemic respiratory failure is also included in this cluster. After investigating the clinical notes of the patients, we find that hypoxemic respiratory failure co-occurs with heart failure in some patients’ clinical notes. Cluster 15 contains only one disease, erythema, which means this disease does not co-occur with other diseases in this study cohort.

Table 4. Clusters of diseases

| Cluster 5 | Cluster 6 | Cluster 14 | Cluster 15 |
| --- | --- | --- | --- |
| Diabetes type 2 | Cardiomyopathy | Congestive heart failure | Erythema |
| Diabetes type II | Stroke | Failure heart | |
| Diabetes mellitus type II | Ischemic cardiomyopathy | Chronic systolic heart failure | |
| Type II diabetes | Nonischemic dilated cardiomyopathy | Diastolic heart failure | |
| Hyperglycemia | Chronic atrial fibrillation | Acute heart failure | |
| Diabetes type 2 on insulin | Aortic stenosis | Biventricular failure | |
| Hypertension | Sinus tachycardia | Left ventricular failure | |
| ESRD | Atrial fibrillation | Chronic heart failure | |
| | Nonischemic dilated cardiomyopathy | Hypoxemic respiratory failure | |
| | Rapid atrial fibrillation | Chronic diastolic heart failure | |

Table 5 shows some of the symptoms in selected clusters among the 50 symptom clusters. Each cluster shown here is associated with one category of symptom. For example, cluster 0 is about different types of edema, cluster 1 is about headache and dizziness, cluster 2 covers symptoms of the chest, and cluster 3 is about joint-related symptoms. Cluster 4 contains many single-word symptoms, which are mostly related to movement disorders in one or more parts of the body.

Table 5. Clusters of symptoms

| Cluster 0 | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
| --- | --- | --- | --- | --- |
| Pitting edema | Headache | Chest tightness | Joint stiffness | Seizure |
| Massive edema | Dizziness | Chest pain | Joint swelling | Spasm |
| Pedal edema | Headaches | Chest discomfort | Knees stiffness | Tremor |
| Hand edema | Vertigo | Chest pressure | Costovertebral angle tenderness | Tremors |
| Pitting edema | Generalized headache | Pain in chest | Joint crepitus | Dystonia |
| Postpartum hemorrhage | Global headache | Chronic chest pain | Decreased grip strength | Cramp |
| Extremity edema | Chronic vertigo | Acute chest pain | Stiffness of wrist | Ataxia |
| Bilateral pedal edema | Headache throbbing | Chest wall pain | Painful joints | Clonus |
| Penile edema | Morning headache | Chest pain angina | Stiffness fingers | Asterixis |
| Edema knees | Intermittent dizziness | Chest burn | Facet arthropathy | Shakes |

Since no other NLP steps, such as stemming or stop word removal, are involved in this study, the results demonstrate that neural network based word embedding can place a word and its plural in the same cluster. For example, ‘headache’ and ‘headaches’ are in cluster 1, and ‘tremor’ and ‘tremors’ are in cluster 4.

Disease and symptom association evaluation

Section 3.3 describes how concept associations can be measured by calculating the cosine similarity. In this research, we select five different diseases extracted from the clinical notes and use the proposed concept association mining to find the most related symptoms identified in the study cohort. Table 6 shows the most related symptoms using an association score threshold of 0.30.

Table 6. Most associated symptoms to the diseases in the study cohort

| Disease | Symptom | Association score |
| --- | --- | --- |
| Chronic obstructive pulmonary disease (COPD) | Peptic ulcer symptoms | 0.442 |
| | Chronic pain | 0.392 |
| | Chronic cough | 0.359 |
| | Chronic back pain | 0.339 |
| | Chronic abdominal pain | 0.338 |
| | Chronic chest pain | 0.306 |
| | Gastroesophageal reflux disease symptoms | 0.302 |
| Alzheimer disease | Sleep disorders | 0.467 |
| | Groin tenderness | 0.327 |
| Breast cancer | Breast pain | 0.458 |
| | Breast discomfort | 0.455 |
| | Breast tenderness | 0.424 |
| Coronary artery disease (CAD) | Coronary chest pain | 0.471 |
| | Peptic ulcer symptoms | 0.426 |
| | Coronary symptoms | 0.334 |
| Diabetes mellitus type 2 | Symptom nausea | 0.344 |
| | Weakness of lower limb | 0.310 |
To validate the associations between the diseases and symptoms, we investigate the actual clinical notes to check whether the diseases occur in the same clinical notes as the associated symptoms. We find that the diseases do co-occur with the symptoms in the clinical notes. However, a clinical note often mentions several diseases as existing problems together with the symptoms. Without further interpretation, it is hard for the algorithm to determine exactly which symptoms correspond to which diseases. For example, if a clinical note states that the patient has peptic ulcer symptoms and a history of chronic obstructive pulmonary disease (COPD) and coronary artery disease (CAD), it is hard to determine whether the peptic ulcer symptoms are more associated with COPD or with CAD without interpretation by a physician in the presence of other information. If patients in the study cohort have COPD and CAD with symptoms of peptic ulcer, the peptic ulcer symptoms are found to be associated with both diseases, as shown in Table 6. We also use the literature to validate the disease and symptom associations identified through our proposed method. For example, we searched the literature on type 2 diabetes and weakness of lower limb, and found that previous research in diabetes has shown a decrease in lower-limb muscle strength in diabetic patients [20, 21].

Visualizing the diseases and associated symptoms over time

Through concept extraction and concept association mining, we have generated clusters of diseases and symptoms, and have also identified the diseases and associated symptoms in the clinical notes. To support clinical decisions by making efficient use of the clinical notes, we propose a two-dimensional visualization tool that shows the development of diseases and associated symptoms over time. The x-axis of the visualization represents the time of the encounters, while the y-axis represents the cluster index of the disease(s). For example, if a clinical note mentions ‘ischemic cardiomyopathy’, which belongs to cluster 6 according to Table 4, then 6 is the value on the y-axis. We select two patients to explore the visualization of diseases and symptoms extracted from the clinical notes over time, as demonstrated in Figs. 6 and 7. These two patients have diagnoses of ‘ischemic cardiomyopathy’ and ‘breast cancer’, respectively, in the diagnosis module of the EHR.

Fig. 6. Patient with diagnosed ischemic cardiomyopathy

Fig. 7. Patient with diagnosed breast cancer
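The mapping from clinical notes to plot coordinates can be sketched as follows: each disease mention becomes a point whose x value is the encounter date and whose y value is the disease's cluster index. The cluster indices for ‘ischemic cardiomyopathy’ (6) and ‘erythema’ (15) follow the text and Table 4; the dates, the note structure and the function name are hypothetical, and the actual plotting is left out.

```python
from datetime import date


def timeline_points(notes, cluster_of):
    """Turn (encounter date, disease list) notes into sorted (date, cluster, disease) points."""
    points = []
    for note_date, diseases in notes:
        for disease in diseases:
            if disease in cluster_of:  # skip diseases without a known cluster
                points.append((note_date, cluster_of[disease], disease))
    return sorted(points)  # chronological order for the x-axis


cluster_of = {"ischemic cardiomyopathy": 6, "erythema": 15}
notes = [  # hypothetical encounters for one patient
    (date(2016, 5, 1), ["ischemic cardiomyopathy"]),
    (date(2015, 3, 1), ["erythema", "unclustered disease"]),
]
points = timeline_points(notes, cluster_of)
for x, y, label in points:
    print(x.isoformat(), y, label)
```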

Figure 6 shows that ‘Melanoma’ is mentioned in the patient’s clinical notes only around 2009, with no associated symptoms mentioned during that period, and is not mentioned after 2009. Starting from 2015, CAD and related symptoms, such as ‘chest pain’ and ‘cardiovascular symptoms’, are mentioned in the notes. Around 2016, the patient is diagnosed with ‘ischemic cardiomyopathy’. Following that, this disease and ‘Chronic Systolic Heart Failure’ are both mentioned in the clinical notes along with related symptoms. This visualization clearly shows the development of the diseases along with the symptoms.

Figure 7 shows the diseases and most related symptoms extracted for a patient who was diagnosed with ‘breast cancer’ in 2005. It shows that ‘breast cancer’ and the related symptoms are mentioned in the clinical notes periodically from 2005 until late 2013. Besides ‘breast cancer’, gastric diseases and their most related symptoms, such as ‘nausea’, are mentioned in the clinical notes periodically from 2006 to 2013.

This disease and symptom visualization can help physicians review the medical problems of a patient. We envision that related medications and lab test results can also be added to the visualization, so that it can serve as a good decision support tool for physicians.

Conclusions and future work

In this study, we have explored and evaluated a concept association mining model that reveals the relationships between diseases and symptoms extracted from the clinical notes in the EHR. This model is based on word and concept embedding learned through neural networks. The word and concept embedding captures the associations between words and concepts based on their co-occurrences within the clinical notes. Our results show that the presented concept association mining model can identify the associations between diseases and symptoms. We have also proposed a temporal visualization tool that shows the history of diseases along with the symptoms recorded in the narrative clinical notes. This visualization tool can provide physicians with an overview of the medical history of a patient and support decision making.

There are several limitations to the proposed concept association mining model. First, the concept vectors are based on the words and concepts that co-occur in the study cohort, so two concepts that co-occur often might be close to each other even though they represent two different concepts, such as ‘breast pain’ and ‘groin pain’ in Table 2. Second, the model cannot accurately capture the associations between diseases and their related symptoms when the number of instances is very small, because there are not enough training samples for the neural network to identify patterns.

Future work includes working with physicians or clinical annotators to evaluate the effectiveness of the concept association mining model and the visualization tool for decision support, using a large number of clinical notes from the EHR system. We would also like to expand this model to analyze the associations of other clinical concepts, such as social history, family history, and medications.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Setu Shah, Email: setshah@iupui.edu.

Xiao Luo, Email: luo25@iupui.edu.

Saravanan Kanakasabai, Email: skanakas@iuhealth.org.

Ricardo Tuason, Email: rtuanson@iuhealth.org.

Gregory Klopper, Email: gklopper@iuhealth.org.

References

  • 1. Meigs SL, Solomon M. Electronic health record use a bitter pill for many physicians. Perspect Health Inf Manag. 2016;13:1–17.
  • 2. Sondhi P, Sun J, Tong H, Zhai C. Sympgraph: a framework for mining clinical notes through symptom relation graphs. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, p. 1167–1175. ACM; 2012.
  • 3. McKee PA, Castelli WP, McNamara PM, Kannel WB. The natural history of congestive heart failure: the Framingham study. N Engl J Med. 1971;285(26):1441–1446. doi: 10.1056/NEJM197112232852601.
  • 4. Zhou X, Menche J, Barabási AL, Sharma A. Human symptoms-disease network. Nat Commun. 2014;5:4212. doi: 10.1038/ncomms5212.
  • 5. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed representations of words and phrases and their compositionality. In: Proceedings of the international conference on neural information processing systems, 2013. p. 3111–3119.
  • 6. Logeswari S, Premalatha K. Biomedical document clustering using ontology based concept weight. In: Proceedings of the International Conference on Computer Communication and Informatics, 2013. p. 1–4. doi: 10.1109/ICCCI.2013.6466273.
  • 7. Yoo I, Hu X, Song IY. A coherent graph-based semantic clustering and summarization approach for biomedical literature and a new summarization evaluation method. In: Proceedings of the first international workshop on text mining in bioinformatics, 2006. p. 84–89.
  • 8. Zhang X, Jing L, Hu X, Ng M, Zhou X. A comparative study of ontology based term similarity measure on pubmed document clustering. In: Proceedings of the international conference on database systems for advanced applications, 2007. p. 115–126.
  • 9. Moen S, Ananiadou TSS. Distributional semantics resources for biomedical text processing. In: Proceedings of the 5th international symposium on languages in biology and medicine, Tokyo, Japan, 2013. p. 39–43.
  • 10. Tulkens S, Suster S, Daelemans W. Using distributed representations to disambiguate biomedical and clinical concepts. In: Proceedings of the 15th workshop on biomedical natural language processing, 2016.
  • 11. Globerson A, Chechik G, Pereira F, Tishby N. Euclidean embedding of co-occurrence data. J Mach Learn Res. 2007;8(Oct):2265–2295.
  • 12. Levy O, Goldberg Y. Neural word embedding as implicit matrix factorization. Adv Neural Inf Process Syst. 2014;27:2177–2185.
  • 13. Zhu Y, Yan E, Wang F. Semantic relatedness and similarity of biomedical terms: examining the effects of recency, size, and section of biomedical publications on the performance of word2vec. BMC Med Inform Decis Making. 2017;17:95–103. doi: 10.1186/s12911-017-0498-1.
  • 14. MetaMap: a tool for recognizing UMLS concepts in text. https://metamap.nlm.nih.gov/
  • 15. Fact sheet: UMLS Metathesaurus. https://www.nlm.nih.gov/pubs/factsheets/umlsmeta.html
  • 16. Kim HK, Kim H, Cho S. Bag-of-concepts: comprehending document representation through clustering words in distributed representation. Neurocomputing. 2017;266:336–352. doi: 10.1016/j.neucom.2017.05.046.
  • 17. Shah S, Luo X. Comparison of deep learning based concept representations for biomedical document clustering. In: 2018 IEEE EMBS international conference on biomedical & health informatics (BHI), p. 349–352. IEEE; 2018.
  • 18. Hartigan JA, Wong MA. Algorithm AS 136: a k-means clustering algorithm. J R Stat Soc Ser C Appl Stat. 1979;28(1):100–108.
  • 19. Nallamothu BK, Baman TS. Dilated and restrictive cardiomyopathy. Inpatient Cardiovasc Med. 2014:178–186.
  • 20. Cavanagh P, Derr J, Ulbrecht J, Maser R, Orchard T. Problems with gait and posture in neuropathic patients with insulin-dependent diabetes mellitus. Diabetic Med. 1992;9(5):469–474. doi: 10.1111/j.1464-5491.1992.tb01819.x.
  • 21. Macgilchrist C, Paul L, Ellis B, Howe T, Kennon B, Godwin J. Lower-limb risk factors for falls in people with diabetes mellitus. Diabetic Med. 2010;27(2):162–168. doi: 10.1111/j.1464-5491.2009.02914.x.

Articles from Health Information Science and Systems are provided here courtesy of Springer
