Predicting future falls in older people using natural language processing of general practitioners’ clinical notes

Noman Dormosh; Martijn C Schut; Martijn W Heymans; Otto Maarsingh; Jonathan Bouman; Nathalie van der Velde; Ameen Abu-Hanna

doi:10.1093/ageing/afad046

2023 Apr 1;52(4):afad046.


PMCID: PMC10071555 PMID: 37014000

Abstract

Background

Falls in older people are common and morbid. Prediction models can help identify individuals at higher fall risk. Electronic health records (EHR) offer an opportunity to develop automated prediction tools that may help to identify fall-prone individuals and lower clinical workload. However, existing models primarily utilise structured EHR data and neglect information in unstructured data. Using machine learning and natural language processing (NLP), we aimed to examine the predictive performance provided by unstructured clinical notes, and their incremental performance over structured data to predict falls.

Methods

We used primary care EHR data of people aged 65 or over. We developed three logistic regression models using the least absolute shrinkage and selection operator: one using structured clinical variables (Baseline), one with topics extracted from unstructured clinical notes (Topic-based) and one by adding clinical variables to the extracted topics (Combi). Model performance was assessed in terms of discrimination using the area under the receiver operating characteristic curve (AUC), and calibration by calibration plots. We used 10-fold cross-validation to validate the approach.

Results

Data of 35,357 individuals were analysed, of whom 4,734 experienced falls. Our NLP topic modelling technique discovered 151 topics from the unstructured clinical notes. AUCs and 95% confidence intervals of the Baseline, Topic-based and Combi models were 0.709 (0.700–0.719), 0.685 (0.676–0.694) and 0.718 (0.708–0.727), respectively. All the models showed good calibration.

Conclusions

Unstructured clinical notes are an additional viable data source to develop and improve prediction models for falls compared to traditional prediction models, but the clinical relevance remains limited.

Keywords: accidental falls, fall prediction, natural language processing, electronic health records, free text, topic modelling, older people

Key points

Existing prediction models for falls mainly utilise clinical variables and neglect information in unstructured clinical notes.
We explored the predictive performance of the clinical notes and their incremental performance over the clinical variables.
The clinical notes provided an additional viable way for fall risk estimation compared to existing traditional prediction models.
The predictive performance can be improved when adding information extracted from the clinical notes to the clinical variables.
The models are based on routinely collected data, which can be implemented in EHR systems to decrease clinical workload.

Introduction

The vast majority of injuries in community-dwelling older people are caused by falls [1]. About one out of three older people falls each year, making falls a public health concern [2]. Older people who have experienced falls often suffer from a decline in functional status and in social and physical activities, and experience long-term reduced quality of life [3, 4]. Timely identification of older people at higher fall risk can prevent adverse outcomes and thus serves as a basis for successful falls prevention and management strategies [5].

Many prediction tools exist to estimate fall risk for community-dwelling older people, such as the algorithm provided by the guideline of the American & British Geriatrics Societies (AGS/BGS update 2011) [6], the fall-risk assessment tool (FRAT-UP) based on a meta-analysis of fall risk factors [7] and tools developed using research cohorts [8–10]. However, most of these tools entail time-consuming procedures that impose additional burden on patients and clinicians. For example, many tools rely on mobility and physical performance tests, which have to be carried out under supervision of clinicians, and manual collection of other clinical data to estimate fall risk. Therefore, they are difficult to implement and less feasible in daily general practice.

Researchers have recognised the importance of utilising electronic health records (EHR) data to develop prediction models for falls in older people [11–14]. An advantage of prediction models based on the EHR is that they can be developed using routinely collected data such as demographics, diagnostic results and medication use, which can be retrieved and readily implemented in clinical practice [15].

Two types of EHR data can be used for predictive modelling: structured data with predefined and consistent formats (e.g. demographics, diagnoses and medications) and unstructured data (e.g. clinical notes). Most of the existing prediction models for falls derived from EHR data have utilised only structured data. While this type of data is relatively easy to analyse and may capture many risk factors for falls, it does not necessarily capture implicit fall risk factors such as environmental hazards (inadequate lighting, insecure toilet) [16]. Furthermore, some important geriatric syndromes recognised to be associated with falls, for example, lack of social support, vision impairment, urinary incontinence and walking difficulty, were better identified in unstructured data than in structured data [17]. Clinicians often document additional information that they consider important but that may not be captured in structured data fields; this information can be leveraged to enhance predictive models of falls.

While clinical notes may contain potential fall risk factors, manual extraction of keywords or phrases to represent these risk factors is challenging and impractical. Natural language processing (NLP) can be utilised to automatically derive value from the clinical notes. NLP is a field that combines linguistics and artificial intelligence to allow machines to comprehend and interpret text, and it is being widely applied to extract and get insight into hidden information in clinical notes [18]. Related specifically to falls, some research has been carried out applying machine learning and NLP on clinical notes to identify fall events [19–22] and to explore predictors of falls [16]. However, to our knowledge, no study exists that used machine learning to predict future falls in the general older population using information extracted from clinical notes.

In this study, we aimed to explore the predictive performance of the clinical notes written by general practitioners (GPs) in predicting future falls in community-dwelling older people. In addition, we sought to investigate the incremental predictive value of the clinical notes over the structured clinical variables. Our approach relies on modern NLP techniques that translate words and documents into a vector representation, in order to capture their meaning.

Methods

Data source, study design and participants

We analysed anonymised primary care EHR data extracted from the pseudonymised database of the Academic General Practitioner’s Network at the Academic Medical Center (AHA AMC). It contains data collected from 50 general practices in the province of North Holland in the Netherlands, pertaining to patients registered by any of the respective GPs between 2012 and 2019. The data include demographics, diagnoses, medication prescriptions and de-identified clinical notes in Dutch.

Our study sample included patients enlisted with GPs between 2018 and 2019. Baseline data (structured clinical variables and unstructured clinical notes) were obtained from the observation period in the year 2018. Data in the follow-up period in the year 2019 were used to determine the outcome (1-year fall). Individuals who were aged 65 or older at the beginning of 2018 and who had more than one word in the clinical notes written in 2018 were included. The restriction on the minimum number of words was a requirement of our NLP modelling technique, which is described subsequently.
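The inclusion rule above can be sketched as a simple filter. This is a minimal illustration only: the `Patient` record, its field names and the toy values are hypothetical and do not reflect the actual database layout.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative only.
    patient_id: str
    age_at_start_2018: float
    notes_2018: str  # all 2018 notes joined into one string ("" if none)


def is_eligible(p: Patient, min_age: float = 65.0) -> bool:
    """Aged 65 or older at the start of 2018 and more than one word of notes."""
    return p.age_at_start_2018 >= min_age and len(p.notes_2018.split()) > 1


patients = [
    Patient("a", 71.4, "pijn knie links"),     # eligible
    Patient("b", 68.0, ""),                    # excluded: no note
    Patient("c", 80.2, "recept"),              # excluded: only one word
    Patient("d", 64.9, "controle bloeddruk"),  # excluded: younger than 65
]
eligible_ids = [p.patient_id for p in patients if is_eligible(p)]
```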

Observational studies based on pseudonymised data from the AHA AMC database are exempted from informed consent of patients. The AHA AMC database is run according to Dutch privacy legislation and contains pseudonymised data from all patients of the participating general practices, except for those patients who object to this.

Outcome and the structured clinical variables

The outcome was defined as any fall event that occurred in the 1-year follow-up period and was ascertained by a manual chart review of the clinical notes associated with patients in the follow-up period. The structured clinical variables included age, sex, 33 medication groups and 43 chronic medical conditions associated with patients in the observation period. The outcome and the clinical variables were defined and obtained the same way as in [12]. The list of the clinical variables is given in Supplementary Appendix 1.

Unstructured clinical notes: data extraction and pre-processing

The clinical notes were mainly written by GPs, and to a much lesser extent by other healthcare professionals in the general practice, during the observation period and followed the SOAP structure (S-subjective, O-objective, A-assessment, and P-plan). We extracted all text associated with each patient and combined it into one text record, hereafter referred to as a document. Pre-processing of each document involved lowercasing and removal of non-letter characters.
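A minimal sketch of this pre-processing step is given below. It assumes that 'non-letter' means anything that is not a Unicode letter (so that Dutch characters such as 'ë' are kept) and that removed characters are replaced by a single space; neither detail is specified in the text, and the function names and example notes are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: every run of characters that are not Unicode letters
# (punctuation, digits, underscores, whitespace) becomes one space.
NON_LETTERS = re.compile(r"[\W\d_]+")


def preprocess(text: str) -> str:
    """Lowercase the text and strip non-letter characters."""
    return NON_LETTERS.sub(" ", text.lower()).strip()


def build_documents(notes):
    """notes: iterable of (patient_id, note_text) -> {patient_id: document}."""
    grouped = defaultdict(list)
    for patient_id, text in notes:
        grouped[patient_id].append(text)
    # One combined, cleaned text record ("document") per patient.
    return {pid: preprocess(" ".join(texts)) for pid, texts in grouped.items()}


notes = [
    ("p1", "S: Gevallen in de keuken."),
    ("p1", "O: RR 140/85, essentiële hypertensie"),
    ("p2", "P: Controle over 2 wkn"),
]
documents = build_documents(notes)
```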

Patient text representation and topic inference

Unstructured texts should be mapped into numerical representations that summarise the documents before using them as input to develop prediction models. Each document can be seen as a collection of topics varying in prominence and these topics can be extracted from the text. We deployed the recently introduced top2vec algorithm to discover common topics in the documents [23]. A description of our application of top2vec for patient text representation is given in Supplementary Appendix 2. Briefly, top2vec is an NLP topic modelling approach that maps words and documents to an n-dimensional vector space, based on the semantic relatedness between the documents. Topic vectors are obtained after clustering the document vectors and calculating the arithmetic mean of each cluster. Figure 1 depicts a two-dimensional illustration of words, documents and topics when top2vec is applied. Words nearest to each topic vector are used to describe the topic. From Figure 1, the cluster of the words ‘carbidopa’, ‘levodopa’, ‘parkinson’ and ‘hypokinetic’ may be considered as a topic on Parkinson’s disease. The resulting topics can then be used as input variables to a machine learning algorithm. Each document is represented as a vector of distances to all topics.

Figure 1. An illustration of embeddings of words, documents and topics generated by top2vec, adapted from [23]. The large dark ellipses reflect the clusters of the documents. Words, documents and topics are represented in a two-dimensional space for illustration purposes, but they usually have between 50 and 300 dimensions.
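The two steps described above (a topic vector as the arithmetic mean of a cluster of document vectors, and a document represented by its distances to all topics) can be illustrated with toy two-dimensional vectors. This is a sketch rather than the top2vec implementation: the clusters are given by hand, the embedding values are invented, and cosine distance is an assumed metric, since the text does not name one here.

```python
import math


def mean_vector(vectors):
    """Arithmetic mean of equal-length vectors (the topic vector of a cluster)."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity (assumed metric, not confirmed by the source)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


# Toy 2-D document embeddings, already grouped into clusters (hypothetical).
clusters = {
    "topic_0": [[1.0, 0.0], [0.8, 0.2]],
    "topic_1": [[0.0, 1.0], [0.2, 0.8]],
}
topic_vectors = {name: mean_vector(docs) for name, docs in clusters.items()}


def document_features(doc_vector):
    """Represent one document as its distances to every topic vector."""
    return [cosine_distance(doc_vector, topic_vectors[name])
            for name in sorted(topic_vectors)]


features = document_features([1.0, 0.0])  # one value per topic
```

In the study itself, each document would yield 151 such values, one per discovered topic.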

Model development and validation

We developed three prediction models to predict 1-year fall risk in older people: one contained only the structured clinical variables (Baseline model), one used the topics extracted by top2vec as input (Topic-based model) and one combined the structured clinical variables with the extracted topics (Combi model) by simply concatenating them.

Each model was constructed by applying penalised logistic regression using the least absolute shrinkage and selection operator (Lasso) [24], which allows for variable selection as part of fitting the model. We internally validated the models, including lambda tuning, using 10-fold cross-validation.
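For reference, the penalised objective can be written in its standard form. The notation below is ours and is not taken from the paper: $y_i \in \{0, 1\}$ is the 1-year fall outcome, $x_i$ the vector of input variables of individual $i$, $n$ the number of individuals and $\lambda \ge 0$ the penalty strength tuned by cross-validation.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\, \beta}
  \left\{ -\frac{1}{n} \sum_{i=1}^{n}
    \left[ y_i \left( \beta_0 + x_i^{\top} \beta \right)
      - \log\left( 1 + e^{\beta_0 + x_i^{\top} \beta} \right) \right]
    + \lambda \lVert \beta \rVert_1 \right\}
```

The $\ell_1$ term shrinks some coefficients exactly to zero, which is why Lasso performs variable selection; larger values of $\lambda$ retain fewer variables.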

Model performance

We assessed model performance in terms of discrimination measured by the area under the receiver operating characteristic curve (AUC), and calibration using calibration plots [25]. We reported the mean and 95% confidence interval (CI) of the AUCs over the 10 folds. We used DeLong’s method [26] to compare the AUCs between each model and the Baseline model and we considered P-values <0.05 to be statistically significant.
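As an illustration of the discrimination measure, the AUC can be computed as the probability that a randomly chosen faller receives a higher predicted risk than a randomly chosen non-faller. The sketch below also summarises per-fold AUCs with a normal-approximation interval; that interval recipe is an assumption, as the exact CI computation is not described here, and the fold values are hypothetical.

```python
import statistics


def auc(labels, scores):
    """P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_with_ci(values, z=1.96):
    """Mean and normal-approximation 95% CI (assumed recipe, not the paper's)."""
    m = statistics.mean(values)
    half_width = z * statistics.stdev(values) / len(values) ** 0.5
    return m, m - half_width, m + half_width


example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # 3 of 4 pairs ordered correctly

# Hypothetical AUCs from 10 cross-validation folds.
fold_aucs = [0.70, 0.72, 0.71, 0.69, 0.73, 0.70, 0.71, 0.72, 0.70, 0.72]
mean_auc, lower, upper = mean_with_ci(fold_aucs)
```

The pairwise loop is quadratic in the number of individuals; for a cohort of this size a rank-based implementation would be preferable.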

All the analyses were done using R statistical software environment version 4.0 (R Foundation for Statistical Computing, Vienna, Austria). Lasso was performed using the package glmnet and the python implementation of top2vec was interfaced to R using the package reticulate [27]. A checklist of the Minimum Information About Clinical Artificial Intelligence (MI-CLAIM) [28] is included in Supplementary Appendix 3.

Results

Study population

Data of 36,470 older people were extracted from the database. Table 1 presents a summary of the baseline characteristics. The number of individuals who fell was 4,778 (13.1%). Compared to non-fallers, fallers were older (median age 76.6, interquartile range (IQR) [70.7–83.3] versus 71.4, IQR [68.0–77.1]), had a higher proportion of female sex (63.3% versus 51.7%) and had longer clinical notes (median number of words 740.0, IQR [362.5–1376.5] versus 398.0, IQR [177.0–775.0]). According to our eligibility criteria, 1,113 individuals were excluded from the analysis because of a missing clinical note (n = 1,033; 2.8%) or just one clinical note with fewer than two words (n = 80; 0.2%). The excluded individuals were generally younger (median age 68.74, IQR [65.66–72.90] versus 71.82, IQR [68.41–78.06]) and more often of male sex (53.7% versus 46.6%). The number of fallers among the excluded individuals was 44 (4.0%). The remaining 35,357 individuals were included in the analysis.
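The counts in this paragraph can be reconciled with those in the abstract by simple arithmetic, sketched below using only the numbers reported above (the variable names are ours).

```python
extracted = 36_470          # individuals extracted from the database
fallers_extracted = 4_778   # fallers among them
missing_note = 1_033        # excluded: no clinical note
one_word_note = 80          # excluded: one note with fewer than two words
fallers_excluded = 44       # fallers among the excluded individuals

excluded = missing_note + one_word_note
included = extracted - excluded
fallers_included = fallers_extracted - fallers_excluded


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


fall_rate_extracted = pct(fallers_extracted, extracted)
fall_rate_excluded = pct(fallers_excluded, excluded)
```

The analysed cohort of 35,357 individuals therefore contains 4,778 − 44 = 4,734 fallers, the figure given in the abstract.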

Table 1.

Summary of the baseline characteristics of the study sample population

Population characteristic	Non-fallers (n = 31,692)	Fallers (n = 4,778)
Age	71.4 [68.0, 77.1]	76.6 [70.7, 83.3]
Female sex	16,372 (51.7)	3,026 (63.3)
Circulatory hypertension	16,061 (50.7)	2,713 (56.8)
Cardiac arrhythmia	5,556 (17.5)	1,194 (25.0)
Coronary heart disease	4,559 (14.4)	913 (19.1)
Heart failure	1,413 (4.5)	449 (9.4)
Orthostatic hypotension	164 (0.5)	62 (1.3)
Stroke including transient ischemic attack	1,803 (5.7)	484 (10.1)
Diabetes	6,869 (21.7)	1,314 (27.5)
Kidney disease	1,072 (3.4)	198 (4.1)
Anxiety disorder	899 (2.8)	205 (4.3)
Dementia	785 (2.5)	282 (5.9)
Depression	867 (2.7)	277 (5.8)
Epilepsy	287 (0.9)	79 (1.7)
Parkinson disease	298 (0.9)	103 (2.2)
Memory and concentration problem	1,959 (6.2)	667 (14.0)
Vertigo or dizziness	1,101 (3.5)	345 (7.2)
Hearing disorder	4,132 (13.0)	925 (19.4)
Visual disorder	8,975 (28.3)	1,839 (38.5)
Previous injury	2,416 (7.6)	853 (17.9)
Back or neck disorder	2,872 (9.1)	638 (13.4)
Osteoarthritis	10,092 (31.8)	2,031 (42.5)
Osteoporosis	1,391 (4.4)	385 (8.1)
Rheumatoid arthritis	666 (2.1)	155 (3.2)
Vitamin deficiency	936 (3.0)	243 (5.1)
Fatigue or weakness	1,520 (4.8)	463 (9.7)
Urinary incontinence	1,553 (4.9)	537 (11.2)
Multimorbidity^a	15,013 (47.4)	3,371 (71.0)
Antihyperglycemic drugs	5,404 (17.1)	1,045 (21.9)
Antidepressant drugs	2,201 (6.9)	577 (12.1)
Antiepileptic drugs	1,099 (3.5)	298 (6.2)
Antiparkinson drugs	387 (1.2)	128 (2.7)
Proton pump inhibitors	12,045 (38.0)	2,533 (53.0)
Urinary incontinence drugs	785 (2.5)	236 (4.9)
Non-steroidal anti-inflammatory drugs	4,320 (13.6)	748 (15.7)
Opioids	3,883 (12.3)	1,035 (21.7)
Number of cardiovascular drugs
None	11,509 (36.3)	1,270 (26.6)
One	5,563 (17.6)	816 (17.1)
Two	5,315 (16.8)	852 (17.8)
Three	4,316 (13.6)	764 (16.0)
Four	2,619 (8.3)	540 (11.3)
Five or more	2,370 (7.5)	536 (11.2)
Number of words in the clinical notes	398.0 [177.0, 775.0]	740.0 [362.5, 1376.5]
Missing clinical notes	992 (3.1)	41 (0.9)
Clinical notes with one word	77 (0.2)	3 (0.1)


Data are presented as n (%) or median [IQR].

^aPresence of ≥4 long-term health conditions.

Inferred topics

Our approach to extract topics using top2vec produced 151 topics. Supplementary Appendix 4 lists 20 selected topics, for illustrative purposes, together with the top 10 words that describe each topic. The full list of topics is reported in Supplementary Appendix 5.

Predictive performance

Table 2 shows the discriminative ability of the models. The AUCs of the Baseline and Topic-based models were 0.709 (95% CI 0.700–0.719) and 0.685 (95% CI 0.676–0.694), respectively. The AUC of the Combi model was 0.718 (95% CI 0.708–0.727) and this value was significantly higher than that of the Baseline model. With respect to calibration, all the models showed good calibration as illustrated in Figure 2.

Table 2.

The predictive performance of the models to predict falls based on 10-fold cross-validation

Model	Mean AUC (95% CI)
Baseline model (only clinical variables)	0.709 (0.700–0.719)
Topic-based model (topics extracted from clinical notes)	0.685 (0.676–0.694)^*
Combi model (clinical variables + topics)	0.718 (0.708–0.727)^*


The numbers are rounded to three decimal places.

^* P < 0.0001 of difference in AUC in comparison with the Baseline model.

Figure 2. The calibration plots of the three prediction models: Baseline, Topic-based and Combi models. The diagonal line reflects the calibration of an ideal model. The dashed line indicates the actual model calibration. Points below the diagonal line reflect overestimation, whereas points above the diagonal line reflect underestimation. The graph in the lower compartment of each plot represents a histogram of the distribution of the predicted probabilities.
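The quantities behind a calibration plot can be sketched by grouping individuals by predicted probability and comparing the mean prediction with the observed fall rate in each group. This is an illustration only: the equal-width binning, the number of bins and the toy data are assumptions, and the published plots may well use a smoothed curve instead.

```python
def calibration_table(y_true, y_prob, n_bins=5):
    """Equal-width bins over [0, 1].

    Returns one (mean predicted, observed rate, count) row per non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((y, p))
    rows = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            rows.append((mean_pred, observed, len(members)))
    return rows


# Hypothetical outcomes (1 = fall) and predicted probabilities.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.30, 0.35, 0.70, 0.75, 0.90]
rows = calibration_table(y_true, y_prob)

# Observed rate below mean prediction: the point lies under the diagonal.
overestimated = [row for row in rows if row[1] < row[0]]
```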

Topics and clinical variables associated with falls

The total number of topics retained after applying Lasso in the Topic-based model was 36 (see Table 3). Seven of these topics were positively associated with falls (odds ratios ranged from 1.08 to 55.69) and 28 were negatively associated with falls (odds ratios ranged from 0.23 to 0.98). For example, topics related to residential care facility, cognitive impairment, fractures and injuries were positively associated with falls, whereas topics related to hypertension, cardiovascular risk management (CVRM), diabetes prevention and diagnostic care, and pre-travel health advice and vaccination were negatively associated with falls. In the Combi model, 11 topics and 37 clinical variables were retained (see Supplementary Appendix 6). The variables retained in the Baseline model are provided in Supplementary Appendix 7.

Table 3.

Topics retained in the Topic-based model to predict falls after applying Lasso regression. Topic names are based on our clinical knowledge.

Topic id	Possible topic	Top 10 words (Dutch ‘English translation’)	Odds ratio
23	Residential care	toonladder ‘residential home’, verzorgende ‘caring’, wvp^a ‘district nurse’, verzorging ‘care’, koert ‘person name’, verpleging ‘nursing’, wijkverpleging ‘district nursing’, buitenhaeghe ‘residential home’, wijkvpl^a ‘district nursing’, overloop ‘overflow’	55.69
33	Cognitive impairment	mmse^a ‘mini-mental state examination’, gds^a ‘geriatric depression scale’, mental ‘mental’, state ‘state’, examination ‘examination’, wijzers ‘pointers’, rinella ‘person name’, vergeet ‘forget’, klokkentest ‘clock test’, klok ‘clock’	3.38
111	Fractures	sling ‘sling’, subcapitale ‘subcapital’, humerusfractuur ‘humeral fracture’, humerus ‘humerus’, fractuur ‘fracture’, tuberculum ‘tubercle’, majus ‘majus’, dislocatie ‘dislocation’, mitella ‘sling’, schouderluxatie ‘shoulder dislocation’	1.77
69	Head trauma	capitis ‘capitis’, hersenen ‘brain’, subduraal ‘subdural’, intracraniele ‘intracranial’, hoofdwond ‘head wound’, amnesie ‘amnesia’, neurologisch ‘neurological’, cerebrum ‘cerebrum’, hersenschudding ‘concussion’, blankevoort ‘person name’	1.72
133	Information exchange between providers & CVRM laboratory measurements	onbekende ‘unknown’, leeg ‘empty’, code ‘code’, huisartsgeneeskunde ‘family medicine’, medovd^a ‘medical transfer message’, rq^a ‘part of a clinical COPD questionnaire code’, ond^a ‘examination’, triglyceri^b ‘triglyceride’, kreatmdrd^a ‘creatinine modification of diet in renal diseases’, kre^a ‘creatinine’	1.48
29	Wounds	wondcontrole ‘wound control’, wond ‘wound’, verbinden ‘bandage’, alginaat ‘alginate’, verbonden ‘bandaged’, gaas ‘gauze’, honey ‘honey’, wondco^a ‘wound control’, honinggaas ‘honey gauze’, wondranden ‘wound edges’	1.25
56	Frailty assessment	Isar^a ‘identification of seniors at risk’, isarbrief ‘isar letter’, kw^a ‘frailty’, isarscore ‘isar score’, ouderenzorg ‘elderly care’, vragenlijst ‘questionnaire’, teruggestuurd ‘returned’, kwetsbaar ‘vulnerable’, hhhulp^a ‘domestic help’, kenmerk ‘characteristic’	1.08
68	Prostate cancer	prostaatcarcinoom ‘prostate carcinoma’, psma^a ‘psma’, gleason ‘gleason’, prostaatca^a ‘prostate carcinoma’, ipsa ‘initial prostate specific antigen’, hormonale ‘hormonal’, curatieve ‘curative’, radiotherapie ‘radiotherapy’, opzet ‘intent’, ebrt^a ‘external beam radiotherapy’	1
36	Communication (letters) medical specialists	vriendelijke ‘friendly’, groet ‘greeting’, groeten ‘regards’, beste ‘dear’, kunt ‘can’, hartelijke ‘sincerely’, dank ‘thank’, uw ‘your’, mijn ‘my’, geachte ‘dear’	0.98
89	Glaucoma/eye disorder	glaucoom ‘glaucoma’, oogdruk ‘eye pressure’, oct^a ‘optical coherence tomography’, ods^a ‘both eyes’, oogdrukken ‘eye pressures’, papillen ‘papillae’, od^a ‘right eye’, gezichtsveldonderzoek ‘visual field test’, cosopt ‘timolol/dorzolamide eye drop’, oogheelkunde ‘ophthalmology’	0.97
125	Thyroid disorder	struma ‘goiter’, multinodulair ‘multinodular’, nodus ‘nodule’, schildklier ‘thyroid’, dominante ‘dominant’, schildkliercarcinoom ‘thyroid carcinoma’, punctie ‘puncture’, nodi ‘nodules’, trachea ‘trachea’, hals ‘neck’	0.95
95	Skin infections	erysipelas ‘erysipelas’, cellulitis ‘cellulitis’, flucloxacilline ‘flucloxacillin’, fluclox ‘flucloxacillin’, floxapen ‘flucloxacillin’, wondroos ‘erysipelas’, vurig ‘fiery’, afgetekend ‘marked’, onderbeen ‘lower leg’, erytheem ‘erythema’	0.95
66	Use of catheter	catheter ‘catheter’, cad^a ‘catheter’, katheter ‘catheter’, cath^a ‘catheter’, urineretentie ‘urinary retention’, cathset^a ‘catheter set’, retentie ‘retention’, suprapubische ‘suprapubic’, slang ‘tubing’, curikit ‘catheter’	0.95
100	Diabetes and other controls	po^a ‘oral’, mnden^a ‘months’, contr^a ‘control’, wken^a ‘weeks’, detty ‘name of a nurse’, curven ‘curves’, gbeld^b ‘called’, bezoekdatum ‘visit date’, glu^a ‘glucose’, insulineafhankelijk ‘insulin dependent’	0.94
65	Hernia	liesbreuk ‘inguinal hernia’, inguinalis ‘inguinal’, lieskanaal ‘inguinal canal’, reponibel ‘responsive’, reponibele ‘responsive’, hernia ‘hernia’, scrotum ‘scrotum’, breukpoort ‘port-site hernia’, testis ‘testes’, buikwand ‘abdominal wall’	0.94
146	Hospitalisation/institutionalisation	hos^a ‘hospital’, hospital ‘hospital’, mutatiereden ‘reason for mutation’, bestemming ‘destination’, tekst ‘text’, afdeling ‘department’, actueel ‘current’, woonomg^a ‘living environment’, woonom^a ‘living environment’, verpl^a ‘nursing’	0.93
121	Urogenital	penis ‘penis’, voorhuid ‘foreskin’, glans ‘glans’, eikel ‘glans’, balanitis ‘balanitis’, phimosis ‘phimosis’, balzak ‘scrotum’, plasbuis ‘urethra’, seks ‘sex’, urethra ‘urethra’	0.92
114	Sleep apnea	osas^a ‘obstructive sleep apnea syndrome’, cpap^a ‘continuous positive airway pressure’, slaapapneu ‘sleep apnea’, slaapapnoe ‘sleep apnea’, snurken ‘snoring’, apneu ‘apnea’, osa^a ‘obstructive sleep apnea’, ademstops ‘apnea’, snurkt ‘snores’, apnoe ‘apnea’	0.91
104	Angioplasty/coronary heart disease	pci^a ‘percutaneous coronary intervention’, rca^a ‘right coronary artery’, eluting ‘eluting’, lad^a ‘left anterior descending artery’, cag^a ‘coronary angiogram’, rcx^a ‘ramus circumflex artery’, drug ‘drug’, dapt^a ‘dual antiplatelet therapy’, coronairlijden ‘coronary artery disease’, stent ‘stent’	0.85
102	Cardiovascular risk management AO	lsp^a ‘the national exchange point’, xxx ‘xxx’, hag^a ‘family medicine’, hydrochloorthiazid^b ‘hydrochlorothiazide’, diast^a ‘diastolic’, inschrijf ‘subscribe’, rokennhg^a ‘smoking status’, syst^a ‘systolic’, metingen ‘measurements’, mgn^a ‘my health web portal’	0.84
3	Cardiovascular monitoring and prescribing	derde ‘third’, nierlab ‘renal lab’, controlebeleid ‘monitoring policy’, polsfrequentie ‘pulse rate’, cholestrol ‘cholesterol’, herhalingsrecept ‘repeat prescription’, receptregels ‘prescription lines’, ongeacht ‘regardless’, diastolische ‘diastolic’, vermenigvuldigd ‘multiplied’	0.82
149	Urine test	kwalitatief ‘qualitative’, glucu^b ‘glucose’, aant^a ‘number’, leuku^a ‘leukocytes’, ketonen ‘ketones’, normaalwaarde ‘normal value’, eiw^a ‘protein’, eiwit ‘protein’, phadiatop ‘phadiatop’, leucocyten^b ‘leukocytes’	0.79


(Continue)

Discussion

In this study, we analysed the predictive performance of using the unstructured clinical notes in order to predict future falls in older people. We found that prediction models for falls based on clinical notes provide an additional viable way of fall risk estimation compared to existing traditional prediction models based on only clinical variables. Furthermore, the predictive performance can be slightly increased when adding information extracted from the unstructured clinical notes to the structured clinical variables.

The present study pioneers the application of modern NLP techniques and machine learning to clinical notes to predict future falls in older people. Outside of falls but in the area of predicting clinical outcomes, topic modelling has been demonstrated to improve the prediction of mortality in intensive care [29], psychiatric readmission [30] and early sepsis [31]. Our results concord with the abovementioned studies, which found that the predictive performance of prediction models can be improved when adding unstructured data to clinical variables. These results are likely to be related to the presence of certain predictive information that was inadequately captured in structured data. However, contrary to the findings of [29], the performance of our prediction model based only on the unstructured clinical notes did not exceed that of the model with just the clinical variables. A possible explanation is the difference in clinical documentation of both the structured and unstructured data across clinical settings [32]. The unstructured clinical notes in a particular intensive care setting usually contain different types of notes (e.g. physician, nursing and radiology reports) that may convey more information, while in primary care they are primarily written by GPs. Moreover, it is important to highlight that falls are multifactorial in nature, which makes them inherently difficult to predict.

The discrimination ability of our NLP models compares favourably with, and often surpasses, that of models based on prospective research cohorts summarised in a recent review [33]. Several national guidelines have adopted the risk stratification algorithm provided by the AGS/BGS guideline for falls prevention [34]. The algorithm is based on screening questions on prior falls and problems with balance followed by in-depth multifactorial risk assessment and targeted interventions. Although based on expert opinion, the screening questions were found to have limited predictive ability [35, 36]. Moreover, the multifactorial risk assessment may demand time-consuming functional assessments of gait and balance, which may take up to 20 minutes to complete [35]. In contrast to these tools, our models are based on data that are routinely recorded in the process of clinical care and readily available. The model that combined both the unstructured clinical notes and structured clinical variables could be integrated with a decision support system in the EHR to enhance the accuracy of identifying individuals at higher fall risk. Although the AUC gain of the model does not necessarily reflect an improved performance with respect to clinical relevance, this add-on gain can be obtained without imposing an additional burden on clinicians or patients being assessed.

Most of the topics identified in the current study were coherent and captured to a large extent comorbidities (e.g. dementia) or medical conditions (e.g. fractures). This lends some support to the validity of top2vec to generate meaningful topics. We note that the interpretation of a particular topic cannot be done by considering separate words in isolation from the other words, and that the order of the words that form the topic reflects the relative importance of each word to the topic. For instance, for the topic cognitive impairment, the top five words listed in descending order according to their importance are ‘MMSE’, ‘GDS’, ‘mental’, ‘state’ and ‘examination’. The word ‘examination’ is preceded by ‘MMSE’, ‘GDS’, ‘mental’ and ‘state’ and therefore is likely to be in the context of mental state examination, and the topic can be related to cognitive impairment. However, we observed that some topics were difficult to interpret. For example, the cluster of words ‘third’, ‘renal lab’, ‘monitoring policy’, ‘pulse rate’, ‘cholesterol’, ‘repeat prescription’, ‘prescription lines’, ‘regardless’ and ‘diastolic’ may refer to multiple topics such as CVRM, medication prescription or blood pressure monitoring. This could be partially explained by the unique nature of the clinical notes, where clinicians document data in the wrong places, by the use of specific unique sentence fragments [37] and by variations in documentation [38].

Table 3.

Continued

Topic id	Possible topic	Top 10 words (Dutch ‘English translation’)	Odds ratio
109	Diabetes eye & foot checks AO	fundusfoto ‘fundus photography’, ingestelde ‘set’, simms^a ‘simm’s classification for diabetic foot’, type ‘type’, glun ‘fasting glucose’, mol^a ‘mole’, mydriasis ‘mydriasis’, kwartaal ‘quarter’, marokko ‘morocco’, filo^a ‘monofilament test’	0.77
98	Bladder cancer	cystoscopie ‘cystoscopy’, ptag^a ‘pituitary tumor derived apoptosis gene’, urotheelcelcarcinoom ‘urothelial carcinoma’, turt^a ‘trans urethral resection of the tumor’, spoelingen ‘washes’, mitomycine ‘mitomycin’, blaastumor ‘bladder tumor’, bcg^a ‘bacillus calmette-guerin’, blaas ‘bladder’, cis^a ‘carcinoma in situ’	0.76
72	Sore throat	keel ‘throat’, keelklachten ‘sore throat’, hees ‘hoarse’, stem ‘voice’, heesheid ‘hoarseness’, globus ‘globus’, globusgevoel ‘globus feeling’, schrapen ‘scraping’, slikklachten ‘swallowing complaints’, keelpijn ‘sore throat’	0.74
151	Preventive and multidisciplinary care	oproep ‘invitation’, gv^a ‘influenza vaccination’, kzorg^a ‘integrated/chain care’, opgeroepen ‘invited’, cvr^a ‘cardiovascular risk’, oproepen ‘invite’, hfdbeh^a ‘main practitioner’, keten ‘chain’, rggzcvrm^a ‘reason for none regular care cvrm’, dominante ‘dominant’	0.72
90	Cardiovascular risk management and control	eiwitlekkage ‘protein leakage’, systolisch ‘systolic’, mgn^a ‘my health web portal’, cvrp^a ‘cardiovascular risk profile’, vvr^a ‘vascular increased risk’, oproeplijst ‘call list’, jaarcontrole ‘annual check’, praktijkondersteuner ‘practice nurse’, protocollair ‘protocol’, labwaarden ‘lab values’	0.62
1	Flu vaccination	griepvaccin ‘flu vaccine’, mercatorapotheek ‘name of pharmacy’, campagne ‘campaign’, inentings ‘vaccinations’, pd^a ‘per day’, bijgeruizen ‘recurrences’, retourbericht ‘return message’, goedhuisartsen ‘good general practitioners’, kis ‘chain information system’, schellingwoude ‘name of village’	0.61
7	Pulmonary disease	exacerbatie ‘exacerbation’, piepen ‘wheezing’, bronchitis ‘bronchitis’, rhonchi ‘rhonchi’, longen ‘lungs’, longontsteking ‘pneumonia’, ronchi ‘wheezing’, piepend ‘wheezing’, verlengd ‘prolonged’, pneumonie ‘pneumonia’	0.53
4	Preventive and diagnostic care mainly diabetes	mvg^a ‘motivational interviewing’, venapunctie ‘venipuncture’, glycaemische^b ‘glycemic’, ddb^a ‘dimethyl dimethoxy biphenyl dicarboxylate’, nuchterh^b ‘fasting’, medial ‘medial’, diagnostische ‘diagnostic’, regulatie ‘regulation’, hbacb^a ‘hemoglobin A1c’, dmjc^a ‘diabetes mellitus control’	0.5
79	Blood pressure measurement	metingen ‘measurements’, gemiddeld ‘average’, meting ‘measurement’, systolisch ‘systolic’, thuismetingen ‘home measurements’, gemiddelde ‘average’, bloeddrukmeter ‘blood pressure monitor’, meet ‘measure’, halfuursmeting ‘half hour measurement’, bloeddrukmeting ‘blood pressure measurement’	0.47
144	Hypertension	essentiële ‘essential’, orgaanbeschadiging ‘organ damage’, hypertensie ‘hypertension’, bls^a ‘blood sugar’, essentiele ‘essential’, herinnering ‘reminder’, oproepen ‘recall’, lijst ‘list’, tensies ‘tensions’, hype^a ‘hypertension’	0.46
96	Kidney stones	nierstenen ‘kidney stones’, niersteen ‘kidney stone’, urolithiasis ‘urolithiasis’, bewegingsdrang ‘urge to move’, steen ‘stone’, uretersteen ‘ureteral stone’, uitgeplast ‘urinated/peed out’, koliekpijn ‘colic pain’, stenen ‘stones’, steentje ‘stone’	0.44
113	Erectile dysfunction	erectie ‘erection’, viagra ‘viagra’, cialis ‘cialis’, sildenafil ‘sildenafil’, erectiele ‘erectile’, erectieproblemen ‘erectile problem’, erectiestoornis ‘erectile dysfunction’, erecties ‘erections’, potentie ‘potency’, seks ‘sex’	0.36
130	Pre-travel health advice & vaccination	dtp^a ‘dtp vaccine’, vaccinaties ‘vaccinations’, reisadvies ‘travel advice’, hep^a ‘hepatitis’, malarone ‘malarone’, gevaccineerd ‘vaccinated’, malaria ‘malaria’, vacc^a ‘vaccination’, vaccin ‘vaccine’, reis ‘travel’	0.26
10	New patient registration or unregistering	verhuisd ‘moved’, inschrijfformulier ‘registration form’, uitschrijving ‘unregister’, ion^a ‘registered by name’, inschrijving ‘registration’, ingeschreven ‘registered’, dossier ‘file’, kennismaking ‘acquaintance’, inschrijf ‘register’, geimporteerd ‘imported’	0.23


Numbers are rounded to two decimal places. AO = among others. Intercept = −1.39.

^aAbbreviation or acronym

^bMisspelling

With respect to the topics retained in the prediction model that used only topics inferred from the clinical notes, our results indicate that institutionalisation, cognitive impairment, previous fractures or injuries and frailty were associated with higher fall risk. The relationship between these topics and falls is widely recognised [39–41]. On the other hand, many topics were negatively associated with falls, such as influenza vaccination and pre-travel health advice and vaccination. The former may reflect the role of influenza vaccination in preventing influenza symptoms, including fatigue, dizziness and abnormal gait, which are known predisposing factors for falls [41]. The latter is indicative of the good overall physical health and mobility of older people who travel, which makes them less vulnerable to falls. While these topics appear intuitive, some other topics ran counter to what is known in the literature. For example, topics revolving around CVRM and diabetes care were related to reduced fall risk, which would suggest that the presence of cardiovascular disease or diabetes is protective against falls. However, individuals with chronic diseases are prone to falls for various reasons, including the use of certain medications. This discrepancy could be explained in part by the fact that routine preventive CVRM and diabetes check-ups are predominantly performed in relatively fit older people, while those with the highest fall risk, e.g. frail older persons, receive more personalised care that may not include standard routine lab controls and the like. Alternatively, it can be attributed to the attentive medical care given to individuals with chronic diseases (most individuals with chronic diseases in the Netherlands are enrolled in a dedicated chronic care programme), who receive more medical attention and, with it, better management of their comorbidities. This finding is in line with Dros et al. [42], who suggested that older people with diabetes are at lower risk of developing dizziness when they receive regular medical attention. Our study aimed to develop prediction models for falls, not to establish causal relationships between predictors and falls, so these results need to be interpreted with caution.
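To make the direction and size of such associations concrete, the following minimal Python sketch converts a logistic regression intercept and a single predictor into predicted risks. It is an illustration under stated assumptions, not the authors' implementation: it assumes that the last column of Table 3 reports odds ratios (values below 1 being consistent with the negative associations described above), and it treats the predictor as a hypothetical score that changes by exactly one unit. How topic scores are actually scaled in the model is not specified here, so the resulting numbers should not be read as patient-level risks.

```python
import math

# Intercept on the log-odds scale, as reported in the Table 3 footnote.
INTERCEPT = -1.39

def sigmoid(log_odds):
    """Map a log-odds value to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def predicted_risk(intercept, odds_ratios, values):
    """Predicted probability from a logistic model.

    odds_ratios: one odds ratio per predictor (ASSUMED interpretation
                 of the last column of Table 3).
    values:      hypothetical predictor values, one per odds ratio.
    """
    log_odds = intercept
    for odds_ratio, value in zip(odds_ratios, values):
        # A coefficient is the natural log of its odds ratio.
        log_odds += math.log(odds_ratio) * value
    return sigmoid(log_odds)

# Risk when every predictor equals zero; this need not describe a typical patient.
baseline_risk = predicted_risk(INTERCEPT, [], [])

# Hypothetical one-unit increase in a predictor with odds ratio 0.26
# (the value listed for the pre-travel health advice topic).
risk_with_topic = predicted_risk(INTERCEPT, [0.26], [1.0])

print(f"baseline risk:   {baseline_risk:.3f}")
print(f"risk with topic: {risk_with_topic:.3f}")
```

Under these assumptions the baseline risk is about 0.20 and falls to about 0.06 when the hypothetical predictor is set to one, which illustrates why an odds ratio below 1 corresponds to a lower predicted fall risk without implying any causal effect.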

This study has several strengths. To our knowledge, this is the first study to use clinical notes to predict future falls in the general older population. Furthermore, we analysed a large sample of older people in the primary care setting. Additionally, we represented the clinical notes as topics, which clinicians can interpret directly to understand specific reasons for falls. We also evaluated the models’ performance on both discrimination and calibration using cross-validation.
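Discrimination and calibration capture different aspects of performance, which the following plain-Python sketch illustrates on made-up toy predictions. It is not the authors' pipeline: the data are invented, the AUC is computed by direct pairwise comparison, and calibration is summarised by the simple difference between mean predicted risk and observed event rate (calibration-in-the-large) rather than by the calibration plots used in this study. Cross-validation is omitted for brevity.

```python
def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen faller receives a
    higher predicted risk than a randomly chosen non-faller (ties count 0.5).
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def calibration_in_the_large(y_true, y_score):
    """Mean predicted risk minus observed event rate (0 is ideal)."""
    return sum(y_score) / len(y_score) - sum(y_true) / len(y_true)

# Invented toy data: 1 = fall during follow-up, 0 = no fall.
toy_outcomes = [0, 0, 1, 0, 1, 1, 0, 0]
toy_risks = [0.05, 0.10, 0.30, 0.20, 0.60, 0.15, 0.10, 0.25]

toy_auc = auc(toy_outcomes, toy_risks)
toy_citl = calibration_in_the_large(toy_outcomes, toy_risks)

print(f"AUC: {toy_auc:.3f}")
print(f"calibration-in-the-large: {toy_citl:.3f}")
```

In this toy example the ranking of fallers above non-fallers is fairly good while the predicted risks are on average too low, showing that a model can discriminate reasonably well and still be miscalibrated, which is why both aspects are reported.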

We acknowledge several limitations. First, our results reflect a primary care setting and a single healthcare system. The extent to which they generalise to other clinical settings and healthcare systems remains to be established. Clinicians in different settings may have different patterns of clinical documentation, which may affect the generalisability of the prediction models. Second, the fall prevalence in this study is lower than that observed in prospective cohort studies. Because we used EHR data, the recorded falls were likely those that required medical attention or led to concerns about falling, since in current practice not all falls are reported to or documented by GPs. Quite often, older people forget to mention their falls or do not report them because no major injury was sustained. Similarly, letters sent to GPs by other care providers (e.g. hospitals), which could mention fall incidents, might be inadequately documented by the GPs. Nevertheless, falls in our study may better mimic real practice, where the majority of non-injurious falls go unreported. Third, our models are limited to the Dutch language, but they can be retrained using our development strategy when another language is required. Finally, prediction models based on text cannot be used when textual data are unavailable or inaccessible. In such cases, one should rely on clinical variables to predict falls.

This study has implications for GPs and researchers. Our prediction model harnessed routinely collected unstructured clinical notes and structured clinical variables. As such, it can be implemented as an electronic decision support tool to help GPs identify individuals at higher fall risk, although the performance gains might be small. For researchers, unstructured clinical notes appear to be a viable source for developing prediction models. Our study corroborates the hypothesis that integrating unstructured data with structured data can improve predictive performance. However, as the improvement in our model was small, it may not be clinically relevant.

Further research may investigate the impact of incorporating temporal information extracted from clinical notes on the predictive performance. Moreover, external validation is needed to establish the generalisability of our results in other clinical settings. Additionally, although we used topics to represent the text to maintain interpretability with minimal impact on performance, future studies may consider different text representation approaches and explore their predictive performance.

Supplementary Material

aa-22-1541-File002_afad046

Additional data file (docx, 84.7 KB).

Acknowledgements

The authors are grateful to all participating GPs and the data managers of the Academic General Practitioner’s Network at Academic Medical Center (AHA AMC) for their time and effort in contributing routine care data for this study.

Contributor Information

Noman Dormosh, Department of Medical Informatics, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health, Aging and Later Life & Methodology Amsterdam, Amsterdam, The Netherlands.

Martijn C Schut, Department of Medical Informatics, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands; Department of Clinical Chemistry, Amsterdam UMC location Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health, Methodology & Quality of Care, Amsterdam, The Netherlands.

Martijn W Heymans, Department of Epidemiology and Data Science, Amsterdam UMC location Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health, Methodology & Personalized Medicine, Amsterdam, The Netherlands.

Otto Maarsingh, Department of General Practice, Amsterdam UMC location Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health, Aging and Later Life & Mental Health, Amsterdam, The Netherlands.

Jonathan Bouman, Department of General Practice, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands.

Nathalie van der Velde, Department of Internal Medicine, Section of Geriatric Medicine, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health, Aging and Later Life, Amsterdam, The Netherlands.

Ameen Abu-Hanna, Department of Medical Informatics, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health, Aging and Later Life & Methodology Amsterdam, Amsterdam, The Netherlands.

Data Availability Statement

The data underlying this article were provided by the Academic General Practitioner’s Network at the Academic Medical Center (AHA AMC). For privacy reasons, the data cannot be made publicly available. Reasonable requests for conditional reuse of the data can be submitted to the corresponding author.

Declaration of Conflicts of Interest

None.

Declaration of Sources of Funding

This work was supported by the Dutch Research Council (NWO) (grant number 628011026), The Hague, the Netherlands. The funder did not have any role or influence in study design, analysis or reporting.

References

1. EuroSafe. EuroSafe: injuries in the European Union, summary on injury statistics 2012–2014. 6th ed, EuroSafe, Amsterdam 2016; 505–18.
2. Moreland B, Kakara R, Henry A. Trends in nonfatal falls and fall-related injuries among adults aged ≥65 years—United States, 2012–2018. MMWR Morb Mortal Wkly Rep 2020; 69: 875–81.
3. Stel VS, Smit JH, Pluijm SMF et al. Consequences of falling in older men and women and risk factors for health service use and functional decline. Age Ageing 2004; 33: 58–65.
4. Hartholt KA, Van Beeck EF, Polinder S et al. Societal consequences of falls in the older population: injuries, healthcare costs, and long-term reduced quality of life. J Trauma Inj Infect Crit Care 2011; 71: 748–53.
5. Montero-Odasso M, Van Der Velde N, Alexander NB et al. New horizons in falls prevention and management for older adults: a global initiative. Age Ageing 2021; 50: 1499–507.
6. Drootin M. Summary of the updated American Geriatrics Society/British Geriatrics Society clinical practice guideline for prevention of falls in older persons. J Am Geriatr Soc 2011; 59: 148–57.
7. Cattelani L, Palumbo P, Palmerini L et al. FRAT-up, a web-based fall-risk assessment tool for elderly people living in the community. J Med Internet Res 2015; 17: e41. 10.2196/jmir.4064.
8. Bongue B, Dupré C, Beauchet O, Rossat A, Fantino B, Colvez A. A screening tool with five risk factors was developed for fall-risk prediction in community-dwelling elderly. J Clin Epidemiol 2011; 64: 1152–60.
9. Tromp AM, Pluijm SMF, Smit JH, Deeg DJH, Bouter LM, Lips P. Fall-risk screening test: a prospective study on predictors for falls in community-dwelling elderly. J Clin Epidemiol 2001; 54: 837–44.
10. Loo B, Seppala LJ, Velde N et al. Development of the ADFICE_IT models for predicting falls and recurrent falls in community-dwelling older adults: pooled analyses of European cohorts with special attention to medication. J Gerontol A Biol Sci Med Sci 2022; 77: 1446–54.
11. Oshiro CES, Frankland TB, Rosales AG et al. Fall ascertainment and development of a risk prediction model using electronic medical records. J Am Geriatr Soc 2019; 67: 1417–22.
12. Dormosh N, Schut MC, Heymans MW, van der Velde N, Abu-Hanna A. Development and internal validation of a risk prediction model for falls among older people using primary care electronic health records. J Gerontol Ser A 2021; 77: 1438–45.
13. Ye C, Li J, Hao S et al. Identification of elders at higher risk for fall with statewide electronic health records and a machine learning algorithm. Int J Med Inform 2020; 137: 104105. 10.1016/j.ijmedinf.2020.104105.
14. Rafiq M, McGovern A, Jones S et al. Falls in the elderly were predicted opportunistically using a decision tree and systematically using a database-driven screening tool. J Clin Epidemiol 2014; 67: 877–86.
15. Goldstein BA, Navar AM, Pencina MJ, Ioannidis JPA. Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. J Am Med Informatics Assoc 2017; 24: 198–208.
16. Bjarnadottir RI, Lucero RJ. What can we learn about fall risk factors from EHR nursing notes? A text mining study. EGEMS (Wash. DC) 2018; 6: 21.
17. Kharrazi H, Anzaldi LJ, Hernandez L et al. The value of unstructured electronic health record data in geriatric syndrome case identification. J Am Geriatr Soc 2018; 66: 1499–507.
18. Kreimeyer K, Foster M, Pandey A et al. Natural language processing systems for capturing and standardizing unstructured clinical information: a systematic review. J Biomed Inform 2017; 73: 14–29.
19. McCart JA, Berndt DJ, Jarman J, Finch DK, Luther SL. Finding falls in ambulatory care clinical documents using statistical text mining. J Am Med Informatics Assoc 2013; 20: 906–14.
20. Bates J, Fodeh SJ, Brandt CA, Womack JA. Classification of radiology reports for falls in an HIV study cohort. J Am Med Informatics Assoc 2016; 23: e113–7.
21. Tohira H, Finn J, Ball S et al. Machine learning and natural language processing to identify falls in electronic patient care records from ambulance attendances. Informatics Heal Soc Care 2021; 47: 403–13.
22. Fu S, Thorsteinsdottir B, Zhang X et al. A hybrid model to identify fall occurrence from electronic health records. Int J Med Inform 2022; 162: 104736. 10.1016/j.ijmedinf.2022.104736.
23. Angelov D. Top2vec: Distributed representations of topics, 2020, arXiv preprint arXiv:2008.09470.
24. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 1996; 58: 267–88.
25. Austin PC, Steyerberg EW. Graphical assessment of internal and external calibration of logistic regression models by using loess smoothers. Stat Med 2014; 33: 517–35.
26. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988; 44: 837. 10.2307/2531595.
27. Ushey K, Allaire J, Tang Y. reticulate: Interface to ‘Python’. 2023. https://rstudio.github.io/reticulate/, https://github.com/rstudio/reticulate.
28. Norgeot B, Quer G, Beaulieu-Jones BK et al. Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist. Nat Med 2020; 26: 1320–4.
29. Ghassemi M, Naumann T, Doshi-Velez F et al. Unfolding physiological state: Mortality modelling in intensive care units. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 2014, 75–84.
30. Rumshisky A, Ghassemi M, Naumann T et al. Predicting early psychiatric readmission with natural language processing of narrative discharge summaries. Transl Psychiatry 2016; 6: e921. 10.1038/tp.2015.182.
31. Goh KH, Wang L, Yeow AYK et al. Artificial intelligence in sepsis early prediction and diagnosis using unstructured data in healthcare. Nat Commun 2021; 12: 711. 10.1038/s41467-021-20910-4.
32. Seinen TM, Fridgeirsson EA, Ioannou S et al. Use of unstructured text in prognostic clinical prediction models: a systematic review. J Am Med Inform Assoc 2022; 29: 1292–302.
33. Gade GV, Jørgensen MG, Ryg J et al. Predicting falls in community-dwelling older adults: a systematic review of prognostic models. BMJ Open 2021; 11: e044170. 10.1136/bmjopen-2020-044170.
34. Montero-Odasso MM, Kamkar N, Pieruccini-Faria F et al. Evaluation of clinical practice guidelines on fall prevention and management for older adults: a systematic review. JAMA Netw Open 2021; 4: e2138911. 10.1001/jamanetworkopen.2021.38911.
35. Palumbo P, Becker C, Bandinelli S, Chiari L. Simulating the effects of a clinical guidelines screening algorithm for fall risk in community dwelling older adults. Aging Clin Exp Res 2019; 31: 1069–76.
36. Burns ER, Lee R, Hodge SE, Pineau VJ, Welch B, Zhu M. Validation and comparison of fall screening tools for predicting future falls among older adults. Arch Gerontol Geriatr 2022; 101: 104713. 10.1016/j.archger.2022.104713.
37. Moon S, McInnes B, Melton GB. Challenges and practical approaches with word sense disambiguation of acronyms and abbreviations in the clinical domain. Healthc Inform Res 2015; 21: 35–42.
38. Cohen GR, Friedman CP, Ryan AM, Richardson CR, Adler-Milstein J. Variation in physicians’ electronic health record documentation and potential patient harm from that variation. J Gen Intern Med 2019; 34: 2355–67.
39. Aranda-Gallardo M, Morales-Asencio JM, Enriquez De Luna-Rodriguez M et al. Characteristics, consequences and prevention of falls in institutionalised older adults in the province of Malaga (Spain): a prospective, cohort, multicentre study. BMJ Open 2018; 8: e020039. 10.1136/bmjopen-2017-020039.
40. Ge ML, Simonsick EM, Dong BR et al. Frailty, with or without cognitive impairment, is a strong predictor of recurrent falls in a US population-representative sample of older adults. Newman AB (ed.). J Gerontol Ser A Biol Sci Med Sci 2021; 76: E354–60.
41. Ambrose AF, Paul G, Hausdorff JM. Risk factors for falls among older adults: a review of the literature. Maturitas 2013; 75: 51–61.
42. Dros J, Maarsingh OR, Beem L et al. Functional prognosis of dizziness in older adults in primary care: a prospective cohort study. J Am Geriatr Soc 2012; 60: 2263–9.

