Abstract
Background
Falls involve dynamic risk factors that change over time, but most studies on fall-risk factors are cross-sectional and do not capture this temporal aspect. The longitudinal clinical notes within electronic health records (EHR) provide an opportunity to analyse fall risk factor trajectories through Natural Language Processing techniques, specifically dynamic topic modelling (DTM). This study aims to uncover fall-related topics for new fallers and track their evolving trends leading up to falls.
Methods
This case–cohort study utilised primary care EHR data on older adults collected between 2016 and 2019. Cases were individuals who fell in 2019 but had no falls in the preceding three years (2016–18). The control group comprised randomly sampled individuals, similar in number to the case group, who did not fall during the entire follow-up period. We applied DTM to the clinical notes collected between 2016 and 2018. We compared the trend lines of the case and control groups using their slopes, which indicate the direction and steepness of change over time.
Results
A total of 2,384 fallers (cases) and an equal number of controls were included. We identified 25 topics that showed significant differences in trends between the case and control groups. Topics such as medications, renal care, family caregivers, hospital admission/discharge and referral/streamlining diagnostic pathways showed a consistently steeper increase over time in the case group before the occurrence of falls.
Conclusions
Early recognition of health conditions demanding care is crucial for applying proactive and comprehensive multifactorial assessments that address underlying causes, ultimately reducing falls and fall-related injuries.
Keywords: accidental falls, fall risk factors, natural language processing, electronic health records, free text, dynamic topic modelling, older people
Key Points
Falls involve dynamic risk factors that change over time, but most studies on risk factors do not capture this temporal aspect.
We applied dynamic topic modelling on longitudinal clinical notes within electronic health records data to analyse trajectories of fall-risk factors.
Trend lines of 25 topics differed significantly between fallers and non-fallers, suggesting warning signs for imminent fall risk.
Among the topics, medication use, referrals and hospital admissions showed progressive increase over time before fall occurrence.
Early recognition of health conditions demanding care is important to proactively tackle underlying causes and reduce fall risk.
Introduction
Fall prevention has become a crucial area of concern, attracting the attention of clinicians, academics and the wider community [1]. Falls are widely recognised as a geriatric giant that impacts nearly one-third of older adults annually, causing injuries, hospitalisations and fatalities [2–5]. Addressing this pervasive issue is challenging, as it involves navigating the complexity of individual risk factors. Consequently, developing effective fall prevention intervention strategies requires a deeper understanding of the fall-risk factors.
The literature on falls for older adults presents an overwhelming number of risk factors involving sociodemographic (e.g. age and sex), biological (e.g. gait disturbances, underlying diseases), behavioural (e.g. lack of exercise) and environmental (e.g. slippery surfaces) factors [3, 6–9]. Most of the existing studies describing risk factors for falls are cross-sectional, where data on risk factors are collected at a single point in time, disregarding the influence of risk over time. However, falls are complex and involve multiple risk factors that interact and change over time. For example, the mobility function may be improving or deteriorating over time, depending on several factors such as physical activity, environmental factors, medication use and comorbidities [10–15]. Understanding these trends allows clinicians to foresee signs of increased fall risk, enabling early intervention strategies to be implemented before falls occur.
Trend analysis of fall-risk factors over time can be performed in prospective longitudinal studies. However, conducting such studies is challenging and resource intensive because data need to be collected repeatedly over a long period. Clinical notes in routinely collected electronic health records (EHR) data can serve as an alternative source to examine the trajectory of fall-risk factors. These clinical notes provide a longitudinal view of the patient’s care trajectory, including medication use and comorbidities. In addition, they may offer a detailed narrative perspective on the patient’s health status, compared to the standard medical coding found in the structured data of the EHR or the data collected in longitudinal research cohorts. Consider for example the narrative descriptions of the patient’s symptoms (e.g. quality, severity and pain duration) and information about patient’s social, emotional or psychological state. This level of detail could be important for detecting changes related to an elevated fall risk, as this can be influenced by subtle changes in the patient’s physical, psychological and/or social status [16–18]. These data are typically unstructured, captured as free text by the physician in the record of the patient.
While clinical notes contain valuable information on fall-risk factors [19], extracting and utilising this information can be a challenging task due to the unstructured and complex nature of clinical notes data. Natural Language Processing (NLP) techniques have been increasingly used in the healthcare domain to enable automated extraction and analysis of relevant information from clinical notes [20]. One such popular NLP technique is topic modelling, which seeks to identify latent topics in clinical notes [21, 22]. Topic modelling assumes that each clinical note is a mixture of topics and that each topic is represented by a set of related words; the resulting topics can then be used for further analyses. For example, coexistence of the words ‘gait’, ‘walking aid’, ‘balance’ and ‘limitation’ may suggest a topic one could label as ‘impaired mobility’. Dynamic topic modelling is an extension of topic modelling that captures changes in topics over time [23, 24], enabling the tracking of topics related to fall risk as they emerge and diminish over time.
The aim of this study was to discover important topics relating to the onset of falls in new fallers and to identify trends of these topics as they evolve over time prior to the occurrence of falls. To accomplish this, we utilised a large primary care EHR dataset pertaining to older adults, and modern NLP techniques that account for the context and the meanings of words.
Methods
Study design, data source and setting
This is a retrospective case–cohort study using EHR data from the pseudonymised database of the Academic General Practitioner’s Network at the Academic Medical Center (AHA AMC). This database comprises data from 50 general practices in the North Holland province of the Netherlands and contains de-identified longitudinal clinical notes (in Dutch) for patients registered with the participating general practitioners (GPs) between 2016 and 2019. The cohort from which we drew the sample covered 43,630 patients aged 65 or over as of January 2019 and 23,802,634 clinical notes. Over the course of 4 years, the patients were followed and the clinical notes from 2016 to 2018 were analysed to uncover the trend of the fall-risk factors. The study’s outcome, fall incidence, was determined in 2019.
Research studies that use pseudonymised data from the AHA AMC database do not require patient consent. This database adheres to Dutch privacy laws and includes data from patients registered with participating general practices, excluding opt-outs.
Case identification and control group
The case group consisted of patients who reported falls in 2019 but had not experienced falls in the preceding 3 years (2016–18). Falls were ascertained from the clinical notes using the validated search strategy outlined in [25], which relies on specific trigger words (e.g. fell, slipped) to capture falls. Falls in 2019 were identified using a manual chart review of the triggered clinical notes, as the meaning of the trigger words depends on the context in the sentences. We assumed that falls involved the individual falling down to the floor unless otherwise indicated in the clinical notes. Falls that occurred in the preceding 3 years were determined using the automated search strategy [25]. The control group consisted of randomly sampled patients, similar in number to the case group, who did not experience falls during the entire study period.
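The trigger-word screening step can be illustrated with a minimal sketch. It is not the validated strategy of [25]: the two English trigger words are simply the examples given above (the actual notes are in Dutch), and the function name and sample notes are hypothetical. The second note shows why flagged notes still need manual review.

```python
import re

# Hypothetical trigger words taken from the examples in the text;
# the validated search strategy in [25] is not reproduced here.
TRIGGER_PATTERN = re.compile(r"\b(fell|slipped)\b", re.IGNORECASE)


def flag_candidate_notes(notes):
    """Return the notes that contain at least one trigger word."""
    return [note for note in notes if TRIGGER_PATTERN.search(note)]


# Fictional notes, for illustration only.
notes = [
    "Patient slipped in the bathroom and hit her hip.",
    "Blood pressure fell after starting the new medication.",
    "Repeat prescription issued.",
]
candidates = flag_candidate_notes(notes)
for note in candidates:
    print(note)
```

Both the first and the second note are flagged, although only the first describes a fall, which is the kind of context dependence that a chart review is meant to resolve.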
Textual data extraction and pre-processing
The clinical notes, further referred to as documents, were timestamped, and each patient could have multiple documents. To maintain consistency with the language model used in this study (described below), no text pre-processing was conducted.
Dynamic topic modelling
Recall that a document can be viewed as belonging to a set of topics that vary in their prominence, and that a topic is a set of words that co-occur frequently within documents. We applied the BERTopic algorithm to discover common topics and to track their trends over time [24]. We provide a simplified description of (dynamic) topic modelling. For more details, we refer the reader to the original paper describing this technique [24].
The procedure consists of two stages: first, identifying topics in all documents, and second, tracking topic changes over time (see Figure 1). To discover the topics, a language model is used to map the documents into numerical vectors based on their semantic similarity, positioning related documents close together in the vector space. For example, the documents ‘the patient has difficulty walking due to joint pain’ and ‘patient x experiences limited mobility because of arthritis’ convey similar information about mobility issues and are likely to be positioned close. Subsequently, these documents are clustered, with each cluster representing a topic (Figure 1a). Topic words are obtained by weighing the importance of the words in the documents within the cluster taking into account their frequency and rarity across documents in other clusters. In the example, the word ‘patient’ is likely to have a low score, indicating lower importance due to its potential presence in many clusters. Conversely, the word ‘mobility’ may receive a higher score due to its expected occurrence in fewer clusters, emphasising its relative significance. Assuming that the highest scored words in the abovementioned example are ‘pain’, ‘walking’, ‘knee’, ‘gait’ and ‘limitation’, a human can label this cluster (topic) as ‘impaired mobility’. This entails that the documents in this cluster mainly revolve around this topic.
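The word-scoring idea can be sketched in a few lines of Python. This is an illustrative approximation in the spirit of the class-based TF-IDF weighting described in [24], not the code used in the study: the function name, the whitespace tokenisation and the three fictional clusters are assumptions made for the example.

```python
import math
from collections import Counter


def score_topic_words(clusters):
    """Score words per cluster: frequent inside the cluster, rare elsewhere.

    clusters: dict mapping a cluster label to a list of documents (strings).
    Returns a dict mapping each label to {word: score}.
    """
    # Word counts per cluster, treating each cluster as one long document.
    counts = {
        label: Counter(" ".join(docs).lower().split())
        for label, docs in clusters.items()
    }
    # Total count of each word across all clusters.
    total = Counter()
    for cluster_counts in counts.values():
        total.update(cluster_counts)
    # Average number of words per cluster.
    avg_words = sum(total.values()) / len(counts)

    scores = {}
    for label, cluster_counts in counts.items():
        n_words = sum(cluster_counts.values())
        scores[label] = {
            word: (freq / n_words) * math.log(1 + avg_words / total[word])
            for word, freq in cluster_counts.items()
        }
    return scores


# Fictional clusters, for illustration only.
clusters = {
    "cluster_0": ["patient has limited mobility", "mobility worse after knee pain"],
    "cluster_1": ["patient started new tablet", "one tablet daily"],
    "cluster_2": ["patient referred to neurology", "referral letter sent"],
}
scores = score_topic_words(clusters)
top_word = max(scores["cluster_0"], key=scores["cluster_0"].get)
print(top_word)
```

In this toy input, 'patient' occurs in every cluster and therefore scores lower than 'mobility', which is concentrated in a single cluster.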
Figure 1.

Conceptual visualisation of the (dynamic) topic modelling illustrated through a single fictional example: (a) documents are represented in a two-dimensional space after mapping them into numerical vectors (document embeddings). Note that the two-dimensional representation is merely for illustrative purposes, as the actual dimensions may extend to 768; (b) illustration of the frequency of documents pertaining to the topic ‘Impaired mobility’ between 2018 and 2020, and the evolution of the words associated with this topic.
To track the topics over time, all documents are partitioned into subsets based on specific time intervals. Because each document was initially mapped to a single topic, the frequency of the documents, belonging to a particular topic within a specific time interval, could be seen as an indication of the prominence of this topic across different time periods. In the fictional example (Figure 1b), the total number of documents for the topic ‘impaired mobility’ was 430 between 2018 and 2020, with document frequencies of 100, 130 and 200 for the years 2018, 2019 and 2020, respectively, suggesting an increase in the prominence of this topic over time. It is important to note that topic words can change over time because the scoring method depends on the subset of documents within specific time periods. Therefore, as the document subsets evolve, the importance and relevance of certain words within a topic may change.
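The counting step can be sketched as follows, assuming each document has already been assigned a single topic label and a year. The function name and the data layout are hypothetical; the numbers reproduce the fictional example.

```python
from collections import Counter


def topic_frequency_by_year(assignments):
    """Count documents per (topic, year) pair.

    assignments: iterable of (topic_label, year) tuples, one per document.
    """
    return Counter(assignments)


# Fictional data reproducing the document counts of the example in the text.
assignments = (
    [("impaired mobility", 2018)] * 100
    + [("impaired mobility", 2019)] * 130
    + [("impaired mobility", 2020)] * 200
)
freq = topic_frequency_by_year(assignments)
series = [freq[("impaired mobility", year)] for year in (2018, 2019, 2020)]
print(series, sum(series))
```

The resulting yearly series is the time-varying quantity whose trend can then be examined.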
Statistical analysis
We examined the topic patterns separately for the case and control groups to compare temporal changes. As such, a time-varying variable representing the frequency of a specific topic for each group was obtained, which can be visualised as a linear trend. The slope of the line indicates the direction and steepness of the change, and can be compared between the case and control groups. We created a linear regression model to fit a line to each set of data points, including an interaction term:
$Y = \beta_0 + \beta_1 T + \beta_2 G + \beta_3 (T \times G)$, where $Y$ is the topic frequency, $T$ is the time interval and $G$ is the group (case/control). The sign and the magnitude of the coefficient for the interaction term ($\beta_3$) indicate whether the effect of the time interval on topic frequency depends on the group: $\beta_3 = 0$ indicates no differential effect, $\beta_3 > 0$ indicates an amplified effect and $\beta_3 < 0$ indicates a dampened effect. A P-value <0.05 for the interaction effect was considered significant.
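As a minimal sketch of how the interaction coefficient relates to the two trend lines: a linear model with an intercept, a time term, a group indicator (0 for controls, 1 for cases) and their product gives the same fit as one least-squares line per group, so the interaction coefficient equals the case slope minus the control slope. The function names, the 0/1 coding and the data below are assumptions for illustration; the significance test of the interaction term is not shown.

```python
def ols_line(x, y):
    """Ordinary least-squares intercept and slope for a single group."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def interaction_coefficients(t, y_control, y_case):
    """Coefficients of frequency = b0 + b1*time + b2*group + b3*(time*group).

    Assumed coding: group = 0 for controls and 1 for cases.
    """
    b0, b1 = ols_line(t, y_control)
    case_intercept, case_slope = ols_line(t, y_case)
    return {
        "b0": b0,                   # control intercept
        "b1": b1,                   # control slope
        "b2": case_intercept - b0,  # intercept difference (case - control)
        "b3": case_slope - b1,      # slope difference: the interaction term
    }


# Fictional topic frequencies over four time intervals.
t = [0, 1, 2, 3]
y_control = [10, 12, 14, 16]
y_case = [20, 25, 30, 35]
coef = interaction_coefficients(t, y_control, y_case)
print(coef)
```

Here the control slope is 2 and the case slope is 5, so the interaction coefficient is positive, corresponding to a steeper increase in the case group.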
Cases and controls were extracted using R software [26]. Dynamic topic modelling was performed using the Python implementation of BERTopic [24]. The language model employed in BERTopic was MedRoBERTa.nl, a transformer-based language model trained on Dutch hospital clinical notes [27]. The code to perform DTM and discover the trends of the extracted topics is given in Supplementary Appendix 1, available in Age and Ageing online.
Sensitivity analysis
To discover important topics and to assess the robustness of the method, we repeated our analysis using six additional control samples of sizes similar to the case group: five independent random control samples, and one obtained by matching cases with controls on age, sex, number of medications and number of clinical conditions using the Mahalanobis distance as implemented in the MatchIt R package [28].
Results
Baseline characteristics
The patient characteristics of the study sample are summarised in Table 1. A total of 2,384 fallers (cases) were included, along with an equal number of randomly sampled controls. Compared with the control group, fallers were older (median age 75.23 [IQR: 70.24–81.97] vs. 71.41 [IQR: 68.07–76.67]), had a higher proportion of female individuals (59.3% vs. 48.7%), were more likely to have morbidities and were prescribed more medications. Between 2016 and 2018, fallers had a higher number of documents compared to the control group, with 274,231 documents (59.8%) for fallers and 184,494 documents (40.2%) for the control group. There were no differences in the median number of words in the documents between the two groups, with 10 words (IQR: 4–24) for fallers and 10 words (IQR: 4–23) for the control group. There were no significant differences between the characteristics of the randomly sampled control group and those of the entire control population (see Supplementary Appendix 2, available in Age and Ageing online).
Table 1.
An overview of the clinical characteristics of the included patients
| Characteristic | Control group (n = 2,384) | Case group (n = 2,384) |
|---|---|---|
| Age at index date, years | 71.41 [68.07, 76.67] | 75.23 [70.24, 81.97] |
| Female sex | 1,161 (48.7) | 1,413 (59.3) |
| Circulatory hypertension | 891 (37.4) | 1,120 (47.0) |
| Heart failure | 60 (2.5) | 98 (4.1) |
| Cardiac arrhythmia | 224 (9.4) | 356 (14.9) |
| Coronary heart disease | 265 (11.1) | 335 (14.1) |
| Diabetes | 430 (18.0) | 550 (23.1) |
| Chronic kidney disease | 57 (2.4) | 75 (3.1) |
| Stroke | 139 (5.8) | 240 (10.1) |
| Osteoarthritis | 494 (20.7) | 649 (27.2) |
| Osteoporosis | 79 (3.3) | 160 (6.7) |
| Rheumatoid arthritis | 42 (1.8) | 64 (2.7) |
| Anxiety disorders | 110 (4.6) | 144 (6.0) |
| Dementia | 38 (1.6) | 68 (2.9) |
| Depression | 111 (4.7) | 176 (7.4) |
| Epilepsy | 15 (0.6) | 21 (0.9) |
| Parkinson’s disease | 11 (0.5) | 22 (0.9) |
| Memory and concentration problems | 52 (2.2) | 80 (3.4) |
| Chronic back or neck disorder | 375 (15.7) | 532 (22.3) |
| Fatigue weakness | 119 (5.0) | 207 (8.7) |
| Hearing disorder | 158 (6.6) | 259 (10.9) |
| Visual disorder | 436 (18.3) | 670 (28.1) |
| Vertigo dizziness | 65 (2.7) | 145 (6.1) |
| Vitamin deficiency | 133 (5.6) | 199 (8.3) |
| Urinary incontinence | 45 (1.9) | 107 (4.5) |
| Antiepileptic drugs | 59 (2.5) | 92 (3.9) |
| Antiparkinson drugs | 18 (0.8) | 32 (1.3) |
| Proton pump inhibitors | 732 (30.7) | 985 (41.3) |
| Urinary incontinence drugs | 32 (1.3) | 70 (2.9) |
| Opioids | 255 (10.7) | 331 (13.9) |
| Non-steroidal anti-inflammatory drugs | 348 (14.6) | 396 (16.6) |
| Antihypertensive drugs | 332 (13.9) | 427 (17.9) |
| Antidepressant drugs | 139 (5.8) | 205 (8.6) |
| Number of cardiovascular drugs | | |
| 0 | 1,083 (45.4) | 828 (34.7) |
| 1 | 401 (16.8) | 421 (17.7) |
| 2 | 342 (14.3) | 384 (16.1) |
| 3 | 275 (11.5) | 333 (14.0) |
| 4 | 159 (6.7) | 214 (9.0) |
| ≥5 | 124 (5.2) | 204 (8.6) |
| Total clinical notes | 184,494 | 274,231 |
| Clinical notes per patient | 64 [30, 108] | 99 [61, 150] |
Data are presented as n (%) or median [IQR].
Table 2.
Trending topics possibly suggesting imminent fall risk. The topic name is based on our clinical knowledge
| Possible topic | Cases slope (95% CI) | Control slope (95% CI) | Top 10 words (Dutch ‘English translation’) |
|---|---|---|---|
| Medications | +8.6 (4.3–12.9) | −5.3 (−9 to −1.7) | tablet ‘tablet’, stuk ‘piece’, meda ‘medication’, auta ‘automatically dispensed’, inva ‘administration prescription’, mgaa ‘controlled release’, capsule ‘capsule’, foa ‘film-coated’, msra ‘gastro-resistant’, dag ‘day’ |
| Decision-making/treatment plan | +16.3 (8.6–23.9) | +6.6 (3.1–10.1) | conclusie ‘conclusion’, beleid ‘policy’, waarvoor ‘for which’, patient ‘patient’, status ‘status’, aanwijzingen ‘clues’, medspea ‘medical specialist’, patiente ‘female patient’, ingekomen ‘received’, rechts ‘right’ |
| Renal care and kidney function test | +7.6 (3.2–12) | +1.5 (−1.6 to 4.6) | laboratoriumuitslag ‘laboratory result’, afname ‘withdrawal’, mmoll ‘mmol/l’, kreatinine ‘creatinine’, gfra ‘glomerular filtration rate’, ckdepia ‘Chronic Kidney Disease Epidemiology Collaboration’, nuchter ‘fasting’, umolla ‘micromole/l’, mdrd ‘Modification of Diet in Renal Disease’, mlmina ‘millilitres per minute’ |
| Family caregivers | +15.1 (12.3–17.9) | +2.2 (0–4.4) | dochter ‘daughter’, gesprek ‘conversation’, zoon ‘son’, zorgen ‘concerns’, leven ‘life’, hulp ‘help’, man ‘man’, kinderen ‘children’, dingen ‘things’, gaat ‘is going’ |
| Referral/streamlining diagnostic pathways | +11.8 (9.1–14.6) | +5.9 (4.4–7.5) | zorgdomein ‘digital platform for referrals’, verwezen ‘referred’, diagnostisch ‘diagnostic’, hospitaalweg ‘a street name’, laboratorium ‘laboratory’, dermatologie ‘dermatology’, orthopedie ‘orthopaedics’, heelkunde ‘surgery’, neurologie ‘neurology’, knoheelkunde ‘ear; nose; and throat (ENT)’ |
| Urinalysis mainly for urinary tract infections | +3.8 (1–6.6) | +0.2 (−1.7 to 2) | stick ‘test strip’, leucob ‘leukocytes’, erya ‘erythrocytes’, nitrstickneg ‘negative for nitrite’, erysa ‘erythrocytes’, nitr ‘nitrite’, sed ‘sediment’, leukoa ‘leukocytes’, nitriet ‘nitrite’, urine ‘urine’ |
| Blood pressure and heart rate measurements | +2.9 (1.4–4.4) | −0.4 (−1.9 to 1) | diastrra ‘diastolic’, pols ‘pulse’, meting ‘measurement’, systrra ‘systolic’, gemdiastrra ‘average diastolic’, gema ‘average’, hartfreqa ‘heart rate’, systa ‘systolic’, rega ‘regular’, gemiddeld ‘average’ |
| Communication attempts (with patient) | +2.3 (1.1–3.6) | +0.1 (−1 to 1.1) | ingesproken ‘recorded or left a message’, voicemale ‘voicemail’, gehoor ‘hearing’, geprobeerd ‘tried’, bellen ‘calling’, vma ‘voicemail’, gebeld ‘called’, bereikbaar ‘reachable/available’, bereiken ‘to reach/to contact’, tel ‘phone’ |
| Communication between healthcare institutions | +3.0 (2.2–3.7) | +1.9 (1.3–2.5) | zda ‘healthcare facility’, zhinstellinga ‘hospital’, uitgaand ‘outgoing’, redenoverige ‘reason:other’, medisch ‘medical’, medspea ‘medical specialist’, huisartsen ‘general practitioners’, ingekomen ‘incoming’, redenlaboratorium ‘reason:laboratory’, zorgvragen ‘care questions’ |
| Physiotherapy services | +1.7 (1–2.5) | +0.6 (−0.2 to 1.4) | zorggroep ‘care group’, fysiotherapie ‘physiotherapy’, verwijsbrief ‘referral letter’, locatiefysa ‘location: physiotherapy’, fysioa ‘physiotherapist’, fta ‘physiotherapist’, buurmeisje ‘neighbour’s girl’, knieklachten ‘knee complaints’, mcna ‘physiotherapy center’, herhaalverwijskaart ‘repeat referral card’ |
| GP consultation and prescription | +0.6 (−0.3 to 1.6) | −0.9 (−1.6 to −0.2) | consult ‘consultation’, huisarts ‘general practitioner’, receptregels ‘prescription rules’, ongeacht ‘regardless’, herhalingsrecept ‘repeated prescription’, aantal ‘number’, poha ‘general practitioner nurse/assistant’ |
| Hospital admission/discharge | +0.8 (−0.2 to 1.8) | −1.6 (−2.5 to −0.7) | ontslag ‘discharge’, opnamegegevens ‘admission data’, opname ‘admission’, ziekenhuis ‘hospital’, ij ‘a hospital name’, geneeskunde ‘medicine’, inwendige ‘internal medicine’, jm ‘name of doctor’, specialisten ‘specialists’, aaj ‘name of doctor’ |
| Urological disorders/complications | +1.9 (1–2.7) | +0.2 (−0.7 to 1.1) | journaal ‘medical record’, algemeen ‘general’, prostaatklachten ‘prostate complaints’, enterovisicale ‘enterovesical’, gestoord ‘disturbed’, zorgprobleem ‘care problem’, mellitus ‘diabetes mellitus’, diabetes ‘diabetes mellitus’, uwia ‘urinary tract infection’, fistel ‘fistula’ |
| Cardiovascular and geriatric care | +2.1 (1.2–3.1) | +0.9 (0.3–1.6) | cvrma ‘cardiovascular risk management’, cgaa ‘comprehensive geriatric assessment’, noteren ‘to note’, aangevraagd ‘requested’, cvia ‘cerebrovascular accident’, gba ‘no remarkable findings’, lab ‘lab’, mza ‘medication monitoring’, prachtig ‘wonderful’, screening ‘screening’ |
| Pharmacy service 1 | +3.0 (1.8–4.3) | +0.6 (−0.2 to 1.4) | benu ‘name of a pharmacy chain’, kraaiennest ‘name of a shopping center’, samenvatting ‘summary’, apotheek ‘pharmacy’ |
| Pharmacy service 2 | −1.8 (−2.5 to −1.1) | +0.9 (0.4–1.4) | waterland ‘city name’, samenvatting ‘summary’, apotheek ‘pharmacy’, ondertekend ‘signed’, bakje ‘container’, oost ‘east’, patient ‘patient’ |
| Pharmacy service 3 | +1.2 (0.5–2) | −0.1 (−0.8 to 0.7) | blwia ‘upper respiratory tract infection’, samenvatting ‘summary’, nieuwendammer ‘a place’, minuten ‘minutes’, langer ‘longer’, consult ‘consultation’, huisarts ‘general practitioner’, apotheek ‘pharmacy’, vblwia ‘upper respiratory tract infection’, rra ‘blood pressure’ |
| Intermittent claudication and ocular issues | +0.1 (−0.4 to 0.5) | −0.7 (−1.2 to −0.2) | intermittens ‘intermittent’, claudicatio ‘claudication’, gordelroos ‘herpes zoster’, schouderklachten ‘shoulder complaints’, voorwandinfarct ‘anterior wall infarction’, erosie ‘erosion’, cornea ‘cornea’, schouder ‘shoulder’, corneaerosie ‘corneal abrasion’, wondroos ‘cellulitis’ |
| Respiratory tract infection and cardiovascular drugs | +1.1 (0.4–1.8) | 0 (−0.7 to 0.6) | luchtwegen ‘airways’, bovenste ‘uppermost’, acute ‘acute’, infectie ‘infection’, celiprolol ‘celiprolol’, diltiazem ‘diltiazem’, behandelaar ‘treatment provider’, candesartan ‘candesartan’, isosorbidemononitraat ‘isosorbide mononitrate’, temazepam ‘temazepam’ |
| Orthopaedics | +0.7 (0–1.4) | −0.2 (−0.6 to 0.3) | orthopaedie ‘orthopaedics’ |
| Assessing peripheral arterial disease | +0.7 (0.2–1.2) | −0.1 (−0.4 to 0.3) | eaia ‘ankle-brachial index’, enkelarmindex ‘ankle-brachial index’, abia ‘ankle-brachial index’, pava ‘peripheral arterial disease’, index ‘index’, meting ‘measurement’, enkelarm ‘ankle-arm’, sprake ‘presence’, doppler ‘doppler’, arm ‘arm’ |
| Communications in healthcare | +0.6 (0.2–1) | −0.1 (−0.3 to 0.2) | patintbrief ‘patient letter’, gewezen ‘pointed out’, gemaild ‘emailed’, meegegeven ‘handed over’, mailen ‘to email’, verwijsbrief ‘referral letter’, info ‘information’, uitleg ‘explanation’, suava ‘emergency department admission request’, scoort ‘scores’ |
| Peripheral oedema | +0.5 (−0.1 to 1.1) | −0.6 (−1.3 to 0.1) | oedeem ‘edema’, oedemen ‘edemas’, enkels ‘ankles’, onderbenen ‘lower legs’, benen ‘legs’, voeten ‘feet’, lipoedeem ‘lipedema’, enkelsbenen ‘ankles and legs’, sigmoidcarcinoom ‘sigmoid carcinoma’, armen ‘arms’ |
| Urolithiasis-related symptoms | −0.2 (−0.7 to 0.2) | +0.2 (0–0.5) | mictie ‘voiding’, klachten ‘complaints’, urolithiasis ‘urolithiasis’, braken ‘vomiting’, anemie ‘anaemia’, infectie ‘infection’, hoofdpijn ‘headache’, buikpijn ‘abdominal pain’ |
| Cardiovascular risk management | +0.3 (0.1–0.6) | −0.1 (−0.4 to 0.1) | hypercholesterolemie ‘hypercholesterolemia’, administratie ‘administration’, achteruitgang ‘deterioration’, reca ‘recurrent’, decompensatio ‘decompensation’, apa ‘angina pectoris’, cholesterol ‘cholesterol’, algemeen ‘general’, pijnklachten ‘pain complaints’, tintelingen ‘tingling sensations’ |
Note. The sign of the slope represents the trend direction (+ for increase, − for decrease), while the magnitude indicates the steepness of the trend.
a Abbreviation or acronym
b Misspelling
For the case and control groups, the linear trends of the topics were in direct contrast for 14 topics (see Figure 2). Specifically, for 12 topics there was a positive trend for the cases and a negative one for the controls (e.g. medication, blood pressure and heart rate measurements, GP consultation for prescription and hospital admissions/discharge). By the same token, the trend lines for two topics exhibited the opposite pattern (e.g. urolithiasis-related symptoms) where there was a negative trend for the cases and a positive one for the controls. The remaining 11 topics (see Figure 3) showed an upward trajectory trend for both groups, but the case group had a steeper slope (e.g. decision-making/treatment plan, renal care, family caregivers, referral/streamlining diagnostic pathways).
Figure 2.

Trend lines of the topics that showed opposite trends in the case and control groups.
Figure 3.

Trend lines of the topics that showed an upward trend in both groups, with a steeper slope in the case group.
Sensitivity analysis
The baseline characteristics of the cases and the six control samples are detailed in Supplementary Appendix 3, available in Age and Ageing online. The five independent random control samples mirrored the characteristics identified in the study’s main control sample (Table 1). After matching, the characteristics of the case and control groups were similar. An overview of the clinical notes and the extracted topics across the six control samples in the sensitivity analysis is given in Supplementary Appendixes 4 and 5, available in Age and Ageing online. After repeating our analysis on the six control samples, a consistent trend was observed for four topics in five samples, six topics in four samples and four topics in three samples, respectively (see Supplementary Appendix 6, available in Age and Ageing online).
Discussion
In this study, we employed NLP to analyse and examine longitudinal clinical notes of GPs to discover and track topics relevant to future fall risk among older adults and observe their trend over time. We discovered 25 topics serving as indicators of an imminent fall risk, exhibiting a clear trend of increasing difference in topic frequency between cases and controls over time.
Our study breaks new ground, being the first to examine the temporal change in trends of information extracted from clinical notes using NLP techniques, specifically in relation to falls in community-dwelling older adults. A strength of this study lies in its longitudinal design, distinguishing it from previous studies summarised in systematic reviews and meta-analyses, which typically measured fall-risk factors only once at baseline [3, 6–9]. Our design enabled us to track changes in potential risk factors over an extended period, aligning with the dynamic nature of falls as many risk factors may change over time. The utility of NLP has previously been explored in a few studies to identify fall-risk factors [19] or to predict falls [29, 30]. Unlike the studies in [19, 30], which examined inpatient falls, our study predominantly focused on community falls and included a representative sample of community-dwelling older individuals. In addition, in contrast to these studies, we incorporated a temporal analysis to examine the time-related dimensions of information extracted from clinical notes.
Among the prevalent topics discovered in this study, some topics like ‘medications’, ‘renal care’ and ‘urologic conditions’ were found to pertain specifically to a particular medical domain and may imply identifiable risk factors for falls (e.g. fall-risk-increasing drugs, kidney disease and urinary incontinence) that are well established in the literature [1, 31–36]. The rising importance of these topics over time, as reflected by the number of clinical notes, may signify the emergence or worsening of medical conditions that foreshadow a fall risk. On the other hand, other topics were not domain specific and do not explicitly indicate risk factors, but suggest an increased care need. For example, the topics ‘referral/streamlining diagnostic pathways’, ‘communication between healthcare providers’ and ‘hospital admission/discharge’ may suggest a broad spectrum of underlying health conditions or complications involving specialised care, ongoing monitoring or progressing functional limitations, which can result in a higher fall risk. This could be seen as a warning sign of a future fall, a crisis that could possibly be avoided if measures are taken in time. For many topics, differences in intercept values and the immediate separation of the trend lines suggest that there are baseline differences between fallers and non-fallers and variances in baseline health status. The early divergence in the trends between these groups serves as a critical indicator of early variations in healthcare utilisation, suggesting possible onset or exacerbation of medical conditions within the fallers group.
Our sensitivity analysis revealed that the majority of the identified key topics were also prominent in over half of the control samples. Remarkably, the retained topics consistently followed the same trend as our primary analysis, regardless of whether random or matched control samples were employed. Although clinical baseline characteristics of the drawn random control samples were similar, we observed additional topics, with some being unique to specific control samples. This variation was expected considering the inherent differences in clinical note documentation. Of note, the absence of certain topics in some samples may indicate their non-documentation in the control group, suggesting their potential significance.
Most of the topics presented in this study were to a large extent coherent. However, we encountered a few topics that lacked sufficient cohesion to be interpreted as a single entity, such as the topic ‘Respiratory tract infection/cardiovascular drugs’. We also found three topics most likely referring to pharmacy services in different geographical areas, which could be merged into a single topic. This can be partially explained by BERTopic’s mechanism of topic reduction, which improves interpretability but can result in dissimilar topics being placed together. Nevertheless, these topics were relatively less prevalent compared to the other topics.
Our results have implications for GPs and researchers. Although some of the discovered topics may suggest potential risk factors for falls, the progressive emergence of many topics over time can be interpreted in the context of a rise in healthcare utilisation and a general decline in health status that foreshadow fall events. Our findings reveal a progressive increase in the prominence of topics related to healthcare utilisation encompassing medication use, consultation rates, referrals to specialists and hospital admissions. A similar increase was seen for topics related to morbidities, family caregivers, unsuccessful attempts made by GPs to communicate with patients, and assessments targeting cardiovascular and geriatric aspects, all of which can serve as indicators for frailty and other geriatric syndromes. While our results do not replace traditional fall-risk assessment tools, they can serve as valuable prompts for clinicians to anticipate a potential increase in an individual’s healthcare utilisation, encouraging them to conduct thorough assessments that identify underlying new or worsening conditions and intervene to improve health and reduce fall risk.
Our study is purely descriptive and offers a preliminary basis for future research in this complex area. Our results should be interpreted with caution: the study does not establish associations or causal links between specific risk factors and falls. The topics described in this study reflect broad themes and do not specifically address individual risk factors. For example, the topic ‘medications’ does not indicate which medication is involved or whether it is newly prescribed or recurring. It remains plausible that this topic is an indicator of polypharmacy or of the overall health status of the patients. Nonetheless, our findings help generate hypotheses that can serve as a starting point for researchers to deepen our understanding of the evolving impact of fall-risk factors.
This study has limitations. First, our results pertain specifically to a primary care setting. Documentation practices and clinical approaches can differ among healthcare providers, regions and countries. Healthcare professionals may differ in how they record information, like medication prescriptions. For example, some may choose to consistently register ongoing medication use, even as the dosage increases, while others may not. Therefore, the generalisability of our results to other contexts is yet to be determined. Second, it is important to acknowledge the potential for overlooking falls that were either not documented or not reported by patients due to, for example, their minor nature [37]. Consequently, our study’s case group primarily comprised individuals seeking medical attention, reflecting a high-risk patient group as demonstrated in the baseline characteristics. Third, the research was constrained by available computational capacity, hindering the inclusion of all available patients in the analysis. However, we performed a sensitivity analysis to evaluate the robustness of the results.
Future research should prospectively investigate the associations between the implicit potential risk factors identified in this study and falls, using appropriate statistical methods that account for time-related aspects. Moreover, evaluating the potential of incorporating temporal changes in risk factors into the development of fall risk stratification tools is warranted. In addition, the generalisability of our findings to other clinical settings and geographical locations should be established.
Supplementary Material
Acknowledgements:
The authors are grateful to all participating GPs and the data managers of the Academic General Practitioner’s Network at Academic Medical Center (AHA AMC) for their time and effort in contributing routine care data for this study.
Contributor Information
Noman Dormosh, Department of Medical Informatics, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands; Aging and Later Life & Methodology, Amsterdam Public Health, Amsterdam, The Netherlands.
Ameen Abu-Hanna, Department of Medical Informatics, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands; Aging and Later Life & Methodology, Amsterdam Public Health, Amsterdam, The Netherlands.
Iacer Calixto, Department of Medical Informatics, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands; Methodology & Mental Health, Amsterdam Public Health, Amsterdam, The Netherlands.
Martijn C Schut, Department of Medical Informatics, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands; Department of Laboratory Medicine, Amsterdam UMC location Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; Methodology & Quality of Care, Amsterdam Public Health, Amsterdam, The Netherlands.
Martijn W Heymans, Department of Epidemiology and Data Science, Amsterdam UMC location Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; Methodology & Personalized Medicine, Amsterdam Public Health, Amsterdam, The Netherlands.
Nathalie van der Velde, Department of Internal Medicine, Section of Geriatric Medicine, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands; Aging and Later Life, Amsterdam Public Health, Amsterdam, The Netherlands.
Declaration of Conflicts of Interest:
None.
Declaration of Sources of Funding:
This work was supported by the Dutch Research Council (NWO) (grant number 628011026), The Hague, the Netherlands. The funder did not have any role or influence in study design, analysis or reporting.
References
- 1. Montero-Odasso M, Velde N, Martin FC et al. World guidelines for falls prevention and management for older adults: a global initiative. Age Ageing 2022; 51. 10.1093/ageing/afac205.
- 2. EuroSafe. EuroSafe: injuries in the European Union, summary on injury statistics 2012-2014. EuroSafe 2014; 505–18.
- 3. Ganz DA, Latham NK. Prevention of falls in community-dwelling older adults. N Engl J Med 2020; 382: 734–43.
- 4. James SL, Lucchesi LR, Bisignano C et al. The global burden of falls: global, regional and national estimates of morbidity and mortality from the global burden of disease study 2017. Inj Prev 2019; 26: i166–11.
- 5. Burns ER, Stevens JA, Lee R. The direct costs of fatal and non-fatal falls among older adults—United States. J Safety Res 2016; 58: 99–103.
- 6. Deandrea S, Lucenteforte E, Bravi F, Foschi R, la Vecchia C, Negri E. Risk factors for falls in community-dwelling older people: a systematic review and meta-analysis. Epidemiology 2010; 21: 658–68.
- 7. Jehu DA, Davis JC, Falck RS et al. Risk factors for recurrent falls in older adults: a systematic review with meta-analysis. Maturitas 2021; 144: 23–8.
- 8. Li Y, Hou L, Zhao H, Xie R, Yi Y, Ding X. Risk factors for falls among community-dwelling older adults: a systematic review and meta-analysis. Front Med 2023; 9: 1019094.
- 9. Pfortmueller CA, Lindner G, Exadaktylos AK. Reducing fall risk in the elderly: risk factors and fall prevention, a systematic review. Minerva Med 2014; 105: 275–81.
- 10. Kuo FL, Yen CM, Chen HJ, Liao ZY, Lee Y. Trajectories of mobility difficulty and falls in community-dwelling adults aged 50+ in Taiwan from 2003 to 2015. BMC Geriatr 2022; 22: 902.
- 11. Miyashita T, Tadaka E, Arimoto A. Cross-sectional study of individual and environmental factors associated with life-space mobility among community-dwelling independent older people. Environ Health Prev Med 2021; 26: 9.
- 12. Yun HY. Environmental factors associated with older adult’s walking behaviors: a systematic review of quantitative studies. Sustainability 2019; 11: 3253.
- 13. Katsimpris A, Linseisen J, Meisinger C, Volaklis K. The association between polypharmacy and physical function in older adults: a systematic review. J Gen Intern Med 2019; 34: 1865–73.
- 14. Manias E, Kabir MZ, Maier AB. Inappropriate medications and physical function: a systematic review. Ther Adv Drug Saf 2021; 12: 204209862110303.
- 15. Calderón-Larrañaga A, Vetrano DL, Ferrucci L et al. Multimorbidity and functional impairment–bidirectional interplay, synergistic effects and common pathways. J Intern Med 2019; 285: 255–71.
- 16. Turunen KM, Kokko K, Kekäläinen T et al. Associations of neuroticism with falls in older adults: do psychological factors mediate the association? Aging Ment Heal 2022; 26: 77–85.
- 17. Kim T, Choi SD, Xiong S. Epidemiology of fall and its socioeconomic risk factors in community-dwelling Korean elderly. PloS One 2020; 15: e0234787.
- 18. Petersen N, König HH, Hajek A. The link between falls, social isolation and loneliness: a systematic review. Arch Gerontol Geriatr 2020; 88: 104020.
- 19. Bjarnadottir RI, Lucero RJ. What can we learn about fall risk factors from EHR nursing notes? A text mining study. EGEMS (Wash DC) 2018; 6: 21.
- 20. Kreimeyer K, Foster M, Pandey A et al. Natural language processing systems for capturing and standardizing unstructured clinical information: a systematic review. J Biomed Inform 2017; 73: 14–29.
- 21. Churchill R, Singh L. The evolution of topic modeling. ACM Comput Surv 2022; 54: 1–35.
- 22. Blei DM, Ng AY, Jordan MT. Latent Dirichlet allocation. Adv Neural Inf Process Syst 2002; 3: 993–1022.
- 23. Blei DM, Lafferty JD. Dynamic topic models. In: Proceedings of the 23rd International Conference on Machine Learning—ICML ‘06. Pittsburgh Pennsylvania. New York, NY, USA: ACM Press, 2006, 113–20.
- 24. Grootendorst M. BERTopic: Neural Topic Modeling with a Class-Based TF-IDF Procedure. arXiv preprint arXiv:2203.05794, 2022.
- 25. Dormosh N, Schut MC, Heymans MW, van der Velde N, Abu-Hanna A. Development and internal validation of a risk prediction model for falls among older people using primary care electronic health records. J Gerontol Ser A 2021; 77: 1438–45.
- 26. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2022.
- 27. Verkijk S, Vossen P. MedRoBERTa.nl: a language model for Dutch electronic health records. Comput Linguist Netherlands J 2021; 11: 141–59.
- 28. Ho DE, Imai K, King G et al. MatchIt: nonparametric preprocessing for parametric causal inference. J Stat Softw 2011; 42: 1–28.
- 29. Dormosh N, Schut MC, Heymans MW et al. Predicting future falls in older people using natural language processing of general practitioners’ clinical notes. Age Ageing 2023; 52. 10.1093/ageing/afad046.
- 30. Nakatani H, Nakao M, Uchiyama H, Toyoshiba H, Ochiai C. Predicting inpatient falls using natural language processing of nursing records obtained from Japanese electronic medical records: case-control study. JMIR Med Informatics 2020; 8: e16970.
- 31. Moon S, Chung HS, Kim YJ et al. The impact of urinary incontinence on falls: a systematic review and meta-analysis. PloS One 2021; 16: e0251711.
- 32. Goto NA, Weststrate ACG, Oosterlaan FM et al. The association between chronic kidney disease, falls, and fractures: a systematic review and meta-analysis. Osteoporos Int 2020; 31: 13–29.
- 33. López-Soto PJ, De Giorgi A, Senno E et al. Renal disease and accidental falls: a review of published evidence. BMC Nephrol 2015; 16: 176.
- 34. Vries M, Seppala LJ, Daams JG et al. Fall-risk-increasing drugs: a systematic review and meta-analysis: I. Cardiovascular drugs. J Am Med Dir Assoc 2018; 19: 371.e1–9.
- 35. Seppala LJ, Wermelink AMAT, Vries M et al. Fall-risk-increasing drugs: a systematic review and meta-analysis: II. Psychotropics. J Am Med Dir Assoc 2018; 19: 371.e11–7.
- 36. Seppala LJ, Glind EMM, Daams JG et al. Fall-risk-increasing drugs: a systematic review and meta-analysis: III. Others. J Am Med Dir Assoc 2018; 19: 372.e1–8.
- 37. Stevens JA, Ballesteros MF, Mack KA, Rudd RA, DeCaro E, Adler G. Gender differences in seeking care for falls in the aged medicare population. Am J Prev Med 2012; 43: 59–62.