PLOS One
. 2022 Aug 25;17(8):e0268281. doi: 10.1371/journal.pone.0268281

Text mining in long-term care: Exploring the usefulness of artificial intelligence in a nursing home setting

Coen Hacking 1,2,*, Hilde Verbeek 1,2, Jan P H Hamers 1,2, Katya Sion 1,2, Sil Aarts 1,2
Editor: Haoran Xie
PMCID: PMC9409502  PMID: 36006921

Abstract

Objectives

In nursing homes, narrative data are collected to evaluate quality of care as perceived by residents or their family members. This results in a large amount of textual data. However, as the volume of data increases, manual analysis becomes infeasible. This study aims to explore the usefulness of text mining approaches regarding narrative data gathered in a nursing home setting.

Design

Exploratory study showing a variety of text mining approaches.

Setting and participants

Data were collected as part of the project ‘Connecting Conversations’: assessing experienced quality of care by conducting individual interviews with residents of nursing homes (n = 39), family members (n = 37) and care professionals (n = 49).

Methods

Several pre-processing steps were applied. A variety of text mining analyses were conducted: individual word frequencies, bigram frequencies, a correlation analysis and a sentiment analysis. A survey was conducted to establish a sentiment analysis model tailored to text collected in long-term care for older adults.

Results

Residents, family members and care professionals uttered on average 285, 362 and 549 words per interview, respectively. Word frequency analysis showed that the words occurring most frequently in the interviews were often positive. Despite some differences in word usage, correlation analysis showed that all three groups used similar words to describe quality of care. Most interviews displayed a neutral sentiment. Care professionals expressed a more diverse sentiment compared to residents and family members. A topic clustering analysis identified a total of 12 topics, including ‘relations’ and ‘care environment’.

Conclusions and implications

This study demonstrates the usefulness of text mining to extend our knowledge regarding quality of care in a nursing home setting. With the rise of textual (narrative) data, text mining can lead to valuable new insights for long-term care for older adults.

Introduction

Patient perspectives have assumed a central role in various healthcare settings in the assessment of the quality of care [1,2]. For example, in nursing homes, the perspectives of residents, their family members, and care professionals are seen as an important prerequisite for the improvement of quality of care [3]. To obtain more in-depth information regarding the quality of care so that its essential features can be further explored, narrative data are collected. These data often contain not only the experiences of residents, but also information regarding their engagement, satisfaction, and quality of life [2,4]. Narrative data can be defined as any data that consist of stories concerning the lives of individuals (e.g., of residents) [5]. Examples of narrative data are interviews regarding the experienced quality of care or stories about the experience of older adults during the COVID-19 pandemic [6,7]. In addition, certain websites (e.g., Zorgkaart Nederland) allow long-term care receivers to post opinions on the quality of their care and their lives.

To date, several narrative data methods have been developed to evaluate the quality of care and the quality of life of residents [8–10]. In a research context, these often involve audio recordings and verbatim transcriptions thereof. Hence, these methods result in large amounts of unstructured textual data that cannot feasibly be analyzed manually. Ordinarily, such data are analyzed using coding, which involves summarizing participants’ quotes in a few words that capture their essence [11,12]. Coding is a time-consuming and tedious task that requires researchers to take a consistently objective approach. Consequently, the large amounts of textual data regarding the quality of care gathered daily in nursing homes are only analyzed on a limited scale and with a limited scope. Since the amount of narrative data is only expected to rise because of its expanding importance in science and health care in general, innovative analytical approaches are required.

Computers can process large quantities of data and deliver highly consistent results. Computerized methods, such as those found in data science, could offer a solution to the difficulties of manual coding. Data science is a field aimed at extracting knowledge and insights from all kinds of data [13]. Text mining is a sub-category of data science directed at the retrieval, extraction, and synthesis of information from text [14]. Text mining includes methods such as frequency distributions, clustering, sentiment analysis, and visualization [14,15]. In contrast with manual analysis, text mining analyzes textual data automatically, so it can scale far beyond the scope of individual projects. It is not intended to replace manual coding, but it does provide a novel way to analyze large amounts of text.

Text mining is already being used to gain insights into the quality of care. For example, it is used in hospitals to process health care claims, to group medical records by patients’ symptoms [16,17], or to predict the number of admissions at an emergency department to avoid overcrowding [18]. A more recent study presented an automated method for extracting clinical entities, such as treatments, tests, drugs and genes, from clinical notes [19]. Another study discussed how text mining can be used to assess the sentiment towards the COVID-19 pandemic in tweets [20]. These types of applications suggest that text mining could also be useful for processing narrative data in long-term care for older adults to acquire novel insights into the quality of care and quality of life. Therefore, the present study aimed to explore the usefulness of text mining approaches to textual data gathered in nursing home care.

Methods

Several text mining methods were applied to examine narrative data collected within a nursing home setting. These narrative data consisted of interviews with residents of nursing homes, their family members, and their professional caregivers (i.e., triads). To enable automatic processing, all interviews were transcribed verbatim. Several preprocessing steps were applied. A variety of text mining analyses were conducted, including analysis of individual word frequencies and bigram frequencies, correlation analysis, and sentiment analysis. These are methods commonly used to gain insights into textual data [21].

Data collection

The data were collected as part of the project ‘Connecting Conversations’, which assesses the quality of care by conducting separate interviews with residents of nursing homes, family members, and professional caregivers (i.e., triads) [3,4]. The underlying principles of ‘Connecting Conversations’ are ‘appreciative inquiry’ and ‘relationship-centeredness’ [3]. Appreciative inquiry means that a positive approach is used to focus on what is already good and helpful and to do it more frequently. Relationship-centered care means that the impact of relationships is integral to the process and outcomes of the care experience [22]. It has been shown that appreciative inquiry does not necessarily lead to a more positive conversation [23]. A total of n = 125 interviews were conducted at 5 different care organizations in the south of the Netherlands [8]. The medical ethical committee of Zuyderland (the Netherlands) approved the study protocol (17-N-86). Information about the study was provided to all interviewers, residents, family members and caregivers by an information letter. All participants provided written informed consent; residents with legal representatives gave informed consent themselves, as did their legal representatives, before and during the conversations.

A survey was conducted to establish a sentiment analysis model tailored to long-term care. Respondents were presented with a list of sentences randomly selected from the transcripts of ‘Connecting Conversations’ and assigned each sentence a sentiment: positive, neutral, or negative. The survey was anonymous, though participants were asked to provide some demographic details (i.e., age, gender, place of residence, and level of education). Participants were allowed to fill in the questionnaire at different times, and each time they were presented with different sentences.

Data preprocessing

Preprocessing is a critical step in text mining. It involves the removal of any noisy and unreliable data from the original dataset [15]. Noise is erroneous information that makes the data more difficult to interpret [24]. An example of noisy data could be stuttering during a phone call or misspelled words in transcribed interviews. A lack of preprocessing could yield erroneous results. The product of data preprocessing is the final data set, which can then be analyzed for meaningful information. Preprocessing and analysis were performed using R and Python. R is a free software package for statistical computing and graphics [25], while Python is a free software package for general-purpose computing [26]. The code for all the analyses can be found at https://zenodo.org/record/6675839.

To perform text mining analyses such as frequency distributions, correlation analysis, and sentiment analysis, the data were preprocessed in several steps: 1) all transcribed interviews were exported from Word files to Excel files, which were then loaded into R; 2) words uttered by the interviewer were excluded from all the transcripts so that the results would not include them; 3) since many common words have no special significance (e.g., ‘the’, ‘a’, and ‘is’), these so-called stop words were removed using a predefined list of the most common Dutch stop words (n = 100); 4) since Dutch words of two letters are often stop words and/or non-informative, the minimum word length (i.e., the minimum number of characters that constitutes a word) was set to three; 5) a stemming approach was conducted to increase statistical significance for the various text mining methods. This approach groups words that refer to the same concept (e.g., nurses -> nurse, tummy -> stomach) by computationally identifying plurals and diminutives and reducing them to their root [27,28]. It is similar in aim to word embedding, but this rule-based approach is more conservative than word embedding approaches, which group more generally related words (e.g., mother -> family, nurse -> care) [29,30].
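As an illustration, steps 2–5 could be sketched in Python as follows. This is a minimal sketch built on invented data: the stop-word list is a tiny sample rather than the 100-entry Dutch list actually used, and the `stem` function is a crude stand-in for the published rule-based Dutch stemmers cited above.

```python
import re

# Tiny illustrative sample; the study used a predefined list of the
# 100 most common Dutch stop words.
STOP_WORDS = {"de", "het", "een", "en", "is", "dat", "die", "van"}

def stem(word: str) -> str:
    """Crude stand-in for rule-based Dutch stemming: strips common
    plural and diminutive suffixes, keeping a root of >= 3 characters."""
    for suffix in ("tjes", "jes", "tje", "je", "en", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(utterance: str, min_len: int = 3) -> list:
    """Tokenize, drop stop words and words shorter than min_len, stem."""
    tokens = re.findall(r"[a-zà-ü]+", utterance.lower())
    kept = [t for t in tokens if t not in STOP_WORDS and len(t) >= min_len]
    return [stem(t) for t in kept]

print(preprocess("de zusters waren heel goed"))
# ['zuster', 'war', 'heel', 'goed'] — note how crude rules over-stem 'waren'
```

In practice the study relied on established R and Python packages (see the Zenodo repository) rather than hand-written rules like these.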

Data analysis

Word frequencies

To gain an initial understanding of the text, a frequency plot was created to visualize the individual words (i.e., unigrams) used most frequently across all the interviews. Frequency plots can often be described by Zipf’s law, which states that, for any natural-language text, the frequency of a word is inversely proportional to its rank in the frequency distribution [15]. The 50 most frequently used words in the interviews are displayed in the frequency plots.
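Counting and ranking unigrams in this way is straightforward to sketch; the token counts below are invented, chosen so that the Zipf-like pattern (frequency roughly proportional to 1/rank) is visible:

```python
from collections import Counter

def top_unigrams(tokens, n=50):
    """Rank the n most frequent words, as in a frequency plot."""
    return Counter(tokens).most_common(n)

# Invented tokens with a 6:3:2:1 distribution.
tokens = ["goed"] * 6 + ["eten"] * 3 + ["moeder"] * 2 + ["muziek"]

for rank, (word, freq) in enumerate(top_unigrams(tokens, n=3), start=1):
    # Under Zipf's law, freq * rank stays roughly constant.
    print(rank, word, freq, freq * rank)
```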

Bigram plots were also created. Bigrams [15] are combinations of two consecutive words, such as ‘very good’. The sentence ‘I love it here’ can therefore be split into the bigrams ‘I love’, ‘love it’, and ‘it here’. Bigrams can play a crucial role in text classification, since they capture meanings that are lost when analyzing unigrams. For example, the word ‘good’ has a different meaning when preceded by the word ‘not’. The 50 most frequent bigrams are displayed in terms of their relative frequency for all interviews with residents, family members of residents, and care professionals, respectively.
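Bigram extraction and relative-frequency ranking can be sketched as follows (the utterances are invented):

```python
from collections import Counter

def bigrams(tokens):
    """Consecutive word pairs: ['I','love','it','here'] gives
    ('I','love'), ('love','it'), ('it','here')."""
    return list(zip(tokens, tokens[1:]))

def top_bigrams(utterances, n=50):
    """The n most common bigrams with their relative frequencies."""
    counts = Counter(pair for u in utterances for pair in bigrams(u))
    total = sum(counts.values())
    return [(pair, count / total) for pair, count in counts.most_common(n)]

print(top_bigrams([["heel", "erg", "goed"], ["heel", "erg", "leuk"]], n=2))
```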

Correlation analysis

It is important to understand which words co-occur in the interviews with residents and family members, residents and care professionals, and care professionals and family members, respectively. Correlation analyses were therefore conducted to assess the correlation (i.e., co-occurrence) of words across the three groups. The words are displayed as a collection of data points, divided over three scatter plots. A log transformation was used for the x and y-axes to account for the skewness of the data (i.e., only some words occurred very frequently) [31]. Three bilateral Pearson correlation coefficients (r) were assessed for residents and family members, residents and care professionals, and care professionals and family members, respectively.
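The co-occurrence comparison can be sketched as a Pearson correlation over log-transformed word counts. This simplified version only considers words used by both groups (the published plots show all words on log-scaled axes), and the example counts are invented:

```python
import math
from collections import Counter

def log_freq_correlation(tokens_a, tokens_b):
    """Pearson r between log-transformed frequencies of the words
    that both groups use."""
    fa, fb = Counter(tokens_a), Counter(tokens_b)
    shared = sorted(set(fa) & set(fb))
    xs = [math.log(fa[w]) for w in shared]
    ys = [math.log(fb[w]) for w in shared]
    n = len(shared)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

residents = ["goed"] * 8 + ["eten"] * 4 + ["kamer"] * 2  # invented counts
family = ["goed"] * 6 + ["eten"] * 3 + ["kamer"] * 1     # invented counts
print(round(log_freq_correlation(residents, family), 2))
```

Groups with proportionally similar vocabularies, as in this toy example, produce r close to 1, mirroring the high coefficients reported in the Results.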

Sentiment analysis

Sentiment analysis is the process of computationally identifying sentiment expressed in a piece of text [14,15,32]. For example, the sentence ‘It’s a good day’ could be identified as being positive, while the sentence ‘It’s a bad day’ could be identified as being negative: the sentence ‘Today I went for a walk’ could be neutral, as it does not convey whether the walk is experienced as a positive or negative event.

Previous sentiment analysis models have been based on the general Dutch language [33]. However, as certain words can have a different meaning in a nursing home setting, the sentiment behind these words can also differ. For example, in a nursing home setting there is a negative sentiment behind the word ‘plassen’ (‘peeing’), whereas in the general Dutch language this word would be considered neutral. Therefore, a general sentiment model may not be suitable for analysis in long-term care.

Using the results of the survey, the sentiment for all occurring unigrams (single words such as ‘good’ or ‘very’) and bigrams was calculated as a value between (-1) and (1). A value of (-1) denoted the most negative sentiment, a value of (1) denoted the most positive sentiment, and a value of (0) denoted a neutral sentiment. The calculation was based on how frequently a word was used in a positive or negative context. For example, a word that occurred 9 times as a positive word and once as a negative word was given a sentiment of 0.8 ((9 * 1 + 1 * -1) / 10 = 0.8). Since any sentiment larger than 0 could be defined as positive, a sentiment of 0.8 was positive. This analysis was carried out for all groups separately: residents (n = 39), family members (n = 37), and care professionals (n = 49), to show the difference between the three groups. The words that were used most often by each group were plotted with the corresponding sentiment of each word.
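The scoring rule described above amounts to a simple weighted average over the survey labels; the function below reproduces the paper's worked example (9 positive labels and 1 negative label give 0.8):

```python
def word_sentiment(pos: int, neg: int, neu: int = 0) -> float:
    """Sentiment in [-1, 1]: positive labels count as +1, negative
    labels as -1, and neutral labels as 0."""
    total = pos + neg + neu
    return (pos - neg) / total

print(word_sentiment(pos=9, neg=1))  # 0.8
```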

To show a broad overview of the differences in the overall experienced quality of care between the different groups, sentiment analysis on each interview was conducted [29]. This was illustrated by plotting the proportion of positive, negative, and neutral sentiments for each interview using a ternary plot [34]. A ternary plot is a triangular plot capable of displaying three variables that sum up to the same value [30]. In the case of the present sentiment analysis, the percentages of positive, negative, and neutral sentiments in every interview summed up to 100%.
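The three coordinates of each interview's ternary-plot point can be computed from per-sentence sentiment scores as follows (the scores below are hypothetical):

```python
def sentiment_proportions(sentence_scores):
    """Fractions of positive, negative, and neutral sentences in one
    interview; the three fractions sum to 1 and form a ternary-plot point."""
    n = len(sentence_scores)
    pos = sum(1 for s in sentence_scores if s > 0)
    neg = sum(1 for s in sentence_scores if s < 0)
    neu = n - pos - neg
    return pos / n, neg / n, neu / n

print(sentiment_proportions([0.8, -0.2, 0.0, 0.0]))  # (0.25, 0.25, 0.5)
```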

Topic clustering analysis

Topic clustering is the process of identifying topics (i.e., themes) in text segments and clustering (i.e., grouping) them together based on those topics [35,36]. Clustering is similar to a qualitative coding process using a grounded approach, as topics are discovered without any prior knowledge [12]. To discover topic clusters within the interviews, several steps were conducted. Firstly, for each utterance in the interviews, keywords were extracted using part-of-speech (POS) tagging [37,38]. POS tagging identifies whether a word is a noun (e.g., nurse, room), a proper noun (e.g., Sarah, Maastricht), a verb (e.g., walking, knitting), or another word type. By extracting nouns as keywords, it is possible to discover overarching semantic topics. Secondly, word2vec was used to calculate similarities between words [29,30]. Word2vec is an algorithm that creates a vector (i.e., a point in high-dimensional space) for each word in such a way that similar words are located close together [29]. Lastly, k-means was used to create k clusters of keywords [35]. For example, with k = 2, two clusters could be discovered that correspond to the topics ‘food’ and ‘family’. The value of k was calculated using the elbow method [36], which balances having many overly specific clusters against having few overly general ones. For each cluster, a topic name was manually assigned based on the keywords belonging to that cluster. For example, if a cluster contained keywords such as ‘mother’, ‘father’ and ‘daughter’, the topic was named ‘family’.
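The clustering step might be sketched as follows. The 2-D "embeddings" are hypothetical stand-ins for real high-dimensional word2vec vectors, and the k-means here is a plain textbook implementation rather than the one actually used in the study:

```python
import math

# Hypothetical 2-D stand-ins for word2vec embeddings; similar words
# are deliberately given nearby points.
keyword_vectors = {
    "moeder": (0.90, 0.10), "dochter": (0.80, 0.20), "familie": (0.85, 0.15),
    "koffie": (0.10, 0.90), "diner": (0.20, 0.80), "aardappel": (0.15, 0.85),
}

def kmeans_labels(points, k, iters=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points."""
    centroids = list(points[:k])  # naive initialization: first k points
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return labels

words = list(keyword_vectors)
labels = kmeans_labels(list(keyword_vectors.values()), k=2)
for cluster in range(2):
    print(cluster, [w for w, lab in zip(words, labels) if lab == cluster])
```

With k = 2 the family-related and food-related keywords end up in separate clusters, mirroring how topic names such as ‘family’ were then assigned by hand.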

Results

In total, 125 interviews were analyzed: 39 with residents, 37 with family members, and 49 with care professionals. A total of n = 202,986 words were uttered. On average, residents uttered 284.9 words per interview, family members 362.1, and care professionals 548.7.

Word frequencies

Fig 1 shows the distribution of the most frequently used words in the interviews. A typical Zipf’s-law pattern is visible, indicating that the most commonly used words accounted for almost half of all the words in the interviews [15]: the frequency of a word was inversely proportional to its rank in the frequency distribution. The word ‘goed’ (‘good’) was among the most frequently used words, indicating that the interviewees referred to many positive aspects. Moreover, words such as ‘eten’ (‘food’) and ‘moeder’ (‘mother’) provided insights into topics that participants perceived as important aspects of quality care. The frequent use of ‘eten’ suggests that food was an important topic, while the frequent use of ‘moeder’ likely reflects family members referring to residents as their ‘mother’, indicating that the resident concerned was often a woman.

Fig 1. The 50 most frequently occurring unigrams across all interviews.


Since bigrams often contain more contextual information than unigrams, a bigram plot was created (Fig 2). The most frequently used bigrams included ‘heel erg’ (‘very’), ‘heel goed’ (‘very well’), ‘wel goed’ (‘good’), ‘wel heel’ (‘very well’), and ‘wel leuk’ (‘nice’). The bigram ‘heel erg’ (‘very’) was the most common across all groups. It was used both negatively (e.g., ‘very bad’) and positively (e.g., ‘very good’). A sensitivity analysis of the words that followed this particular bigram revealed that the top three were ‘goed’ (‘good’, n = 9), ‘leuk’ (‘nice’, n = 7) and ‘tevreden’ (‘satisfied’, n = 7). By contrast, the word ‘slecht’ (‘bad’) followed the bigram ‘heel erg’ (‘very’) only once.

Fig 2. The 50 most frequently occurring bigrams across all interviews.


Correlation

Fig 3 shows the correlation plots of word usage among the three groups, where each point represents a word. For some points, the English translation of the corresponding word is also shown. All three plots are similar. First, they each display an uphill pattern, indicative of a positive relationship between the two variables. Second, the majority of words form a pattern around the red diagonal line, resulting in high correlation coefficients (r = 0.91, r = 0.83, and r = 0.92, respectively); that is, word usage was largely similar across all groups. The further points lie from the red diagonal line, the more the word usage, and perhaps each group’s perception of the quality of care, differed between groups. For example, words such as ‘hobby’ (‘hobby’) and ‘dochter’ (‘daughter’) occurred more frequently in the interviews with residents than with family members. This suggests that these subjects were more important to the residents, either negatively or positively. Family members used words such as ‘instelling’ (‘institution’) and ‘zaterdag’ (‘Saturday’) more frequently.

Fig 3.


Bilateral correlations between the words of the 3 groups, respectively: (a) residents and family members; (b) residents and care professionals; and (c) care professionals and family members.

Sentiment analysis

A total of 234 participants assigned a sentiment to 11,519 sentences, which was 56% of the total. The mean age of all participants was 41 (SD: 13.7). Of all participants, 71% were women. Sixty-seven percent of the participants had at least a master’s degree, while 21% had a bachelor’s degree; 67% said that they were currently living in the south of the Netherlands.

A scatter plot of the 40 most commonly occurring words for residents, care professionals, and family members is displayed (Fig 4). The x-axis shows the sentiment value for each word, between -1 (most negative) and 1 (most positive); the y-axis displays the frequency with which these words occurred. Many of the most common words were similar between residents, their family members, and care professionals. The words ‘wel’ (‘well’ or ‘quite’), ‘heel’ (‘very’) and ‘goed’ (‘good’) occurred with very high frequency across all groups, but their sentiment was only weakly positive. Words such as ‘leuk’ (‘nice’) and ‘fijn’ (‘nice’) occurred with high frequency and a strongly positive sentiment; ‘muziek’ (‘music’) and ‘activiteiten’ (‘activities’) occurred with high frequency and a weakly positive sentiment.

Fig 4.


Sentiment analyses: Scatter plots of positive and negative words used most often by (a) residents; (b) family members; and (c) care professionals.

Sentiment analysis in triads

To illustrate the sentiments expressed by residents, family members, and care professionals, a ternary plot was created (Fig 5). It has three axes: positive, negative, and neutral. Each point in the triangle represents either a resident (red), a family member (blue), or a care professional (green). As can be seen, most conversations are closely grouped together, slightly above the absolute middle, meaning that they are mostly neutral and almost equally positive and negative. Residents form the densest group, implying that they expressed rather similar sentiments: neutral, with equal amounts of positive and negative sentences. Family members have a lower density, implying a slightly more diverse range of sentiments; this group was on average more negative regarding the quality of care than the other groups. Care professionals cover the largest area and were thereby the most diverse in terms of expressed sentiment; this group was on average more positive regarding the quality of care than the other groups.

Fig 5. An overview of the ratios (positive, negative and neutral) for all transcripts.


Topic clustering analysis

The topic clustering analysis is displayed in Fig 6; each keyword is represented as a dot. The axes have no real-world meaning; they are artificially created to highlight the differences between clusters (i.e., topics) [39]. Only the distance between dots has meaning: dots, and thus keywords, that are closer together are semantically more similar than dots that are further apart [29,39]. Dots with the same color belong to the same cluster. Although most keywords belong to one overarching cluster, ‘quality of care’ (which was expected, as all keywords relate to experienced quality of care in a nursing home setting), the utterances still show nuanced differences, leading to the discovery of 12 different but related topics.

Fig 6. A visualization of the topic clustering analysis.


Clusters are represented by numbers which correspond with the numbers in Table 1.

For each cluster, the topic name, the most important keywords, and the number of sentences belonging to that topic are displayed (see Table 1). Certain clusters are well defined, such as ‘health’ and ‘food’, while others, such as ‘care environment’, display more overlap. This corresponds with Fig 6, which shows that certain clusters overlap very little with other clusters. The clusters ‘relations’, ‘time’ and ‘life experiences’ are the clusters with the most keyword occurrences.

Table 1. Topics from the cluster analysis including the corresponding number of keywords occurring in the interviews (n = 125).

# | Topic name | Example keywords^a | Occurrences
1 | Relations | Daughter, mother, family, children, man | 1995
2 | Activities | Flower arranging, television, Christmas, music, movie | 996
3 | Time | Time, start, hour, day, afternoon, week | 2175
4 | Care organization | Decision, matter, system, organizations, privacy | 1072
5 | Daily experiences | Meaning, moment, experience, progression, private | 525
6 | Physical nursing home environment | Room, neighborhood, door, ambulance, walker | 1106
7 | Health | Parkinson, hallucinations, miscarriage, eye drops, incident | 423
8 | Food | Dinner, dessert, coffee, sandwich, potatoes | 527
9 | Life experiences | Life, moment, story, feeling, event | 1827
10 | Care environment | Nursing home, care, education, help, somatic | 1432
11 | Physical appearance | Sewing machine, toe, hands, clothes, nightgown | 860
12 | Miscellaneous | A little, word, small error, things, remainder | 868

^a Words were translated from Dutch into English. Only the English words are displayed.

Discussion

The present study aimed to explore the usefulness of text mining approaches regarding narrative data gathered in a nursing home setting. The textual information that was automatically gathered from the 125 interviews generated novel insights into the quality of care.

The results showed that the word ‘good’ was among the most frequently used words in the interviews, which could indicate that, in general, participants had a positive experience of care. However, individual words can have different meanings when preceded by other words (e.g., the word ‘good’ acquires a different sense when preceded by ‘not’ or ‘very’). Hence, bigrams (i.e., groups of two consecutive words) were analyzed, and this revealed that the word ‘good’ was often preceded by adjectives indicating magnitude (e.g., ‘very good’ or ‘very nice’). These word combinations occurred frequently in the interviews of all three groups, indicating positivity towards the quality of care. Previous research has demonstrated that, in manual sentiment analysis, words such as ‘good’ are indicative of a positive experience regarding quality of care [8]. Correlation analysis showed that the same set of words was used by residents, their family members, and care professionals when discussing the quality of care, implying that the three groups talked about similar topics when discussing the substantive issue.

Sentiment analysis highlighted several positive words, including ‘muziek’ (‘music’) and ‘activiteiten’ (‘activities’). Because these words were used frequently in the interviews, it may be inferred that music and activities were regarded as important criteria for judging the quality of care [40]. In addition, sentiment analysis showed that the majority of interviews expressed mainly neutral sentiments, though the care professionals were more diverse and positive in their sentiments compared with the other groups [8]. This finding is underscored by previous research showing that, in general, care professionals are often more positive than residents or residents’ family members concerning the quality of care the residents receive [8,41].
A topic clustering analysis yielded a variety of topics: while some, such as ‘food’ and ‘health’, were very clearly defined, others were less so (e.g., ‘miscellaneous’). The large number of keywords related to the topics ‘relations’, ‘life experiences’ and ‘care environment’ not only highlights the importance of these topics in relation to experienced quality of care within a nursing home setting [3,8,42,43], but also underscores the validity of the text mining approach.

The present study is the first to assess quality of care in a long-term care context by analyzing qualitative data through text mining. Making use of the vast amount of text in this way has given a voice to residents, their family members, and care professionals working in nursing homes. However, the study also has several possible limitations. Firstly, the word and bigram frequency analyses only contain absolute numbers. These analyses therefore still contain words and bigrams of little significance, i.e., less informative words such as ‘well’ or ‘quite’. Another limitation is the explainability of the sentiment analysis model, which is a deep learning model. Deep learning is the optimization of large models for tasks such as sentiment analysis [32,37]. While sentiment analysis conducted with a deep learning method is often more accurate than machine learning models, the latter can be explained more easily, as unigrams or n-grams (sequences of words) correspond directly with a certain sentiment prediction. In deep learning models, sentiment can depend on how every word relates to every other word in a text segment. These word relations are calculated from large text datasets and involve many abstract values [37,43].

Future work

Future research could focus on combining narrative data with more quantitative measures related to, for example, the prevalence of care problems [44]. This could be achieved by using a text mining approach and various predictive algorithms. It would then be possible to relate narrative data regarding the quality of care to particular care issues (e.g. incontinence and malnutrition), thereby providing a more comprehensive view of quality of care.

Future research could also aim at further exploring the text-mining approaches used in the current study. By comparing the text mining approach against the current gold standard of manual coding, the text-mining approach could not only be validated but perhaps also improved. For example, by further improving topic clustering, it may become possible to automate the processes of qualitative coding (i.e. the analyses of qualitative data). As a consequence, analyzing qualitative data may become less time-consuming and more objective.

Conclusion

To make use of the ever-growing amount of textual data related to the quality of care in long-term care for older adults, innovative and efficient methods are needed. The present study demonstrates the usefulness of a text mining approach to extend our knowledge regarding the quality of care in a nursing home setting. With the growing collection of textual (narrative) data, text mining in long-term care for older adults can lead to valuable new insights that would not have been found using manual analysis.

Acknowledgments

The authors would like to thank all the contributors of Python, R and the R and Python packages that were used for this study (a full list can be found at https://zenodo.org/record/6675839). Special thanks go to the contributors of the following packages: ggplot, tm, stopwords, Ternary, and R-markdown.

Data Availability

The code, data and models discussed in the article will be made available at https://zenodo.org/record/6675839. The interview data will not be publicly available due to the privacy of the participants. While certain privacy-related information was removed from the transcripts (e.g., names of participants, living addresses, room numbers and other personally identifiable information), the stories that the participants tell are often of a personal nature. Upon request, the interview data can be provided with restrictions. For researchers who meet the criteria for access to confidential data, the data are available from the Living Lab in Ageing and Long-Term Care. Website: https://www.awolimburg.nl/; Email: ouderenzorg@maastrichtuniversity.nl; Phone: +31 (0)43 38 81570. Visiting address: Duboisdomein 30, 6229 GT Maastricht. Postal address: Academische Werkplaats Ouderenzorg Limburg, Maastricht University, Vakgroep Health Services Research - DUB 30, Postbus 616, 200 MD Maastricht.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1.Pols J. Enacting appreciations: Beyond the patient perspective. Health Care Anal. 2005;13: 203–221. doi: 10.1007/s10728-005-6448-6 [DOI] [PubMed] [Google Scholar]
  • 2.Sion K, Verbeek H, Vries E de, Zwakhalen S, Odekerken-Schröder G, Schols J, et al. The feasibility of connecting conversations: A narrative method to assess experienced quality of care in nursing homes from the resident’s perspective. Int J Env Res Pub He. 2020;17: 5118. doi: 10.3390/ijerph17145118 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Sion KY, Haex R, Verbeek H, Zwakhalen SM, Odekerken-Schröder G, Schols JM, et al. Experienced quality of post-acute and long-term care from the care recipient’s perspective–a conceptual framework. JAMDA. 2019;20: 1386–1390. doi: 10.1016/j.jamda.2019.03.028 [DOI] [PubMed] [Google Scholar]
  • 4.Sion KY, Verbeek H, Boer B de, Zwakhalen SM, Odekerken-Schröder G, Schols JM, et al. How to assess experienced quality of care in nursing homes from the client’s perspective: Results of a qualitative study. BMC Geriatr. 2020;20: 1–12. doi: 10.1186/s12877-020-1466-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Flick U. The SAGE handbook of qualitative data collection. Sage Publications; 2018. pp. 264–279. doi: 10.4135/9781526416070 [DOI] [Google Scholar]
  • 6.Sion KY, Verbeek H, Zwakhalen SM, Odekerken-Schröder G, Schols JM, Hamers JP. Themes related to experienced quality of care in nursing homes from the resident’s perspective: A systematic literature review and thematic synthesis. GGM. 2020;6. doi: 10.1177/2333721420931964 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Club V. The experiences of seniors during the corona/covid-19 crisis. 2020. Available from: https://www.leydenacademy.nl/the-experiences-of-seniors-during-the-corona-covid-19-crisis/. [Google Scholar]
  • 8.Sion K, Verbeek H, Aarts S, Zwakhalen S, Odekerken-Schröder G, Schols J, et al. The validity of connecting conversations: A narrative method to assess experienced quality of care in nursing homes from the resident’s perspective. Int J Env Res Pub He. 2020;17: 5100. doi: 10.3390/ijerph17145100 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Scheffelaar A, Hendriks M, Bos N, Luijkx K, Van Dulmen S. Protocol for a participatory study for developing qualitative instruments measuring the quality of long-term care relationships. BMJ Open. 2018;8. doi: 10.1136/bmjopen-2018-022895 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Scheffelaar A, Janssen M, Luijkx K. The story as a quality instrument: Developing an instrument for quality improvement based on narratives of older adults receiving long-term care. Int J Env Res Pub He. 2021;18. doi: 10.3390/ijerph18052773 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Strauss A, Corbin J. Basics of qualitative research: Grounded theory procedures and techniques. Newbury Park, CA: Sage Publications; 1990. [Google Scholar]
  • 12.Strauss AL, Corbin JM. Basics of qualitative research: Techniques and procedures for developing grounded theory. Sage Publications; 1998. [Google Scholar]
  • 13.Dhar V. Data science and prediction. Commun ACM. 2013;56: 64–73. [Google Scholar]
  • 14.Hotho A, Nürnberger A, Paaß G. A brief survey of text mining. LDV Forum. 2005;20: 19–62. [Google Scholar]
  • 15.Hofmann M, Chisholm A. Text mining and visualization: Case studies using open-source tools. CRC Press; 2016. [Google Scholar]
  • 16.Popowich F. Using text mining and natural language processing for health care claims processing. SIGKDD Explor. 2005;7: 59–66. [Google Scholar]
  • 17.Raja U, Mitchell T, Day T, Hardin JM. Text mining in healthcare. Applications and opportunities. J Healthc Inf Manag. 2008;22: 52–6. [PubMed] [Google Scholar]
  • 18.Lucini FR, Fogliatto FS, Silveira GJ da, Neyeloff JL, Anzanello MJ, Kuchenbecker RS, et al. Text mining approach to predict hospital admissions using early medical records from the emergency department. Int J Med Inform. 2017;100: 1–8. doi: 10.1016/j.ijmedinf.2017.01.001 [DOI] [PubMed] [Google Scholar]
  • 19.Moqurrab SA, Ayub U, Anjum A, Asghar S, Srivastava G. An accurate deep learning model for clinical entity recognition from clinical notes. IEEE Journal of Biomedical and Health Informatics. 2021;25: 3804–3811. doi: 10.1109/JBHI.2021.3099755 [DOI] [PubMed] [Google Scholar]
  • 20.Azeemi AH, Waheed A. COVID-19 Tweets Analysis through Transformer Language Models. arXiv preprint arXiv:2103.00199. 2021. Feb 27. Available from https://arxiv.org/pdf/2103.00199. [Google Scholar]
  • 21.Abualigah L, Alfar HE, Shehab M, Hussein AMA. Sentiment analysis in healthcare: A brief review. Recent advances in NLP: The case of arabic language. Springer; 2020. pp. 129–141. [Google Scholar]
  • 22.Nolan M, Brown J, Davies S, Nolan J, Keady J. The senses framework: Improving care for older people through a relationship-centred approach. Getting research into practice (GRiP) report no 2. SHURA. 2006. Available from: http://shura.shu.ac.uk/280/. [Google Scholar]
  • 23.Bushe G. Appreciative inquiry is not about the positive. OD Practitioner. 2007;39: 33–38. [Google Scholar]
  • 24.Han J, Pei J, Kamber M. Data mining: Concepts and techniques. Elsevier; 2011. [Google Scholar]
  • 25.R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2013. Available from: http://www.R-project.org/. [Google Scholar]
  • 26.Van Rossum G, Drake FL Jr. Python reference manual. Centrum voor Wiskunde en Informatica Amsterdam; 1995. [Google Scholar]
  • 27.Lovins JB. Development of a stemming algorithm. Mech Transl Comput Linguistics. 1968;11: 22–31. [Google Scholar]
  • 28.Jonker A, Ruijt C de, Gruijl J de. Bag & tag’em-a new dutch stemmer. Proceedings of the 12th language resources and evaluation conference. ELRA; 2020. pp. 3868–3876. [Google Scholar]
  • 29.Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. In: Bengio Y, LeCun Y, editors. 1st international conference on learning representations, ICLR 2013, scottsdale, arizona, USA, may 2–4, 2013, workshop track proceedings. 2013. Available from: http://arxiv.org/abs/1301.3781.
  • 30.Pennington J, Socher R, Manning CD. Glove: Global vectors for word representation. Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014. pp. 1532–1543. [Google Scholar]
  • 31.Adams RA, Essex C. Calculus: A complete course. Addison-Wesley Boston; 1999. [Google Scholar]
  • 32.Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018. Oct 11. Available from: https://arxiv.org/pdf/1810.04805.pdf. [Google Scholar]
  • 33.Schrauwen S. Machine learning approaches to sentiment analysis using the dutch netlog corpus. CLIPS. 2010; 30–34. Available from: https://www.clips.uantwerpen.be/sites/default/files/ctrs-001-small.pdf. [Google Scholar]
  • 34.Smith MR. Ternary: An r package for creating ternary plots. Zenodo; 2020. doi: 10.5281/zenodo.4312770 [DOI] [Google Scholar]
  • 35.Bock H-H. Clustering methods: A history of k-means algorithms. Selected contributions in data analysis and classification. 2007; 161–172. [Google Scholar]
  • 36.Syakur M, Khotimah B, Rochman E, Satoto BD. Integration k-means clustering method and elbow method for identification of the best customer profile cluster. IOP conference series: Materials science and engineering. IOP Publishing; 2018. p. 012017. [Google Scholar]
  • 37.Delobelle P, Winters T, Berendt B. RobBERT: A Dutch RoBERTa-based Language Model. Findings of the association for computational linguistics: EMNLP 2020. Online: Association for Computational Linguistics; 2020. pp. 3255–3265. doi: 10.18653/v1/2020.findings-emnlp.292 [DOI] [Google Scholar]
  • 38.Vries W de, Bartelds M, Nissim M, Wieling M. Adapting monolingual models: Data can be scarce when language similarity is high. Findings of the association for computational linguistics: ACL-IJCNLP 2021. Online: Association for Computational Linguistics; 2021. pp. 4901–4907. doi: 10.18653/v1/2021.findings-acl.433 [DOI] [Google Scholar]
  • 39.Van der Maaten L, Hinton G. Visualizing data using t-SNE. Journal of machine learning research. 2008;9. [Google Scholar]
  • 40.Tak SH, Kedia S, Tongumpun TM, Hong SH. Activity engagement: Perspectives from nursing home residents with dementia. Educ Gerontol. 2015;41: 182–192. doi: 10.1080/03601277.2014.937217 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Brodaty H, Draper B, Low L-F. Nursing home staff attitudes towards residents with dementia: Strain and satisfaction with work. J Adv Nurs. 2003;44: 583–590. doi: 10.1046/j.0309-2402.2003.02848.x [DOI] [PubMed] [Google Scholar]
  • 42.Koren MJ. Person-centered care for nursing home residents: The culture-change movement. Health Affairs. 2010;29: 312–317. doi: 10.1377/hlthaff.2009.0966 [DOI] [PubMed] [Google Scholar]
  • 43.Kingsley C, Patel S. Patient-reported outcome measures and patient-reported experience measures. BJA Education. 2017;17: 137–144. 10.1093/bjaed/mkw060. [DOI] [Google Scholar]
  • 44.Huppertz VA, Putten G-J van der, Halfens RJ, Schols JM, Groot LC de. Association between malnutrition and oral health in dutch nursing home residents: Results of the LPZ study. JAMDA. 2017;18: 948–954. doi: 10.1016/j.jamda.2017.05.022 [DOI] [PubMed] [Google Scholar]

Decision Letter 0

Natasha McDonald

7 Jan 2022

PONE-D-21-18516

Text mining in long-term care: Exploring the usefulness of computer-aided methods of analysis

PLOS ONE

Dear Dr. Hacking,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The reviewers raised a number of concerns with the manuscript that must be addressed. These include but are not limited to their view that the study should contain further empirical experiments and the fact that the study should be more clearly placed in the context of the existing body of work. The reviewers' comments can be viewed in full, below.

Please submit your revised manuscript by Feb 19 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Natasha McDonald, PhD

Associate Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please provide additional details regarding participant consent. In the ethics statement in the Methods and online submission information, please ensure that you have specified (1) whether consent was informed and (2) what type you obtained (for instance, written or verbal, and if verbal, how it was documented and witnessed). If your study included minors, state whether you obtained consent from parents or guardians. If the need for consent was waived by the ethics committee, please include this information.

3. Thank you for including your ethics statement: "Written ethical approval was provided (17-N-86 METC Zuyderland)."

a. Please amend your current ethics statement to include the full name of the ethics committee/institutional review board(s) that approved your specific study.

b. Once you have amended this/these statement(s) in the Methods section of the manuscript, please add the same text to the “Ethics Statement” field of the submission form (via “Edit Submission”).

For additional information about PLOS ONE ethical requirements for human subjects research, please refer to http://journals.plos.org/plosone/s/submission-guidelines#loc-human-subjects-research.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: No

Reviewer #4: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This study certainly does not represents both scientific and practical interest after complete revision according to my point of view the paper is not suitable for publication in this journal so it must be rejected

Reviewer #2: Very interesting research. An innovative approach to the subject. I believe that the work should continue. They raise a very important topic. Residents, family members and care professionals uttered respectively 285, 362 and 549 words per interview. Word frequency analysis showed that words that occurred most frequently in the interviews are often positive.

Reviewer #3: I would like to thank the authors for their contribution. The study brings interesting insights, and it is also good to see that the authors are aware of possible limitations, which is certainly appreciated. Please consider my feedback below for the next version.

(1)

The study presents interesting exploratory analysis, indeed. However, I see that there is a good opportunity for applying further empirical experiments. For example, data clustering or developing a classification model should be a valuable addition. In this way, the study would be properly positioned within the context text or data mining in general.

(2)

Further, the study should also be positioned in line with the state-of-the-art methods of Natural Language Processing (NLP). The NLP research is currently dominated by the use of transformer models (e.g BERT). Further, there are specialized models (e.g. BioBERT), which might be quite applicable in the present work. In the recent years, earlier methods, such as bag-of-words, have been losing ground to transformers. Therefore, I believe that it is so important for the study to demonstrate that the authors are aware of such advances, which may be adopted as part of the future work, at least.

(3)

The introduction should refer to more recent work discussing or applying NLP in the healthcare context. For example:

https://doi.org/10.1109/JBHI.2021.3099755

http://dx.doi.org/10.5220/0010414508250832

(4)

Furthermore, I recommend touching on the explainability aspect. Model explainability is currently a very active area of research in the healthcare domain in particular. I suggest that the authors may discuss that aspect as part of the future work as well.

(5)

As a minor issue, I find the title a bit vague, the use of “computer-aided analysis” sounds too broad. I recommend using more focused terminology.

Reviewer #4: The objective of this research work is to evaluate the quality of care at a nursing home from the comments(narrative data) provided by the residents or their family members and care professionals as perceived by them. Text mining techniques were employed for analysis. Sentiment analysis was done to identify the emotions associated with these people from the words they uttered. The results of sentiment analysis could have been quantified as the authors rightly pointed out that it would be more significant to conclude the quality of care at the nursing home.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Reviewer #4: Yes: Dr.Sarojini Balakrishnan

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Aug 25;17(8):e0268281. doi: 10.1371/journal.pone.0268281.r002

Author response to Decision Letter 0


10 Feb 2022

Editor’s comments

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming.

We have incorporated the styling guidelines and adjusted the manuscript accordingly.

2. Please provide additional details regarding participant consent. In the ethics statement in the Methods and online submission information, please ensure that you have specified (1) whether consent was informed and (2) what type you obtained (for instance, written or verbal, and if verbal, how it was documented and witnessed). If your study included minors, state whether you obtained consent from parents or guardians. If the need for consent was waived by the ethics committee, please include this information.

We have revised these statements to include the necessary information: information about the study was provided to all interviewers, residents, family members and caregivers in advance by an information letter. All participants provided written informed consent: residents with legal representatives gave informed consent themselves (as well as their legal representatives) before and during the conversations.

3. Thank you for including your ethics statement: "Written ethical approval was provided (17-N-86 METC Zuyderland)."

a. Please amend your current ethics statement to include the full name of the ethics committee/institutional review board(s) that approved your specific study.

The statement has been amended. The medical ethics committee of Zuyderland, the Netherlands, approved the study protocol (17-N-86) and concluded that the study was not subject to the Medical Research Involving Human Subjects Act.

b. Once you have amended this/these statement(s) in the Methods section of the manuscript, please add the same text to the “Ethics Statement” field of the submission form (via “Edit Submission”).

For additional information about PLOS ONE ethical requirements for human subjects research, please refer to http://journals.plos.org/plosone/s/submission-guidelines#loc-human-subjects-research.

The statement has been amended here as well.

Comments to the Author

Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: No

Reviewer #4: Yes

We have addressed this comment by sharing our sentiment analysis model and the source data of our figures (i.e. unigram frequencies (per group), bigram frequencies and sentiment of unigrams). These will become available on GitHub after publication. Moreover, all R and Python code will become available on GitHub. Our interview data will not be publicly available due to the privacy of our participants. While certain privacy-related information was removed from the transcripts (e.g. names of participants, living addresses, room numbers and other personally identifiable information), the stories that our participants tell are often of a personal nature. Upon request, our interview data can be provided.

5. Review Comments to the Author

We would like to thank all 4 reviewers for their comments and suggestions for the paper. Three reviewers were very positive about our study and made concrete reflections and suggestions for improvement. We have responded to these in our rebuttal below. One reviewer was negative about the paper, although no explanation was given and suggestions for improvement were lacking.

Reviewer #1: This study certainly does not represent both scientific and practical interest after complete revision according to my point of view the paper is not suitable for publication in this journal so it must be rejected

We would like to thank the reviewer for his/her time. We’re sorry that we couldn’t satisfy your scientific and practical interest and we regret that you were unable to provide us any concrete comments regarding this decision.

Reviewer #2: Very interesting research. An innovative approach to the subject. I believe that the work should continue. They raise a very important topic. Residents, family members and care professionals uttered respectively 285, 362 and 549 words per interview. Word frequency analysis showed that words that occurred most frequently in the interviews are often positive.

We would like to thank the reviewer for his/her positive words regarding our manuscript. We believe that the manuscript improved because of the changes made upon all reviewers’ comments.

Reviewer #3: I would like to thank the authors for their contribution. The study brings interesting insights, and it is also good to see that the authors are aware of possible limitations, which is certainly appreciated. Please consider my feedback below for the next version.

(1) The study presents interesting exploratory analysis, indeed. However, I see that there is a good opportunity for applying further empirical experiments. For example, data clustering or developing a classification model should be a valuable addition. In this way, the study would be properly positioned within the context text or data mining in general.

The reviewer stresses an important point here: more analyses related to data clustering or classification would be a valuable addition. However, these are beyond the scope of this paper. The current manuscript has a different focus: to introduce the nursing/long-term care community to the possibilities of AI, and more specifically, to text mining. Hence, we believe that an additional analysis would make the article less comprehensible. Therefore, we believe this could better be addressed in a follow-up article. As a matter of fact, we are currently working on various deep learning analyses with BERT-based language models. In a new study, we will discuss different classification analyses in detail and compare them against a human baseline (e.g. the human form of analyzing qualitative data: open and axial coding). We are currently conducting a thematic analysis through multi-label classification, using labels (i.e. themes) that are related to quality of care.

(2) Further, the study should also be positioned in line with the state-of-the-art methods of Natural Language Processing (NLP). The NLP research is currently dominated by the use of transformer models (e.g BERT). Further, there are specialized models (e.g. BioBERT), which might be quite applicable in the present work. In the recent years, earlier methods, such as bag-of-words, have been losing ground to transformers. Therefore, I believe that it is so important for the study to demonstrate that the authors are aware of such advances, which may be adopted as part of the future work, at least.

As the reviewer states, a bag-of-words approach, such as the naïve Bayes approach that we used in our study, does not reach the same accuracy as modern transformer models. We initially chose a naïve Bayes approach because such a model is ‘easy’ to explain and much easier to replicate than a deep learning approach; replication of a transformer model is more difficult because inference and training require many computational resources. We wanted to show that even a less advanced method could lead to interesting insights. Additionally, deep learning approaches may not always outperform traditional methods [3]. In our case, using naïve Bayes we achieved an accuracy of around 70%, whereas with a BERT-based deep learning approach we achieved 75%. As this does increase the reliability of our research, we have replaced the naïve Bayes sentiment analysis with RobBERT, which was fine-tuned for sentiment analysis using the interview data [2]. Often such models are fine-tuned on review data (e.g. book or movie reviews); however, the language in such reviews is very different from the language used in our interview data. Because of this change, we have also mentioned the programming language Python in the methods section, as many deep learning packages depend on this programming language.
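To make the contrast above concrete, the ‘less advanced’ bag-of-words method can be sketched as a minimal naïve Bayes sentiment classifier in Python. The tokens and labels below are invented toy examples, not the study’s annotated interview data, and the code is an illustrative sketch rather than the model used in the manuscript:

```python
from collections import Counter
import math

# Toy training documents: (tokens, label). In the study, labels would come
# from the survey-based sentiment annotations; these are purely illustrative.
train = [
    (["fijn", "goed", "blij"], "positive"),
    (["goed", "lekker"], "positive"),
    (["slecht", "vervelend"], "negative"),
    (["vervelend", "alleen"], "negative"),
]

def fit(docs):
    """Estimate log-priors and Laplace-smoothed log-likelihoods per label."""
    labels = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in labels}
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {}
    for label in labels:
        total = sum(word_counts[label].values())
        model[label] = {
            "prior": math.log(labels[label] / len(docs)),
            "likelihood": {
                w: math.log((word_counts[label][w] + 1) / (total + len(vocab)))
                for w in vocab
            },
            # Smoothed probability for words never seen with this label
            "unseen": math.log(1 / (total + len(vocab))),
        }
    return model

def predict(model, tokens):
    """Return the label with the highest posterior log-probability."""
    def score(label):
        params = model[label]
        return params["prior"] + sum(
            params["likelihood"].get(w, params["unseen"]) for w in tokens
        )
    return max(model, key=score)

model = fit(train)
print(predict(model, ["goed", "blij"]))      # positive
print(predict(model, ["slecht", "alleen"]))  # negative
```

A transformer-based replacement such as RobBERT would instead be fine-tuned on labeled sentences, trading this kind of transparency (every word’s contribution to the score is inspectable) for higher accuracy.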

(3) The introduction should refer to more recent work discussing or applying NLP in the healthcare context. For example:

https://doi.org/10.1109/JBHI.2021.3099755

http://dx.doi.org/10.5220/0010414508250832

The manuscript now contains the first article the reviewer proposed: “An Accurate Deep Learning Model for Clinical Entity Recognition From Clinical Notes.” Additionally, we have included another recent study about natural language processing in a healthcare context, which discusses how a model such as RoBERTa can be used to assess the sentiment in tweets regarding the COVID-19 pandemic [1].

(4) Furthermore, I recommend touching on the explainability aspect. Model explainability is currently a very active area of research in the healthcare domain in particular. I suggest that the authors may discuss that aspect as part of the future work as well.

We have dedicated an additional paragraph to model explainability as part of the future work. In this section we discuss the explainability of both traditional machine learning approaches and deep learning approaches. We also briefly discuss the advantages and disadvantages of both approaches regarding explainability (page 16, lines 315-321).

(5) As a minor issue, I find the title a bit vague, the use of “computer-aided analysis” sounds too broad. I recommend using more focused terminology.

We’ve decided to change the title to: ‘Text mining in long-term care: exploring the usefulness of artificial intelligence in a nursing home setting’.

Reviewer #4: The objective of this research work is to evaluate the quality of care at a nursing home from the comments(narrative data) provided by the residents or their family members and care professionals as perceived by them. Text mining techniques were employed for analysis. Sentiment analysis was done to identify the emotions associated with these people from the words they uttered. The results of sentiment analysis could have been quantified as the authors rightly pointed out that it would be more significant to conclude the quality of care at the nursing home.

Thank you for your review. We’ve made a number of improvements based on all reviewers’ feedback. The results of the sentiment analysis were quantified and visualized in Figure 5 of the manuscript.

References:

1. Azeemi AH, Waheed A. COVID-19 tweets analysis through transformer language models. arXiv preprint arXiv:2103.00199. 2021. Available from: https://arxiv.org/pdf/2103.00199

2. Wang A, Singh A, Michael J, Hill F, Levy O, Bowman SR. GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461. 2018. Available from: https://arxiv.org/pdf/1804.07461.pdf

3. Shwartz-Ziv R, Armon A. Tabular data: Deep learning is not all you need. Information Fusion. 2022;81:84-90.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Haoran Xie

17 Mar 2022

PONE-D-21-18516R1

Text mining in long-term care: exploring the usefulness of artificial intelligence in a nursing home setting

PLOS ONE

Dear Dr. Hacking,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 01 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Haoran Xie

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #3: (No Response)

Reviewer #4: All comments have been addressed

********** 

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

Reviewer #3: (No Response)

Reviewer #4: Yes

********** 

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #3: (No Response)

Reviewer #4: Yes

********** 

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

Reviewer #3: (No Response)

Reviewer #4: (No Response)

********** 

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: (No Response)

Reviewer #4: Yes

********** 

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: I believe that the work fully deserves to be published in PLOS One. Original, innovative topic. I have no comments on the research work. Congratulations to the authors.

Reviewer #3: I thank the authors very much for their response, and for the amendments done.

However, I am afraid that I have to keep my initial recommendation that the manuscript would need further experiments to be considered for a journal publication, especially for a quite high-impact journal like PLOS ONE.

I find the argument about ”introducing the nursing care community to the possibilities of AI” not largely convincing. Applications of Text-Mining have already spanned a plethora of publications in the healthcare domain.

Reviewer #4: (No Response)

********** 

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: Przemysław Karol Wolak

Reviewer #3: No

Reviewer #4: Yes: Dr.Sarojini Balakrishnan

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Aug 25;17(8):e0268281. doi: 10.1371/journal.pone.0268281.r004

Author response to Decision Letter 1


25 Apr 2022

Reviewer #2: I believe that the work fully deserves to be published in PLOS One. Original, innovative topic. I have no comments on the research work. Congratulations to the authors.

We would like to thank the reviewer for his/her positive words regarding the originality and innovativeness of our work and corresponding manuscript.

Reviewer #3: I thank the authors very much for their response, and for the amendments done.

However, I am afraid that I have to keep my initial recommendation that the manuscript would need further experiments to be considered for a journal publication, especially for a quite high-impact journal like PLOS ONE. I find the argument about ”introducing the nursing care community to the possibilities of AI” not largely convincing. Applications of Text-Mining have already spanned a plethora of publications in the healthcare domain.

We would like to thank the reviewer for his/her response. The authors decided to include a topic clustering analysis with the aim of discovering the topics discussed in the interviews with residents, family members and care professionals. Keywords were extracted using POS tagging with a Dutch RoBERTa-like model; nouns were selected as representative of the topics present in the interviews. Using word2vec, the keywords were encoded as vectors. These vectors were transformed from cosine to Euclidean space and clustered using k-means. The value of k was determined using the elbow method, which indicated k = 12 as optimal. Afterwards, each cluster was manually assigned a topic name based on the keywords it contained. Based on this added analysis, the methods, results and discussion were amended.
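For illustration, the clustering step described above might be sketched roughly as follows. This is a minimal sketch, not the authors' actual code: the keyword extraction and word2vec embedding steps are omitted and replaced here by random vectors, and the helper name `cluster_keywords` is hypothetical. L2-normalizing the embeddings maps them onto the unit sphere, where Euclidean distance is monotonically related to cosine distance, so standard k-means approximates cosine-based clustering.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize


def cluster_keywords(vectors, k_range=range(2, 15), random_state=0):
    """Cluster keyword embeddings with k-means for each candidate k.

    Vectors are L2-normalized first, so that minimizing Euclidean
    distortion (what k-means does) approximates cosine clustering.
    Returns the fitted models and their inertias for the elbow method.
    """
    X = normalize(np.asarray(vectors))  # project onto the unit sphere
    models, inertias = {}, {}
    for k in k_range:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        km.fit(X)
        models[k] = km
        inertias[k] = km.inertia_  # within-cluster sum of squares
    return models, inertias


# Toy demonstration: random "embeddings" stand in for word2vec output.
rng = np.random.default_rng(42)
fake_vectors = rng.normal(size=(200, 50))
models, inertias = cluster_keywords(fake_vectors)

# Elbow method: choose the k where inertia stops dropping sharply
# (in the paper's analysis this procedure yielded k = 12).
```

In practice one would plot `inertias` against k and pick the "elbow"; the manual step of naming each cluster from its member keywords has no code equivalent.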

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 2

Haoran Xie

27 Apr 2022

Text mining in long-term care: exploring the usefulness of artificial intelligence in a nursing home setting

PONE-D-21-18516R2

Dear Dr. Hacking,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Haoran Xie

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Haoran Xie

28 Jul 2022

PONE-D-21-18516R2

Text mining in long-term care: exploring the usefulness of artificial intelligence in a nursing home setting

Dear Dr. Hacking:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Professor Haoran Xie

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Response to Reviewers.docx

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    The code, data and models discussed in the article will be made available at https://zenodo.org/record/6675839. Our interview data will not be publicly available due to the privacy of our participants. While certain privacy-related information was removed from the transcripts (e.g. names of participants, living addresses, room numbers and other personally identifiable information), the stories that our participants tell are often of a personal nature. Upon request, our interview data can be provided with restrictions to researchers who meet the criteria for access to confidential data. The data are available from the Living Lab in Ageing and Long-Term Care. Website: https://www.awolimburg.nl/, Email: ouderenzorg@maastrichtuniversity.nl, Phone: +31 (0)43 38 81570. Visiting address: Duboisdomein 30, 6229 GT Maastricht. Postal address: Academische Werkplaats Ouderenzorg Limburg, Maastricht University, Vakgroep Health Services Research - DUB 30, Postbus 616, 6200 MD Maastricht.

