Abstract
With advancements in the analysis of cognitive decline in electronic health records, the research community has witnessed a recent surge in social media posts by caregivers and loved ones of people with cognitive decline. The major challenges in this area are the availability of large and diverse datasets, the ethics of data collection and sharing, diagnostic specificity, and clinical acceptability. To this end, we construct a new dataset, Caregivers’ experiences with cognitive Decline (CareD), of 1005 posts with more than 194K words and 9541 sentences, highlighting discussions on people with dementia and Alzheimer’s disease on Reddit. We discuss the changing trends of discussions on cognitive decline in social media and open challenges for natural language processing and social computing. We first identify the Reddit posts reflecting substantial information as candidate posts. We then formulate annotation guidelines and handle perplexities to investigate the existence of experiences, self-reported articles, and potential caregivers in candidate posts, resulting in the discovery of latent symptoms, firsthand information, and a prospective source of longitudinal information about the patient, respectively.
Index Terms: Alzheimer’s disease, cognitive impairment, dementia, experiences, social media analysis
I. Background
The World Health Organization (WHO) estimates that 50 million people are living with dementia worldwide, a number expected to triple by 2050.1 In the USA, more than 7 million people aged 65 or older had dementia in 2020. If current demographic and health trends continue, more than 9 million Americans could have dementia by 2030 and nearly 12 million by 2040 [1]. Dementia is an organic disorder that occurs with aging, as there is an associated physical deterioration of the brain tissue [2]. Alzheimer’s disease is the most common cause of dementia; it affects more than 6 million people in the USA, a number expected to increase to 15 million by 2060 [3].
Neurological disorders hinder day-to-day activities by impairing the ability to think, remember, or make decisions. This state of mind is further affected by a diminishing active vocabulary (finding appropriate words at the right time and place), confusion in decision-making, and mood changes. A symptom-free period of 10–20 years precedes cognitive impairment. The disorder then progresses from normal cognition to mild cognitive impairment (a manageable disorder if detected early), followed by dementia. Dementia can impair an individual’s ability to make safe decisions, such as those involved in driving or managing finances; hence, early detection may help the families of people with dementia take appropriate steps to ensure the safety of their loved ones. There is currently no cure for dementia, but medication given after a timely indication of prospective disease may help in managing and delaying its onset. The research community is therefore working on early detection to slow the progression to dementia and improve quality of life.
People with dementia challenge stigma and stereotypes by presenting their condition as a manageable disability rather than ‘a living death’ [4] on public platforms such as books, public speaking, public interviews, and other traditional media. Most of them, however, do not have access to these public platforms and are unable to track their cognitive decline due to lack of awareness. People prefer blogs over formal books for expressing their social experiences. Zhou et al. (2021) [5] used the dataset with six classification features to identify people living with dementia: (i) part-of-speech tagging, (ii) context-free grammar, (iii) syntactic complexity, (iv) psycholinguistic, (v) vocabulary richness, and (vi) repetitiveness, leveraging a multi-task neural network and an attention-based mechanism for words, sentences, and documents. Irrespective of their conditions, people describe their life experiences on blogs, websites, online forums, and social media platforms to share their experiences, provide and receive support, and increase their networks [6]. In this study, we extract the posts reflecting experiences of caregivers about people with cognitive decline (see Figure 1).
Fig. 1. Overview for classifying the social media posts that narrate caregivers’ experiences about people with cognitive decline.
A. Motivation of the study
Prior work on determining social determinants of health through NLP on clinical narratives opens up new research directions in medical information extraction [7], [8]. According to the Pew Research Center’s Internet and American Life Project surveys, 58% of people older than 65 use online platforms, rising to 85% for people aged 50 to 65 [9]. Social media platforms such as Twitter, Reddit, and Gap act as catalysts for powerful social movements such as ‘#MeToo’, ‘#BlackLivesMatter’, and many more. In this context, social media platforms are valuable sources of information, providing self-reported writings by users. The types of users sharing health information include health professionals, news organizations, commercials, advocacy, social media, journalists, care services, patient/family, and academic researchers [10]. The research community argues that 80% of Tweets about dementia are posted by capitalists and social initiatives that aim to raise their voice for people with dementia and their caregivers [10]. On social media, there is a surge in posts narrating caregivers’ experiences of dealing with people with cognitive decline. One of the major challenges is to separate these documented experiences from other health information about dementia.
Society has witnessed a transition from offline to online spaces amid the COVID-19 pandemic. The use of technology to enhance personal Human-Computer Interaction (HCI) experience through social connections and self-actualization promotes social well-being. People living with dementia are adversely affected by the disruption of routines and a lack of cognitive stimulation, which is mitigated by sharing their emotional experiences on social media platforms. Thus, the information disseminated about social well-being has shifted from purely statistical and technical observations to the personal writings observed on social media platforms. As a result, both the thematic representations and the types of users who present dementia-related information on social media have changed. In this context, recent studies on people with dementia and their adaptation to the new digital landscape support comprehensive analysis with ‘positive technology’ [11]. Here, positive technology is the scientific and applied approach of using technology to enhance personal living experience [12].
B. Our contributions
In this study, we do not aim to ascertain cognitive impairment; instead, we use artificial intelligence techniques to identify narrations of cognitive decline described by caregivers in Reddit posts. This will open up new research directions towards the use of social media discussions. We used two subreddit communities, r/dementia and r/Alzheimers, to identify caregivers and to characterize the progression of dementia by examining the emotional spectrum in self-reported posts over time. We annotated the data with a focus on the following items:
Type of user posting about dementia and Alzheimer’s disease in the subreddit community. We find the ratio of commercial users v/s self-reported posts by patient or caregivers, signifying the changing trends in post-pandemic society.
Caregivers’ expressions of cognitive decline, disclosing the latent symptoms about human behavior. Classification of the presence or absence of the real-time experiences in Reddit posts may supplement the medical diagnosis of cognitive decline under professional and clinical supervision.
ID of caregivers for extracting longitudinal posts from relevant subreddits such as r/AlzheimersGroup and r/AlzheimersSupport, to name a few. This information may be used to discover the progression of cognitive decline over time.
Our major contributions are:
We construct and partially release2 our new dataset, CareD, on a public platform, leveraging an annotation scheme and perplexity guidelines, to facilitate future research in this direction.
We also report basic statistics of the Reddit post dataset and discover patterns among the two classes through keyword extraction mechanisms.
We ensure reliability and re-usability through inter-annotator agreement and FAIR principles, respectively.
We identify caregivers’ experiences regarding cognitive decline using deep neural networks, highlighting the major challenges of this task through error analysis.
II. Dataset
The Reddit social media platform offers users community-based spaces, known as subreddits, for comprehensive discussions, thoughts, and viewpoints on a given topic of interest. Reddit users share their experiences, thoughts, and beliefs through personal writings with ease. Reddit, being an anonymous platform, provides an opportunity to share user-generated and user-curated content in community-specific subreddits for open discussions on particular themes or conditions. In this work, we use the fastest-growing subreddits for cognitive decline over the last decade, r/dementia and r/Alzheimers, with more than 20K and 10K subscribers, respectively.
A. Data Annotation
Reddit limits the number of characters to 10,000 for a comment, 40,000 for a post, and 300 for a post title. We collected 600 samples from each of the two subreddits through the Python Reddit API Wrapper (PRAW).3 Out of the 1200 posts, we manually cleaned the data by removing all samples that contain (i) a null value in selftext4 or (ii) only hyperlinks in selftext. We further preprocessed the data by removing irrelevant characters from each text. After this preprocessing and cleaning, we obtained 1005 data points (postings) in the final corpus. We made observations for each data sample on four different tasks: (i) caregiver’s or loved one’s experience, (ii) potential caregiver, (iii) is the person still alive?, and (iv) the type of user (commercial or non-commercial). We made annotation and perplexity guidelines for each of these tasks with the help of our experts. In this section, we discuss the annotation guidelines (see Section II-A1) and perplexity guidelines (see Section II-A2) to ensure coherence during the annotation task.
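The two cleaning rules described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the record layout (a dict with a `selftext` key), the URL pattern, and the choice to treat an empty string like a null value are all assumptions made for the example.

```python
import re

# Hypothetical record layout: each post is a dict with a "selftext" field.
# The URL pattern is an illustrative assumption, not taken from the paper.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")


def is_usable(post):
    """Return True if the post has body text beyond hyperlinks."""
    text = post.get("selftext")
    # Rule (i): null selftext (an empty string is treated the same way here).
    if text is None or not text.strip():
        return False
    # Rule (ii): nothing is left once hyperlinks are stripped.
    remainder = URL_PATTERN.sub("", text).strip()
    return bool(remainder)


def clean_posts(posts):
    """Keep only the posts that pass both cleaning rules."""
    return [p for p in posts if is_usable(p)]


sample = [
    {"id": "a", "selftext": "My dad keeps forgetting his medication."},
    {"id": "b", "selftext": None},
    {"id": "c", "selftext": "https://example.com/article www.example.org"},
    {"id": "d", "selftext": "See https://example.com for what we went through."},
]
kept_ids = [p["id"] for p in clean_posts(sample)]
print(kept_ids)
```

Posts that mix a link with narrative text (such as `d`) are kept, since only link-only bodies are removed under rule (ii).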
1). Annotation Guidelines:
We developed detailed annotation guidelines for all the four tasks. However, in this work, we focus on the first task of identifying a description of a caregiver’s experience with people suffering through cognitive decline in Reddit posts. Thus, we formulate and describe the annotation guidelines in brief for caregiver’s experiences as:
The narration about human behavior, the symptoms and perception of a patient with cognitive decline.
We consider the syndromes and symptoms associated with cognitive decline only, and filter out the chronic disease related issues such as depression, and anxiety, unless mentioned to be clinically diagnosed.
To clearly indicate the symptoms and checkpoints for identifying cognitive-decline-related posts, we use the six domains of cognitive and functional performance applicable to Alzheimer’s disease and related dementia: Memory, Orientation, Judgment & Problem Solving, Community Affairs, Home & Hobbies, and Personal Care, as defined in the Clinical Dementia Rating (CDR) scale [13].
We further investigate the authenticity of the information. If the user is suggesting a URL to external blogs or YouTube videos by narrating experiences for better understanding of cognitive decline, we consider these experiences non-reliable and mark it as a negative sample.
The expert-driven annotation guidelines ensure the validity of the ground truth. With these guidelines, our experts classify each post for caregiver’s experience as 0 or 1 (i.e., absence or presence of experience), with three common perplexities handled as indicated in the perplexity guidelines.
2). Perplexity Guidelines:
We formulate the following perplexity guidelines to facilitate coherence among annotations:
- Mental health versus Cognitive Decline: There is a fine line between cognitive decline and mental health issues. Consider the following post P1:
  P1: “Tomorrow my grandmother has an evaluation with hospice. Physically she’s great. However, her caregivers have noticed her mentally declining. She has apparently started talking to the wall and having full conversations with a stuffed reindeer - but thinks it’s a teddy bear”.
  Although the post P1 indicates poor mental health and cognitive dysfunction, there is no evidence to mark this dysfunction as cognitive decline. We do not make any assumptions about cognitive decline unless it is clinically diagnosed or its symptoms are identified. Thus, we label this post P1 as a negative sample.
- Self experience versus Patient’s experience: The caregivers and loved ones of patients with cognitive decline do not only narrate the behavioral aspects of the patient, but also their own thoughts, beliefs, struggles, and perceptions about people with cognitive decline. Consider the following post P2:
  P2: “Neither caregiver nor patient but thoughts about killing grandmother “ended up calling the crisis hotline on the Monday from last week. The thoughts of killing my grandmother began to change to concrete plans. I was so afraid of my own head that I admitted myself voluntarily to a mental hospital. I have been released, now. I stayed there for about a week. I was discharged yesterday evening.”
  The post P2 highlights the anxiety and frustration of the caretaker, illustrating the mental health effects on the caregiver. The mention of the grandmother suggests that the user is a potential caretaker, but it does not make this post a positive sample for identifying experience in this study.
- Caregiver’s perception versus human behavior: Sometimes, caregivers form a perception of their loved ones with dementia and pass on their own judgements in place of narrating the patient’s behavior. We frame a perplexity guideline to differentiate the two. Consider a post P3:
  P3: “She’s turned into a complete idiot. And angry. I will make sure I don’t end up like this.”
  The post P3 exemplifies the anger of the caregiver through words such as idiot and describes no real-life human behavior; thus, we mark P3 as a negative sample for this task.
3). Annotation Task:
Our experts trained three postgraduate student annotators to annotate the dataset using the annotation guidelines and perplexity guidelines. In the second round, the student annotators were asked to label 20 instances; the inter-annotator agreement score was higher than in the first round (50 samples by experts). All three students then annotated the given file independently, in separate rooms, for 20 days, at approximately 50 samples per day. This annotation task ensures the consistency of CareD. Furthermore, to ensure the reliability of the dataset, we carried out an inter-annotator agreement study using Fleiss’ Kappa, where κ is calculated as 74.59%. The final annotation was obtained through a majority voting mechanism and verified by experts. The samples of CareD (see Table I) illustrate that all commercial posts and all posts where caregivers of patients share experiences about themselves are labeled 0.
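The agreement score and the majority vote described above can be illustrated with a small sketch. It computes Fleiss' kappa from scratch for a fixed number of raters per item and then resolves each item by majority vote. The toy ratings are invented for the example and are not drawn from CareD.

```python
from collections import Counter


def fleiss_kappa(ratings, categories=(0, 1)):
    """Fleiss' kappa for a list of per-item label lists.

    Assumes every item was labeled by the same number of raters and
    that the raters did not all use a single category for every item
    (otherwise the denominator below is zero).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j] = number of raters assigning item i to category j
    counts = [[Counter(item)[c] for c in categories] for item in ratings]
    # Observed agreement per item, averaged over items
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall category proportions
    p_cats = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_exp = sum(p * p for p in p_cats)
    return (p_bar - p_exp) / (1 - p_exp)


def majority_vote(item_labels):
    """Most common label; with three raters and two labels there are no ties."""
    return Counter(item_labels).most_common(1)[0][0]


# Toy example: four posts, three annotators each (invented labels).
toy_ratings = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
kappa = fleiss_kappa(toy_ratings)
final_labels = [majority_vote(item) for item in toy_ratings]
print(round(kappa, 4), final_labels)
```

With an odd number of annotators and a binary label, the majority vote is always well defined, which is one reason three annotators is a convenient choice.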
TABLE I.
Samples of CareD. We filter out all the Reddit posts in which caregivers of patients with cognitive decline are sharing the experiences about themselves and all the commercial postings by advertisers, journalists, researchers and about awareness programs, labeling them as 0.
| S.No. | Text | Label |
|---|---|---|
| T1 | Does anyone know of a good strategy to help someone with moderate dementia who still lives alone with remembering to take medication? | 0 |
| T2 | my uncle was in the last sad stages of dementia last week his sis came from Chicago to be with her brother. We were told to come visit they didn’t know how much longer he was gonna last. Tonight at 7:30 pm I got the text our uncle passed away. The only ” comfort ” I me & my mom know is he’s not suffering anymore. My mom had gone last week to see him I wanted to remember him as my funny uncle that he always was so I stayed home. He and my aunt had never been separated except for when he was in hospital recently they were married since they were very young. | 0 |
| T3 | Does anyone know of a research foundation to donate to, that is not primarily focused on the Amyloid theory? Thank you. | 0 |
| T4 | My dad has had Alzheimer’s for about five years. Me and my mum acts as his full time carer and my siblings help as much as they can and we have two companions that come to keep him company. Dad is declining. He can chat but he only talks about the same 4 things on rotation, all from his past, and will turn any conversation to these points. He sleeps in a separate room to my mum and takes an array of weapons to bed every night - walking sticks, toilet brushes, and more recently knives. He’s worried someone might break in. We’ve told the GP and they’re changing his medications again to see if that helps to ease the paranoia. He’s also started some more unusual behaviors like urinating into mugs and keeping them in his bedroom. | 1 |
B. FAIR Principles
The FAIR guiding principles increase the Findability, Accessibility, Interoperability, and Reusability of a dataset, emphasizing machine actionability given the increasing reliance on computational systems to facilitate future studies [14].
Findable: CareD contains the caregiver’s experiences and the label for 1005 data points. We release the first version of our dataset as CareD.v1 on GitHub.
Accessible: CareD is available in comma-separated format on GitHub. We plan to expand and update CareD with explainable and contextual information in upcoming versions.
Inter-operable: Our data is consistently structured and described, both syntactically and semantically.
Reusable: We sufficiently annotate our dataset with 1005 instances for a binary classification problem to facilitate its re-usability.
C. Data Analyses
The basic statistics of the data are shown in Table II. We observe that 56.52% of the data samples narrate a caregiver’s experience. Prior work classifies seven different user types posting about the cognitive decline of loved ones on social media platforms: (i) health professionals, (ii) news organizations, (iii) advocacy, (iv) journalists, (v) caregivers and other care services, (vi) patient/family, and (vii) academic researchers [10], and suggests that more than 80% of Twitter posts come from commercial platforms, including health professionals, news organizations, academic researchers, and journalists. However, we observe 74.22% (746/1005) of the posts to be non-commercial (posted by patients or their caregivers/loved ones) on the Reddit social media platform. This change could be due to the increased use of social media platforms for comprehensive discussions on human behavior, the increase in the number of cases of cognitive decline, or the post-pandemic era in which people have experienced social distancing and lockdowns.
TABLE II.
The statistics of Reddit dataset (N = 1005)
| Criteria | Experience: No (0) | Experience: Yes (1) |
|---|---|---|
| Number of Posts | 437 | 568 |
| Total number of Words | 54745 | 140152 |
| Total number of Sentences | 2787 | 6754 |
| Average number of Words | 126 | 247 |
| Max. number of Words | 963 | 1122 |
| Average number of Sentences | 7 | 12 |
| Max. number of Sentences | 56 | 66 |
| Reported by Patient | 997 | 8 |
| Patient Alive | 980 | 25 |
| Non-commercial | 259 | 746 |
| Reported by caregiver | 334 | 671 |
In this work, we further consider potential caregivers to be the users whose historical posts are prospective sources of narrations about their experiences with loved ones having cognitive decline. The data show that about two-thirds of the posts are written by potential caregivers. The other posts may contain narrations about the experiences of family members who live far away, neighbors, or someone at work. Furthermore, about 2.5% of the posts mention the patient’s death, and only 0.7% of the posts are self-reported personal experiences with cognitive decline.
Syntax Overlapping:
Motivated by rule-based systems and handcrafted feature extraction for clinical trials [15], we examined the nature of CareD. The words in the positive and negative classes overlap substantially due to long texts and similar contextual information within a common topic of discussion. The idea behind identifying the syntactic and lexical similarity between the two classes is to understand the nature of a given text and the need for infusing external information into computationally intelligent, learning-based, and pre-trained classifiers. Such overlapping context highlights the complexity of the text and points towards the importance of developing highly efficient classifiers for the task of identifying caregivers’ experiences of cognitive decline. The overlapping terms are obtained as top-12 keywords through the following keyword extraction methods (see Table III):
YAKE: An unsupervised automatic keyword extraction method, identifies the most relevant keywords in a given data point through statistical information [16].
KeyBERT: A pre-trained model that finds the sub-phrases in a data point that reflects the semantics of original text. First, the document embeddings are extracted with the BERT model to get a document-level representation, followed by the word embeddings for N-gram words/phrases [17].
TABLE III.
Keyword extraction approach to determine the top-12 key terms reflecting semantics of a given text. Common keywords (cw) among both the classes. Blue: behavior and activity. Red: Things happening in environment due to cognitive decline
| Method | Label | #CW | Frequent terms |
|---|---|---|---|
| YAKE | Absence | 77 | loved, great, hospice, place, caregiver, understand, caregivers, health, support, moved, happy, patients |
| YAKE | Presence | 77 | night, grandmother, move, diagnosis, worse, parents, meds, husband, couple, wife, walk, calls |
| KeyBERT | Absence | 8 | caregiver, aides, carers, volunteers, hospice, charity, carefree, organize, volunteering, nurse, donate, curated |
| KeyBERT | Presence | 8 | passwords, grandpa, disability, wheelchairs, grandparents, locking, diagnoses, neuropsychologist, unlocking, custody, controlling, addiction |
There is a large difference in the number of overlapping terms obtained through YAKE (77) and KeyBERT (8). The contextual overlap between the two classes is greater for the keywords from YAKE than for those from KeyBERT.
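The idea of counting common keywords between the two classes can be sketched as follows. This example deliberately swaps in a plain frequency-based extractor as a stand-in; it is neither YAKE nor KeyBERT, and the stopword list, tokenizer, and toy posts are all assumptions made for illustration. Any extractor returning a top-k keyword list per post could be plugged into `class_vocabulary`.

```python
from collections import Counter
import re

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {
    "the", "a", "an", "and", "to", "of", "is", "in", "my", "her",
    "his", "she", "he", "for", "with", "has", "at",
}


def top_keywords(text, k=12):
    """Stand-in extractor: the k most frequent non-stopword tokens."""
    tokens = [
        t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS
    ]
    return [word for word, _ in Counter(tokens).most_common(k)]


def class_vocabulary(posts, k=12):
    """Union of the top-k keywords over all posts of one class."""
    vocab = set()
    for post in posts:
        vocab.update(top_keywords(post, k))
    return vocab


# Invented toy posts for the two classes.
absent_posts = [
    "Looking for a hospice caregiver support group.",
    "Donate to caregiver support charity.",
]
present_posts = [
    "Grandmother wanders at night and her caregiver is worried.",
    "My husband forgets his meds at night.",
]

common = class_vocabulary(absent_posts) & class_vocabulary(present_posts)
print(sorted(common), len(common))
```

The size of `common` plays the role of the #CW column in Table III: a larger intersection means the two classes are harder to separate on surface vocabulary alone.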
III. Experiment and Evaluation
In this section, we define the task, evaluation measures, classification methods and the experimental setup. We compare the experimental results averaged over 10-fold cross validation.
A. Problem Formulation
We formulate this task of identifying the caregivers’ experiences as a binary classification task with a ‘Present’ class confirming the presence of experience sharing about cognitive decline, and an ‘Absent’ class suggesting the absence of experiences with cognitive decline but other experiences may be present. Consider a given post P as a text document containing n number of words where n varies for each post Pi. Here, Pi = {w1, w2, …,wn} where i varies from 1 to 1005 in CareD. The idea behind this classification task is to find the value of E(Pi) as a Boolean value (0: Absent or 1: Present) where E is the experience in a given text.
B. Metrics and Measures
The performance of our experiments is recorded in precision, recall and F-measure. Additionally, we record the accuracy of our models. In case of the imbalanced dataset, the accuracy cannot be considered a reliable measure as it provides an overoptimistic estimation of the classifier’s ability on the majority class. An effective solution overcoming the class imbalance issue comes from the Matthews correlation coefficient (MCC) [18]. A correlation of: C = 1 indicates perfect agreement, C = 0 is expected for a prediction no better than random, and C = −1 indicates total disagreement between prediction and observation [19]. Thus, MCC produces a more informative and truthful score in evaluating imbalanced binary classifications than accuracy and F-measure.
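To make the contrast between accuracy and MCC concrete, the sketch below computes both from the four confusion-matrix counts. The counts are invented for illustration; returning 0.0 when the denominator vanishes is a common convention and is an assumption here, not something stated in the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when any marginal is empty (a common convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# A classifier that always predicts the majority class on a 90/10 split:
# accuracy looks strong while MCC shows no predictive signal.
always_positive = dict(tp=90, tn=0, fp=10, fn=0)
print(accuracy(**always_positive), mcc(**always_positive))

# Perfect agreement and total disagreement for reference.
print(mcc(tp=50, tn=50, fp=0, fn=0), mcc(tp=0, tn=0, fp=50, fn=50))
```

The first case shows why accuracy alone can be overoptimistic on imbalanced data: 90% accuracy coexists with an MCC of 0.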
C. Classification Methods
We investigate three different classification mechanisms on CareD:
1). Off-the-shelf methods:
We implemented three traditional ensemble models: (i) Random Forest (RF) Classifier, (ii) Ada-Boost Classifier, and (iii) Gradient Boost Classifier, to test the significance of binary classification for large text documents. The traditional machine learning algorithms are applied on word embedding [20], evolved through Word2Vec, to evaluate the performance and randomness of the prediction.
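The paper does not state how word embeddings are combined into a fixed-length input for the ensemble classifiers. One common choice, shown here purely as an assumption, is to average the vectors of the words in a post. The `toy_embeddings` dictionary is a hypothetical stand-in for a trained Word2Vec lookup table.

```python
def document_vector(tokens, embeddings, dim):
    """Average the embeddings of known tokens; zeros if none are known.

    `embeddings` is a hypothetical word -> vector mapping standing in
    for a trained Word2Vec model. Averaging is an assumed pooling
    strategy, not one confirmed by the paper.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Invented 2-dimensional embeddings for illustration only.
toy_embeddings = {
    "dad": [1.0, 0.0],
    "forgets": [0.0, 1.0],
    "keys": [1.0, 1.0],
}

doc_vec = document_vector("dad forgets his keys".split(), toy_embeddings, dim=2)
print(doc_vec)
```

The resulting fixed-length vector is the kind of feature representation that a Random Forest or boosting classifier can consume directly, regardless of post length.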
2). Recurrent Neural Networks (RNN):
We implemented two deep-learning-based recurrent neural network methods [21]: (i) Long Short-Term Memory (LSTM) [22], and (ii) Gated Recurrent Unit (GRU). The GRU controls the flow of information like the LSTM unit, but without a separate memory unit, resulting in a computationally efficient approach.
3). Pre-trained Language Model:
We implement the most widely used pre-trained language model, Bidirectional Encoder Representations from Transformers (BERT), which makes predictions based on both the left and right context in a given text [23]. We trained the bert-base-uncased model on a training set. However, BERT cannot process long texts directly because its memory and time consumption grow quadratically with input length. In this work, we implemented the BERT model as a baseline by truncating each text to its first 300 words. To handle long Reddit posts, we recommend the use of Longformer, slicing the text with a sliding window, or simplifying transformers.
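The truncation baseline and the sliding-window alternative mentioned above can be sketched at the word level. Whitespace splitting, the 150-word stride, and the function names are assumptions for illustration; a real pipeline would typically operate on model tokens rather than whitespace-delimited words.

```python
def truncate_words(text, max_words=300):
    """Keep only the first `max_words` whitespace-delimited words."""
    return " ".join(text.split()[:max_words])


def sliding_windows(text, window=300, stride=150):
    """Split a long text into overlapping word windows.

    Consecutive windows overlap by `window - stride` words so that
    context at a window boundary is seen at least twice. The stride
    value is an illustrative assumption.
    """
    words = text.split()
    if len(words) <= window:
        return [" ".join(words)]
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


# A synthetic 700-word post.
long_post = " ".join(f"w{i}" for i in range(700))
truncated = truncate_words(long_post)
chunks = sliding_windows(long_post)
print(len(truncated.split()), len(chunks))
```

Truncation discards everything after word 300, whereas the windows cover the whole post; per-window predictions would then need to be aggregated (for example by averaging), which is a design choice left open here.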
D. Experimental Settings
For consistency, we used the same experimental settings for all three baseline methods with 10-fold cross-validation. All our results are reported as the average across all folds. Posts of varying length are padded, and models are trained for 150 epochs with early stopping at a patience of 10 epochs. We set the hyper-parameters for our neural and transformer-based models as H = 300, optimizer O = Adam for the RNN models and O = stochastic gradient descent (SGD) for BERT, learning rate = 1 × 10^−5, and batch size 16 for all baselines. We further used a dropout layer with 50% dropout for units, and kernel and bias regularization with learning rate = 0.01, to avoid over-fitting.
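The evaluation protocol above (10 folds, up to 150 epochs, patience of 10) can be sketched independently of any particular model. The fold generator below is unstratified and seeded; whether the original splits were stratified or shuffled is not stated, so both are assumptions. The validation-loss sequence is invented for the example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for k shuffled folds.

    Unstratified; the shuffling seed is an illustrative assumption.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def early_stopping_epoch(val_losses, patience=10):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)


splits = list(k_fold_indices(1005, k=10))
fold_sizes = sorted({len(test) for _, test in splits})

# Invented loss curve: improves for 3 epochs, then plateaus.
losses = [1.0, 0.9, 0.8] + [0.85] * 20
stop_epoch = early_stopping_epoch(losses, patience=10)
print(fold_sizes, stop_epoch)
```

With 1005 posts and 10 folds, each test fold holds 100 or 101 posts, and every post appears in exactly one test fold.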
E. Experimental Results
Accuracy can be used when the class distribution is similar while F1-score is a better metric when there are imbalanced classes as in the above case. Accuracy is used when the True Positives and True negatives are more important while F1-score is used when the False Negatives and False Positives are crucial. In this task, we consider false negatives to be more crucial than true positives. Thus, we consider the importance of performance evaluation metrics for our task as: MCC > F–measure > Accuracy. To this end, we compare traditional classifiers and make observations (see Table IV).
TABLE IV.
Comparison of models by Precision (P), Recall (R), F-score (F), Accuracy, and MCC, averaged over 10-fold cross-validation.
| Model | P | R | F | Accuracy | MCC |
|---|---|---|---|---|---|
| Word2Vec + AdaBoost | 0.76 | 0.83 | 0.80 | 0.75 | 0.46 |
| Word2Vec + RF | 0.83 | 0.82 | 0.82 | 0.79* | 0.57 |
| Word2Vec + GradientBoost | 0.78 | 0.81 | 0.80 | 0.75 | 0.48 |
| LSTM | 0.68 | 0.91 | 0.78 | 0.71 | 0.47 |
| GRU | 0.66 | 0.83 | 0.73 | 0.67 | 0.44 |
| BERT | 0.88 | 0.73 | 0.80 | 0.76* | 0.51 |
The best MCC (0.57) is obtained by a simple yet effective traditional ensemble approach, the Random Forest classifier, which yields an F-measure of 0.82 and an accuracy of 0.79, potentially because it considers the entire context of a data sample with long text. BERT produced the second-best accuracy (0.76) and MCC (0.51), with an F-measure of 0.80. In the near future, we shall investigate this trade-off between Accuracy, MCC, and F-measure with more explainable and visualization methods. We further observe the class ‘Absent’ to be harder to predict than the class ‘Present’, possibly because the class ‘Absent’ has fewer samples (437) than ‘Present’ (568). Furthermore, the MCC scores might improve with a balanced dataset. Thus, we plan to perform context-preserving data augmentation on the class ‘Absent’ to add more samples.
IV. Discussion
A. Future scope
The problem formulated for this binary classification task is complex and subjective. To ensure the computational enhancements, we recommend thinking beyond semantics since there is a huge potential in the use of discourses and pragmatics for identifying the contextual information in a given post. We plan to enhance this dataset with more data samples and other contextual information as a ground truth. The use of knowledge graphs may uncover early symptoms of cognitive decline and human behavior [24], enhancing care plan for people with dementia. In the near future, we plan to work on domain-specific pre-trained models, large language models, explainable AI and model visualizations for misclassified instances [25].
B. Ethics, Limitations and Broader Impact
Social media data is often sensitive, especially when the data is related to cognitive impairment. CareD contains only publicly available posts and a Boolean label for caregivers’ experience; no user metadata is made publicly available, as we are committed to the ethical practice of protecting the privacy and anonymity of the users. We acknowledge that we did not reach out to any social media user, as consent is not required from users of the Reddit social media platform, where users are free to choose whether to disclose their identity. We further rephrase the text of Reddit posts in examples. We consider all the Reddit posts to be reliable and non-manipulative, without making any assumptions about the users’ perception. The dataset is available on GitHub5. Clearly, machine learning predictions cannot replace professional cognitive decline diagnostics, counseling, or therapy at this stage. As shown in our evaluation, their accuracy and trustworthiness remain insufficient for such purposes. The two major challenges of CareD are (i) long text documents and syntactic overlap, which cause considerable complexity when applying pretrained models, and (ii) research gaps in the discourse analysis [26] of a given text.
V. Conclusion
We constructed the Reddit dataset to classify the given text to uncover latent patterns in caregivers’ experience and behavior with cognitive decline, which opens new directions to facilitate social computing research. The dataset is released in public to facilitate the relevant research in this area. With further improvement in machine learning classification, we may automatically identify the caregivers’ real-time experiences, thereby extracting the historical posts for characterizing the progression of human behavior amid cognitive decline, facilitating research in dementia caregiving and impact of disease.
Acknowledgement
This study was supported by NIH R01 AG068007.
Footnotes
2. The reason behind the partial release of CareD is to uphold the ethical considerations.
4. A detailed explanation of the post.
Contributor Information
Muskan Garg, Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA.
Sunghwan Sohn, Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA.
References
- [1] Zissimopoulos JM, Tysinger BC, St. Clair PA, and Crimmins EM, “The impact of changes in population health and mortality on future prevalence of Alzheimer’s disease and other dementias in the United States,” The Journals of Gerontology: Series B, vol. 73, no. suppl_1, pp. S38–S47, 2018.
- [2] Ray S and Davidson S, “Dementia and cognitive decline: A review of the evidence,” Age UK, vol. 27, pp. 10–12, 2014.
- [3] Brookmeyer R, Abdalla N, Kawas CH, and Corrada MM, “Forecasting the prevalence of preclinical and clinical Alzheimer’s disease in the United States,” Alzheimer’s & Dementia, vol. 14, no. 2, pp. 121–129, 2018.
- [4] Beard RL, Knauss J, and Moyer D, “Managing disability and enjoying life: How we reframe dementia through personal narratives,” Journal of Aging Studies, vol. 23, no. 4, pp. 227–235, 2009.
- [5] Zhou D, Yuan J, and Si J, “Health issue identification in social media based on multi-task hierarchical neural networks with topic attention,” Artificial Intelligence in Medicine, vol. 118, p. 102119, 2021.
- [6] Craig D and Strivens E, “Facing the times: A young onset dementia support group: Facebook™ style,” Australasian Journal on Ageing, vol. 35, no. 1, pp. 48–53, 2016.
- [7] Torii M, Finn IM, Doan S, Wang P, Yang EW, and Zisook DS, “Task formulation for extracting social determinants of health from clinical narratives,” arXiv preprint arXiv:2301.11386, 2023.
- [8] Hahn U and Oleynik M, “Medical information extraction in the age of deep learning,” Yearbook of Medical Informatics, vol. 29, no. 01, pp. 208–220, 2020.
- [9] Madden M, Lenhart A, Cortesi S, and Gasser U, “Pew Internet and American Life Project,” Washington, DC: Pew Research Center, 2010.
- [10] Robillard JM, Johnson TW, Hennessey C, Beattie BL, and Illes J, “Aging 2.0: Health information about dementia on Twitter,” PLoS One, vol. 8, no. 7, p. e69861, 2013.
- [11] Talbot CV and Briggs P, “The use of digital technologies by people with mild-to-moderate dementia during the COVID-19 pandemic: A positive technology perspective,” Dementia, vol. 21, no. 4, pp. 1363–1380, 2022.
- [12] Gaggioli A, Villani D, Serino S, Banos R, and Botella C, “Positive technology: Designing e-experiences for positive change,” p. 1571, 2019.
- [13] Springate BA, Tremont G, Papandonatos G, and Ott BR, “Screening for mild cognitive impairment using the Dementia Rating Scale-2,” Journal of Geriatric Psychiatry and Neurology, vol. 27, no. 2, pp. 139–144, 2014.
- [14] Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten J-W, da Silva Santos LB, Bourne PE et al., “The FAIR guiding principles for scientific data management and stewardship,” Scientific Data, vol. 3, no. 1, pp. 1–9, 2016.
- [15] Shi J, Graves K, and Hurdle JF, “A generic rule-based system for clinical trial patient selection,” arXiv preprint arXiv:1907.06860, 2019.
- [16] Campos R, Mangaravite V, Pasquali A, Jorge A, Nunes C, and Jatowt A, “YAKE! Keyword extraction from single documents using multiple local features,” Information Sciences, vol. 509, pp. 257–289, 2020.
- [17] Grootendorst M, “KeyBERT: Minimal keyword extraction with BERT,” Zenodo, 2020.
- [18] Guilford JP, “Psychometric methods,” 1954.
- [19] Akosa J, “Predictive accuracy: A misleading performance measure for highly imbalanced data,” in Proceedings of the SAS Global Forum, vol. 12, 2017, pp. 1–4.
- [20] Wu Y, Xu J, Jiang M, Zhang Y, and Xu H, “A study of neural word embeddings for named entity recognition in clinical text,” in AMIA Annual Symposium Proceedings, vol. 2015. American Medical Informatics Association, 2015, p. 1326.
- [21] Wu S, Roberts K, Datta S, Du J, Ji Z, Si Y, Soni S, Wang Q, Wei Q, Xiang Y et al., “Deep learning in clinical natural language processing: A methodical review,” Journal of the American Medical Informatics Association, vol. 27, no. 3, pp. 457–470, 2020.
- [22] Jelodar H, Wang Y, Orji R, and Huang S, “Deep sentiment classification and topic discovery on novel coronavirus or COVID-19 online discussions: NLP using LSTM recurrent neural network approach,” IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 10, pp. 2733–2742, 2020.
- [23] Casola S, Lauriola I, and Lavelli A, “Pre-trained transformers: An empirical comparison,” Machine Learning with Applications, vol. 9, p. 100334, 2022.
- [24] Liu X, Khalil I, and Devarakonda M, “Customizing knowledge graph embedding to improve clinical study recommendation,” arXiv preprint arXiv:2212.14102, 2022.
- [25] Sayed MA, Qin X, Kate RJ, Anisuzzaman D, and Yu Z, “Identification and analysis of misclassified work-zone crashes using text mining techniques,” Accident Analysis & Prevention, vol. 159, p. 106211, 2021.
- [26] Antonsson M, Lundholm Fors K, Eckerström M, and Kokkinakis D, “Using a discourse task to explore semantic ability in persons with cognitive impairment,” Frontiers in Aging Neuroscience, vol. 12, p. 607449, 2021.
