Abstract
Objectives
Social support (SS) and social isolation (SI) are social determinants of health (SDOH) associated with psychiatric outcomes. In electronic health records (EHRs), individual-level SS/SI is typically documented in narrative clinical notes rather than as structured coded data. Natural language processing (NLP) algorithms can automate the otherwise labor-intensive extraction of such information.
Materials and Methods
Psychiatric encounter notes from Mount Sinai Health System (MSHS, n = 300) and Weill Cornell Medicine (WCM, n = 225) were annotated to create a gold-standard corpus. A rule-based system (RBS) involving lexicons and a large language model (LLM) using FLAN-T5-XL were developed to identify mentions of SS and SI and their subcategories (eg, social network, instrumental support, and loneliness).
Results
For extracting SS/SI, the RBS obtained higher macroaveraged F1-scores than the LLM at both MSHS (0.89 versus 0.65) and WCM (0.85 versus 0.82). For extracting the subcategories, the RBS also outperformed the LLM at both MSHS (0.90 versus 0.62) and WCM (0.82 versus 0.81).
Discussion and Conclusion
Unexpectedly, the RBS outperformed the LLM across all metrics. An intensive review demonstrated that this finding is due to the divergent approaches taken by the RBS and the LLM. The RBS was designed and refined to follow the same specific rules as the gold-standard annotations. Conversely, the LLM was more inclusive with categorization and conformed to common English-language understanding. Both approaches offer advantages, although additional replication studies are warranted.
Keywords: social determinants of health, social support, social isolation, electronic health records, natural language processing, large language model
Introduction
Social determinants of health (SDOH) are the nonmedical conditions that shape daily life and affect health outcomes.1 Leveraging SDOH in clinical decision-making may personalize treatment planning and improve patient outcomes.2 Social support (SS) and social isolation (SI) are two key components of SDOH that significantly impact physical and mental well-being.
Social isolation is associated with higher health care expenditure,3 morbidity,4–6 and mortality7,8 and may be as harmful as smoking 15 cigarettes a day.9 Specific health risks linked to SI include poor physical and mental well-being,10 metabolic diseases, infectious diseases, dementia,9 suicidal thoughts,11 anxiety,12 and depression.11,13 We previously conducted a scoping review to evaluate the relationship between social connectedness and the risk of depression or anxiety and observed that loneliness was significantly associated with higher risks of major depressive disorder, depressive symptom severity, and generalized anxiety disorder.12 Social isolation comprises several interrelated psychosocial constructs, including lack of a social network, poor emotional support, and feelings of loneliness.14 The Surgeon General’s 2023 advisory, “Our Epidemic of Loneliness and Isolation,” recommends the identification of patients with SI at the health care system level to track community prevalence. Research may then study causal mechanisms, patterns across demographics, and preventive approaches.9,15
In contrast, SS and related constructs including emotional support, instrumental support, and social network are associated with improved health outcomes.5,16,17 Social connections facilitate health-related behaviors, including adherence to medication and treatment.18,19 Moreover, social relationships are indirectly linked to various health-related factors, including blood pressure, immune function, and inflammation.20–23 In regard to mental health, studies have shown SS to be a protective factor for depressive symptoms and disorders across a range of settings and populations.12
Existing studies on SS and SI are largely based on questionnaire or survey data from small samples or specific populations (eg, the elderly,13 adolescents during the pandemic,24,25 and pregnant/postpartum women12). Research on identifying SS and SI from real-world, large-scale, routinely collected electronic health records (EHRs) is lacking, likely because SDOH information, including SS and SI, are rarely encoded in EHRs as structured data elements.2 International Classification of Diseases (ICD)-9 V-codes and ICD-10 Z-codes have been expanded to include SI; however, studies note poor adoption rates among clinicians and health systems.26 Instead, these concepts are often captured in EHRs as part of narrative text during a clinical encounter, yet manual abstraction of such data is time-consuming and labor-intensive.2,27
Natural language processing (NLP) automates extraction of information from unstructured data and has been implemented in previous literature for identifying different SDOH constructs, including alcohol use, substance use, and homelessness.2 However, the highly varied language used by clinicians, domain/site-specific knowledge, and lack of annotated data present challenges in extracting SDOH from clinical notes.2 There are currently 3 main approaches to extract SS and SI from clinical text, each with strengths and limitations.
The first approach involves creating dictionaries (“lexicons”) and a set of rules with which to search the text for matches. Lexicons may be either derived from standardized medical ontologies or developed specifically for the task by domain experts. Software may be used to implement the rules, including recognition of negative terms or contexts in which the lexicon match is a “false-positive” (eg, if the documented SDOH is not describing the patient, but rather the patient’s sibling). The benefit of this rule-based system (RBS) is that the parameters are highly controlled; there is no “black box” (an inability to see how the algorithm makes decisions) since the RBS developer is naming exactly what is, or is not, included. However, it is exceedingly difficult to list every term in the lexicon and create a rule for every context in which the term might occur. Previous work using this approach includes studies detecting SI from clinical notes27,28 in specific patient populations. ClinicalRegex and Linguamatics I2E, 2 rule-based/lexicon software programs, were used to extract SI29 and SS mentions, again in specific patient populations.30 Other studies (eg, Navathe et al.31) combine ICD codes with lexicon terms to detect SS/SI. Since the aim of these studies is to identify SS/SI for a clinical purpose, the focus is not on the rigor of algorithm development. As a result, they often employ more straightforward approaches, such as (1) not differentiating between types of SS/SI or considering nuances, (2) using a relatively small sample of manually validated notes, (3) using a single site, and (4) not typically making their pipelines publicly available.
The second NLP approach involves training or adapting traditional machine-learning models (eg, prepackaged topic models, machine- and deep-learning models). Electronic health record-based research has used such models trained on clinical corpora, which makes them well suited to understanding clinical language. However, to perform a task such as identifying specialized concepts like SS/SI, these models still require extensive manually labeled training data for fine-tuning, which is labor-intensive and generates results that underperform the lexicon-based approach.32
Finally, an emerging approach is to use large language models (LLMs) which have been trained on massive amounts of data and use transfer learning to perform downstream tasks with little need for fine-tuning or manual labels. The advent of LLMs is a major milestone in the field of NLP and has been applied for multiple use cases including SDOH extraction from clinical notes. Categories of SDOH extracted with LLMs include patients’ employment, housing, parental status, and transportation issues.33,34 Preliminary work has used LLMs with minimal fine-tuning to extract SDOH from unstructured EHR data, although these models have yet to be optimized for extracting SS and adverse SS (SI).33
In summary, each of these approaches for extracting SDOH from clinical text has strengths and limitations. The RBS requires domain experts and significant time to develop lexicons and rules, but yields highly predictable outputs. In contrast, machine-learning- and deep-learning-based systems rely heavily on a large annotated corpus for training. Lastly, LLMs need less data for fine-tuning compared to deep-learning algorithms, but are often considered black-box models, making their decision-making processes less transparent to the end user.
This work aims to build on these previous systems by dissecting SS/SI into their fine-grained categories, including presence/absence of social network, instrumental support, emotional support, and loneliness. This separation is important, as the literature has shown that they are separate concepts,35 not interchangeable, with distinct effects on health outcomes.12 A general label not only diminishes the ability to detect associations between subcategories but also limits the eventual interventions that might come from findings. A distinction is frequently drawn between subjective and objective social support, and they do not necessarily improve together.36 For example, loneliness is frequently found to be associated with depressive symptoms, but increasing a person’s social activity is not necessarily the way to alleviate loneliness, and other interventions might be more indicated for the individual experiencing loneliness.37 This study aims to fill a gap in the literature by not only focusing on SS/SI extraction in clinical narratives but also distinguishing fine-grained subcategories. Here, we describe the development of a rule book for manual annotations as well as rigorous development of two different NLP systems: RBS and LLM.
Additionally, the variability in clinical documentation, both within and across hospital systems, presents a challenge to the portability of NLP systems.2 Hence, another aim of this work is to create NLP pipelines that are portable across sites, here, two large academic medical centers in New York City: Mount Sinai Health System (MSHS) and Weill Cornell Medicine (WCM). By making benchmarked NLP pipelines publicly available, we aim to enable other NLP investigators to adopt, validate, improve, and deploy the developed SI/SS extraction tool for contextualizing both psychiatric research and clinical practice.
Data
Data sources
We used EHR data from 2 sites (MSHS and WCM) to develop the SS/SI NLP pipeline. First, we used 286 692 notes from 33 800 patients derived from the MSHS data warehouse. Notes are from psychiatric inpatient, psychiatric emergency department (ED), and psychiatry consult-liaison encounters between 2011 and 2020. Advantages of the corpus include relative consistency in documentation style and comprehensive biopsychosocial patient evaluations. Of note, the corpus also occasionally contains clinical note templates (found in 14% of the notes), such as: “lacks social support: yes/no/unknown.” For this study, templates were removed while annotating and evaluating the NLP systems; in future work, we will compare clinician responses to the gold-standard annotations.
Second, we used 48.98 million clinical notes from 558 133 patients derived from the WCM enterprise data warehouse. These notes from 2010 to 2023 comprise patients with any psychiatric diagnosis or antidepressant prescription from outpatient, inpatient, or ED visits. To align WCM data with the MSHS data, NLP development was restricted to psychiatric encounter notes. The study was approved by the Institutional Review Boards at MSHS and WCM.
SS and SI categories
In addition to the 2 coarse-grained categories (SS and SI), we sought to further classify these concepts into distinct fine-grained categories that uniquely impact health and well-being.38 The fine-grained categories are based on the seminal work of House39 and updated to include categories (eg, social networks) known to be relevant to health care outcomes.12 Our workgroup of clinical psychiatrists and psychologists, psychiatric epidemiologists, sociologists, and biomedical informaticians finalized the categories. There is an inherent degree of overlap and subjectivity between the fine-grained categories; for example, “lives with family” could conceivably be characterized as instrumental support, emotional support, social network, or perhaps a general category of SS. Therefore, to distinguish between categories we created an open-source annotation rule book39 to ensure consistency, transparency, and reproducibility (see Supplementary Material: Annotation Guideline). The final fine-grained SS subcategories included social network (eg, “goes to church”), emotional support (eg, “can talk about her problems”), instrumental support (eg, “home health aide”), and a general subcategory (eg, “patient has good social supports”), which is assigned when there is insufficient detail to ascertain a category. The final fine-grained SI subcategories included loneliness (eg, “feelings of loneliness”), no social network (eg, “socially isolated”), no emotional support (eg, “no one to confide in”), no instrumental support (eg, “homeless”), and a general subcategory (eg, “no social support”). The schema for SS and SI is presented in Figure 1.
Figure 1.
Overview of social support (SS) and social isolation (SI) categories, along with their corresponding fine-grained subcategories for annotation.
When defining the SS and SI fine-grained categories, the mirroring categories were often antithetical. For example, “friends” is in the lexicon for SS: social network, and “no friends” is in the lexicon for SI: no social network. Furthermore, when we fine-tuned the models with examples of “yes,” “no,” and “not relevant” labels, we leveraged this antithetical property. For example, “Pt disclosed to family members and friends” is a training example for SS: social network labeled as “yes” and SI: no social network labeled as “no.” However, a mix of these categories can be present even within the same encounter note for an individual patient. For example, a real note indicated that a patient “needed a place to stay,” indicating that this patient has SI: no instrumental support, but also (in the same note) that this patient “goes to events with his coworkers” which qualifies as SS: social network. According to our annotation rule book, both SS and SI were observed for this patient. For these reasons, when notes were evaluated at the document level, the coarse-grained options to describe the note were: SS: “yes”/“no” and, independently, SI: “yes”/“no.”
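The antithetical pairing described above can be sketched in a few lines of Python. This is an illustrative re-implementation of the labeling scheme, not code from the released pipeline; the category strings and function name are hypothetical.

```python
# Illustrative sketch: a sentence that is a positive ("yes") training example
# for an SS subcategory doubles as a negative ("no") example for its
# antithetical SI subcategory. Category names here are for illustration only.

MIRROR = {
    "SS: social network": "SI: no social network",
    "SS: emotional support": "SI: no emotional support",
    "SS: instrumental support": "SI: no instrumental support",
}

def paired_examples(sentence, ss_category):
    """Return (category, label) training pairs for one annotated sentence."""
    return [
        (ss_category, "yes"),
        (MIRROR[ss_category], "no"),
    ]

# "Pt disclosed to family members and friends" is "yes" for SS: social
# network and, by the antithetical property, "no" for SI: no social network.
pairs = paired_examples(
    "Pt disclosed to family members and friends", "SS: social network"
)
```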
Methods
Lexicon creation and expansion
Computational approaches to NLP tasks require annotated lexicons and gold-standard data.2 We collected lexicons for the fine-grained categories of SS and SI using an iterative method that combined manual chart review and semiautomatic methods.
Manual chart review
Zhu et al.27 developed a lexicon for identifying SI from clinical notes of patients with prostate cancer. This 24-term lexicon was initially selected; however, it yielded relatively few clinical notes at MSHS and WCM compared to the published report. A list of terms for each category was therefore created and extensively reviewed by the study team. The team then reviewed 50 notes from each site to identify new keywords for each SS and SI subcategory to enrich the existing lexicon.
Semiautomatic method
Each subcategory lexicon from the manual chart review as above was enhanced using word embeddings. First, the manually generated keywords/terms in each lexicon were vectorized using word2vec40 and combined into a single lexicon vector as shown in eqn (1):

v_L = (1/N) Σ_{i=1}^{N} w_i | (1) |

Here, v_L and w_i refer to the lexicon vector and the ith word vector from a lexicon, respectively, and N is the number of words in a lexicon. Then, the top 10 similar keywords were identified for the vector v_L using word2vec’s most_similar function. The word2vec model was trained on 1 million randomly selected clinical notes at WCM (separate from the notes otherwise involved in the pipeline) using the gensim package (https://radimrehurek.com/gensim/). The new set of keywords was manually reviewed and selected for the next round of vectorization and similarity matching. This process was repeated until the workgroup reached a consensus on the quality of the final set of lexicons.
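The lexicon-expansion step can be sketched in pure Python: average the word vectors into a lexicon centroid (eqn 1), then rank candidate vocabulary terms by cosine similarity to that centroid. The authors used gensim’s word2vec and its most_similar function; the 2-dimensional toy vectors below are invented for illustration.

```python
import math

def lexicon_vector(word_vectors):
    """Mean of the word vectors in a lexicon (eqn 1)."""
    n, dim = len(word_vectors), len(word_vectors[0])
    return [sum(v[i] for v in word_vectors) / n for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_similar(centroid, vocab, k=10):
    """Rank candidate vocabulary terms by similarity to the lexicon centroid
    (gensim's most_similar plays this role in the actual pipeline)."""
    return sorted(vocab, key=lambda w: cosine(centroid, vocab[w]), reverse=True)[:k]

# Toy example: two seed terms for an SI lexicon and two candidate words.
lex = lexicon_vector([[1.0, 0.0], [0.8, 0.2]])
vocab = {"isolated": [0.9, 0.1], "happy": [0.0, 1.0]}
candidates = top_similar(lex, vocab, k=1)
```

Candidates surfaced this way were then manually reviewed before the next iteration, as described above.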
Gold standard corpus development
To create gold-standard data for developing the NLP pipelines, we randomly selected 300 notes from 300 unique patients at MSHS and 225 notes from 221 unique patients at WCM for fine- and coarse-grained manual annotation. Notes were chosen from unique patients to maximize the contextual diversity of SS/SI terms (unique patients maximize exposure to different note writers and time periods and avoid the redundancy caused by copy-forward practices within a single patient’s EHR). To optimize the gold-standard annotation set of notes, those selected for review were enriched for mentions of SS and SI: 75 notes were selected that had at least 1 occurrence of an SI lexicon term, another 75 notes for SS, and finally 75 notes were randomly selected from the remainder of the underlying corpus. At MSHS, 75 additional notes were selected that contained a clinical note template to further enrich the annotation corpus for notes in which a clinician was prompted (by the template) to assess SS/SI.
The Brat Rapid Annotation Tool (BRAT)41 was used to annotate the notes manually with the same annotation configuration schema across sites. The annotation guideline (“rule book” and “annotation guideline” are used interchangeably) and lexicons are provided in Table S3. Initially, the annotations were performed at the entity level (every instance of a lexicon term in the note text) using BRAT. For evaluation, the entity-level annotations were converted to “document” (note) level. For example, if there was a single entity mentioning loneliness and 2 mentions of instrumental support in a given note, both the loneliness and instrumental support subcategories would be assigned to that note. Finally, the coarse-grained categories were assigned to each document using rules: SS (SI) was assigned to a document if there was 1 or more mention of any SS (SI) subcategory. A note that contained the entities above would be annotated with both SI (for loneliness) and SS (for instrumental support).
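The entity-to-document rollup described above can be sketched as follows. This is an illustrative re-implementation of the aggregation rules, with shortened subcategory names; it is not the released code.

```python
# Sketch of the entity-to-document conversion: every subcategory with at
# least one entity mention is assigned to the note, and a coarse label
# (SS or SI) is assigned if any of its subcategories is present.

SS_SUBCATS = {"social network", "emotional support", "instrumental support", "SS general"}
SI_SUBCATS = {"loneliness", "no social network", "no emotional support",
              "no instrumental support", "SI general"}

def document_labels(entity_mentions):
    """entity_mentions: list of subcategory strings annotated in one note."""
    subcats = set(entity_mentions)
    coarse = set()
    if subcats & SS_SUBCATS:
        coarse.add("SS")
    if subcats & SI_SUBCATS:
        coarse.add("SI")
    return subcats, coarse

# A note with one loneliness mention and two instrumental-support mentions
# gets both subcategories and, at the coarse level, both SI and SS.
subcats, coarse = document_labels(
    ["loneliness", "instrumental support", "instrumental support"]
)
```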
The notes were meticulously reviewed by 2 annotators, and disagreements were resolved by a third adjudicator to create the final gold-standard corpus. For coarse-grained annotation, interannotator agreement (IAA; Cohen’s kappa) was 0.92 (MSHS) and 0.86 (WCM); for fine-grained annotation, 0.77 (MSHS) and 0.81 (WCM). Counts of fine- and coarse-grained categories found in the gold-standard data are provided in Table S5.
The rule book was used to train the annotators and was continually updated during the adjudication process. Often, disagreeing annotations could both be seen as correct given the inherent subjectivity of the classification process; however, new rules were created to arrive at one consistent label for edge cases. Sometimes, rules were created for more practical reasons, for example, mentions of “psychotherapy” were excluded from emotional support because otherwise almost every note in the MSHS psychiatric corpus would be flagged. Of note, mentions were only labeled when SS/SI was explicit and not implied. For example, a mention of “boyfriend” (which might imply SS) or “living alone” (which might imply SI) would not be labeled as such unless more context was provided (eg, “feels very supported by boyfriend”; “does not know anyone in the city and lives alone”). The general subcategory became a “catch-all” for mentions that clearly involved support or isolation, but a single fine-grained category could not be discerned. For example, often clinical notes may say something like: “patient is experiencing social isolation,” but not give further details. At both institutions, the IAA was reflective of the subjective, overlapping nature of the fine-grained subcategories. Another reason for disagreements between annotators was the site-specific familiarity required to recognize acronyms and social services, for example, HASA stands for the HIV/AIDS Services Administration.
System description
We developed rule- and LLM-based systems to identify mentions of fine-grained categories in clinical notes. Rules were then used to translate sentence-level into note-level classifications and fine-grained into coarse-grained labels, as described in “Gold Standard Corpus Development”. The architecture of the 2 NLP systems is provided in Figure 2.
Figure 2.
Architecture of the rule- and large language model (LLM)-based NLP systems for identifying fine- and coarse-grained categories of SS/SI. For the LLM input, a single clinical note was sliced into multiple sentences due to the restriction of 512 tokens. The sentence-level fine-grained categories were combined to provide document-level fine-grained categories. Finally, the rules in “Gold Standard Corpus Development” were used to identify the coarse-grained categories from fine-grained categories.
Rule-based system
As noted above, a major advantage of the RBS is full transparency in how classification decisions are made. We implemented the system using the open-source spaCy Matcher (https://spacy.io/api/matcher).42 Additionally, we compiled a list of exclusion keywords (see Table S4) to refine the rules and reduce false-positive matches.
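A minimal, standalone sketch of lexicon matching with exclusion keywords is shown below. The actual system uses spaCy’s rule-based Matcher; this regex version only illustrates the idea, and the lexicon and exclusion terms here are invented examples rather than the released lists in Tables S3 and S4.

```python
import re

# Hypothetical one-subcategory lexicon and exclusion list, for illustration.
LEXICON = {
    "no instrumental support": [r"\bhomeless\b", r"\bno place to stay\b"],
}
EXCLUSIONS = [r"\bhomeless shelter volunteer\b"]  # a false-positive context

def match_subcategories(sentence):
    """Return the fine-grained subcategories matched in one sentence,
    unless an exclusion keyword fires first."""
    if any(re.search(p, sentence, re.I) for p in EXCLUSIONS):
        return set()
    return {cat for cat, pats in LEXICON.items()
            if any(re.search(p, sentence, re.I) for p in pats)}

hits = match_subcategories("Pt is currently homeless.")
```

The full transparency claimed for the RBS follows directly from this design: every match traces back to a named pattern or exclusion rule.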
Supervised models
Expanding on the published literature, we implemented support vector machines (SVMs) and Bidirectional Encoder Representations from Transformers (BERT) models at WCM to identify fine-grained categories. However, these models proved unsuitable owing to the sparsity of SS/SI mentions in the corpus (see Supplementary Material and Table S6).
At MSHS, SVMs and BERT models were not the best fit due to the nature of the data annotations. Unlike the WCM dataset, where mentions are sparse, the enriched inpatient and psychiatric emergency department notes at MSHS were filled with frequent mentions of certain categories. Because of this, manual annotators were trained to tag only the first occurrence of a category in each note, even if it appeared multiple times. For example, if the term “homeless” appeared 12 times in a note under the subcategory of no instrumental support, annotators would tag only the first instance. This annotation approach was adopted because we evaluated the NLP pipelines at the document (note) level.
However, this annotation method poses a challenge for training models like SVM or BERT, which typically learn from sentences rather than entire notes. In this case, some sentences containing the term “homeless” would not be labeled as no instrumental support, making it difficult for the model to learn the correct associations.
Large language models
We developed a semiautomated method to identify SS and SI using an open-source advanced fine-tuned LLM called “Fine-tuned LAnguage Net-T5 (FLAN-T5)”.43,44 Specifically, we used FLAN-T5 in a “question-answering” fashion to extract sentences from clinical texts with mentions of SS and SI subcategories. A separate fine-tuned model was created for each of the fine-grained categories.
Model selection
T5 has been used for other classification tasks in clinical notes, and the FLAN version of T5, which employs chain-of-thought (COT) prompting, does not require labeled training data.44 Five variants of FLAN-T5 are publicly available based on the number of model parameters (https://huggingface.co/docs/transformers/model_doc/flan-t5). Guevara et al.33 observed that FLAN-T5-XL (3B parameters) performed better than the smaller models (FLAN-T5-L, FLAN-T5-base, and FLAN-T5-small) with no significant improvement with the larger FLAN-T5-XXL (11B parameters). Previous experiments using the FLAN-T5-XXL model demonstrated only marginal improvements over the FLAN-T5-XL model, with a 1.5% gain in text ranking45 and a 2.9% improvement in question answering.44 Informed by these findings, we selected FLAN-T5-XL for experimentation in this study.
Zero-shot
Given that LLMs follow instructions and are trained on massive amounts of data, they do not necessarily require labeled training data.46 This “zero-shot” approach was performed by providing the model with an instruction, context, question, and choices (“yes,” “no,” or “not relevant”). An example is provided in Table 1. The choice “no” was selected for contexts that were negated (eg, “does not feel lonely”), and “not relevant” was chosen for those that did not pertain to the subcategory or the question.
Table 1.
Example of instruction, question, context, choices, and answer for fine-tuning the loneliness subcategory using FLAN-T5-XL model.
| Instruction | Read what the Clinician wrote about the patient in the Context and answer the Question by choosing from the provided Choices |
| Question | In the Clinician’s opinion, “does or did the patient experience feelings of loneliness?” |
| Choices | yes; no; not relevant |
| Context 1 | The Clinician wrote: “Pt continues to express feelings of loneliness.” |
| Answer | yes |
| Context 2 | The Clinician wrote: “He denies suffering from loneliness.” |
| Answer | no |
| Context 3 | The Clinician wrote: “Pt is currently homeless.” |
| Answer | not relevant |
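The Table 1 components can be assembled into a single input string for FLAN-T5-style question answering. The exact prompt layout used by the authors is in their repository; the concatenation below is an illustrative sketch.

```python
# Illustrative prompt assembly for one subcategory question (loneliness).
# The field order and separators are assumptions, not the authors' exact format.

def build_prompt(instruction, question, context, choices):
    return (
        f"Instruction: {instruction}\n"
        f"Question: {question}\n"
        f'Context: The Clinician wrote: "{context}"\n'
        f"Choices: {'; '.join(choices)}"
    )

prompt = build_prompt(
    "Read what the Clinician wrote about the patient in the Context and "
    "answer the Question by choosing from the provided Choices",
    "In the Clinician's opinion, does or did the patient experience "
    "feelings of loneliness?",
    "Pt continues to express feelings of loneliness.",
    ["yes", "no", "not relevant"],
)
# The prompt string would then be tokenized and passed to the seq2seq model,
# eg with Hugging Face transformers' tokenizer and model.generate.
```

Because clinical notes exceed the 512-token input limit (Figure 2), one such prompt is built per sentence and the answers are combined at the document level.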
Fine-tuning
Since the base (zero-shot) FLAN-T5-XL model (even with instructions) had poor F1-scores (see Table S8), the models were improved by fine-tuning them with synthetic examples33,47 that could help the model learn about the specific SS or SI subcategories. For each fine-grained category, about 50 yes, 50 no, and 50 not relevant examples were created. The synthetic examples themselves became a validation set to fine-tune the parameters. ChatGPT (GPT 4.0) (https://openai.com/blog/chatgpt) was used to help craft context examples, but ultimately after several iterations in the validation set, they were refined by the domain experts so that each example was specifically instructive about the inclusions and exclusions of the category. Examples of prompts for loneliness are provided in Table 1. All fine-tuning examples and questions for each subcategory are provided in the Supplementary Material and Table S7. Furthermore, giving the LLMs specific stepwise instructions to follow (“instruction tuning”) has been shown to improve the performance by reducing hallucinations.48,49 Therefore, we added instructions as a part of the prompt.
Parameters
Previously, the parameter-efficient Low-Rank Adaptation (LoRA) fine-tuning method was used with FLAN-T5 models to identify SDOH categories.33 However, the newer Infused Adapter by Inhibiting and Amplifying Inner Activations ((IA)3) method was selected for its better performance.50 We fine-tuned the models for 15-20 epochs. Fine-tuning parameters can be viewed in our code repository (https://github.com/CornellMHILab/Social_Support_Social_Isolation_Extraction).
Evaluation
All evaluations were performed at the note level for both the fine- and coarse-grained categories. To validate the NLP systems, precision, recall, and macroaveraged F1-score were calculated; macroaveraging gives equal weight to each class regardless of the number of instances (see Supplementary Material: Evaluation Metric). Instances of the emotional support and no emotional support subcategories were rare in the underlying notes (see Table S5 for full counts), and therefore accuracy for these subcategories could not be assessed. The results of the rule-based system were analyzed by gender, race, and ethnicity to assess potential bias.
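The macroaveraged metric can be made concrete with a short worked sketch: per-class precision, recall, and F1 are computed independently and then averaged with equal weight per class, so rare subcategories count as much as common ones. The counts below are invented for illustration.

```python
# Per-class precision/recall/F1 from true-positive, false-positive, and
# false-negative counts, then an unweighted (macro) average across classes.

def prf(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def macro_f1(per_class_counts):
    """per_class_counts: {class: (tp, fp, fn)} at the note level."""
    scores = [prf(*counts)[2] for counts in per_class_counts.values()]
    return sum(scores) / len(scores)

# A common class with perfect predictions and a rare class with none
# average to 0.5 -- the rare class is not drowned out by the common one.
score = macro_f1({
    "loneliness": (90, 0, 0),
    "no emotional support": (0, 1, 1),
})
```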
Results
Demographics
The demographic characteristics of patients within the annotated cohort are detailed in Table 2. Notably, the patient composition at MSHS was younger and more diverse as compared to patients at WCM.
Table 2.
Patient demographics from annotated data (# of patients [%]).
| Characteristics | MSHS | WCM |
|---|---|---|
| Age | ||
| <18 | 34 (9.04) | 20 (9.05) |
| 18-39 | 145 (38.56) | 39 (17.65) |
| 40-59 | 120 (31.91) | 82 (37.10) |
| ≥60 | 77 (20.48) | 80 (36.20) |
| Sex | ||
| Female | 194 (51.60) | 114 (51.58) |
| Male | 182 (48.40) | 107 (48.42) |
| Race | ||
| White | 86 (28.67) | 88 (39.82) |
| Black or AA | 112 (37.33) | 36 (16.29) |
| Asian | 7 (2.33) | 5 (2.26) |
| Other | 95 (31.67) | 50 (22.62) |
| Unknown | — | 42 (19.00) |
| Ethnicity | ||
| Hispanic | 69 (23.00) | 35 (15.85) |
| Non-Hispanic | 190 (63.33) | 186 (84.16) |
| Unknown | 41 (13.67) | — |
Abbreviations: AA, African American; MSHS, Mount Sinai Health System; WCM, Weill Cornell Medicine.
System performance
Analysis was conducted using the gold-standard, manually annotated data. The macroaveraged precision, recall, and F1-scores for classifying fine- and coarse-grained SS and SI categories at note level are provided in Table 3.
Table 3.
Macroaveraged precision (P), recall (R), and F1-scores (F) of different NLP pipelines for fine- and coarse-grained category classification.
|  | Categories | Rule |  |  | FLAN-T5-XL |  |  |
|---|---|---|---|---|---|---|---|
|  |  | P | R | F | P | R | F |
| MSHS | |||||||
| SS | Soc. network | 0.84 | 0.83 | 0.84 | 0.66 | 0.70 | 0.60 |
| Emotional | 0.49 | 0.49 | 0.49 | — | — | — | |
| Instrumental | 0.94 | 0.91 | 0.92 | 0.60 | 0.64 | 0.43 | |
| General | 0.86 | 0.82 | 0.83 | 0.65 | 0.69 | 0.57 | |
| SI | Loneliness | 0.96 | 0.97 | 0.97 | 0.65 | 0.89 | 0.69 |
| No soc. network | 0.92 | 0.94 | 0.93 | 0.63 | 0.83 | 0.65 | |
| No emotional | 1.00 | 1.00 | 1.00 | — | — | — | |
| No instrumental | 0.97 | 0.91 | 0.94 | 0.63 | 0.76 | 0.55 | |
| General | 0.93 | 0.92 | 0.93 | 0.77 | 0.83 | 0.79 | |
| Average | 0.89 | 0.91 | 0.90 | 0.65 | 0.76 | 0.62 | |
| SS | 0.85 | 0.85 | 0.84 | 0.75 | 0.61 | 0.55 | |
| SI | 0.95 | 0.95 | 0.95 | 0.80 | 0.76 | 0.72 | |
| Average | 0.90 | 0.89 | 0.89 | 0.78 | 0.69 | 0.65 | |
| WCM | |||||||
| SS | Soc. network | 0.83 | 0.82 | 0.82 | 0.81 | 0.83 | 0.82 |
| Emotional | 0.62 | 0.80 | 0.60 | — | — | — | |
| Instrumental | 0.81 | 0.75 | 0.75 | 0.82 | 0.81 | 0.80 | |
| General | 0.93 | 0.88 | 0.90 | 0.77 | 0.84 | 0.77 | |
| SI | Loneliness | 0.99 | 0.95 | 0.97 | 0.84 | 0.93 | 0.88 |
| No soc. network | 0.93 | 0.69 | 0.74 | 0.81 | 0.88 | 0.75 | |
| No emotional | 0.98 | 0.62 | 0.69 | — | — | — | |
| No instrumental | 0.93 | 0.83 | 0.87 | 0.76 | 0.81 | 0.77 | |
| General | 0.94 | 0.89 | 0.91 | 0.81 | 0.80 | 0.80 | |
| Average | 0.84 | 0.81 | 0.82 | 0.80 | 0.85 | 0.81 | |
| SS | 0.77 | 0.73 | 0.74 | 0.86 | 0.79 | 0.81 | |
| SI | 0.94 | 0.94 | 0.94 | 0.82 | 0.82 | 0.82 | |
| Average | 0.86 | 0.85 | 0.85 | 0.84 | 0.81 | 0.82 | |
Here, we used fine-tuning and instruction for the FLAN-T5-XL model. The highest scores for individual categories are underlined.
Abbreviations: FLAN-T5, Fine-tuned LAnguage Net-T5; MSHS, Mount Sinai Health System; SI, social isolation; SS, social support; WCM, Weill Cornell Medicine.
At MSHS, the RBS outperformed the LLM-based system for both fine- and coarse-grained classification. For the fine-grained categories, the RBS achieved macroaveraged F1-score of 0.90, compared to 0.62 for the LLM. For coarse-grained classification, the RBS had macroaveraged F1-score of 0.89 versus 0.65 for the LLM.
At WCM, the RBS also narrowly outperformed the LLM for fine-grained classification, with macroaveraged F1-scores of 0.82 versus 0.81, respectively. Coarse-grained performance was similar, with a macroaveraged F1-score of 0.85 for the RBS compared to 0.82 for FLAN-T5-XL. The performance of the zero-shot FLAN-T5-XL model is provided in Table S8.
Demographic comparison
When assessed by gender, race, and ethnicity, there were no statistically significant differences in performances between groups for RBS at either site (Table S9).
Comparison to ICD codes
There were zero visits associated with the annotated clinical notes where SI was captured by the structured ICD codes (ICD-10: “Z60.2,” “Z60.4,” “Z60.9”; ICD-9: “V60.3,” “V62.4” [see Table S10]). Without the NLP pipelines, the presence of SI in the EHRs would not be captured at either site. No corresponding ICD codes exist for SS, making a parallel comparison impossible.
Discussion
This study presents rule- and LLM-based NLP systems to identify fine-grained categories of SS and SI in clinical notes of psychiatric patients. A primary goal of the study was to develop and validate two portable and open-source NLP systems. Given that none of the selected clinical notes were associated with SI-related ICD codes, the development of the NLP systems enabled the identification and subcategorization of this known risk factor.
The comparative accuracy was initially unexpected, given that LLMs typically outperform RBSs on related tasks.34 Manual review of the results revealed that the rule- and LLM-based approaches solved the task in different ways, both of which could be considered valid. However, these differences are not reflected in the performance metrics, highlighting the importance of qualitative review of the results by domain experts. The RBS appears to have performed substantially better than the LLM because the RBS most closely mirrors the manual annotation rule book (and the resulting gold-standard annotations treated as ground truth when evaluating system performance). Indeed, the rule book and the lexicons were developed together by domain experts, with the goal that the lexicon-based approach approximate the rules as closely as possible.
Furthermore, the gold-standard annotations and the RBS assign a single label to each SI/SS occurrence, whereas the LLM system can assign multiple labels. This is a consequence of having a separate fine-tuned LLM for each SI and SS subcategory. Future work is warranted to improve model accuracy when adapting chain-of-thought (CoT) question answering for multilabel classification tasks.33 Another difference is that the rule book and lexicons took a conservative approach, assigning a label only if the concept was explicit, whereas the LLM was more flexible. For example, “she feels depressed and suicidal because she has no friends and no boyfriend” was labeled by both the RBS and the gold-standard annotation as no social network, because having no friends is in the lexicon and rule book. In contrast, the LLM inferred both no social network and loneliness based on a more nuanced understanding of the entire context.
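The conservative, single-label behavior of the RBS can be sketched as a first-match lookup over category lexicons. The lexicon terms and category names below are illustrative placeholders, not the study’s actual lexicons:

```python
# Hypothetical mini-lexicons for two SI subcategories (illustration only).
LEXICONS = {
    "no social network": ["no friends", "no boyfriend", "no girlfriend"],
    "loneliness": ["feels lonely", "loneliness"],
}

def rbs_label(sentence):
    """Return the first category whose lexicon term appears verbatim,
    mirroring the RBS's explicit-match, single-label-per-occurrence rule."""
    text = sentence.lower()
    for category, terms in LEXICONS.items():
        if any(term in text for term in terms):
            return category  # one label only; no contextual inference
    return None

s = "she feels depressed and suicidal because she has no friends and no boyfriend"
print(rbs_label(s))  # → 'no social network'
```

Unlike the LLM, which inferred loneliness from the same sentence, this matcher stops at the first explicit lexicon hit, which is precisely why the RBS agrees so closely with the rule-book-driven gold standard.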
Comparing efficiency in developing the RBS and LLMs, we assumed that the LLMs would require less manual input, given that FLAN-T5 is a “few-shot” model requiring little to no labeled training data.46 However, without fine-tuning, the model performed poorly and therefore required fine-tuning with synthetic examples. The iterative validation process revealed that strategic tuning examples were required to coerce the LLM to override its colloquial understanding of the categories in favor of the task-specific definitions. Still, there were some concepts that the LLM would not unlearn during fine-tuning, even with a higher learning rate. For example, in the no instrumental support model, the synthetic example “doesn’t have a lot of spending money on hand to engage in the activities she would like” continued to be labeled “yes” rather than the correct “not relevant” label. Conversely, there were many cases where the LLM correctly identified the presence of SI/SS that were neither identified by the RBS nor present in the gold-standard annotations as dictated by the rule book. For example, “Lived in an Assisted Living facility for a year” and “Pt hasn’t been in touch with her family” would otherwise have been missed. For a more comprehensive qualitative comparison, see Table S11.
The performance of both the RBS and LLMs was relatively poor in identifying instrumental support. This is likely because the keywords often contain site-specific names and entities such as “HASA” and “Lenox Hill Neighborhood House,” to name a couple of the many examples. Another area where both approaches fell short was in the plan section of the psychiatric clinical note. Part of the clinical plan might be to increase social connectedness. The manual annotators were easily able to understand that context whereas both systems generated false-positive labels. Further erroneous examples from the rule- and LLM-based systems are provided in Table S11.
The RBS performed comparably at MSHS and WCM; however, the LLM performed better at WCM than at MSHS. This is likely related to 2 key differences. The first is the higher frequency of SS/SI mentions at MSHS (eg, 75.3% of notes at MSHS had a manually annotated mention of SS versus 52.2% at WCM). A related difference is that, in addition to SS/SI mentions fitting the inclusion criteria for manual annotation, MSHS also had more mentions of SS/SI concepts that were not discovered by the RBS but were identified by the LLM. These differences likely reflect the MSHS corpus being drawn from clinical care sites (such as inpatient psychiatry) whose comprehensive psychiatric evaluations systematically include more SDOH information overall.
The work described in this article expands on the body of literature by specifically focusing on the fine-grained classification of SS and SI, a novel approach not undertaken by earlier studies. Specifically, Guevara et al.33 utilized LLM-based classification for SS and adverse SS (what we refer to as SI). Their best F1-scores across 154 test documents were 0.60 (FLAN-T5-XXL) for SS and 0.56 (FLAN-T5-XL) for adverse SS. Zhu et al.27 deployed 24 lexicons and the Linguamatics I2E NLP tool to identify SI, achieving an F1-score of 0.93 on 194 clinical notes. Our study presents superior outcomes across 2 sites and with more specific (fine-grained) categories.
Limitations
Several limitations should be noted. First, there were insufficient instances of the emotional support subcategories in the notes to evaluate those NLP systems. Emotional support (and lack thereof) is an important and distinct fine-grained category that would ideally be identified in the notes. Second, the RBS was designed with specific lexicons derived from manual review at MSHS and WCM, and may therefore be overfit, leading to inflated F1-scores. It would be beneficial to validate these NLP systems on clinical notes from different EHR systems. Third, other health care systems implementing a lexicon-based rules approach will need to perform site-specific template removal to avoid false positives. With fine-tuning, the LLM approach may have been able to interpret the templates correctly; however, because the templates were removed from the notes before the annotation process, this was not assessed. In the future, we will compare clinician responses in the templates to the gold-standard annotations as well as system outputs. Finally, FLAN-T5-XL was the most appropriate choice for this use case, based on the parameters and goals of the investigation. However, the generalizability of using LLMs to differentiate subjective descriptors of SS and SI is limited, as only 1 LLM was tested. Future research will not only compare the performance of additional LLMs but also assess their acceptability.
Conclusion
We offer 2 open-source NLP systems with different approaches as well as a manual annotation guideline for identifying SS and SI. The rule-based approach and the LLM approach each have strengths and limitations in performing this challenging task of creating a portable system to identify fine- and coarse-grained categories of SS and SI in the notes of psychiatric patients.
Contributor Information
Braja Gopal Patra, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA.
Lauren A Lepow, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Praneet Kasi Reddy Jagadeesh Kumar, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA.
Veer Vekaria, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA.
Mohit Manoj Sharma, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA.
Prakash Adekkanattu, Information Technologies and Services, Weill Cornell Medicine, New York, NY 10065, USA.
Brian Fennessy, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Gavin Hynes, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Isotta Landi, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Jorge A Sanchez-Ruiz, Department of Psychiatry and Psychology, Mayo Clinic, Rochester, MN 55905, USA.
Euijung Ryu, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA.
Joanna M Biernacka, Department of Psychiatry and Psychology, Mayo Clinic, Rochester, MN 55905, USA; Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA.
Girish N Nadkarni, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Ardesheer Talati, Department of Psychiatry, Columbia University Vagelos College of Physicians and Surgeons, New York, NY 10032, USA; New York State Psychiatric Institute, New York, NY 10032, USA.
Myrna Weissman, Department of Psychiatry, Columbia University Vagelos College of Physicians and Surgeons, New York, NY 10032, USA; New York State Psychiatric Institute, New York, NY 10032, USA.
Mark Olfson, Department of Psychiatry, Columbia University Vagelos College of Physicians and Surgeons, New York, NY 10032, USA; New York State Psychiatric Institute, New York, NY 10032, USA; Columbia University Irving Medical Center, New York, NY 10032, USA.
J John Mann, New York State Psychiatric Institute, New York, NY 10032, USA; Columbia University Irving Medical Center, New York, NY 10032, USA.
Yiye Zhang, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA.
Alexander W Charney, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Jyotishman Pathak, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA; Department of Psychiatry, Weill Cornell Medicine, New York, NY 10065, USA.
Author contributions
Braja Gopal Patra and Lauren A. Lepow worked on planning, data curation, data annotation, data extraction, software development, writing - original draft, and writing - review & editing. Praneet Kasi Reddy Jagadeesh Kumar worked on LLM implementation and editing. Veer Vekaria helped in data annotation, writing - review & editing. Mohit Manoj Sharma, Brian Fennessy, and Gavin Hynes helped in data annotation. Jyotishman Pathak and Prakash Adekkanattu helped in planning, advising, and writing - review & editing. Myrna Weissman, Mark Olfson, and J. John Mann helped in initial category identification, lexicon development, advising, and writing - review & editing. Isotta Landi, Jorge A. Sanchez-Ruiz, Euijung Ryu, Joanna M. Biernacka, Girish N. Nadkarni, Ardesheer Talati, Yiye Zhang, and Alexander W. Charney helped in advising and writing - review & editing. Jyotishman Pathak and Alexander W. Charney provided resources and funding acquisition.
Supplementary material
Supplementary material is available at Journal of the American Medical Informatics Association online.
Funding
This study was funded in part by grants from the National Institutes of Health (R01 MH119177, R01 MH121907, R01 MH121921, R01 MH121922, R01 MH121923, and R01 MH121924). This work was supported in part through the computational and data resources and staff expertise provided by Scientific Computing and Data at the Icahn School of Medicine at Mount Sinai Health System and supported by the Clinical and Translational Science Awards (CTSA) grant UL1TR004419 from the National Center for Advancing Translational Sciences. The research reported in this publication was also supported by the Office of Research Infrastructure of the National Institutes of Health under award numbers S10OD026880 and S10OD030463. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Conflicts of interest
None declared.
Data availability
All code for the rule- and LLM-based systems is available on GitHub (https://github.com/CornellMHILab/Social_Support_Social_Isolation_Extraction).
References
- 1. World Health Organization. Social determinants of health. Accessed October 29, 2023. https://www.who.int/health-topics/social-determinants-of-health#tab=tab_1
- 2. Patra BG, Sharma MM, Vekaria V, et al. Extracting social determinants of health from electronic health records using natural language processing: a systematic review. J Am Med Inform Assoc. 2021;28:2716-2727.
- 3. Polikandrioti M. Perceived social isolation in heart failure. J Innov Card Rhythm Manag. 2022;13:5041-5047.
- 4. Zhu W, Wu Y, Zhou Y, et al. Living alone and clinical outcomes in patients with heart failure with preserved ejection fraction. Psychosom Med. 2021;83:470-476.
- 5. Lin T-K, Hsu B-C, Li Y-D, et al. The impact of sources of perceived social support on readmissions in patients with heart failure. J Psychosom Res. 2022;154:110723.
- 6. Olano-Lizarraga M, Wallström S, Martín-Martín J, Wolf A. Causes, experiences and consequences of the impact of chronic heart failure on the person’s social dimension: a scoping review. Health Soc Care Comm. 2022;30:e842-e858.
- 7. Cené CW, Beckie TM, Sims M, et al. Effects of objective and perceived social isolation on cardiovascular and brain health: a scientific statement from the American Heart Association. J Am Heart Assoc. 2022;11:e026493.
- 8. Naito R, McKee M, Leong D, et al. Social isolation as a risk factor for all-cause mortality: systematic review and meta-analysis of cohort studies. PLoS One. 2023;18:e0280308.
- 9. US Surgeon General. Our epidemic of loneliness and isolation: the US Surgeon General’s advisory on the healing effects of social connection and community. 2023. https://www.hhs.gov/sites/default/files/surgeon-general-social-connection-advisory.pdf
- 10. Cacioppo JT, Cacioppo S. Social relationships and health: the toxic effects of perceived social isolation. Soc Personal Psychol Compass. 2014;8:58-72.
- 11. Kim MH, An JH, Lee HR, Jeong SH, Hwang SJ, Hong JP. Social isolation, loneliness and their relationships with mental health status in South Korea. Psychiatry Investig. 2021;18:652-660.
- 12. Wickramaratne PJ, Yangchen T, Lepow L, et al. Social connectedness as a determinant of mental health: a scoping review. PLoS One. 2022;17:e0275004.
- 13. Czaja SJ, Moxley JH, Rogers WA. Social support, isolation, loneliness, and health among older adults in the PRISM randomized controlled trial. Front Psychol. 2021;12:728658.
- 14. Lamblin M, Murawski C, Whittle S, Fornito A. Social connectedness, mental health and the adolescent brain. Neurosci Biobehav Rev. 2017;80:57-68.
- 15. Holt-Lunstad J. Social connection as a critical factor for mental and physical health: evidence, trends, challenges, and future implications. World Psychiatry. 2024;23:312-332.
- 16. Purcell C, Dibben G, Hilton Boon M, et al. Effectiveness of social network interventions to support cardiac rehabilitation and secondary prevention in the management of people with heart disease. Cochrane Database Syst Rev. 2021;6:CD013820.
- 17. Freak-Poli R, Ryan J, Neumann JT, et al. Social isolation, social support and loneliness as predictors of cardiovascular disease incidence and mortality. BMC Geriatr. 2021;21:711-714.
- 18. DiMatteo MR. Social support and patient adherence to medical treatment: a meta-analysis. Health Psychol. 2004;23:207.
- 19. DiMatteo MR. Variations in patients’ adherence to medical recommendations: a quantitative review of 50 years of research. Med Care. 2004;42:200-209.
- 20. Robles TF, Kiecolt-Glaser JK. The physiology of marriage: pathways to health. Physiol Behav. 2003;79:409-416.
- 21. Uchino BN. Social support and health: a review of physiological processes potentially underlying links to disease outcomes. J Behav Med. 2006;29:377-387.
- 22. Hostinar CE, Sullivan RM, Gunnar MR. Psychobiological mechanisms underlying the social buffering of the hypothalamic–pituitary–adrenocortical axis: a review of animal models and human studies across development. Psychol Bull. 2014;140:256-282.
- 23. Holt-Lunstad J. The potential public health relevance of social isolation and loneliness: prevalence, epidemiology, and risk factors. Public Policy Aging Rep. 2017;27:127-130.
- 24. Trucco EM, Fava NM, Villar MG, Kumar M, Sutherland MT. Social isolation during the COVID-19 pandemic impacts the link between child abuse and adolescent internalizing problems. J Youth Adolesc. 2023;52:1313-1324.
- 25. Thakur H, Stutts M, Choi JW, Temple JR, Cohen JR. Adolescent loneliness during the COVID-19 pandemic: the role of pre-pandemic risk factors. Child Indic Res. 2023;16:617-639.
- 26. Truong HP, Luke AA, Hammond G, Wadhera RK, Reidhead M, Joynt Maddox KE. Utilization of social determinants of health ICD-10 Z-codes among hospitalized patients in the United States, 2016-2017. Med Care. 2020;58:1037-1043.
- 27. Zhu VJ, Lenert LA, Bunnell BE, Obeid JS, Jefferson M, Halbert CH. Automatically identifying social isolation from clinical narratives for patients with prostate cancer. BMC Med Inform Decis Mak. 2019;19:1-9.
- 28. Kharrazi H, Anzaldi LJ, Hernandez L, et al. The value of unstructured electronic health record data in geriatric syndrome case identification. J Am Geriatr Soc. 2018;66:1499-1507.
- 29. Greenwald JL, Cronin PR, Carballo V, Danaei G, Choy G. A novel model for predicting rehospitalization risk incorporating physical function, cognitive status, and psychosocial support using natural language processing. Med Care. 2017;55:261-266.
- 30. Bhatt S, Johnson PC, Markovitz NH, et al. The use of natural language processing to assess social support in patients with advanced cancer. Oncologist. 2023;28:165-171.
- 31. Navathe AS, Zhong F, Lei VJ, et al. Hospital readmission and social risk factors identified from physician notes. Health Serv Res. 2018;53:1110-1136.
- 32. Wang L, Lakin J, Riley C, Korach Z, Frain LN, Zhou L. Disease trajectories and end-of-life care for dementias: latent topic modeling and trend analysis using clinical notes. AMIA Annu Symp Proc. 2018;2018:1056.
- 33. Guevara M, Chen S, Thomas S, et al. Large language models to identify social determinants of health in electronic health records. NPJ Digit Med. 2024;7:6.
- 34. Lybarger K, Bear Don’t Walk OJ, Yetisgen M, Uzuner Ö. Advancements in extracting social determinants of health information from narrative text. J Am Med Inform Assoc. 2023;30:1363-1366.
- 35. Barrera M Jr. Distinctions between social support concepts, measures, and models. Am J Comm Psychol. 1986;14:413-445.
- 36. Ge L, Yap CW, Ong R, Heng BH. Social isolation, loneliness and their relationships with depressive symptoms: a population-based study. PLoS One. 2017;12:e0182145.
- 37. Masi CM, Chen H-Y, Hawkley LC, Cacioppo JT. A meta-analysis of interventions to reduce loneliness. Pers Soc Psychol Rev. 2011;15:219-266.
- 38. Heaney CA, Israel BA. Social networks and social support. Health Behav Health Educ: Theory, Res Pract. 2008;4:189-210.
- 39. House JS. Social support and social structure. Sociol Forum. 1987;2:135-146.
- 40. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. Adv Neural Inform Process Syst. 2013;26:1-9.
- 41. NLPLAB. Brat rapid annotation tool. Accessed October 2021. https://brat.nlplab.org
- 42. Honnibal M, Montani I, Van Landeghem S, Boyd A. spaCy: industrial-strength natural language processing in Python. Zenodo. https://spacy.io/
- 43. Raffel C, Shazeer N, Roberts A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res. 2020;21:5485-5551.
- 44. Chung HW, Hou L, Longpre S, et al. Scaling instruction-finetuned language models. arXiv, arXiv:2210.11416, 2022:1-22, preprint: not peer reviewed.
- 45. Qin Z, Jagerman R, Hui K, et al. Large language models are effective text rankers with pairwise ranking prompting. In: Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics; 2024:1504-1518.
- 46. Chae Y, Davidson T. Large language models for text classification: from zero-shot learning to fine-tuning. Open Science Foundation. 2023.
- 47. Li Z, Zhu H, Lu Z, Yin M. Synthetic data generation with large language models for text classification: potential and limitations. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics; 2023:10443-10461.
- 48. Ouyang L, Wu J, Jiang X, et al. Training language models to follow instructions with human feedback. Adv Neural Inform Process Syst. 2022;35:27730-27744.
- 49. Zhou W, Zhang S, Poon H, Chen M. Context-faithful prompting for large language models. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics; 2023:14544-14556.
- 50. Liu H, Tam D, Muqeeth M, et al. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. Adv Neural Inform Process Syst. 2022;35:1950-1965.