Abstract
Background:
Patients with frailty have a higher risk of major postoperative mortality and morbidity. Identifying frailty from the medical record, however, is not straightforward since it is a multifactorial state based on multiple organ systems and a sum of factors accumulated over time. The objective of this study was to develop a large language model-based binary classifier using accurately phenotyped datasets to identify preoperative frailty from clinical notes.
Methods:
We trained various large language models to identify frailty from anesthesia preoperative clinic notes. There were two development datasets used: (1) patients undergoing spine surgery whose frailty was characterized by patient responses to the Vulnerable Elders-13 Survey (VES-13); and (2) patients undergoing surgery whose frailty was characterized by their calculated electronic frailty index (eFI) score.
Results:
When trained on our VES-13 development set and tested on our VES-13 validation set, the area under the receiver operating characteristics curve (AUC) for the RoBERTa, BERT, BioBERT, and PubMedBERT models was 0.99, 0.64, 0.67, and 0.73, respectively. When tested on the eFI validation set, the AUCs were 0.63, 0.83, 0.87, and 0.87, respectively. Models trained on the eFI development dataset did not discriminate frailty adequately when tested on the VES-13 validation set.
Conclusion:
We report the development and validation of a classifier that detects older adults at risk for preoperative frailty from preoperative anesthesia clinical notes. Large language models can be used to accurately identify a difficult-to-quantify and multifactorial characteristic such as frailty in patients by using readily available unstructured information from clinical notes.
Keywords: artificial intelligence, frailty, large language models, surgery
1 |. Introduction
Older adults with frailty have a higher risk of major postoperative mortality and morbidity [1–7], longer hospital lengths of stay, hospital readmissions, and increased total healthcare utilization [8–11]. The prevalence of frailty, which may vary based on the measuring tool and population studied, has been reported to be 4.2%–24% [12–15] in the general population. Frailty is a concept that is based on a patient’s physiological reserve and functional status. In 2011, an international panel of experts agreed that frailty is a multidimensional construct consisting of domains related to physical performance, gait speed, mobility, nutritional status, mental health, and cognition [16]. The decline in physiological reserve associated with frailty may place these patients at increased risk of mortality within 6 months–5 years [17–19]. Thus, timely identification of frailty is crucial as it may allow opportunities for preoperative interventions that could optimize patients for surgery and improve outcomes. Examples of preoperative interventions include exercise programs (e.g., high intensity interval and endurance training, functional exercises), nutritional support, psychological support, polypharmacy management, and comorbidity optimization in collaboration with geriatricians and allied health professionals [20, 21].
Identifying which older adults are at risk of frailty, however, is not straightforward since it is a multifactorial state involving multiple organ systems and a sum of factors accumulated over time. There is no single structured data point (e.g., diagnosis code) assigned to frailty and, thus, it is difficult to capture retrospectively. Models have been developed to measure frailty, but each has strengths and weaknesses, and agreement between measures varies widely [22]. There are several validated instruments which measure frailty, including questionnaires [17, 23–25] and metrics calculated based on diagnosis codes and medications [26–28]. Examples of questionnaires include the Vulnerable Elders-13 Survey (VES-13) [29], 70-item frailty index [24], and Research and Development 36-Item Health Survey [23]. The VES-13 is a validated questionnaire that may measure frailty, but requires every patient to complete a survey of 13 items asking questions related to age, overall health, physical activities, and activities of daily living. While these survey questionnaires are sensitive and specific to characterizing frailty, streamlining the data collection process may be resource-intensive in institutions with high surgical volume and limited preoperative providers available to gather data. Thus, measurements for frailty using structured data points that may be more easily and automatically captured from the electronic health record (EHR) have been described, including the electronic frailty index (eFI) [30]. The eFI was originally developed in the United Kingdom and included 36 deficits from the EHR to estimate a patient’s frailty status [31]. To account for differences between a unified health system like the National Health Service and the multiple health systems in the United States, where patient data is likely to be more fragmented, an adapted eFI using United States patient data was developed consisting of 54 elements [24]. 
We included the adapted eFI (54 elements), rather than the original 36-element version, in our project to account for these differences. The elements in the eFI include various comorbidities, functional status, laboratory and physical exam findings, and are items which may be captured automatically from the EHR rather than directly obtaining data from the patient via an actual physical exam and questionnaire [30]. While this class of frailty indices may allow for automated screening using the EHR and not depend on the administration of physical questionnaires, they are not direct measurements of frailty.
Large language models are a type of artificial intelligence model built on machine learning techniques trained on vast amounts of data and used to understand and generate text. Since frailty is not simply a diagnosis code, the objective of our study was to develop a large language model-based binary classifier, using accurately phenotyped datasets, to identify frailty preoperatively. Furthermore, we planned to test the classifiers in identifying patients who were frail based on the eFI score. We hypothesized that these large language model-based binary classifiers would accurately identify frailty based on preoperative clinical notes.
2 |. Methods
2.1 |. Study Design
In this retrospective observational study, the objective was to develop a large language model-based binary classifier for identifying frailty from preoperative anesthesia clinic notes in patients undergoing spine surgery. The dataset was de-identified and did not contain sensitive patient health information as defined by the institutional Human Research Protections Program; the study was therefore exempt from the informed consent requirement and was approved by our institutional review board. We followed the guidelines provided by the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD).
2.2 |. Study Population and Identification of Frailty
At our institution, the Anesthesia Preparedness Clinic medically evaluates surgical patients prior to surgery. Surgical patients are seen by nurse practitioners or anesthesia residents, who are supervised by an anesthesiology attending. The purpose of the clinic is to perform a comprehensive history and physical and determine if further consultations or medical optimization are required prior to surgery.
For this study, we utilized large language models to develop a binary classifier model that processed anesthesia preoperative clinic history and physical notes to identify frailty in surgical patients. The surgical population underwent major spine surgery, which was defined as multi-level (more than two levels) surgeries involving combinations of laminectomy, fusion, laminotomy, discectomy, vertebroplasty, and/or spinal decompression surgery. We developed and validated a classifier model on two datasets, each containing unique patients, that were labeled as frail based on either the VES-13 or the eFI. The VES-13 dataset consisted of older adults aged ≥ 65 years who underwent major spine surgery and who completed the VES-13 survey during their clinic appointment. Patients were considered frail if they received a score ≥ 3 on the VES-13 survey (binary classification). The eFI dataset included a separate set of patients who also underwent major spine surgery. Inclusion criteria for the eFI dataset were all patients who underwent complex spine surgery at our institution over 1 year and were not included in the first dataset. Frailty in the eFI dataset was characterized by each patient’s eFI, calculated as described in a previous study [1]. The eFI is derived from 54 elements that include various comorbidities, functional status, laboratory, and physical exam findings and are items that are captured automatically from the EHR (rather than directly obtained from the patient via an actual physical exam and questionnaire) [30]. Those with a score ≥ 0.3 were determined to have preoperative frailty.
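The two labeling rules described above reduce to simple threshold checks. The following is a minimal sketch of that binarization, not the authors' code; the function names and the example scores are hypothetical, and only the two cutoffs (VES-13 ≥ 3, eFI ≥ 0.3) come from the text.

```python
# Cutoffs quoted in the text: VES-13 score >= 3 and eFI score >= 0.3.
VES13_CUTOFF = 3
EFI_CUTOFF = 0.3


def label_ves13(score: int) -> int:
    """Return 1 (frail) if the VES-13 score meets the cutoff, else 0."""
    return int(score >= VES13_CUTOFF)


def label_efi(score: float) -> int:
    """Return 1 (frail) if the eFI score meets the cutoff, else 0."""
    return int(score >= EFI_CUTOFF)


# Hypothetical scores, for illustration only (not study data).
ves13_scores = [0, 2, 3, 7]
efi_scores = [0.05, 0.29, 0.30, 0.41]

ves13_labels = [label_ves13(s) for s in ves13_scores]
efi_labels = [label_efi(s) for s in efi_scores]
print(ves13_labels)  # [0, 0, 1, 1]
print(efi_labels)    # [0, 0, 1, 1]
```

Because the two instruments use different scales and measure frailty differently, the same patient could in principle receive different labels under each rule.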
2.3 |. Large Language Models and Statistical Approach
The VES-13 and eFI datasets were split 50%/25%/25% for training/validation/testing for model development, hyperparameter tuning, and validation of model performance, respectively. We refer to the training set as the development set and the test set as the validation set moving forward. Inputs to the large language model-based classifiers were the Anesthesia Preparedness Clinic history and physical notes associated with the surgery. All patients had a preoperative anesthesia note, as this was a requirement at our institution (thus, there were no missing notes). To account for the token limits for large language model inputs, we extracted textual data from the medical history, history of present illness, and assessment and plan sections and removed other components of the notes.
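A 50%/25%/25% partition of this kind might be sketched as follows. Stratifying by frailty label and the fixed random seed are assumptions made for illustration; the text does not state whether the split was stratified. The function name and the example labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.50, 0.25, 0.25), seed=0):
    """Split indices into train/validation/test, preserving class balance.

    `labels` is a list of 0/1 frailty labels; returns three index lists.
    Stratification is an assumption, not something the text confirms.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        test.extend(indices[n_train + n_val:])  # remainder goes to test
    return sorted(train), sorted(val), sorted(test)


# Hypothetical labels: 100 notes with 32 labeled frail (not study data).
labels = [1] * 32 + [0] * 68
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 50 25 25
```

Splitting by patient index, as here, keeps each note in exactly one partition, which matters because the test partition is later used to report performance.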
The output of the models was the probability of frailty. The overall study design was as follows: (1) using the VES-13 dataset, we trained and internally validated the large language model-based classifier to identify frailty within the dataset in which frailty was labeled based on the VES-13 response; (2) we then validated this model’s performance on the eFI dataset, in which patients were instead labeled as frail based on the eFI; (3) we then used the eFI dataset to develop a model to predict frailty (patients were labeled as frail based on the eFI); and (4) finally, we validated this eFI model to predict frailty in the VES-13 dataset (Figure 1).
FIGURE 1 |.

Illustration of overall methodology. AUC, area under the receiver operating characteristics curve; eFI, electronic frailty index; VES-13, Vulnerable Elders-13 Survey.
Four language models were used to develop our models: Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), Bio-ClinicalBERT, and PubMedBERT. BERT is a natural language processing model developed by Google [32]. Its methodology involves pretraining on extensive text data to learn contextualized word embeddings. After pretraining, BERT is fine-tuned on specific tasks using labeled data to enhance its performance and adapt it to different applications. The bidirectional nature of BERT allows it to capture dependencies and relationships between words in both forward and backward directions. RoBERTa is an extension of BERT and was introduced by Facebook AI [33]. Its methodology involves optimizing the pretraining process of the BERT model by removing the next sentence prediction task and training on larger batches of data for longer. RoBERTa’s architecture is based on the BERT model but incorporates enhancements such as dynamic masking and larger training datasets to achieve potentially better contextual understanding and representation learning. Bio-ClinicalBERT is a specialized variant of BioBERT (a pretrained biomedical language representation model for biomedical text mining), designed specifically for clinical text processing tasks. It was pretrained on a large corpus of electronic clinical notes from the Medical Information Mart for Intensive Care III database, a collection of electronic medical records from intensive care unit patients at Beth Israel Deaconess Medical Center [34]. PubMedBERT is a transformer-based language model specifically designed for biomedical natural language processing tasks [35]. It is trained using a sizable corpus of biomedical abstracts from PubMed. It offers a potentially effective tool for knowledge extraction, conducting literature reviews, and creating applications that need to comprehend and process biomedical text.
For large language model training, we focused on training the classification head. We fine-tuned the given models by keeping the base model’s weights fixed and training only the final layers (the classification head) on a smaller dataset relevant to the target task. The inputs into the models were anesthesia preoperative clinic history and physical notes related to the surgery, and the output was the probability of frailty. Hyperparameter tuning was performed on the validation set and included train batch size, learning rate, Adam epsilon, and Adafactor decay rate. For each model, the final learning rate, which controls the size of the steps taken during training, was 4e–5. The max sequence length was set to 512 tokens so that the model could capture long-range dependencies; inputs exceeding this 512-token context limit were truncated. We used the AdamW optimizer as it is efficient in handling large-scale datasets. The Adam epsilon value was 1e–8. The train batch size was optimized at 8 to balance computational efficiency and model training effectiveness. The loss function was cross-entropy loss.
The models were tested on either the VES-13 or eFI test sets. The performance metrics calculated were area under the receiver operating characteristics curve (AUC), precision–recall curves (which are better suited than AUC to measuring discrimination on imbalanced data), F1-score, accuracy, sensitivity, and specificity. Python (v3.10.12) was used for all analyses. To calculate 95% confidence intervals (CI), we created 1000 bootstrapped replicates by resampling the test set with replacement. The metrics were calculated for each replicate, and the CI was subsequently calculated.
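The bootstrap procedure for the confidence intervals can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' implementation: the percentile method for deriving the interval, the random seed, the function names, and the example labels and scores are all hypothetical; only the 1000 replicates, resampling with replacement, and the 95% level come from the text.

```python
import random


def roc_auc(y_true, y_score):
    """AUC as the probability that a frail case outscores a non-frail case.

    Ties count as one half (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC, resampling cases with replacement.

    The percentile method is an assumption; the text does not name one.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip replicates that are missing a class
            stats.append(roc_auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical test-set labels and predicted probabilities (not study data).
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.10, 0.20, 0.30, 0.35, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
auc = roc_auc(y_true, y_score)
ci_low, ci_high = bootstrap_auc_ci(y_true, y_score)
print(f"AUC = {auc:.2f} [{ci_low:.2f}-{ci_high:.2f}]")
```

The same resampling loop applies to any of the other metrics by swapping `roc_auc` for the metric of interest.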
3 |. Results
Within our VES-13 dataset, there were 479 unique patients, of whom 158 (31.3%) were identified as having preoperative frailty based on their response to the VES-13 questionnaire. The mean age was 73.2 years (standard deviation = 5.1 years). Within our eFI dataset, there were 998 unique patients, of whom 64 (6.4%) were identified as having preoperative frailty based on their calculated eFI score (Table 1). The mean age was 66 years (standard deviation = 7.2 years).
TABLE 1 |.
Composition of the development and validation sets for the VES-13-based and eFI-based frailty datasets.
| Dataset | Development data: Total | Development data: Frailty, n (%) | Validation data: Total | Validation data: Frailty, n (%) |
|---|---|---|---|---|
| VES-13-based frailty dataset | 359 | 115 (32.0) | 120 | 43 (35.8) |
| eFI-based frailty dataset | 748 | 48 (6.4) | 250 | 16 (6.4) |
Note: The VES-13-based frailty dataset consists of 479 anesthesia preoperative notes (written at time of preoperative outpatient visit prior to day of surgery), in which older adults undergoing spine surgery are labeled as frail based on their responses to the VES-13 questionnaire. The eFI-based frailty dataset consists of 998 patients (not limited to older adults) undergoing spine surgery, who are labeled as frail based on the eFI score.
Abbreviations: eFI, electronic frailty index; VES-13, Vulnerable Elders-13 Survey.
3.1 |. Development and Validation of VES-13-Based Large Language Model Classifier for Preoperative Frailty
When developed on our VES-13 training set and validated on our VES-13 validation set, the AUCs [95% CI] for the RoBERTa, BERT, BioBERT, and PubMedBERT models were 0.99 [0.97–1.0], 0.64 [0.56–0.71], 0.67 [0.60–0.73], and 0.73 [0.66–0.79], respectively (Figure 2A). The average precisions were 0.98 [0.95–1.0], 0.54 [0.45–0.63], 0.48 [0.40–0.55], and 0.57 [0.59–0.64], respectively (Figure 2B). When validated on the eFI validation set, the AUCs were 0.63 [0.57–0.69], 0.83 [0.78–0.88], 0.87 [0.83–0.91], and 0.87 [0.85–0.89], respectively (Figure 2C). The average precisions were 0.12 [0.07–0.16], 0.23 [0.17–0.28], 0.30 [0.26–0.33], and 0.28 [0.22–0.34], respectively (Figure 2D).
FIGURE 2 |.

AUC and precision–recall plots for performance of the models (based on RoBERTa, BERT, ClinicalBERT, and PubMedBERT) on the validation set: (A) AUC of the model when the development data was from the VES-13 dataset (patients were labeled as frail based on their responses to the VES-13 instrument) and the validation data was also from the validation hold-out set from the VES-13 dataset; (B) precision–recall curve of the model when the development data was from the VES-13 dataset (patients were labeled as frail based on their responses to the VES-13 instrument) and the validation data was also from the validation hold-out set from the VES-13 dataset; (C) AUC of the model when the development data was from the VES-13 dataset (patients were labeled as frail based on their responses to the VES-13 instrument) and the validation data was from the validation hold-out set from the eFI dataset; (D) precision–recall curve of the model when the development data was from the VES-13 dataset (patients were labeled as frail based on their responses to the VES-13 instrument) and the validation data was from the validation hold-out set from the eFI dataset. AUC, area under the receiver operating characteristics curve; eFI, electronic frailty index; VES-13, Vulnerable Elders-13 Survey.
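When reading the average precision values, it helps to recall that an uninformative classifier scores an average precision equal to the prevalence of frailty, not 0.5. The sketch below illustrates this with the eFI validation set composition from Table 1 (16 frail among 250 patients); the function is a hypothetical plain-Python implementation, and the scores are synthetic, not model outputs from the study.

```python
def average_precision(y_true, y_score):
    """Average precision: sum over thresholds of (recall gain) * precision.

    Cases with tied scores enter together, so a constant score yields a
    single threshold.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive case")
    # Group labels by score so that ties are handled as one threshold.
    groups = {}
    for y, s in zip(y_true, y_score):
        groups.setdefault(s, []).append(y)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    for s in sorted(groups, reverse=True):  # highest score first
        tp += sum(groups[s])
        fp += len(groups[s]) - sum(groups[s])
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Class balance taken from Table 1 (eFI validation set: 16 frail of 250).
y_true = [1] * 16 + [0] * 234

# Synthetic scores: a constant (uninformative) model and a perfect ranking.
ap_constant = average_precision(y_true, [0.5] * 250)
ap_perfect = average_precision(y_true, [1.0] * 16 + [0.0] * 234)
print(round(ap_constant, 3), ap_perfect)  # 0.064 1.0
```

Against a baseline of about 0.064, an average precision of 0.30 on the eFI validation set reflects a several-fold enrichment of frail patients among the highest-scored notes, even though the absolute value looks modest.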
3.2 |. Development and Validation of eFI-Based Large Language Model Classifier for Preoperative Frailty
When developed on our eFI training set and validated on our eFI validation set, the AUCs for the RoBERTa, BERT, BioBERT, and PubMedBERT models were 0.90 [0.88–0.94], 0.91 [0.87–0.94], 0.88 [0.84–0.92], and 0.90 [0.88–0.93], respectively (Figure 3A). The average precisions were 0.53 [0.46–0.59], 0.57 [0.51–0.63], 0.51 [0.45–0.57], and 0.54 [0.48–0.60], respectively (Figure 3B). When validated on the VES-13 validation set, the AUCs were 0.60 [0.51–0.69], 0.61 [0.52–0.69], 0.61 [0.53–0.68], and 0.62 [0.53–0.70], respectively (Figure 3C). The average precisions were 0.40 [0.31–0.49], 0.42 [0.33–0.51], 0.40 [0.30–0.50], and 0.44 [0.35–0.53], respectively (Figure 3D). Table 2 lists the F1-score, precision, recall, accuracy, sensitivity, and specificity of each of the experiments.
FIGURE 3 |.

AUC and precision–recall plots for performance of the models (based on RoBERTa, BERT, ClinicalBERT, and PubMedBERT) on the validation set: (A) AUC of the model when the development data was from the eFI dataset (patients were labeled as frail based on their calculated score using eFI) and the validation data was also from the validation hold-out set from the eFI dataset; (B) precision–recall curve of the model when the development data was from the eFI dataset and the validation data was also from the validation hold-out set from the eFI dataset; (C) AUC of the model when the development data was from the eFI dataset and the validation data was from the validation hold-out set from the VES-13 dataset; (D) precision–recall curve of the model when the development data was from the eFI dataset and the validation data was from the validation hold-out set from the VES-13 dataset. AUC, area under the receiver operating characteristics curve; eFI, electronic frailty index; VES-13, Vulnerable Elders-13 Survey.
TABLE 2 |.
Performance metrics of various language models used to predict frailty based on anesthesia preoperative clinic notes.
| Experiment | F1-score | Precision | Recall | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| Development set = VES-13; validation set = VES-13 | | | | | | |
| BERT | 0.44 | 0.5 | 0.4 | 0.64 | 0.4 | 0.78 |
| RoBERTa | 0.96 | 0.91 | 1 | 0.97 | 1 | 0.95 |
| BioBERT | 0.54 | 0.55 | 0.53 | 0.68 | 0.53 | 0.75 |
| PubMedBERT | 0.62 | 0.61 | 0.63 | 0.73 | 0.63 | 0.73 |
| Development set = VES-13; validation set = eFI | | | | | | |
| BERT | 0 | 0 | 0 | 0.06 | 0.93 | 0 |
| RoBERTa | 0.11 | 0.06 | 0.92 | 0.06 | 0 | 0.6 |
| BioBERT | 0 | 0 | 0 | 0.94 | 0 | 1 |
| PubMedBERT | 0.36 | 0.33 | 0.41 | 0.91 | 0.41 | 0.94 |
| Development set = eFI; validation set = eFI | | | | | | |
| BERT | 0.43 | 0.35 | 0.56 | 0.9 | 0.56 | 0.93 |
| RoBERTa | 0.63 | 0.63 | 0.63 | 0.95 | 0.63 | 0.97 |
| BioBERT | 0.44 | 0.44 | 0.44 | 0.93 | 0.44 | 0.96 |
| PubMedBERT | 0.49 | 0.4 | 0.63 | 0.92 | 0.63 | 0.94 |
| Development set = eFI; validation set = VES-13 | | | | | | |
| BERT | 0.18 | 0.46 | 0.11 | 0.66 | 0.11 | 0.93 |
| RoBERTa | 0.01 | 0.25 | 0 | 0.67 | 0 | 0.99 |
| BioBERT | 0.21 | 0.41 | 0.15 | 0.65 | 0.15 | 0.9 |
| PubMedBERT | 0 | 0 | 0 | 0.67 | 0 | 1 |
Note: There were four experiments, in which the development and validation sets differed (described above).
Abbreviations: BERT, bidirectional encoder representations from transformers; eFI, electronic frailty index; RoBERTa, a robustly optimized BERT pretraining approach; VES-13, Vulnerable Elders-13 Survey.
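The threshold-based metrics in Table 2 all derive from the four confusion-matrix counts, and recall and sensitivity are the same quantity by definition. The sketch below is a hypothetical helper, not the authors' code. Its example uses the eFI validation set composition from Table 1 (16 frail of 250) with an assumed classifier that labels nobody as frail; this is one way a row with high accuracy but zero recall could arise under class imbalance, not a reconstruction of any actual confusion matrix from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from confusion-matrix counts.

    Undefined ratios (zero denominators) are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "f1": f1,
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "sensitivity": recall,  # sensitivity is recall under another name
        "specificity": specificity,
    }


# Assumed scenario: every one of 250 patients is predicted non-frail,
# with 16 truly frail (class balance from Table 1, eFI validation set).
metrics = classification_metrics(tp=0, fp=0, tn=234, fn=16)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this scenario accuracy is about 0.94 while F1, precision, and recall are all zero, which is why accuracy alone is a weak summary when only 6.4% of patients are frail.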
4 |. Discussion
We report the development and validation of a large language model-based binary classifier for frailty, which detects preoperative frailty from preoperative anesthesia clinical notes. Frailty was determined by patient responses to a validated frailty questionnaire, the VES-13. In parallel, we developed large language model-based classifiers using development sets, in which a separate group of spine surgery patients were identified as frail based on their calculated eFI score, which estimated frailty indirectly via a combination of diagnosis codes. Validation of the VES-13-based models demonstrated good discrimination of frailty when validated both internally and on the eFI validation sets. However, validation of the eFI-based models did not demonstrate good discrimination when validated on the VES-13-based test sets.
Patient phenotyping using the VES-13 is a direct measurement of frailty as it queries functional information from the patient that specifically assesses frailty. In contrast, the eFI is more indirect as it accumulates a collection of diagnoses from the medical record rather than specifically assessing a patient’s functional status. Arguably then, labeling frailty is more accurate with the VES-13 than with the eFI. We showed that a model developed with VES-13 labels could likely identify more accurately both patients who were truly frail and those who were identified as frail via the eFI. Interestingly, the eFI model did not perform well on the VES-13 dataset, which suggests that the indirect measurement of frailty by the eFI may not capture certain components of frailty. This was a key finding in our study, as it suggests that, while the eFI is validated in predicting postoperative outcomes, it may not capture frailty with the same granularity as a patient-completed survey such as the VES-13.
As there is no structured data point (e.g., single diagnosis code) that captures the frailty status of a patient, use of large language models to extract this information from unstructured clinical text may be of clinical value. Our study showed that large language models can be used to accurately identify frailty from readily available patient clinical documentation within the EHR. Interestingly, some large language models performed better than others. RoBERTa performed the best when tested internally among VES-13-labeled patients, but its performance declined markedly when tested on the eFI dataset. This could be a consequence of overfitting. In contrast, PubMedBERT had good discrimination on the internal test set, but performed best when validated on the eFI test set. PubMedBERT is pretrained on text from biomedical abstracts from PubMed and, thus, theoretically could have better captured clinical terms in the scientific literature as they relate to eFI-calculated scores [35].
In the United States, many patients are seen in acute settings, and identification of frailty in these patients requires methods that include additional clinical resources or screening that is accurate without long-term diagnosis information. Short questionnaires have been shown to be accurate but require additional resources to gather and record. The VES-13 is a validated 13-item questionnaire for directly identifying frailty; however, routine administration of this test may be resource-intensive and unlikely to be able to identify frailty retrospectively from the EHR [17]. Thus, the benefit of using large language models trained on VES-13-phenotyped patients includes the development of an automated process for screening for preoperative frailty prospectively and identifying frail patients retrospectively. Our model uses clinical data and documentation that is present with all admissions and can accurately identify frailty in patients without long-term follow-up and without additional resources. This information can change treatment plans or expectations for patients and families in acute situations such as trauma, where interventions are time sensitive and time for further evaluation is limited. Barriers to implementation of large language models into EHRs include computing needs. Implementation would require daily processes that fetch patient notes and run analytics using the model. The amount of computing power needed will depend on the type of large language model (those with a higher number of parameters/weights would require much more computing power) and patient volume. The large language models used in this study are smaller-scale models that may be run without the need for graphics processing units.
A limitation of this study is that we only included spine surgical patients from one center as opposed to perioperative patients undergoing surgery of all types. Furthermore, the training data was focused on anesthesia preoperative notes. There may be other clinical notes that could also contribute significant knowledge. However, we chose anesthesia preoperative notes because they are generally comprehensive in nature, including information regarding general functional status, as well as pertinent comorbidities. There may be heterogeneity in anesthesia preoperative notes across institutions, and thus, a natural next step would be to validate such models at multiple external institutions and surgical populations. In addition, the unstructured nature of clinical documentation creates difficulties in training an effective large language model that can best tease out nuances of human language in clinical notes and properly account for the use of abbreviations, misspellings, and variability in the language of different providers. Further fine-tuning of pre- or post-processing of clinical notes may be able to better identify frailty using such unstructured information.
In conclusion, we show that large language models can be used to accurately identify a difficult-to-quantify and multifactorial characteristic such as frailty in patients by using readily available information in the EHR. Certain tools to measure frailty, though widely used, may not be applicable to certain patient populations. Additional studies can provide new information on perioperative management if patients at higher risk of poor perioperative outcomes, such as those with frailty, can be identified. Improvement of our ability to take into account multifactorial clinical features that affect our patients allows fine-tuning of perioperative management to optimize outcomes.
Summary.
• Key points
Identifying frailty from the medical record is not straightforward since it is a multifactorial state based on multiple organ systems and a sum of factors accumulated over time.
We trained various large language models to identify frailty from anesthesia preoperative clinic notes using training datasets that used: (1) patients undergoing spine surgery whose frailty was characterized by patient responses to the Vulnerable Elders-13 Survey (VES-13) and (2) patients undergoing surgery whose frailty was characterized by their calculated electronic frailty index score.
We report development and validation of a large language model-based classifier that detects preoperative frailty from preoperative anesthesia clinical notes.
• Why does this paper matter?
Frailty is difficult to automatically detect from the electronic medical record as there are no consistent structured data points designating frailty for a patient.
Our study demonstrated the use of large language models to automatically identify frailty among older surgical adults from clinical notes.
Footnotes
Conflicts of Interest
Dr. Gabriel’s institution received funding for research unrelated to this manuscript: Merck, Takeda, Avanos, and Pacira Biosciences. Dr. Gabriel’s institution serves as a consultant for Avanos and Pacira Biosciences, which Dr. Gabriel represents. The remaining authors declare no conflicts of interest.
References
1. Khanna AK, Motamedi V, Bouldin B, et al., “Automated Electronic Frailty Index-Identified Frailty Status and Associated Postsurgical Adverse Events,” JAMA Network Open 6 (2023): e2341915–e2341915, 10.1001/jamanetworkopen.2023.41915.
2. Bunino FM, Marrano E, Carbone F, et al., “Clinical Frailty Score is a Good Predictor of Postoperative Mortality in Patients Undergoing Open Abdomen Surgery: A Multicenter Retrospective Cohort Study,” Minerva Surgery 79, no. 2 (2024): 147–154, 10.23736/s2724-5691.23.09981-1.
3. Callahan KE, Clark CJ, Edwards AF, et al., “Automated Frailty Screening At-Scale for Pre-Operative Risk Stratification Using the Electronic Frailty Index,” Journal of the American Geriatrics Society 69 (2021): 1357–1362, 10.1111/jgs.17027.
4. Chaliparambil RK, Nandoliya KR, Jahromi BS, and Potts MB, “Charlson Comorbidity Index and Frailty as Predictors of Resolution Following Middle Meningeal Artery Embolization for Chronic Subdural Hematoma,” World Neurosurgery 183 (2024): e877–e885, 10.1016/j.wneu.2024.01.049.
5. Feng C, Wu H, Qi Z, et al., “Association of Preoperative Frailty With the Risk of Postoperative Delirium in Older Patients Undergoing Hip Fracture Surgery: A Prospective Cohort Study,” Aging Clinical and Experimental Research 36 (2024): 16, 10.1007/s40520-023-02692-5.
6. Stutsrim AE, Brastauskas IM, Craven TE, et al., “Automated Electronic Frailty Index is Associated With Non-Home Discharge in Patients Undergoing Open Revascularization for Peripheral Vascular Disease,” American Surgeon 89, no. 11 (2023): 4501–4507, 10.1177/00031348221121547.
7. Varela S, Thommen R, Rumalla K, et al., “The Risk Analysis Index Demonstrates Superior Discriminative Ability in Predicting Extended Length of Stay in Pituitary Adenoma Resection Patients When Compared to the 5-Point Modified Frailty Index,” World Neurosurgery: X 21 (2023): 100259, 10.1016/j.wnsx.2023.100259.
8. Flinn SJ Jr., Silver DS, Hodges J, et al., “Association of Frailty With Healthcare Utilization for Patients Over One Year Following Surgical Evaluation,” Annals of Surgery 281, no. 2 (2024): 280–287, 10.1097/sla.0000000000006218.
9. McAdams-DeMarco MA, Law A, Salter ML, et al., “Frailty and Early Hospital Readmission After Kidney Transplantation,” American Journal of Transplantation 13 (2013): 2091–2095, 10.1111/ajt.12300.
10. Robinson TN, Wallace JI, Wu DS, et al., “Accumulated Frailty Characteristics Predict Postoperative Discharge Institutionalization in the Geriatric Patient,” Journal of the American College of Surgeons 213 (2011): 37–42, 10.1016/j.jamcollsurg.2011.01.056.
11. Robinson TN, Wu DS, Stiegmann GV, and Moss M, “Frailty Predicts Increased Hospital and Six-Month Healthcare Cost Following Colorectal Surgery in Older Adults,” American Journal of Surgery 202 (2011): 511–514, 10.1016/j.amjsurg.2011.06.017.
12. Cawthon PM, Marshall LM, Michael Y, et al., “Frailty in Older Men: Prevalence, Progression, and Relationship With Mortality,” Journal of the American Geriatrics Society 55 (2007): 1216–1223, 10.1111/j.1532-5415.2007.01259.x.
13. Kiely DK, Cupples LA, and Lipsitz LA, “Validation and Comparison of Two Frailty Indexes: The MOBILIZE Boston Study,” Journal of the American Geriatrics Society 57 (2009): 1532–1539, 10.1111/j.1532-5415.2009.02394.x.
14. Woods NF, LaCroix AZ, Gray SL, et al., “Frailty: Emergence and Consequences in Women Aged 65 and Older in the Women’s Health Initiative Observational Study,” Journal of the American Geriatrics Society 53 (2005): 1321–1330, 10.1111/j.1532-5415.2005.53405.x.
15. O’Caoimh R, Sezgin D, O’Donovan MR, et al., “Prevalence of Frailty in 62 Countries Across the World: A Systematic Review and Meta-Analysis of Population-Level Studies,” Age and Ageing 50 (2021): 96–104, 10.1093/ageing/afaa219.
16. Rodríguez-Mañas L, Féart C, Mann G, et al., “Searching for an Operational Definition of Frailty: A Delphi Method Based Consensus Statement: The Frailty Operative Definition-Consensus Conference Project,” Journals of Gerontology. Series A, Biological Sciences and Medical Sciences 68 (2013): 62–67, 10.1093/gerona/gls119.
- 17.Hall DE, Arya S, Schmid KK, et al. , “Development and Initial Validation of the Risk Analysis Index for Measuring Frailty in Surgical Populations,” JAMA Surgery 152 (2017): 175–182, 10.1001/jamasurg.2016.4202. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Inouye SK, Studenski S, Tinetti ME, and Kuchel GA, “Geriatric Syndromes: Clinical, Research, and Policy Implications of a Core Geriatric Concept,” Journal of the American Geriatrics Society 55 (2007): 780–791, 10.1111/j.1532-5415.2007.01156.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Rockwood K and Mitnitski A, “Frailty Defined by Deficit Accumulation and Geriatric Medicine Defined by Frailty,” Clinics in Geriatric Medicine 27 (2011): 17–26, 10.1016/j.cger.2010.08.008. [DOI] [PubMed] [Google Scholar]
- 20.Wijma AG, Bongers BC, Annema C, et al. , “‘Effects of a Home-Based Bimodal Lifestyle Intervention in Frail Patients With End-Stage Liver Disease Awaiting Orthotopic Liver Transplantation’: Study Protocol of a Non-Randomised Clinical Trial,” BMJ Open 14, no. 1 (2024): e080430, 10.1136/bmjopen-2023-080430. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21. Wong HMK, Qi D, Ma BHM, Hou PY, Kwong CKW, and Lee A, “Multidisciplinary Prehabilitation to Improve Frailty and Functional Capacity in High-Risk Elective Surgical Patients: A Retrospective Pilot Study,” Perioperative Medicine (London, England) 13, no. 1 (2024): 6, 10.1186/s13741-024-00359-x.
- 22. Aguayo GA, Donneau AF, Vaillant MT, et al., “Agreement Between 35 Published Frailty Scores in the General Population,” American Journal of Epidemiology 186 (2017): 420–434, 10.1093/aje/kwx061.
- 23. Sirola J, Pitkala KH, Tilvis RS, Miettinen TA, and Strandberg TE, “Definition of Frailty in Older Men According to Questionnaire Data (RAND-36/SF-36): The Helsinki Businessmen Study,” Journal of Nutrition, Health & Aging 15, no. 9 (2011): 783–787, 10.1007/s12603-011-0131-4.
- 24. Theou O, Brothers TD, Mitnitski A, and Rockwood K, “Operationalization of Frailty Using Eight Commonly Used Scales and Comparison of Their Ability to Predict All-Cause Mortality,” Journal of the American Geriatrics Society 61 (2013): 1537–1551, 10.1111/jgs.12420.
- 25. Theou O, Brothers TD, Peña FG, Mitnitski A, and Rockwood K, “Identifying Common Characteristics of Frailty Across Seven Scales,” Journal of the American Geriatrics Society 62 (2014): 901–906, 10.1111/jgs.12773.
- 26. Festa N, Shi SM, and Kim DH, “Accuracy of Diagnosis and Health Service Codes in Identifying Frailty in Medicare Data,” BMC Geriatrics 20 (2020): 329, 10.1186/s12877-020-01739-w.
- 27. Kim DH, Schneeweiss S, Glynn RJ, Lipsitz LA, Rockwood K, and Avorn J, “Measuring Frailty in Medicare Data: Development and Validation of a Claims-Based Frailty Index,” Journals of Gerontology. Series A, Biological Sciences and Medical Sciences 73, no. 7 (2018): 980–987, 10.1093/gerona/glx229.
- 28. Kohsaka S, Sandhu AT, Parizo JT, Shoji S, Kumamaru H, and Heidenreich PA, “Association of Diagnostic Coding-Based Frailty and Outcomes in Patients With Heart Failure: A Report From the Veterans Affairs Health System,” Journal of the American Heart Association 9 (2020): e016502, 10.1161/jaha.120.016502.
- 29. Min L, Yoon W, Mariano J, et al., “The Vulnerable Elders-13 Survey Predicts 5-Year Functional Decline and Mortality Outcomes in Older Ambulatory Care Patients,” Journal of the American Geriatrics Society 57 (2009): 2070–2076, 10.1111/j.1532-5415.2009.02497.x.
- 30. Pajewski NM, Lenoir K, Wells BJ, Williamson JD, and Callahan KE, “Frailty Screening Using the Electronic Health Record Within a Medicare Accountable Care Organization,” Journals of Gerontology. Series A, Biological Sciences and Medical Sciences 74, no. 11 (2019): 1771–1777, 10.1093/gerona/glz017.
- 31. Clegg A, Bates C, Young J, et al., “Development and Validation of an Electronic Frailty Index Using Routine Primary Care Electronic Health Record Data,” Age and Ageing 47 (2018): 319, 10.1093/ageing/afx001.
- 32. Devlin J, Chang M-W, Lee K, and Toutanova K, “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding,” preprint, arXiv, arXiv:1810.04805, 2018.
- 33. Liu Y, Ott M, Goyal N, et al., “RoBERTa: A Robustly Optimized BERT Pretraining Approach,” preprint, arXiv, arXiv:1907.11692, 2019.
- 34. Huang K, Altosaar J, and Ranganath R, “ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission,” preprint, arXiv, arXiv:1904.05342, 2019.
- 35. Gu Y, Tinn R, Cheng H, et al., “Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing,” ACM Transactions on Computing for Healthcare (HEALTH) 3, no. 1 (2021): 1–23.
