Abstract
Background:
Cognitive tests and biomarkers are key to assessing the severity and tracking the progression of Alzheimer’s disease (AD) and AD-related dementias (AD/ADRD); yet both are often documented only in the clinical narratives of patients’ electronic health records (EHRs). In this work, we aim to (1) assess the documentation of cognitive tests and biomarkers in EHRs that can be used as real-world endpoints, and (2) identify, extract, and harmonize the commonly used cognitive tests from clinical narratives into categorical AD/ADRD severity using natural language processing (NLP) methods.
Methods:
We developed a rule-based NLP pipeline to extract the cognitive tests and biomarkers from clinical narratives in AD/ADRD patients’ EHRs. We aggregated the extracted results to the patient level and harmonized the cognitive test scores into severity categories using cutoffs determined based on both relevant literature and domain knowledge of AD/ADRD clinicians.
Results:
We identified an AD/ADRD cohort of 48,912 patients from the University of Florida (UF) Health system and found 7 measurements (6 cognitive tests and 1 biomarker) that are frequently documented in our data. Our NLP pipeline achieved an overall F1-score of 0.9059 across the 7 measurements. Among the 6 cognitive tests, we were able to harmonize 4 cognitive test scores into severity categories, and we described the population characteristics of patients in each severity category. We also identified several factors related to the availability of this documentation in EHRs.
Conclusion:
This study demonstrates that our NLP pipeline can accurately extract cognitive tests and biomarkers of AD/ADRD for downstream studies. Although the documentation of cognitive tests and biomarkers in EHRs appears to be low, RWD remains an important resource for AD/ADRD research. Nevertheless, a standardized approach to documenting cognitive tests and biomarkers in EHRs is also warranted.
Keywords: Alzheimer’s Disease and Related Dementias, Cognitive Tests, Natural Language Processing
Introduction
Alzheimer’s disease (AD) and AD-related dementias (AD/ADRD) are a class of complicated neurodegenerative disorders that pose significant public health burdens in the United States (US) and worldwide. In 2021, approximately 6.2 million Americans were living with ADRD, and the prevalence is projected to more than double over the next several decades.[1–3] Globally, more than 40 million people are currently living with AD/ADRD, and the prevalence is expected to exceed 100 million by 2050.[2,3] AD/ADRD is one of the leading causes of death, especially among those aged 65 and older. According to the latest data from the US Centers for Disease Control and Prevention (CDC) and the World Health Organization (WHO), 271,872 people in the US and 1.55 million people globally died from ADRD in 2019,[2,4,5] making it the sixth and seventh leading cause of death in the US and worldwide, respectively.
A growing number of studies have been proposed to examine the disease progression of AD/ADRD, to understand its underlying causal mechanisms, and to identify potential treatments that can slow the neurodegeneration process. Typically, cognitive tests are used to assess AD/ADRD severity and track its progression, and several such tests are currently used for AD/ADRD screening and severity assessment in clinical practice. For example, the Mini-Mental State Exam (MMSE) is a widely used test to screen for AD/ADRD and other dementias; it measures orientation to time and place, short-term memory, attention, problem solving, language, comprehension, and motor skills.[6,7] MMSE scores range from 0 to 30, with lower scores indicating more severe cognitive impairment; a score of 23 or less (in some settings, 24 or less) is deemed to indicate AD/ADRD and other dementias.[6] Despite being developed as a screening tool, the MMSE is often used in clinical trials as an outcome measure and analyzable endpoint. Another commonly used cognitive test is the Montreal Cognitive Assessment (MoCA), which assesses an individual’s memory, visuospatial ability, executive function, attention, concentration, working memory, and orientation.[7,8] Similar to the MMSE, MoCA scores range from 0 to 30, with lower scores signifying more impairment; a cutoff of 23 is often used to differentiate mild cognitive impairment (MCI) or dementia from normal cognition.[8]
Nevertheless, AD can only be diagnosed with complete certainty after death, when microscopic examination of the brain reveals the characteristic plaques and tangles. In 2018, the National Institute on Aging (NIA) and the Alzheimer’s Association (AA) updated the NIA-AA Research Framework, shifting the definition of AD in living people from a syndromal to a biological construct, in which AD is defined by its underlying pathologic processes that can be documented by postmortem examination or in vivo by biomarkers.[9] In addition to cognitive tests, a number of biomarkers can also be used to assess the severity of AD/ADRD. The NIA-AA Research Framework suggested the AT(N) system for AD biomarker grouping: A represents β-amyloid deposition, measured through amyloid positron emission tomography (PET) imaging or through the concentration of amyloid-β 1–42 (Aβ42), or the Aβ42/Aβ40 ratio, in cerebrospinal fluid (CSF); T represents pathologic tau, measured through tau PET or phosphorylated tau in the CSF; and N represents neurodegeneration or neuronal injury, which can be evaluated via anatomic magnetic resonance imaging (MRI), fluorodeoxyglucose (FDG) PET measures, and CSF total tau (T-tau).[9] Nevertheless, these biomarkers are not routinely examined in clinical practice.
On the other hand, electronic health records (EHRs), as an important real-world data (RWD) source,[10] have become increasingly important for generating real-world evidence, including for AD/ADRD research. For example, Duan et al. applied distributed learning algorithms on multisite EHRs to assess the associations between clinical risk factors and AD/ADRD.[11] Desai et al. examined the association between prior healthcare resource use burden and AD/ADRD disease severity at the time of initial cognitive assessment using EHR data.[12] More interestingly, Chen et al. demonstrated the feasibility of simulating AD/ADRD trials using EHRs;[13] however, that study modeled only serious adverse events (SAEs) as endpoints, recognizing the difficulties of extracting the endpoints typically used in AD/ADRD trials to estimate treatment effectiveness from RWD sources like EHRs. Indeed, there are challenges in accurately identifying AD/ADRD and tracking its severity in EHRs. First, biomarkers like those in the AT(N) system are rarely used in routine clinical care, partially because they are largely still research instruments but also because of their high cost and how care is paid for. Thus, the utilization of MRI and PET imaging for assessing AD/ADRD biomarkers in routine care is rare. Second, many cognitive test results are not documented in structured EHR data; previous studies showed that the availability of cognitive test results in structured EHR data was low.
One study found that only 11% of dementia patients and 24% of AD patients had a cognitive measure documented in the 5 years prior to diagnosis, and the missingness was associated with race, age, and socioeconomic factors.[14] Another study examined the availability of MMSE and Saint Louis University Mental Status Examination (SLUMS) scores in an EHR system and found that, although the agreement between the two scores was good, only about 3% of all patients had cognitive test results documented.[15] Natural language processing (NLP) techniques can be used to make these data points in clinical notes available for research. For example, Pichon et al. extracted frequently used concepts characterizing cognitive function from the clinical notes of AD patients and developed a model to infer the severity of cognitive impairment based on MMSE scores.[16] In another study, with a sample of 320,886 patients aged 65 years and older in the Optum Clinical EHR Database, 78,827 (24.6%) could be assigned to explicit AD/ADRD severity categories (i.e., either explicitly documented severity terms such as “mild”, “mild-to-moderate”, and “severe”, or valid quantitative MMSE scores) via an NLP system.[17] Nevertheless, there is a wide range of other cognitive tests, such as the MoCA discussed above, that are often used as endpoints in AD/ADRD trials but are typically not recorded in EHRs in a standardized, discrete way, existing instead in clinical narratives (e.g., physician notes). Extracting and harmonizing these different cognitive tests and creating a unified measurement of AD/ADRD severity can benefit EHR-based AD/ADRD research studies.
The goal of our work is to assess the documentation of cognitive tests and biomarkers in EHRs that can be used as real-world endpoints (for RWD-based trial simulation studies) via a 3-step process: (1) identify commonly used cognitive tests used as endpoints (primary and secondary outcomes) in AD/ADRD clinical trials; (2) assess the documentation of these cognitive tests and biomarkers in clinical narratives via an NLP pipeline; and (3) harmonize the different tests (e.g., MMSE, MoCA) into categorical AD/ADRD severity and assess whether information extracted from EHRs can be used to track patients’ longitudinal AD/ADRD severity changes.
Methods
In this study, we first compiled a list of cognitive tests and biomarkers based on the U.S. Preventive Services Task Force (USPSTF) recommendation and a review of AD/ADRD interventional trials on clinicaltrials.gov. We then examined the frequencies of these cognitive tests and biomarkers in the clinical notes of an AD/ADRD cohort from the University of Florida Health (UF Health) Integrated Data Repository (IDR). We developed a rule-based NLP pipeline to extract the most frequent cognitive measurements, aggregated the extracted results to the patient level, and harmonized the cognitive test scores into severity categories using cutoffs determined based on both the literature and the domain knowledge of AD/ADRD clinicians. Figure 1 shows the overall study flow, and details of each step are described below.
Figure 1. Overall study flow.

Study population and data source
This study used EHR data from the University of Florida Health (UF Health) Integrated Data Repository (IDR) – a clinical data warehouse that aggregates records from UF Health’s various clinical and administrative information systems, including the Epic EHR system. Using International Classification of Diseases (ICD) codes, we identified an AD/ADRD cohort of 48,912 patients who had at least one record of an AD/ADRD-related diagnosis between 1/1/2012 and 12/31/2019, including mild cognitive impairment (ICD-9: 331.83, 294.9; ICD-10: G31.84, F09), Alzheimer’s disease (ICD-9: 331.0; ICD-10: G30, G30.0, G30.1, G30.8, G30.9), vascular dementia (ICD-9: 290.4, 290.40, 290.41, 290.42, 290.43; ICD-10: F01, F01.5, F01.50, F01.51), Lewy body dementia (ICD-9: 331.82; ICD-10: G31.83), and frontotemporal dementia (ICD-9: 331.1, 331.11, 331.19; ICD-10: G31.0, G31.01, G31.09). We collected patients’ clinical records, including structured data elements such as demographics (e.g., age, gender, race, and ethnicity), encounters, diagnoses, procedures, lab results, and medications, as well as all available clinical notes associated with patients’ clinical visits, from the UF Health IDR. This study was approved by the UF Institutional Review Board.
Identification and selection of potential endpoints
We first conducted a literature review to identify measurements (i.e., cognitive tests and biomarkers) often used for cognitive function and AD/ADRD severity. Following recommendations from the USPSTF,[7,18] we identified a total of 96 measurements that are used to screen and monitor cognitive functions in older adults, including 85 cognitive tests and 11 biomarkers. In addition to the USPSTF recommendation, we also reviewed all (N=4,656) interventional trials on AD/ADRD in clinicaltrials.gov. We summarized AD/ADRD-related measurements of cognition functions that are used as the primary and secondary outcome endpoints in these trials. For each measurement, we identified its full name and commonly used alternative names. We then searched all available clinical notes extracted from our UF Health AD/ADRD cohort using the names and alternative names as keywords and calculated the frequencies. Next, we identified frequently used ADRD cognitive tests and biomarkers according to the frequencies derived from keyword searching of clinical notes. We further excluded non-scoring tests (e.g., clock-drawing test, which requires patients to draw the numbers and hands on the face of the clock to show a specific time) as these tests cannot be quantitatively analyzed.
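The note- and patient-level frequency screen described above can be sketched in a few lines of Python. This is a minimal illustration, not the production code: the `KEYWORDS` lexicon below is a hypothetical subset of the alternative names we compiled, and a real run would iterate over the full note corpus.

```python
from collections import defaultdict

# Hypothetical subset of the measurement-name lexicon (illustrative only)
KEYWORDS = {
    "MMSE": ["mmse", "mini mental state exam", "folstein test"],
    "MoCA": ["moca", "montreal cognitive assessment"],
}

def count_mentions(notes):
    """notes: iterable of (patient_id, note_text) pairs.

    Returns (notes_per_measurement, patients_per_measurement), counting a note
    once per measurement if it contains any of that measurement's keywords.
    """
    note_counts = defaultdict(int)
    patients = defaultdict(set)
    for patient_id, text in notes:
        lowered = text.lower()
        for measure, names in KEYWORDS.items():
            if any(name in lowered for name in names):
                note_counts[measure] += 1
                patients[measure].add(patient_id)
    return dict(note_counts), {m: len(p) for m, p in patients.items()}
```

Dividing the resulting counts by the total numbers of notes and patients yields percentages such as those reported in Table 1.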
Development of the natural language processing (NLP) system for extracting cognitive tests scores from clinical notes
We defined concept categories using the selected AD/ADRD cognitive tests and biomarkers, developed annotation guidelines, and recruited two annotators to create a gold-standard corpus using the BRAT annotation tool.[19] For each of the AD/ADRD cognitive tests and biomarkers, we randomly selected up to 100 notes using a keyword-based searching strategy, which generated a corpus of 681 clinical notes. Two annotators (HZ and SW) manually reviewed these notes to identify AD/ADRD cognitive tests (and their scores) and biomarkers. We monitored annotation agreement by calculating Cohen’s kappa on a subset of 40 notes annotated by both annotators. Following a standard NLP development procedure, we conducted training sessions to ensure reasonable agreement and resolved discrepancies between the two annotators. We divided the annotated notes into a training set of 545 notes and a test set of 136 notes according to a ratio of 4:1.
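Agreement on the doubly annotated subset can be computed with Cohen’s kappa, which corrects observed agreement for agreement expected by chance. A minimal sketch (the labels in the usage example are illustrative, not drawn from the corpus):

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement rate
    and p_e is the chance agreement implied by each annotator's label rates.
    """
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)
```

For example, two annotators agreeing on 3 of 4 items with the label rates below yield a kappa of 0.5, noticeably lower than the raw 75% agreement.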
We developed a rule-based NLP pipeline, as these AD/ADRD cognitive tests and biomarkers have consistent documentation patterns with limited variation, which is ideal for rule-based NLP solutions. Based on our previous experience in developing rule-based NLP pipelines,[20,21] we used a rule engine developed in our previous study using Java.[20] In this rule engine, we implemented a two-layer architecture to improve the efficiency of developing and managing rules. The first layer of the rule engine captured common variations of a concept in different dictionaries using regular expressions. For example, we used lexicons of “mmse”, “mini mental status”, “mini mental state exam”, “mini mental state examination”, “folstein mental status examination”, and “folstein test” to capture common mentions of the Mini-Mental State Exam (MMSE), and lexicons of “\d{1,2}” and “\d{1,2}/30” (in regex format) to capture the scores yielded by the MMSE. In the second layer, developers can define high-level rules using the dictionaries defined in the first layer. We designed and optimized rules by observing the AD/ADRD cognitive tests (and their scores) and biomarkers annotated in the training dataset. After optimization using the training set, we applied these rules to the test set to calculate the performance metrics. Thus, the test dataset was not used during rule development and optimization but only for performance evaluation.
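The two-layer design can be illustrated with a Python sketch: layer one compiles the name and score lexicons into regexes; layer two combines them into a rule that looks for a score within a short window after a test-name mention. This is a simplified assumption-laden sketch (the production engine was implemented in Java, and the 30-character window is an illustrative choice, not the authors' rule):

```python
import re

# Layer 1: dictionaries of surface variants compiled as regular expressions
# (lexicon entries follow those listed in the text, slightly condensed)
MMSE_NAMES = re.compile(
    r"\b(mmse|mini[\s-]?mental (status|state) exam(ination)?|"
    r"folstein (mental status examination|test))\b",
    re.IGNORECASE,
)
MMSE_SCORE = re.compile(r"\b(\d{1,2})(?:\s*/\s*30)?\b")

# Layer 2: a high-level rule built from the layer-1 dictionaries
def extract_mmse(text, window=30):
    """Return MMSE scores found within `window` chars after a name mention."""
    results = []
    for name in MMSE_NAMES.finditer(text):
        tail = text[name.end(): name.end() + window]
        score = MMSE_SCORE.search(tail)
        if score:
            results.append(int(score.group(1)))
    return results
```

Keeping the lexicons (layer 1) separate from the combination logic (layer 2) means new surface variants can be added to a dictionary without touching any rule, which is the efficiency gain the two-layer architecture targets.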
Harmonization of cognitive test scores into AD/ADRD severity categories
As multiple cognitive tests are used in practice and these tests have different score ranges and severity definitions, it is critical to harmonize the test scores for consistent classification into severity categories. We harmonized the different AD/ADRD assessment scores derived from each cognitive test to a schema of unified, consistent severity categories (i.e., mild, moderate, and severe). To do so, we first collected existing severity definitions for each of the cognitive tests from the literature as well as from AD/ADRD trials.[6–8,22–24] Then, we discussed these definitions with the AD/ADRD clinicians on the study team and adjusted them according to their experience in clinical practice. We applied the rule-based NLP pipeline to extract all AD/ADRD cognitive test scores and biomarkers from all clinical notes of our AD/ADRD cohort. As there may be multiple cognitive test scores for one patient, we aggregated the NLP extraction results to the patient level and then determined the severity category by applying the revised severity definitions. We summarized the distribution of AD/ADRD patients across severity categories and described patient characteristics by severity category.
Results
Frequently used AD/ADRD endpoints
Among the 96 cognitive tests and biomarkers, most had very low frequencies. Table 1 shows the top 10 most frequently used measurements and the number/percentage of clinical notes and patients that contain them. We selected the top 7 measurements (6 tests and 1 biomarker) to be extracted using NLP: the Mini-Mental State Exam (MMSE), Montreal Cognitive Assessment (MoCA), Clinical Dementia Rating (CDR), Functional Activities Questionnaire (FAQ), Global Deterioration Scale (GDS), Mini-Cog, and the Apolipoprotein E (APOE) genotype. It is worth noting that these frequently used AD/ADRD-related measurements were also among the most used endpoint measurements we identified from the clinical trials. In the 4,656 trials, the top 5 most commonly used measurements were MMSE, followed by MoCA, CDR, Mini-Cog, and DRS, which were all included in the 7 measurements we extracted.
Table 1.
Numbers and percentages of notes containing keywords for the top 10 most frequent AD/ADRD-related measurements.
| Measurements | Number and percentage of notes containing keywords (n=20,318,015) | Number and percentage of patients with notes containing keywords (n=48,912) |
|---|---|---|
| Mini-mental state (MMSE) | 46,905 (0.231%) | 9,569 (19.564%) |
| Montreal Cognitive Assessment (MoCA) | 17,378 (0.086%) | 3,974 (8.125%) |
| Hopkins Verbal Learning Test (HVLT) | 9,977 (0.049%) | 3,213 (6.569%) |
| Apolipoprotein E (APOE) | 9,862 (0.049%) | 1,353 (2.766%) |
| clock drawing test (CDT) | 9,742 (0.048%) | 4,105 (8.393%) |
| Immediate Recall | 9,403 (0.046%) | 4,417 (9.031%) |
| Global Deterioration Scale (GDS) | 5,951 (0.029%) | 2,016 (4.122%) |
| Mini-Cog | 4,789 (0.024%) | 1,965 (4.017%) |
| FDG PET scan | 4,669 (0.023%) | 781 (1.597%) |
| Dementia Rating Scale (DRS) | 3,711 (0.018%) | 1,567 (3.204%) |
Performance of the NLP system
We developed a total of 53 rules using the training set of 545 notes for the 6 selected cognitive tests and 1 biomarker. We applied the NLP pipeline to the test set of 136 notes and calculated the precision, recall, and F1-score for each measurement, with overall scores across categories computed as the micro average. Table 2 shows the performance of the NLP pipeline. Our rule-based NLP pipeline achieved a micro-averaged F1-score of 0.9059 across all categories. We observed the best F1-score (0.9630) and recall (92.86%) for the APOE gene and the lowest F1-score (0.8399) and recall (72.40%) for MoCA scores. We optimized our NLP pipeline for precision to ensure the accuracy of the extracted information; 100% precision was achieved for all categories.
Table 2.
Performance of identifying 7 selected measurements on test set.
| Measurement | Precision | Recall | F1-score |
|---|---|---|---|
| Mini-mental state (MMSE) score | 100.00% | 84.09% | 0.9136 |
| Montreal Cognitive Assessment (MoCA) score | 100.00% | 72.40% | 0.8399 |
| Functional Activities Questionnaire (FAQ) score | 100.00% | 92.31% | 0.9600 |
| Clinical Dementia Rating (CDR) score | 100.00% | 83.33% | 0.9091 |
| Mini-Cog score | 100.00% | 75.00% | 0.8571 |
| Global Deterioration Scale (GDS) score | 100.00% | 80.00% | 0.8889 |
| APOE gene | 100.00% | 92.86% | 0.9630 |
| Micro average | 100.00% | 82.80% | 0.9059 |
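Micro-averaging pools true positives, false positives, and false negatives across all categories before computing precision and recall, so frequent categories carry more weight than in an unweighted (macro) average. The snippet below sketches the computation and checks that the reported micro precision (100%) and recall (82.80%) reproduce the reported F1 of 0.9059; the pooled counts in the usage example are hypothetical, since per-category counts are not reported.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def micro_prf(counts):
    """counts: iterable of (tp, fp, fn) tuples, one per category.

    Pools the confusion counts across categories, then computes the
    micro-averaged precision, recall, and F1-score.
    """
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1(precision, recall)

# Reported micro-averaged precision (1.0) and recall (0.8280) from Table 2
overall_f1 = f1(1.0, 0.8280)  # ≈ 0.9059, matching the reported overall F1
```

The same check holds per row of Table 2: for example, MMSE (precision 1.0, recall 0.8409) yields an F1 of 0.9136.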
Harmonization of cognitive tests into severity categories
For each cognitive test, we first identified severity assessment cutoff scores from literature, and then discussed and revised the cutoff criteria with AD/ADRD clinicians and experts on the study team. Table 3 shows the final cutoff values we used to harmonize different cognitive tests into severity categories. Among the 6 selected cognitive tests, 4 of them have established cutoff scores for AD/ADRD severity levels.
Table 3.
Included cognitive tests and their respective cutoff points.
| Cognitive tests | Description | ADRD severity cutoff points |
|---|---|---|
| Mini-Mental State Exam (MMSE)[6,22,24,25] | 1. Clinician-administered patient evaluation. 2. Assesses 5 cognitive domains: orientation, memory (registration and recall), attention/calculation, language, and visuospatial abilities. | Normal: 25-30; Mild: 21-24; Moderate: 13-20; Severe: 0-12 |
| Clinical Dementia Rating (CDR)[7,24,26] | 1. Clinician-administered semi-structured interview of the patient and a reliable collateral source (e.g., family member). 2. Characterizes six domains of cognitive and functional performance applicable to ADRD: memory, orientation, judgement & problem solving, community affairs, home & hobbies, and personal care. | Normal: 0; Mild: 1; Moderate: 2; Severe: >=3 |
| Global Deterioration Scale (GDS)[7,18,27] | 1. Clinician-rated, based on cognitive change only. 2. Provides caregivers an overview of the stages of cognitive function in a primary degenerative dementia such as AD. | Normal: 1-2; Mild: 3; Moderate: 4-5; Severe: 6-7 |
| Montreal Cognitive Assessment (MoCA)[8,22,24,25] | 1. Assesses memory, visuospatial ability, executive function, attention, concentration, working memory, and orientation. 2. Takes approximately 20 minutes to administer; maximum score of 30, with lower scores representing poorer performance. | Normal: 24-30; Mild: 18-23; Moderate: 10-17; Severe: 0-9 |
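Applied to a single extracted score, these cutoffs amount to a lookup over inclusive score bands. A minimal sketch, with two assumptions: the Normal band is taken as the scores above the Mild range for MMSE/MoCA (and as GDS stages 1–2), consistent with the test conventions described in the Introduction, and a CDR of 0.5, which the published bands do not cover, returns `None`:

```python
# Severity bands as inclusive (low, high) score ranges per cognitive test
BANDS = {
    "MMSE": {"severe": (0, 12), "moderate": (13, 20), "mild": (21, 24), "normal": (25, 30)},
    "MoCA": {"severe": (0, 9), "moderate": (10, 17), "mild": (18, 23), "normal": (24, 30)},
    "GDS":  {"severe": (6, 7), "moderate": (4, 5), "mild": (3, 3), "normal": (1, 2)},
    "CDR":  {"severe": (3, 3), "moderate": (2, 2), "mild": (1, 1), "normal": (0, 0)},
}

def severity(test, score):
    """Map a raw cognitive test score to a harmonized severity category."""
    for label, (low, high) in BANDS[test].items():
        if low <= score <= high:
            return label
    return None  # e.g., CDR 0.5 falls outside the bands above
```

Because every test maps into the same four labels, scores from different instruments become directly comparable at the patient level.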
Among the 48,912 AD/ADRD patients identified from the UF Health IDR (covering patients who visited the UF Health system from 2012 to 2019), 10,005 (20.5%) have at least one record containing cognitive test scores that can be mapped to severity categories. After extracting the included cognitive tests for all AD/ADRD patients using our NLP pipeline, we aggregated the scores to the patient level and applied the cutoffs to categorize patients into three severity categories: mild, moderate, and severe. For patients with multiple cognitive tests, we used the severity category determined from the last record. Figure 2 shows the number of patients by severity category, based on each measurement and overall. Among the 4 cognitive tests, MoCA identified more patients with mild and moderate disease, while MMSE identified the most patients with severe disease. Overall, we identified 2,009 patients with mild AD/ADRD, 1,774 patients with moderate AD/ADRD, and 3,134 with severe disease.
Figure 2. Number of patients by severity category.

Cohort analyses
Next, we examined the characteristics of our study population, comparing patients without an identified severity category versus those with one, and then across severity categories. As shown in Table 4, compared with patients without an identified severity category, those with one are more likely to be older (mean age 73.9 vs. 66.4). Among those with an identified severity category, there are more non-Hispanic Whites (75.9% vs. 65.2%), more former smokers (40.0% vs. 28.8%), and more patients with Medicare (74.8% vs. 57.1%), and fewer non-Hispanic Blacks (16.9% vs. 23.7%) and current smokers (10.1% vs. 14.7%). Table 4 also shows the characteristics of the AD/ADRD cohort across the severity categories. As severity increases, the patient population gets older and includes more females, more Hispanics, more non-Hispanic Blacks, and more people enrolled in Medicare, whereas the percentage of current smokers decreases with increasing severity.
Table 4.
Characteristics of the AD/ADRD across severity categories.
| Characteristic | No severity measurements (N=38,907) | Have severity measurements (N=10,005) | AD/ADRD severity: Normal (N=3,079) | Mild (N=2,009) | Moderate (N=1,774) | Severe (N=3,143) |
|---|---|---|---|---|---|---|
| Demographics | ||||||
| Age (Mean, SD) | 66.4 (22.4) | 73.9 (13.4) | 71.6 (13.7) | 73.0 (13.3) | 75.7 (13.0) | 75.7 (13.1) |
| Sex | ||||||
| Female | 20,470 (52.6%) | 5,380 (53.8%) | 1,672 (54.3%) | 983 (48.9%) | 942 (53.1%) | 1,783 (56.7%) |
| Male | 18,437 (47.4%) | 4,625 (46.2%) | 1,407 (45.7%) | 1,026 (51.1%) | 832 (46.9%) | 1,360 (43.3%) |
| Race/ethnicity | ||||||
| Hispanic | 1,480 (3.8%) | 333 (3.3%) | 96 (3.1%) | 56 (2.8%) | 66 (3.7%) | 115 (3.7%) |
| NHB* | 9,235 (23.7%) | 1,690 (16.9%) | 530 (17.2%) | 262 (13.0%) | 316 (17.8%) | 582 (18.5%) |
| NHW* | 25,361 (65.2%) | 7,589 (75.9%) | 2,338 (75.9%) | 1,624 (80.8%) | 1,302 (73.4%) | 2,325 (74.0%) |
| Other | 2,267 (5.8%) | 344 (3.4%) | 100 (3.2%) | 60 (3.0%) | 74 (4.2%) | 110 (3.5%) |
| Unknown / missing | 564 (1.4%) | 49 (0.5%) | 15 (0.5%) | 7 (0.3%) | 16 (0.9%) | 11 (0.3%) |
| Smoking status | ||||||
| Current | 5,737 (14.7%) | 1,009 (10.1%) | 334 (10.8%) | 196 (9.8%) | 180 (10.1%) | 299 (9.5%) |
| Former | 10,991 (28.2%) | 4,000 (40.0%) | 1,259 (40.9%) | 846 (42.1%) | 719 (40.5%) | 1,176 (37.4%) |
| Never | 16,162 (41.5%) | 4,611 (46.1%) | 1,424 (46.2%) | 892 (44.4%) | 813 (45.8%) | 1,482 (47.2%) |
| Unknown / missing | 6,017 (15.5%) | 385 (3.8%) | 64 (2.0%) | 75 (3.7%) | 62 (3.5%) | 186 (5.9%) |
| Insurance type | ||||||
| Medicaid | 4,407 (11.3%) | 511 (5.1%) | 148 (4.8%) | 115 (5.7%) | 100 (5.6%) | 148 (4.7%) |
| Medicare | 22,212 (57.1%) | 7,481 (74.8%) | 2,290 (74.4%) | 1,439 (71.6%) | 1,357 (76.5%) | 2,395 (76.2%) |
| Other governmental | 1,848 (4.7%) | 149 (1.5%) | 64 (2.1%) | 33 (1.6%) | 23 (1.3%) | 29 (0.9%) |
| Private | 3,422 (8.8%) | 682 (6.8%) | 239 (7.8%) | 156 (7.8%) | 101 (5.7%) | 186 (5.9%) |
| Self-pay | 1,457 (3.7%) | 205 (2.0%) | 73 (2.4%) | 52 (2.6%) | 28 (1.6%) | 52 (1.7%) |
| Managed Care | 1,637 (4.2%) | 297 (3.0%) | 115 (3.7%) | 58 (2.9%) | 40 (2.3%) | 84 (2.7%) |
| Other | 998 (2.6%) | 163 (1.6%) | 28 (0.9%) | 32 (1.6%) | 31 (1.7%) | 72 (2.3%) |
| Missing | 2,926 (7.5%) | 517 (5.2%) | 122 (4.0%) | 124 (6.2%) | 94 (5.3%) | 177 (5.6%) |
| Clinical conditions | ||||||
| Anxiety | 9,811 (25.2%) | 4,071 (40.7%) | 1,385 (45.0%) | 784 (39.0%) | 613 (34.6%) | 1,289 (41.0%) |
| Apathy | 71 (0.2%) | 150 (1.5%) | 44 (1.4%) | 40 (2.0%) | 22 (1.2%) | 44 (1.4%) |
| Cerebrovascular diseases | 23,044 (59.2%) | 5,042 (50.4%) | 1,597 (51.9%) | 986 (49.1%) | 909 (51.2%) | 1,550 (49.3%) |
| Cardiovascular diseases | 11,026 (28.3%) | 4,830 (48.3%) | 1,505 (48.9%) | 930 (46.3%) | 808 (45.5%) | 1,587 (50.5%) |
| Hypertension | 26,368 (67.8%) | 7,007 (70.0%) | 2,194 (71.3%) | 1,310 (65.2%) | 1,241 (70.0%) | 2,262 (72.0%) |
| Diabetes | 11,680 (30.0%) | 3,326 (33.2%) | 1,048 (34.0%) | 644 (32.1%) | 572 (32.2%) | 1,062 (33.8%) |
| Lab tests | ||||||
| Cholesterol (mg/dL) | ||||||
| Mean, SD | 182 (55.0) | 195 (50.9) | 201 (49.8) | 194 (51.6) | 190 (51.1) | 193 (51.2) |
| Missing | 21,903 (56.3%) | 4,160 (41.6%) | 1,169 (38.0%) | 947 (47.1%) | 775 (43.7%) | 1,269 (40.4%) |
| HDL (mg/dL) | ||||||
| Mean, SD | 54.1 (21.0) | 59.9 (20.9) | 61.8 (21.5) | 57.6 (20.3) | 59.1 (20.7) | 59.7 (20.7) |
| Missing | 21,943 (56.4%) | 4,161 (41.6%) | 1,169 (38.0%) | 947 (47.1%) | 776 (43.7%) | 1,269 (40.4%) |
| Folate (ng/mL) | ||||||
| Mean, SD | 1010 (495) | 876 (385) | 968 (439) | 825 (409) | 837 (105) | 833 (390) |
| Missing | 38,760 (99.6%) | 9,944 (99.4%) | 3,059 (99.4%) | 1,994 (99.3%) | 1,766 (99.5%) | 3,125 (99.4%) |
| HbA1c (%) | ||||||
| Mean, SD | 6.83 (3.23) | 6.88 (3.22) | 7.06 (4.80) | 6.79 (1.99) | 6.78 (2.21) | 6.80 (2.11) |
| Missing | 22,899 (58.9%) | 4,766 (47.6%) | 1,404 (45.6%) | 1,029 (51.2%) | 838 (47.2%) | 1,495 (47.6%) |
| Glucose (mmol/L) | ||||||
| Mean, SD | 8.36 (3.11) | 8.37 (2.88) | 8.41 (2.86) | 7.99 (2.48) | 8.49 (2.88) | 8.46 (3.11) |
| Missing | 36,919 (94.9%) | 8,914 (89.1%) | 2,580 (83.8%) | 1,847 (91.9%) | 1,637 (92.3%) | 2,850 (90.7%) |
| Medications | ||||||
| NSAID | 23,510 (60.4%) | 6,812 (68.1%) | 2,283 (74.1%) | 1,327 (66.1%) | 1,100 (62.0%) | 2,102 (66.9%) |
| Statin | 20,257 (52.1%) | 6,486 (64.8%) | 2,063 (67.0%) | 1,298 (64.6%) | 1,161 (65.4%) | 1,964 (62.5%) |
| Hormone | 9,717 (25.0%) | 3,040 (30.4%) | 1,072 (34.8%) | 597 (29.7%) | 486 (27.4%) | 885 (28.2%) |
| Benzodiazepines | 29,930 (76.9%) | 7,861 (78.6%) | 2,519 (81.8%) | 1,525 (75.9%) | 1,345 (75.8%) | 2,472 (78.7%) |
| Anticholinergics | 35,902 (92.3%) | 9,708 (97.0%) | 3,001 (97.5%) | 1,946 (96.9%) | 1,735 (97.8%) | 3,026 (96.3%) |
| Other cognitive tests that cannot be categorized into severity and extracted biomarkers | ||||||
| Have APOE recorded | 463 (1.2%) | 4,829 (34.1%) | 777 (20.4%) | 923 (33.1%) | 1,272 (45.4%) | 1,862 (39.0%) |
| FAQ score (mean, SD) | n/a | 10.6 (8.70) | 7.52 (8.49) | 9.85 (7.22) | 10.8 (7.84) | 12.0 (8.96) |
| Mini-Cog score (mean, SD) | n/a | 3.51 (1.54) | 3.65 (1.53) | 2.44 (1.62) | 3.61 (1.40) | 3.13 (1.41) |
NHB: non-Hispanic Black; NHW: non-Hispanic White; HDL: high-density lipoprotein; HbA1c: Hemoglobin A1C; NSAID: Nonsteroidal anti-inflammatory drug
In addition to demographic variables, we also examined clinical characteristics (e.g., comorbidities such as cardiovascular diseases and hypertension, labs such as cholesterol and glucose, and medications such as statins) by severity category. Compared with patients without an identified severity category, those with one are more likely to have anxiety, apathy, cardiovascular diseases, hypertension, and diabetes; they are also more likely to have higher values in the lab tests we included (along with higher chances of having these tests performed). Corresponding to the higher prevalence of clinical comorbidities, those with severity measurements also tend to have received more treatments for those conditions (e.g., lipid-lowering treatments such as statins).
Among the 6 included cognitive tests, MoCA and MMSE are the most frequently used, with 5,295 (52.8%) and 5,461 (54.4%) patients having at least one record, respectively, whereas CDR is the least used, with only 46 (0.5%) patients having this test recorded. Table 5 shows a detailed analysis of the patients stratified by the number of cognitive tests recorded longitudinally: 5,299 (52.8%), 2,016 (20.1%), 868 (8.7%), 509 (5.1%), and 1,338 (13.3%) patients have 1, 2, 3, 4, and 5 or more tests available, respectively. Table 5 also shows the duration, in days, for patients who have multiple (more than one) cognitive test results in their charts, defined as the number of days from the first test to the last test. The median duration for those with multiple tests is 377 days, with an interquartile range (IQR) of 104 to 680. Among the 6 tests, MMSE has the longest duration (median 490, IQR: 186–1092) while CDR has the shortest (median 157, IQR: 140–511).
Table 5.
Number of patients by different cognitive tests and the number of these tests.
| Measurement | Have the test | 1 test | 2 tests | 3 tests | 4 tests | 5+ tests | Duration of records, days (median, IQR) |
|---|---|---|---|---|---|---|---|
| MoCA | 5,295 (52.8%) | 3,199 (31.9%) | 919 (9.2%) | 408 (4.1%) | 232 (2.3%) | 537 (5.4%) | 385 (172, 816) |
| MMSE | 5,461 (54.4%) | 3,317 (33.1%) | 926 (9.2%) | 415 (4.1%) | 243 (2.4%) | 560 (5.6%) | 490 (186, 1092) |
| Mini-Cog | 1,185 (11.8%) | 826 (8.2%) | 242 (2.4%) | 82 (0.8%) | 11 (0.1%) | 24 (0.2%) | 466 (299, 674) |
| FAQ | 555 (5.5%) | 267 (2.7%) | 36 0.4%) | 5 (<0.1%) | 2 (<0.1%) | 245 (2.4%) | 365 (127, 453) |
| CDR | 46 (0.5%) | 35 (0.3%) | 7 (0.1%) | 1 (<0.1%) | 3 (<0.1%) | 0 | 157 (140, 511) |
| GDS | 862 (8.6%) | 657 (6.6%) | 122 (1.2%) | 30 (0.3%) | 17 (0.2%) | 36 (0.4%) | 315 (125, 536) |
| Any test | 10,030* | 5,299 (52.8%) | 2,016 (20.1%) | 868 (8.7%) | 509 (5.1%) | 1,338 (13.3%) | 377 (104, 680) |
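The per-patient duration summarized in Table 5 is a simple first-to-last-test calculation. As a minimal sketch (the patient records and dates below are hypothetical, for illustration only):

```python
from datetime import date
from statistics import median

# Hypothetical per-patient records of (test_name, test_date);
# names and dates are illustrative, not taken from the study data.
records = {
    "patient_a": [("MMSE", date(2019, 1, 10)), ("MMSE", date(2020, 3, 2))],
    "patient_b": [("MoCA", date(2018, 6, 1)), ("MMSE", date(2018, 9, 15)),
                  ("MoCA", date(2020, 1, 20))],
    "patient_c": [("Mini-Cog", date(2019, 5, 5))],  # single test: no duration
}

def duration_days(tests):
    """Days from a patient's first documented test to their last."""
    dates = sorted(d for _, d in tests)
    return (dates[-1] - dates[0]).days

# Duration is only defined for patients with more than one test.
durations = [duration_days(t) for t in records.values() if len(t) > 1]
print(median(durations))  # median duration across patients with >=2 tests
```

The same per-patient durations, stratified by test, yield the medians and IQRs reported in the last column of Table 5.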
Discussion
Cognitive tests are critical information to assess the severity and track the progression of AD/ADRD, and biomarkers such as amyloid-β, tau, and APOE are important to the biological definition of AD/ADRD and can help predict or diagnose the disease. The increasing availability of large-scale EHRs for research has opened new doors for accelerating AD/ADRD research (e.g., enabling simulation of clinical trials). Nevertheless, to support EHR-based AD/ADRD studies, the ability to extract cognitive tests and biomarkers from EHRs is critical; yet both are often not available as structured EHR data and are usually captured in clinical narratives. We started with a total of 96 measurements used to screen and monitor cognitive functions in older adults, including 85 cognitive tests and 11 biomarkers, extracted based on literature from the USPSTF and existing clinical trials of AD/ADRD;[7,18] nevertheless, only a handful of these cognitive tests and biomarkers are documented in EHRs. There are three insights from these results: (1) most cognitive tests are not used in routine care; many of them are academic instruments that may be used in research settings but have not proven useful in clinical care, as evidenced by the consistent frequencies of cognitive tests between clinical trials and our data; (2) the existence of multiple different cognitive tests used in clinical care leads to harmonization issues across these measures, especially for research use of these data; and (3) a biological definition of AD/ADRD, as recommended in the 2018 NIA-AA Research Framework using the AT(N) system, remains a research tool and is often not used in routine clinical care (possibly due to insurance and cost issues).
We developed a rule-based NLP system to extract cognitive tests and biomarkers from EHRs. Our NLP pipeline achieved an overall F1-score of 0.9059 in extracting 6 widely used cognitive tests and 1 genetic biomarker (APOE). We optimized the NLP pipeline toward a precision of 100% to maximize the accuracy of the extracted information. Across the 7 categories of AD/ADRD-related cognitive tests and biomarkers, F1-scores ranged from 0.84 to 0.96, where the extraction of APOE achieved the best F1-score of 0.963 and the extraction of MoCA scores achieved the lowest F1-score of 0.8399. Overall, the rule-based NLP pipeline worked well, as the cognitive test and biomarker information follows consistent patterns with limited variation.
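To give a flavor of why rule-based extraction works well here, a minimal sketch of a score-extraction rule follows. The pattern and note text are assumptions for illustration only; the study's actual rule set is more extensive and was tuned on annotated notes.

```python
import re

# Illustrative pattern only: test name, up to 20 non-digit filler characters
# (e.g., "score of"), then a 1-2 digit score, optionally written as "24/30".
SCORE_PATTERN = re.compile(
    r"\b(?P<test>MMSE|MoCA|Mini-Cog|GDS|FAQ|CDR)\b"   # test name
    r"[^\d]{0,20}"                                    # filler, e.g. "score of"
    r"(?P<score>\d{1,2})(?:\s*/\s*\d{2})?",           # "24" or "24/30"
    re.IGNORECASE,
)

def extract_scores(note_text):
    """Return (test, score) pairs mentioned in a clinical note snippet."""
    return [(m.group("test").upper(), int(m.group("score")))
            for m in SCORE_PATTERN.finditer(note_text)]

note = "Pt seen for follow-up. MoCA score of 22/30 today; prior MMSE 25."
print(extract_scores(note))  # -> [('MOCA', 22), ('MMSE', 25)]
```

A single naive pattern like this would over-match in practice (e.g., a test name followed by an unrelated number); production rules add context checks and negation handling, which is where the remaining F1 gap comes from.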
Our work is important because we created a unified measurement of AD/ADRD severity from EHRs. To the best of our knowledge, no previous EHR-based study has extracted different cognitive tests and harmonized them into categorical severity. Our findings can be used as real-world endpoints for observational EHR-based cohort studies aiming to generate real-world evidence (RWE) (e.g., estimating treatment effectiveness from real-world data [RWD] sources like EHRs). RWD and RWE are playing an increasing role in health care decision making. For example, the U.S. Food and Drug Administration (FDA) has started to use RWE generated from RWD to make regulatory decisions, and the clinical research communities have started to use RWD and RWE to support clinical trial designs. Our prior studies have attempted to use RWD to rationalize the design of eligibility criteria for AD trials, assessing how trial eligibility criteria can jointly affect trial generalizability and patient safety;[28] however, we were not able to model the impact of changing eligibility criteria on drug effectiveness, and the key barrier was the lack of real-world endpoints with which to model treatment effectiveness. The harmonized AD/ADRD severity from the current study can be used to track patients’ longitudinal severity changes and thus serve as a real-world endpoint for treatment effectiveness. For example, one could track the harmonized severity among patients at different progressive states in the spectrum of AD/ADRD to determine a patient’s disease progression. In addition, the NLP pipeline we developed can accurately extract multiple cognitive tests and biomarkers of AD/ADRD from clinical narratives, which greatly compensates for the lack of documentation in structured EHR data. Nevertheless, based on our NLP results, the documentation of cognitive tests and biomarkers in EHRs for AD/ADRD patients is rather low.
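Tracking longitudinal severity changes with the harmonized categories can be sketched as follows; the visit data and the ordinal encoding of categories are assumptions for illustration, not the study's implementation.

```python
# Illustrative ordinal encoding of the harmonized severity categories.
SEVERITY_ORDER = {"normal": 0, "mild": 1, "moderate": 2, "severe": 3}

# Hypothetical longitudinal record: (days since first visit, severity).
visits = [
    (0, "mild"),
    (180, "mild"),
    (420, "moderate"),
]

def progressed(visits):
    """True if harmonized severity worsened between first and last visit."""
    first, last = visits[0][1], visits[-1][1]
    return SEVERITY_ORDER[last] > SEVERITY_ORDER[first]

print(progressed(visits))  # True: mild -> moderate
```

A real-world endpoint for effectiveness analyses could then be, for instance, time to first category worsening, computed the same way across patients regardless of which underlying test was documented.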
Among the entire cohort of 48,912 patients, only 10,030 (20.5%) had cognitive tests recorded in their clinical notes, although a substantial share of them, 4,731 (47.2%), had multiple records over the study period. Considering that AD/ADRD is a disease of aging that progresses over years, such sparse documentation makes it challenging to track the disease progression of AD/ADRD patients. Moreover, the duration of records is rather short in our study, where most patients only have 1–2 years of records. These results lead to a number of important insights for future EHR-based AD/ADRD studies.
First, note that we used a loose definition of AD/ADRD (i.e., ≥ 1 AD/ADRD-related diagnosis code), and it is well known that using diagnostic codes alone to identify patients in RWD like EHRs leads to misclassification errors,[29–31] including for AD.[32,33] This is why validated computable phenotypes (i.e., “clinical conditions, characteristics, or sets of clinical features that can be determined solely from EHRs and ancillary data sources”) are needed.[34–36] Accurately identifying AD/ADRD is not the goal of this paper; however, using only diagnostic codes for cohort identification means that, among “true” AD/ADRD patients identified using validated computable phenotypes, the prevalence of documented cognitive tests and biomarkers would be higher than what is reported in Table 4.
Second, even when using computable phenotypes to identify AD/ADRD, additional strategies are needed to (1) identify a subset of patients with more complete EHR information for downstream data analysis studies, such as using algorithms to identify patients with high EHR data continuity;[37] such strategies may introduce additional selection bias for the downstream analyses, which needs to be carefully considered (e.g., using causal-principled frameworks, such as the target trial approach,[38] for secondary data analyses of RWD); and/or (2) use imputation methods to deal with the missing information. Careful consideration is also needed here, as missing data in EHRs can be missing at random (MAR), missing completely at random (MCAR), or missing not at random (MNAR), and different imputation strategies are needed for each.
Third, the incompleteness of patient records in any one health system indicates the importance of collecting patient EHRs from multiple health care systems for research use. Indeed, a number of national clinical data research networks have been established in the last few years. One prominent example of such a network is the National Patient-Centered Clinical Research Network (PCORnet) funded by the Patient-Centered Outcomes Research Institute (PCORI). PCORnet is a network of networks, where the Clinical Research Networks (CRNs) in PCORnet combined have accumulated EHR data of more than 80 million patients nationally “from 337 hospitals, 169,695 physicians, 3,564 primary care practices, 338 emergency departments, and 1,024 community clinics.”[39] PCORnet has also established a common protocol for privacy-preserving record linkage through a third-party company called Datavant, so that patient records from different data sources (i.e., from different hospitals, or across EHR and claims data sources) can be linked. Our data source in this study, the UF Health system, contributes to the OneFlorida+ network,[40] one of the 8 current CRNs in PCORnet 3.0. Such infrastructure will greatly increase our ability to obtain complete patient EHR histories and alleviate some of the missingness issues.
Further, we harmonized 4 cognitive tests commonly documented in EHRs for AD/ADRD patients into AD/ADRD severity categories (i.e., normal, mild, moderate, and severe), which can provide a consistent and harmonized outcome measure of AD/ADRD. Based on this, we examined the documentation of cognitive scores and biomarkers in a cohort of AD/ADRD patients and found several factors related to the prevalence of the documentation. Those with severity categories identified (i.e., those with cognitive tests documented in EHRs) tend to be older, with more non-Hispanic Whites, more former smokers, and more patients with Medicare. They also tend to have a higher prevalence of AD/ADRD risk factors, as shown in Table 4. Further, as the documented AD/ADRD severity increases, the patient population gets older and includes more females, Hispanics, and non-Hispanic Blacks, which is consistent with previous literature on AD/ADRD patient characteristics.[1,3]
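The harmonization step amounts to mapping each raw score onto the shared categorical scale via test-specific cutoffs. A minimal sketch follows; the cutoff values shown are commonly cited literature values for MMSE and MoCA and are illustrative, not necessarily the exact cutoffs adopted in this study.

```python
# Illustrative cutoffs: each entry is (minimum score, severity), checked
# from highest to lowest. Values are common literature conventions, not
# necessarily the study's exact cutoffs.
CUTOFFS = {
    "MMSE": [(24, "normal"), (19, "mild"), (10, "moderate"), (0, "severe")],
    "MoCA": [(26, "normal"), (18, "mild"), (10, "moderate"), (0, "severe")],
}

def harmonize(test, score):
    """Map a raw cognitive test score to a categorical AD/ADRD severity."""
    for minimum, severity in CUTOFFS[test]:
        if score >= minimum:
            return severity
    raise ValueError(f"score {score} out of range for {test}")

print(harmonize("MMSE", 21))  # mild
print(harmonize("MoCA", 27))  # normal
```

Because every supported test maps onto the same four categories, patients assessed with different instruments become directly comparable at the category level.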
Finally, in our study, the most frequently documented cognitive tests were MMSE and MoCA, each covering about half of the AD/ADRD patient cohort; nevertheless, among all the AD/ADRD patients, only about 20.1% have repeated cognitive test scores documented. This may be due to the poor documentation of cognitive scores reported in several clinical samples,[41–43] and a previous study also found that patients with undocumented cognitive impairment were significantly less likely to have diagnostic evaluations or use health services. It is also worth noting that different cognitive tests are designed with different purposes and thus may measure different aspects of cognitive impairment. As evidenced by our findings on the 4 cognitive tests that can be mapped to AD/ADRD severity, MoCA was mostly used among patients with mild and moderate disease, while MMSE was more likely to be used for patients with severe disease. Future studies can extend our work to examine the consistency between different cognitive tests. Further, a standardized approach to documenting cognitive measurements in EHR systems is also warranted.
Conclusion
Clinical NLP pipelines can be used to extract cognitive tests and biomarkers of AD/ADRD from clinical narratives, which can then be used to create a unified measurement of AD/ADRD severity and track patients’ longitudinal severity changes. Although the documentation of cognitive tests and biomarker information in EHRs appears to be low, RWD remains an important resource for AD/ADRD research, and a number of strategies can and should be considered when using RWD for AD/ADRD research. Future studies are warranted to explore these strategies.
Summary Points.
What was already known on the topic?
- Cognitive tests are key information to assess the severity and track the progression of AD/ADRD.
- There are challenges in accurately identifying AD/ADRD and tracking its severity in EHRs.
What this study added to our knowledge?
- We developed a clinical NLP pipeline to extract cognitive tests and biomarkers of AD/ADRD from clinical narratives.
- We created a unified measurement of AD/ADRD severity and tracked patients’ longitudinal severity changes based on information extracted from clinical notes.
- The documentation of cognitive tests and biomarker information in EHRs appears to be low; advanced methods should be considered when using RWD for AD/ADRD research.
Acknowledgements
This work was supported in part by NIH grants R01AG076234, R56AG069880, and R21AG068717, as well as PCORI Award (ME-2018C3-14754). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or PCORI.
References
- [1]. Rajan KB, Weuve J, Barnes LL, McAninch EA, Wilson RS, Evans DA, Population estimate of people with clinical Alzheimer’s disease and mild cognitive impairment in the United States (2020–2060), Alzheimers. Dement. (2021). 10.1002/alz.12362.
- [2]. Nichols E, Vos T, Estimating the global mortality from Alzheimer’s disease and other dementias: A new method and results from the Global Burden of Disease study 2019, Alzheimers. Dement. 16 (2020). 10.1002/alz.042236.
- [3]. 2021 Alzheimer’s disease facts and figures, Alzheimers. Dement. 17 (2021) 327–406. 10.1002/alz.12328.
- [4]. Kramarow EA, Tejada-Vera B, Dementia mortality in the United States, 2000–2017, Natl. Vital Stat. Rep. 68 (2019) 1–29. https://www.ncbi.nlm.nih.gov/pubmed/31112120.
- [5]. Underlying Cause of Death, 1999–2020 Request, (n.d.). https://wonder.cdc.gov/ucd-icd10.html (accessed March 3, 2022).
- [6]. Creavin ST, Wisniewski S, Noel-Storr AH, Trevelyan CM, Hampton T, Rayment D, Thom VM, Nash KJE, Elhamoui H, Milligan R, Patel AS, Tsivos DV, Wing T, Phillips E, Kellman SM, Shackleton HL, Singleton GF, Neale BE, Watton ME, Cullum S, Mini-Mental State Examination (MMSE) for the detection of dementia in clinically unevaluated people aged 65 and over in community and primary care populations, Cochrane Database Syst. Rev. (2016) CD011145. 10.1002/14651858.CD011145.pub2.
- [7]. US Preventive Services Task Force, Screening for Cognitive Impairment in Older Adults: US Preventive Services Task Force Recommendation Statement, JAMA. 323 (2020) 757–763. 10.1001/jama.2020.0435.
- [8]. Nasreddine ZS, Phillips NA, Bédirian V, Charbonneau S, Whitehead V, Collin I, Cummings JL, Chertkow H, The Montreal Cognitive Assessment, MoCA: a brief screening tool for mild cognitive impairment, J. Am. Geriatr. Soc. 53 (2005) 695–699. 10.1111/j.1532-5415.2005.53221.x.
- [9]. Jack CR Jr, Bennett DA, Blennow K, Carrillo MC, Dunn B, Haeberlein SB, Holtzman DM, Jagust W, Jessen F, Karlawish J, Liu E, Molinuevo JL, Montine T, Phelps C, Rankin KP, Rowe CC, Scheltens P, Siemers E, Snyder HM, Sperling R, Contributors, NIA-AA Research Framework: Toward a biological definition of Alzheimer’s disease, Alzheimers. Dement. 14 (2018) 535–562. 10.1016/j.jalz.2018.02.018.
- [10]. Office of the Commissioner, Real-World Evidence, (2020). https://www.fda.gov/science-research/science-and-research-special-topics/real-world-evidence (accessed May 7, 2020).
- [11]. Duan R, Chen Z, Tong J, Luo C, Lyu T, Tao C, Maraganore D, Bian J, Chen Y, Leverage real-world longitudinal data in large clinical research networks for Alzheimer’s disease and related dementia (ADRD), AMIA Annu. Symp. Proc. 2020 (2020) 393–401. https://www.ncbi.nlm.nih.gov/pubmed/33936412.
- [12]. Desai U, Kirson NY, Lu Y, Bruemmer V, Andrews JS, Disease severity at the time of initial cognitive assessment is related to prior health-care resource use burden, Alzheimers Dement. (Amst.). 12 (2020) e12093. 10.1002/dad2.12093.
- [13]. Chen Z, Zhang H, Guo Y, George TJ, Prosperi M, Hogan WR, He Z, Shenkman EA, Wang F, Bian J, Exploring the feasibility of using real-world data from a large clinical data research network to simulate clinical trials of Alzheimer’s disease, NPJ Digit Med. 4 (2021) 84. 10.1038/s41746-021-00452-1.
- [14]. Maserejian N, Krzywy H, Eaton S, Galvin JE, Cognitive measures lacking in EHR prior to dementia or Alzheimer’s disease diagnosis, Alzheimers. Dement. 17 (2021) 1231–1243. 10.1002/alz.12280.
- [15]. Harding BN, Floyd JS, Scherrer JF, Salas J, Morley JE, Farr SA, Dublin S, Methods to identify dementia in the electronic health record: Comparing cognitive test scores with dementia algorithms, Healthc (Amst). 8 (2020) 100430. 10.1016/j.hjdsi.2020.100430.
- [16]. Pichon A, Idnay B, Marder K, Schnall R, Weng C, Cognitive Function Characterization Using Electronic Health Records Notes, AMIA Annu. Symp. Proc. 2021 (2021) 999–1008. https://www.ncbi.nlm.nih.gov/pubmed/35308911.
- [17]. Halpern R, Seare J, Tong J, Hartry A, Olaoye A, Aigbogun MS, Using electronic health records to estimate the prevalence of agitation in Alzheimer disease/dementia, Int. J. Geriatr. Psychiatry. 34 (2019) 420–431. 10.1002/gps.5030.
- [18]. Lin JS, O’Connor E, Rossom RC, Perdue LA, Burda BU, Thompson M, Eckstrom E, Screening for Cognitive Impairment in Older Adults: An Evidence Update for the U.S. Preventive Services Task Force, Agency for Healthcare Research and Quality (US), Rockville (MD), 2013. http://www.ncbi.nlm.nih.gov/books/NBK174643/ (accessed March 3, 2022).
- [19]. Stenetorp P, Pyysalo S, Topić G, Ohta T, Ananiadou S, Tsujii J, BRAT: A Web-based tool for NLP-Assisted text annotation, in: EACL 2012 - Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, 2012.
- [20]. Yang X, Yang H, Lyu T, Yang S, Guo Y, Bian J, Xu H, Wu Y, A Natural Language Processing Tool to Extract Quantitative Smoking Status from Clinical Narratives, in: 2020 IEEE International Conference on Healthcare Informatics (ICHI), 2020: pp. 1–2. 10.1109/ICHI48887.2020.9374369.
- [21]. Liu M, Hu Y, Tang B, Role of text mining in early identification of potential drug safety issues, Methods Mol Biol. 1159 (2014) 227–251. 10.1007/978-1-4939-0709-0_13.
- [22]. Saczynski JS, Inouye SK, Guess J, Jones RN, Fong TG, Nemeth E, Hodara A, Ngo L, Marcantonio ER, The Montreal Cognitive Assessment (MoCA): Creating a Crosswalk with the Mini-Mental State Examination, J Am Geriatr Soc. 63 (2015) 2370–2374. 10.1111/jgs.13710.
- [23]. Saczynski JS, Inouye SK, Guess J, Jones RN, Fong TG, Nemeth E, Hodara A, Ngo L, Marcantonio ER, The Montreal Cognitive Assessment (MoCA): Creating a Crosswalk with the Mini-Mental State Examination, J Am Geriatr Soc. 63 (2015) 2370–2374. 10.1111/jgs.13710.
- [24]. Langbaum JB, Ellison NN, Caputo A, Thomas RG, Langlois C, Riviere M-E, Graf A, Lopez Lopez C, Reiman EM, Tariot PN, Hendrix SB, The Alzheimer’s Prevention Initiative Composite Cognitive Test: a practical measure for tracking cognitive decline in preclinical Alzheimer’s disease, Alz Res Therapy. 12 (2020) 66. 10.1186/s13195-020-00633-2.
- [25]. Vellas B, Bateman R, Blennow K, Frisoni G, Johnson K, Katz R, Langbaum J, Marson D, Sperling R, Wessels A, Salloway S, Doody R, Aisen P, Task Force Members, Endpoints for Pre-Dementia AD Trials: A Report from the EU/US/CTAD Task Force, J Prev Alzheimers Dis. 2 (2015) 128–135. 10.14283/jpad.2015.55.
- [26]. O’Bryant SE, Waring SC, Cullum CM, Hall J, Lacritz L, Massman PJ, Lupo PJ, Reisch JS, Doody R, Staging Dementia Using Clinical Dementia Rating Scale Sum of Boxes Scores, Arch Neurol. 65 (2008) 1091–1095. 10.1001/archneur.65.8.1091.
- [27]. Reisberg B, Ferris SH, de Leon MJ, Crook T, The Global Deterioration Scale for assessment of primary degenerative dementia, Am J Psychiatry. 139 (1982) 1136–1139. 10.1176/ajp.139.9.1136.
- [28]. Li Q, Guo Y, He Z, Zhang H, George TJ Jr, Bian J, Using Real-World Data to Rationalize Clinical Trials Eligibility Criteria Design: A Case Study of Alzheimer’s Disease Trials, AMIA Annu. Symp. Proc. 2020 (2020) 717–726. https://www.ncbi.nlm.nih.gov/pubmed/33936446.
- [29]. Shivade C, Raghavan P, Fosler-Lussier E, Embi PJ, Elhadad N, Johnson SB, Lai AM, A review of approaches to identifying patient phenotype cohorts using electronic health records, J. Am. Med. Inform. Assoc. 21 (2014) 221–230. 10.1136/amiajnl-2013-001935.
- [30]. Newton KM, Peissig PL, Kho AN, Bielinski SJ, Berg RL, Choudhary V, Basford M, Chute CG, Kullo IJ, Li R, Pacheco JA, Rasmussen LV, Spangler L, Denny JC, Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network, J. Am. Med. Inform. Assoc. 20 (2013) e147–54. 10.1136/amiajnl-2012-000896.
- [31]. Banda JM, Seneviratne M, Hernandez-Boussard T, Shah NH, Advances in Electronic Phenotyping: From Rule-Based Definitions to Machine Learning Models, Annu Rev Biomed Data Sci. 1 (2018) 53–68. 10.1146/annurev-biodatasci-080917-013315.
- [32]. Lin P-J, Kaufer DI, Maciejewski ML, Ganguly R, Paul JE, Biddle AK, An examination of Alzheimer’s disease case definitions using Medicare claims and survey data, Alzheimers. Dement. 6 (2010) 334–341. 10.1016/j.jalz.2009.09.001.
- [33]. Ponjoan A, Garre-Olmo J, Blanch J, Fages E, Alves-Cabratosa L, Martí-Lluch R, Comas-Cufí M, Parramon D, García-Gil M, Ramos R, How well can electronic health records from primary care identify Alzheimer’s disease cases?, Clin. Epidemiol. 11 (2019) 509–518. 10.2147/CLEP.S206770.
- [34]. Tasker RC, Why Everyone Should Care About Computable Phenotypes, Pediatric Critical Care Medicine: A Journal of the Society of Critical Care Medicine and the World Federation of Pediatric Intensive and Critical Care Societies. 18 (2017) 489–490. 10.1097/PCC.0000000000001115.
- [35]. Wei W-Q, Denny JC, Extracting research-quality phenotypes from electronic health records to support precision medicine, Genome Med. 7 (2015) 41. 10.1186/s13073-015-0166-y.
- [36]. Mo H, Thompson WK, Rasmussen LV, Pacheco JA, Jiang G, Kiefer R, Zhu Q, Xu J, Montague E, Carrell DS, Lingren T, Mentch FD, Ni Y, Wehbe FH, Peissig PL, Tromp G, Larson EB, Chute CG, Pathak J, Denny JC, Speltz P, Kho AN, Jarvik GP, Bejan CA, Williams MS, Borthwick K, Kitchner TE, Roden DM, Harris PA, Desiderata for computable representations of electronic health records-driven phenotype algorithms, J. Am. Med. Inform. Assoc. 22 (2015) 1220–1230. 10.1093/jamia/ocv112.
- [37]. Lin KJ, Rosenthal GE, Murphy SN, Mandl KD, Jin Y, Glynn RJ, Schneeweiss S, External validation of an algorithm to identify patients with high data-completeness in electronic health records for comparative effectiveness research, Clin. Epidemiol. 12 (2020) 133–141. 10.2147/CLEP.S232540.
- [38]. Hernán MA, Robins JM, Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available, Am. J. Epidemiol. 183 (2016) 758–764. 10.1093/aje/kwv254.
- [39]. Forrest CB, McTigue KM, Hernandez AF, Cohen LW, Cruz H, Haynes K, Kaushal R, Kho AN, Marsolo KA, Nair VP, Platt R, Puro JE, Rothman RL, Shenkman EA, Waitman LR, Williams NA, Carton TW, PCORnet® 2020: current state, accomplishments, and future directions, J. Clin. Epidemiol. 129 (2021) 60–67. 10.1016/j.jclinepi.2020.09.036.
- [40]. Hogan WR, Shenkman EA, Robinson T, Carasquillo O, Robinson PS, Essner RZ, Bian J, Lipori G, Harle C, Magoc T, Manini L, Mendoza T, White S, Loiacono A, Hall J, Nelson D, The OneFlorida Data Trust: a centralized, translational research data infrastructure of statewide scope, J. Am. Med. Inform. Assoc. (2021). 10.1093/jamia/ocab221.
- [41]. Mackin RS, Areán PA, Incidence and documentation of cognitive impairment among older adults with severe mental illness in a community mental health setting, Am. J. Geriatr. Psychiatry. 17 (2009) 75–82. 10.1097/JGP.0b013e31818cd3e5.
- [42]. Löppönen M, Räihä I, Isoaho R, Vahlberg T, Kivelä S-L, Diagnosing cognitive impairment and dementia in primary health care -- a more active approach is needed, Age Ageing. 32 (2003) 606–612. 10.1093/ageing/afg097.
- [43]. Callahan CM, Hendrie HC, Tierney WM, Documentation and evaluation of cognitive impairment in elderly primary care patients, Ann. Intern. Med. 122 (1995) 422–429. 10.7326/0003-4819-122-6-199503150-00004.
