NIHPA Author Manuscripts
Author manuscript; available in PMC: 2022 Feb 1.
Published in final edited form as: Seizure. 2021 Jan 13;85:138–144. doi: 10.1016/j.seizure.2020.11.011

Can antiepileptic efficacy and epilepsy variables be studied from electronic health records? A review of current approaches

Barbara M Decker 1, Chloé E Hill 2, Steven N Baldassano 1, Pouya Khankhanian 1
PMCID: PMC7897304  NIHMSID: NIHMS1663908  PMID: 33461032

Abstract

As automated data extraction and natural language processing (NLP) rapidly evolve, there is great interest in harnessing large datasets to improve healthcare delivery. Assessing antiepileptic drug (AED) efficacy and other epilepsy variables pertinent to healthcare delivery remains a critical barrier to improving patient care. In this systematic review, we examined automated electronic health record (EHR) extraction methodologies pertinent to epilepsy. We also reviewed more generalizable NLP pipelines that extract other critical patient variables.

Our review found varying reports of performance measures. Whereas automated data extraction pipelines are a crucial advancement, this review calls attention to standardizing NLP methodology and accuracy reporting for greater generalizability. Moreover, the use of crowdsourcing competitions to spur innovative NLP pipelines would further advance this field.

Keywords: Epilepsy, Natural Language Processing, Antiepileptic drug efficacy, Electronic health record, Automated extraction

INTRODUCTION

Rationale:

Epilepsy affects 60 million people worldwide.1 Anti-epileptic drugs (AEDs) are first-line therapy for epilepsy and control seizures in two thirds of patients.2 More than 25 AEDs are now available, and rational “trial and error” often determines drug choice, as comparative data on efficacy between AEDs remain limited. Despite millions of people taking AEDs daily, retrospective and prospective chart review studies comparing AEDs head-to-head are available for only a limited number of medications. Moreover, small sample sizes limit the interpretation of these studies. A wealth of information regarding AED efficacy lies within electronic health records (EHRs), yet efficient data extraction has remained a critical barrier to closing this knowledge gap.

Overview:

The goal of this report is to explore how pertinent data can be automatically extracted from the EHR for studies of AED efficacy and other pertinent epilepsy variables relevant to improving patient care. The main exposure variable is AED prescription. The outcome variable of seizure frequency can be assessed by comparing pre- and post-drug seizure frequencies. To compare two or more AEDs, cohorts of patients taking the AEDs of interest can be matched on relevant covariables to minimize confounding and relative changes in seizure frequency over time can be measured.
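The cohort comparison described above can be sketched in a few lines; this is a minimal illustration with hypothetical seizure counts and hypothetical AED labels, not data from any study reviewed here:

```python
from statistics import mean

def relative_change(pre: float, post: float) -> float:
    """Relative change in monthly seizure frequency after starting an AED."""
    return (post - pre) / pre

# Hypothetical matched cohorts: (pre-drug, post-drug) monthly seizure counts.
cohort_a = [(10, 4), (8, 2), (12, 9)]   # patients started on AED "A"
cohort_b = [(9, 7), (11, 10), (10, 8)]  # patients started on AED "B"

mean_change_a = mean(relative_change(pre, post) for pre, post in cohort_a)
mean_change_b = mean(relative_change(pre, post) for pre, post in cohort_b)
# A more negative mean relative change indicates greater seizure reduction.
```

In practice, each tuple would come from automated extraction of the exposure (prescription) and outcome (seizure frequency) variables discussed below, and the cohorts would be matched on the covariables in Table 1 before comparison.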

Patient characteristics that define the clinical context (Table 1) are important to include in the analysis, as they have potential to cause confounding effects in multiple ways. Demographic variables, such as socioeconomic status (SES), comorbidities (including psychiatric conditions such as substance abuse), and medication allergies may play a role in how providers choose which AED to trial, exacerbating nonrandom assignment to treatment groups. Underlying epilepsy etiology, baseline epilepsy severity, and age of onset may also correlate with refractoriness to AED treatment. Furthermore, underlying cause of epilepsy may have interaction effects with particular medications, such as the indication for broad-spectrum agents in the treatment of generalized epilepsy.

TABLE 1: KEY VARIABLES:

If these variables could be extracted from the electronic health record automatically, an antiepileptic drug efficacy study could be performed.

1) Exposure:
 a) Prescribed medication extraction (S+U)
2) Covariables:
 a) Age, gender, & ethnicity (S)
 b) Socioeconomic status by zip code, type of insurance (S)
 c) Epilepsy type (U)
 d) Seizure type (U)
 e) Underlying cause of the epilepsy (S+U):
  i) genetic syndrome
  ii) structural lesions
   (1) strokes
   (2) tumors
   (3) surgery
   (4) bleeding
  iii) autoimmune disease
  iv) neurodegenerative disease
  v) traumatic brain injury
 f) Age of onset (S)
 g) Medication allergies (S)
 h) Co-morbid diseases, aka past medical history (S+U)
 i) Refractory epilepsy: as defined by number of other medications tried, or surgery tried
 j) Epilepsy risk factors (S+U)
  i) Abnormal birth and development (e.g. prematurity, developmental delay, autism)
  ii) History of brain infection
  iii) Substance abuse
  iv) Family history of epilepsy
  v) History of febrile seizure
 k) Prior EEG abnormalities (U)
3) Outcome: seizure frequency and surrogates thereof
 a) Seizure frequency (U)
 b) Use of rescue medications (S+U)
 c) Need for increase in therapy: e.g. doses increased, new meds added (S+U)
 d) Use of the nursing telephone help-line (S+U)
 e) Use of the emergency room (S+U)
 f) Hospital admissions (S+U)
 g) Total health-care cost (calculated based on the above)

(S) = structured fields; (U) = Unstructured data; (S+U) = structured fields and unstructured data.

In this investigation, we performed a literature review of the currently available data extraction methods for the pertinent variables and examined techniques specific to epilepsy when appropriate to the variable (e.g., seizure frequency).3 For more generalizable variables (e.g., medication), we also reviewed methods developed outside the epilepsy field.

METHODS

Search strategy:

We performed a PubMed search for the variables of interest listed in Table 1. The specific PubMed search phrases are described in detail in the supplementary text. Abstracts were screened for relevance by employing a method of automated EHR extraction for one of the pertinent variables. Some manuscripts focused on the details of a method of extraction and its accuracy, while other manuscripts employed automated extraction methods as a means to describe a clinical outcome; both types of manuscripts were deemed relevant. Once a manuscript was deemed relevant, we reviewed and catalogued elements of the manuscript, including the extraction method summary, accuracy measures, and estimate of generalizability to epilepsy as applicable if the method was not specifically created for epilepsy. Citations within relevant manuscripts were also reviewed in the same manner.

Terminology:

Data in the EHR comprise both structured and unstructured fields. Structured data fields use a controlled vocabulary that limits variability, which allows more inter-user consistency and more accurate data aggregation. As examples of structured fields, a blood pressure field must be populated by exactly two numerical values (systolic and diastolic pressure), a medication field must be populated by a recognized medication name selected from a standardized list, and a medical problem list must be populated by selecting from a list of Intelligent Medical Objects (IMOs) or International Statistical Classification of Diseases codes (ICD codes).

Conversely, unstructured data components are composed of narrative text or prose written by the provider, usually in progress notes from various encounter settings. Telephone calls are also often documented as free text. Natural Language Processing (NLP) uses computer algorithms to extract information from unstructured free text. Simple forms of NLP use dictionaries (lists of terms or synonyms) and rules (pre-set sentence structures) to extract information. More complex forms of NLP use machine learning to create a classifier, which labels a note with the presence or absence of a particular variable. NLP algorithms and machine learning processes are often compiled into larger pipelines.
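A dictionary-and-rule approach of the simple kind described above can be sketched as follows; the drug dictionary, the negation terms, and the example note are all illustrative assumptions, not the method of any pipeline reviewed here:

```python
import re

# Hypothetical dictionary mapping AED names and brand synonyms to generics.
AED_TERMS = {"levetiracetam": "levetiracetam", "keppra": "levetiracetam",
             "lamotrigine": "lamotrigine", "lamictal": "lamotrigine"}

# Crude rule: skip any sentence containing a negation/discontinuation cue.
NEGATION = re.compile(r"\b(no|not|denies|discontinued|stopped)\b", re.IGNORECASE)

def extract_aeds(sentence: str) -> set:
    """Return generic AED names mentioned in a sentence, skipping negated mentions."""
    if NEGATION.search(sentence):
        return set()
    found = set()
    for term, generic in AED_TERMS.items():
        if re.search(rf"\b{term}\b", sentence, re.IGNORECASE):
            found.add(generic)
    return found

note = "Patient continues Keppra 500 mg BID. Lamictal was discontinued last year."
mentions = set()
for sentence in note.split("."):
    mentions |= extract_aeds(sentence)
# mentions == {"levetiracetam"}: the brand name maps to its generic, and the
# discontinued Lamictal mention is dropped by the negation rule.
```

Even this toy example shows why dictionary-and-rule systems are brittle: sentence splitting, synonym coverage, and negation scope all affect precision and recall, which motivates the machine learning classifiers discussed next.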

Accuracy measures are critical to assess the performance of each method. The most commonly reported measures of accuracy for data retrieval are precision, recall, and the F1 statistic. Precision is the proportion of retrieved data that is true (positive predictive value). Recall is the proportion of true data that is correctly retrieved (sensitivity). The F1 statistic is the harmonic mean of precision and recall; it ranges from zero to one, where a value of one indicates perfect accuracy. The area under the receiver operator curve (AUC) is a summary measure that combines sensitivity and specificity and ranges from zero to one, where a value of one indicates perfect accuracy. For any extraction algorithm measuring a given variable, precision and recall can be reported on the training set used to create the extraction algorithm or on a test set, which is independent and not used to develop the algorithm.

This report will reference specific EHR extraction algorithms and pipelines, including complex NLP machine learning methods (see Table 2), and report their measures of accuracy. In this review, all statistics were assumed to be reported on independent test sets unless specifically stated otherwise. Further elaboration of the specific methodology of each pipeline is beyond the scope of this review.

TABLE 2:

Acronyms and abbreviations for data standards and methods of machine learning

ACE score Adverse Childhood Events scoring system
AUC area under the curve (receiver operator)
Bi-LSTM bidirectional LSTM
CNN convolutional neural network
CRF Conditional Random Field
cTakes Clinical Text Analysis, Knowledge Extraction System
CUI Concept unique identifier
EpSO Epilepsy and Seizure Ontology
GATE General Architecture for Text Engineering
GUI Graphical User Interface
HPO Human Phenotype Ontology
ICD codes International Statistical Classification of Diseases
HEDEA Healthcare Data Extraction and Analysis
HL7 Health Level Seven International
LOINC Logical Observation Identifiers Names and Codes
LSTM Long short-term memory
MTL Multi-task Learning
N-gram a phrase of n words (e.g. a trigram is a 3-word phrase)
NER Named entity recognition
NLP Natural Language Processing
NLTK Natural Language Toolkit, a Python NLP library
RxNorm normalized names for clinical drugs
SNOMED-CT a systematically organized, computer-processable collection of clinical terms providing codes, terms, synonyms and definitions
SVM Support Vector Machines
UIMA Unstructured Information Management Application
UMLS Unified Medical Language System

RESULTS

Over 2000 articles were returned by the PubMed search criteria and screened for relevance. A total of 128 articles were deemed sufficiently relevant for detailed review.

EXPOSURE VARIABLES

Medication:

There are a variety of emerging techniques to extract medications from the EHR, in part due to publicly issued crowdsource challenges, which have generated a large number of highly accurate pipelines.4 Medications recorded within the EHR have variable interpretations depending on the context, as records may include current medications, past medications, recommended future medications, or medication allergies, which makes extraction challenging.5 Many analysts suggest methods to account for this, such as dividing a clinical note into sections (e.g., “current medications”, “allergies”, and “recommendations”).6,7 The best approaches analyze a combination of structured fields and unstructured notes.8 This approach should account for potential differences among institutions in prescribing patterns and/or medication documentation in the EHR.6,8
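The note-sectioning strategy described above can be sketched with a simple header-based splitter; the header names and the example note are illustrative assumptions, since real section headers vary by institution and EHR vendor:

```python
import re

# Hypothetical section headers; real notes use many institution-specific variants.
SECTION_RE = re.compile(r"^(current medications|allergies|recommendations):",
                        re.IGNORECASE | re.MULTILINE)

def split_sections(note: str) -> dict:
    """Split a clinical note into sections keyed by lower-cased header name."""
    sections = {}
    matches = list(SECTION_RE.finditer(note))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        sections[m.group(1).lower()] = note[m.end():end].strip()
    return sections

note = """Current Medications:
levetiracetam 500 mg BID
Allergies:
phenytoin (rash)
Recommendations:
consider lamotrigine if seizures persist"""
sections = split_sections(note)
# A phenytoin mention can now be interpreted as an allergy, not a prescription.
```

Sectioning resolves much of the contextual ambiguity noted above: the same drug name carries a different meaning depending on whether it appears under current medications, allergies, or recommendations.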

The best pipeline for extraction of active medication ingredients and dosing achieves greater than 97% F1s.9,10 However, extracting the indication for each medication is more difficult, with the best group achieving F1 of 66%.11 Extracting the indication is particularly important in the field of epilepsy, since many antiepileptic drugs may be prescribed for alternative indications (e.g. topiramate for migraine, oxcarbazepine for mood stabilization, etc.).

COVARIABLES

Age and Sex:

Age and binary sex are extracted reliably from structured fields.12 Improved methods are needed to account for nonbinary gender designations.

Race and ethnicity:

Race and ethnicity are often extracted from structured fields.12–16 However, extraction can be incompletely sensitive, and missing data tend to be biased, with underreporting of underrepresented socioeconomic demographics.17–20 When ethnicity is present in structured fields, Denny et al reported over 90% concordance with genetic ethnicity, which can be used as a gold standard for ethnicity extraction when available.17,21 Sholle et al augmented structured fields with simple NLP to achieve an F1 of 91%.19

Socioeconomic status (SES):

Some authors extracted SES values from structured fields, while others used relatively simple forms of NLP to improve data quality or extrapolated from population statistics based on geography, such as using postal codes to impute SES.22 Bejan et al captured “homelessness” with an AUC of 0.83.23 Hatef et al showed that a simple NLP algorithm can supplement structured codes for “financial strain” and “housing issues”, increasing recall 10- to 15-fold over the use of ICD codes alone.18 Canadian census tract data, linked by postal code and summarized with a combined deprivation index spanning several ‘material’ and ‘social’ variables (such as income, education, and living alone or with a spouse), have been used to derive a single measure of SES.24 Hollister et al used simple search terms to extract “low education” (PPV 80%), occupation (85%), unemployment (88%), retirement (64%), uninsured status (23%), Medicaid status (82%), and homelessness (33%).25

Epilepsy type and seizure type:

Focal epilepsy (FE), generalized epilepsy (GE), and unknown epilepsy (UE) can be discriminated by n-gram features and an SVM classifier, as illustrated by Connolly et al, who trained the model on data from one institution and tested it on data from two other institutions (F1 = 72%), which suggests reasonable generalizability. Performance further improved when the model was trained on data from two institutions and tested on data from a third (F1 = 80%).26 A method named PEEP (phenotype extraction in epilepsy) extracted epileptogenic zone, etiology, and EEG pattern from Epilepsy Monitoring Unit (EMU) discharge summaries. F1s ranged between 75% and 85% for exact matches for semiology, lateralizing signs, and EEG pattern, and up to 95% for epileptogenic zone. This is the best method reviewed for specific semiology and etiologies, but it performs only on EMU discharge summary note types.27
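The n-gram featurization underlying such classifiers can be sketched as follows; this is a generic illustration of the technique, not the published feature set of Connolly et al, and the example phrase is invented:

```python
from collections import Counter

def ngram_features(text: str, n_max: int = 3) -> Counter:
    """Bag of 1- to n_max-grams, as commonly used to featurize notes for an SVM."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

feats = ngram_features("left temporal sharp waves noted")
# Yields all unigrams, bigrams, and trigrams, e.g. the trigram
# "left temporal sharp", which may be discriminative for focal epilepsy.
```

Sparse count vectors like these would then be fed to a linear classifier such as an SVM; multi-word n-grams let the model pick up short clinical phrases (e.g. lateralized EEG descriptions) that single words miss.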

The most comprehensive pipeline specific to the field of epilepsy is ExECT (extraction of epilepsy clinical text).28 This extracts a diagnosis of epilepsy as a binary field (88% precision, 89% recall), focal seizures as a binary field (96% precision, 70% recall), generalized seizures as a binary field (89% precision, 52% recall), and epilepsy type as a trinary field, defined as either focal, generalized, or absence (90% precision, 80% recall).

Underlying cause of epilepsy:

A combination of approaches has been used, which include extracting text from unstructured neuroradiology reports, structured comorbid disease fields, and the unstructured impression section of clinician notes. For certain specific etiologies with no known automated EHR extraction (e.g. cortical dysplasia), raw image processing algorithms may be used.

The ExECT pipeline identified abnormal or epileptogenic imaging findings on CT (56% precision, 59% recall) and MRI (82% precision, 69% recall) from clinical encounters, though these were limited to binary yes/no variables.28 For brain tumors, metastatic brain tumors identified from radiology notes achieved an AUC of 0.92 and best accuracy of 83%.29

For cortical dysplasia, information extracted from raw imaging data successfully identified dysplasia locations in seven out of seven patients, using 30 controls as a reference.30 For mesial temporal sclerosis, hippocampus and amygdala atrophy was identified from raw imaging data to within approximately 10% of the “gold standard” based on structural volume.31,32 Castro et al improved on an initial screen of ICD codes with a simple dictionary and classifier NLP to achieve 86% PPV for cerebral aneurysm.12 Combining ICD codes and NLP achieved 100% PPV and 94% NPV to identify venous thromboembolism.21 Existing pipelines for flexible information extraction from free-text radiology reports have not been validated in epilepsy but may be repurposed for use within this field.3335

Prior EEG abnormalities:

In addition to direct analysis of raw EEG signal, EEG abnormalities can be captured by means of structured fields and unstructured reports. Biswal et al achieved an AUC of 0.99 for detecting reports with seizures and AUC of 0.96 for epileptiform discharges, but they did not differentiate focal from generalized findings.36

Bao reported 94% accuracy in interictal EEG diagnosis from raw EEG signal. However, this does not improve on the accuracy obtained from structured fields and unstructured reports, and it increases both the computational resources required for analysis and the administrative cost of obtaining the raw EEG signal data.37

Epilepsy severity:

Epilepsy severity can be estimated by several characteristics, which may be included in a model such as the number of current AEDs and historically prescribed AEDs (see the medications section above) or baseline seizure frequency (see seizure frequency section). Wissel et al used n-grams (up to n=3) and SVM to achieve sensitivity 80%, specificity 77%, PPV 25%, and NPV 98% for determining medically refractory epilepsy.38

Age of epilepsy onset:

Extracting the age of onset in epilepsy remains an unmet challenge. Methods exist to extract the age of onset for other diseases in family history sections (i.e. age of cancer onset in family member).39,40 However, these methods are not readily convertible to extract age of onset of a patient with epilepsy.

Medication allergies and adverse drug effects:

As noted in the medication section, prior studies reported that typical toolkits for NLP on clinical notes did not work well to abstract medications unless the note was divided into sections.41 The best reviewed available pipeline to identify medication allergies showed that an SVM model achieved the best average F1 of 89% on test data.42

Comorbidities and past medical history:

Comorbidities are often found in structured problem and diagnosis lists, but these are notoriously underpopulated by physicians, and thus need to be supplemented with free text extraction.43,44

SEDFE (SEmantics-Driven Feature Extraction) collects medical concepts from online knowledge sources as candidate features and derives classification models from them, achieving AUCs ranging from 0.90 to 0.95.45 Other pipelines reviewed achieved slightly lower AUCs with widely differing methodologies,46,47 including the repurposing of a crowdsourcing marketplace.48

Capturing common psychiatric comorbidities associated with epilepsy remains challenging. Comorbidities such as anxiety and depression can be captured as a diagnosis or solely as patient-reported symptoms. Furthermore, these comorbidities are likely underdiagnosed and undertreated within the epilepsy population. Validated depression instruments, such as the Patient Health Questionnaire-9 (PHQ-9) to assess depressive symptoms, can be easily extracted from structured fields. However, Adekkanattu et al implemented an NLP platform to extract PHQ-9 scores from unstructured clinical text for patients prescribed an antidepressant with high accuracy (F-score 97%) and found that nearly one-third of patients’ charts had a score clinically indicating major depressive disorder without an associated structured ICD diagnosis code.49 In a study predicting advanced care of depression using statewide EHR data, decision tree models on free text yielded AUCs of approximately 90% for patients deemed high-risk versus approximately 80% for the overall patient population.50

Epilepsy risk factors – history of drug abuse:

The main work in this area has been developed for the detection of opioid dependence and smoking.51 Notably, many authors have reported that ICD codes are insufficient for the accurate diagnosis of opioid dependence or overdose.52–54 NLP can improve recall of drug abuse from unstructured fields in clinicians’ notes.55–57 Including nursing notes further improves performance.58 Some authors demonstrated the generalizability of their NLP methods by publishing the difference in accuracy between their training and test sets.54

Variables more applicable to seizure risk include alcohol abuse and stimulant abuse.58–61 These variables have not been studied in as much detail as opiates and nicotine. The most accurate study reviewed attained F1s of 90% for alcohol abuse and 85% for drug abuse detection.60

Epilepsy risk factors – family history of epilepsy:

Family history statements have been extracted from a variety of note types for a variety of diseases,62–66 and it is likely that these methods can be repurposed for epilepsy. One method achieved a precision of 100% and recall of 97% using NLP; however, it was limited to discharge summaries and admission notes.65,66 Mowery et al achieved a precision of 96% and recall of 94% using NLP on clinicians’ notes.40

OUTCOME VARIABLES

Primary Outcome - Seizure frequency:

The ExECT pipeline (methods discussed in Epilepsy Type section above) identified the phrase or sentence within a clinical document that contained the seizure frequency but did not return an exact numeric value, with precision 86% and recall 54%.28 To our knowledge, this is the only study of seizure frequency in the literature.

Secondary Outcomes- Use of rescue medications and the total number of AEDs required:

Surrogate markers of recurrent seizures may be identified by the use of rescue medications or the need to add additional AEDs. Notably, AED levels can be obtained from structured fields. See the medication section above for details on how medications can be extracted.

OTHER GENERAL USE PIPELINES:

Our PubMed search also returned a series of general-use pipelines, documented in detail in the supplementary text. One effort worth noting is a proposed strategy for incorporating prospective research-quality data collection into practitioners’ workflow without burdening them with excessive documentation, which remains the primary barrier to this type of collection.67,68 The authors were able to implement prospective cohort-building pipelines for 43 chronic diseases.67 Implementing such a framework within the field of epilepsy would allow prospective data collection, which is known to be superior to retrospective study.

PIPELINES IN OTHER DIALECTS AND LANGUAGES:

For greater applicability and sample size, a multi-center international study would require pipelines in many languages. Currently, the ability to convert pipelines from one language to another remains limited. In one notable study, NLP developed in Europe and the United States was applied to medical notes written in Indian English to extract medical diagnoses, labs, procedures, demographic information, and outcomes.69 Details about pipelines in other languages can be found in the supplementary material.

DISCUSSION

Automated extraction of EHR data has advanced impressively in the last several years, with a plethora of methods published to date. While this review was not systematic, our comprehensive approach returned over 2000 articles by PubMed search criteria, of which 128 articles were reviewed in detail. The key studies worth highlighting from this search include highly accurate medication and dosing extraction pipelines,9,10 PEEP (phenotype extraction in epilepsy), which extracts key information regarding epileptogenic zone, etiology, semiology, and EEG patterns from EMU discharge summaries,27 and ExECT (extraction of epilepsy clinical text), which extracts epilepsy diagnosis, epilepsy type, abnormal findings on imaging, and seizure frequency (albeit with some limitations).28 However, great variability in algorithm performance across variables persists.

Some variables are easily and accurately extracted from the EHR by way of structured and unstructured fields, such as age, sex, and family history. Other variables can be extracted with reasonably high sensitivity but at the price of lower specificity, such as SES, ethnicity, epilepsy risk factors, EEG and MRI results.

Medications can be extracted with high accuracy (precision and recall > 95%) using a number of approaches, thanks in part to public challenges that awarded prize money to competing teams to crowdsource the best method. Furthermore, techniques developed in fields outside of epilepsy can readily be applied to epilepsy.

Yet, several variables cannot be reliably extracted with current published methods, including seizure etiology and epilepsy severity. The problem of seizure etiology may potentially be solved by a multi-modal approach incorporating EEG findings, MRI findings, structured and unstructured fields, and setting up decision trees based on ILAE diagnostic criteria. This approach would be relatively easy to implement if the underlying variables could be extracted reliably (EEG findings, MRI findings, and comorbidities). Epilepsy severity is complex and multivariable, including the number of medications used, seizure frequency, presence of convulsive seizures, and potentially electroencephalographic markers.

The greatest limitation to assessing AED efficacy using EHR data is that seizure frequency cannot be extracted reliably from the record using currently available techniques. Accurate assessment of seizure frequency is crucial, as it serves as a surrogate for seizure control. The American Academy of Neurology has recently emphasized documentation of seizure frequency as a quality measure, which provides further unstructured data from which to extract these measures.70 Unstructured data sources of seizure frequency could include seizure diaries, collateral history, telephone encounters, etc. A recent approach classified and categorized patient portal messages to appropriately triage communication. This approach could be implemented to identify which message types are associated with poor epilepsy control (i.e., seizure reporting) versus routine communication.71

Our search, described in Section 1 of the supplementary text, yielded 72 results pertinent to “seizure,” which included “seizure frequency.” The best reviewed techniques can identify only the seizure frequency text, with poor sensitivity (~50%) and only marginal specificity (~80%). As previously mentioned, the ExECT pipeline identified the phrase within clinical text that contained seizure frequency but was unable to return an exact numeric value, in part due to the challenges of reporting variability (F1 66%); to our knowledge, this was the only study of seizure frequency in the literature at the time of this review.28 Future approaches that increase sensitivity will likely come at the cost of specificity. Alternatively, semi-automated algorithms could be developed, wherein an automated method screens notes comprehensively for key sentences, with subsequent manual review to extract seizure frequency quantitatively and with greater accuracy. However, such semi-automated methods would require time-intensive human review of each data-reduced chart.

Within the NLP field, the continued use of crowdsourcing will be vital to creating new pipelines and increasing accuracy to optimize data extraction from EHRs. Notably, medication extraction, medication allergy extraction, and opioid use and dependence can all be readily and accurately extracted with one of several available pipelines produced through crowdsourcing competitions.

As automated extraction methods continue to evolve, standards for reporting the accuracy of these pipelines should be followed. This will allow comparisons to be drawn between methods. We call for, at minimum, the reporting of precision, recall, and F1 statistics for training and test sets (when a test set is available). We also recommend that all studies use an independent test set when possible, and ideally an additional independent validation dataset from a different institution. This reporting is crucial because the decrease in accuracy between the training set and the test set is informative of an algorithm’s generalizability. We also appreciate that most of the studies we reviewed provided public access to their extraction algorithms, and we encourage all authors to do the same.

CONCLUSION

In summary, we evaluated the feasibility, availability, and performance of automated data extraction methods to facilitate prospective and retrospective investigation of AED efficacy and pertinent epilepsy variables. The most significant roadblock remains the dearth of algorithms to extract seizure frequency.

Supplementary Material


HIGHLIGHTS:

  • Automated data extraction is rapidly evolving and can be harnessed to efficiently mine the electronic health record.

  • Natural language processing (NLP) of unstructured text improves data extraction accuracy when added to ICD coding and structured fields.

  • We review these techniques specific to epilepsy and highlight strengths as well as areas of further improvement.

ACKNOWLEDGEMENTS:

The authors would like to acknowledge Brian Litt, MD for his expertise and support.

FUNDING:

Barbara Decker, MD receives funding from NIH T32-NS-061779. Pouya Khankhanian, MD receives funding from NIH T32-NS-091008. Chloé E Hill, MD, MS receives funding from NIH KL2TR002241. Steven Baldassano, MD receives funding from NIH T32-NS-091006-01. We confirm that we have read the journal’s position on ethical publication and affirm that this report is consistent with those guidelines.

Footnotes

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

DECLARATION OF COMPETING INTERESTS: None.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

REFERENCES:

  • 1.Allers K, Essue BM, Hackett ML, et al. The economic impact of epilepsy: a systematic review. BMC Neurol. 2015;15(1):245. doi: 10.1186/s12883-015-0494-y
  • 2.Eatock J, Baker GA. Managing patient adherence and quality of life in epilepsy. Neuropsychiatr Dis Treat. 2007;3(1):117–131. doi: 10.2147/nedt.2007.3.1.117
  • 3.Velupillai S, Suominen H, Liakata M, et al. Using clinical Natural Language Processing for health outcomes research: Overview and actionable suggestions for future advances. J Biomed Inform. 2018;88:11–19. doi: 10.1016/j.jbi.2018.10.005
  • 4.Jagannatha A, Liu F, Liu W, Yu H. Overview of the First Natural Language Processing Challenge for Extracting Medication, Indication, and Adverse Drug Events from Electronic Health Record Notes (MADE 1.0). Drug Saf. 2019;42(1):99–111. doi: 10.1007/s40264-018-0762-z
  • 5.Chhieng D, Day T, Gordon G, Hicks J. Use of natural language programming to extract medication from unstructured electronic medical records. AMIA Annu Symp Proc. October 2007:908. http://www.ncbi.nlm.nih.gov/pubmed/18694008 Accessed February 24, 2020.
  • 6.Sohn S, Clark C, Halgrim SR, et al. Analysis of Cross-Institutional Medication Description Patterns in Clinical Narratives. Biomed Inform Insights. 2013;6s1:BII.S11634. doi: 10.4137/BII.S11634
  • 7.Farooq F, Yu S, Anand V, Krishnapuram B. Categorizing medications from unstructured clinical notes. AMIA Jt Summits Transl Sci Proc. 2013;2013:48–52. http://www.ncbi.nlm.nih.gov/pubmed/24303296 Accessed February 24, 2020.
  • 8.Cimino JJ, Bright TJ, Li J. Medication reconciliation using natural language processing and controlled terminologies. Stud Health Technol Inform. 2007;129(Pt 1):679–683. http://www.ncbi.nlm.nih.gov/pubmed/17911803 Accessed February 24, 2020.
  • 9.Dietrich G, Krebs J, Liman L, et al. Replicating medication trend studies using ad hoc information extraction in a clinical data warehouse. BMC Med Inform Decis Mak. 2019;19(1):15. doi: 10.1186/s12911-018-0729-0
  • 10.Jiang M, Wu Y, Shah A, Priyanka P, Denny JC, Xu H. Extracting and standardizing medication information in clinical text - the MedEx-UIMA system. AMIA Jt Summits Transl Sci Proc. 2014;2014:37–42. http://www.ncbi.nlm.nih.gov/pubmed/25954575 Accessed February 24, 2020.
  • 11.Li F, Liu W, Yu H. Extraction of Information Related to Adverse Drug Events from Electronic Health Record Notes: Design of an End-to-End Model Based on Deep Learning. JMIR Med Informatics. 2018;6(4):e12159. doi: 10.2196/12159
  • 12.Castro VM, Dligach D, Finan S, et al. Large-scale identification of patients with cerebral aneurysms using natural language processing. Neurology. 2017;88(2):164–168. doi: 10.1212/WNL.0000000000003490
  • 13.Gundlapalli AV, Jones AL, Redd A, et al. Combining Natural Language Processing of Electronic Medical Notes With Administrative Data to Determine Racial/Ethnic Differences in the Disclosure and Documentation of Military Sexual Trauma in Veterans. Med Care. 2019;57:S149–S156. doi: 10.1097/MLR.0000000000001031
  • 13.Gundlapalli AV, Jones AL, Redd A, et al. Combining Natural Language Processing of Electronic Medical Notes With Administrative Data to Determine Racial/Ethnic Differences in the Disclosure and Documentation of Military Sexual Trauma in Veterans. Med Care. 2019;57:S149–S156. doi: 10.1097/MLR.0000000000001031 [DOI] [PubMed] [Google Scholar]
  • 14.Kaur H, Sohn S, Wi C-I, et al. Automated chart review utilizing natural language processing algorithm for asthma predictive index. BMC Pulm Med. 2018;18(1):34. doi: 10.1186/s12890-018-0593-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Meystre SM, Thibault J, Shen S, Hurdle JF, South BR. Textractor: a hybrid system for medications and reason for their prescription extraction from clinical text documents. J Am Med Informatics Assoc. 2010;17(5):559–562. doi: 10.1136/jamia.2010.004028 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Zheng C, Luo Y, Mercado C, et al. Using natural language processing for identification of herpes zoster ophthalmicus cases to support population-based study. Clin Experiment Ophthalmol. 2019;47(1):7–14. doi: 10.1111/ceo.13340 [DOI] [PubMed] [Google Scholar]
  • 17.Denny JC. Chapter 13: Mining Electronic Health Records in the Genomics Era. Lewitter F, Kann M, eds. PLoS Comput Biol. 2012;8(12):e1002823. doi: 10.1371/journal.pcbi.1002823 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Hatef E, Rouhizadeh M, Tia I, et al. Assessing the Availability of Data on Social and Behavioral Determinants in Structured and Unstructured Electronic Health Records: A Retrospective Analysis of a Multilevel Health Care System. JMIR Med Informatics. 2019;7(3):e13802. doi: 10.2196/13802 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Sholle ET, Pinheiro LC, Adekkanattu P, et al. Underserved populations with missing race ethnicity data differ significantly from those with structured race/ethnicity documentation. J Am Med Informatics Assoc. 2019;26(8–9):722–729. doi: 10.1093/jamia/ocz040 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Wissel BD, Greiner HM, Glauser TA, et al. Investigation of bias in an epilepsy machine learning algorithm trained on physician notes. Epilepsia. 2019;60(9):e93–e98. doi: 10.1111/epi.16320 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Heit JA, Armasu S, McCauley B, et al. Identification of unique venous thromboembolism-susceptibility variants in African-Americans. Thromb Haemost. 2017;117(04):758–768. doi: 10.1160/TH16-08-0652 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Dandona L, Dandona R, Naduvilath TJ, et al. Population-based assessment of the outcome of cataract surgery in an urban population in southern India. Am J Ophthalmol. 1999;127(6):650–658. doi: 10.1016/s0002-9394(99)00044-6 [DOI] [PubMed] [Google Scholar]
  • 23.Bejan CA, Angiolillo J, Conway D, et al. Mining 100 million notes to find homelessness and adverse childhood experiences: 2 case studies of rare and severe social determinants of health in electronic health records. J Am Med Informatics Assoc. 2018;25(1):61–71. doi: 10.1093/jamia/ocx059 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Biro S, Williamson T, Leggett JA, et al. Utility of linking primary care electronic medical records with Canadian census data to study the determinants of chronic disease: an example based on socioeconomic status and obesity. BMC Med Inform Decis Mak. 2016;16(1):32. doi: 10.1186/s12911-016-0272-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
25. Hollister BM, Restrepo NA, Farber-Eger E, Crawford DC, Aldrich MC, Non A. Development and performance of text-mining algorithms to extract socioeconomic status from de-identified electronic health records. In: Biocomputing 2017. World Scientific; 2017:230–241. doi: 10.1142/9789813207813_0023
26. Connolly B, Matykiewicz P, Bretonnel Cohen K, et al. Assessing the similarity of surface linguistic features related to epilepsy across pediatric hospitals. J Am Med Informatics Assoc. 2014;21(5):866–870. doi: 10.1136/amiajnl-2013-002601
27. Cui L, Sahoo SS, Lhatoo SD, et al. Complex epilepsy phenotype extraction from narrative clinical discharge summaries. J Biomed Inform. 2014;51:272–279. doi: 10.1016/j.jbi.2014.06.006
28. Fonferko-Shadrach B, Lacey AS, Roberts A, et al. Using natural language processing to extract structured epilepsy data from unstructured clinic letters: development and validation of the ExECT (extraction of epilepsy clinical text) system. BMJ Open. 2019;9(4):e023232. doi: 10.1136/bmjopen-2018-023232
29. Senders JT, Karhade AV, Cote DJ, et al. Natural Language Processing for Automated Quantification of Brain Metastases Reported in Free-Text Radiology Reports. JCO Clin Cancer Informatics. 2019;(3):1–9. doi: 10.1200/CCI.18.00138
30. Kassubek J, Huppertz H-J, Spreer J, Schulze-Bonhage A. Detection and localization of focal cortical dysplasia by voxel-based 3-D MRI analysis. Epilepsia. 2002;43(6):596–602. doi: 10.1046/j.1528-1157.2002.41401.x
31. Chupin M, Mukuna-Bantumbakulu AR, Hasboun D, et al. Anatomically constrained region deformation for the automated segmentation of the hippocampus and the amygdala: Method and validation on controls and patients with Alzheimer’s disease. Neuroimage. 2007;34(3):996–1019. doi: 10.1016/j.neuroimage.2006.10.035
32. Istephan S, Siadat M-R. Unstructured medical image query using big data – An epilepsy case study. J Biomed Inform. 2016;59:218–226. doi: 10.1016/j.jbi.2015.12.005
33. Pons E, Braun LMM, Hunink MGM, Kors JA. Natural language processing in radiology: A systematic review. Radiology. 2016;279(2):329–343. doi: 10.1148/radiol.16142770
34. Hassanpour S, Langlotz CP. Information extraction from multi-institutional radiology reports. Artif Intell Med. 2016;66:29–39. doi: 10.1016/j.artmed.2015.09.007
35. Steinkamp JM, Chambers C, Lalevic D, Zafar HM, Cook TS. Toward Complete Structured Information Extraction from Radiology Reports Using Machine Learning. J Digit Imaging. 2019. doi: 10.1007/s10278-019-00234-y
36. Biswal S, Nip Z, Moura Junior V, Bianchi MT, Rosenthal ES, Westover MB. Automated information extraction from free-text EEG reports. In: 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE; 2015:6804–6807. doi: 10.1109/EMBC.2015.7319956
37. Bao FS, Gao J-M, Hu J, Lie D, Zhang Y, Oommen KJ. Automated epilepsy diagnosis using interictal scalp EEG. In: 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE; 2009:6603–6607. doi: 10.1109/IEMBS.2009.5332550
38. Wissel BD, Greiner HM, Glauser TA, et al. Prospective validation of a machine learning model that uses provider notes to identify candidates for resective epilepsy surgery. Epilepsia. 2020;61(1):39–48. doi: 10.1111/epi.16398
39. Sin M, McGuinness JE, Trivedi MS, et al. Automatic Genetic Risk Assessment Calculation Using Breast Cancer Family History Data from the EHR compared to Self-Report. AMIA Annu Symp Proc. 2018;2018:970–978. http://www.ncbi.nlm.nih.gov/pubmed/30815140. Accessed February 24, 2020.
40. Mowery DL, Kawamoto K, Bradshaw R, et al. Determining Onset for Familial Breast and Colorectal Cancer from Family History Comments in the Electronic Health Record. AMIA Jt Summits Transl Sci Proc. 2019;2019:173–181. http://www.ncbi.nlm.nih.gov/pubmed/31258969. Accessed February 24, 2020.
41. Chen L, Gu Y, Ji X, et al. Extracting medications and associated adverse drug events using a natural language processing system combining knowledge base and deep learning. J Am Med Informatics Assoc. 2020;27(1):56–64. doi: 10.1093/jamia/ocz141
42. Munkhdalai T, Liu F, Yu H. Clinical Relation Extraction Toward Drug Safety Surveillance Using Electronic Health Record Narratives: Classical Learning Versus Deep Learning. JMIR Public Heal Surveill. 2018;4(2):e29. doi: 10.2196/publichealth.9361
43. Karmakar C, Luo W, Tran T, Berk M, Venkatesh S. Predicting Risk of Suicide Attempt Using History of Physical Illnesses From Electronic Medical Records. JMIR Ment Heal. 2016;3(3):e19. doi: 10.2196/mental.5475
44. Sheehan OC, Kharrazi H, Carl KJ, et al. Helping Older Adults Improve Their Medication Experience (HOME) by Addressing Medication Regimen Complexity in Home Healthcare. Home Healthc Now. 2018;36(1):10–19. doi: 10.1097/NHH.0000000000000632
45. Ning W, Chan S, Beam A, et al. Feature extraction for phenotyping from semantic and knowledge resources. J Biomed Inform. 2019;91:103122. doi: 10.1016/j.jbi.2019.103122
46. Chen Q, Li H, Tang B, et al. An automatic system to identify heart disease risk factors in clinical texts over time. J Biomed Inform. 2015;58:S158–S163. doi: 10.1016/j.jbi.2015.09.002
47. Gronsbell J, Minnier J, Yu S, Liao K, Cai T. Automated feature selection of predictors in electronic medical records data. Biometrics. 2019;75(1):268–277. doi: 10.1111/biom.12987
48. Yetisgen-Yildiz M, Solti I, Xia F. Using Amazon’s Mechanical Turk for Annotating Medical Named Entities. AMIA Annu Symp Proc. 2010;2010:1316. http://www.ncbi.nlm.nih.gov/pubmed/21785667. Accessed February 24, 2020.
49. P A, ET S, J D, J P, SB J, TR C. Ascertaining Depression Severity by Extracting Patient Health Questionnaire-9 (PHQ-9) Scores From Clinical Notes. AMIA Annu Symp Proc. 2018;2018. https://pubmed.ncbi.nlm.nih.gov/30815052/. Accessed May 21, 2020.
50. Kasthurirathne SN, Biondich PG, Grannis SJ, Purkayastha S, Vest JR, Jones JF. Identification of Patients in Need of Advanced Care for Depression Using Data Extracted From a Statewide Health Information Exchange: A Machine Learning Approach. J Med Internet Res. 2019;21(7):e13809. doi: 10.2196/13809
51. Uzuner O, Goldstein I, Luo Y, Kohane I. Identifying patient smoking status from medical discharge records. J Am Med Inform Assoc. 2008;15(1):14–24. doi: 10.1197/jamia.M2408
52. Afshar M, Joyce C, Dligach D, et al. Subtypes in patients with opioid misuse: A prognostic enrichment strategy using electronic health record data in hospitalized patients. Cerda M, ed. PLoS One. 2019;14(7):e0219717. doi: 10.1371/journal.pone.0219717
53. Haller IV, Renier CM, Juusola M, et al. Enhancing Risk Assessment in Patients Receiving Chronic Opioid Analgesic Therapy Using Natural Language Processing. Pain Med. December 2016:pnw283. doi: 10.1093/pm/pnw283
54. Hazlehurst B, Green CA, Perrin NA, et al. Using natural language processing of clinical text to enhance identification of opioid-related overdoses in electronic health records data. Pharmacoepidemiol Drug Saf. 2019;28(8):1143–1151. doi: 10.1002/pds.4810
55. Carrell DS, Cronkite D, Palmer RE, et al. Using natural language processing to identify problem usage of prescription opioids. Int J Med Inform. 2015;84(12):1057–1064. doi: 10.1016/j.ijmedinf.2015.09.002
56. Palmer RE, Carrell DS, Cronkite D, et al. The prevalence of problem opioid use in patients receiving chronic opioid therapy. Pain. 2015;156(7):1208–1214. doi: 10.1097/j.pain.0000000000000145
57. Green CA, Perrin NA, Hazlehurst B, et al. Identifying and classifying opioid-related overdoses: A validation study. Pharmacoepidemiol Drug Saf. 2019;28(8):1127–1137. doi: 10.1002/pds.4772
58. Topaz M, Murga L, Bar-Bachar O, Cato K, Collins S. Extracting Alcohol and Substance Abuse Status from Clinical Notes: The Added Value of Nursing Data. Stud Health Technol Inform. 2019;264:1056–1060. doi: 10.3233/SHTI190386
59. Lingeman JM, Wang P, Becker W, Yu H. Detecting Opioid-Related Aberrant Behavior using Natural Language Processing. AMIA Annu Symp Proc. 2017;2017:1179–1185. http://www.ncbi.nlm.nih.gov/pubmed/29854186. Accessed February 24, 2020.
60. Wang Y, Chen ES, Pakhomov S, et al. Automated Extraction of Substance Use Information from Clinical Texts. AMIA Annu Symp Proc. 2015;2015:2121–2130. http://www.ncbi.nlm.nih.gov/pubmed/26958312. Accessed February 24, 2020.
61. Afshar M, Phillips A, Karnik N, et al. Natural language processing and machine learning to identify alcohol misuse from the electronic health record in trauma patients: development and internal validation. J Am Med Inform Assoc. doi: 10.1093/jamia/ocy166
62. Bill R, Pakhomov S, Chen ES, Winden TJ, Carter EW, Melton GB. Automated extraction of family history information from clinical notes. AMIA Annu Symp Proc. 2014;2014:1709–1717. http://www.ncbi.nlm.nih.gov/pubmed/25954443. Accessed February 24, 2020.
63. Mehrabi S, Krishnan A, Roch AM, et al. Identification of Patients with Family History of Pancreatic Cancer--Investigation of an NLP System Portability. Stud Health Technol Inform. 2015;216:604–608. http://www.ncbi.nlm.nih.gov/pubmed/26262122. Accessed February 24, 2020.
64. Friedlin J, McDonald CJ. Using a natural language processing system to extract and code family history data from admission reports. AMIA Annu Symp Proc. 2006;2006:925. http://www.ncbi.nlm.nih.gov/pubmed/17238544. Accessed February 24, 2020.
65. Goss FR, Plasek JM, Lau JJ, Seger DL, Chang FY, Zhou L. An evaluation of a natural language processing tool for identifying and encoding allergy information in emergency department clinical notes. AMIA Annu Symp Proc. 2014;2014:580–588. http://www.ncbi.nlm.nih.gov/pubmed/25954363. Accessed February 24, 2020.
66. Zhou L, Plasek JM, Mahoney LM, Chang FY, DiMaggio D, Rocha RA. Mapping Partners Master Drug Dictionary to RxNorm using an NLP-based approach. J Biomed Inform. 2012;45(4):626–633. doi: 10.1016/j.jbi.2011.11.006
67. Kannan V, Fish J, Mutz J, et al. Rapid Development of Specialty Population Registries and Quality Measures from Electronic Health Record Data. Methods Inf Med. 2017;56(S 01):e74–e83. doi: 10.3414/ME16-02-0031
68. Warner JL, Anick P, Hong P, Xue N. Natural Language Processing and the Oncologic History: Is There a Match? J Oncol Pract. 2011;7(4):e15–e19. doi: 10.1200/JOP.2011.000240
69. Ramanan SV, Radhakrishna K, Waghmare A, et al. Dense Annotation of Free-Text Critical Care Discharge Summaries from an Indian Hospital and Associated Performance of a Clinical NLP Annotator. J Med Syst. 2016;40(8):187. doi: 10.1007/s10916-016-0541-2
70. Epilepsy Update 2017 Quality Measurement Set; 2004. https://www.aan.com/siteassets/home-page/policy-and-guidelines/quality/quality-measures/epilepsy-and-seizures/20180215-epilepsy-measures-final.pdf. Accessed September 22, 2020.
71. Cronin RM, Fabbri D, Denny JC, Rosenbloom ST, Jackson GP. A comparison of rule-based and machine learning approaches for classifying patient portal messages. Int J Med Inform. 2017;105:110–120. doi: 10.1016/j.ijmedinf.2017.06.004

SUPPLEMENTARY MATERIALS
