AMIA Annual Symposium Proceedings. 2020 Mar 4;2019:267–274.

Determination of Marital Status of Patients from Structured and Unstructured Electronic Healthcare Data

Brian T Bucher 1, Jianlin Shi 1, Robert John Pettit 1, Jeffrey Ferraro 2, Wendy W Chapman 1, Adi Gundlapalli 1,3
PMCID: PMC7153091  PMID: 32308819

Abstract

Social Determinants of Health, including marital status, are increasingly identified as key drivers of health care utilization. This paper describes a robust method to determine the marital status of patients using structured and unstructured electronic healthcare data from a single academic institution in the United States. We developed and validated a natural language processing (NLP) pipeline for the ascertainment of marital status from clinical notes and compared its performance against two baseline methods: a machine learning n-gram model and structured data obtained from the electronic health record. Overall, our NLP engine had excellent performance on both document-level (F1 0.97) and patient-level (F1 0.95) classification and was superior to the baseline machine learning n-gram model. We also observed good correlation between the marital status obtained from our NLP engine and the baseline structured electronic healthcare data (κ 0.60).

Introduction

Social Determinants of Health (SDoH) are a set of constructs that are increasingly identified as key moderators of health-related outcomes1. For example, socioeconomic status, housing availability and stability, and social support affect the health of patients and should not be overlooked when investigating health care access, utilization, and outcomes. SDoH can affect healthcare via a number of direct mechanisms, including behavioral and disease risk factors, access to and processes of care, and the quality of health care received. Recently, the United States Centers for Medicare and Medicaid Services (CMS) proposed financial penalties for hospitals with high rates of 30-day readmission compared to national averages2. Although this program’s goal is to improve the quality of health care provided, it fails to take into account SDoH that may drive differences in readmission, thus penalizing hospitals that preferentially care for high-risk patient populations.

Social support and social relationships are an important type of SDoH that can aid patients in accessing healthcare and provide support during acute and chronic health care episodes. In particular, marital status is a traditional construct of social support and has been shown to have a positive effect on healthcare outcomes. Being married or in a domestic relationship has been shown to improve outcomes after hip replacement and heart transplantation, and even to positively impact mortality3-5. With regard to resource utilization, being married has been associated with shorter hospital length of stay and fewer hospital readmissions6,7. Being widowed or lacking a partner has also been associated with increased risk of hospital readmission8. It is important to note that marital status does not capture all aspects of social support and relationships. Marital status is also not simply dichotomous (married vs. not married); it may include being separated, divorced, or widowed, and, in contemporary times, domestic partnership is also considered under this status.

The challenge in studying the effect of SDoH on health-related outcomes is the lack of reliable data available to clinicians and researchers. SDoH are inconsistently collected in the electronic health record (EHR) and vary between institutions.9 Common SDoH that are routinely collected include age, gender, and race/ethnicity. Marital status, while commonly collected in structured form as part of intake to a healthcare facility, can change over time and may not be updated appropriately or promptly to reflect those changes. In addition, given the evolving definitions of marriage and domestic partnership, the structured field may be inadequate to capture a patient’s own view of marital status.

Free-text clinical notes provide a rich source of SDoH that can be accessed using natural language processing (NLP) techniques10. Clinical notes offer rich textual features that provide context around SDoH. In addition, patients may disclose marital status more freely to providers and health care workers, who document the social history in clinical notes. In these situations, clinical notes also offer the ability to identify changes in marital status over time.

Previous work by our group and others has demonstrated the value of NLP in the identification of SDoH from free-text clinical notes. In a study from the VA Healthcare System, we developed an NLP system to extract mentions of “social support,” “housing situation,” and “living alone”10. The performance of the system on a held-out test set showed an F1 score of 0.90 for social support, 0.61 for housing situation, and 0.81 for living alone. In a separate study, Navathe et al. developed a rules-based NLP system from 500 annotated clinical notes from a single academic institution to identify the following SDoH: tobacco use, alcohol use, drug abuse, depression, housing instability, fall risk, and poor social support11. They demonstrated excellent performance, with F1 scores ranging from 0.75 for alcohol use to 0.94 for poor social support. In addition, they demonstrated the value of NLP-identified terms in predicting readmission risk compared to structured data alone.

To our knowledge and based on our review of the literature, no previous studies have utilized NLP for the ascertainment of marital status. This is likely due to the belief that marital status is commonly collected in structured form in the electronic health record. However, because marital status may change over time, the accuracy of the structured data is unknown, as it is prone to error if not updated appropriately. The goal of the current study is to describe the development and validation of a rules-based NLP pipeline for determination of marital status from clinical notes. We compare our rules-based pipeline to a machine learning n-gram model and to marital status obtained from structured data fields in the electronic health record. We then describe the advantages and disadvantages of ascertaining marital status from structured data and from NLP.

Methods

Study Design

We performed a retrospective study of patients treated at the University of Utah Health Sciences Center from 2015-2017. The University of Utah has maintained an electronic health record capable of comprehensive electronic data capture since 2013 and the data are stored in an enterprise data warehouse (EDW). Given we are interested in marital status determination, we limited our corpus to clinical Social Work Notes as these notes were highly likely to contain references to marital status.

Structured Data Acquisition

In the University of Utah EDW, marital status is treated as a patient-level structured data field that is updated on demand as changes are entered in the electronic health record; consequently, marital status is not linked to encounter-level information. To work around this limitation, we obtained the log records of marital status changes to determine the marital status in effect at the time of each encounter.
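As an illustration of this workaround, the sketch below joins a hypothetical marital-status change log against encounter dates to recover the status in effect at each encounter; the file names and column names (marital_status_log.csv, encounters.csv, change_date) are assumptions and do not reflect the actual EDW schema.

```python
# Sketch: resolve the marital status in effect at the time of each encounter
# from a change log. File and column names are hypothetical.
import pandas as pd

# marital_log: one row per recorded change (patient_id, change_date, marital_status)
# encounters:  one row per encounter (patient_id, encounter_date)
marital_log = pd.read_csv("marital_status_log.csv", parse_dates=["change_date"])
encounters = pd.read_csv("encounters.csv", parse_dates=["encounter_date"])

# merge_asof requires both frames to be sorted on their time keys.
marital_log = marital_log.sort_values("change_date")
encounters = encounters.sort_values("encounter_date")

# For each encounter, take the most recent logged status on or before the encounter date.
status_at_encounter = pd.merge_asof(
    encounters,
    marital_log,
    left_on="encounter_date",
    right_on="change_date",
    by="patient_id",
    direction="backward",
)
```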

Reference Standard Corpus Creation

Because there is only one structured field in the EHR for marital status and the structured data field is not linked to each encounter, we consider clinical notes to be a better representation of a patient’s true marital status at each encounter. We obtained 23,794 Social Work notes from 4716 patients between 2015 and 2017. We performed manual document-level annotation on 865 notes to create a reference standard corpus12. We trained two annotators who performed document-level annotation for marital status. The document-level categories were: Single, Domestic Partner, Married, Separated, Divorced, and Widowed. Overall, the inter-annotator agreement was 0.96. Any discrepancies were adjudicated between the two annotators to create the reference standard corpus. We then split the corpus into a development set (378 notes), used to develop the rules-based pipeline and train the machine-learning model, and a blind validation set (487 notes), used to measure performance.

Knowledge Base Development

To design our knowledge base for our domain, we started by creating a roadmap of the terms and relationships that define the underlying concept of marital status using real-world clinical text (annotations) from the development corpus as a guide. In addition, we enriched the ontology with terms using the Unified Medical Language System13. The knowledge base was revised iteratively based on expert opinion and chart review and was used in our rules-based NLP pipeline.
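For illustration only, a small fragment of such a term-mapping knowledge base might look like the sketch below; the surface terms and class assignments are simplified examples and do not reproduce the actual knowledge base (which is published in the repository referenced in the next section).

```python
# Illustrative (not actual) fragment of a term-mapping knowledge base:
# surface terms mapped to the marital status concept they suggest.
# Contextual modifiers (negation, experiencer, temporality, certainty)
# are handled separately by the NLP engine, as described below.
MARITAL_STATUS_LEXICON = {
    "married": "Married",
    "wife": "Married",
    "husband": "Married",
    "spouse": "Married",
    "domestic partner": "Domestic Partner",
    "girlfriend": "Domestic Partner",
    "boyfriend": "Domestic Partner",
    "separated": "Separated",
    "divorced": "Divorced",
    "widowed": "Widowed",
    "widow": "Widowed",
    "single": "Single",
}
```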

NLP Engine Development

We utilized EasyCIE, a lightweight, rules-based NLP tool that supports quick and easy implementation of clinical information extraction14. EasyCIE uses a set of highly optimized and customizable NLP components built on top of n-Trie (a fast rule-processing engine)15, including sentence segmentation16, named entity recognition17, context detection18, a feature inferencer, and a document inferencer (Table 1).

Table 1.

NLP components and corresponding functionality of EasyCIE.

NLP Component            | Functionality Description
Section Detector         | Identify sections, e.g., History of Present Illness, Family History
Sentence Segmenter       | Detect sentence boundaries
Named Entity Recognizer  | Identify target concepts using dictionaries
Context Detector         | Attach context information as feature values to the corresponding target concepts
Attribute Inferencer     | Create mention-level conclusions based on target concepts and corresponding attributes
Document Inferencer      | Create a document-level conclusion from the corresponding mention-level conclusions

EasyCIE leverages the knowledge base described above through information extraction models (IEMs), including a term-mapping IEM: a semantic representation of target concepts (“married”, “divorced”, “wife”, “husband”, etc.) and their corresponding contextual modifiers. The context IEMs for this study include four types: 1) Negation, whether a target concept is negated or affirmed, e.g., “not married”; 2) Certainty, whether a target concept certainly or uncertainly indicates the marital status, e.g., “marital status unknown”; 3) Temporality, whether a target concept indicates a present, historical, or future status, e.g., “wife recently died”; 4) Experiencer, whether a target concept refers to the patient or to a patient’s family member, e.g., “met with patient’s daughter and her husband.” An annotation schema was defined from these IEMs. For example, “patient has been married for 5 years” would be encoded as Target Concept: Married, Negation: affirmed, Certainty: certain, Temporality: current, Experiencer: patient. Based on the values of these annotations in the document, EasyCIE applies rules to infer the final marital status for each document: Single, Domestic Partner, Married, Separated, Divorced, Widowed, or Unknown. The rules were developed from the development corpus by manual review of reference standard annotations and iterative error analyses. The source code for EasyCIE can be found at https://github.com/jianlins/EasyCIE_GUI. The knowledge base and configuration file for ascertainment of marital status can be found at https://github.com/jianlins/EasyCIE_Hub.
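A minimal sketch of this mention- and document-level inference is shown below; the data structure and the majority-vote tie-break are illustrative assumptions and do not reproduce EasyCIE’s actual rule engine (see the repositories above for the real implementation).

```python
# Sketch of mention-level annotations with contextual attributes and a
# simple document-level inference rule. Illustrative only.
from dataclasses import dataclass

@dataclass
class Mention:
    concept: str         # e.g. "Married", "Divorced", "Widowed"
    negated: bool        # Negation: "not married"
    certain: bool        # Certainty: "marital status unknown"
    current: bool        # Temporality: present vs. historical/future
    about_patient: bool  # Experiencer: patient vs. family member

def infer_document_label(mentions):
    """Collapse mention-level conclusions into one document-level label."""
    # Keep only affirmed, certain, current mentions that refer to the patient.
    relevant = [m.concept for m in mentions
                if not m.negated and m.certain and m.current and m.about_patient]
    if not relevant:
        return "Unknown"
    # Illustrative tie-break: the most frequent remaining concept wins.
    return max(set(relevant), key=relevant.count)

# "Patient has been married for 5 years"
doc = [Mention("Married", negated=False, certain=True, current=True, about_patient=True)]
print(infer_document_label(doc))  # -> Married
```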

Machine Learning Baseline Development

Given the lack of previous studies using NLP to determine marital status from clinical notes, we sought to compare the performance of the rules-based NLP system to a baseline machine learning model. We developed an n-gram classifier trained using unigrams and bigrams (1-2 word windows as word features) from the development set, excluding stop words. Our final feature set included 1488 binary n-gram features. We then created a word vector for each document in the development and validation sets. We used the manual annotations as the reference standard to develop and validate the machine learning approach. We trained a Random Forest classifier on the development set and used 10-fold cross-validation to tune the model hyperparameters. Once we had optimal performance on the development set, we tested the performance of the machine learning classifier on the blind validation set.
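A sketch of this baseline is shown below, assuming scikit-learn’s CountVectorizer and RandomForestClassifier with an illustrative hyperparameter grid; the exact grid, settings, and data-loading code used in the study may differ.

```python
# Sketch of the n-gram Random Forest baseline. The hyperparameter grid and
# data loading are assumptions; dev_texts/dev_labels/val_texts stand in for
# the development- and validation-set notes and reference-standard labels.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# dev_texts, dev_labels, val_texts = load_corpus()  # hypothetical loader

# Binary unigram/bigram features, excluding English stop words.
vectorizer = CountVectorizer(ngram_range=(1, 2), stop_words="english", binary=True)
X_dev = vectorizer.fit_transform(dev_texts)

# 10-fold cross-validation on the development set to tune hyperparameters.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10, 20]},
    cv=10,
    scoring="f1_micro",
)
grid.fit(X_dev, dev_labels)

# Final evaluation on the blind validation set.
X_val = vectorizer.transform(val_texts)
val_pred = grid.best_estimator_.predict(X_val)
```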

Analysis

Performance of both the machine learning classifier and our NLP engine was measured by precision, recall, and F1 score for classifying documents against the reference standard annotations. The definitions are shown below:

$$\text{Precision} = \frac{\text{TruePositives}}{\text{TruePositives} + \text{FalsePositives}}$$
$$\text{Recall} = \frac{\text{TruePositives}}{\text{TruePositives} + \text{FalseNegatives}}$$
$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

We compared the performance of our NLP approach to the baseline machine learning approach using McNemar’s test, defined below, where b and c are the counts of documents classified discordantly (correct by one method and incorrect by the other) between the NLP and machine learning classifiers, and χ² is the test statistic:

$$\chi^2 = \frac{(b - c)^2}{b + c}$$

We compared the performance of our NLP engine against the structured data fields using Cohen’s kappa, defined below, where p_o is the observed agreement and p_e is the expected agreement by chance:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$
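As a concrete illustration, these statistics could be computed with standard Python libraries as sketched below; the label vectors (y_true, y_nlp, y_ml, y_structured) are placeholders for the reference-standard, NLP-derived, n-gram-model, and structured labels and are assumed to be loaded elsewhere.

```python
# Sketch: evaluation statistics for the document-level classifications.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support, cohen_kappa_score
from statsmodels.stats.contingency_tables import mcnemar

# Micro-averaged precision, recall, and F1 for the NLP engine.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_nlp, average="micro")

# McNemar's test on the discordant pairs: b = NLP correct / ML wrong,
# c = NLP wrong / ML correct. Only b and c enter the test statistic.
nlp_correct = np.array(y_nlp) == np.array(y_true)
ml_correct = np.array(y_ml) == np.array(y_true)
b = int(np.sum(nlp_correct & ~ml_correct))
c = int(np.sum(~nlp_correct & ml_correct))
result = mcnemar([[0, b], [c, 0]], exact=False, correction=False)

# Cohen's kappa between NLP-derived and structured labels.
kappa = cohen_kappa_score(y_nlp, y_structured)
```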

Results

Patient Cohort and Annotation

We obtained structured marital status for 4716 patients treated from 2015 to 2017. Based on the structured EHR data, the majority of patients in the cohort were married (55%), followed by single (23%), divorced (9%), widowed (7%), domestic partner (2%), and separated (2%). Marital status was unknown in 2% of the cohort. Since marital status can change over time, we examined the number of times the marital status changed in the structured data fields (Table 2). Overall, 16% of patients had at least one change in marital status, with 2% of patients changing four times or more. Of the changes, 581 (78%) occurred on the same day, indicating that a number of the changes may reflect duplicate entries or updates to an existing marital status.

Table 2.

Changes in Marital Status in the Structured EHR data fields.

Number of Changes Count (Percent)
0 3951 (84%)
1 247 (5%)
2 242 (5%)
3 194 (4%)
4+ 82 (2%)

Rules-based NLP Engine

Given the changes and possible uncertainty in the structured data fields, we sought to develop and validate an NLP engine to identify a patient’s marital status from clinical Social Work notes and compare the performance of our NLP engine to the structured data fields.

The performance of our NLP engine and the machine learning approach on the validation set is shown in Table 3. Overall, the NLP engine had significantly better performance than the baseline machine learning approach (F1 score 0.97 vs 0.63, p<0.001). Examining individual classes, the rules-based approach had excellent performance (F1 score >0.90) in every class, with “Separated” having the best performance. Compared to the baseline machine learning approach, the rules-based NLP approach did not have significantly better performance for the “Divorced” or “Separated” classes, likely because of the low number of patients who met these criteria.

Table 3.

Baseline Machine Learning and Rules-Based NLP Engine Performance

Class             | Machine Learning Baseline (Precision / Recall / F1) | Rules-Based NLP Engine (Precision / Recall / F1) | P-Value
Unknown           | 0.75 / 0.92 / 0.83                                  | 0.98 / 0.99 / 0.99                               | 0.003
Single            | 0.90 / 0.40 / 0.50                                  | 1.00 / 0.88 / 0.93                               | <0.001
Domestic Partner  | 0.60 / 0.21 / 0.32                                  | 0.93 / 0.96 / 0.95                               | <0.001
Married           | 0.75 / 0.82 / 0.78                                  | 0.97 / 0.98 / 0.97                               | <0.001
Separated         | 1.00 / 0.25 / 0.40                                  | 1.00 / 1.00 / 1.00                               | 0.25
Divorced          | 0.88 / 0.85 / 0.85                                  | 0.90 / 1.00 / 0.94                               | 0.125
Widowed           | 1.00 / 0.50 / 0.67                                  | 1.00 / 0.94 / 0.97                               | 0.008
Overall           | 0.84 / 0.56 / 0.63 (a)                              | 0.97 / 0.97 / 0.97 (a)                           | <0.001
a. Micro-averaged F1 score

We then compared the performance of the NLP engine to that of the structured data field for patient-level marital status classification on the validation set by aggregating across patients and eliminating unknowns (Table 4). Compared to the NLP engine, the structured data field showed decreased classification accuracy at the patient level; however, this difference was not statistically significant (p=0.4).

Table 4.

Patient-Level Classification Performance on the Validation Set

Method Precision Recall F1
Rules Based NLP 0.95 0.95 0.95
Machine Learning 0.74 0.43 0.51
Structured 0.88 0.88 0.88

Error Analysis

We analyzed the failures of our NLP engine in the validation set and characterized the errors in Table 5. The majority of errors (42%) were related to incorrect context assignment caused by mentions of partners or spouses of relatives. The next most common errors were failures of negation detection (25%) and failures of document-level inference due to multiple conflicting mentions of marital status (25%).

Table 5.

Natural Language Processing System Error Analysis

Type of Error             | Count (Percent) | Example
Context Inference         | 5 (42%)         | “Patient lives with his sister Sarah and her husband”
Document-Level Inference  | 3 (25%)         | “Patient is divorced but lives with his girlfriend”
Negation                  | 3 (25%)         | “Patient was never married”
Named Entity Recognition  | 1 (8%)          | “Marital Status: W”

Comparison of NLP vs Structured Data Field

Having established the excellent performance of our NLP engine, we processed the entire corpus of 23,794 social work notes with the NLP engine and compared the NLP classifications against the structured data fields. Since marital status may change over time, we aggregated notes by year and quarter to determine the NLP-derived marital status for that particular time. The NLP and structured labels are shown in Table 6. Overall, the NLP engine was unable to determine the marital status for 28% of patients, compared with only 3% for the structured data (p<0.0001).

Table 6.

The Distribution of NLP and Structured Data Labels in Full Corpus

Class NLP Structured
Single 787 (10%) 2361 (30%)
Domestic Partner 466 (6%) 66 (1%)
Married 3365 (43%) 4088 (53%)
Separated 99 (1%) 132 (2%)
Divorced 574 (7%) 450 (6%)
Widowed 286 (4%) 455 (6%)
Unknown 2204 (28%) 229 (3%)
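The year/quarter aggregation used to produce the NLP column above could be sketched as follows; the column names and the majority-vote rule are assumptions for illustration, not the exact procedure used in the study.

```python
# Sketch: roll document-level NLP labels up to a patient/quarter level.
# Column names and the majority-vote aggregation are hypothetical.
import pandas as pd

# notes: one row per processed note (patient_id, note_date, nlp_label)
notes = pd.read_csv("nlp_document_labels.csv", parse_dates=["note_date"])
notes["quarter"] = notes["note_date"].dt.to_period("Q")

def aggregate_labels(labels):
    """Pick the most common non-Unknown label for the patient-quarter, else Unknown."""
    known = labels[labels != "Unknown"]
    if known.empty:
        return "Unknown"
    return known.mode().iloc[0]

patient_quarter_status = (
    notes.groupby(["patient_id", "quarter"])["nlp_label"]
         .apply(aggregate_labels)
         .reset_index(name="marital_status")
)
```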

We next compared the correlation of the NLP-derived labels with those in the structured data. Overall, the Cohen’s κ between the NLP-derived and structured variables was 0.37. Excluding the unknown labels, the Cohen’s κ improved to 0.60. The pairwise agreement between the NLP-derived variable and the structured variable is shown in Figure 1. Overall there was good agreement for Married (73%) and modest agreement for Domestic Partner (45%), Divorced (46%), and Widowed (51%). The largest disagreement was for Single. Of the patients labeled as Single by the structured data fields but not by NLP, the NLP engine assigned the following marital statuses: Domestic Partner (14%), Married (10%), Divorced (9%), Separated (1%), and Widowed (1%).

Figure 1. Normalized Comparison of NLP and Structured Data

Discussion

We present an analysis of the determination of patient marital status from both structured and unstructured electronic healthcare data. The present study contributes three novel and significant findings. First, marital status can be obtained from structured data fields; however, in our EHR there is only one field per patient, and its value may not correspond to past encounters because marital status changes over time. In addition, the structured field is prone to manual entry errors or inconsistent updates when changes occur.19 Second, we demonstrated that a rules-based NLP engine can extract marital status almost as well as a human reviewer. Third, although there was not a statistically significant difference between NLP and structured-field assignment of marital status on a small test set, we saw large discrepancies between the NLP ascertainment of marital status and the structured data fields on a large corpus. This work demonstrates the utility of NLP-derived variables in supplementing structured electronic healthcare data. For the patients for whom the NLP engine was unable to identify a marital status, the distribution of structured labels is shown in Figure 2. The two most common structured labels were Single (39%) and Married (38.5%).

Figure 2. Structured marital status labels for patients without an NLP-obtained marital status

In our cohort, based on the structured data fields, approximately 50% of the patients were married, with single being the next most common status at approximately 25%. The advantage of using structured data is the ease of storage and extraction in the electronic health record. However, there remain challenges in the use of structured data fields. First, the definitions of marital status can have diverse interpretations, especially for culturally sensitive topics in health disparity populations such as racial/ethnic and sexual/gender minorities. Second, marital status may change over time. This can pose a problem in EHR design, especially if marital status is a patient-level variable. In our institution, marital status is determined at the patient level and changes are only stored in the log databases of the enterprise data warehouse, thus limiting the use of marital status in encounter-level analyses. Third, data entry errors may create multiple entries for marital status. In our analysis, 78% of the changes in the structured marital status fields occurred on the same day, indicating either data entry errors or conflicting information from different tables in the EHR.

To supplement the limitation of the structured data fields, we developed and validated an NLP engine to ascertain marital status from clinical notes. We elected to limit the notes to clinical Social Work notes as these documents have a high likelihood of containing marital status information. Overall our NLP engine had excellent performance on both a document level (F1 Score 0.97) and patient level (F1 Score 0.95).

We compared the performance of our NLP engine with the structured data fields on our full cohort of patients. After analysis of 23,794 notes, the NLP engine was able to make a determination on 72% of notes; the remaining 28% of notes did not contain references to marital status. This large number of unknowns is a limitation of NLP that stems from the limited information contained in the clinical text. Given that we used only Social Work notes, we hypothesize that including other clinical documentation may decrease the number of unknowns. Excluding the patients for whom NLP was unable to make a determination, we saw good agreement between our NLP engine and the structured data fields (Cohen’s κ 0.60). There was good agreement for married, divorced, widowed, and domestic partner. We saw the most disagreement among patients labeled as Single in the structured data fields; for these patients, NLP determined that they were married (10%), in a domestic partnership (14%), or divorced (9%). This demonstrates the value of NLP in that sensitive topics such as domestic partnership or divorce may be misrepresented in the structured data fields. The largest number of disagreements in each structured data class was related to the NLP engine being unable to ascertain marital status. We therefore analyzed the distribution of structured data labels for the patients in whom the NLP system was unable to determine marital status (Figure 2). The distribution of marital status in these patients matched the overall distribution of the structured data fields, indicating that our NLP engine is not biased toward the ascertainment of one particular marital status class.

Overall, based on the results of our study, we feel a hybrid approach to the ascertainment of a patient’s marital status should be used: NLP can supplement structured data fields, especially in instances where the structured field is Unknown or Single.

Limitations and Future Work

We acknowledge several limitations. First, this study was limited to the traditional definition of marital status as it is currently captured in electronic health records. Because there is insufficient knowledge of how the spectrum of social and domestic relationships (including same-sex partnerships) is currently documented in the health record, there is an opportunity to extend our work in this direction in the future. Second, the study was limited to one large academic medical center and to one electronic note type (social work notes, which were predicted to be of high yield for marital status); these findings merit extension to other note types and other healthcare facilities. Marital status is one of many social risk factors that are increasingly identified as drivers of health care utilization. Moving forward, we seek to expand our ascertainment of other social risk factors using NLP tools.

Conclusions

In conclusion, we described the determination of marital status from electronic health record data. We developed an NLP engine to infer a patient’s marital status from clinical notes. Overall, our NLP engine had excellent performance on both document-level and patient-level classification, and there was good agreement between the NLP method and the structured data entries in the electronic healthcare record. This work demonstrates the value of combining structured and unstructured data for the determination of marital status. Our techniques can be extended to other SDoH and provide researchers valuable insight into the role these factors play in healthcare utilization.

Acknowledgments

This work is funded by a grant from the Agency for Healthcare Research and Quality (K08HS025776) and the VA Salt Lake City Center of Innovation Award (COIN) #I50HX001240. The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication. The views expressed in this paper are those of the authors and do not necessarily represent the views of the United States Department of Veterans Affairs or the United States government.


References

1. Behforouz HL, Drain PK, Rhatigan JJ. Rethinking the social history. N Engl J Med. 2014;371(14):1277–9. doi: 10.1056/NEJMp1404846.
2. Borza T, Oreline MK, Skolarus TA, et al. Association of the Hospital Readmissions Reduction Program with surgical readmissions. JAMA Surg. 2018;153(3):243–50. doi: 10.1001/jamasurg.2017.4585.
3. Young NL, Cheah D, Waddell JP, Wright JG. Patient characteristics that affect the outcome of total hip arthroplasty: a review. Can J Surg. 1998;41(3):188–95.
4. Manzoli L, Villari P, Pirone GM, Boccia A. Marital status and mortality in the elderly: a systematic review and meta-analysis. Soc Sci Med. 2007;64(1):77–94. doi: 10.1016/j.socscimed.2006.08.031.
5. Coglianese EE, Samsi M, Liebo MJ, Heroux AL. The value of psychosocial factors in patient selection and outcomes after heart transplantation. Curr Heart Fail Rep. 2015;12(1):42–7. doi: 10.1007/s11897-014-0233-5.
6. Howie-Esquivel J, Spicer JG. Association of partner status and disposition with rehospitalization in heart failure patients. Am J Crit Care. 2012;21(3):e65–73. doi: 10.4037/ajcc2012382.
7. Iwashyna TJ, Christakis NA. Marriage, widowhood, and health-care use. Soc Sci Med. 2003;57(11):2137–47. doi: 10.1016/s0277-9536(02)00546-4.
8. Damiani G, Salvatori E, Silvestrini G, et al. Influence of socioeconomic factors on hospital readmissions for heart failure and acute myocardial infarction in patients 65 years and older: evidence from a systematic review. Clin Interv Aging. 2015;10:237–45. doi: 10.2147/CIA.S71165.
9. National Academies of Sciences, Engineering, and Medicine. Accounting for Social Risk Factors in Medicare Payment: Identifying Social Risk Factors. Washington, DC: National Academies Press; 2016.
10. South BR, Christensen LM, Mowery DL, et al. Automatic extraction of social determinants of health from Veterans Affairs clinical documents using natural language processing. CRI. 2017.
11. Navathe AS, Zhong F, Lei VJ, et al. Hospital readmission and social risk factors identified from physician notes. Health Serv Res. 2018;53(2):1110–36. doi: 10.1111/1475-6773.12670.
12. South BR, Mowery D, Suo Y, et al. Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text. J Biomed Inform. 2014;50:162–72. doi: 10.1016/j.jbi.2014.05.002.
13. Scuba W, Tharp M, Mowery D, et al. Knowledge Author: facilitating user-driven, domain content development to support clinical information extraction. J Biomed Semantics. 2016;7(1):42. doi: 10.1186/s13326-016-0086-9.
14. Shi J, Mowery D. EasyCIE: a development platform to support quick and easy, rule-based clinical information extraction. Fifth IEEE International Conference on Healthcare Informatics; Park City, UT; 2017.
15. Shi J, Hurdle JF. Trie-based rule processing for clinical NLP: a use-case study of n-trie, making the ConText algorithm more efficient and scalable. J Biomed Inform. 2018;85:106–13. doi: 10.1016/j.jbi.2018.08.002.
16. Shi J, Mowery D, Doing-Harris K, Hurdle JF. RuSH: a rule-based segmentation tool using hashing for extremely accurate sentence segmentation of clinical text. AMIA Annual Symposium; Chicago, IL; 2016.
17. Shi J, Mowery D, Zhang M, Sanders J, Chapman W, Gawron L. Extracting intrauterine device usage from clinical texts using natural language processing. 2017 IEEE International Conference on Healthcare Informatics (ICHI); 2017.
18. Chapman BE, Lee S, Kang HP, Chapman WW. Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm. J Biomed Inform. 2011;44(5):728–37. doi: 10.1016/j.jbi.2011.03.011.
19. Hogan WR, Wagner MM. Accuracy of data in computer-based patient records. J Am Med Inform Assoc. 1997;4(5):342–55. doi: 10.1136/jamia.1997.0040342.
