Scalable Incident Detection via Natural Language Processing and Probabilistic Language Models

Colin G Walsh; Drew Wilimitis; Qingxia Chen; Aileen Wright; Jhansi Kolli; Katelyn Robinson; Michael A Ripperger; Kevin B Johnson; David Carrell; Rishi J Desai; Andrew Mosholder; Sai Dharmarajan; Sruthi Adimadhyam; Daniel Fabbri; Danijela Stojanovic; Michael E Matheny; Cosmin A Bejan

doi:10.1101/2023.11.30.23299249

This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2023 Dec 1:2023.11.30.23299249. [Version 1] doi: 10.1101/2023.11.30.23299249

PMCID: PMC10705655 PMID: 38076830

Abstract

Post-marketing safety surveillance depends in part on the ability to detect concerning clinical events at scale. Spontaneous reporting might be an effective component of safety surveillance, but it requires awareness and understanding among healthcare professionals to achieve its potential. Reliance on readily available structured data such as diagnostic codes risks under-coding and imprecision. Clinical textual data might bridge these gaps, and natural language processing (NLP) has been shown to aid in scalable phenotyping across healthcare records in multiple clinical domains. In this study, we developed and validated a novel incident phenotyping approach using unstructured clinical textual data, agnostic to Electronic Health Record (EHR) and note type. It is based on a published, validated approach (PheRe) used to ascertain social determinants of health and suicidality across entire healthcare records. To demonstrate generalizability, we validated this approach on two separate phenotypes that share common challenges with respect to accurate ascertainment: 1) suicide attempt; 2) sleep-related behaviors. With samples of 89,428 records and 35,863 records for suicide attempt and sleep-related behaviors, respectively, we conducted silver standard (diagnostic coding) and gold standard (manual chart review) validation. We showed an Area Under the Precision-Recall Curve (AUPR) of ∼0.77 (95% CI 0.75-0.78) for suicide attempt and an AUPR of ∼0.31 (95% CI 0.28-0.34) for sleep-related behaviors. We also evaluated performance by coded race and demonstrated that differences in performance by race were dissimilar across phenotypes and require algorithmovigilance and debiasing prior to implementation.
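To make the reported metric concrete, the following is a minimal sketch of how an AUPR point estimate and a 95% confidence interval could be computed from binary reference labels (for example, chart-review outcomes) and model scores. The abstract does not state how the intervals were derived, so the percentile bootstrap used here is an assumption, as are the function names `average_precision` and `bootstrap_ci` and the toy data. AUPR is approximated by step-wise average precision.

```python
import random


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    labels: list of 0/1 reference labels; scores: list of model scores.
    Tied scores are ranked in input order (a simplification for this sketch).
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank records from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            ap += true_pos / rank  # precision at each new recall level
    return ap / total_pos


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for average precision (assumed method)."""
    if sum(labels) == 0:
        raise ValueError("need at least one positive label")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        if sum(sample_labels) == 0:
            continue  # metric is undefined without positives; redraw
        sample_scores = [scores[i] for i in idx]
        stats.append(average_precision(sample_labels, sample_scores))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Synthetic toy data, not drawn from the study.
    toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1]
    print("AUPR:", average_precision(toy_labels, toy_scores))
    print("95% CI:", bootstrap_ci(toy_labels, toy_scores))
```

Running the same two functions separately within each coded-race subgroup would give the kind of stratified comparison the abstract describes, under the same assumptions.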

Full Text Availability

The license terms selected by the author(s) for this preprint version do not permit archiving in PMC. The full text is available from the preprint server.
