Author manuscript; available in PMC: 2025 Jan 1.
Published in final edited form as: Acad Pediatr. 2023 Aug 29;24(1):92–96. doi: 10.1016/j.acap.2023.08.015

Natural Language Processing – A Surveillance Stepping Stone to Identify Child Abuse

May Shum a, Allen Hsiao a, Wei Teng b, Andrea Asnes a, Joshua Amrhein c, Gunjan Tiyyagura a
PMCID: PMC10840716  NIHMSID: NIHMS1929339  PMID: 37652162

Abstract

Objective:

We aimed to refine a Natural Language Processing (NLP) algorithm that identified injuries associated with child abuse and to identify areas in which integration into a real-time clinical decision support (CDS) tool may improve clinical care.

Methods:

We applied an NLP algorithm in “silent mode” to all emergency department (ED) provider notes between July 2021–December 2022 (n=353) at one pediatric and eight general EDs. We refined triggers for the NLP, assessed adherence to clinical guidelines and evaluated disparities in degree of evaluation by examining associations between demographic variables and abuse evaluation or reporting to child protective services.

Results:

Seventy-three cases falsely triggered the NLP, often due to errors in interpreting linguistic context. We identified common false positive scenarios and refined the algorithm to improve NLP specificity. Adherence to recommended evaluation standards for injuries defined by nationally accepted clinical guidelines was 63%. There were significant demographic differences in evaluation and reporting based on presenting ED type, insurance status, and race/ethnicity.

Conclusions:

Analysis of an NLP algorithm in “silent mode” allowed for refinement of the algorithm and highlighted areas in which real-time CDS may help ED providers identify and pursue appropriate evaluation of injuries associated with child physical abuse.

Keywords: Natural Language Processing, clinical decision support, child protection team, guideline adherence, bias

INTRODUCTION

Physical abuse disproportionately affects infants <12-months-old.1 Thirty percent of children who suffer serious abusive injuries were previously evaluated for injuries that were not recognized as abusive.2 Cases are missed more frequently in community emergency departments (EDs) with fewer child abuse resources and less experience than pediatric EDs.2 Additionally, Black and Hispanic children are more likely to receive a skeletal survey and child protective services (CPS) report, even when presenting with injuries that are likely accidental.3, 4

There is growing impetus to develop digital tools for clinical decision-making.5 Studies of active and passive screens that trigger electronic clinical decision support (CDS) systems have demonstrated increased identification of children with suspicious injuries and compliance with clinical guidelines for abuse.6–9 However, the effect of such systems on the true number of missed cases of abuse (i.e., cases that are not identified by the system or that are identified but not evaluated) remains unknown. Diagnostic codes, frequently used in epidemiological studies, lack sensitivity in identifying child abuse, often due to provider unwillingness to formally diagnose abuse during an ongoing evaluation.10 Furthermore, the informativeness of diagnosis codes in identifying abuse without active trauma or child protection team involvement has not been well characterized.11 In contrast, Natural Language Processing (NLP) is a field of artificial intelligence that automates data extraction, converting free text to structured data through rule-based algorithms.12 NLP may be an effective tool to help providers identify potentially abusive injuries in real-time and allow for improved surveillance in epidemiological studies of cases concerning for abuse over time.13, 14 We previously developed and validated an NLP algorithm with high sensitivity (92.7%, 95%CI 79.0–98.1%) and specificity (98.1%, 95%CI 97.1–98.7%) that identifies cases based on a set list of injuries associated with physical abuse in infants.15

Our NLP has been integrated into a CDS tool which will provide feedback to ED providers upon signing a note and thus allow providers to consider the diagnosis of abuse during the same encounter. The algorithm was initially run in “silent mode” at eight general EDs and one pediatric ED such that study personnel received weekly data on encounters that triggered the alert system without physician awareness of or interaction with the system. Berger et al. similarly used a “silent mode” analysis to evaluate an electronic child abuse screening tool prior to live implementation.7 In this study, we aimed to refine our NLP algorithm and highlight areas in which integration into a real-time CDS tool may improve clinical care related to suspected abuse.

METHODS

NLP Algorithm

Our institution follows a guideline recommending consultation with a child protection team (CPT) when an infant ≤12-months-old presents with an injury associated with abuse. The algorithm was designed to identify the abuse-associated injuries on the guideline: 1) long bone fracture; 2) skull fracture; 3) rib fracture; 4) intracranial injury; 5) burn; 6) solid organ injury; 7) bruising of the ear, neck, torso, angle of jaw, cheek, or eyelid; 8) subconjunctival hemorrhage; 9) frenulum tear; or 10) if the patient was <5-months-old and had any oral injury or any bruise.16, 17 The researchers collaborated with 3M M*Modal in developing the NLP algorithm.15
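To make the guideline's trigger list concrete, the ten criteria above can be written as structured logic. The following Python sketch is illustrative only: the `Finding` class, the string labels, and the function name are assumptions introduced here, and the code operates on already-structured findings rather than on free text, so it should not be read as the NLP algorithm developed with 3M M*Modal.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Guideline injuries that qualify in any infant up to 12 months old.
LISTED_INJURIES = {
    "long bone fracture", "skull fracture", "rib fracture",
    "intracranial injury", "burn", "solid organ injury",
    "subconjunctival hemorrhage", "frenulum tear",
}
# Bruise locations named on the guideline.
GUIDELINE_BRUISE_SITES = {"ear", "neck", "torso", "angle of jaw", "cheek", "eyelid"}


@dataclass(frozen=True)
class Finding:
    """Hypothetical structured finding: an injury kind and an optional body site."""
    kind: str
    site: Optional[str] = None


def meets_institutional_guideline(age_months: float, findings: Iterable[Finding]) -> bool:
    """Return True if any finding is an abuse-associated injury on the guideline."""
    if age_months > 12:  # the guideline applies to infants <=12 months old
        return False
    for f in findings:
        if f.kind in LISTED_INJURIES:
            return True
        if f.kind == "bruise" and f.site in GUIDELINE_BRUISE_SITES:
            return True
        # Criterion 10: any oral injury or any bruise in infants <5 months old.
        if age_months < 5 and f.kind in {"bruise", "oral injury"}:
            return True
    return False
```

Under these assumptions, a leg bruise in a 3-month-old would qualify, whereas the same bruise in an 8-month-old would not.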

A subset of injuries identified by NLP are those for which the American Academy of Pediatrics (AAP) recommends evaluation for abuse (bruising in infants <5-months-old, fractures in non-mobile infants (defined in our study as <9-months-old), and intracranial hemorrhage in infants <12-months-old) and those described by a validated bruising rule, TEN-4 FACESp (bruising of the torso, ears, neck, frenulum, angle of jaw, cheek, eyelid, sclera, patterned bruising, or any bruising in an infant ≤4-months-old).18–20 We defined adherence to these nationally accepted clinical guidelines as evaluation for abuse with a CPT consultation.
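The age thresholds that define this subset can be sketched in the same way. The function below is a minimal illustration under stated assumptions: the injury labels and the function name are hypothetical, and frenulum and sclera are modeled as bruise sites simply because the text lists them that way. It encodes only the thresholds given in this section, not the full AAP or TEN-4 FACESp documents.

```python
from typing import Optional

# Sites named by TEN-4 FACESp as described in the text.
TEN4_FACESP_SITES = {
    "torso", "ear", "neck", "frenulum", "angle of jaw", "cheek", "eyelid", "sclera",
}


def is_national_guideline_injury(
    age_months: float,
    kind: str,
    site: Optional[str] = None,
    patterned: bool = False,
) -> bool:
    """Return True if an injury meets the AAP or TEN-4 FACESp criteria in the text."""
    if kind == "bruise":
        # AAP: any bruising <5 months (this also covers the TEN-4 FACESp
        # "<=4 months" criterion); TEN-4 FACESp: listed site or patterned bruise.
        return age_months < 5 or site in TEN4_FACESP_SITES or patterned
    if kind == "fracture":
        return age_months < 9   # non-mobile infant, as defined in this study
    if kind == "intracranial hemorrhage":
        return age_months < 12
    return False
```

Burns, for example, fall outside this subset, which is consistent with Table 1 reporting no burns among the nationally accepted clinical guideline injuries.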

Study Design

We performed a cross-sectional study at nine EDs within one healthcare system. 3M M*Modal received ED provider documentation via a Health Level 7 interface, which allowed interoperability and exchange of data between information technology systems, and an Admission, Discharge, Transfer interface which provided demographic data. The algorithm was then run weekly by 3M analysts within the free text of all medical provider notes from the study EDs from July 19, 2021–December 31, 2022. Cases were shared with the researchers via an encrypted system. Two researchers independently reviewed and discussed cases for consensus agreement on inclusion, defined as manual confirmation that the infant presented with ≥1 of the injuries identified by NLP. One researcher conducted structured chart reviews of included encounters and extracted data on demographics, injury characteristics, and clinical interventions.

Cases were discussed within a week of the ED encounter at a multidisciplinary meeting with child abuse experts, social workers, radiologists, CPS investigators, and ED providers to review decisions around CPT consultations and CPS reporting. If experts had ongoing concerns for abuse, patients were called back to the ED for additional evaluation by the CPT.

Setting & Population

Annual pediatric volumes ranged from 1,800–19,500 visits at the general EDs to 35,000 visits at the pediatric ED. The CPT was available on-site at the pediatric ED and by phone at the remaining EDs. Social workers were available in-person at the pediatric ED and in-person or by phone at the general EDs. As mandated reporters, ED providers could report to CPS prior to or without CPT consultation. However, CPT involvement was required for skeletal survey testing. Since transfer to a pediatric setting may indicate risk recognition and appropriate escalation of care, infants who were transferred from a study community ED to the pediatric ED were included in the community ED data.

Data Analysis

Demographics, injury characteristics, and guideline adherence were summarized using descriptive statistics. Initial bivariate analyses using Pearson chi-square, Fisher’s exact, and t-tests were performed for all injuries and the subset of AAP guideline and TEN-4 FACESp injuries to compare frequency of abuse evaluation and CPS reporting with demographic variables. Multivariate logistic regression was subsequently used to test associations between demographic variables and evaluation and reporting to adjust for potential confounding. A two-sided statistical significance level of p<0.05 was applied to all analyses. Data were analyzed using SPSS v26. This study was exempted as a Quality Improvement study by the study sites’ Institutional Review Boards.
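For reference, the multivariable model can be written in the standard logistic regression form using the five covariates reported in Table 2. The symbols below ($\beta_j$, $x_j$, $\widehat{\mathrm{SE}}$) are notation introduced here, and the Wald form of the confidence interval is an assumption, since the text does not state how the intervals were computed.

```latex
\[
\log\frac{\Pr(Y=1)}{1-\Pr(Y=1)}
  = \beta_0 + \beta_1 x_{\text{age}} + \beta_2 x_{\text{female}}
  + \beta_3 x_{\text{minority}} + \beta_4 x_{\text{public}}
  + \beta_5 x_{\text{community}}
\]
\[
\mathrm{aOR}_j = e^{\hat\beta_j},
\qquad
95\%\ \mathrm{CI}_j = \exp\!\left(\hat\beta_j \pm 1.96\,\widehat{\mathrm{SE}}(\hat\beta_j)\right)
\]
```

Here $Y=1$ denotes either evaluation for abuse or a CPS report, depending on the model, and an aOR below 1 for age means that the odds of the outcome fall with each additional month of age.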

RESULTS

Evaluation of NLP

During the study period, 353 encounters were captured and 73 cases falsely triggered the NLP, most commonly due to misinterpretation of linguistic context (n=35). For example, past medical history was misinterpreted as current injury. Two scenarios were consistently misidentified: 1) scalp hematomas were interpreted as intracranial, and 2) past histories of intraventricular hemorrhage were interpreted as current. An additional 53 cases appropriately triggered the algorithm but were excluded for not having a true injury. For example, a provider initially documented dermal melanocytosis as bruising. The remaining 227 cases met inclusion criteria (Figure 1).

Figure 1. Flow chart of inclusion criteria.

^a In 12 encounters, we were unable to reproduce and determine the error due to iterative changes to the algorithm during the study period.

^b For example, dermal melanocytosis initially documented as bruising; bruising caused by phlebotomy in the ED; and injuries found to be due to known birth trauma during the ED encounter.
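The inclusion counts can be checked with simple arithmetic. The short Python sketch below uses only the numbers reported above; the variable names are ours.

```python
captured = 353          # encounters that triggered the NLP during the study period
false_triggers = 73     # cases that falsely triggered the NLP
no_true_injury = 53     # appropriate triggers excluded for not having a true injury

included = captured - false_triggers - no_true_injury
false_trigger_share = false_triggers / captured

print(included)                       # 227
print(f"{false_trigger_share:.1%}")   # 20.7%
```

About one in five captured encounters was therefore a false trigger, which is the share the refinement work aimed to reduce.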

Demographics and Guideline Adherence

Median age was 5 months. Half of patients were male and 42% presented to a community ED. Sixty-three percent of injuries meeting AAP guidelines or with TEN-4 FACESp bruising received evaluations for abuse (Table 1).

Table 1. Patient demographics and encounter characteristics.

| Characteristic | All Injuries (n=227) | Nationally Accepted Clinical Guideline Injuries (n=133) |
| --- | --- | --- |
| Mean age, months (SD) | 5.2 (3.6) | 3.9 (3.1) |
| Male, n (%) | 123 (54) | 59 (44) |
| Race/Ethnicity, n (%) | | |
| Non-Hispanic White | 91 (40) | 59 (44) |
| Hispanic | 74 (32) | 45 (34) |
| Non-Hispanic Black | 32 (14) | 16 (12) |
| Other^a | 30 (13) | 13 (10) |
| Non-English speaking, n (%) | 32 (14) | 14 (11) |
| Public insurance, n (%) | 133 (59) | 78 (59) |
| ED type, n (%) | | |
| Pediatric | 132 (58) | 81 (61) |
| Community | 95 (42) | 52 (39) |
| Injury type^b, n (%) | | |
| Bruising | 97 (43) | 74 (56) |
| Skull fracture | 55 (24) | 31 (23) |
| Intracranial injury | 45 (20) | 44 (33) |
| Long bone fracture | 46 (20) | 27 (20) |
| Burn | 31 (14) | 0 (0) |
| Frenulum tear | 15 (7) | 12 (9) |
| Subconjunctival hemorrhage | 14 (6) | 14 (11) |
| Solid organ injury | 6 (3) | 4 (3) |
| Evaluation for abuse^c, n (%) | 129 (57) | 84 (63) |
| CPS report, n (%) | 54 (24) | 40 (30) |

^a Includes Asian, Native American/Alaskan Indian, and race not listed.

^b Infants may have had more than one injury type.

^c Defined as CPT consultation.
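The evaluation and reporting percentages in Table 1 follow directly from the reported counts and column totals. The following Python sketch recomputes them; the helper name `pct` and the dictionary layout are illustrative choices, not part of the study's analysis.

```python
def pct(n: int, total: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * n / total)


ALL_N, GUIDELINE_N = 227, 133   # column totals from Table 1

# (count among all injuries, count among guideline injuries)
rows = {
    "Evaluation for abuse": (129, 84),
    "CPS report": (54, 40),
}

summary = {
    label: (pct(all_count, ALL_N), pct(guideline_count, GUIDELINE_N))
    for label, (all_count, guideline_count) in rows.items()
}
print(summary)
```

The 63% adherence figure cited throughout the paper corresponds to 84 of 133 guideline injuries receiving a CPT consultation.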

Disparities in Evaluation and Reporting

All Injuries:

Younger infants (p=0.003) and infants with public vs. private insurance (p=0.01) were more likely to be evaluated for abuse; those who presented to a community vs. pediatric ED were less likely to be evaluated (p=0.007) (Supplementary Table 1). In multivariate regression analysis, younger age (aOR=0.89, 95%CI 0.83–0.97) was associated with increased evaluations, while presentation to a community ED had lower odds of evaluation (aOR=0.56, 95%CI 0.32–0.99) (Table 2).

Table 2. Multivariable logistic regression model for abuse evaluation and reporting.

All Injuries - Evaluation for Abuse

| Variable | aOR | 95% CI | P |
| --- | --- | --- | --- |
| Age | 0.89 | 0.83–0.97 | 0.004 |
| Female vs. Male | 1.06 | 0.61–1.85 | 0.84 |
| Minority race/ethnicity vs. Non-Hispanic White | 1.28 | 0.69–2.38 | 0.43 |
| Public insurance vs. Private insurance | 1.78 | 0.96–3.30 | 0.07 |
| Community ED vs. Pediatric ED | 0.56 | 0.32–0.99 | 0.05 |

All Injuries - CPS Reporting

| Variable | aOR | 95% CI | P |
| --- | --- | --- | --- |
| Age | 0.96 | 0.88–1.06 | 0.42 |
| Female vs. Male | 1.55 | 0.81–2.94 | 0.19 |
| Minority race/ethnicity vs. Non-Hispanic White | 1.76 | 0.81–3.79 | 0.15 |
| Public insurance vs. Private insurance | 2.73 | 1.23–6.07 | 0.01 |
| Community ED vs. Pediatric ED | 0.60 | 0.30–1.17 | 0.13 |

Nationally Accepted Clinical Guideline Injuries - CPS Reporting

| Variable | aOR | 95% CI | P |
| --- | --- | --- | --- |
| Age | 1.03 | 0.90–1.18 | 0.66 |
| Female vs. Male | 3.07 | 1.33–7.08 | 0.009 |
| Minority race/ethnicity vs. Non-Hispanic White | 3.35 | 1.23–9.12 | 0.02 |
| Public insurance vs. Private insurance | 1.80 | 0.69–4.73 | 0.23 |
| Community ED vs. Pediatric ED | 1.26 | 0.54–2.92 | 0.59 |

Infants with public vs. private insurance (p<0.001) were more likely to be reported to CPS; those who presented to a community vs. pediatric ED were less likely to be reported (p=0.04) (Supplementary Table 1). In multivariate regression analysis, publicly insured infants had higher odds of having a CPS report than privately insured infants (aOR=2.73, 95%CI 1.23–6.07) (Table 2).

Nationally Accepted Clinical Guideline Injuries:

Younger infants (p=0.006) were more likely to be evaluated for abuse (Supplementary Table 1). No demographic variables were significant in multivariate regression. Female infants (aOR=3.07, 95%CI 1.33–7.08) and infants of minority race/ethnicity (aOR=3.35, 95%CI 1.23–9.12) were more likely to be reported to CPS on bivariate and multivariate regression analyses (Table 2).
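Readers who wish to sanity-check the reported estimates can back-calculate an approximate z statistic and p-value from an adjusted odds ratio and its confidence interval. The Python sketch below assumes Wald-type intervals, which the text does not confirm, and works from rounded published values, so small discrepancies from the reported p-values are expected. The function name is hypothetical.

```python
import math
from typing import Tuple


def wald_check(aor: float, ci_low: float, ci_high: float) -> Tuple[float, float]:
    """Back-calculate z and a two-sided p-value from an aOR and its 95% CI.

    Assumes a Wald interval, i.e. CI = exp(beta +/- 1.96 * SE).
    """
    beta = math.log(aor)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal distribution
    return z, p


# Age, all injuries, evaluation for abuse (Table 2).
z_age, p_age = wald_check(0.89, 0.83, 0.97)
# Public vs. private insurance, all injuries, CPS reporting (Table 2).
z_ins, p_ins = wald_check(2.73, 1.23, 6.07)
print(f"age: z = {z_age:.2f}, p = {p_age:.3f}")
print(f"insurance: z = {z_ins:.2f}, p = {p_ins:.3f}")
```

Under this assumption both recovered p-values land close to the values in Table 2 (0.004 and 0.01, respectively), which suggests the reported intervals and p-values are internally consistent.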

DISCUSSION

In this study evaluating an NLP algorithm that identified abuse-associated injuries, there were three key findings. First, we identified common false positives to improve algorithmic specificity. Second, evaluation for abuse with a CPT consultation occurred for 63% of injuries described by nationally accepted clinical guidelines. Finally, there were significant demographic differences in evaluation and reporting.

Although the NLP algorithm described in this study was previously validated with excellent specificity,15 rule-based algorithms should be routinely assessed for quality assurance.21 Reducing false positives may mitigate alert fatigue, or provider desensitization to excessive alerts.15, 22 Iterative changes to the NLP algorithm based on the false positives identified in this study include differentiating scalp hematomas (commonly due to accident in older, mobile infants)19 from intracranial bleeding and removal of the term “intraventricular hemorrhage” as a trigger, which often referred to past medical history. The NLP algorithm may benefit from external validation prior to implementation in diverse ED settings.
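The two refinements described above can be illustrated with a toy rule-based trigger. The following Python sketch is a deliberately simplified assumption about how such rules might look: the regular expressions, cue list, and function name are hypothetical and are not the rules used in the study's algorithm. It shows a trigger that does not fire on scalp hematomas or on “intraventricular hemorrhage,” and that suppresses matches preceded by a history cue in the same sentence.

```python
import re

# Hypothetical trigger terms for a current intracranial injury.
# "intraventricular hemorrhage" is intentionally absent, and a bare
# "hematoma" (e.g., scalp hematoma) does not match without a qualifying term.
INTRACRANIAL = re.compile(
    r"\b(?:subdural|epidural|subarachnoid|intracranial)\s+"
    r"(?:hematoma|hemorrhage|bleed(?:ing)?)\b",
    re.IGNORECASE,
)
# Hypothetical cues suggesting past medical history rather than current injury.
HISTORY_CUES = re.compile(
    r"\b(?:history of|h/o|hx of|past medical history)\b", re.IGNORECASE
)


def flags_intracranial_injury(sentence: str) -> bool:
    """Return True if the sentence appears to describe a current intracranial injury."""
    match = INTRACRANIAL.search(sentence)
    if match is None:
        return False
    # Suppress the trigger when a history cue precedes the injury term.
    preceding = sentence[: match.start()]
    return HISTORY_CUES.search(preceding) is None
```

A production system would need far richer handling of negation, temporality, and section headers, but the sketch conveys why linguistic context was the most common source of false triggers.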

Prior assessment of interventions to improve child abuse recognition at three of this study’s EDs found that skeletal survey testing for AAP guideline injuries increased from 17% to 55%; similarly, testing for TEN-4 FACESp bruises increased from 4% to 25%.17 In this follow-up study, 63% of AAP guideline or TEN-4 FACESp injuries received abuse evaluations. NLP and real-time CDS may help overcome the challenge of sustaining improvements.9, 23 Importantly, NLP linked with CDS may be effective as it does not require extra clinician efforts to initiate.15 Next steps include the implementation of NLP-triggered CDS and evaluation of its impact on clinical outcomes.

Many authors have demonstrated socioeconomic and racial biases in the evaluation of abuse and CPS reporting for patients with abuse-associated injuries.6–8 We similarly demonstrated differences in management based on presenting ED, insurance status, and race/ethnicity. Public insurance status may be a proxy for poverty and increased evaluations may be partly explained by the association between poverty and child abuse.24 Although our algorithm does not factor in these variables, previous quality improvement work in three of this study’s community EDs had decreased differences in the management of suspected abuse between community and pediatric settings.17 Standardization with NLP-triggered CDS at all EDs may minimize the disparities identified in this study. Algorithms, however, are naturally subject to label choice bias,25 or the mismatch between ideal targets (patient injuries) and proxy variables (injury characterization by providers); by relying on subjective provider descriptions, NLP-linked CDS may be limited in its ability to mitigate all bias. Additionally, improved injury identification may not necessarily lead to changes in a provider’s decision whether to report. Nonetheless, continued examination of disparities after implementation of such systems will be critical.

A consensus conference of child abuse experts recommended developing, disseminating, and sustaining CDS systems for child abuse in all EDs and surveillance to assure cases are not misdiagnosed.23 Our process of prompt review of NLP-identified cases led to experts identifying two cases as particularly concerning for abuse and requiring additional evaluation by the CPT. Routine case surveillance, especially when documentation is completed after a patient’s discharge, must accompany the implementation of NLP-based CDS to assure that cases without further evaluation are not concerning for missed abuse. Hospital systems should consider formalizing monitoring systems to lessen the resource burden of manual review.22

Interventions that improve abuse recognition may inadvertently increase unnecessary reports to CPS by increasing consideration of abuse as a diagnosis, even when accidental injury is more likely.9 To mitigate this consequence, our institution recommends CPT consultation and multidisciplinary review to initiate nuanced decision-making when abuse is considered rather than automatic testing and reporting.16 Future interventions to improve abuse recognition must continue to assess the frequency and appropriateness of CPS reporting.

Limitations

Although we studied nine diverse EDs, our NLP has only been evaluated in one healthcare system, limiting generalizability. The small sample limited study power and ability to adjust statistical models for clinical variables; however, differences persisted in sub-group analysis of national guideline injuries. Lastly, while algorithmic identification and case reviews were within a week of encounters, data review was retrospective and may have missed nuances in decision-making.

CONCLUSION

Analysis of an NLP algorithm in “silent mode” allowed for refinement of the algorithm and highlighted areas in which real-time CDS may help ED providers identify and pursue appropriate evaluation of injuries associated with child physical abuse.

Supplementary Material

1

What’s New:

Analysis of an NLP algorithm in “silent mode” allowed for refinement of the algorithm and highlighted areas in which real-time CDS may help ED providers identify and pursue appropriate evaluation of injuries associated with child physical abuse.

Acknowledgments:

We thank Carol Kutryb and Virginia Sevin for their contributions in review of false positives identified by the algorithm in this study.

Funding:

This work was supported in part by the Prevent Child Abuse CT Pathways to Prevention Grant (GT, MS, AA) and the National Institute of Child Health & Human Development grant K23HD107178 (GT). The contents of this manuscript are solely the responsibility of the authors and do not necessarily represent the official view of the NIH or the Prevent Child Abuse CT organization.

ABBREVIATIONS AND ACRONYMS

AAP

American Academy of Pediatrics

CDS

clinical decision support

CPS

child protective services

CPT

child protection team

ED

emergency department

NLP

Natural Language Processing

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Conflict of Interest: The authors report no financial or ethical conflicts of interest. There are no prior publications or submissions with any overlapping information, including studies and patients.

REFERENCES

  • 1. U.S. Department of Health & Human Services, Administration for Children and Families, Administration on Children, Youth and Families, Children’s Bureau. Child Maltreatment 2020. 2022.
  • 2. Ravichandiran N, Schuh S, Bejuk M, et al. Delayed identification of pediatric abuse-related fractures. Pediatrics. 2010;125:60–66.
  • 3. Wood JN, Hall M, Schilling S, Keren R, Mitra N, Rubin DM. Disparities in the evaluation and diagnosis of abuse among infants with traumatic brain injury. Pediatrics. 2010;126:408–414.
  • 4. Hymel KP, Laskey AL, Crowell KR, et al. Racial and Ethnic Disparities and Bias in the Evaluation and Reporting of Abusive Head Trauma. J Pediatr. 2018;198:137–143.e131.
  • 5. Suresh S, Barata I, Feldstein D, et al. Clinical Decision Support for Child Abuse: Recommendations from a Consensus Conference. J Pediatr. 2022.
  • 6. Rosenthal B, Skrbin J, Fromkin J, et al. Integration of physical abuse clinical decision support at 2 general emergency departments. J Am Med Inform Assoc. 2019;26:1020–1029.
  • 7. Berger RP, Saladino RA, Fromkin J, Heineman E, Suresh S, McGinn T. Development of an electronic medical record-based child physical abuse alert system. J Am Med Inform Assoc. 2018;25:142–149.
  • 8. Suresh S, Heineman E, Meyer L, et al. Improved Detection of Child Maltreatment with Routine Screening in a Tertiary Care Pediatric Hospital. J Pediatr. 2022;243:181–187 e182.
  • 9. Tiyyagura G, Asnes AG, Leventhal JM. Improving Child Abuse Recognition and Management: Moving Forward with Clinical Decision Support. J Pediatr. 2023;252:11–13.
  • 10. Hooft AM, Asnes AG, Livingston N, et al. The Accuracy of ICD Codes: Identifying Physical Abuse in 4 Children’s Hospitals. Acad Pediatr. 2015;15:444–450.
  • 11. Rasooly IR, Khan AN, Aldana Sierra MC, et al. Validating Use of ICD-10 Diagnosis Codes in Identifying Physical Abuse Among Young Children. Acad Pediatr. 2023;23:396–401.
  • 12. Lee S, Mohr N, Street N, Nadkarni P. Machine Learning in Relation to Emergency Medicine Clinical and Operational Scenarios: An Overview. Western Journal of Emergency Medicine. 2019;20:219–227.
  • 13. Harper NS, Feldman KW, Sugar NF, Anderst JD, Lindberg DM. Additional Injuries in Young Infants with Concern for Abuse and Apparently Isolated Bruises. The Journal of Pediatrics. 2014;165:383–388.e381.
  • 14. Petska HW, Sheets LK. Sentinel injuries: subtle findings of physical abuse. Pediatr Clin North Am. 2014;61:923–935.
  • 15. Tiyyagura G, Asnes AG, Leventhal JM, et al. Development and Validation of a Natural Language Processing Tool to Identify Injuries in Infants Associated With Abuse. Acad Pediatr. 2022;22:981–988.
  • 16. Shum M, Asnes A, Leventhal JM, Bechtel K, Gaither JR, Tiyyagura G. The Use of Experts to Evaluate a Child Abuse Guideline in Community Emergency Departments. Acad Pediatr. 2021;21:521–528.
  • 17. Shum M, Asnes AG, Leventhal JM, et al. The impact of a child abuse guideline on differences between pediatric and community emergency departments in the evaluation of injuries. Child Abuse Negl. 2021;122:105374.
  • 18. Christian CW; Committee on Child Abuse and Neglect, American Academy of Pediatrics. The evaluation of suspected child physical abuse. Pediatrics. 2015;135:e1337–1354.
  • 19. Pierce MC, Kaczor K, Lorenz DJ, et al. Validation of a Clinical Decision Rule to Predict Abuse in Young Children Based on Bruising Characteristics. JAMA Netw Open. 2021;4:e215832.
  • 20. Flaherty EG, Perez-Rossello JM, Levine MA, et al. Evaluating children with fractures for child physical abuse. Pediatrics. 2014;133:e477–489.
  • 21. Demner-Fushman D, Chapman WW, McDonald CJ. What can natural language processing do for clinical decision support? Journal of Biomedical Informatics. 2009;42:760–772.
  • 22. Olakotan OO, Yusof MM. Evaluating the alert appropriateness of clinical decision support systems in supporting clinical workflow. Journal of Biomedical Informatics. 2020;106:103453.
  • 23. Suresh S, Barata I, Feldstein D, et al. Clinical Decision Support for Child Abuse: Recommendations from a Consensus Conference. J Pediatr. 2022.
  • 24. Sedlak AJ, Mettenburg J, Basena M, Petta I, McPherson K, Greene A, Li S. Fourth National Incidence Study of Child Abuse and Neglect (NIS–4): Report to Congress. Washington, DC: U.S. Department of Health and Human Services, Administration for Children and Families; 2010.
  • 25. Obermeyer Z, Nissan R, Stern M, Eaneff S, Bembeneck EJ, Mullainathan S. Algorithmic bias playbook. Center for Applied AI at Chicago Booth. 2021.
