Abstract
Background:
Rates of child maltreatment (CM) obtained from electronic health records are much lower than national prevalence rates indicate. There is a need to understand how CM is documented to improve reporting and surveillance.
Objectives:
To examine whether using natural language processing (NLP) in outpatient chart notes can identify cases of CM not documented by ICD diagnosis code, the overlap between the coding of child maltreatment by ICD versus NLP, and any differences by age, gender, or race/ethnicity.
Methods:
Outpatient chart notes of children age 0-18 years old within Kaiser Permanente Washington (KPWA) 2018-2020 were used to examine a selected set of maltreatment-related terms categorized into concept unique identifiers (CUI). Manual review of text snippets for each CUI was completed to flag for validated cases and retrain the NLP algorithm.
Results:
The NLP results indicated a crude rate of 1.55% to 2.36% (2018-2020) of notes with reference to CM. The rate of CM identified by ICD code was 3.32 per 1,000 children, whereas the rate identified by NLP was 37.38 per 1,000 children. The groups that increased the most in identification of maltreatment from ICD to NLP were adolescents (13-18 yrs old), females, Native American, and those on Medicaid. Of note, all subgroups had substantially higher rates of maltreatment when using NLP.
Conclusions:
Use of NLP substantially increased the estimated number of children who have been impacted by CM. Accurately capturing this population will improve identification of vulnerable youth at high risk for mental health symptoms.
Keywords: child abuse, child maltreatment, electronic health records, natural language processing
Child maltreatment (physical abuse, sexual abuse, emotional abuse, and neglect) is a significant and costly social issue with over 656,000 cases documented by child welfare agencies annually, a rate of 9 per 1,000 children (U. S. Department of Health and Human Services, 2021). Cumulative prevalence rates indicate that 12.5% of children in the US experience maltreatment by their 18th birthday, with 5.8% having confirmed maltreatment by age 5 (Wildeman et al., 2014). The effects of maltreatment are wide-ranging (Brown et al., 2010; Monteiro & Azevedo, 2010) and include increased rates of morbidity and mortality from chronic diseases (Cohen, Doyle, Turner, Alper, & Skoner, 2004; Danese & Tan, 2014; Fang, Brown, Florence, & Mercy, 2012; Felitti et al., 1998; Jonson-Reid, Kohl, & Drake, 2012; Shonkoff, Garner, Committee on Psychosocial Aspects of Child and Family Health, & Committee on Early Childhood, 2012). This increased risk for disease contributes to the lifetime economic burden of child maltreatment, which is estimated to be $585 billion (Fang et al., 2012).
Healthcare systems play an important role in identifying and treating children who experience maltreatment. Medical personnel are ranked as one of the top sources of referrals to child welfare, accounting for 11% of child welfare referrals in 2020 alongside educators (21%), law enforcement (19.1%), and social service personnel (10.3%) (U. S. Department of Health and Human Services, 2021). However, child maltreatment rates as indicated by diagnosis codes in electronic health records (EHR) are far lower than the national estimates, indicating probable under-identification or under-reporting. A systematic review indicated a prevalence rate of less than 1% in pediatric hospitalizations or emergency department visits (Karatekin, Almy, Mason, Borowsky, & Barnes, 2018). Among all visit types, the estimated prevalence was even lower, with only 0.02% of youth 0-21 years old receiving a maltreatment diagnosis in their health record (Karatekin et al., 2018). Data from linkage of electronic health records with child welfare data found 12% more children were identified as maltreated when using EHR versus only child welfare records, indicating there may be information in electronic health records that never reaches child welfare (Schnitzer, Slusher, & Van Tuinen, 2004) and that prevalence rates are likely higher than previously found.
There are a number of reasons child maltreatment may not be documented with a diagnosis code in the EHR. Qualitative interviews with healthcare providers, advocates from the community, and representatives from state agencies and the insurance industry illuminated concerns about using ICD10-CM child abuse diagnosis codes, including diagnostic uncertainty, lack of resources for primary care providers to deal with probable maltreatment, and a fear that coding “abuse” in the medical record will be more harmful than helpful to a child’s wellbeing (Rovi & Johnson, 2003). Grey areas such as suspicious injuries or historical abuse may be documented in chart notes rather than as a diagnosis code given that a diagnosis of maltreatment triggers official reporting requirements (Rovi & Johnson, 2003).
Using information from free-text notes may increase the ability to detect child maltreatment experiences and could provide opportunities to intervene with vulnerable youth. A number of studies have used artificial intelligence including deep learning and natural language processing (NLP) to predict child physical abuse cases in EHR or to predict recurrence (Annapragada, Donaruma-Kwoh, Annapragada, & Starosolski, 2021; Horikawa et al., 2016; Shahi et al., 2021). While these predictive models have focused on the prevention of child maltreatment using validation cases that were assessed by child welfare or medical child abuse protection teams, there may also be utility in the identification of maltreatment that was documented in free text but not as a diagnosis. Use of free-text notes may capture suspicious injuries, historical (past) abuse, or sub-reportable events that may still have an impact on deleterious outcomes.
Should NLP identify maltreatment cases that are not documented through ICD codes, this could indicate the need for healthcare systems to develop methods and tools to standardize the identification and documentation of child maltreatment. Education and assessment tools built into the EHR may help medical providers identify children at risk, facilitate referrals, and prevent future abuse. Additionally, understanding the discordance in maltreatment documented by diagnosis code versus chart notes may help identify demographic characteristics that are associated with under-documentation of maltreatment, further highlighting potential reporting bias. For example, evidence suggests that Black youth are more likely to be referred to child welfare (Drake et al., 2011), and there is physician bias in documenting suspicious injuries as abuse in Black youth (Flaherty et al., 2008; Jenny, Hymel, Ritzen, Reinert, & Hay, 1999). Determining if certain demographic characteristics are associated with higher identification of child maltreatment in chart notes versus diagnosis codes will help guide provider education regarding the importance of accurate documentation to ensure those children receive the necessary treatment and services.
The Current Study
To address the potential under-reporting of child maltreatment via diagnosis codes in the EHR, the current study sought to determine 1) whether the use of NLP in EHR yields child maltreatment rates more comparable to national epidemiologic data; 2) if NLP of chart notes identifies new cases of child maltreatment that are not documented with ICD codes; 3) the degree of overlap between the two methods (ICD code and NLP); and 4) if there are differences in rates of child maltreatment for each method by age, gender, or race/ethnicity. Outpatient notes were used because the healthcare system from which we obtained the data contracts for all inpatient and emergency care, and therefore we did not have access to the notes for these encounters.
Methods
Participants
Data were obtained from the electronic health records (EHR) of members of Kaiser Permanente Washington (KPWA) - a large integrated healthcare delivery system in the Northwest United States. Children age 0 to 18 years old who had at least one outpatient visit within the KPWA integrated delivery system between 2016 and 2020 were included. The retrieval of electronic health records for this study was approved by the KPWA Institutional Review Board.
Measures
Child maltreatment ICD codes.
Child maltreatment (CM) was defined by using the specific child maltreatment codes from the International Classification of Disease 10th revision (ICD-10). These codes are entered into the child’s electronic health record by the medical provider during a healthcare visit. ICD-10 codes included “child abuse neglect and other maltreatment confirmed” (T74.02 to T74.92 [XA, XD, XS]) as well as “child abuse neglect and other maltreatment suspected” (T76.02 to T76.92 [XA, XD, XS]).
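As a rough illustration of how the code ranges above might be matched against structured diagnosis fields, the sketch below classifies an ICD-10-CM code string as confirmed (T74) or suspected (T76) maltreatment. The function name `classify_cm_code`, the regular expression, and the numeric reading of the range ".02 to .92" are all assumptions made for illustration; they are not the study's extraction logic. In particular, a literal numeric range can include subcodes that do not refer to children, so a production system would more likely rely on an explicit, reviewed code list.

```python
import re

# Hypothetical pattern: category T74 (confirmed) or T76 (suspected), a one- or
# two-digit subcode, and an optional extension XA, XD, or XS as listed in the text.
_CM_CODE = re.compile(r"^(T74|T76)\.(\d{1,2})(X[ADS])?$")


def classify_cm_code(code):
    """Return 'confirmed', 'suspected', or None for an ICD-10-CM code string."""
    match = _CM_CODE.match(code.strip().upper())
    if match is None:
        return None
    category, subcode, _extension = match.groups()
    # Read the subcode as a decimal fraction so that ".4" compares as ".40".
    value = int(subcode.ljust(2, "0"))
    # Assumption: the stated range .02 to .92 is interpreted numerically.
    if not 2 <= value <= 92:
        return None
    return "confirmed" if category == "T74" else "suspected"


if __name__ == "__main__":
    for example in ["T74.12XA", "T76.92XD", "T74.01", "S06.0X0A"]:
        print(example, classify_cm_code(example))
```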
Demographic characteristics.
Child age at cohort entry, gender (male, female), race/ethnicity (Asian, Black, Hispanic, Native American, White, other, unknown), and insurance (commercial, Medicaid, other) were obtained from the EHR.
Data
The primary goal of the NLP investigation was to identify children with evidence of CM noted in free text that was not recorded/documented in structured diagnosis fields.
Assembling Clinical Notes for Processing
Clarity® is the relational database for data extracted from the Epic® EHR. It contains structured EHR data and unstructured clinical “notes”. A “note” is the free text section of documentation for a clinical encounter recorded in an electronic health record. Clinicians may enter information about socio-demographic context, impressions of the patient, patient history, or supporting information for a diagnosis (e.g., symptoms/complaints). Notes vary in length between a few characters and several hundred words and may contain information copied and pasted from elsewhere in a patient’s EHR. In addition to the presence of characteristics/features, clinicians may also document the absence of these characteristics/features (e.g., “patient denies problems with sleep”). The notes used in this study are the routinely collected notes in the KPWA health system and are broadly representative of documentation found across Kaiser Permanente systems and other health care organizations. The corpus of notes included documentation from all outpatient visits to primary care, specialty care, urgent care and secure messages within the patient portal. The corpus did not include notes from emergency department visits because KPWA contracts for all inpatient and emergency care and therefore we did not have access to the notes for these encounters.
Identifying CM-Related Concepts and Annotation
The first step in building an NLP system was to identify relevant terms and phrases which might indicate CM. We used clinical experience and manually reviewed notes to identify an initial set of terms and phrases. These were expanded through further manual review of notes sampled from the population corpus. Three abstractors (SN, FL, RP) reviewed notes and highlighted sections of text which might indicate CM. These results were reviewed, and the most significant terms and phrases (n = 636) were grouped semantically into 11 unique concepts (CUIs): Abusive, Hitting, Suspicious Abuse, Signs, Perpetrator, Neglect, History of ___, Fear, CPS, Code, and Maltreatment. Linguistically equivalent word form variations were added (e.g., “call” −> “called”, “calling”, “calls”).
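The grouping of terms into concepts and the addition of word-form variants can be pictured with a small sketch. Everything here is hypothetical: the two-entry lexicon, the assignment of "call" to the CPS concept, and the helper names `expand_verb_forms` and `build_term_index` are illustrative only, and the inflection rule is deliberately naive (it does not handle irregular verbs or consonant doubling such as "hit" to "hitting"). The study's actual list of 636 terms is not reproduced in the text.

```python
def expand_verb_forms(base):
    """Naively generate regular inflections of an English verb."""
    if base.endswith("e"):
        # e.g. "abuse" -> "abuses", "abused", "abusing"
        return [base, base + "s", base + "d", base[:-1] + "ing"]
    # e.g. "call" -> "calls", "called", "calling"
    return [base, base + "s", base + "ed", base + "ing"]


# Hypothetical lexicon: concept name -> base terms (not the study's term list).
LEXICON = {
    "CPS": ["call", "report"],
    "Neglect": ["neglect"],
}


def build_term_index(lexicon):
    """Map every surface form back to the concept it belongs to."""
    index = {}
    for concept, bases in lexicon.items():
        for base in bases:
            for form in expand_verb_forms(base):
                index[form] = concept
    return index


if __name__ == "__main__":
    print(expand_verb_forms("call"))
    print(build_term_index(LEXICON)["neglected"])
```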
The three abstractors manually adjudicated 20 notes from each CUI (n=220) to identify each instance as a true positive or false positive. Each note was reviewed by the three abstractors who discussed whether the note indicated child maltreatment had occurred (true positive) or had not (false positive). The final coding was determined by consensus after discussion. This information was used to further train the NLP system.
Extracting Relevant Information from Clinical Notes
We used a locally developed Python package to extract terms and phrases from notes corresponding to each concept. The package takes each note as input and relies on various regular expressions to identify potential concepts. Boilerplate, negation, and other non-relevant language were eliminated in a post-processing step. This included terms such as: denies abuse, no abuse, [sibling] was abused.
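The general shape of such a pipeline (regular-expression matching per concept, followed by a post-processing step that discards negated or non-relevant mentions) might look like the sketch below. The locally developed package is not described in detail, so this is not that package: the patterns, the sentence splitter, and the names `CONCEPT_PATTERNS`, `EXCLUSIONS`, and `extract_concepts` are assumptions. The exclusion rules are modeled only on the three examples given above, and the sketch would still flag unrelated phrases such as "substance abuse".

```python
import re

# Hypothetical concept patterns; the study's actual terms are not listed here.
CONCEPT_PATTERNS = {
    "Neglect": re.compile(r"\bneglect(?:ed|ing|s)?\b", re.IGNORECASE),
    "Abusive": re.compile(r"\babus(?:e|ed|es|ing|ive)\b", re.IGNORECASE),
}

# Exclusion rules modeled on the examples in the text:
# "denies abuse", "no abuse", "[sibling] was abused".
EXCLUSIONS = [
    re.compile(r"\b(?:denies|no)\s+(?:\w+\s+)?abuse\b", re.IGNORECASE),
    re.compile(r"\b(?:sibling|brother|sister)\s+was\s+abused\b", re.IGNORECASE),
]


def split_sentences(note):
    """Very rough sentence splitter on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def extract_concepts(note):
    """Return (concept, sentence) pairs that survive the exclusion step."""
    hits = []
    for sentence in split_sentences(note):
        # Post-processing: drop sentences that match any exclusion rule.
        if any(rule.search(sentence) for rule in EXCLUSIONS):
            continue
        for concept, pattern in CONCEPT_PATTERNS.items():
            if pattern.search(sentence):
                hits.append((concept, sentence))
    return hits


if __name__ == "__main__":
    note = "Patient denies abuse. Mother reports child was neglected by prior caregiver."
    print(extract_concepts(note))
```

Filtering at the sentence level is a design choice of this sketch; it keeps a negated phrase in one sentence from suppressing a positive mention elsewhere in the same note.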
Data Analysis
Each individual in the sample cohort was identified as positive for CM using NLP and structured diagnosis codes (ICD codes) independently. We then tabulated the proportion of youth who were positive for CM using NLP alone, ICD diagnosis codes alone, and both NLP and ICD diagnosis codes.
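The tabulation described here amounts to a three-way split of two sets of child identifiers. A minimal sketch, assuming each method yields a collection of unique child IDs, is shown below; the function name `tabulate_methods` and the toy IDs are hypothetical, and the per-1,000 scaling simply mirrors the units used in the Results.

```python
def tabulate_methods(nlp_ids, icd_ids, cohort_size):
    """Count children flagged by NLP only, ICD only, or both, with rates per 1,000."""
    nlp_ids, icd_ids = set(nlp_ids), set(icd_ids)
    counts = {
        "nlp_only": len(nlp_ids - icd_ids),
        "icd_only": len(icd_ids - nlp_ids),
        "both": len(nlp_ids & icd_ids),
    }
    return {
        group: {"n": n, "per_1000": round(1000 * n / cohort_size, 2)}
        for group, n in counts.items()
    }


if __name__ == "__main__":
    # Toy data for illustration only; not study data.
    print(tabulate_methods(nlp_ids=[1, 2, 3, 4], icd_ids=[3, 4, 5], cohort_size=1000))
```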
Results
Encounter level.
There were five cohorts, one each year from 2016 to 2020. The total number of encounters included each year ranged from 86,824 to 98,299. The total number of notes identified with a reference to child maltreatment using NLP ranged from 1,526 to 2,353, for a rate of chart notes with a reference to child maltreatment ranging from 1.55% to 2.37%. The highest frequency of child maltreatment references in notes was for 1-4 year olds (44.52%), females (60.27%), and White children (62.54%). The higher proportion for White children is due to the majority of the membership population being White.
Individual level.
There were 199,085 children included in the cohort. The rate of child maltreatment identified by ICD code was 3.32 per 1,000 children, whereas the rate identified by NLP was 37.38 per 1,000 children, a more than 10-fold increase. All the children with an ICD code of maltreatment also had an NLP flag, indicating that an ICD code was always accompanied by documentation in the primary care progress notes.
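The fold increase can be checked directly from the two reported rates: 37.38 divided by 3.32 comes to roughly 11. The short sketch below does this arithmetic and also back-calculates the approximate number of children each rate implies for a cohort of 199,085. The helper names are hypothetical, and the implied counts are approximations derived from rounded rates, not figures reported by the study.

```python
def rate_ratio(rate_a, rate_b):
    """Ratio of two rates expressed in the same units (here, per 1,000 children)."""
    return rate_a / rate_b


def implied_count(cohort_size, rate_per_1000):
    """Approximate number of children implied by a rate per 1,000."""
    return round(cohort_size * rate_per_1000 / 1000)


# Values taken from the text above.
COHORT_SIZE = 199_085
ICD_RATE = 3.32
NLP_RATE = 37.38

if __name__ == "__main__":
    print("NLP/ICD rate ratio:", round(rate_ratio(NLP_RATE, ICD_RATE), 2))
    print("Implied children flagged by ICD:", implied_count(COHORT_SIZE, ICD_RATE))
    print("Implied children flagged by NLP:", implied_count(COHORT_SIZE, NLP_RATE))
```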
Demographic characteristics.
As shown in Table 1, children ages 5-12, females, Native American, and those with Medicaid had the highest rate of maltreatment by ICD code. For NLP indication of maltreatment, children ages 13-18, females, Native American, and those with Medicaid had the highest rates. The groups that increased the most in identification of maltreatment from ICD10 codes to NLP were adolescents (13-18 yrs old), females, Native Americans, and those on Medicaid. Of note, all groups increased the identification of maltreatment substantially when using NLP.
Table 1.
Number of unique children identified by method (ICD code vs NLP)
Characteristic | All children (N) | ICD code (per 1,000) | NLP (per 1,000) | Has any indication of CM, ICD and/or NLP (per 1,000) | Increase between ICD- vs NLP-identified CM (per 1,000) |
---|---|---|---|---|---|
Age Group | |||||
<1 | 21241 | 1.98 | 27.16 | 29.14 | 25.19 |
1-4 | 33524 | 3.01 | 27.68 | 30.69 | 24.67 |
5-12 | 70292 | 4.03 | 37.33 | 41.36 | 33.30 |
13-18 | 74028 | 3.16 | 44.75 | 47.91 | 41.59 |
Sex | |||||
Female | 98279 | 4.49 | 45.64 | 50.12 | 41.15 |
Male | 100792 | 2.16 | 29.33 | 31.49 | 27.16 |
Race | |||||
Asian | 20101 | 2.24 | 29.45 | 31.69 | 27.21 |
Black | 13481 | 6.45 | 66.17 | 72.62 | 59.71 |
Hispanic | 3603 | 4.72 | 49.13 | 53.84 | 44.41 |
Native American/Alaska Native | 2791 | 9.67 | 91.37 | 101.04 | 81.69 |
Other | 6852 | 6.13 | 46.99 | 53.12 | 40.86 |
Unknown | 69900 | 0.80 | 7.87 | 8.67 | 7.07 |
White | 82357 | 4.69 | 56.51 | 61.20 | 51.82 |
Hispanic Ethnicity | 13140 | 6.39 | 58.22 | 64.61 | 51.83 |
Insurance | |||||
Commercial | 115587 | 2.81 | 35.23 | 38.04 | 32.42 |
Medicaid | 22933 | 8.07 | 80.32 | 88.39 | 72.25 |
Other | 60565 | 2.48 | 25.23 | 27.71 | 22.75 |
Total N | 199085 | 3.32 | 37.38 | 40.70 | 34.07 |
Note: ICD = International Classification of Disease diagnosis code; NLP = natural language processing; CM = child maltreatment. All children with an ICD code for CM also had a chart note indicating CM and are included in the NLP group. Hispanic could be reported as race, ethnicity, or both; n=375 reported both Hispanic race and ethnicity.
Discussion
The current study tested the use of natural language processing to increase the identification of child maltreatment in electronic health records within one large integrated healthcare system. Overall, we found that NLP increased the rate of children with maltreatment by over 10 times the rate from ICD10 codes only. This provides a proof-of-concept for the need to further investigate the use of NLP to identify children who may benefit from further screening or treatment related to their experiences.
Rates of child maltreatment from national child welfare data indicate an annual prevalence rate of 9 per 1,000 children (U. S. Department of Health and Human Services, 2021), while cumulative estimates find 12.5% of children experience maltreatment before the age of 18 (Wildeman et al., 2014). Our prevalence rates as indicated by ICD codes were far lower, with only 3 children per 1,000 being identified. Our data included only outpatient encounters, while others have examined emergency and inpatient as well as primary care (Karatekin et al., 2018). Despite this difference, our prevalence rates are very similar to those in these other studies of EHR data (Karatekin et al., 2018). As noted previously, lower rates among EHR data may stem from provider hesitation to document potential abuse as an ICD code, as it initiates a chain of mandated reporting requirements and life-changing events for the child and family (Gilbert et al., 2009; Van Haeringen, Dadds, & Armstrong, 1998). A study of physicians treating injured children found that although providers had suspicion that about 10% of injury visits were related to child abuse, they only reported 6% to child welfare (Flaherty et al., 2008). These data, in agreement with ours, show that ascertaining prevalence rates only by diagnosis codes is clearly hampered by inconsistent documentation.
Importantly, our proof-of-concept NLP analyses identified 37 per 1,000 children with child maltreatment, more than 10 times as many children as were identified by ICD code alone. While the use of NLP resulted in higher rates than national child welfare reports (9 out of 1,000 children) (U. S. Department of Health and Human Services, 2021), it does not come close to the rates found across self-reported surveys. A meta-analysis of self-reported maltreatment from survey and interview data in North America finds median rates ranging from 18-30% for different maltreatment types (Moody, Cannings-John, Hood, Kemp, & Robling, 2018). This difference in reporting rates may be because self-reported maltreatment likely includes experiences whose severity does not rise to the level of requiring a mandated report or that may not necessitate a healthcare visit. However, this does not imply that those experiences have no impact on mental health, as some evidence supports the importance of self-reported rather than official reports for mental health (Danese & Widom, 2020). Historical abuse is most likely reported by the child or parent and thus may occur at higher rates than current abuse. Future work should investigate rates of current versus historical abuse in progress notes. Ensuring that medical providers are aware of a patient’s child maltreatment history will enable better assessment as they determine care needs for that child.
There are limitations that should be considered when interpreting these findings. First, this was a small proof-of-concept pilot study; larger samples and more complex NLP analyses will be necessary to draw stronger inferences about the rates of child maltreatment in progress notes. We did not perform chart review to validate the codes, which will be necessary as a next step. Second, the data were from one region of a large integrated healthcare system, which may not be representative of other geographic areas. We were unable to look at maltreatment types, and it is possible that certain types of maltreatment that are more difficult to ascertain in a medical exam, such as emotional abuse, are less likely to be documented with an ICD code. Finally, it was unclear whether the lack of a diagnosis code was because the type of injury or abuse did not meet the criteria for a mandated report, such as historical abuse. It is also possible that incident abuse would be more likely to be coded as a diagnosis, as opposed to historical abuse, which would be in a progress note.
Conclusion
Accurate identification of children with maltreatment has the potential to enable better treatment of associated mental health symptoms as well as prevention of recurrence. Our finding of a more than 10-fold increase when using NLP over diagnosis codes points to the need to include information from progress notes in order to capture maltreatment experiences and ensure those children have received the necessary care. In addition, youth who were 13-18 years old, of Native American or Black race/ethnicity, or covered by Medicaid insurance had the largest increases in identification when using NLP, implying potential bias when using only ICD codes to ascertain maltreatment. Unfortunately, we do not know if those identified as maltreated only through their progress notes received a more comprehensive assessment or treatment related to their experiences. This is an important next step. Improvement in the way medical providers document maltreatment could help to ensure that children experiencing maltreatment receive the best coordination of care possible.
Funding/support:
This research was supported by NIMH Cooperative Agreement U19 MH121738.
Abbreviations:
- CM: child maltreatment
- KPWA: Kaiser Permanente Washington
- ICD: International Classification of Disease
- EHR: electronic health records
- NLP: natural language processing
Footnotes
Conflict of Interest Disclosures: The authors have no conflicts of interest relevant to this article to disclose.
References
- Annapragada AV, Donaruma-Kwoh MM, Annapragada AV, & Starosolski ZA (2021). A natural language processing and deep learning approach to identify child abuse from pediatric electronic medical records. PLoS One, 16(2), e0247404.
- Brown DW, Anda RF, Felitti VJ, Edwards VJ, Malarcher AM, Croft JB, & Giles WH (2010). Adverse childhood experiences are associated with the risk of lung cancer: a prospective cohort study. BMC Public Health, 20(10), 1–12.
- Cohen S, Doyle WJ, Turner RB, Alper CM, & Skoner DP (2004). Childhood socioeconomic status and host resistance to infectious illness in adulthood. Psychosomatic Medicine, 66(4), 553–558. doi: 10.1097/01.psy.0000126200.05189.d3
- Danese A, & Tan M (2014). Childhood maltreatment and obesity: systematic review and meta-analysis. Molecular Psychiatry, 19(5), 544–554. doi: 10.1038/mp.2013.54
- Danese A, & Widom CS (2020). Objective and subjective experiences of child maltreatment and their relationships with psychopathology. Nature Human Behaviour, 4(8), 811–818.
- Drake B, Jolley JM, Lanier P, Fluke J, Barth RP, & Jonson-Reid M (2011). Racial bias in child protection? A comparison of competing explanations using national data. Pediatrics, 127(3), 471–478.
- Fang X, Brown DS, Florence CS, & Mercy JA (2012). The economic burden of child maltreatment in the United States and implications for prevention. Child Abuse & Neglect, 36(2), 156–165. doi: 10.1016/j.chiabu.2011.10.006
- Felitti VJ, Anda RF, Nordenberg D, Williamson DF, Spitz AM, Edwards V, … Marks JS (1998). Relationship of childhood abuse and household dysfunction to many of the leading causes of death in adults. The Adverse Childhood Experiences (ACE) Study. American Journal of Preventive Medicine, 14(4), 245–258.
- Flaherty EG, Sege RD, Griffith JW, Price LL, Wasserman R, Slora E, … Angelilli ML (2008). From suspicion of physical child abuse to reporting: primary care clinician decision-making. Pediatrics, 122(3), 611–619.
- Gilbert R, Kemp A, Thoburn J, Sidebotham P, Radford L, Glaser D, & MacMillan HL (2009). Recognising and responding to child maltreatment. The Lancet, 373(9658), 167–180.
- Horikawa H, Suguimoto SP, Musumari PM, Techasrivichien T, Ono-Kihara M, & Kihara M (2016). Development of a prediction model for child maltreatment recurrence in Japan: A historical cohort study using data from a Child Guidance Center. Child Abuse & Neglect, 59, 55–65.
- Jenny C, Hymel KP, Ritzen A, Reinert SE, & Hay TC (1999). Analysis of missed cases of abusive head trauma. JAMA, 281(7), 621–626.
- Jonson-Reid M, Kohl PL, & Drake B (2012). Child and adult outcomes of chronic child maltreatment. Pediatrics, 129(5), 839–845.
- Karatekin C, Almy B, Mason SM, Borowsky I, & Barnes A (2018). Documentation of child maltreatment in electronic health records. Clinical Pediatrics, 57(9), 1041–1052. doi: 10.1177/0009922817743571
- Monteiro R, & Azevedo I (2010). Chronic inflammation in obesity and the metabolic syndrome. Mediators of Inflammation, 2010(Article ID: 289645). doi: 10.1155/2010/289645
- Moody G, Cannings-John R, Hood K, Kemp A, & Robling M (2018). Establishing the international prevalence of self-reported child maltreatment: a systematic review by maltreatment type and gender. BMC Public Health, 18(1), 1–15.
- Rovi S, & Johnson MS (2003). More harm than good? Diagnostic codes for child and adult abuse. Violence and Victims, 18(5), 491–502.
- Schnitzer PG, Slusher P, & Van Tuinen M (2004). Child maltreatment in Missouri: combining data for public health surveillance. American Journal of Preventive Medicine, 27(5), 379–384.
- Shahi N, Shahi AK, Phillips R, Shirek G, Lindberg DM, & Moulton SL (2021). Using deep learning and natural language processing models to detect child physical abuse. Journal of Pediatric Surgery, 56(12), 2326–2332.
- Shonkoff JP, Garner AS, Committee on Psychosocial Aspects of Child and Family Health, & Committee on Early Childhood. (2012). The lifelong effects of early childhood adversity and toxic stress. Pediatrics, 129(1), e232–e246. doi: 10.1542/peds.2011-2663
- U. S. Department of Health and Human Services. (2021). Child Maltreatment 2019. Retrieved from https://www.acf.hhs.gov/cb/research-data-technology/statistics-research/child-maltreatment
- Van Haeringen AR, Dadds M, & Armstrong KL (1998). The child abuse lottery—will the doctor suspect and report? Physician attitudes towards and reporting of suspected child abuse and neglect. Child Abuse & Neglect, 22(3), 159–169.
- Wildeman C, Emanuel N, Leventhal JM, Putnam-Hornstein E, Waldfogel J, & Lee H (2014). The prevalence of confirmed maltreatment among US children, 2004 to 2011. JAMA Pediatrics, 168(8), 706–713.