Author manuscript; available in PMC: 2025 Nov 1.
Published in final edited form as: Child Maltreat. 2023 Aug 6;29(4):601–611. doi: 10.1177/10775595231194599

Identification of Child Survivors of Sex Trafficking from Electronic Health Records: An Artificial Intelligence Guided Approach

Aaron W Murnan a, Jennifer J Tscholl b,c, Rajesh Ganta d, Henry O Duah a, Islam Qasem a, Emre Sezgin d,e
PMCID: PMC11000265  NIHMSID: NIHMS1976997  PMID: 37545138

Abstract

Child victims of sex trafficking experience high rates of maltreatment, abuse, and victimization. Victims regularly seek pediatric care, yet fail to be identified, exacerbating existing disparities in their health. This study sought to identify and describe sex trafficking victims’ healthcare presentation within a pediatric healthcare setting.

A sex trafficking-related keyword search (sexual exploitation, trafficking, prostitution, etc.) was conducted to identify victims within the electronic medical records (EMRs) of roughly 1.5 million patients at a large midwestern pediatric hospital. Health and healthcare utilization data were assessed using diagnostic codes.

0.18% (n=2,654) of all pediatric EMRs included a trafficking-related keyword. Among children identified via keyword search, the most common diagnostic codes present included: Confirmed Sexual/Physical Assault; Trauma and Stress-Related Disorders; Depressive Disorders; Anxiety Disorders; and Suicidal Ideation.

Findings reflect adverse physical and psychological outcomes among trafficking victims and illuminate the promise of natural language processing methods to improve screening and research efforts with this population.

Keywords: Child Health, Sex Trafficking, Natural Language Processing, Medical Records

Introduction

Child victims of sex trafficking (CVST) are a significant public health concern in the United States (US) and the world at large due to their commercial sexual exploitation, exceptional risk for adverse experiences and poor outcomes, as well as a dearth of effective identification and intervention strategies (Greenbaum et al., 2015). Risk factors for sex trafficking among children are multifaceted and include an interplay of individual, household, family system, and environmental level factors (Bang et al., 2013; Greenbaum et al., 2015; Hershberger et al., 2018; Naramore et al., 2017). In addition to increased risk for sexually transmitted infections (STI), CVST also experience various physical, social, and psychological trauma. Our understanding of this population and the development of effective intervention strategies remains incredibly limited, due to the covert and criminal nature of sex trafficking, as well as low self-identification rates among victims (Bang et al., 2013; Greenbaum et al., 2015; National Research Council, 2014). As a result, high-quality prevalence estimates and robust research are limited. There is a critical need to prioritize effective strategies to identify CVST, in order to better connect them with essential support and establish evidence-based, best practices to target their salient needs.

Prior research indicates there are numerous factors that heighten a child’s risk for being trafficked. These include runaway behavior, foster care involvement, prior sexual and physical abuse, illicit substance use, mental health issues, living in poverty, and parental or guardian involvement in sex trading or sexual trafficking, amongst other adverse childhood experiences (Bang et al., 2013; Greenbaum et al., 2015; Hershberger et al., 2018; Naramore et al., 2017). Children who are trafficked experience an array of physical, social, sexual, and psychological trauma resulting in adverse health outcomes, including increased risk for STI, malnutrition, depression, anxiety, and post-traumatic stress disorder (PTSD), amongst other mental health sequelae (Bang et al., 2013; Barnert et al., 2017; Edinburgh et al., 2015; Goldberg et al., 2017). The degree to which this constellation of risk factors consistently differentiates CVST from other high-risk children remains poorly understood.

Prior studies demonstrate that CVST often seek medical care within the healthcare system for other health-related needs; however, these victims are rarely identified as such by healthcare providers during those encounters (Goldberg et al., 2017). Failure to identify these children as victims of trafficking within health care settings is influenced by many factors, including low self-identification rates among victims, the absence of standardized identification tools for CVST, insufficient knowledge among health care providers in this subject area, and increasing patient loads and time constraints (Greenbaum et al., 2015; Jaeckl & Laughon, 2020; National Research Council, 2014; Rafferty, 2016). Additionally, different providers may attend to a child who is being trafficked during separate medical encounters, and such disconnected services make it difficult to establish a comprehensive and accurate longitudinal assessment of the child, which would be advantageous in accurately identifying trafficking risk or experience (National Research Council, 2014). These represent critical missed opportunities by the healthcare system to identify and connect these children with services that can potentially mitigate their risk of developing many of the associated adverse outcomes. Although attempts have been made to increase knowledge and awareness for rapid identification by health care providers (Kenny et al., 2019; National Research Council, 2014), there remains the persistent challenge of low self-identification among victims, due to fear of reprisal from traffickers, emotional manipulation, trauma bonding with traffickers, shame, distrust of the health system, and simply not perceiving themselves as being trafficked or mistreated (Casassa et al., 2022; Greenbaum et al., 2015; Rafferty, 2016). Consequently, there is a limited amount of literature that explores health care utilization and clinical presentation among CVST. As such, little is known about experiences and presenting concerns that could consistently differentiate CVST within healthcare settings. Knowledge of this population’s presentation and diagnoses can guide capacity-building strategies for healthcare systems to understand and address the needs of CVST, as well as inform novel methodologies to identify them.

Paradoxically, vast amounts of data on CVST and all patients are obtained during clinical encounters and are embedded within the narrative documentation of electronic medical records (EMR). Providers would need to review all prior documentation in the EMR to develop a comprehensive understanding of each child and illuminate potential trafficking experiences or risk factors. This degree of chart review is not feasible in most busy clinical settings, especially in the context of subspecialty care that addresses targeted chief complaints, such as an Emergency Department. However, artificial intelligence and machine learning algorithms could be leveraged to help identify patient cohorts across all records within an EMR (Haleem et al., 2019). Specifically, Natural Language Processing (NLP), a subfield of artificial intelligence and machine learning, can be used to interpret unstructured medical text. Recent literature has demonstrated the effectiveness of NLP methods on unstructured EMR text to explore health phenomena such as parental criminal justice involvement/incarceration and health care utilization, suicide and accidental deaths, stroke, and oncology outcomes, amongst others (Boch et al., 2021; Carrell et al., 2014; Gonzalez-Hernandez et al., 2017; Kehl et al., 2019; Koleck et al., 2021; Ling et al., 2019; McCoy et al., 2016; Moosavinasab et al., 2021; Ong et al., 2020; Viani et al., 2021). Boch et al. (2021) successfully employed an NLP method on a large body of EMR text to explore the health care utilization and clinical presentations of pediatric patients with probable personal or family history of involvement within the correctional system. NLP methods have the potential to similarly identify CVST by leveraging unstructured EMR data in ways that would not be feasible for individual providers. As such, the present study uses NLP as a novel tool for identifying and studying CVST, and reports on the identified cohort by extracting information regarding presenting healthcare concerns and utilization from existing patient data in the EMR, following a novel NLP technique (Moosavinasab et al., 2021).

If NLP algorithms can be honed appropriately, we believe these methodologies can be used to illuminate characteristics of healthcare presentation and past medical history, which are embedded in EMR text, to identify patients who are CVST or at high risk for being trafficked. The goal is for this technology to essentially serve as a screening tool by utilizing information from a patient’s own medical record, rather than relying on accurate responses to a self-administered tool. This could ultimately enhance the capacity of healthcare systems to appropriately identify and act upon the risk factors elucidated, via subsequent use of best practice alerts to the medical providers involved in a patient’s care. If this approach works, it will represent an innovative avenue for further investigation into the health care utilization and clinical presentations of CVST. In the long term, NLP has the potential to enable health care providers to identify not only CVST but also children with similar risk factors who are at high risk of becoming CVST, and then to connect them with resources that could mitigate their existing risk factors.

Methods

Study settings

Medical records were acquired from the EMR (Epic) of a large Midwest pediatric hospital. The hospital receives more than 1.5 million patient visits annually from across the USA, with over 500 inpatient beds and more than 60 facilities for healthcare delivery for both inpatient and outpatient settings. The medical records include records for 1.5 million unique pediatric patients (under 21 years of age) by the time of query (November 2021). We queried the medical records from 2007 to 2021.

Data query details

Before searching medical records, we used an NLP method (Moosavinasab et al., 2021) to identify keywords similar to CVST terms in our medical records. The tool parses the entire body of medical records and extracts any terms semantically or linguistically related to CVST; it helped to expand the keyword terms used in the data pull to retrieve potential CVST patients’ notes. Subject matter experts (co-authors AM and JT) populated a variety of terms that may be used clinically to describe a child who may be a victim of sex trafficking. There is an inconsistent knowledge base amongst pediatric medical care providers regarding all elements of sex trafficking, such as risk factors, screening methodology, and even terminology. As such, we used a variety of broad terms that may be used to describe sex trafficking, including some antiquated and stigmatizing terms that may reflect provider failure to recognize patients as victims. Our initial keywords used with the NLP method are presented within “_” in the following list. Combining the initial keywords with those expanded by the NLP method, the finalized query included [solicitation OR prostitution OR “sex work” OR prostitute OR “sex trade” OR “selling sex” OR prostituting OR “sex trafficking” OR “sex trafficked” OR “sexual exploitation”]. This query was run by data specialists within the hospital database, and results were exported as spreadsheet data files containing patient information, demographic data, medical treatment setting, International Classification of Diseases (ICD) 9 and 10 diagnostic codes, healthcare utilization, and the full written medical note containing the aforementioned search keywords.
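To make the structure of the finalized query concrete, the following is a minimal sketch of an OR-style keyword match over free-text notes. It is not the hospital's query tooling, which the text does not describe; the function names, the sample notes, and the choice of case-insensitive whole-word matching are all assumptions made for illustration. Only the term list is taken from the query above.

```python
from __future__ import annotations

import re

# Terms copied from the finalized query in the text; multi-word terms are phrases.
QUERY_TERMS = [
    "solicitation", "prostitution", "sex work", "prostitute", "sex trade",
    "selling sex", "prostituting", "sex trafficking", "sex trafficked",
    "sexual exploitation",
]

# Assumption: case-insensitive, whole-word matching (so "sex worker" is NOT a hit).
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in QUERY_TERMS) + r")\b",
    re.IGNORECASE,
)


def matched_terms(note_text: str) -> set[str]:
    """Return the set of query terms (lowercased) found in one note."""
    return {match.group(0).lower() for match in _PATTERN.finditer(note_text)}


def flag_notes(notes: dict[str, str]) -> dict[str, set[str]]:
    """Map note id -> matched terms, keeping only notes with at least one hit."""
    hits = {note_id: matched_terms(text) for note_id, text in notes.items()}
    return {note_id: terms for note_id, terms in hits.items() if terms}


# Hypothetical notes, invented for illustration only.
example_notes = {
    "n1": "Patient denies sexual exploitation.",
    "n2": "Seen for ankle sprain.",
    "n3": "Mother has a history of Prostitution and sex work.",
}
flagged = flag_notes(example_notes)
```

Note that the first hypothetical note is flagged even though it documents a denial; a plain keyword match cannot distinguish affirmed from negated mentions, which is why manual chart validation is needed downstream.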

Statistical analysis

We used descriptive statistics to report demographic characteristics, health care coverage and use, ICD diagnostic codes, and health characteristics of CVST in comparison to the general patient population.

CVST characterization

In addition to identifying CVST, the NLP keyword extraction also captured free text relating to caregivers or others in close proximity to patients who had involvement in the sex trade or were trafficking victims. A manual review was undertaken of 374 patient charts, a sample size estimated to be representative of the 2824 patients at a 95% confidence level with a 5% margin of error. The manual review facilitated coding to reflect whether the CVST keyword identified within the text represented any of the following categories: true CVST, sex trade involvement/trafficking victimhood of someone else related to the patient, or another type of sexual victimization of the patient.
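The text does not state which sample size formula was used. As a hedged illustration, the sketch below applies one common choice, Cochran's formula for a proportion with a finite population correction, assuming z = 1.96 for 95% confidence and the most conservative proportion p = 0.5. The function name and defaults are illustrative and are not taken from the study.

```python
import math


def required_sample_size(population: int, margin: float = 0.05,
                         z: float = 1.96, p: float = 0.5) -> int:
    """Minimum sample size for estimating a proportion (Cochran, with
    finite population correction). All defaults are assumptions."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)        # finite population correction
    return math.ceil(n)


minimum_charts = required_sample_size(2824)
print(minimum_charts)
```

Under these assumptions the minimum is smaller than the 374 charts reviewed, so the manual review meets the stated 95% confidence level and 5% margin of error with some room to spare. A different formula or different assumptions would give a different minimum.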

Results

Patient characteristics

Table 1 summarizes the demographics of patients identified by the CVST keyword search and of all patients in the database. About 0.18% of patients (2824 out of 1,509,715) had a CVST keyword within their records. Most of the patients (n = 2048 or 72.5%) with a CVST keyword identified as Female, and over half (n = 1632 or 57.8%) were White. Age was calculated at the time of the data pull, and therefore most of the patients identified by CVST keyword encounters were ages 13 and older (41.4% were 13–18 years and 34.4% were 19 or older). Most of the identified CVST keyword patients (68.5%) had Medicaid/SCHIP as their health insurance coverage (n = 1790).

Table 1.

Demographics and health utilization of patients identified by CVST keyword encounters and of all patients in the electronic medical record (EMR) database from 2007 to 2021.

Patient Characteristics Patients with CVST keyword encounters in EMR N= 2,824 All Patients in Database N= 1,509,715 Percentage of all patients with keyword encounters (N1/N2)

N1 % N2 % %
Gender

Male 773 27.4% 746,665 49.46% 0.10%
Female 2,048 72.5% 762,267 50.49% 0.26%
Unknown 3 0.1% 783 0.050% 0.38%

Age

0–4 years 97 3.4% 803,807 53.2% 0.012%
5–9 years 281 10.0% 233,071 15.4% 0.12%
10–12 years 306 10.8% 126,335 8.4% 0.24%
13–18 years 1,169 41.4% 215,396 14.3% 0.54%
19+ years 971 34.4% 131,266 8.7% 0.74%

Race

White 1,632 57.8% 759,272 50.3% 0.21%
Black 713 25.2% 216,582 14.4% 0.33%
Multiple race 304 10.8% 60,969 4.0% 0.5%
Unknown 106 3.8% 220,366 14.6% 0.05%
Asian 25 0.9% 37,695 2.5% 0.07%
No Information / refuse 24 0.9% 186,855 12.3% 0.08%
Native Hawaiian or Other Pacific Islander 13 0.5% 12,921 0.90% 0.1%
American Indian or Alaska Native 5 0.2% 5,203 0.30% 0.1%
Other 2 0.1% 10,164 0.70% 0.02%

Ethnicity

Not Hispanic or Latino 2,625 93.0% 1,076,723 71.3% 0.2%
No Information 11 0.4% 174,612 11.6% 0.01%
Hispanic or Latino 130 4.6% 51,882 3.4% 0.3%
Other 26 0.9% 10,645 0.7% 0.2%
Unknown 32 1.1% 196,165 13% 0.02%

Health insurance

Medicaid/SCHIP 1,790 68.5% 680,261 45.1% 0.3%
Private/Commercial 575 22.0% 696,495 46.1% 0.1%
Other/Unknown 138 5.3% 36,237 2.4% 0.4%
Self-pay 103 3.9% 229,651 15.2% 0.04%
Medicare 9 0.3% 6,865 0.5% 0.1%
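The last column of Table 1 is the ratio N1/N2 expressed as a percentage. As a minimal sketch, the Gender rows can be recomputed from the counts in the table; the dictionary and function names are illustrative and not part of the study's analysis code.

```python
# Counts copied from the Gender rows of Table 1:
# (patients with keyword encounters N1, all patients in database N2).
GENDER_COUNTS = {
    "Male": (773, 746_665),
    "Female": (2_048, 762_267),
    "Unknown": (3, 783),
}


def keyword_rate_percent(n1: int, n2: int) -> float:
    """Percentage of all patients in a stratum who had a keyword encounter."""
    return 100.0 * n1 / n2


for stratum, (n1, n2) in GENDER_COUNTS.items():
    print(f"{stratum}: {keyword_rate_percent(n1, n2):.3f}%")

# The stratum counts should add up to the table's column totals.
total_n1 = sum(n1 for n1, _ in GENDER_COUNTS.values())
total_n2 = sum(n2 for _, n2 in GENDER_COUNTS.values())
overall_rate = keyword_rate_percent(total_n1, total_n2)
print(f"Overall: {overall_rate:.3f}%")
```

Summing the strata is a quick consistency check: the Gender rows add up to the 2,824 keyword patients and the 1,509,715 patients in the database reported in the column headers.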

CVST keywords identified in patients’ notes differed across demographic characteristics. Notes for males (n=1,685) were commonly identified via “prostitution” or “solicitation”-related keywords (84% compared to 54.5% for females), as well as keywords for “selling or trading sex” (2.6% compared to 1.3% for females). Conversely, notes for females (n=5,272) were more commonly identified via “sex trafficking” (14.3% vs. 7.9%) and “exploitation” keywords (29.9% vs. 6.1%) compared to males. See Table 2.

Table 2.

Keywords distribution by child biological sex.

Child Biological Sex Sample Size (# of notes) ‘Trafficking’ Keywords ‘Exploitation’ Keywords Prostitution Keywords Selling sex, sex work, trading Keywords
n % n % n % n %
Male 1,685 134 7.9 102 6.1 1,416 84.0 43 2.6
Female 5,272 753 14.3 1,575 29.9 2,874 54.5 70 1.3
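The percentages in Table 2 are row percentages: each keyword count divided by the number of notes for that sex. The sketch below recomputes them and also adds the ‘trafficking’ and ‘exploitation’ counts to obtain the combined share that the Discussion compares between girls and boys. Adding the two counts assumes that a note is counted in only one keyword group, which the text does not confirm; the names used here are illustrative.

```python
from __future__ import annotations

# Note-level counts copied from Table 2: total notes, then notes per keyword group.
TABLE_2 = {
    "Male": {"notes": 1_685, "trafficking": 134, "exploitation": 102,
             "prostitution": 1_416, "selling_sex": 43},
    "Female": {"notes": 5_272, "trafficking": 753, "exploitation": 1_575,
               "prostitution": 2_874, "selling_sex": 70},
}


def share(row: dict[str, int], *groups: str) -> float:
    """Percent of a row's notes in the given keyword group(s), counts summed."""
    return round(100.0 * sum(row[g] for g in groups) / row["notes"], 1)


for sex, row in TABLE_2.items():
    prostitution_pct = share(row, "prostitution")
    combined_pct = share(row, "trafficking", "exploitation")
    print(f"{sex}: prostitution {prostitution_pct}%, "
          f"trafficking + exploitation {combined_pct}%")
```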

More subtle differences were observed when comparing CVST keywords across racial identity. Black patients’ notes (n=2,048) were more likely to contain “trafficking” keywords compared to White patients’ notes (n=3,622); 15.9% compared to 10.7%. Similarly, the notes of Black patients were more likely to contain “sexual exploitation” keywords compared to the notes of White patients (29.1% vs. 21.3%). Among the notes of Asian patients (n=91), 56% were identified via “trafficking” keywords; however, no notes in this demographic subgroup contained “exploitation”-related keywords. Keywords related to “prostitution” or “solicitation” were most likely to be identified in the charts of White patients (66.4%), followed by multi-racial patients (61.9%), then Black patients (53.8%), and Asian patients (43.9%). See Table 3.

Table 3.

Keywords distribution by child’s racial identity.

Child Racial Identity Sample Size (# of notes) ‘Trafficking’ Keywords ‘Exploitation’ Keywords Prostitution Keywords Selling sex, sex work, trading Keywords
n % n % n % n %
White 3,622 387 10.7 770 21.3 2,406 66.4 59 1.6
Black 2,048 326 15.9 596 29.1 1,102 53.8 24 1.2
Multi-racial 886 86 9.7 234 26.4 548 61.9 18 2.0
Asian 91 51 56.0 0 0.0 40 43.9 0 0.0
Native Hawaiian or Pacific Islander 32 3 9.3 0 0.0 29 90.6 0 0.0

Patient diagnoses and utilization characteristics

Of the 2,824 patients identified by the keyword search, 74.2% were seen in an outpatient setting. Many patients were seen by multiple medical subspecialties during the encounter for which CVST keywords were noted in their EMR. The most common departments where these patients were seen were the behavioral health clinic (n=1039; 36.8%), the child abuse assessment clinic (n=991; 35.1%), behavioral psychiatry (n=916; 32.4%), and the Emergency Department (n=610; 21.6%).

Table 4 summarizes diagnostic codes for the patients identified by the keyword search. A total of 27,369 ICD codes were included in the cohort notes. The 4 most common ICD code categories included: 1) Mental and Behavioral Health; 2) External Factors and Influences on Healthcare; 3) Abnormal Labs; and 4) Genitourinary Health. See Table 4 for more details. Irrespective of ICD code category, the most common diagnostic codes were: 1) Confirmed Sexual or Physical Assault (n=1,854; 65.7%); 2) Trauma and Stress-Related Disorders (n=1,430; 50.1%); 3) Depressive Disorders (n=1,291; 45.7%); 4) Anxiety Disorders (n=976; 34.6%); 5) Suicidal Ideation (n=677; 24.0%); and 6) Disruptive/Impulse-Control/Conduct Disorders (n=667; 23.6%).

Table 4.

Most common ICD codes present within identified sample notes.

ICD-Codes n =
Codes Related to Mental and Behavioral Health n = 7,786
Trauma and Stress-Related Disorders n = 1,430
Depressive Disorders n = 1,291
Anxiety Disorders n = 976
Disruptive, Impulse-Control, and Conduct Disorders n = 667
Gender Dysphoria n = 319
Codes Related to External Factors Influencing Health and Morbidity n = 6,075
Sexual or Physical Assault – Confirmed n = 1,854
Sexual or Physical Assault – Suspected n = 467
Child placed in Welfare Custody or Foster Care n = 265
High-Risk Sexual Behaviors n = 222
Family History of Substance Use or Mental Health Disorder n = 114
Codes Related to Abnormal Lab Results/ Screening n = 2,888
Suicidal Ideation n = 677
Behavioral Issues/Aggression n = 363
Abdominal Pain n = 251
Dysuria Injury n = 97
Homicidal Ideation n = 76
Codes Related to Genitourinary n = 1,038
Non-Inflammatory Disorders of Vulva or Perineum n = 140
Suppression of Menses and Other Unspecified Conditions Related to Female Organs or Menstrual Cycle n = 127
Dysmenorrhea n = 123

Chart validation of query: types of sex trafficking exposure.

About 45% of the 374 clinician notes that were examined for validation indicated some type of personal or family involvement in the sex trade. Of the 374 notes reviewed, only 43 indicated that a child met the criteria for sex trafficking, and 99 indicated that a parent, sibling, or other family member was involved in the sex trade. Another 28 described suspicious circumstances prompting concern for sex trafficking (e.g., arrested in a hotel with older men on suspicion of selling sex, but the child made no disclosure of abuse/trafficking). An additional 107 indicated suspected or confirmed victimhood of some type of sexual offense; however, it was unclear whether the child was a trafficking victim. Lastly, 97 contained enough evidence to confirm a false positive, where reviewers were able to rule out child victimhood or close proximity to the sex trade industry. Examples of false positives include notes with verbiage such as “patient denies sexual exploitation on a screener and does not have relevant history on their EMR”, where the keyword was identified via the NLP tool but in the context of the provider documenting a negative screen. In another instance, the patient disclosed a nightmare about hypothetically prostituting. In future iterations of this work, exclusion criteria could be queried and overlaid to help sort out ‘false positives’ within the process of cohort identification.
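One simple form such an exclusion overlay could take is a negation check around each keyword hit. The sketch below is a simplified heuristic in the spirit of rule-based negation detection, not the study's method: the cue list, the six-word window, and the function name are all assumptions, and it would not catch cases such as the hypothetical nightmare described above.

```python
import re

# Illustrative subsets; a real overlay would use the full query and a vetted cue list.
KEYWORDS = ["sexual exploitation", "sex trafficking", "prostitution", "prostituting"]
NEGATION_CUES = ["denies", "denied", "no history of", "negative for", "no evidence of"]

_KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, KEYWORDS)) + r")\b", re.IGNORECASE)
_CUE_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, NEGATION_CUES)) + r")\b", re.IGNORECASE)


def has_affirmed_keyword(note_text: str, window: int = 6) -> bool:
    """True if any keyword occurs without a negation cue among the `window`
    words that precede it in the same sentence."""
    for sentence in re.split(r"(?<=[.!?])\s+", note_text):
        for match in _KEYWORD_RE.finditer(sentence):
            preceding_words = sentence[:match.start()].split()[-window:]
            if not _CUE_RE.search(" ".join(preceding_words)):
                return True
    return False


# The first example paraphrases the false positive quoted in the text;
# the second is a hypothetical affirmed mention.
negated = has_affirmed_keyword(
    "Patient denies sexual exploitation on a screener and does not have "
    "relevant history on their EMR.")
affirmed = has_affirmed_keyword(
    "Denies pain. Patient disclosed sex trafficking by an older male.")
```

Restricting the check to the same sentence keeps an unrelated negation, such as "Denies pain", from suppressing a later affirmed mention. Any rule of this kind would itself need validation against manual chart review before use.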

Discussion

Overall, we present an initial, first-of-its-kind attempt to leverage Natural Language Processing (NLP) methods within a pediatric healthcare setting to identify and study child victims of sex trafficking by extracting information buried within the free-text documentation of an EMR. Our findings align with currently available CVST literature reflecting relationships between child sex trafficking victimhood, close proximity to the sex trade, and poor mental health. The true innovation of the project stems from the novel use of NLP to identify these children within a pediatric healthcare setting. NLP methods are increasingly being used across a variety of fields to enhance the study of a wide range of health-related topics. Although the initial purpose of our study was to identify children who had been victims of sex trafficking, the NLP methodologies uncovered a broader range of at-risk children, which also included those who were suspected to be victims, those who had a parent or family member in the sex trade or who had been trafficked, and children who had been victims of some type of sexual offense, which could be related or unrelated to trafficking. This was an unexpected insight: these other groups of children in close proximity to the sex trade industry present with established risk factors for sex trafficking victimhood and could also benefit from interventions to prevent future victimization. NLP methods have the potential to overcome well-documented and long-standing obstacles to this work, including under-reporting of trafficking, lack of knowledge about trafficking among victims and providers, and difficulty targeting screening and early intervention efforts. Taken together, findings from our study are encouraging and reflect the potential of NLP; however, they represent only the ‘tip of the iceberg’ in terms of the potential utility of NLP methods in understanding and combating human trafficking of children within pediatric healthcare settings.

Our findings closely aligned with well-documented patterns within the literature that suggest associations between sex trafficking victimhood among children and various adverse physical, psychological, and behavioral health outcomes (Bang et al., 2013; Barnert et al., 2017; Edinburgh et al., 2015; Goldberg et al., 2017). Prior literature has identified distinct risk factors for CVST, including prior physical and sexual abuse, mental health issues, early onset substance use, runaway behavior, and other social determinants of health risk (Greenbaum et al., 2015). These risk factors can be viewed as characteristics that make a child more vulnerable to traffickers. For example, children with a history of substance use may be marginalized by their families or communities, providing a variety of opportunities for a trafficker to exploit their unmet needs. A trafficker will then provide substances of abuse or a sense of family in exchange for “payment” via sexual exploitation. There are numerous high-risk social circumstances that create similar types of vulnerabilities for children. Many of these risk factors correspond to the most common diagnoses linked with the patient encounters identified by our CVST keyword search, such as mental health disorders, trauma-related diagnoses, and prior child maltreatment (physical/sexual abuse). Within our cohort, the most common clinical presentations within the notes included: Sexual or Physical Assault; Trauma and Stress-Related Disorders (most commonly Post-Traumatic Stress Disorder - PTSD); Depressive Disorders (most commonly Major Depressive Disorder); Anxiety Disorders (most commonly Generalized Anxiety Disorder and Unspecified Anxiety Disorders); and Suicidal Ideation. Taken together, findings are closely aligned with prior work that indicates a high prevalence of childhood abuse victimhood and related mental health conditions that skew towards trauma-related issues (Goldberg et al., 2017; Greenbaum et al., 2015). This alignment of diagnoses within the cohort suggests that we are likely capturing a cohort of children at risk of trafficking, despite being unable to confirm trafficking victimhood in many. Exploring demographic characteristics of CVST was a secondary goal of the project compared with the primary aim of exploring the potential utility of NLP to identify children who have been, or are at high risk of becoming, victims of sex trafficking.

One interesting observation within the data was the difference in the keywords that identified children’s notes across racial identity and biological sex. In our study, boys were much more frequently identified via ‘prostitution’ or ‘solicitation’ related keywords compared to girls (84.0% vs. 54.5%). In turn, girls were much more commonly identified by ‘sex trafficking’ and ‘sexual exploitation’ keywords compared to boys (44.2% vs. 14.0%). Racial disparities in keyword prevalence were present as well. Asian children and Black children were more frequently identified via ‘sex trafficking’ verbiage compared to their White counterparts (56.0% and 15.9% compared to 10.7%). In turn, White children were more commonly identified via ‘prostitution’ and ‘solicitation’ related keywords compared to Black children (66.4% vs. 53.8%). These differences may reflect biases among providers that could impact and drive disparities in care for these children. They may also reflect possible disproportionate rates of parental or family involvement across child racial groups, which could alternatively explain differences in verbiage. Distinctions between terms such as victim of sex trafficking, prostitute, or sex worker are especially critical in this area given drastically different political, law enforcement, and treatment responses, as well as differing levels and forms of stigma related to each. In the United States, victims of trafficking are extended supportive services, legal protections, and various treatment options acknowledging their exposure to trauma and victimization, whereas individuals (even children) labeled as prostitutes or sex workers are far less likely to receive support services. These children and adults experience profound criminalization and stigmatization, as well as additional barriers to obtaining high-quality care (Dalla et al., 2001; Duff et al., 2015).

It is not surprising that this initial NLP methodology identified many patients with risk factors for CVST who were not current victims, such as children who are in close proximity to the sex trade due to family involvement. This presents an opportunity for linkage with resources or interventions to mitigate potential risk factors and subsequently prevent (or greatly decrease the likelihood of) future victimization. Findings from a very limited number of studies indicate that children whose parents are in the sex trade, similar to CVST, may be at elevated risk for experiencing abuse/maltreatment, mental health-related issues, early onset substance use, and sex trade involvement, as well as unique stigmatization interfering with their access to quality care (Dalla, 2003; Murnan & Holowacz, 2020; Murnan et al., 2020; Murnan et al., 2018). These studies have relied solely on maternal reports of their children’s health and experiences, which introduces numerous limitations and potential for inaccurate information. Yet additional information related to the trickle-down effects of parental involvement on children in close proximity remains profoundly limited due to the ethical and logistic challenges of proper screening and identification. Given the identification of these children within the current study, NLP methods may represent a unique avenue for identifying children at elevated risk for being trafficked due to their close proximity to the sex trade, and for better understanding their healthcare needs and potential mitigation strategies.

NLP in Cohort Identification

It is clear that NLP methods have the potential to improve our capacity to identify and analyze patient data that have historically required either extensive chart reviews or reliance on patients self-reporting with accurate and honest recall. Although the current study did not expand the NLP method beyond the intended initial search, future studies could include unsupervised methods to effectively execute screening tools without needing to administer the screens to patients, which can lead to labeling and identification of patient cohorts (Lee et al., 2022; Zeng et al., 2018). Such technology working in the background of an EMR could then prompt a Best Practice Alert (BPA) to the medical provider involved in a patient encounter. This BPA could prompt the provider to ask additional questions or, at the very least, highlight the patient’s risk factors to inform how the provider evaluates the encounter.

Similar to earlier works (Boch et al., 2021; Carson et al., 2019), results from our study demonstrate cause for optimism for NLP as a tool to better identify and study CVST and those who are in close proximity to the sex trade industry. In this first-step project, 74.6% of the children we identified using the keyword search were victims of trafficking, had a parent or sibling who was involved in the sex trade, were suspected to have been trafficked, or were the victim of a sexual offense for which we were unable to determine whether it was related to trafficking. Future work will need to refine these models iteratively to identify critical terms and variables captured within the EMR that can differentiate CVST and children who are at high risk for trafficking. Such information could be leveraged to better target screening efforts, enhance identification of CVST and children at risk, and target prevention and early intervention efforts within pediatric settings. Enhancing these approaches has the potential to be incredibly valuable in interrupting cycles of abuse and exploitation as well as connecting these children with potentially beneficial services.

Limitations and Future Directions

One of the primary limitations relates to generalizability, as the current study identified a relatively small cohort at a single pediatric hospital. To refine and maximize these methodologies for identifying CVST and those at high risk, large multi-site studies would be needed. Future efforts will need to test these methods in multiple locations to evaluate validity and generalizability. Secondly, the identification of keywords and ICD codes occurred at the note level as opposed to the patient level, which may have inflated the prevalence of ICD codes. Thirdly, the NLP methodologies identified CVST in addition to children at high risk for trafficking victimhood, which was not the original intention. While the expanded lens is ultimately an unintended strength, future research efforts should separate these groups and compare the clinical presentation of the various sub-groups identified within the current cohort, as well as the keywords that may help algorithms differentiate them from one another and from other pediatric patients. To this effect, future work should seek to identify factors and constellations of risk factors within notes that differentiate trafficking victims from children who are otherwise at high risk of being trafficked. This information could be invaluable in guiding screening practices and policies to better identify victims of trafficking within pediatric health settings. Revised versions of the current approach, applied to larger pools of records, could provide an innovative tool capable of obtaining very large sample sizes and supporting novel and nuanced research aimed at understanding the healthcare needs of this population, as well as enhancing our knowledge of best practices for meeting their needs once identified.

Conclusion

The integration of models using NLP in health care settings could augment the detection of CVST and children at high risk for victimhood. More importantly, if the results are confirmed, the urgency of preventive interventions for children affected by sex trafficking cannot be overstated, especially for their physical and psychological health. We look forward to validating this work using such outcomes, in larger samples, and across multiple treatment settings. A validated prediction algorithm can then be studied clinically to determine safe implementation practices.

Footnotes

The authors have no interests to disclose. This work was funded by the National Center for Advancing Translational Sciences/National Institutes of Health [UL1TR002733]. We would like to acknowledge Gage Boyer for his diligent work as a key contributor. He has read and approved the submission.

References

  1. Bang B, Baker PL, Carpinteri A, & Van Hasselt VB (2013). Commercial Sexual Exploitation of Children (1st ed.). Springer Science & Business Media. doi: 10.1007/978-3-319-01878-2
  2. Barnert E, Iqbal Z, Bruce J, Anoshiravani A, Kolhatkar G, & Greenbaum J (2017). Commercial sexual exploitation and sex trafficking of children and adolescents: A narrative review. Academic Pediatrics, 17(8), 825–829.
  3. Beck M, Lineer M, Melzer-Lange M, Simpson P, Nugent M, & Rabbitt A (2015). Medical providers’ understanding of sex trafficking and their experiences with at-risk patients. Pediatrics, 135(4), e895–e902.
  4. Boch S, Sezgin E, Ruch D, Kelleher K, Chisolm D, & Lin S (2021). Unjust: The health records of youth with personal/family justice involvement in a large pediatric health system. Health and Justice, 9(1), 1–10.
  5. Cabitza F, Rasoini R, & Gensini G (2017). Unintended consequences of machine learning in medicine. JAMA, 318(6), 517–518.
  6. Carrell DS, Halgrim S, Tran DT, Buist DSM, Chubak J, Chapman WW, & Savova G (2014). Using natural language processing to improve efficiency of manual chart abstraction in research: The case of breast cancer recurrence. American Journal of Epidemiology, 179(6), 749–758.
  7. Carson NJ, Mullin B, Sanchez MJ, Lu F, Yang K, Menezes M, & Cook BL (2019). Identification of suicidal behavior among psychiatrically hospitalized adolescents using natural language processing and machine learning of electronic health records. PLoS ONE, 14(2), e0211116. doi: 10.1371/journal.pone.0211116
  8. Casassa K, Knight L, & Mengo C (2022). Trauma bonding perspectives from service providers and survivors of sex trafficking: A scoping review. Trauma, Violence, and Abuse, 23(3), 969–984.
  9. Cepeda JE (2022, June 6). Combating Human Trafficking Using Machine Learning: Part 2. MLearning. https://medium.com/mlearning-ai/combating-human-trafficking-using-data-part-2-f745a421a485
  10. Chen H, Tang B, Wang X, Liu X, Liu Z, Liu S, Wang W, Deng S, Chen Y, & Wang J (2015). An automatic system to identify heart disease risk factors in clinical texts over time. Journal of Biomedical Informatics, 58, S158–S163.
  11. Dalla R (2003). When the bough breaks: Examining intergenerational parent-child relational patterns among street-level sex workers and their parents and children. Applied Developmental Science, 7(4), 216–228.
  12. Dalla R, Xia Y, & Kennedy H (2003). “You just give them what they want and pray they don’t kill you”: Street level sex workers’ reports of victimization, personal resources and coping strategies. Violence Against Women, 9(11), 1367–1394.
  13. Duff P, Shoveller J, Chettiar J, Feng C, Nicoletti R, & Shannon K (2015). Sex work and motherhood: Social and structural barriers to health and social services for pregnant and parenting street and off-street sex workers. Health Care for Women International, 36(9), 1039–1055.
  14. Edinburgh L, Pape-Blabolil J, Harpin SB, & Saewyc E (2015). Assessing exploitation experiences of girls and boys seen at a Child Advocacy Center. Child Abuse and Neglect, 46, 47–59.
  15. Garcelon N, Neuraz A, Benoit V, Salomon R, & Burgun A (2017). Improving a full-text search engine: The importance of negation detection and family history context to identify cases in biomedical data warehouse. Journal of the American Medical Informatics Association, 24(3), 607–613.
  16. Goldberg AP, Moore JL, Houck C, Kaplan DM, & Barron CE (2017). Domestic minor sex trafficking patients: A retrospective analysis of medical presentation. Journal of Pediatric and Adolescent Gynecology, 30(1), 109–115.
  17. Gonzalez-Hernandez G, Sarker A, O’Connor K, & Savova G (2017). Capturing the patient’s perspective: A review of advances in natural language processing of health-related text. Yearbook of Medical Informatics, 26(1), 214–227.
  18. Greenbaum J, Crawford-Jakubiak JE, Christian CW, Flaherty EG, Leventhal JM, Lukefahr JL, Sege RD, MacMillan H, Nolan CM, Valley LA, & Hurley TP (2015). Child sex trafficking and commercial sex exploitation: Health care needs of victims. Pediatrics, 135(3), 566–574.
  19. Haleem A, Javaid M, & Khan IH (2019). Current status and applications of Artificial Intelligence (AI) in medical field: An overview. Current Medicine Research and Practice, 9(6), 231–237.
  20. Hanauer DA, Mei Q, Law J, Khanna R, & Zheng K (2015). Supporting information retrieval from electronic health records: A report of University of Michigan’s nine-year experience in developing and using the electronic medical record search engine (EMERSE). Journal of Biomedical Informatics, 55, 290–300.
  21. Hershberger AR, Sanders J, Chick C, Jessup M, Hanlin H, & Cyders MA (2018). Predicting running away in girls who are victims of sexual exploitation. Child Abuse and Neglect, 79, 269–278.
  22. Jaeckl S & Laughon K (2020). Sex trafficking of minors in the United States: A perspective for nurses. The Online Journal of Issues in Nursing, 25(3).
  23. Kehl KL, Elmarakeby H, Nishino M, Van Allen EM, Lepisto EM, Hassett MJ, Johnson BE, & Schrag D (2019). Assessment of deep Natural Language Processing in ascertaining oncologic outcomes from radiology reports. JAMA Oncology, 5(10), 1421–1429.
  24. Kenny MC, Helpingstine C, Long H, Perez L, & Harrington MC (2019). Increasing child serving professionals’ awareness and understanding of the commercial sexual exploitation of children. Journal of Child Sexual Abuse, 28(4), 417–434.
  25. Koleck TA, Tatonetti NP, Bakken S, Mitha S, Henderson MM, George M, Miaskowski C, Smaldone A, & Topaz M (2021). Identifying symptom information in clinical notes using natural language processing. Nursing Research, 70(3), 173–183.
  26. Lee J, Yang S, Holland-Hall C, Sezgin E, Gill M, Linwood S, Huang Y, & Hoffman J (2022). Prevalence of sensitive terms in clinical notes using natural language processing techniques: Observational study. JMIR Medical Informatics, 10(6), e38482.
  27. Ling AY, Kurian AW, Caswell-Jin JL, Sledge GW, Shah NH, & Tamang SR (2019). Using natural language processing to construct a metastatic breast cancer cohort from linked cancer registry and electronic medical records data. JAMIA Open, 2(4), 528–537.
  28. McCoy TH, Castro VM, Roberson AM, Snapper LA, & Perlis RH (2016). Improving prediction of suicide and accidental death after discharge from general hospitals with natural language processing. JAMA Psychiatry, 73(10), 1064–1071.
  29. Moosavinasab S, Sezgin E, Sun H, Hoffman J, Huang Y, & Lin S (2021). DeepSuggest: Using neural networks to suggest related keywords for a comprehensive search of clinical notes. ACI Open, 5(1), e1–e12.
  30. Murnan A, Bates S, & Holowacz E (2020). Understanding the risk and protective factors among children of mothers engaged in street-level prostitution. Children and Youth Services Review, 112, 104899.
  31. Murnan A, & Holowacz E (2020). A qualitative exploration of mother-child relationships in mothers with histories of substance use and street-level prostitution. Journal of Child and Family Studies, 29, 3225–3238.
  32. Murnan A, Wu J, & Slesnick N (2018). The impact of parenting on child mental health among children of prostituting mothers. Children and Youth Services Review, 89, 212–217.
  33. Naramore R, Bright MA, Epps N, & Hardt NS (2017). Youth arrested for trading sex have the highest rates of childhood adversity: A statewide study of juvenile offenders. Sexual Abuse, 29(4), 396–410.
  34. National Child Traumatic Stress Network. (n.d.). The 12 core concepts: Concepts for understanding traumatic stress responses in children and families. Retrieved on Sep 20, 2022 from http://www.nctsn.org/resources/audiences/parents-caregivers/what-is-cts/12-core-concepts
  35. National Research Council. (2014). Confronting commercial sexual exploitation and sex trafficking of minors in the United States: A guide for the health care sector. In Confronting Commercial Sexual Exploitation and Sex Trafficking of Minors in the United States. National Academies Press.
  36. Ong CJ, Orfanoudaki A, Zhang R, Caprasse FPM, Hutch M, Ma L, Fard D, Balogun O, Miller MI, Minnig M, Saglam H, Prescott B, Greer DM, Smirnakis S, & Bertsimas D (2020). Machine learning and natural language processing methods to identify ischemic stroke, acuity and location from radiology reports. PLoS ONE, 15(6), 1–16.
  37. Rafferty Y (2016). Challenges to rapid identification of children who have been trafficked for commercial sexual exploitation. Child Abuse and Neglect, 52, 158–168.
  38. Velupillai S, Mowery D, South BR, Kvist M, & Dalianis H (2015). Recent advances in clinical natural language processing in support of semantic analysis. Yearbook of Medical Informatics, 10(1), 183–193.
  39. Viani N, Botelle R, Kerwin J, Yin L, Patel R, Stewart R, & Velupillai S (2021). A natural language processing approach for identifying temporal disease onset information from mental healthcare text. Scientific Reports, 11(1), 1–12.
  40. Zheng Z, Deng Y, Li X, Naumann T, & Luo Y (2018). Natural language processing for EHR-based computational phenotyping. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 16(1), 139–153.
