ABSTRACT
Rape myths, including the belief that victims frequently lie, contribute to barriers to justice, such as the disproportionate use of the “unfounded” classification—applied when, following an investigation, it is determined that no crime occurred. This study analyzes rape report narratives tied to previously untested sexual assault kits (N = 5638) from a large, urban Midwestern (US) jurisdiction, focusing on narratives deemed unfounded and those in which officers stated that the victim lied or expressed doubt about the victim's account. Using sentiment analysis, a natural language processing technique, we assessed tone (via polarity and subjectivity) and word counts. Results showed that unfounded narratives were shorter and more negatively written than others but did not differ in subjectivity. Victim lied/doubted narratives showed no significant differences in polarity, subjectivity, or length compared to others. These findings highlight how bias can manifest in written narratives, potentially influencing case outcomes. Addressing these biases through improved report writing and limiting the misuse of the unfounded classification is essential to supporting victims' pathways to justice.
Keywords: false report, machine learning, natural language processing, rape myth, sentiment analysis, sexual assault, unfounded
1. Introduction
The FBI's Uniform Crime Report (UCR) Handbook guides crime classification and reporting for law enforcement. It states that a crime is to be classified as unfounded when an investigation determines that a crime did not occur (Uniform Crime Report 2004, 77–78) and further disaggregates unfounded complaints into false and baseless ones. However, the handbook lacks clarity on what constitutes such complaints. This is concerning, given the more frequent classification of rape allegations as unfounded (De Zutter et al. 2017) and the extensive literature on rape myth acceptance among police officers (Campbell and Fehler‐Cabral 2022; Page 2010; Shaw et al. 2017), sexual assault kit (SAK) testing lab personnel (Campbell and Fehler‐Cabral 2022), jurors (Lilley et al. 2023), and lawyers (Temkin et al. 2018). Lisak et al. (2010) defined baseless complaints as those where the victim's account does not meet legal criteria. Kelly et al. (2005) defined false complaints as involving clear admissions (of a false report) by the victim or strong evidential grounds—the latter can include physical evidence and/or statements from credible sources that contradict the victim's statement (Lisak et al. 2010).
Rape is the most underreported violent crime (Thompson and Tapp 2023), and even when reported, it is often undercounted and more frequently unfounded than other crimes (De Zutter et al. 2017; Yung 2014). Yung (2014) found widespread undercounting of rapes in the U.S., with 22% of large law enforcement agencies showing statistical irregularities from 1995 to 2012 and a 61% increase in undercounting over the 18‐year observation period. Using UCR data from 2006 through 2010, De Zutter et al. (2017) reported that about 5% of rape complaints were unfounded, five times higher than for most other offenses, 1.7 times higher than for murder, and on par with robbery. However, the unfounded designation is often misunderstood or misused. Lisak et al. (2010) note that false reports are often cited at 8% (Federal Bureau of Investigation 1997), but this figure is the percentage of reports that were unfounded (which includes baseless complaints), thereby inflating false rape report figures and fueling rape mythology. The reliance on official statistics complicates understanding of the true prevalence, especially in distinguishing between false and baseless complaints. Other research suggests that 2%–10% of rape reports are false in the U.S. (Lisak et al. 2010; Spohn et al. 2014), the United Kingdom (Kelly et al. 2005; Kelly 2010), and Australia (Heenan and Murray 2006), but studies are limited on the criteria for unfounded reports, including false ones (Spohn et al. 2014).
Police report narratives play a crucial role as evidentiary elements in the criminal justice system for various stakeholders, including detectives, prosecutors, defense attorneys, judges, and jurors (Yu and Monas 2020). They are viewed as the official documentation of the crime (or lack thereof), expected to be objective and formulaically written, with little stylistic differentiation by type of crime (Reynolds 2012). They are also the means by which officers present evidence that could potentially support the unfounded designations, as well as the medium by which responding officers could explicitly doubt or “signal” (insinuate) the victim's lack of credibility/victim blaming to investigating officers (Lovell et al. 2023a, 2023b)—before the investigation began. Therefore, analyzing police report narratives offers valuable context for understanding unfounded complaints. This is particularly relevant to rape, with its disproportionally higher unfounding rate (De Zutter et al. 2017) and the widespread (but routinely debunked [Kelly 2010; Lisak et al. 2010; Lonsway et al. 2009; Spohn et al. 2014]) myth that victims frequently fabricate complaints of rape (Page 2008; Shaw et al. 2017). Police report narratives also offer the ability to distinguish between false and baseless reports within unfounded cases, as noted by Spohn et al. (2014).
However, researchers often face challenges in accessing narratives (Güss et al. 2020), and when available, the process of systematically coding them by hand is labor‐intensive. The most commonly employed methodologies in studies that include narrative analysis of unfounded rapes include qualitatively coding/classifying manually based on what is detailed in the narratives (Heenan and Murray 2006; Kelly et al. 2005) and/or quantitatively coding case, suspect, and victim characteristics and outcomes manually (Lisak et al. 2010; Spohn et al. 2014). These challenges contribute to small (er) samples in studies focused on unfounded (Spohn et al. 2014, n = 81) or, more specifically, false rape narratives (Lisak et al. 2010, n = 136). Hence, the existing literature illustrates a methodological double bind in the manual collection (“human learning”) of traditional quantitative and qualitative data: larger samples enable robust analyses but often miss the nuanced insights essential to understanding the unfounded designation. These methodological limitations also result in studies with short observation periods (e.g., several years), which prevent the assessment of the evolution of unfounded policies and practices within departments over time, as these can fluctuate considerably from year to year (International Association of Chiefs of Police 2005).
This study overcomes these methodological limitations by utilizing a large sample of rape reports linked with previously untested sexual assault kits (N = 5638) that span nearly a quarter century in one U.S. urban Midwestern jurisdiction. We employ a methodological approach that is innovative in the criminal justice/criminology field, a type of machine learning—natural language processing's sentiment analysis—applied to a large corpus of text (a linguistic dataset) that is expected to be formulaically written and free of opinions and subjectivity. We examine tone and word counts in the narratives of rape reports, offering a potential solution to the qualitative‐quantitative double bind. More specifically, the study examines the nature of unfounded designations by comparing the text of unfounded reports to that of other reports, as well as the text of reports stating that victims lied or were doubted by officers (“false reports”) to that of other reports. We pose three research questions: (1) To what extent have the narratives in unfounded and victim lied/doubted rape reports changed over time? (2) How does the tone of narratives in unfounded rape reports compare to that of others? And (3) how does the tone of narratives in victim lied/doubted reports compare to that of others? Unfounded narratives are those that include this term in the closing reason, investigative report, or title/footer. This designation was provided by the officers, not the research team. Victim lied/doubted narratives include language in the initial closing reason or in the initial or investigative report indicating that the victim lied, recanted, or was doubted by officers. Since “false report” is not a specific label applied to these reports by officers, this designation was provided by the research team (the methods section details this further). There is overlap, but not all unfounded reports were victim lied/doubted narratives, and vice versa.
We apply a focal concerns theory framework that asserts that officers' decision‐making in rape cases can be influenced by myths, stereotypes, and perceived victim credibility (Steffensmeier et al. 1998; Lapsey et al. 2022). The unfounded and victim lied/doubted narratives are those where victims were explicitly disbelieved or there was a lack of sufficient evidence to substantiate a crime. By examining how these reports compare to others, we gain important insights into how they are written, how they potentially differ in content, and how victim disbelief may be introduced in the reports.
2. Background
2.1. Misclassification and Focal Concerns Theory
There are several notable examples of U.S. police departments that have deliberately misapplied the unfounded classification (Maryland Coalition Against Sexual Assault 2011; Women’s Law Project 2010). As a non‐crime, unfounded rapes are excluded from UCR crime statistics and therefore not counted. In the late 2000s, the Baltimore Police Department (PD) got rape “off the books” by misapplying the unfounding designation, resulting in rates five times the national average and about 40% of calls not forwarded for investigation (Fenton 2010). A subsequent audit found that about half of the unfounded complaints were misclassified. Philadelphia PD had a practice of “downgrading” sexual assault reports to non‐criminal codes, partly due to FBI scrutiny over high unfounding rates (43%–52%) (Women’s Law Project 2013). These examples show how unfounded has been erroneously applied to rape, often without evidence or investigation, jeopardizing victims' justice and harming communities.
These examples also highlight the importance of correctly applying unfounding guidelines. Federal guidance (Federal Bureau of Investigation 2013, 111) and the International Association of Chiefs of Police (International Association of Chiefs of Police 2005) specify that this classification should not be determined during initial interviews. The International Association of Chiefs of Police (2005) emphasizes considering all reports valid and not rushing investigations. Insufficient evidence, delayed reporting, uncooperative victims, and statement inconsistencies do not constitute false reporting (Federal Bureau of Investigation 2013; International Association of Chiefs of Police 2005; Lisak et al. 2010). Unfounded cases have been found to be disproportionately associated with: (a) diminished evidentiary strength, including lack of injury or weapon involvement (O’Neal 2019; Venema et al. 2021), inconsistent statements (Kelly et al. 2005; Kelly 2010; O’Neal 2019), and recanted allegations (Spohn et al. 2014; Kelly et al. 2005); (b) diminished victim credibility, such as mental health issues (Kelly 2010; O’Neal 2019) and perceived motivations to lie (O’Neal 2019), with Venema et al. (2021) finding increased unfounding odds when drinking/intoxication was not mentioned, suggesting that failing to note sobriety influenced outcomes; and (c) certain victim characteristics, such as being male (Venema et al. 2021) or a female adolescent (O’Neal and Hayes 2020).
Focal concerns theory has frequently been applied to explain officers' decision‐making in rape cases based on: suspects' blameworthiness (e.g., victim injury); need for the protection of the community (e.g., weapon use, criminal history); practical constraints (e.g., physical evidence and agency resources available within the criminal justice system); and perceptual shorthand (e.g., victim‐offender relationship, victim credibility)—the latter has been found to be influenced by myths, stereotypes, and victim demographics, especially when the above information is limited (Campbell and Fehler‐Cabral 2018; Lapsey et al. 2022; O’Neal and Spohn 2017; Spohn et al. 2014). This theory suggests that officer discretion and reliance on perceptual shorthand can result in the selective unfounding of rape cases despite contrary evidence or without sufficient evidence collection (Spohn et al. 2014). Examining the official crime report narratives provides a way to assess this by comparing these reports to others.
2.2. Natural Language Processing: Methodological Advancement
Given prior research methodological limitations on unfounded rape—smaller samples, qualitative analyses, and brief observation periods (Heenan and Murray 2006; Lisak et al. 2010; Spohn et al. 2014)—machine learning's natural language processing (NLP) provides a robust methodological alternative. NLP enables advanced analysis of vast criminal justice textual data at previously unavailable scales, addressing the qualitative‐quantitative bind (Mourtgos and Adams 2019). Using the textual dataset from this study, we previously applied NLP sentiment analysis to explore whether polarity (positive or negative opinion) and subjectivity (personal opinion vs. factual information) in rape report narratives predicted case progression to prosecution (“successful” cases) for the entire sample of reports (Lovell et al. 2023a, 2023b).
The current study uses the same NLP technique but focuses specifically on unfounded and/or victim lied/doubted reports from this same sample. Previously, we found that the polarity for all reports was near zero, with a slight negative skew, but it accurately predicted case outcomes. Cases recommended for prosecution exhibited more positive polarity and more positive subjectivity, whereas non‐recommended cases showed neutral polarity and subjectivity. Since most incident reports are written based on the complainant's (victim's) accounting of events, reports with more positive subjectivity tended to capture or convey the victim's subjectivity—what the victim experienced (namely, what happened to the victim and details of the extent of the trauma of rape) from the victim's perspective—as compared to neutral or objective “statements of fact” (“victim is a known prostitute and crack cocaine user”) or an officer's personal subjectivity (reports are often not written in first person, e.g., “victim's statement is inconsistent”) (See Figure 1, Examples 1 and 3, for an illustration of reports with high and low subjectivity, respectively.). Additionally, polarity, subjectivity, and word counts remained relatively stable over 24 years, with small increases in subjectivity and decreases in polarity starting around 2002–2003; however, these differences were nonsignificant. Average incident reports increased by only 13 words from the 1990s to the 2000s, then remained constant through the 2010s (Lovell et al. 2023a). Therefore, in the current study's first aim, we expect the polarity, subjectivity, and word counts for unfounded and victim lied/doubted reports to remain mostly unchanged over the observational period. For the remaining aims, based on extant research, we expect unfounded and victim lied/doubted reports to be more negative and less subjective in tone than others, as they are “unsuccessful.” This represents the first known NLP application to unfounded rape reports.
FIGURE 1.
Example narratives. Narratives have been edited to prevent identifiability. Reworded or edited text is in all caps.



2.3. Police Report Narratives
After a rape is reported, a responding officer typically answers the call first and serves as the victim's initial criminal justice contact. Their report represents the first step in the reporting process. Besides addressing the immediate needs of the victim, the responding officer conducts an initial investigation, gathers relevant facts and evidence, and documents them in the incident report narrative for the investigator's follow‐up. Therefore, an investigator's first interaction with a case is often through the initial report content, rather than direct contact with the victim. Research indicates police dedicate more effort to investigating cases where they believe victims, consider cases worthy of investigation (Campbell and Fehler‐Cabral 2018), and/or expect prosecutorial acceptance (“downstream orientation”) (Morabito et al. 2017; Pattavina et al. 2016). Officers are advised to strike a balance between conciseness, thoroughness, and objectivity when providing detailed, step‐by‐step accounts of events and information about individuals and locations involved (Reynolds 2012). Less is known about the implementation of these recommendations (Yu and Monas 2020) and the consequences of poorly‐ or well‐crafted reports. Longer police reports are often viewed as more credible, indicating honest victim statements (Quijano‐Sánchez et al. 2018) and suggesting greater officer effort (Yu and Monas 2020). Longer reports are associated with more positive criminal justice outcomes (Lovell et al. 2023a), potentially reflecting cases officers considered “worthy” of investigative focus and/or officers having more time to develop comprehensive narratives (Campbell and Fehler‐Cabral 2018). However, rape reports are often poorly written (Archambault et al. 2020) and frequently contain victim‐blaming language (Lovell et al. 2023b; O’Neal and Hayes 2020). 
Little research has examined whether and how the wording of rape incident reports varies for unfounded cases and those deemed false reports.
In another study using this dataset, we applied NLP's textual classification to explore three‐word combinations (trigrams) in rape narratives that predicted various outcomes, including unfounded designation or victim disbelief/lying mentions (Lovell et al. 2023b). The most predictive trigrams in unfounded narratives involved officers frequently mentioning the lack of evidence (variations of “on the basis of” and “due to insufficient evidence”), followed by procedural phrases tied to the unfounded designation. Non‐unfounded reports showed predictive phrases associated with procedural language for other closing reasons, since approximately three‐fourths of cases did not proceed to prosecution (Lovell et al. 2023b). Predictive trigrams for victim lied/doubted narratives were notably non‐specific, including mentions of friends or persons, “lied” and “not raped,” and dates or zone car numbers, suggesting brief narratives that lack detail, with few investigative activities (contrary to FBI guidelines). Non‐victim lied/doubted reports showed the opposite—varied investigative and prosecutorial activities. Victim‐related language was not negative (rather than “the victim did not,” the reports stated “the victim went” and “the victim stated”). Results suggest that victim lied/doubted narratives differed from unfounded reports generally. These findings provide context for interpreting the results of the current study. If negative sentiments are present and predictive (current study), the trigrams (Lovell et al. 2023b) offer insight into the words producing negative opinion and subjectivity scores in unfounded and victim lied/doubted narratives.
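For readers unfamiliar with the technique, the trigram extraction underlying this kind of textual classification can be sketched in a few lines of Python. This is an illustrative sketch, not the classifier from Lovell et al. (2023b); the narrative fragment below is invented and merely echoes phrases reported as predictive.

```python
from collections import Counter

def trigrams(tokens):
    """All consecutive three-word combinations in a token sequence."""
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]

# Invented narrative fragment echoing phrases reported as predictive.
narrative = ("case closed due to insufficient evidence "
             "on the basis of the complaint").split()
counts = Counter(trigrams(narrative))
```

In a classification pipeline, such trigram counts per narrative would become features, and the most predictive trigrams for each outcome would be identified by the model's learned weights.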
2.4. Current Study
Capitalizing on a large sample of rape narratives (N = 5638) and building on prior research using focal concerns framework and innovative NLP sentiment analysis methodology, we assess how unfounded and victim‐lied/doubted narratives differ from other narratives with the following hypotheses:
H1
Unfounded and victim‐lied/doubted rape narratives have not significantly changed in tone (polarity and subjectivity) and word counts over the observational period.
H2
Tone in unfounded rape narratives is statistically more negatively opinionated, less subjective, and the narratives have fewer words than other reports.
H3
Tone in narratives explicitly mentioning victims lied or were doubted is statistically more negatively opinionated, less subjective, and the narratives have fewer words than other reports.
Focal concerns theory has strong empirical support in explaining officers' decision‐making in rape cases (Lapsey et al. 2022). In this study, we posit that officers' negative perceptual shorthand about victims, as theorized by focal concerns theory, could be expressed in unfounded and victim lied/doubted narratives and can therefore be assessed. These narratives are those in which responding officers might explicitly express, or implicitly “signal,” doubt about the victim's account or credibility; therefore, these reports could logically differ from those of other cases. NLP offers a distinctive approach to exploring this topic, given the vast amount of textual data available. This study's contributions are both methodological and applied, enhancing the understanding of unfounded designations, the importance of officers' word choice, and best practices in narrative writing. Results increase generalizability through methods that bridge the qualitative‐quantitative divide using a 24‐year sample. Results also highlight the prevalence of unfounded reports and suggest strategies for improving officers' report writing by identifying sources of bias. This special issue on rape mythology provides an opportunity to explore criminal justice system biases and highlight how rape myths impede justice for victims. This study examines how unfounded classifications influence investigations and outcomes, and how rape myths shape decisions affecting victims' pursuit of justice.
3. Data and Methods
3.1. Sample
The data are derived from 5638 (out of 6071) police reports of rape from the Cleveland Division of Police (CDP), all with associated sexual assault kits (rape kits) that were recently forensically tested for DNA as part of an initiative to address untested kits. Since few SAKs were regularly submitted by this PD for forensic testing before the late 2000s (Luminais et al. 2017), our sample, derived from untested SAKs, represents almost all SAKs collected during the period. Of the 6071 narratives, 433 were excluded because they were missing, unreadable, or contained little or no text. The analyzed corpus primarily consisted of incident reports (in this jurisdiction, initial reports made before an investigation has been conducted) written by the responding officer(s). Given the police department's large size and that the narratives were written by responding officers, potentially thousands (if not tens of thousands) of officers contributed to the corpus. Because the sample is derived from incident reports associated with SAKs, almost all cases were classified by law enforcement as § Rape, defined according to state statute (Ohio Revised Code 2025) as unlawful penetration by force, threat of force, or impairment. This is consistent with the updated federal definition of rape (U.S. Department of Justice 2013). A small handful of reports were other felony‐level sex crimes, namely § Sexual Battery (coercion or impairment rape) and § Gross Sexual Imposition (groping by force, potentially attempted rape), but these were retained because all had associated SAKs and were felony‐level offenses.
3.2. Methods and Measures
Reports typically included: (a) a “front page” with fixed fields, such as dates, victim's and suspect's names, addresses, weapons, etc.; (b) a narrative of the crime taken from the victim when reported and summarized by a responding officer; and (c) (as applicable) a narrative of investigative activities as reported by detectives. The most detailed and syntactically varied account of events came from the incident narratives. Investigative summaries were often brief and written in a prescriptive style. Thus, we primarily used narratives from the incident reports. To automate the conversion from PDF to text, we used pdfMiner alongside an Optical Character Recognition program, OCRmyPDF (ocrmypdf.readthedocs.io). Of the 5638 reports, 320 were manually converted to text due to poor or atypical PDFs. After conversion, we conducted a thorough quality control (QC) process; see Lovell et al. (2022) for more information.
Unfounded refers to cases that included this designation, meaning the term was mentioned in the narrative of the investigative follow‐up or in the title or footer of the report. Victim lied/doubted refers to cases where the investigative follow‐up or the closing language mentioned that the victim was lying (including recantations) or was doubted by officers. While conducting QC, the team manually coded the case's closing reason after reading the report in its entirety. While doing so, we also manually coded whether the officer's narrative cast doubt on the victim's accounting of events (victim lied/doubted), even when this was not the closing reason, as officers do not have a specific designation for this. This is also why not all victim lied/doubted cases were unfounded: cases could mention a victim's inconsistent statements, for example, but not be unfounded. We created this specific code because we expected that if any reports were to contain victim‐blaming language, it would be these, and we wanted to flag them for comparison to the other reports. We combined victim lied (which includes recantations) with victim doubted because it is often unclear from the terse, vague reports whether the victim recanted, whether the victim said they lied or officers thought they lied about certain aspects of the rape without recanting, or whether officers doubted some aspect of the victim's account without specifically stating the victim lied. See Figure 1 (Examples 8 and 9) for illustrations. These are the closest to “false reports” as it is possible to determine in the full dataset. The team also hand‐entered data from the front sheets—dates, victim's name(s), dates of birth for the victim and suspect(s), assault locations, victim's and suspect(s)’ addresses, and victim's and suspect(s)’ race/ethnicity and gender.
Victim demographics and other case characteristics were included based on previously cited literature on the key perceptual shorthand factors within focal concerns theory that predict rape unfounding (Lapsey et al. 2022). Victim race/ethnicity and sex are based on what was indicated on the front sheet. On certain forms, Hispanic was listed as a race, while on others, it was listed as an ethnicity. Thus, Victim Hispanic refers to those who were identified as Hispanic (race or ethnicity); Victim Black refers to those identified as Black, not Hispanic; Victim White refers to those identified as White, not Hispanic; and Victim race other refers to those identified as a race other than White, Black, or Hispanic. Victim female refers to victims identified as female (male was the reference group). Victim's/suspect's age at the time of the assault was based on the date of birth (if provided) and the date of the assault. Victim's age was grouped as Victim less than 13 years of age, Victim 13‐17 years of age, or Victim 18 years of age or older. For victim characteristics, analyses pertain to the first victim listed in the report (> 98%). Manual coding revealed that when there was more than one victim, the demographics for the second victim were more prone to officer data entry errors. Suspect fully named refers to at least one suspect being fully named (first and last name) on the front sheet. Suspect/victim criminal history mentioned in the report refers to mention of suspect's or victim's prior criminal history in the narrative (e.g., prior arrests and convictions), conceptualized as suspect's blameworthiness within focal concerns. Year of the report was collected from the police report number and front sheet.
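The mutually exclusive recoding of victim demographics described above can be expressed as short functions. This is an illustrative sketch only: the field values ("Hispanic", "Black", etc.) are hypothetical stand-ins, since the actual front-sheet forms varied over time in how they recorded race and ethnicity.

```python
def code_victim_race(race, ethnicity):
    """Collapse race/ethnicity fields into the mutually exclusive
    categories described above. Hispanic takes precedence whether it
    appears as a race or an ethnicity; field values are hypothetical."""
    if race == "Hispanic" or ethnicity == "Hispanic":
        return "Hispanic"
    if race == "Black":
        return "Black"
    if race == "White":
        return "White"
    return "Other"

def code_victim_age_group(age_at_assault):
    """Group victim age at the time of the assault into the three
    categories used in the analysis."""
    if age_at_assault < 13:
        return "Victim less than 13"
    if age_at_assault <= 17:
        return "Victim 13-17"
    return "Victim 18 or older"
```

Checking Hispanic ethnicity before race ensures that a victim recorded as, say, White race and Hispanic ethnicity is coded Hispanic, matching the "not Hispanic" qualifiers on the Black and White categories.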
Before the text could be analyzed, it had to be preprocessed via tokenization—separating the text into pieces a “machine” can understand, treating white spaces and punctuation as explicit word boundaries. We then represented the tokens using the bag‐of‐words approach, which is equivalent to creating a dummy variable for each word in the corpus. This process resulted in a corpus of 3,931,481 words, 281,600 sentences, ∼9157 word‐processing pages, and 3744 unique words (a low lexical diversity of 0.008).
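The tokenization and bag-of-words steps above can be sketched in a few lines of Python. This is a minimal illustration with a toy two-sentence corpus and a simple letters-only tokenizer, not the preprocessing pipeline actually used in the study.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into word tokens, treating anything that is not a
    letter (white space, punctuation) as a boundary."""
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(documents):
    """Represent each document as counts over a shared vocabulary,
    the dummy-variable-per-word encoding described above."""
    tokenized = [tokenize(doc) for doc in documents]
    vocab = sorted({word for doc in tokenized for word in doc})
    vectors = [[Counter(doc)[word] for word in vocab] for doc in tokenized]
    return vocab, vectors

# Two invented one-sentence "narratives" as a toy corpus.
corpus = [
    "Victim stated the suspect fled the scene.",
    "Officer observed the victim was upset.",
]
vocab, vectors = bag_of_words(corpus)
total_tokens = sum(len(tokenize(doc)) for doc in corpus)
lexical_diversity = len(vocab) / total_tokens  # unique words / total words
```

Lexical diversity here is simply the ratio of unique words to total words, which is why a large corpus drawn from formulaic reports yields a very low value.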
Sentiment analysis is a valuable tool for mining text data, often used to gauge customer experiences, feedback, and online content, including website reviews, social media posts, and forum discussions. This technique is frequently applied to text that contains opinions and subjective viewpoints (Ignatow and Mihalcea 2018). Sentiment analysis categorizes text according to the degree of opinion and objectivity in the selected text (Ignatow and Mihalcea 2018). Given that police reports are formulaically written and intended to be an objective accounting of events (Reynolds 2012), we expect minimal sentiment in the corpus. However, this technique has been successfully applied in assessing tone in police reports (Lovell et al. 2023a). The narratives' sentiment was processed using a prebuilt, open‐source library (textblob.readthedocs.io). Due to the absence of training data for a machine learning‐based approach and the lack of existing sentiment analysis tools specifically trained on police reports, particularly those involving rape, we cross‐validated the TextBlob results with a rule‐based sentiment scoring method. This approach utilized SentiWordNet 3.0, a lexical resource that assigns sentiment scores to individual words and incorporates rules for handling linguistic features such as negation and adverbial modifiers (Baccianella et al. 2010). We applied this lexicon‐based method to compute the overall sentiment score for each incident report and found the results to be consistent with those produced by TextBlob. Given TextBlob's established reliability and ease of use, we ultimately adopted its sentiment scores for our analysis. A polarity score quantifies the amount of positive, neutral, or negative words in the selected text. Scores range from −1.0 to +1.0, where 0.0 is neutral, positive values indicate positive text, and negative values indicate negative text. Subjectivity refers to personal feelings, views, or beliefs (Liu and Zhang 2012). 
A subjectivity score quantifies the amount of personal opinion versus factual information in the narrative. Scores range from 0.0 to 1.0, where 0.0 is very objective and 1.0 is very subjective—the higher the score, the more personal opinion in the narrative. In this analysis, sentiment scores for polarity and subjectivity are the sum of positive and negative scores for the entire report—higher scores indicate more positive opinions (polarity) and more personally opinionated (subjective) words in the narrative. Additionally, while not a sentiment measure, we also measured the length of each report via word counts using the default tokenizer in the Python nltk package. Word counts can indicate the amount of detail provided to responding officers by victims and/or the level of effort expended in writing the report.
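To make the polarity and subjectivity scores less abstract, the following is a minimal sketch of a lexicon-based scorer in the spirit of the rule-based SentiWordNet cross-check described above. The tiny lexicon, its scores, and the negation rule are hypothetical illustrations (TextBlob's lexicon and SentiWordNet 3.0 are far larger and score word senses, not bare strings), and averaging is used here as an assumption so the report-level scores stay within the stated ranges.

```python
# Hypothetical mini-lexicon: (polarity in [-1, 1], subjectivity in [0, 1]).
LEXICON = {
    "afraid": (-0.6, 0.9),
    "upset": (-0.5, 0.8),
    "stated": (0.0, 0.0),
    "safe": (0.5, 0.6),
}
NEGATORS = {"not", "no", "never"}

def score_report(tokens):
    """Average word-level scores across a report; a simple negation
    rule flips the polarity of a word preceded by a negator."""
    pols, subs = [], []
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        pol, sub = LEXICON[tok]
        if i > 0 and tokens[i - 1] in NEGATORS:
            pol = -pol  # e.g., "not safe" becomes negative
        pols.append(pol)
        subs.append(sub)
    if not pols:  # no opinion words: neutral and fully objective
        return 0.0, 0.0
    return sum(pols) / len(pols), sum(subs) / len(subs)

polarity, subjectivity = score_report(
    "victim stated she was afraid and not safe".split()
)
```

Here "afraid" and the negated "safe" pull polarity below zero, while "stated" contributes nothing, which mirrors how mostly procedural narratives score near neutral.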
We present results for bivariate difference‐of‐means tests to determine whether polarity, subjectivity, and word counts varied for reports that were unfounded versus non‐unfounded. Based on the bivariate difference‐of‐means tests, we then conducted logistic regressions, regressing the two outcomes of interest (unfounded vs. non‐unfounded, victim lied/doubted vs. non‐victim lied/doubted) on the scores to assess whether statistically significant relationships remained after including covariates. In the logistic regressions, the outcomes were modeled separately, and the two sentiment measures were entered in separate models after multicollinearity diagnostics and the goodness‐of‐fit measure, the Hosmer and Lemeshow test (Hosmer and Lemesbow 1980), indicated that separately regressing these measures produced better‐fitting models (results not shown). Word counts and other characteristics of the victims and suspects served as covariates.
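The modeling step can be illustrated with a self-contained sketch: a plain gradient-descent logistic regression fit to synthetic data in which the unfounded outcome is associated with lower polarity and fewer words. The data, effect sizes, and optimizer are hypothetical stand-ins for the statistical software actually used, intended only to show the direction of the relationships being tested.

```python
import math
import random

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Plain gradient-descent logistic regression; an illustrative
    stand-in for the statistical package actually used."""
    n = len(X)
    n_feat = len(X[0])
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability
            err = p - yi
            grad_b += err
            for j in range(n_feat):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        for j in range(n_feat):
            w[j] -= lr * grad_w[j] / n
    return w, b

# Synthetic, hypothetical data: unfounded reports (y = 1) are simulated
# with lower polarity and fewer words (both standardized).
random.seed(0)
X, y = [], []
for _ in range(400):
    unfounded = random.random() < 0.3
    polarity = random.gauss(-1.0 if unfounded else 0.0, 1.0)
    word_count = random.gauss(-1.0 if unfounded else 0.0, 1.0)
    X.append([polarity, word_count])
    y.append(1 if unfounded else 0)

w, b = fit_logistic(X, y)
# Both coefficients come out negative in this simulation: lower polarity
# and shorter narratives associate with the unfounded outcome.
```

In the study itself, each sentiment measure was entered in a separate model alongside word counts and victim/suspect covariates, with fit assessed via the Hosmer and Lemeshow test.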
Polarity and subjectivity scores are abstract concepts, especially in rape narratives. We provide narrative examples with case characteristics, sentiment scores, word counts, and outcomes to illustrate these concepts, answering: What constitutes typical positively/negatively subjective narratives and typical unfounded and victim lied/doubted narratives? For selection, from the full sample (N = 5638) we identified the narratives with the 20 highest subjectivity scores, which previous research found predictive of positive criminal justice outcomes (Lovell et al. 2023a)—“preferred” narratives—and selected one to illustrate positive subjectivity. Next, we identified the narratives with the 20 lowest subjectivity scores, predictive of negative outcomes—“not preferred” narratives (Lovell et al. 2023a)—and selected one to show how negative subjectivity manifests. We then chose one narrative with median sentiment and subjectivity scores from the full dataset to illustrate “typical” scoring. Finally, to contextualize the tone of unfounded and victim lied/doubted narratives, we ranked all unfounded and victim lied/doubted narratives by polarity score (subjectivity was nonsignificant) and identified three examples of each. Presented narratives were modified to redact identifying information, ensuring victims would not recognize their own cases.
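The selection procedure above amounts to ranking narratives by score and pulling exemplars from the extremes and the median. A minimal sketch (narrative identifiers and scores are invented, and only two exemplars per extreme are pulled here rather than the study's 20):

```python
# Sketch of the narrative-selection procedure: rank narratives by
# subjectivity score, then pull exemplars from the bottom, top, and median.
# Identifiers and scores are invented for illustration.
scored = [("narr_a", 0.44), ("narr_b", 0.11), ("narr_c", 0.25),
          ("narr_d", 0.31), ("narr_e", 0.19)]

ranked = sorted(scored, key=lambda pair: pair[1])

bottom = ranked[:2]                # "not preferred" exemplars (lowest scores)
top = ranked[-2:]                  # "preferred" exemplars (highest scores)
median = ranked[len(ranked) // 2]  # a "typical" narrative

print(top, bottom, median)
```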
4. Results
4.1. Descriptives
Table 1 presents the descriptive statistics for the sample. Unfounded narratives comprise 6.8% of the sample (n = 386), and victim lied/doubted narratives comprise 2.8% (n = 158). A little over a quarter of the unfounded narratives mentioned that the victim lied or was doubted (27.2%; 105 of 386), implying that 281 of the 386 unfounded reports (72.8%) did not specifically mention that the victim lied or was doubted. Conversely, 105 of the 158 victim lied/doubted narratives (66.5%) were unfounded. Since these categories are not mutually exclusive, we do not directly compare the narratives of the unfounded with those of the victim lied/doubted. Table 1 also indicates that the mean polarity score is near zero, reflecting the neutral and formulaic nature of police reports.
TABLE 1.
Descriptives for total sample (N = 5638).
| Characteristics of reports, victims and suspects | All reports (N = 5638) | | Unfounded reports (N = 386) | | Victim lied/doubted reports (N = 158) | |
|---|---|---|---|---|---|---|
| | n | % | n | % | n | % |
| Decade of report, 1990s | 2280 | 41 | 76 | 19.8 | 60 | 38.5 |
| Decade of report, 2000s | 2931 | 52.6 | 272 | 70.8 | 88 | 56.4 |
| Decade of report, 2010s | 356 | 6.4 | 36 | 9.4 | 8 | 5.1 |
| Female, first victim | 5236 | 94.7 | 355 | 92.4 | 150 | 96.8 |
| Race/Ethnicity, 1st victim, Black/African American | 3547 | 64.6 | 236 | 61.6 | 90 | 58.4 |
| Race/Ethnicity 1st victim, White/Caucasian | 1790 | 32.6 | 138 | 36 | 62 | 40.3 |
| Race/Ethnicity 1st victim, hispanic (of any race) | 138 | 2.5 | 6 | 1.6 | 1 | 0.6 |
| Race/Ethnicity 1st victim, other race | 16 | 0.3 | 3 | 0.8 | 1 | 0.6 |
| Age, 1st victim, less than 13 years of age | 654 | 11.9 | 75 | 19.8 | 29 | 18.7 |
| Age, 1st victim, between 13 and 17 years of age | 1238 | 22.6 | 91 | 24.1 | 60 | 38.7 |
| Age, 1st victim, 18 years of age or older | 3585 | 65.5 | 212 | 56.1 | 66 | 42.6 |
| Male, first suspect | 4640 | 99.1 | 312 | 96.9 | 129 | 98.5 |
| Race/Ethnicity, 1st suspect, Black/African American | 3472 | 77.2 | 218 | 73.6 | 89 | 71.8 |
| Race/Ethnicity, 1st suspect, White/Caucasian | 854 | 19 | 66 | 22.3 | 29 | 23.4 |
| Race/Ethnicity, 1st suspect, hispanic (of any race) | 159 | 3.5 | 10 | 3.4 | 5 | 4.0 |
| Race/Ethnicity, 1st suspect, other | 13 | 0.3 | 2 | 0.7 | 1 | 0.8 |
| Male, second suspect | 532 | 93.2 | 40 | 100 | 23 | 100 |
| Race/Ethnicity, 2nd suspect, Black/African American | 394 | 75.5 | 31 | 81.6 | 19 | 86.4 |
| Race/Ethnicity, 2nd suspect, White/Caucasian | 110 | 21.1 | 5 | 13.2 | 2 | 9.1 |
| Race/Ethnicity, 2nd suspect, hispanic (of any race) | 15 | 2.9 | 1 | 2.6 | 1 | 4.5 |
| Race/Ethnicity, 2nd suspect, other | 3 | 0.6 | 1 | 2.6 | 0 | 0.0 |
| Any suspect fully named | 3124 | 55.3 | 187 | 48.7 | 82 | 52.9 |
| Victim criminal history mentioned in report | 27 | 0.5 | 3 | 0.8 | 3 | 1.9 |
| Suspect criminal history mentioned in report | 287 | 5.1 | 9 | 2.4 | 7 | 4.4 |
| Proceeded to prosecution | 1514 | 26.9 | 2 | 0.5 | — | — |
| Did not proceed to prosecution | 4119 | 73.1 | 384 | 99.5 | 158 | 100.0 |
| Victim not believed or lied (indicated in report) | 158 | 2.8 | 105 | 27.2 | — | — |
| Unfounded report | 386 | 6.8 | — | — | 105 | 66.5 |
| Suspect arrested | 1186 | 21.4 | 29 | 7.6 | 17 | 11.0 |
| Characteristics of reports, victims and suspects (continuous variables) | All reports (N = 5638) | | Unfounded reports (N = 386) | | Victim lied/doubted reports (N = 158) | |
|---|---|---|---|---|---|---|
| | n | M (SD); min, max | n | M (SD); min, max | n | M (SD); min, max |
| Age of victim 1 (years) | 5477 | 23.85 (12.31); 0.0, 95.0 | 378 | 23.13 (15.84); 0.0, 95.0 | 155 | 22.30 (14.28); 2.0, 70.0 |
| Age of suspect 1 (years) | 2478 | 29.37 (11.81); 6.0, 89.0 | 137 | 32.08 (13.05); 9.0, 81.0 | 60 | 31.60 (13.52); 11.0, 69.0 |
| Age of suspect 2 (years) | 169 | 24.66 (10.83); 6.0, 58.0 | 7 | 26.00 (10.38); 15.0, 43.0 | 9 | 20.56 (9.21); 14.0, 43.0 |
| Subjectivity score | 5638 | 0.2517 (0.0554); 0.1069, 0.5875 | 386 | 0.2525 (0.0559); 0.1108, 0.4402 | 158 | 0.251 (0.0594); 0.1194, 0.4310 |
| Polarity score | 5638 | −0.0120 (0.0373); −0.1394, 0.2147 | 386 | −0.0200 (0.0340); −0.1171, 0.1020 | 158 | −0.0160 (0.0410); −0.1071, 0.1183 |
| Word counts | 5638 | 415.09 (251.53); 0, 2694 | 386 | 373.99 (220.80); 0, 1349 | 158 | 399.92 (245.84); 0, 1348 |
Note: For brevity and because of small n's, we truncated after the first victim and second suspect.
TABLE 2.
Logistic regression for sentiment scores on unfounded police reports.
| Covariates | Model 1: Polarity (N = 374) | | | | Model 2: Subjectivity (N = 374) | | | |
|---|---|---|---|---|---|---|---|---|
| | Odds ratios | Sig. | Std. error | B | Odds ratios | Sig. | Std. error | B |
| Scores for incident reports | ||||||||
| Polarity score | 0.001 | *** | 1.558 | −6.916 | ||||
| Subjectivity score | 0.216 | 1.048 | −1.534 | |||||
| Control variables | ||||||||
| Word count | 0.999 | ** | 0.000 | −0.001 | 0.999 | ** | 0.000 | −0.001 |
| Victim Black a | 0.785 | * | 0.115 | −0.242 | 0.807 | 0.119 | −0.215 | |
| Victim hispanic | 0.512 | 0.430 | −0.670 | 0.449 | 0.469 | −0.802 | ||
| Victim race other | 2.931 | 0.660 | 1.075 | 2.888 | 0.668 | 1.061 | ||
| Victim female | 0.803 | 0.217 | −0.219 | 0.806 | 0.221 | −0.215 | ||
| Victim under 13 years of age b | 2.320 | *** | 0.154 | 0.841 | 2.199 | *** | 0.161 | 0.788 |
| Victim 13–17 years of age | 1.348 | * | 0.132 | 0.299 | 1.271 | 0.137 | 0.240 | |
| Suspect fully named in report | 0.776 | * | 0.115 | −0.254 | 0.615 | *** | 0.118 | −0.486 |
| Suspect criminal history mentioned in narrative | 0.480 | * | 0.370 | −0.734 | 0.484 | * | 0.371 | −0.726 |
| Victim criminal history mentioned in narrative | 2.356 | 0.635 | 0.857 | 3.233 | 0.653 | 1.173 | ||
| Constant | 0.129 | *** | 0.250 | −2.044 | 0.258 | *** | 0.380 | −1.353 |
| Chi‐square | < 0.001*** | < 0.001*** | ||||||
| Cox & Snell R‐square | 0.015 | 0.015 | ||||||
| Hosmer & Lemeshow test | 0.390 | 0.063 | ||||||
a White is reference category.
b Victim 18+ is reference group.
*p < 0.05. **p < 0.01. ***p < 0.001.
4.2. Word Counts and Sentiment Over Time
Regarding the first hypothesis, given that the unfounded reports span more than two decades, we explored variations in median sentiment (plus word counts) over time. Figure 2 provides median word counts by year; we present medians instead of means to be more conservative, given the small number of reports in some years. The results indicated little change in median word counts over the two decades and little difference between unfounded and victim lied/doubted narratives. Across all years, the median word count was 328.5 for unfounded reports and 350 for victim lied/doubted reports. In the 1990s, the median word count was 344 for unfounded reports and 351 for victim lied/doubted reports; in the 2000s (plus 2011–2012), it was 305 for unfounded reports and 324 for victim lied/doubted reports. Additionally, Figure 2 indicates that the unfounded and victim lied/doubted narratives followed comparable word count trends over time, except for an outlier unfounded report in 2011. Figure 3 presents the median polarity and subjectivity scores for the unfounded and victim lied/doubted narratives. Polarity scores remained constant over the years, with the largest variations occurring in the earliest and latest years of our data, which are also the years with the fewest reports (particularly unfounded ones). Figure 3 also illustrates that, while still relatively stable, subjectivity scores were more variable than polarity scores. For both the unfounded and victim lied/doubted narratives, there was a slight increase in the median subjectivity score in 2004, maintained for several years, followed by a steady decline in later years. Thus, these figures indicate that unfounded and victim lied/doubted narratives did not differ over time in word counts or tone.
FIGURE 2.

Word count medians for unfounded and victim lied/doubted narratives, 1993–2012.
FIGURE 3.

Polarity and subjectivity medians for unfounded and victim lied/doubted narratives, 1993–2012.
4.3. Unfounded and Victim Lied/Doubted Narratives
To test the second hypothesis, we used NLP's sentiment analysis to assess the tone of unfounded rape incident narratives compared to all others. Additionally, while not a traditional measure of sentiment, we also explored word counts for the unfounded narratives. The bivariate difference‐of‐means results show no difference in subjectivity scores for unfounded versus non‐unfounded narratives (p = 0.982). Polarity scores are more negative for unfounded (M = −0.02, SD = 0.03) than non‐unfounded narratives (M = −0.01, SD = 0.07; p < 0.001; d = 0.23), meaning more negatively opinionated reports. Lastly, unfounded narratives have lower word counts (M = 373.3, SD = 220.5) than other narratives (M = 414.5, SD = 249.0; p = 0.002; d = 0.167).1 Since the bivariate difference‐of‐means tests indicated a relationship between unfounded narratives and polarity, we conducted a logistic regression (Table 2). As with the bivariate results, the multivariate logistic regression indicates that unfounded narratives have significantly lower polarity scores, even after controlling for the covariates. Narratives with fewer words, non‐Black victims, victims under age 18, without fully named suspects, and without mentions of suspects' criminal histories are more likely to be unfounded, with strong goodness‐of‐fit metrics (Hosmer–Lemeshow p values above 0.05 indicate adequate fit) (Hosmer and Lemesbow 1980).
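The effect sizes reported above are Cohen's d values. A small sketch of the pooled-standard-deviation computation, using the word-count summary statistics reported above; the group size for the non-unfounded narratives is assumed here to be 5638 − 386 = 5252:

```python
import math

# Cohen's d with a pooled standard deviation, as commonly reported for
# difference-of-means tests like those above. Inputs are per-group summary
# statistics (mean, SD, n); the non-unfounded n of 5252 is an assumption
# (5638 total minus 386 unfounded).
def cohens_d(m1, s1, n1, m2, s2, n2):
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

# Non-unfounded vs. unfounded word counts, from the Results text.
d = cohens_d(414.5, 249.0, 5252, 373.3, 220.5, 386)
print(round(d, 3))  # ~0.167, matching the reported effect size
```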
To address the third hypothesis, we used NLP's sentiment analysis to assess the tone of the victim lied/doubted rape incident narratives compared to all others. The bivariate difference‐of‐means tests indicate that subjectivity (p = 0.860) and polarity scores (p = 0.222) did not differ significantly between reports where officers specifically mentioned that they disbelieved or doubted victims and all others. Additionally, victim lied/doubted narratives did not differ from others in word counts (p = 0.372). The logistic regression for victim lied/doubted is not presented, as the bivariate analyses were nonsignificant (see Table S1).
4.4. Qualitative Examples
Figure 1 provides examples of reports, categorized by subjectivity scores, to contextualize the findings. Example 1's narrative shows high subjectivity, written from the victim's perspective using descriptive language about trauma, threats, and rape without negatively framed phrases about what the victim failed to do or provide. Example 2 exhibits median subjectivity, presenting a neutral, objective account as factual statements—the standard guidance for report writing. However, it introduces several vulnerabilities: the victim's diminished mental capacity, prior rape reports involving the suspect, and the mother's drug use and doubt about her daughter's allegation, without explaining their relevance. The report includes negatively worded phrases like “did nothing to stop” and “not telling the truth.” Example 3 shows low subjectivity through terse, objective descriptions. It introduces vulnerabilities such as the victim's alcohol impairment and contains several negative phrases (“no clothes,” multiple instances of “didn't remember”). Notably, the victim reported that the suspect admitted to sexual contact and showed potential guilt (“if you don't remember, then I don't remember”) before fleeing when confronted. This rape was unfounded without information about investigative activities or reasoning. Since the bivariate and logistic regression analyses found more negative polarity in unfounded reports, median‐polarity examples were selected to demonstrate neutral‐toned unfounded and victim lied/doubted reports. Examples 4–6 show median‐polarity unfounded cases; Examples 7–9 show median‐polarity victim lied/doubted narratives.
5. Discussion and Conclusion
Existing research highlights the importance of assessing narratives of unfounded reports to understand this designation better; however, most existing studies using this methodology face limitations such as small(er) sample sizes, condensed observation periods, and reliance on traditional quantitative/qualitative methods. Prior research using the NLP method employed here, sentiment analysis, on the same dataset found the methodology particularly effective in examining all rape report narratives (not just the unfounded and victim lied/doubted), and the language used in rape report narratives influenced whether a case proceeded to prosecution, with negative opinions linked to earlier attrition (Lovell et al. 2023a). Another study using textual classification on the same dataset identified words or phrases predicting attrition for all reports, noting that negative tones often stemmed from negating the victim's account (in ways that align with commonly accepted rape myths) or citing insufficient evidence (e.g., “on the basis of,” “due to insufficient evidence this case”) (Lovell et al. 2023b). When more negative statements about the victim's credibility were mentioned, or hinted at, the phrases were written as neutral, terse statements of fact (e.g., “victim is known prostitute,” “juvenile had sex in the past,” “clothes not dirty or disheveled”) and were often unqualified, meaning it was unclear from the narratives why these “statements of fact” were relevant (Lovell et al. 2023a). These studies suggest that biases and rape mythology may guide responding officers' report writing and the subsequent investigation, enough so to affect case unfounding and impede justice for victims.
Building on this knowledge and overcoming prior research's methodological limitations, this study focused specifically on the cases that theoretically could have been written differently. We compared unfounded and victim lied/doubted narratives with all other narratives using NLP's sentiment analysis on a large corpus of rape narratives (N = 5638), assessing variations over 24 years within the same police department. Given the impact of rape myths and the criminal justice implications of unfounded rapes, these findings contribute significantly to the sexual assault field. To our knowledge, no other study has examined unfounded rape narratives in a sample of this size, spanning this length of time, or using NLP methodology.
Our first hypothesis—that unfounded and victim lied/doubted narratives did not significantly change in tone or word counts over the observational period—was mostly supported. Unfounded narratives and those in which officers doubted the victim's account did not change in polarity or word count over 24 years, supporting our expectation that reports tend to be written in a neutral and formulaic tone, even over nearly a quarter of a century. Our second hypothesis—that unfounded narratives were more negatively toned—was partially supported. Compared to non‐unfounded narratives, unfounded narratives were slightly more negatively toned (although mean polarity scores were near zero and the effect size was small), did not differ in subjectivity, and were shorter. These findings remained after controlling for several relevant focal concerns factors, such as suspect blameworthiness and perceptual shorthand. Thus, while the unfounded reports were shorter, they differed little in tone from all others; given the near‐zero means and small effect size, the differences in polarity scores between unfounded and non‐unfounded narratives, while statistically significant, should be interpreted cautiously. Our third hypothesis—that victim lied/doubted narratives were also more negatively toned—was not supported. These narratives—the ones in which officers were most explicit about doubting the victims' accounts and/or stating that the victims lied—did not differ in polarity, subjectivity, or word counts.
Taken together, these findings are interesting because if any reports would have been written differently, it would have been these, where officers concluded that the victim was lying or that the incident was not a crime. These reports, where victim credibility is most relevant, should include longer narratives with detailed information supporting the facts or observations that led to the unfounding. Yet unfounded reports had fewer words (41.2 fewer on average, or approximately 10%) than non‐unfounded reports, indicating that responding officers (not investigating officers) could potentially be expending less effort in capturing, or at least documenting, what reports are supposed to contain (“a step‐by‐step account of events that occurred [and] details about the people and place involved”) (Reynolds 2012, 17). In essence, officers may be signaling a lack of victim credibility through a lack of documentation, a practice supported by research (Campbell and Fehler‐Cabral 2018; Lovell et al. 2023b).
According to focal concerns theory, especially in rape cases, limited information about a suspect's blameworthiness and dangerousness influences officers' decision‐making, shaped by myths, stereotypes, and victim credibility (Lapsey et al. 2022). Under this theory, officers likely rely on myths and perceptual shorthand rather than the detailed accounts that are more common in reports that proceed to prosecution. Our findings suggest that perceptual shorthand may emerge early, in the incident report, given the lower word counts in unfounded narratives. Even after controlling for several variables within focal concerns theory, unfounded reports were slightly more negative, which prior research suggests might be due to mentions of what the victim did not know, do, or say (Lovell et al. 2023a). For example, several of the Figure 1 examples suggest that officers' unfoundings were due to a lack of victim engagement (what the victim did not “do”), a practice that runs counter to the guidelines provided in the UCR Handbook and by the IACP.
The nonsignificant findings for the victim lied/doubted narratives, even after controlling for several focal concerns variables, run counter to our expectations. These findings, along with the minimal changes in the reports over 24 years, suggest that most reports lack important details and context, even at the early stages of the process, and even when officers suspected the victim lied or doubted the victim's account. This finding might also be a function of the fact that so few cases proceeded to prosecution (~25% in these data). In other words, in our analyses the victim lied/doubted narratives were compared to reports of which approximately three‐fourths failed to proceed before being reviewed by prosecutors. As explained by focal concerns theory and supported by prior research, victim engagement (“cooperation”) at the investigative phase has been found to be the strongest predictor of investigative effort (e.g., “she has to prove she wants this … then I'll take a look” [R. Campbell and Fehler‐Cabral 2018, 89]), of arrest (Lapsey et al. 2022), and of prosecution (Lovell et al. 2021). Put simply, the narrative's phrasing and concerns about the victim's credibility become more prominent once issues with engagement are resolved.
More research is needed to better understand the role of focal concerns in officer decision‐making, as we were unable to control for several constants in these data. All cases were associated with SAKs and thus had existing evidence (focal concerns' practical constraints), even if that evidence was only recently tested. Additionally, these data came from one police department with consistently high caseloads (Luminais et al. 2017), so we were unable to assess resource constraints (also practical constraints).
The study's results cannot be generalized to all jurisdictions, rapes, or recent incidents. Like all police report‐based research, our data reflect only reported and recorded rapes, despite rape being the most underreported violent crime in the U.S. (Thompson and Tapp 2023); we lack data on unreported cases. SAK collection rates vary across jurisdictions and were about 50% here (Lovell and Dissell 2021). SAKs are more common when there is penetration (Campbell et al. 2017), a stranger or near‐stranger perpetrator (Chen and Ullman 2010; Goodman‐Williams et al. 2024), a weapon (Fisher et al. 2003), or an injury (Patterson and Campbell 2012). Lastly, all data come from a single jurisdiction's police reports, which may differ by officer and contain inaccuracies (Hawk et al. 2021).
Regarding implications and future directions, prior research has shown that unfounded narratives often lacked evidence to substantiate the designation, while other cases were filled with procedural language (Lovell et al. 2023b). That research found that victim lied/doubted narratives lacked detail but mentioned (a) friends or other persons, (b) specific terms like “lied” and “not raped,” and (c) dates or car numbers. Questions remain about who is considered credible (if not the victim) and/or who claimed the victim lied, and about the circumstances and/or evidentiary grounds presented for categorizing an unfounded report as either false or baseless. Given the lack of detail and the findings that unfounded narratives are shorter and slightly more negative, further research is needed to investigate the evidence behind this designation. The current study also found that narratives related to non‐Black victims were more often unfounded. Previous analysis of these data showed no difference in case outcomes for Black and non‐Black victims but indicated Black victims were more likely to be raped by strangers while in transit (Lovell, Crawford Fletcher, et al. 2023). What is the source of the more negative polarity for non‐Black victims? Is it related to the different types of rapes experienced by Black victims? Additionally, these findings highlight the need for a specialized crime dictionary or lexicon, preferably one focused on rape. Although we used an open‐source dictionary in our sentiment analysis, it is a general‐purpose lexicon not well suited to criminal justice data. As a result, we created a pilot open‐source lexicon tailored to these data as part of the grant's deliverables.2 Future research should replicate these methods using reports from other departments, which can be added to the lexicon, and would ideally include all reported rapes, not only those linked to SAKs.
This study makes two key contributions. First, it expands our suite of applicable methods of inquiry, enabling analysis of large textual datasets through NLP, which captures qualitative nuance at a scale previously achievable only in quantitative research. Second, it offers practical guidance for law enforcement to improve report writing and training, potentially reducing unfoundings and bias/rape myth acceptance in the justice response. This can be achieved in several ways. Best practice should include writing detailed, lengthy reports; our findings underscore that officers may signal information about a victim's credibility not by providing strong, opinionated statements but by stating little (Lovell et al. 2023a). Additionally, contrary to current practice (Lafata 2024; Miller and Whitehead 2018), best practice should involve officers writing reports that go beyond “just the facts.” Reports that use negative or neutral language also omit the victim's perspective, often merely stating that the event occurred. Best practice would be to write more subjectively, meaning writing in a way that accurately reflects the reality of such violent crimes. Prior research demonstrated that the more subjective the report, the more likely it was to proceed in the criminal justice system (Lovell et al. 2023a). This recommendation aligns with current guidance on reducing gender bias in responding to sexual assault and domestic violence (International Association of Chiefs of Police 2018; U.S. Department of Justice 2015, 2022) but contradicts traditional law enforcement training on report writing (Lafata 2024). According to current recommendations, reports should capture the victim's perspective, provide a fair account of corroborating evidence, and include the statutory elements of the crime (Archambault et al. 2020; International Association of Chiefs of Police 2018; Long et al. 2022).
Finally, research‐informed large language models or chatbots, such as those developed for family courts in the UK (Hall 2024; herEthical.ai, n.d.), show promise for saving officers' time, enhancing documentation accuracy, reducing biases, and improving interactions with victims.
Author Contributions
Rachel E. Lovell: conceptualization, data curation, methodology, supervision, writing – original draft. Lacey Caporale: investigation, writing – original draft. Jiaxin Du: formal analysis, software, methodology, validation, writing – review and editing.
Conflicts of Interest
The authors declare no conflicts of interest.
Supporting information
Table S1: Logistic regression for sentiment scores on victim lied/doubted narratives.
Acknowledgments
These are not data points but individuals who endured intense, intimate trauma—survivors. To the survivors: We hold your stories with the reverence and respect they deserve. Thank you for inadvertently sharing them with us. We aim to leverage your stories to inform and improve our response to rape.
Lovell, Rachel E., Caporale Lacey, and Du Jiaxin. 2026. “Decoding Disbelief: Using Natural Language Processing's Sentiment Analysis to Assess 24 Years of Unfounded Rape Reports Narratives.” Behavioral Sciences & the Law 44, no. 1: 47–62. 10.1002/bsl.70020.
Funding: This project was supported by Grant 2018‐VA‐CX‐0002, awarded by the National Institute of Justice (NIJ). NIJ is the research, development and evaluation agency of the United States Department of Justice, Office of Justice Programs (OJP). Points of view or opinions in this document are those of the authors and do not necessarily represent the official position or policies of the US Department of Justice.
Endnotes
1. Differences of mean scores for unfounded versus others and victim lied/doubted versus others were previously provided, along with numerous other covariates and case outcomes, in Lovell et al. (2023b). Presented here for illustrative purposes and for context as to why additional analyses were or were not conducted (e.g., logistic regressions).
Data Availability Statement
The data that support the findings of this study are archived with the U.S. National Archive of Criminal Justice Data.
References
- Archambault, J. , Lonsway K. A., and Keenan S.. 2020. “Effective Report Writing: Using the Language of Non‐Consensual Sex.” End Violence Against Women International. https://evawintl.org/wp‐content/uploads/Module‐1_Report‐Writing‐11‐9‐2020‐1.pdf. [Google Scholar]
- Baccianella, S. , Esuli A., and Sebastiani F.. 2010. “SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining.” In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10). http://www.lrec‐conf.org/proceedings/lrec2010/pdf/769_Paper.pdf. [Google Scholar]
- Campbell, R. , Feeney H., Fehler‐Cabral G., Shaw J., and Horsford S.. 2017. “The National Problem of Untested Sexual Assault Kits (SAKs): Scope, Causes, and Future Directions for Research, Policy, and Practice.” Trauma, Violence, & Abuse 18, no. 4: 363–376. 10.1177/1524838015622436. [DOI] [PubMed] [Google Scholar]
- Campbell, R. , and Fehler‐Cabral G.. 2018. “Why Police ‘Couldn’t or Wouldn’t’ Submit Sexual Assault Kits for Forensic DNA Testing: A Focal Concerns Theory Analysis of Untested Rape Kits.” Law & Society Review 52, no. 1: 73–105. 10.1111/lasr.12310. [DOI] [Google Scholar]
- Campbell, R. , and Fehler‐Cabral G.. 2022. “‘Just Bring Us the Real Ones’: The Role of Forensic Crime Laboratories in Guarding the Gateway to Justice for Sexual Assault Victims.” Journal of Interpersonal Violence 37, no. 7–8: NP3675–NP3702. 10.1177/0886260520951303. [DOI] [PubMed] [Google Scholar]
- Chen, Y. , and Ullman S. E.. 2010. “Women’s Reporting of Sexual and Physical Assaults to Police in the National Violence Against Women Survey.” Violence Against Women 16, no. 3: 262–279. 10.1177/1077801209360861. [DOI] [PubMed] [Google Scholar]
- De Zutter, A. W. E. A. , Horselenberg R., and Koppen P. J. V.. 2017. “The Prevalence of False Allegations of Rape in the United States From 2006–2010.” Journal of Forensic Psychology 2, no. 2. 10.4172/2475-319X.1000119. [DOI] [Google Scholar]
- Federal Bureau of Investigation . 1997. Uniform Crime Reports.
- Federal Bureau of Investigation . 2013. Criminal Justice Information Services (CIJS) Division Uniform Crime Reporting (UCR) Program, Summary Reporting System (SRS) User Manual, 111–116.
- Fenton, J. 2010. City Rape Statistics, Investigations Draw Concern. https://www.baltimoresun.com/news/bs‐md‐ci‐rapes‐20100519‐story.html.
- Fisher, B. S., Daigle L. E., Cullen F. T., and Turner M. G. 2003. “Reporting Sexual Victimization to the Police and Others: Results From a National-Level Study of College Women.” Criminal Justice and Behavior 30, no. 1: 6–38. 10.1177/0093854802239161.
- Goodman-Williams, R., Volz J., and Smith S. 2024. “Do Concerns About Police Reporting Vary by Assault Characteristics? Understanding the Nonreporting Decisions of Sexual Assault Victims Who Utilize Alternative Reporting Options.” Journal of Forensic Nursing 20, no. 3: 151–159. 10.1097/JFN.0000000000000469.
- Güss, C. D., Tuason T., and Devine A. 2020. “Problems With Police Reports as Data Sources: A Researchers’ Perspective.” Frontiers in Psychology 11: 582428. 10.3389/fpsyg.2020.582428.
- Hall, R. 2024. “Family Court Judges Use Victim-Blaming Language in Domestic Abuse Cases, Finds AI Project.” Guardian. https://www.theguardian.com/law/2024/oct/08/family-court-judges-victim-blaming-language-domestic-abuse-cases-ai-project.
- Hawk, S. R., Dabney D. A., and Teasdale B. 2021. “Reconsidering Homicide Clearance Research: The Utility of Multifaceted Data Collection Approaches.” Homicide Studies 25, no. 3: 195–219. 10.1177/1088767920939617.
- Heenan, M., and Murray S. 2006. Study of Reported Rapes in Victoria 2000–2003: Summary Research Report. Office of Women’s Policy, Department for Victorian Communities.
- herEthical.ai. n.d. Victim Blaming in Courts. https://www.herethical.ai/projects/project-vb.
- Hosmer, D. W., and Lemesbow S. 1980. “Goodness of Fit Tests for the Multiple Logistic Regression Model.” Communications in Statistics—Theory and Methods 9, no. 10: 1043–1069. 10.1080/03610928008827941.
- Ignatow, G., and Mihalcea R. F. 2018. An Introduction to Text Mining: Research Design, Data Collection, and Analysis. SAGE.
- International Association of Chiefs of Police. 2005. Sexual Assault Incident Reports: Investigative Strategies, 1–8.
- International Association of Chiefs of Police. 2018. Police Response to Violence Against Women. http://www.theiacp.org/Police-Response-to-Violence-Against-Women.
- Kelly, L. 2010. “The (In)Credible Words of Women: False Allegations in European Rape Research.” Violence Against Women 16, no. 12: 1345–1355. 10.1177/1077801210387748.
- Kelly, L., Lovett J., and Regan L. 2005. “A Gap or a Chasm? Attrition in Reported Rape Cases.” PsycEXTRA Dataset. 10.1037/e669452007-001.
- Lafata, C. 2024. Report Writing for Criminal Justice Professionals. Cognella. https://titles.cognella.com/report-writing-for-criminal-justice-professionals-9798823315050.
- Lapsey, D. S., Campbell B. A., and Plumlee B. T. 2022. “Focal Concerns and Police Decision Making in Sexual Assault Cases: A Systematic Review and Meta-Analysis.” Trauma, Violence, & Abuse 23, no. 4: 1220–1234. 10.1177/1524838021991285.
- Lilley, C., Willmott D., and Mojtahedi D. 2023. “Juror Characteristics on Trial: Investigating How Psychopathic Traits, Rape Attitudes, Victimization Experiences, and Juror Demographics Influence Decision-Making in an Intimate Partner Rape Trial.” Frontiers in Psychiatry 13: 1086026. 10.3389/fpsyt.2022.1086026.
- Lisak, D., Gardinier L., Nicksa S. C., and Cote A. M. 2010. “False Allegations of Sexual Assault: An Analysis of Ten Years of Reported Cases.” Violence Against Women 16, no. 12: 1318–1334. 10.1177/1077801210387747.
- Liu, B., and Zhang L. 2012. “A Survey of Opinion Mining and Sentiment Analysis.” In Mining Text Data, edited by Aggarwal C. C. and Zhai C., 415–463. Springer. 10.1007/978-1-4614-3223-4_13.
- Long, J., Powers P., Fuhrman H., and Newman J. 2022. “Prosecutorial Response to Sexual Violence.” In Sexual Assault Kits and Reforming the Response to Rape, edited by Lovell R. and Langhinrichsen-Rohling J., 1st ed., 185–201. Routledge, Taylor & Francis Group.
- Lonsway, K. A., Archambault J., and Lisak D. 2009. “False Reports: Moving Beyond the Issue to Successfully Investigate and Prosecute Non-Stranger Sexual Assault.” [Online Training]. http://www.evawintl.org/evaw_courseware.
- Lovell, R. E., and Dissell R. 2021. “Dissemination and Impact Amplified: How a Researcher–Reporter Collaboration Helped Improve the Criminal Justice Response to Victims With Untested Sexual Assault Kits.” Journal of Contemporary Criminal Justice 37, no. 2: 257–275. 10.1177/1043986221999880.
- Lovell, R. E., Overman L., Huang D., and Flannery D. J. 2021. “The Bureaucratic Burden of Identifying Your Rapist and Remaining ‘Cooperative’: What the Sexual Assault Kit Initiative Tells Us About Sexual Assault Case Attrition and Outcomes.” American Journal of Criminal Justice 46, no. 3: 528–553. 10.1007/s12103-020-09573-x.
- Lovell, R. E., Crawford Fletcher A. M., Sabo D., Overman L., and Flannery D. J. 2023. “What an Examination of Previously Untested Sexual Assault Kits Tells Us About the Patterns of Victimization and Case Outcomes for Black Women and Girls.” In Handbook on Inequalities in Sentencing and Corrections Among Marginalized Populations, edited by Ahlin E. M., Mitchell O., and Atkin-Plunk C. A., 1st ed. Routledge.
- Lovell, R. E., Klingenstein J., Du J., et al. 2022. Using Sentiment Analysis and Topic Modeling in Assessing the Impact of Police “Signaling” on Investigative and Prosecutorial Outcomes in Sexual Assault Reports. Final Research Report. National Institute of Justice: 2018-VA-CX-0002.
- Lovell, R. E., Klingenstein J., Du J., et al. 2023a. “Using Natural Language Processing to Assess Rape Reports: Sentiment Analysis Detection of Officers’ ‘Signaling’ About Victims’ Credibility.” Journal of Criminal Justice 88: 102106. 10.1016/j.jcrimjus.2023.102106.
- Lovell, R. E., Klingenstein J., Du J., et al. 2023b. “Using Natural Language Processing to Assess Rape Reports: ‘Signaling’ Words About Victims’ Credibility That Predict Investigative and Prosecutorial Outcomes.” Journal of Criminal Justice 88: 102107. 10.1016/j.jcrimjus.2023.102107.
- Luminais, M., Lovell R., and Flannery D. 2017. Perceptions of Why the Sexual Assault Kit Backlog Exists in Cuyahoga County, Ohio and Recommendations for Improving Practice. Case Western Reserve University. https://digital.case.edu/islandora/object/ksl:2006061457.
- Maryland Coalition Against Sexual Assault. 2011. Baltimore City Sexual Assault Response Team Annual Report. http://www.ncdsv.org/images/MCASA_BaltimoreCitySARTAnnualReport_October2011.pdf.
- Miller, L. S., and Whitehead J. T. 2018. Report Writing for Criminal Justice Professionals. 6th ed. Routledge. https://www.routledge.com/Report-Writing-for-Criminal-Justice-Professionals/Miller-Whitehead/p/book/9781138288935.
- Morabito, M. S., Pattavina A., and Williams L. M. 2017. “Active Representation and Police Response to Sexual Assault Complaints.” Journal of Crime and Justice 40, no. 1: 20–33. 10.1080/0735648X.2016.1216730.
- Mourtgos, S. M., and Adams I. T. 2019. “The Rhetoric of De-Policing: Evaluating Open-Ended Survey Responses From Police Officers With Machine Learning-Based Structural Topic Modeling.” Journal of Criminal Justice 64: 101627. 10.1016/j.jcrimjus.2019.101627.
- Ohio Revised Code, 2907.02 Rape. 2025. https://codes.ohio.gov/ohio-revised-code/section-2907.02.
- O’Neal, E. N. 2019. “‘Victim Is Not Credible’: The Influence of Rape Culture on Police Perceptions of Sexual Assault Complainants.” Justice Quarterly 36, no. 1: 127–160. 10.1080/07418825.2017.1406977.
- O’Neal, E. N., and Hayes B. E. 2020. “‘Most [False Reports] Involve Teens’: Officer Attitudes Toward Teenage Sexual Assault Complainants—A Qualitative Analysis.” Violence Against Women 26, no. 1: 24–45. 10.1177/1077801219828537.
- O’Neal, E. N., and Spohn C. 2017. “When the Perpetrator Is a Partner: Arrest and Charging Decisions in Intimate Partner Sexual Assault Cases—A Focal Concerns Analysis.” Violence Against Women 23, no. 6: 707–729. 10.1177/1077801216650289.
- Page, A. D. 2008. “Judging Women and Defining Crime: Police Officers’ Attitudes Toward Women and Rape.” Sociological Spectrum 28, no. 4: 389–411. 10.1080/02732170802053621.
- Page, A. D. 2010. “True Colors: Police Officers and Rape Myth Acceptance.” Feminist Criminology 5, no. 4: 315–334. 10.1177/1557085110384108.
- Pattavina, A., Morabito M., and Williams L. M. 2016. “Examining Connections Between the Police and Prosecution in Sexual Assault Case Processing: Does the Use of Exceptional Clearance Facilitate a Downstream Orientation?” Victims and Offenders 11, no. 2: 315–334. 10.1080/15564886.2015.1046622.
- Patterson, D., and Campbell R. 2012. “The Problem of Untested Sexual Assault Kits: Why Are Some Kits Never Submitted to a Crime Laboratory?” Journal of Interpersonal Violence 27, no. 11: 2259–2275. 10.1177/0886260511432155.
- Quijano-Sánchez, L., Liberatore F., Camacho-Collados J., and Camacho-Collados M. 2018. “Applying Automatic Text-Based Detection of Deceptive Language to Police Reports: Extracting Behavioral Patterns From a Multi-Step Classification Model to Understand How We Lie to the Police.” Knowledge-Based Systems 149: 155–168. 10.1016/j.knosys.2018.03.010.
- Reynolds, J. 2012. Criminal Justice Report Writing. CreateSpace Independent Publishing Platform.
- Shaw, J., Campbell R., Cain D., and Feeney H. 2017. “Beyond Surveys and Scales: How Rape Myths Manifest in Sexual Assault Police Records.” Psychology of Violence 7, no. 4: 602–614. 10.1037/vio0000072.
- Spohn, C., White C., and Tellis K. 2014. “Unfounding Sexual Assault: Examining the Decision to Unfound and Identifying False Reports.” Law & Society Review 48, no. 1: 161–192. 10.1111/lasr.12060.
- Steffensmeier, D., Ulmer J. T., and Kramer J. H. 1998. “The Interaction of Race, Gender, and Age in Criminal Sentencing: The Punishment Cost of Being Young, Black, and Male.” Criminology 36, no. 4: 763–798. 10.1111/j.1745-9125.1998.tb01265.x.
- Temkin, J., Gray J. M., and Barrett J. 2018. “Different Functions of Rape Myth Use in Court: Findings From a Trial Observation Study.” Feminist Criminology 13, no. 2: 205–226. 10.1177/1557085116661627.
- Thompson, A., and Tapp S. N. 2023. Criminal Victimization, 2022. Bureau of Justice Statistics, U.S. Department of Justice.
- Uniform Crime Report. 2004. Uniform Crime Reporting Handbook. FBI.
- United States Department of Justice. 2013. Crime in the United States, 2013, 1–2. https://ucr.fbi.gov/crime-in-the-u.s/2013/crime-in-the-u.s.-2013/violent-crime/rape.
- United States Department of Justice. 2015. Identifying and Preventing Gender Bias in Law Enforcement Response to Sexual Assault and Domestic Violence. https://www.justice.gov/opa/file/799366/download.
- Venema, R. M., Lorenz K., and Sweda N. 2021. “Unfounded, Cleared, or Cleared by Exceptional Means: Sexual Assault Case Outcomes From 1999 to 2014.” Journal of Interpersonal Violence 36, no. 19–20: NP10688–NP10719. 10.1177/0886260519876718.
- Women’s Law Project. 2010. Rape in the United States: The Chronic Failure to Report and Investigate Rape Cases. Senate Committee on the Judiciary Subcommittee on Crime and Drugs.
- Women’s Law Project. 2013. Advocacy to Improve Police Response to Sex Crimes, 1–11. [Policy Brief].
- Yu, H., and Monas N. 2020. “Recreating the Scene: An Investigation of Police Report Writing.” Journal of Technical Writing and Communication 50, no. 1: 35–55. 10.1177/0047281618812441.
- Yung, C. R. 2014. “How to Lie With Rape Statistics: America’s Hidden Rape Crisis.” Iowa Law Review 99, no. 3: 1197–1256. https://ilr.law.uiowa.edu/print/volume-99-issue-3/how-to-lie-with-rape-statistics-americas-hidden-rape-crisis.
Associated Data
Supplementary Materials
Table S1: Logistic regression for sentiment scores on victim lied/doubted narratives.
Data Availability Statement
The data that support the findings of this study are archived with the U.S. National Archive of Criminal Justice Data.
