Proceedings of the National Academy of Sciences of the United States of America
2025 Feb 25;122(9):e2418821122. doi: 10.1073/pnas.2418821122

Measuring criticism of the police in the local news media using large language models

Logan Crowl a,b, Sujan Dutta c, Ashiqur R KhudaBukhsh c, Edson Severnini d, Daniel S Nagin a,1
PMCID: PMC11892639  PMID: 39999180

Significance

A large body of evidence documents political polarization in national reporting of divisive topics. This study examines this issue empirically for local media reporting using large language models. Our focus is coverage of the police. Contrary to public perception, we find no evidence of an increasing trend in critical reporting of the police over the past decade. While spikes in critical coverage coincide with high-profile national incidents of police killings, coverage quickly returns to its trendless baseline level. We also find little difference between the reporting in more conservative and more liberal cities. Taken as a whole, these results suggest local media coverage of the police has not succumbed to the partisanship that objective reporting aims to avoid.

Keywords: police, media, journalism, transfer learning, natural language inference

Abstract

High-profile incidents of police violence against Black citizens over the past decade have spawned contentious debates in the United States on the role of police. This debate has played out prominently in the news media, leading to a perception that media outlets have become more critical of the police. There is currently, however, little empirical evidence supporting this perceived shift. We construct a large dataset of local news reporting on the police from 2013 to 2023 in 10 politically diverse U.S. cities. Leveraging advanced language models, we measure criticism by analyzing whether reporting supports or is critical of two contentions: 1) that the police protect citizens and 2) that the police are racist. To validate this approach, we collect labels from members of different political parties. We find that contrary to public perceptions, local media criticism of the police has remained relatively stable along these two dimensions over the past decade. While criticism spiked in the aftermath of high-profile police killings, such as George Floyd’s murder, these events did not produce sustained increases in negative police news. In fact, reporting supportive of police effectiveness has increased slightly since Floyd’s death. We find only small differences in coverage trends in more conservative and more liberal cities, undermining the idea that local outlets cater to the politics of their audiences. Last, although Republicans are more likely to view a piece of news as supportive of the police than Democrats, readers across parties see reporting as no more critical than it was a decade ago.


The murder of George Floyd by Minneapolis Police Officer Derek Chauvin in the summer of 2020 sparked massive nationwide protests and brought policing under intense scrutiny. Although distinct in its intensity, Floyd’s murder punctuated a decade of policing defined by tragic images of police violence against Black Americans, including but not limited to the deaths of Eric Garner, Michael Brown, Freddie Gray, Alton Sterling, Philando Castile, and Breonna Taylor. While these types of violent incidents are not new (1), they have increasingly received national attention over the past ten years, with social media accelerating the spread of news stories capturing the events, their aftermath, and the larger debates around racial bias in policing and police misconduct. While personal experiences with the police—whether direct or through the accounts of friends and family—are particularly influential on individual attitudes (2, 3), most Americans do not have regular contact with law enforcement (4). As a result, the news media plays an important intermediary role in shaping public opinion of the police, affecting, for example, perceptions of legitimacy, police effectiveness, and public appetite for accountability (5–11). Prior studies of media reporting on the police mostly focus on the episodic reporting of national-level outlets after major use-of-force incidents. We instead examine local media reporting in ten large U.S. cities, which allows us to observe regional differences in reporting from outlets that frequently cover these local institutions. We focus on two important, unexamined questions: Has local news become more critical of the police, and has that reporting become more politically polarized over time?

The attention given to high-profile events of lethal police violence, coupled with declining public confidence in the police (12, 13), suggests that media reporting may have become more critical of police departments and their officers. Americans certainly seem to perceive a shift in the news. For this study, we surveyed 500 representative U.S. residents on their perception of media treatment of the police (for more details, see SI Appendix). 54% of respondents believe that local media reporting on the police has become more critical in the last decade, while only 19% think it has become less critical. For the national media, 70% say coverage has become more critical, with just 11% believing it has become less critical. For Republican, Independent, and Democratic respondents alike, the most common perception is that media criticism of the police has increased. There is, however, scant empirical evidence on how coverage of the police has changed over time. Thus, it is unclear if reporting on the police has indeed become more critical overall.

A further complication comes from the politically polarized media landscape, with partisan outlets proliferating in the past decade as more consumers look to the internet to find news compatible with their political views. Reporting on politically divisive topics—including climate change, COVID-19, and abortion—has become more polarized based on audience politics (14–16). Meanwhile, few topics have become more politically divisive than policing, with a large, widening partisan divide in attitudes toward the police (17). From 2016 to 2020, the percentage of Democrats who said the police are doing a good/excellent job of “protecting people from crime” declined from 53% to 43%. Over that same time, Republican agreement with that statement rose from 74% to 78% (18). These political divisions are reflected in the larger discussion around policing and its role in society, with support, reform, and abolition of the police all becoming political rallying cries for candidates of different political affiliations.

The partisan division in the debate over policing suggests that media coverage of the police may depend heavily on the political composition of an outlet’s audience. For example, Dutta et al. established that television coverage of the police by the cable news networks CNN, Fox News, and MSNBC diverged along political lines after George Floyd’s murder, the Capitol riot on January 6, 2021, and the sentencing of Derek Chauvin for the murder of Floyd in March 2021 (19). Local media, however, may have a different relationship with its audience. In our survey, 41% of Republicans felt the police were treated fairly by local media compared to just 17% for national media (SI Appendix, Fig. S2). Furthermore, national reporting on the police typically does not cover a news consumer’s local police department, whereas police activity has long been a staple of regional news publishers through reporting on crime (20–22). Therefore, the coverage that police departments receive from local news media organizations is important to their image and may not follow the same political trends that define the national discourse.

The goal of this study is to ground these questions in empirical evidence by measuring how media criticism of the police has changed over the past decade, with specific attention paid to differences based on regional politics and reader beliefs. Is the perception that news on the police has become more critical supported when considering the full spectrum of coverage? The immense quantity of news involving the police and the difficulty of measuring meaningful characteristics of written text means that existing research on how the police are portrayed in the news is often limited to case studies of high-profile police shootings (11, 23, 24). A thorough accounting of media support and criticism, however, necessitates an expansion of scope beyond these highly salient examples. Thus, we investigate how the police are covered in the news on a typical day, not just when they make headlines.

1. Results

1.1. Data.

To measure media criticism of the police, we have constructed a large dataset of news articles published from 2013 to 2023 by local outlets for a set of 10 U.S. cities. Because of the potential importance of audience politics, our sample contains both Republican-leaning and Democratic-leaning areas. Since large cities in the United States tend to be more Democratic, we began by selecting from cities with the largest number of recent Republican mayors (since 2000), choosing Dallas-Fort Worth, San Diego, Jacksonville, Oklahoma City, and Omaha. We then selected Democratic-controlled cities—Houston, Denver, Tampa, Nashville, and Pittsburgh—as counterparts based on population, geographic region, and racial composition. These metropolitan areas represent an estimated 33,928,790 Americans as of 2023 (25).

We then use Google News and Newsbank to identify relevant news articles published in each city at different times in our study period. We remove any articles written by the Associated Press, newswire services, or other nonlocal outlets. The remaining articles are primarily from the largest newspapers and television stations in each city and cover topics ranging from local crime to police misconduct locally and nationally.

From these articles, we study all sentences and headlines that explicitly reference the police. We chose this unit of analysis and subset of sentences for several reasons. There is a trade-off between providing sufficient context and making judgments easier and more reliable. Working with sentences mentioning the police, instead of entire articles or excerpts, allows us to collect more judgments from readers, ensures those judgments are manageable and based on actual references to the police, and gives us a more granular view of supportive and critical reporting. Second, previous work on cable news coverage of the police used this unit of analysis and found significant time trends in reporting (19), suggesting individual sentences can provide enough information to detect changes in criticism of the police. We recognize, however, that a focus on sentences may miss important context. For this reason, we report additional results for different units of analysis showing that our main findings are robust to this choice.
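As a rough illustration, the sentence-extraction step described above can be sketched as follows. The keyword list, matching rules, and naive sentence splitter are assumptions made for the example; the paper does not specify its exact procedure.

```python
import re

# Hypothetical keyword list; the paper's actual matching rules are not specified.
POLICE_TERMS = re.compile(
    r"\b(police|officer|officers|cop|cops|sheriff|deputy)\b", re.IGNORECASE
)

def split_sentences(text):
    """Naive splitter on terminal punctuation (a stand-in for a real NLP splitter)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def police_sentences(article_text):
    """Return only the sentences that explicitly reference the police."""
    return [s for s in split_sentences(article_text) if POLICE_TERMS.search(s)]

article = ("The city council met Tuesday. Police arrested two suspects downtown. "
           "Residents praised the new park. An officer was placed on leave.")
print(police_sentences(article))
# ['Police arrested two suspects downtown.', 'An officer was placed on leave.']
```

In practice a production pipeline would use a trained sentence segmenter and a broader entity lexicon, but the unit of analysis is the same: each matching sentence becomes one observation.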

In total, our dataset includes 1,317,855 sentences and headlines mentioning the police from 250,169 different articles published by 209 unique outlets (for more information, see Materials and Methods and SI Appendix).

1.2. Measuring Criticism.

To detect shifts in news coverage, we need to infer if a piece of reporting in our dataset is supportive or critical of the police. Traditional sentiment analysis, which studies the emotional tone of text, is poorly suited for our context because it struggles to separate coverage that is critical of the police from negative news, such as reporting on a murder, where the police are involved—a particularly relevant distinction given the police’s role in regular reporting on crime (26). We instead cast the problem of measuring criticism as a natural language inference task (19, 27, 28). We ask the directed question: Would a person reading this piece of reporting view it as supporting, neutral toward, or contradicting a specific opinion (“hypothesis”) about the police?

We focus on two relevant hypotheses about the police: that “the police protect us” (Hprotect) and that “the police are racist” (Hracist). Our first hypothesis, “police protect us,” is grounded in prior computational social science literature (19); it provides a measure of support for the idea that the police are effective at providing safety—a key goal of American police departments where “protect” features prominently in many departmental mission statements. We also select “police are racist” because systemic racism in policing has become a focal point in U.S. policing discourse, particularly after George Floyd’s murder (29–31). Although not the totality of relevant opinions on the police, these hypotheses cover two key aspects of the discourse around the police. Therefore, by taking this approach, we can move beyond simply asking whether a piece of writing on the police is generally positive or negative, and instead, highlight specific dimensions of criticism.
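The NLI framing pairs each sentence (the premise) with a fixed hypothesis and asks for a three-way judgment. A minimal sketch of the input structure, with the hypothesis wording taken from the paper and everything else illustrative:

```python
# Hypothesis strings follow the paper; the data layout is our own illustration.
HYPOTHESES = {
    "Hprotect": "The police protect us.",
    "Hracist": "The police are racist.",
}
LABELS = ("supporting", "neutral", "contradicting")

def make_nli_inputs(sentence):
    """Pair one piece of reporting with each hypothesis for a three-way NLI judgment."""
    return [
        {"premise": sentence, "hypothesis": h, "label_space": LABELS}
        for h in HYPOTHESES.values()
    ]

pairs = make_nli_inputs("Officers were cleared of wrongdoing in the shooting.")
print(len(pairs))  # one premise-hypothesis pair per hypothesis -> 2
```

Each police sentence in the corpus thus yields two classification instances, one per hypothesis, rather than a single positive/negative sentiment score.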

Measuring criticism for our entire dataset requires predicting whether a reader would view a sentence as “supporting,” “contradicting,” or “neutral” to a given hypothesis. To do this, we rely on two well-known large language models (LLMs): Mistral 7B (32) and LLaMa 2 13B (33). Modern LLMs are remarkably flexible, setting state-of-the-art benchmarks across a variety of language tasks. Their general language skills are particularly valuable in transfer learning settings, where pretrained base models are fine-tuned using context-specific examples (34, 35).

1.3. Reader Perceptions of Criticism.

To ensure our results align with how actual readers perceive police criticism, we collect judgments for a subset of the police news in our dataset from members of different political parties. Specifically, we select 3,600 sentences from our dataset using different active learning strategies and ask three readers—one Republican, one Democrat, and one Independent—whether they perceive the reporting as supporting, neutral to, or contradicting our two hypotheses. All participants were recruited through Prolific, an online platform for conducting research studies involving human subjects. Furthermore, our choice to balance our survey across political affiliation is based on the well-established finding that online survey platforms tend to have more liberal participants than population-based samples (36–38) and that readers from different political parties differ significantly in how they perceive content (39, 40).

Indeed, readers of different political affiliations frequently disagree in their judgments of the same piece of news. Specifically, Fig. 1 shows that Republican readers are more likely to see a sentence as supporting “police protect us” and contradicting “police are racist” than Democratic readers. Independents tend to fall between their political peers, making them the most frequent tiebreaker.
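With one judgment per party, the consensus label amounts to a majority vote over three responses. A sketch of that rule; the fallback when all three readers disagree is our own assumption (the paper notes Independents are the most frequent tiebreaker but does not spell out a rule for three-way disagreement):

```python
from collections import Counter

def consensus_label(judgments):
    """Majority vote over one label per party.

    judgments maps {'Republican', 'Democrat', 'Independent'} to one of
    'supporting', 'neutral', 'contradicting'. Assumption: if all three
    readers disagree, defer to the Independent reader's label.
    """
    counts = Counter(judgments.values())
    label, n = counts.most_common(1)[0]
    return label if n >= 2 else judgments["Independent"]

print(consensus_label({"Republican": "supporting",
                       "Democrat": "neutral",
                       "Independent": "supporting"}))  # -> supporting
```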

Fig. 1.

The percentage of police sentences (N = 3,600) judged by readers of different political parties as supporting, neutral to, or contradicting two hypotheses: “police protect us” and “police are racist.” Each sentence is read by three readers, one from each political party, so the consensus label is the majority response.

1.4. Consensus Model.

For our primary model, we fine-tune Mistral and LLaMa 2 models to predict the majority label between the three readers, giving us a sense of the consensus perception across political beliefs. We select models based on out-of-sample F1 score (macro) and evaluate performance on a separate test set. Our chosen model uses Mistral and has F1 = 73.1 (Table 1), giving it performance similar to prior work modeling media coverage of the police (19).

Table 1.

F1 scores (macro) between model predictions and test set labels from readers of different political parties

                       Label
Model          Consensus       Democratic      Independent     Republican
Consensus      73.11 ± 1.71    64.57 ± 1.75    63.73 ± 1.71    63.29 ± 1.68
Democratic     72.36 ± 1.62    65.67 ± 1.64    65.12 ± 1.66    62.11 ± 1.63
Independent    72.98 ± 1.62    64.70 ± 1.69    66.65 ± 1.64    62.35 ± 1.69
Republican     71.58 ± 1.57    64.09 ± 1.64    64.18 ± 1.70    63.81 ± 1.56

Performance is evaluated across 1,000 bootstrap samples of the test set, with averages and SDs of scores reported. In each column, the highest average score indicates the model with the best performance for that label type.
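The evaluation procedure described in the table note can be sketched in a few lines. The macro F1 and bootstrap logic below follow standard definitions; the toy labels are illustrative, and a production setup would use a library implementation such as scikit-learn.

```python
import random

LABELS = ("supporting", "neutral", "contradicting")

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over the three labels."""
    scores = []
    for c in LABELS:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(LABELS)

def bootstrap_f1(y_true, y_pred, n_boot=1000, seed=0):
    """Mean and SD of macro F1 over bootstrap resamples of the test set."""
    rng = random.Random(seed)
    n = len(y_true)
    samples = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(macro_f1([y_true[i] for i in idx],
                                [y_pred[i] for i in idx]))
    mean = sum(samples) / n_boot
    sd = (sum((s - mean) ** 2 for s in samples) / n_boot) ** 0.5
    return mean, sd

# One class predicted perfectly, the rest missed entirely:
print(round(macro_f1(["supporting", "neutral"],
                     ["supporting", "supporting"]), 3))  # -> 0.222
```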

With our consensus model, we can approximate how a politically diverse set of readers would judge the full set of reporting in our dataset. Fig. 2 summarizes trends in predicted support and contradiction for our two hypotheses. The Top figures show the percentage of monthly sentences mentioning the police predicted as supporting or contradicting a given hypothesis (the percentage predicted as “neutral” has been omitted). To test for time trends, we estimate four linear probability models, regressing indicators for predicted support of and contradiction with each hypothesis on city and quarter fixed effects. The Bottom plots report coefficient estimates with 95% CIs for each quarter; these coefficients indicate percentage point differences in support or contradiction relative to Q2 2018, the midpoint of our sample. We use the midpoint as our reference level because of the smaller sample sizes for earlier quarters in our dataset.
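Abstracting away the city fixed effects, the quarter coefficients amount to differences in support rates relative to the Q2 2018 reference quarter. A single-city simplification of that contrast (the real models are estimated jointly across cities with clustered SEs, which this sketch omits):

```python
from collections import defaultdict

def quarterly_differences(records, reference="2018Q2"):
    """records: iterable of (quarter, indicator) pairs with indicator in {0, 1}.

    Returns each quarter's support rate minus the reference quarter's rate,
    i.e. the percentage-point contrasts a saturated single-city linear
    probability model's quarter coefficients would capture.
    """
    totals = defaultdict(lambda: [0, 0])  # quarter -> [support count, n]
    for quarter, y in records:
        totals[quarter][0] += y
        totals[quarter][1] += 1
    rates = {q: s / n for q, (s, n) in totals.items()}
    base = rates[reference]
    return {q: r - base for q, r in rates.items()}

data = [("2018Q2", 1), ("2018Q2", 0),
        ("2020Q2", 0), ("2020Q2", 0), ("2020Q2", 1), ("2020Q2", 0)]
print(quarterly_differences(data))  # {'2018Q2': 0.0, '2020Q2': -0.25}
```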

Fig. 2.

Temporal trends in the proportion of police sentences predicted as supporting and contradicting the hypotheses, “police protect us” (Left) and “police are racist” (Right). Predictions come from a fine-tuned Mistral model trained to predict the majority vote label between Democratic, Republican, and Independent readers. The Top figures show the percentage of monthly police sentences from all local publishers predicted as supportive and critical. The four Bottom figures show 95% CIs for the difference in proportions for each quarter relative to the midpoint of our sample (Q2 2018). Coefficient estimates come from linear probability models regressing model prediction indicators for each hypothesis on city and quarter fixed effects, and SEs are clustered by city.

Overall, we see remarkable stability in the levels of supportive and critical coverage over the past decade, temporary spikes in criticism after prominent incidents of police violence, and evidence of a modest increase in supportive news in recent years. During the earlier portion of our study period, from 2013 to 2019, 35 to 40% of monthly sentences mentioning the police supported the idea that police protect us, while only about 10% contradicted it. In addition, reporting that insinuated or refuted police racial bias was virtually nonexistent at the beginning of our sample and eventually rose to levels of 1 to 2%.

The first notable deviation from this pattern happened in 2014, with several spikes in media criticism coinciding with prominent incidents of police violence. Specifically, we see support for Hracist rise 3 to 5 percentage points and contradiction with Hprotect jump roughly 7 percentage points. These increases in criticism, however, did not translate to sustained changes in coverage. In general, the period from 2013 to 2019 is defined by consistent levels of criticism and support of the police, interrupted by upticks in critical reporting surrounding high-profile police killings.

The most dramatic change in coverage occurred after the murder of George Floyd in May 2020. Unlike previous stories about the deaths of unarmed Black citizens at the hands of police officers, George Floyd’s murder and the subsequent nationwide protests against police violence drastically changed the tenor of police coverage. In June 2020, the rate at which sentences contradict the idea that police protect us reached 36.2%, a rise of nearly 30 percentage points from two months prior. At the same time, support for Hprotect dropped 15 points. Unsurprisingly, the rate at which sentences imply the police are racist rose dramatically during this period. In the months prior to George Floyd’s murder, coverage supportive of Hracist dropped to less than 1%, likely a result of changes in coverage and police activity caused by the onset of the COVID-19 pandemic in the United States. In June 2020, however, 13.6% of sentences mentioning the police suggested they are racist, far and away the highest monthly rate of support for that hypothesis during our sample period.

The explosion of media criticism at a time when phrases like “defund the police” entered the public consciousness is striking but unsurprising. What is somewhat unexpected is what has happened since. From 2021 through the end of 2023, we observe support for Hracist gradually return to pre-George Floyd levels, while support for the idea that police protect us becomes more common. Beyond just rebounding, support for Hprotect reaches its historical norm of roughly 40% midway through 2021 and then continues to rise to 46.2% at the end of our sample period in October of 2023.

These results indicate that coverage implying racial bias in policing is generally no more common today than it was before George Floyd’s murder. Conversely, reporting in recent years has become more supportive of the idea that police protect citizens. Therefore, we find evidence that the national reckoning on the harms of policing has not produced a sustained increase in media criticism for local outlets and may have instead ushered in slightly more supportive coverage of the police overall.

1.5. Political Differences in Criticism of the Police.

There are two important types of political variation in our sample. First, there may be differences in coverage due to the overall political affiliation of a given publisher’s audience, where certain cities in our sample are identified as conservative or liberal leaning based on local election results. Partisan polarization in media, particularly at the national level, suggests that content targeting a more Republican audience may differ from that targeting a more Democratic audience. Using our primary model, we examine whether coverage trends are different between liberal and conservative cities. Second, there are differences in how the same piece of reporting is perceived by readers of different political affiliations. Since all of our labeled sentences are read by one person from each political party, we can explore whether Republicans, Democrats, and Independents perceive different trends in reporting on the police.

1.5.1. City politics.

Fig. 3 shows support and contradiction rates for our two hypotheses separately for the conservative and liberal cities. Notably, there do not appear to be major differences in trends for these different sets of cities. For both the conservative and liberal cities, our overall finding that coverage has maintained a consistent level of criticism over the decade holds.

Fig. 3.

Temporal trends in predictions from our consensus model by the political leanings of each city. Republican-leaning cities (Left) include Dallas-Fort Worth, San Diego, Jacksonville, Oklahoma City, and Omaha. Democratic-leaning cities (Right) include Houston, Denver, Tampa, Nashville, and Pittsburgh.

To further support this claim, we regress prediction indicators on city fixed effects and an interaction between quarter and city political leaning. We do not find evidence of significant differences in time trends in support and contradiction for liberal compared to conservative cities (SI Appendix, Fig. S4). However, conservative cities do appear to have slightly lower levels of criticism overall. In models specified with quarter and city politics fixed effects, we find that support and contradiction rates for Hprotect both have a difference of roughly 2 percentage points on average between conservative and liberal cities, with conservative cities having more supportive and less critical coverage. Similarly, conservative cities are slightly less likely to see reporting suggesting racial bias in policing than liberal cities, with support rates 0.4 percentage points lower and contradiction rates 0.1 percentage points higher on average. While these level differences are significant at the 5% level, their absolute magnitudes are small.

1.5.2. Reader politics.

Since readers of different political affiliations regularly disagree in their judgments of the same piece of news, we train three additional Mistral models to mirror the labels from each party separately. These models allow us to see whether Republican, Democratic, and Independent readers perceive different trends in criticism of the police.

Fig. 4 shows the trends in predicted support and contradiction for each political model. We see that Republican readers are more likely to interpret sentences involving the police as supporting Hprotect and contradicting Hracist than Democrats and Independents. Compared to Republicans, Democratic readers perceive a larger spike in criticism after George Floyd’s murder followed by a more pronounced increase in support for Hprotect from 2020 to 2023. Notably, these level differences are much larger than we observed for the differences between liberal and conservative cities, indicating that perceived differences in coverage for different political groups are likely driven more by differences in reader judgments than the reporting itself. However, we generally do not see significant differences in the perceived time trends in reporting on the police for readers of different political backgrounds (SI Appendix, Fig. S5). Our models are based on current-day judgments, so we cannot determine how perceptions have varied over time, but contemporary readers of all parties perceive local news as roughly equally critical of the police today as a decade ago.

Fig. 4.

Temporal trends in support and contradiction predictions for models trained separately on judgments from Democratic, Independent, and Republican respondents. Blue and red lines indicate the percentage of monthly sentences mentioning the police that each model predicts as supporting or contradicting a given hypothesis.

1.5.3. Adding context.

We earlier acknowledged that our use of a single sentence as the unit of analysis may miss important context. For this reason, we conducted two supplementary analyses to address this concern. First, we expand our scope to consider the sentences preceding and succeeding each mention of the police. For each three-sentence window, we use our consensus model to see whether this added context changes the trends in criticism. SI Appendix, Fig. S6 shows that overall levels of support for Hprotect and Hracist are higher (roughly 6 and 1 percentage points, respectively) when including the additional context of nearby sentences, but there is still no trend in reporting critical of the police over time. Our main findings for differences between Democratic-leaning and Republican-leaning cities—no trend in either subset of cities and only a small difference in their levels of criticism—also hold (SI Appendix, Figs. S7 and S8). In our second robustness check, we consider whether criticism looks the same when aggregating over articles instead of sentences. To check this, we take the percentage of an article’s references to the police that are predicted as supportive of, neutral toward, or contradicting each hypothesis and average these percentages each month, so each article is weighted equally. SI Appendix, Figs. S9–S11 show that, again, we find no evidence of a trend in articles critical of the police overall or in cities with different political leanings, but we do see a modest increase in articles supportive of police effectiveness in recent years.
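The article-level aggregation in the second robustness check can be sketched as follows; field names and label strings here are illustrative. Each article contributes its own within-article share, and months average those shares so that long articles are not overweighted.

```python
from collections import defaultdict

def monthly_article_support(predictions):
    """predictions: iterable of (month, article_id, label) for each police sentence.

    Computes, per article, the share of its police references labeled
    'supporting', then averages those shares within each month so every
    article counts equally regardless of its length.
    """
    per_article = defaultdict(lambda: [0, 0])  # (month, article) -> [supporting, total]
    for month, article, label in predictions:
        key = (month, article)
        per_article[key][0] += label == "supporting"
        per_article[key][1] += 1
    by_month = defaultdict(list)
    for (month, _), (s, n) in per_article.items():
        by_month[month].append(s / n)
    return {m: sum(shares) / len(shares) for m, shares in by_month.items()}

preds = [("2020-06", "a1", "supporting"), ("2020-06", "a1", "contradicting"),
         ("2020-06", "a2", "supporting")]
print(monthly_article_support(preds))  # {'2020-06': 0.75}
```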

2. Discussion

In a decade where U.S. policing has been defined by images of deadly force against Black citizens, we find that criticism of the police in local media has remained remarkably steady. Specifically, we see that news implying police ineffectiveness or racial bias is generally no more common today than it was before George Floyd’s murder in 2020. While these types of high-profile incidents of police violence do lead to sharp spikes in criticism, these changes are temporary, with critical news returning to historical norms within a short amount of time. In fact, in the post-George Floyd period, reporting supportive of police effectiveness increased roughly 4 to 6 percentage points. These findings do not support the common belief that local news has become more critical of the police. Instead, they suggest local outlets have maintained relatively consistent treatment of the police departments they cover.

Furthermore, it does not appear that local outlets have tailored their reporting on the police to the politics of their audience, a common finding in research on coverage of polarizing topics at the national level. For example, Dutta et al. used a similar approach to measuring criticism and found evidence that cable news reporting on the police reflected the political stances of their predominant viewership (19). In contrast, we find that police news in the more conservative and more liberal cities in our sample follows very similar trends and levels of criticism. Publishers in Republican-leaning cities are slightly more likely to imply the police protect us (2 percentage points) and less likely to suggest they are racist (0.2 percentage points) than outlets in Democratic-leaning cities, but these differences are small and do not change significantly over time. In essence, the content written about the police in more conservative cities is not particularly different, in terms of the opinions it implies, from the news in more liberal cities.

Importantly, however, this consistency of content does not mean that readers of different political backgrounds see reporting on the police moving in the same direction. Even when reading the same piece of reporting, Republican annotators are more likely to perceive coverage as supportive of the police and Democrats are more likely to view it as critical. Our political models, however, indicate that while readers of different political parties may disagree on the overall level of support and criticism, they seem to agree that criticism of the police is not on the rise in local news. These findings have two important consequences: First, they imply that even if all local publishers covered the police identically, readers in more conservative areas would view that news as more supportive of policing than readers in more liberal areas, potentially leading to larger differences in opinion. Second, they suggest that political differences in perception are not driving the common belief that the media has become more critical of the police.

The scale, scope, and methods of this study situate it in a gap in the current literature. Much of the existing research on how the police are covered in the news focuses on national outlets in the aftermath of high-profile incidents of police violence and misconduct, where there is evidence for an increase in news critical of the police (19, 23, 24, 28, 41). We contribute to this literature by focusing on local outlets and considering the full spectrum of coverage the police receive. Our findings suggest that understanding the relationship between news media and the police requires considering these additional factors. Beyond these moments of intense scrutiny, local media coverage of the police is largely favorable.

With policing becoming more politically divisive, this study also contributes to efforts to understand polarization in journalism. In particular, our focus on local news adds to research on the differences in partisan bias between national and regional outlets. The rise of partisan media brings with it a rejection of objectivity in favor of news content representing specific viewpoints (42). This approach directly contrasts with the dominant model used by many local publishers, like large daily newspapers, where objectivity and neutrality are often founding doctrines (43). With local newsrooms in decline (44), many outlets have changed ownership, which has led to increased polarization and focus on national news for issues related to politics (45). Nonetheless, there is evidence that when local publishers focus on local issues instead of national ones, polarization and its impacts on reader beliefs decline (46). Our findings support the conclusion that local journalists covering local issues do not fall prey to the type of partisanship seen at the national level.

Media coverage of important public institutions like the police matters. For many Americans, the stories they read about their local police department define their understanding of law enforcement. This study contributes empirical evidence on how that relationship may be changing, finding that despite popular belief, reporting on the police has not become more critical. Furthermore, the use of advanced language modeling to measure nuanced characteristics of writing at scale presents exciting opportunities for future research on media coverage and other social science questions where relevant information is locked in difficult-to-analyze formats, like text.

3. Materials and Methods

3.1. News Data.

Capturing media coverage of the police in different cities at different times presents unique challenges. Many prominent publishers do not provide searchable archives, making it especially difficult to locate relevant, older content. To address this issue, we rely on two key resources: Google News and Newsbank.

Each service provides a different view of the historical news landscape. Google News is the largest news aggregation service both in the United States and globally (47). By indexing news content from local and national media outlets, Google News provides a single location for consumers to access most news articles available online. In a 2020 survey, Pew Research Center found that 11% of U.S. adults used Google News often, with an additional 24% indicating they use it sometimes (47). By searching Google News using different keywords, we capture popular news stories published at different points in time by a wide variety of publishers.

Newsbank, on the other hand, is a database containing current and archived news material from over 12,000 publications, including print and online-only newspapers, blogs, newswires, journals, broadcast transcripts, and videos. Newsbank, therefore, provides access to the full set of articles published by a select number of prominent outlets from each city. Drawing on both sources, our sample benefits from breadth and depth, containing both highly relevant articles from a wide range of outlets and thorough coverage of articles for our most important publishers.

3.2. Data Capture.

For each city and week in our time frame (January 1, 2013, through November 1, 2023), we queried Google News and Newsbank for relevant articles using common identifiers for the major police department(s). For example, we used both “Dallas Police Department” and “Dallas DPD” as keywords for finding articles mentioning the police in Dallas. We chose these keywords because Google News does not allow us to restrict results to outlets from a particular area, so we used keyword terms that target articles mentioning both a city and the police. While this approach prioritizes articles that mention a city’s local police department specifically, we also capture articles from publishers in that city written about police elsewhere because such articles are often considered close matches for these terms. For each week and keyword term, we identified up to 100 articles. Since we cannot prescreen for local outlets in Google News, we took the 25 most common sources for each city, manually checked which are local to the area, and restricted our sample to those outlets. We also removed duplicate articles and excluded any national publishers or newswires, even those based in one of our cities.

Using these search results, we then scraped each article and split it into its component headline and body sentences using spaCy’s English segmentation model (48). We restricted our analysis to sentences and headlines ten tokens or longer that reference the police. To do this, we required that a sentence include at least one of the following words: police, detective, law enforcement, authorities, cop, officer, investigator, or a police department abbreviation ending in “PD.” Finally, because some sentences reference the police only as a source of information, we removed any sentence where the only matched police word is preceded by “according to.”
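The filtering step above can be sketched in a few lines. This is a minimal illustration of the stated rules (length threshold, keyword match, "according to" exclusion), not the authors' code; the regular expressions and function name are our own, and a production version would run on spaCy-segmented sentences.

```python
import re

# Police-reference patterns from the text; \b avoids hits inside words
# like "copy" or "helicopter", and the optional "s" catches plurals.
POLICE_RE = [
    re.compile(r"\b" + w + r"s?\b")
    for w in ("police", "detective", "law enforcement",
              "authorities", "cop", "officer", "investigator")
]
PD_ABBREV = re.compile(r"\b[A-Z]{1,5}PD\b")  # e.g., DPD, LAPD

def is_police_sentence(sentence: str, min_tokens: int = 10) -> bool:
    """Keep sentences of >= min_tokens tokens that reference the police,
    dropping those whose only reference is preceded by 'according to'."""
    if len(sentence.split()) < min_tokens:
        return False
    lowered = sentence.lower()
    matches = [m for m in (rx.search(lowered) for rx in POLICE_RE) if m]
    n_matches = len(matches) + (1 if PD_ABBREV.search(sentence) else 0)
    if n_matches == 0:
        return False
    if n_matches == 1 and matches:
        # The only police reference is attributional -> discard.
        prefix = lowered[: matches[0].start()].rstrip()
        if prefix.endswith("according to") or prefix.endswith("according to the"):
            return False
    return True
```

A whitespace token count stands in here for spaCy tokenization, which would differ slightly on punctuation.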

3.3. Reader Labels.

To fine-tune and validate our models, we collected human judgments for 3,600 sentences from our dataset. For each sentence, we asked three readers—one Republican, one Democrat, and one Independent—to decide whether the sentence supports, contradicts, or is neutral to “police protect us” and “police are racist.” We recruited participants through the Prolific platform because of recent findings that it provides higher quality responses than other human-subjects research platforms (38). Party affiliations were self-reported, and each respondent read 30 sentences and had to pass attention checks for their labels to be included. The instructions provided to participants were the following:

“For each of the following sentences, please decide if they support, contradict, or are neutral to the following ideas: 1. Police protect us 2. Police are racist.”

3.3.1. Active learning to select sentences.

Since this process is labor intensive, we selected sentences using several well-known active learning sampling strategies (49). Specifically, we used a combination of four approaches: random sampling conditional on base model predictions, logical inconsistency sampling, model inconsistency sampling, and uncertainty sampling. We provide a brief description of these methods below.

3.3.2. Random sampling from base model predictions.

We expected some degree of imbalance in support, neutral, and contradiction labels, particularly for H_racist. As a result, a random sample of sentences may not include sufficient instances of support and contradiction, which are of particular interest in this study. We therefore sought to balance our initial sample based on the expected labels across our two hypotheses. To accomplish this, we made predictions for a random sample of 10,000 police sentences using three base models: Mistral-7B-Instruct (32), Llama-2-13b-chat (33), and PaLM 2 text-bison-001 (50). We then sampled 100 unique sentences predicted as support, neutral, and contradiction for each hypothesis and model. This process yields 100 (sentences) × 3 (labels) × 2 (hypotheses) × 3 (models) = 1,800 sentences. After collecting judgments from readers on these sentences, we fine-tuned Mistral and LLaMa 2 models on the majority label.
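The balancing logic can be sketched as stratified sampling over (model, hypothesis, label) cells. The data layout and function name below are illustrative assumptions, not the paper's implementation.

```python
import random

def balanced_sample(predictions, per_cell=100, seed=0):
    """predictions: list of dicts like
    {"sentence": str, "model": str, "hypothesis": str, "label": str}.
    Draw up to `per_cell` unique sentences per (model, hypothesis, label) cell."""
    rng = random.Random(seed)
    cells = {}
    for p in predictions:
        cells.setdefault((p["model"], p["hypothesis"], p["label"]), []).append(p)
    chosen, seen = [], set()
    for items in cells.values():
        rng.shuffle(items)
        count = 0
        for p in items:
            if count >= per_cell:
                break
            if p["sentence"] not in seen:  # keep sentences unique across cells
                seen.add(p["sentence"])
                chosen.append(p)
                count += 1
    return chosen
```

With the paper's settings (per_cell=100, 3 labels, 2 hypotheses, 3 models), a full sample would yield the stated 1,800 sentences.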

3.3.3. Logical inconsistency sampling.

A sentence cannot support or contradict both a hypothesis and its negation, but current LLMs often struggle with negated prompts (51). We sought to discourage this behavior by labeling sentences where initial models produce logically inconsistent predictions. Using the best fine-tuned Mistral and LLaMa 2 models, we made predictions on a new random sample of 10,000 police sentences for each hypothesis and its negation. We then selected 225 sentences for each model where the predictions for a hypothesis and its negation are either both support or both contradiction.
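The inconsistency check itself is simple; a minimal sketch (our own names, with the NLI labels from the prompt below):

```python
def logically_inconsistent(pred_h: str, pred_neg_h: str) -> bool:
    """A sentence cannot entail (or contradict) both a hypothesis and its
    negation, so matching non-neutral labels signal a model error."""
    return pred_h == pred_neg_h and pred_h in ("entailment", "contradiction")

def select_inconsistent(preds: dict) -> list:
    """preds maps sentence -> (label for H, label for not-H)."""
    return [s for s, (a, b) in preds.items() if logically_inconsistent(a, b)]
```

Matching neutral labels are not flagged: a sentence can plausibly be neutral to both a hypothesis and its negation.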

3.3.4. Model inconsistency sampling.

Because different models have different strengths, we collected labels for sentences where different models strongly disagree. Specifically, we sampled 450 sentences where our two fine-tuned models predict either support-contradiction or contradiction-support for the same hypothesis. This approach is similar to committee-based sampling methods (52).
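The disagreement criterion, sketched with illustrative names (two models flipping between the opposite non-neutral classes on the same hypothesis):

```python
def strong_disagreement(label_a: str, label_b: str) -> bool:
    """True when two models assign opposite non-neutral labels."""
    return {label_a, label_b} == {"entailment", "contradiction"}

def select_disputed(preds_a: dict, preds_b: dict) -> list:
    """preds_a / preds_b map sentence -> label from each fine-tuned model."""
    return [s for s in preds_a if strong_disagreement(preds_a[s], preds_b[s])]
```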

3.3.5. Uncertainty sampling.

Last, we used uncertainty sampling to select an additional 900 sentences for labeling (53, 54). For each of our fine-tuned models, we chose the 450 sentences with the smallest difference in normalized scores between the top two predicted classes.
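This is standard margin-based uncertainty sampling; a minimal sketch with an assumed score layout:

```python
def smallest_margin(scores, k=450):
    """scores: {sentence: {label: normalized score}}. Return the k sentences
    with the smallest gap between the top two class scores."""
    def margin(dist):
        top2 = sorted(dist.values(), reverse=True)[:2]
        return top2[0] - top2[1]
    return sorted(scores, key=lambda s: margin(scores[s]))[:k]
```

A small margin means the model nearly split its probability mass between two labels, so a human judgment on that sentence is maximally informative.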

3.4. Additional Model Information.

We used the following prompt structure for all models:

“Perform natural language inference by predicting entailment, contradiction, or neutral.

Premise: {sentence}

Hypothesis: {hypothesis}”

We experimented slightly with different prompts by changing the task description (e.g., “Can you predict if the premise logically entails, contradicts, or is neutral to the hypothesis?”), including the political party of the intended reader, and providing an example of a sentence for each label. These changes did not have a significant impact on final model performance. For our base models, however, providing examples and a system prompt to respond with only “entailment,” “contradiction,” or “neutral” notably improved our generated labels. Still, the base models occasionally output slight variations on these labels or different words entirely, so we matched outputs to their intended labels when possible and discarded ambiguous predictions. For fine-tuned models, we randomly initialized a classification layer, so there were no ambiguous predictions.
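Assembling the prompt and normalizing base-model generations can be sketched as follows. The prompt string comes from the text above; the normalization rule (accept a generation only if it mentions exactly one label) is our own illustrative stand-in for the matching described.

```python
def make_prompt(sentence: str, hypothesis: str) -> str:
    """Build the NLI prompt described in the text."""
    return ("Perform natural language inference by predicting entailment, "
            f"contradiction, or neutral.\n\nPremise: {sentence}\n\n"
            f"Hypothesis: {hypothesis}")

LABELS = ("entailment", "contradiction", "neutral")

def normalize_output(generation: str):
    """Map a free-form generation to a label; None marks an ambiguous
    output that would be discarded."""
    found = [lab for lab in LABELS if lab in generation.lower()]
    return found[0] if len(found) == 1 else None
```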

3.4.1. Performance and training.

All models were fine-tuned using QLoRA (55), a memory-efficient approach in which a large, pretrained model is quantized and then adapted by training Low-Rank Adapters (56) through mini-batch gradient descent. For training, we tuned the learning rate, its schedule, and the adapter rank.
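A typical QLoRA setup of this kind, using the Hugging Face transformers, peft, and bitsandbytes libraries, looks roughly like the configuration fragment below. The model name, adapter rank, and other hyperparameters are illustrative, not the values used in the paper.

```python
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Quantize the frozen base model to 4-bit NF4.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForSequenceClassification.from_pretrained(
    "mistralai/Mistral-7B-v0.1",   # illustrative checkpoint
    num_labels=3,                  # entailment / contradiction / neutral
    quantization_config=bnb,
)
model = prepare_model_for_kbit_training(model)

# Attach trainable low-rank adapters; only these (and the new
# classification head) receive gradient updates.
lora = LoraConfig(
    r=16,                          # adapter rank -- a tuned hyperparameter
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora)
```

Because the base weights stay quantized and frozen, only a small fraction of parameters are trained, which is what makes fine-tuning a 7B-parameter model feasible on a single GPU.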

Because our inconsistency and uncertainty sampling rounds selected sentences based on model weaknesses, we reserved 30% of the sentences from our random sampling stage (555 unique sentences and 3,330 labels) as a test set to evaluate our final models. Using the remaining examples, we fine-tuned Mistral and LLaMa 2 to predict the majority label from the three readers, with ties broken at random. We selected our best model based on macro F1 score on a held-out validation set. We chose to prioritize macro F1 because, despite the imbalance in our labels, we prefer models with a reasonable balance of precision and recall across all three classes. Our best-performing consensus model was Mistral, so we fine-tuned additional Mistral models to predict labels from the Republican, Independent, and Democratic readers separately. Table 1 shows the F1 performance for these models, and additional performance information can be found in SI Appendix.
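Macro F1 is the unweighted mean of per-class F1 scores, so minority classes count as much as the majority class; a minimal reference implementation:

```python
def macro_f1(y_true, y_pred, labels=("entailment", "contradiction", "neutral")):
    """Unweighted mean of per-class F1 scores over the given label set."""
    f1s = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(labels)
```

Under label imbalance, accuracy or micro-averaged F1 would reward a model that simply predicts the majority (here, neutral) class; macro averaging penalizes that behavior.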

Supplementary Material

Appendix 01 (PDF)

Acknowledgments

We are grateful to PNAS editors and reviewers for their thoughtful feedback, which significantly improved this work. We are appreciative of Sharon Lynch, Cynthia Lum, and Rayid Ghani for their valuable feedback. We thank Md Towhidul Absar Chowdhury for his guidance in setting up our crowdsourcing experiments. Additionally, we acknowledge the helpful comments and suggestions from seminar participants at Princeton University and the Max Planck Institute for Research on Collective Goods. All opinions expressed in this paper, as well as any errors, are solely our own.

Author contributions

L.C., S.D., A.R.K., E.S., and D.S.N. designed research; L.C. performed research; L.C. analyzed data; and L.C., S.D., A.R.K., E.S., and D.S.N. wrote the paper.

Competing interests

The authors declare no competing interest.

Footnotes

This article is a PNAS Direct Submission.

Data, Materials, and Software Availability

Anonymized data and replication code have been deposited in a repository at the Inter-university Consortium for Political and Social Research (https://doi.org/10.3886/E218063V1) (57).

Supporting Information

References

1. Goldkamp J. S., Minorities as victims of police shootings: Interpretations of racial disproportionality and police use of deadly force. Justice Syst. J. 2, 169–183 (1976).
2. Schuck A. M., Rosenbaum D. P., Global and neighborhood attitudes toward the police: Differentiation by race, ethnicity and type of contact. J. Quant. Criminol. 21, 391–418 (2005).
3. Callanan V. J., Rosenberger J. S., Media and public perceptions of the police: Examining the impact of race and personal experience. Polic. Soc. 21, 167–189 (2011).
4. E. Harrell, E. Davis, “Contacts between police and the public, 2018-statistical tables” (Tech. Rep. NCJ 255730, Bureau of Justice Statistics, Washington, DC, 2020).
5. R. Surette, Media, Crime and Criminal Justice: Images, Realities, and Policies (Cengage Learning, Boston, MA, ed. 5, 2014).
6. Graziano L. M., News media and perceptions of police: A state-of-the-art review. Polic. Int. J. 42, 209–225 (2019).
7. Dowler K., Media influence on citizen attitudes toward police effectiveness. Polic. Soc. 12, 227–238 (2002).
8. Dowler K., Zawilski V., Public perceptions of police misconduct and discrimination: Examining the impact of media consumption. J. Crim. Justice 35, 193–203 (2007).
9. D. Premkumar, Public scrutiny, police behavior, and crime consequences: Evidence from high-profile police killings. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3715223. Accessed 10 December 2024.
10. Intravia J., Thompson A. J., Pickett J. T., Net legitimacy: Internet and social media exposure and attitudes toward the police. Sociol. Spectr. 40, 58–80 (2020).
11. J. Moreno-Medina, A. Ouss, P. Bayer, B. A. Ba, Officer-involved: The media language of police killings. Q. J. Econ., 10.1093/qje/qjaf004 (2025).
12. Gallup, Confidence in U.S. institutions down; average at new low. https://news.gallup.com/poll/394283/confidence-institutions-down-average-new-low.aspx. Accessed 10 December 2024.
13. Washington Post-ABC News, Confidence in police practices drops to a new low: Poll. https://abcnews.go.com/Politics/confidence-police-practices-drops-new-low-poll/story?id=96858308. Accessed 10 December 2024.
14. Chinn S., Hart P. S., Soroka S., Politicization and polarization in climate change news content, 1985–2017. Sci. Commun. 42, 112–129 (2020).
15. Hart P. S., Chinn S., Soroka S., Politicization and polarization in COVID-19 news coverage. Sci. Commun. 42, 679–697 (2020).
16. Carmines E. G., Gerrity J. C., Wagner M. W., How abortion became a partisan issue: Media coverage of the interest group-political party connection. Polit. Policy 38, 1135–1158 (2010).
17. Pew Research Center, Republicans more likely than Democrats to have confidence in police. https://www.pewresearch.org/short-reads/2017/01/13/republicans-more-likely-than-democrats-to-have-confidence-in-police/. Accessed 10 December 2024.
18. Pew Research Center, Majority of public favors giving civilians the power to sue police officers for misconduct. https://www.pewresearch.org/politics/2020/07/09/majority-of-public-favors-giving-civilians-the-power-to-sue-police-officers-for-misconduct/. Accessed 10 December 2024.
19. S. Dutta, B. Li, D. S. Nagin, A. R. KhudaBukhsh, “A murder and protests, the Capitol riot, and the Chauvin trial: Estimating disparate news media stance” in Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22 (2022), pp. 5059–5065.
20. Chermak S. M., Body count news: How crime is presented in the news media. Justice Q. 11, 561–582 (1994).
21. Dixon T. L., Azocar C. L., Casas M., The portrayal of race and crime on television network news. J. Broadcast. Electron. Media 47, 498–523 (2003).
22. J. Schildkraut, “Crime news in newspapers” in Oxford Research Encyclopedia of Criminology and Criminal Justice (Oxford University Press, 2017).
23. Lee A. S., Weitzer R., Martinez D. E., Recent police killings in the United States: A three-city comparison. Police Q. 21, 196–222 (2018).
24. C. Naoroz, H. M. Cleary, News media framing of police body-worn cameras: A content analysis. Polic. J. Policy Pract. 15, 540–555 (2021).
25. U.S. Census Bureau, Metropolitan and micropolitan statistical areas population totals: 2020–2023. https://www.census.gov/data/tables/time-series/demo/popest/2020s-total-metro-and-micro-statistical-areas.html#v2023. Accessed 10 December 2024.
26. P. Venkit et al., “The sentiment problem: A critical survey towards deconstructing sentiment analysis” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, H. Bouamor, J. Pino, K. Bali, Eds. (Association for Computational Linguistics, Singapore, 2023), pp. 13743–13763.
27. I. Dagan, O. Glickman, B. Magnini, “The PASCAL recognising textual entailment challenge” in Machine Learning Challenges Workshop, J. Quiñonero-Candela, I. Dagan, B. Magnini, F. d’Alché-Buc, Eds. (Springer, 2005), pp. 177–190.
28. A. Halterman, K. A. Keith, S. M. Sarwar, B. O’Connor, Corpus-level evaluation for event QA: The IndiaPoliceEvents corpus covering the 2002 Gujarat violence. arXiv [Preprint] (2021). http://arxiv.org/abs/2105.12936 (Accessed 10 December 2024).
29. Weitzer R., Tuch S. A., Race and Policing in America: Conflict and Reform (Cambridge University Press, 2006).
30. Lum C., Nagin D. S., Reinventing American policing. Crime Justice 46, 339–393 (2017).
31. Schwartz S. A., Police brutality and racism in America. Explore 16, 280 (2020).
32. A. Q. Jiang et al., Mistral 7B. arXiv [Preprint] (2023). http://arxiv.org/abs/2310.06825 (Accessed 10 December 2024).
33. H. Touvron et al., Llama 2: Open foundation and fine-tuned chat models. arXiv [Preprint] (2023). http://arxiv.org/abs/2307.09288 (Accessed 10 December 2024).
34. Ziems C., et al., Can large language models transform computational social science? Comput. Linguist. 50, 237–291 (2024).
35. S. Wang, H. Fang, M. Khabsa, H. Mao, H. Ma, Entailment as few-shot learner. arXiv [Preprint] (2021). http://arxiv.org/abs/2104.14690 (Accessed 10 December 2024).
36. Berinsky A. J., Huber G. A., Lenz G. S., Evaluating online labor markets for experimental research: Amazon.com’s Mechanical Turk. Polit. Anal. 20, 351–368 (2012).
37. Levay K. E., Freese J., Druckman J. N., The demographic and political composition of Mechanical Turk samples. Sage Open 6, 2158244016636433 (2016).
38. Douglas B. D., Ewell P. J., Brauer M., Data quality in online human-subjects research: Comparisons between MTurk, Prolific, CloudResearch, Qualtrics, and SONA. PLoS One 18, e0279720 (2023).
39. T. Weerasooriya et al., “Vicarious offense and noise audit of offensive speech classifiers: Unifying human and machine disagreement on what is offensive” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, H. Bouamor, J. Pino, K. Bali, Eds. (Association for Computational Linguistics, Singapore, 2023), pp. 11648–11668.
40. M. Sap et al., “Annotators with attitudes: How annotator beliefs and identities bias toxic language detection” in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, M. Carpuat, M. C. de Marneffe, I. V. Meza Ruiz, Eds. (Association for Computational Linguistics, Seattle, United States, 2022), pp. 5884–5906.
41. T. Gest, “Police and the news media” in Oxford Research Encyclopedia of Criminology and Criminal Justice (Oxford University Press, 2019).
42. Jamieson K. H., Cappella J. N., Echo Chamber: Rush Limbaugh and the Conservative Media Establishment (Oxford University Press, 2008).
43. Mindich D. T., Just the Facts: How “Objectivity” Came to Define American Journalism (NYU Press, 1998).
44. Pew Research Center, State of the news media 2016. https://assets.pewresearch.org/wp-content/uploads/sites/13/2016/06/30143308/state-of-the-news-media-report-2016-final.pdf. Accessed 10 December 2024.
45. Martin G. J., McCrain J., Local news and national politics. Am. Polit. Sci. Rev. 113, 372–384 (2019).
46. Darr J. P., Hitt M. P., Dunaway J. L., Home Style Opinion: How Local Newspapers Can Slow Polarization (Cambridge University Press, 2021).
47. M. Barthel, A. Mitchell, D. Asare-Marfo, C. Kennedy, K. Worden, Measuring news consumption in a digital era (Pew Research Center, Washington, DC, 2020). https://www.pewresearch.org/wp-content/uploads/sites/20/2020/12/PJ_2020.12.08_News-Consumption_FINAL.pdf. Accessed 10 December 2024.
48. M. Honnibal, I. Montani, S. Van Landeghem, A. Boyd, spaCy: Industrial-strength natural language processing in Python. Zenodo. 10.5281/zenodo.1212303. Accessed 10 December 2024.
49. Palakodety S., KhudaBukhsh A. R., Carbonell J. G., “Voice for the voiceless: Active sampling to detect comments supporting the Rohingyas” in Proceedings of the AAAI Conference on Artificial Intelligence (2020), vol. 34, pp. 454–462.
50. R. Anil et al., PaLM 2 technical report. arXiv [Preprint] (2023). http://arxiv.org/abs/2305.10403 (Accessed 10 December 2024).
51. J. Jang, S. Ye, M. Seo, “Can large language models truly understand prompts? A case study with negated prompts” in Proceedings of The 1st Transfer Learning for Natural Language Processing Workshop, Proceedings of Machine Learning Research, A. Albalak et al., Eds. (PMLR, 2023), vol. 203, pp. 52–62.
52. H. S. Seung, M. Opper, H. Sompolinsky, “Query by committee” in Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT ’92 (Association for Computing Machinery, New York, NY, USA, 1992), pp. 287–294.
53. Lewis D. D., A sequential algorithm for training text classifiers: Corrigendum and additional data. SIGIR Forum 29, 13–19 (1995).
54. T. Scheffer, C. Decomain, S. Wrobel, “Active hidden Markov models for information extraction” in Advances in Intelligent Data Analysis, F. Hoffmann, D. J. Hand, N. Adams, D. Fisher, G. Guimaraes, Eds. (Springer, Berlin, Heidelberg, 2001), pp. 309–318.
55. T. Dettmers, A. Pagnoni, A. Holtzman, L. Zettlemoyer, QLoRA: Efficient finetuning of quantized LLMs. arXiv [Preprint] (2023). https://arxiv.org/abs/2305.14314 (Accessed 10 December 2024).
56. E. J. Hu et al., LoRA: Low-rank adaptation of large language models. arXiv [Preprint] (2021). http://arxiv.org/abs/2106.09685 (Accessed 10 December 2024).
57. L. Crowl, Reproduction files for: Measuring criticism of the police in the local news media using large language models. Inter-university Consortium for Political and Social Research, Ann Arbor, MI [distributor]. 10.3886/E218063V1. Deposited 5 February 2025.



Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
