Science Advances. 2025 Mar 12;11(11):eadt1512. doi: 10.1126/sciadv.adt1512

Privacy violations in election results

Shiro Kuriwaki 1, Jeffrey B Lewis 2, Michael Morse 3,*
PMCID: PMC11900849  PMID: 40073116

Abstract

After an election, should election officials release a copy of each anonymous ballot? Some policy-makers have championed public disclosure to counter distrust, but others worry that it might undermine ballot secrecy. We introduce the term vote revelation to refer to the linkage of a vote on an anonymous ballot to the voter’s name in the public voter file and detail how such revelation could theoretically occur. Using the 2020 election in Maricopa County, Arizona, as a case study, we show that the release of individual ballot records would lead to no revelation of any vote choice for 99.83% of voters as compared to 99.95% under Maricopa’s current practice of reporting aggregate results by precinct and method of voting. Further, revelation is overwhelmingly concentrated among the few voters who cast provisional ballots or federal-only ballots. We discuss the potential benefits of transparency, compare remedies to reduce or eliminate privacy violations, and highlight the privacy-transparency trade-off inherent in all election reporting.


Voters’ vote choices are rarely deducible even if cast vote records, indicating how every ballot cast was marked, are made public.

INTRODUCTION

Democracies face a difficult trade-off in reporting election results between promoting public transparency and protecting individual privacy. After votes are cast, election officials must report results at a level of granularity sufficient to bolster the legitimacy of the count while avoiding revealing individual choices. Where to strike that balance between transparency and privacy is an increasingly pressing question. That is because most modern voting systems can now preserve anonymous, individual ballot records in addition to tallying aggregate results. Further, at least since the 2020 election, a growing swath of voters have been seeking “citizen audits” of elections by making public records requests for these ballot records (1–3).

In theory, election results should both promote transparency and protect privacy. Transparent, or granular, election results can promote confidence by helping to detect or deter electoral manipulation (4–8). Further, privacy, in the form of a secret ballot, can also reduce the market for vote buying and voter intimidation (9, 10). In practice, however, more transparent election results might unravel the secret ballot.

While privacy concerns with administrative data are not new (11–13), neither formal privacy scholars nor political scientists have systematically explored the particular privacy risks in election results. Without the benefit of theoretical or empirical guidance, election officials across the United States have charted starkly different approaches to election reporting. A number of election officials now affirmatively release individual ballot records, including the states of Alaska, Georgia, and Maryland, as well as the counties of San Francisco, California; El Paso County, Colorado; Leon County, Florida; Ada County, Idaho; Tarrant County, Texas; and Dane County, Wisconsin (14–22), sometimes explicitly with the aim of shoring up trust (20). However, many other election officials have prohibited the release of ballot records. For example, election officials in Indiana, Missouri, and South Carolina are among those who have denied requests for ballot records (23–25), and the North Carolina legislature recently turned back a legislative effort to publish them (26).

Election officials and legislators alike have come to different conclusions in part because of an empirical disagreement about the consequences of public disclosure (27). For example, jurisdictions that disclose ballot records explain that “ballots are anonymous and not connected to voters” (28) or “there [is] nothing linking the data to individual voters” (29). In contrast, jurisdictions that forbid disclosure emphasize that “the release of [ballot records] would likely lead to [the] identification of voters” (30) or would “threaten[] anonymity” (31).

To inform the public debate, we provide an empirical assessment of the privacy costs of transparent election results, from the aggregate to the individual level. We focus on what we term vote revelation. In short, as explained more below, we define vote revelation as linking vote choices to voters. Although a limited set of academic studies and reports have already pointed to the potential for vote revelation, those studies do not go on to measure revelation in an actual election (32–35), or, if they do, only measure a special case of it (36).

We use Maricopa County’s 2020 general election as our primary case study. Our main finding is that election results could reveal at least one vote choice for approximately 0.0009% of voters if results were aggregated by precinct, 0.05% if results were aggregated by precinct and method [the county’s current practice (37)], and 0.17% if results were reported by releasing individual ballots.

Framework for vote revelation

To understand the debate about ballot records, it is helpful to start with two background facts about election administration. First, ballots are anonymous, i.e., no names are attached to the actual ballot. Second, the so-called “voter file” is not. Instead, it lists all persons by name who are registered to vote in a jurisdiction. It typically includes both where registrants live and if they have voted. In almost every state, voter records in the voter file are public (38). In other words, who citizens vote for is private, but whether they register and vote is public.

One specific concern is that a vote choice on an anonymous ballot will be uniquely linked to the voter’s name in the public voter file—a phenomenon we term “vote revelation.” The Venn diagram in Fig. 1 illustrates how vote revelation is possible. In particular, both ballot records (left) and voter records in the voter file (right) share overlapping information, shown at the intersection of the diagram. The overlapping information includes a voter’s precinct, ballot style, and vote method.

Fig. 1. Schematic of potential privacy violation by vote revelation.

A Venn diagram of how quasi-identifiers could link vote choice to personally identifying information.

Following Machanavajjhala et al. (39), we refer to these overlapping variables as quasi-identifiers. They are quasi-identifiers, rather than personal identifiers, in the sense that they are not necessarily unique identifiers. Instead, they narrow down which voter might have cast which vote. This setup thus mirrors common studies in the privacy literature, where scholars have shown how an analyst might triangulate multiple datasets with overlapping information to gain information not intended to be shared (11, 12). While the privacy literature would generally describe this as reidentification (13), we prefer the term revelation to emphasize the fact that ballots are anonymous.

The quasi-identifiers of precinct, ballot style, and vote method play an important role in both election reporting and election administration. The quasi-identifiers are available in ballot records because they define the reporting units for aggregating election results. For example, to report election results by precinct, election officials need to know the precinct in which each ballot was cast. More generally, any reporting unit for election results is defined by some combination of the quasi-identifiers on ballots, either precinct, ballot style, or vote method. The same quasi-identifiers are also available in voter files because they define where voters are assigned to vote (a precinct), which set of contests they are eligible to vote in (a ballot style), and how they participate (a vote method). Voters in the same precinct may be eligible to vote for different contests and thus have different ballot styles (fig. S4).

Types of revelation

Building on the privacy literature discussed in Materials and Methods, we distinguish between three overlapping types of vote revelation: (i) public revelation, (ii) local revelation, and (iii) probabilistic revelation. We use Fig. 2 to illustrate each type of revelation using a hypothetical election.

Fig. 2. How unanimous election results reveal votes.

In this example, 30 voters vote in two contests (president and a tax referendum) and the reporting units for results are defined by precinct × vote method × ballot style.

Public revelation

A vote can be publicly revealed when all voters with the same quasi-identifiers are unanimous in their vote choice. In the privacy literature, the mechanism for public revelation is thus known as a homogeneity attack (39).

In Fig. 2, the presidential choice of in-person voters is vulnerable to public revelation because all in-person voters were unanimous. An analyst could use the public voter file to find the names of all in-person voters who are registered in precinct A, assigned to ballot style X, and recorded as voting in-person, and know with certainty how they voted for president.
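The linkage described above can be sketched in a few lines of Python. The snippet groups toy ballot records by the three quasi-identifiers, flags reporting units whose presidential votes are unanimous, and joins those units to a toy voter file. Every record, voter name, and function name is a hypothetical illustration, as is the choice of candidate for the in-person voters; none of it is data or code from the study.

```python
from collections import defaultdict

# Hypothetical ballot records: (precinct, style, method, presidential vote).
# No names are attached, mirroring anonymous ballots.
ballots = (
    [("A", "X", "in-person", "Biden")] * 10  # assumed unanimous unit
    + [("A", "X", "mail", "Trump")] * 19
    + [("A", "X", "mail", "Biden")]
)

# Hypothetical public voter file: a name plus the same quasi-identifiers.
voter_file = (
    [(f"in_person_voter_{i}", "A", "X", "in-person") for i in range(10)]
    + [(f"mail_voter_{i}", "A", "X", "mail") for i in range(20)]
)


def unanimous_units(records):
    """Map each reporting unit to its single vote choice, if unanimous."""
    choices = defaultdict(set)
    for precinct, style, method, vote in records:
        choices[(precinct, style, method)].add(vote)
    return {unit: next(iter(v)) for unit, v in choices.items() if len(v) == 1}


def publicly_revealed(records, voters):
    """Link names in the voter file to the vote of their unanimous unit."""
    revealed_units = unanimous_units(records)
    return {
        name: revealed_units[(precinct, style, method)]
        for name, precinct, style, method in voters
        if (precinct, style, method) in revealed_units
    }


revealed = publicly_revealed(ballots, voter_file)
print(len(revealed), "voters publicly revealed")
```

In this toy setup only the in-person unit is unanimous, so only its voters are linked to a vote; the mail unit, with one dissenting ballot, reveals nothing publicly.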

Because revelation is driven by unanimity, revelation is more likely, all else equal, when election results are reported in reporting units with fewer voters. Formally, Eq. 1 shows that the expected number of publicly revealed voters in a reporting unit, which we denote as R, is a function of the size of the reporting unit, N ≥ 1, and the lopsidedness of the contest

$$E[R] = N \sum_{h=1}^{H} w_h^{N}$$ (1)

where $w_h$ is the a priori probability that a randomly selected voter in the reporting unit supports candidate $h \in \{1, \ldots, H\}$. Thus each $w_h$ is bounded between 0 and 1 (inclusive), and the $w_h$ sum to 1. In the special case of singleton reporting units (i.e., N = 1), we see that a voter’s choice is always revealed ($E[R] = \sum_{h=1}^{H} w_h = 1$). Equation 1 quickly becomes monotonically decreasing in N after a tipping point. The expected revelations also decrease as races become less lopsided and more evenly contested (e.g., $w_1 = 0.5$, $w_2 = 0.5$). In the typical two-candidate uncompetitive contest in which the probability that a randomly selected voter supports the leading candidate is $w_1 = 0.7$ and there are no abstentions, expected revelations drop below 0.01 after N = 22 and below 0.001 after N = 29. Section A of the Supplementary Materials generalizes the statistical model to allow for abstentions.
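As a quick numerical check of Eq. 1, the following Python sketch evaluates the expected number of publicly revealed voters, N times the sum over candidates of each support probability raised to the power N, and searches for the unit size at which it first falls below a threshold. The function names and the search bound are illustrative assumptions, not part of the paper.

```python
def expected_revealed(n, probs):
    """Expected publicly revealed voters, E[R] = N * sum_h w_h ** N."""
    if n < 1:
        raise ValueError("a reporting unit must contain at least one voter")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("support probabilities must sum to 1")
    return n * sum(w ** n for w in probs)


def first_n_below(threshold, probs, n_max=1000):
    """Smallest unit size N >= 2 at which E[R] is below the threshold."""
    for n in range(2, n_max + 1):
        if expected_revealed(n, probs) < threshold:
            return n
    return None  # not reached within the (assumed) search bound


lopsided = (0.7, 0.3)  # two candidates, no abstentions
print(first_n_below(0.01, lopsided))   # unit size where E[R] first dips under 0.01
print(first_n_below(0.001, lopsided))  # unit size where E[R] first dips under 0.001
```

With the 0.7 versus 0.3 split used in the text, the two searches return 22 and 29, the unit sizes cited above.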

Local revelation

Votes that cannot be revealed publicly can still be revealed locally. Local revelation can occur if there is private information available about some individual vote choices in the reporting unit, or what privacy scholars refer to as a “background knowledge attack” (39). For example, in Fig. 2, the single Biden voter who cast a mail ballot could deduce with certainty that 19 of her neighbors must have voted for Trump. However, each Trump vote would only be revealed to the single Biden voter, not the public as a whole.

The extent of local revelation depends on the number of people sharing their private vote choice. More formally, we denote the number of such collaborators required for local revelation by α. The local revelation is only to those groups of α voters whose candidate preferences differ from those of the remaining voters and is only possible in practice if those groups of α voters are able to credibly share their vote choices with each other.

Equation 1 can be modified to consider the probability of an α-level revelation. Instead of computing the probability of N out of N voters making the same choice, section A of the Supplementary Materials computes the probability of at least N − α out of N voters making the same vote choice using the binomial coefficient.
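The Supplementary Materials are not reproduced here, so the following Python sketch is only one plausible reading of that binomial calculation. It assumes voters choose independently with the same probabilities as in Eq. 1, and it assumes 2α < N so that at most one candidate can reach the cutoff and the per-candidate tail probabilities can simply be added. The function name is hypothetical.

```python
from math import comb


def prob_near_unanimous(n, alpha, probs):
    """P(some candidate receives at least n - alpha of the n votes).

    Assumes independent voters and 2 * alpha < n, so the events for
    different candidates are mutually exclusive and their tails add up.
    """
    if not 0 <= 2 * alpha < n:
        raise ValueError("this sketch requires 0 <= 2 * alpha < n")
    total = 0.0
    for w in probs:
        # Binomial upper tail: k supporters of this candidate out of n.
        for k in range(n - alpha, n + 1):
            total += comb(n, k) * w ** k * (1 - w) ** (n - k)
    return total


print(prob_near_unanimous(20, 0, (0.7, 0.3)))  # unanimity only
print(prob_near_unanimous(20, 1, (0.7, 0.3)))  # allows one dissenting voter
```

Setting α = 0 recovers the probability of a unanimous unit, the sum of $w_h^N$ that Eq. 1 multiplies by N, and each additional collaborator can only raise the probability.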

Probabilistic revelation

It is an open question as to whether near-unanimous revelation should also be considered a form of vote revelation. To continue the example in Fig. 2, an outside analyst who observes the reporting unit with 19 Trump votes and 1 Biden vote, and no other information, would predict that the probability any voter in the unit could be a Trump voter is 95%.

Some definitions of statistical privacy do consider near-perfect probabilities of revelation as a successful privacy attack (40–42). These approaches consider an update in an analyst’s beliefs about the values of the sensitive items (here, vote choices) from their prior belief (e.g., based on the voter’s quasi-identifiers or partisan affiliation) as privacy leakage. In this study, we consider a vote choice probabilistically revealed if it is assigned a (posterior) probability of more than 0.95 under the assumption that every voter in each reporting unit is equally likely to have cast each ballot. We report this operationalization of probabilistic revelation for context and return to its relevance in Discussion.
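Under the equal-likelihood assumption, the posterior probability that any one voter in a reporting unit made a given choice is simply that choice's share of the unit's votes. The Python sketch below applies that rule to one unit. The function name is hypothetical, and the inclusive cutoff is an assumption: the text describes the threshold both as "more than 0.95" and, in the Results, as "at least 95%", and the sketch follows the inclusive reading so that a unit with 19 of 20 matching votes qualifies.

```python
from collections import Counter


def probabilistically_revealed(votes, threshold_pct=95):
    """Return (choice, share) if the leading choice meets the threshold.

    With every voter in the unit equally likely to have cast each ballot,
    the posterior for the leading choice equals its vote share.
    """
    if not votes:
        return None
    counts = Counter(votes)
    choice, top = counts.most_common(1)[0]
    # Integer comparison avoids floating-point trouble at exactly 95%.
    if top * 100 >= threshold_pct * len(votes):
        return choice, top / len(votes)
    return None


mail_unit = ["Trump"] * 19 + ["Biden"]  # the mail unit from the Fig. 2 example
print(probabilistically_revealed(mail_unit))
```

Swapping the comparison to a strict inequality would implement the "more than 0.95" reading instead, under which the 19-of-20 unit would not qualify.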

Not vote revelation

In Materials and Methods, we also discuss several related privacy concerns that we do not consider to be vote revelation. The most important caveat is that we do not consider a voter voluntarily revealing their vote choice to be vote revelation. We also do not consider the possibility of vote revelation from other potential quasi-identifiers on a ballot, such as voter language or time stamps. In general, we set these concerns aside because they are not the focus of election officials who are grappling with how to report election results. Instead, these concerns can be addressed outside of election reporting.

Implications of revelation

On the basis of our framework, the mechanism for vote revelation has two implications for those concerned with privacy violations in election reporting.

First, individual ballot records do not necessarily reveal more information about how any particular person voted than aggregate election results. That is because the quasi-identifiers that generate vote revelation are available in both individual ballot records and aggregate election results. As in Fig. 1, individual ballot records typically have quasi-identifiers for precinct, ballot style, and method. As a result, from the perspective of vote revelation, releasing individual ballot records is equivalent to releasing aggregate election results by precinct, style, and method. Ballot records do not reveal more information simply because they show how a single voter voted on all contests. To return to Fig. 2, even if we had the individual ballots for the 10 in-person voters who voted for the same candidate for president, we could still not identify which of those 10 voted Yes on the tax referendum.

Second, the extent of vote revelation will vary as a function of both administrative choices about the size of reporting units and political behavior. For one, we expect primary elections to generate more potential revelation than general elections. In general, fewer voters participate in primaries, but the number of reporting units, such as precincts, typically stays the same. As a result, primaries typically have fewer voters in each reporting unit, mechanically increasing the likelihood of revelation. Primaries may also be more likely to feature lopsided contests, which further contribute to unanimous reporting units. For another, we expect that elections that include many different contests will create more revelations (all else equal) than elections (such as a special election) involving just one contest because the former requires more distinct ballot styles. In addition, it is not clear whether rural or urban jurisdictions will feature more revelation. Rural jurisdictions will typically have fewer voters per reporting unit to accommodate the geographic dispersion of voters. However, urban jurisdictions are more likely to have unanimous election results because of political homophily (43). Last, there is a set of idiosyncratic factors that influence revelation but do not neatly map onto types of elections or jurisdictions. For example, how jurisdictions report provisional ballots, which are relatively rare, will affect the number of voters per reporting unit. In the next section, we present a case study of vote revelation along the continuum of election reporting and then assess the generalizability of our findings.

RESULTS

We use Maricopa County, Arizona, as a case study to quantify the extent of vote revelation in the reporting of election results. Following the 2020 presidential election, Maricopa County became the most prominent site in the battle over public access to ballots. Maricopa County is also closely divided by partisanship, is home to millions of people, and spans urban, suburban, and rural areas. As we describe in more detail in Materials and Methods, we use a particular type of individual ballot record, known as a cast vote record. These cast vote records include the necessary information on both quasi-identifiers and vote choice to allow us to reconstruct election results at any common level of aggregation.

We first calculate the magnitude of vote revelation in Maricopa County across the continuum of election reporting. We then decompose vote revelation to consider both how to reduce revelation and the potential harm of revelation. Last, we assess how similar Maricopa County’s 2020 general election is to other elections and other jurisdictions.

Prevalence of vote revelation

Table 1 presents our main finding. The outcome of interest is whether a voter would have at least one of their votes revealed in a contested election. Our choice of outcome serves as the worst-case scenario for the magnitude of vote revelation. The top panel presents public vote revelation, the middle panel considers local revelation, and the bottom panel reports probabilistic revelation, while the different columns correspond to different reporting units.

Table 1. Revelation by aggregation of election results.

The number of voters with a revelation in at least one out of the roughly 60 contests on the ballot, according to various definitions of revelation, by level of reporting unit. Parentheses translate the number of voters with at least one revealed vote into a percentage of total voters. Columns other than the first and last give the level of aggregation.

Type of revelation | Precinct | Style | Precinct × method | Ballot (precinct × style × method) | Total voters
--- | --- | --- | --- | --- | ---
Public revelation (revealed to everyone with certainty) | 19 (0.0009%) | 73 (0.003%) | 1088 (0.05%) | 3492 (0.17%) | 2,089,563
Local revelation (revealed to α collaborators), α = 1 | +56 (0.003%) | +65 (0.003%) | +1225 (0.06%) | +2798 (0.13%) | 2,089,563
Local revelation (revealed to α collaborators), α = 2 | +81 (0.004%) | +128 (0.006%) | +2436 (0.12%) | +5408 (0.26%) | 2,089,563
Probabilistic revelation (linked with near-certain probability), ≥ 0.95 | 51 (0.002%) | 93 (0.004%) | 1132 (0.05%) | 3829 (0.18%) | 2,089,563

The first cell in the top panel shows that 19 voters could have at least one of their votes publicly revealed if the 2020 general election results were reported at the precinct level. This is 0.0009% of the more than 2 million voters who participated in Maricopa County’s election. If the same quantity were reported at the precinct-method level, as in Maricopa County’s official canvass (37), 1088 voters (0.05%) could have at least one of their vote choices revealed. Last, in the most granular reporting unit we consider, where election officials release individual ballots, 3492 voters, or 0.17% of Maricopa voters, could have at least one vote choice revealed.
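The percentages quoted above follow directly from the counts in Table 1. A short Python check, using only figures reported in the table (the variable and function names are illustrative):

```python
TOTAL_VOTERS = 2_089_563  # total voters reported in Table 1

# Publicly revealed voters by reporting unit, from the top panel of Table 1.
revealed_by_unit = {
    "precinct": 19,
    "style": 73,
    "precinct x method": 1088,
    "ballot (precinct x style x method)": 3492,
}


def pct_of_voters(count, total=TOTAL_VOTERS):
    """Share of all voters, expressed as a percentage."""
    return 100 * count / total


for unit, count in revealed_by_unit.items():
    print(f"{unit}: {count} voters = {pct_of_voters(count):.4f}% of voters")
```

The complement of the ballot-level figure, 100% minus roughly 0.17%, gives the 99.83% of voters with no revealed vote that is reported in the Abstract.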

The middle panel of Table 1 separately reports the increase in local vote revelation relative to public vote revelation. The table uses α to define the degree of local revelation. When α = 1, every person knows how they themselves voted, but does not share how they voted with anyone else. In that case, we find that 56 additional voters would have at least one of their vote choices locally revealed with precinct-level election results. The case of α = 2 might correspond to a household within a precinct, where a couple shares their vote choice with each other. We find that 81 more voters (which includes the 56 voters under α = 1) will have their vote locally revealed in this case.

Even if we relax the unanimity requirement, the bottom panel shows that the prevalence of revelation increases only modestly. The bottom panel considers probabilistic revelation. Overall, 51 voters cast a vote in precincts where the vote share for at least one contest was at least 95%. Relaxing the threshold from unanimity to 95% is equivalent to considering local revelation with α = 1 in a reporting unit with N = 20 voters. The increase in probabilistic revelation is more muted at more granular levels of reporting because those reporting units have fewer voters to begin with.

Drivers of vote revelation

To consider how to reduce revelation, Table 2 decomposes the public vote revelation in Table 1 by vote method and ballot style. It reveals that the vast bulk of revelation comes from how Maricopa County reports votes on provisional ballots and ballots limited to federal offices.

Table 2. Revelation by vote method and ballot style type.

Number of voters with at least one contest subject to public revelation in the ballot-level reporting regime, for each vote method or ballot style type.

Group | All voters, N | All voters, % | Revealed, N | Revealed, % | Pr(Revealed)
--- | --- | --- | --- | --- | ---
By vote method | | | | |
Early (mail-in) | 1,915,487 | 91.7 | 1058 | 30 | 0.06%
In-person | 167,878 | 8.0 | 777 | 22 | 0.46%
Provisional | 6198 | 0.3 | 1657 | 47 | 26.7%
Total | 2,089,563 | | 3492 | |
By ballot style type | | | | |
Federal only | 8114 | 0.4 | 1842 | 53 | 22.7%
State and federal | 2,081,449 | 99.6 | 1650 | 47 | 0.08%
Total | 2,089,563 | | 3492 | |
Subset of ballots | | | | |
Federal only or provisional | 13,274 | 0.6 | 2996 | 86 | 22.6%
Neither | 2,076,289 | 99.4 | 496 | 14 | 0.02%
Total | 2,089,563 | | 3492 | |

Federal law generally requires election officials to offer some voters either a provisional ballot or a ballot limited to federal offices (44, 45). Provisional ballots may be offered, for example, when an individual does not appear in a precinct’s pollbook. Further, federal law also requires that some voters be able to vote for federal offices, even if they are ineligible to vote for state offices. In Arizona, voters must show proof of citizenship to vote in state elections (46). However, the state law does not apply to federal offices (47). Instead, federal law permits any voter who attests to being a citizen to vote for federal offices. For this reason, Arizona has a particularly large number of federal-only ballots. Maricopa County treats provisional ballots as a separate vote method and federal-only ballots in different precincts as separate ballot styles.

Table 2 compares the number of total voters and publicly revealed voters by vote method or ballot style type. The first panel shows that provisional ballots are only 0.3% of all ballots cast, but they are nearly half of the ballots with at least one revealed vote. Further, the probability of revelation given that a voter casts a provisional ballot is 26.7%. The risk of revelation is therefore more than 400 times higher among provisional voters than among voters who vote by mail, which is the most common vote method. Similarly, the second panel shows that federal-only ballots are only 0.4% of ballots cast, but also account for more than half of all revealed ballots. The risk of revelation is again highly disproportionate. Last, the third panel shows that votes on ballots that are either provisional or federal-only are particularly likely to be revealed. Of the 3492 ballots with at least one vote revealed, 2996, or 86%, come from provisional or federal-only ballots.
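The conditional probabilities and the relative risk in this paragraph can be recomputed from the first panel of Table 2. The Python sketch below uses only the counts reported there; the names are illustrative.

```python
# (all voters, voters with at least one revealed vote), from Table 2.
by_method = {
    "early (mail-in)": (1_915_487, 1058),
    "in-person": (167_878, 777),
    "provisional": (6198, 1657),
}


def pr_revealed(method):
    """Probability of revelation conditional on the vote method."""
    total, revealed = by_method[method]
    return revealed / total


total_revealed = sum(revealed for _, revealed in by_method.values())
risk_ratio = pr_revealed("provisional") / pr_revealed("early (mail-in)")
provisional_share = by_method["provisional"][1] / total_revealed

print(f"Pr(revealed | provisional) = {100 * pr_revealed('provisional'):.1f}%")
print(f"risk ratio, provisional vs. mail = {risk_ratio:.0f}")
print(f"provisional share of revealed voters = {100 * provisional_share:.0f}%")
```

The ratio of the two conditional probabilities comes out well above 400, consistent with the comparison drawn in the text.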

Provisional ballots and federal-only ballots are the main drivers of vote revelation in part because they are rare. Less than 1% of voters end up using either a provisional or a federal-only ballot. As a result, both the provisional ballot voting method and the federal-only ballot style drastically shrink the size of the relevant reporting unit, making unanimity (and thus revelation) more likely.

Patterns of vote revelation

We also consider whether revelation could disproportionately affect certain types of voters. These patterns may be relevant to public officials who are considering other dimensions of election reporting, beyond the extent of revelation.

By contest

Figure 3 shows that top-of-the-ticket offices have a higher prevalence of vote revelation than down-ballot offices. For example, the median judicial retention contest has a potential revelation rate of 0.000012%, which is three orders of magnitude smaller than the comparable revelation rate for President. Revelations track salience because voters tend to abstain in low-salience, down-ballot contests like ballot measures and retention elections. Figure S6 shows the undervoting rate for a judicial retention election is close to 40%, while the undervoting rate for President is only 0.4%. As a result, even though down-ballot offices are less likely to be competitive, the high abstention rates effectively prevent unanimity.

Fig. 3. Contests with more revelation.

A boxplot showing the fraction of public revelations in the ballot-level reporting regime, excluding federal-only ballots. Each point represents a contest. The solid bars indicate the median, the box indicates the first and third quartile, and the whiskers extend to 1.5 multiplied by the interquartile range.

By voter partisanship

A related concern might be that revelation could fall disproportionately on people with certain partisan preferences. Table 3 reports the conditional probability that a voter has any vote revealed given their presidential vote choice. Among supporters of the three named candidates, Biden voters are most vulnerable to any-contest revelation (0.19%), while third-party Libertarian (Jorgensen) voters are least vulnerable (0.10%). There is differential revelation because votes for the third-party candidate for President are not as clustered by geography as votes for the major-party candidates. There are no pockets of neighborhoods that heavily favor Libertarians, whereas some urban precincts may heavily vote for Democratic candidates.

Table 3. Revelation by presidential vote choice.

Number of voters with at least one contest subject to public revelation in the ballot-level reporting regime, for each presidential candidate’s supporters.

Vote for President | All voters, N | All voters, % | Revealed, N | Revealed, % | Pr(Revealed given vote for President)
--- | --- | --- | --- | --- | ---
Biden (D) | 1,040,774 | 50 | 1985 | 57 | 0.19%
Trump (R) | 995,665 | 48 | 1430 | 41 | 0.14%
Jorgensen (L) | 31,705 | 2 | 31 | 1 | 0.10%
All others | 21,419 | 1 | 46 | 1 | 0.21%
Total | 2,089,563 | | 3492 | |

By popularity

Yet another concern is that revelation could fall disproportionately on people who hold unpopular views. As detailed in section B.3 of the Supplementary Materials, we compare the presidential vote choice of each revealed voter to the choice of other voters living in the surrounding geographies. For each fixed radius from the revealed voter, we compute the degree of agreement between revealed voters and all other voters.

Figure 4 summarizes our agreement measure when results are reported at either the precinct level (left) or ballot level (right). An important difference between the two reporting units is that a precinct is a geographic unit while a ballot contains a mix of geographic and nongeographic units, such as vote method.

Fig. 4. Revelation of unpopular vote choices.

For each revealed vote choice at the precinct level or ballot level, we display the percentage of the revealed voter’s neighbors who share that vote choice, where neighbors are defined by geographical distance. Solid lines show average agreement by revealed candidate.

The left facet has a single line because Maricopa County had only a single precinct whose votes for president were unanimous. By definition, the agreement value is 100% at x = 0 miles because revelations arise from unanimous results. At an x = 10 mile radius, the agreement value drops to about 75%, meaning that three out of four voters in a 10-mile radius from the revealed voters agree with the revealed voters’ choice.

The agreement value on the right panel does not begin at 100% because one vote method within a precinct may be unanimous while another method in the same precinct is often not. The solid lines show the weighted mean agreement with revealed votes for each candidate. The revealed votes for Biden and Trump tend to reflect the majority, not the minority, view. However, the Libertarian voters who are revealed are distinctly unpopular at any radius because Libertarian preferences are not clustered by geography or method.

Case study in context

We expect the prevalence of vote revelation to vary both across elections and across jurisdictions, in part because of differences in the number of voters per reporting unit. Further, while the patterns of revelation by contest are likely consistent across jurisdictions, the other patterns of revelation by preference or popularity could vary based on the extent of partisan homophily. For these reasons and others, we compare Maricopa County’s 2020 general election to other elections and to other jurisdictions.

Different elections

Table S1 compares Maricopa’s 2020 general election and its later 2022 primary election, using similar cast vote records. The 2022 primary election had only a third as many voters as the 2020 general election, but it had the same number of precinct reporting units. Consistent with our statistical model, voters in the 2022 primary election were two to five times more likely to have their vote choices revealed than in the 2020 general election.

Different jurisdictions

Analyzing revelation in the 2020 general election across all jurisdictions nationwide is beyond the scope of this study, particularly for individual ballot records. Instead, we consider the relative difference in the size of reporting units across jurisdictions. Given both our theoretical framework and empirical results, we focus specifically on the number of voters in reporting units with few voters, because these voters are particularly vulnerable to revelation.

Table S2 makes use of a recently published dataset of cast vote records for the 2020 general election from 160 counties (48), as further detailed in section B.4 of the Supplementary Materials. These counties are scattered around 20 states and are not a representative sample of the U.S. population, in part because many jurisdictions do not release cast vote records. They are collectively slightly more non-white and 3 percentage points more Democratic leaning than the country as a whole (table S3). Nonetheless, the available comparison is informative for contextualizing the prevalence of revelation in Maricopa County.

Overall, Maricopa County had relatively more voters per precinct in the 2020 general election but relatively fewer voters with particular ballot styles, such as the federal-only ballots discussed above. Focusing on the precinct level first, Maricopa County had only 1 voter per 100,000 in precincts with fewer than 10 voters and only 3 per 100,000 in precincts with fewer than 30 voters. In comparison, the Conevska et al. (48) counties had 12 voters per 100,000 in precincts with fewer than 10 voters and 49 in precincts with fewer than 30 voters. The comparison set of counties also allows us to examine the number of voters in ballot-equivalent reporting units, defined by precinct, ballot style, and vote method. Maricopa had 381 voters per 100,000 in ballot-equivalent units of fewer than 10 voters, while the Conevska et al. (48) counties had just 158 voters per 100,000.
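The per-100,000 rates used in this comparison can be computed from any list that records each voter's reporting unit. The Python sketch below shows one way to do so; the function name and the toy jurisdiction are hypothetical and are not drawn from the Maricopa or Conevska et al. data.

```python
from collections import Counter


def voters_in_small_units_per_100k(unit_of_each_voter, max_size):
    """Voters per 100,000 who are in units with fewer than max_size voters."""
    if not unit_of_each_voter:
        raise ValueError("need at least one voter")
    unit_sizes = Counter(unit_of_each_voter)
    in_small_units = sum(n for n in unit_sizes.values() if n < max_size)
    return 100_000 * in_small_units / len(unit_of_each_voter)


# Hypothetical jurisdiction: three precincts with 9, 29, and 962 voters.
toy_units = ["P1"] * 9 + ["P2"] * 29 + ["P3"] * 962
print(voters_in_small_units_per_100k(toy_units, 10))
print(voters_in_small_units_per_100k(toy_units, 30))
```

Passing a tuple of precinct, ballot style, and vote method as the unit label instead of a precinct name would give the corresponding rate for ballot-equivalent reporting units.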

Given these relative differences in the size of reporting units, we expect that the precinct-level revelation we find in Maricopa County may be closer to a lower bound of revelation across the country, while the ballot-level revelation may be closer to an upper bound. However, analyzing revelation in the Conevska et al. (48) data is beyond the scope of this study, as they did not validate their data against the certified election results, as we have carefully done for Maricopa County. While validation was not necessary for their study of ticket splitting, validation is particularly important for vote revelation because public revelation is driven by the unanimity of results. Section B.5 of the Supplementary Materials instead compares revelation across 47 states in a validated dataset of 2020 presidential election results at the precinct level (49).

DISCUSSION

Given our results, we lastly address the future of election reporting. We first review the evidence on the value of transparent election results, drawing from the political science literature. We then evaluate current ex-post approaches to reducing revelation and propose additional ex-ante approaches. We lastly consider the limitations of any empirical case study.

Benefits of transparency

A growing literature in political science finds support for theories that more transparency in election reporting can reduce voter fraud or increase voter trust. For one, granular election results might reduce fraud because they facilitate the detection of voting irregularities. The growing body of so-called election forensics uses granular election results precisely for that reason. For example, in a classic study, Wand et al. (4) used multiple sources of granular election results, including cast vote records, to show that the effect of poor ballot design was large enough to swing the winner of the 2000 presidential election [see also (50)]. More recent studies similarly rely on results reported by vote method and precinct to dispel claims of election fraud (6, 7, 51, 52) or detect actual election fraud (5).

Consistent with election forensics, international organizations monitoring elections recommend that countries publish more granular election results to reduce fraud. The findings of various political science studies support that recommendation. For example, a field experiment in the 2010 Afghanistan parliamentary elections found that notifying polling station managers that their initial vote tallies will be photographed and made public substantially reduced the manipulation of vote counts (53). Further, an over-time study of 125 low- and middle-income countries finds that more granular election results are associated with fewer perceived irregularities (54). The 2024 presidential election in Venezuela offers a particularly stark example of the need for transparency. In that election, the politically controlled election board declared that the incumbent had won with 52% of the total vote, while withholding polling station-level statistics that showed he had in fact lost the election (8).

More granular election results may also improve public trust by changing perceptions. Rueda et al. (54) find that more granular results improve expert perceptions of the overall quality of international elections. Further, in the contemporary United States, Jaffe et al. (55) show that publishing postelection audits increases the mass public’s confidence in both the accuracy and result of the election. The effect of publicly releasing audit results is large—Jaffe et al. (55) find that its effect on the public’s confidence in the audit is about as large as the effect of the declared winner in the audit being a copartisan. It is possible that the effect might be even larger if the ballots being audited were public, too. Then, everyone could participate in the process, skeptic or not.

Granular election results have also become central to a range of other aspects of election research. For example, granular election results are relied upon to demonstrate racial polarization relevant to the enforcement of current voting rights protections (56). Cast vote records in particular also allow better measurement of ticket splitting and voter behavior in ranked choice elections (57–59).

Ex-post approaches to reducing revelation

Transparency, of course, has costs as well as benefits. Election officials are broadly pursuing two types of policies to reduce or eliminate vote revelation. The first approach uses redaction to remove certain information from election results, while the second injects noise into those results. These approaches can either target the quasi-identifiers associated with individual ballots or the vote choices themselves. In general, modifying vote choices has more privacy guarantees than modifying quasi-identifiers (60) but runs counter to a fundamental goal of elections to identify a winner.

Redaction

Several states redact information from all reporting units that have fewer than a certain number of voters. For example, by statute, election officials in Nevada, New Mexico, and Florida are required to redact vote method from precinct-method aggregate totals if the number of votes cast in any precinct-method reporting unit is below a defined threshold of either 5, 10, or 30 total votes (61–63). In that case, election officials would report results for the particular precinct as a whole rather than separating it by method. A similar approach is taken in other democracies too. For example, in Germany, election officials combine any precinct with fewer than 50 total votes with another precinct. The approach effectively redacts a precinct identifier by reporting two precincts as if they were one (64). Such redaction and aggregation can be an intuitive approach for election officials to reduce revelation from election reporting.
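As a minimal sketch of this kind of rule, the Python below collapses one precinct's results across vote methods when any precinct-method unit falls below a threshold. The function name `report_precinct`, the data layout, and the vote counts are hypothetical illustrations, not any state's actual procedure or data.

```python
# Hypothetical precinct-method totals for one precinct (illustrative values only).
PRECINCT = {
    "mail": {"A": 120, "B": 95},
    "in-person": {"A": 40, "B": 38},
    "provisional": {"A": 3, "B": 0},  # small and unanimous: revealed if reported
}

def report_precinct(method_totals, threshold):
    """Report results by method unless any method has fewer than `threshold` votes.

    In that case, redact the vote method by reporting the precinct as a whole.
    """
    if all(sum(counts.values()) >= threshold for counts in method_totals.values()):
        return method_totals
    combined = {}
    for counts in method_totals.values():
        for candidate, votes in counts.items():
            combined[candidate] = combined.get(candidate, 0) + votes
    return {"all methods": combined}

print(report_precinct(PRECINCT, threshold=10))
```

In this toy precinct, the three unanimous provisional ballots are folded into the precinct total, so the method-level breakdown is lost for every voter in the precinct, not only the three provisional voters.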

Nonetheless, redaction almost always reduces transparency too. Figure 5 visualizes the trade-off by returning to Maricopa County’s 2020 general election. The horizontal, dashed line reflects that there are 3492 votes vulnerable to revelation, using the ballot-level, any-contest specification as in the previous figures. The x axis varies the threshold for redaction, which we denote by k. The y axis counts the number of voters whose ballots would be subject to some form of redaction if they were part of reporting units with less than k voters. For any particular threshold, the figure distinguishes between voters whose vote is (yellow) or is not (gray) vulnerable to public revelation before redaction.

Fig. 5. Trade-offs from redaction.

Number of affected ballots from a hypothetical policy of redacting ballots from reporting units with k or fewer voters. k = 0 indicates the status quo of no redaction.

By definition, a policy of no redaction (k = 0) would expose all votes vulnerable to revelation, while a policy of redacting and aggregating ballots in reporting units with just a single voter (k = 1) would reduce revelation without affecting ballots not vulnerable to revelation. To more substantially reduce revelation, an election official could increase k. For example, using a threshold of k = 5 or fewer voters could reduce revelation by 65%, but not eliminate it. However, the number of voters whose ballots are redacted despite not being vulnerable to revelation would increase as well. In our example, Maricopa County would need to set k to 31 to eliminate revelation. As a consequence, though, more than 11,000 voters whose ballots are not vulnerable to revelation would also be subject to some form of redaction.
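The trade-off can be sketched in a few lines of Python. The example uses a small, invented list of reporting units (not Maricopa data) and follows the figure caption's convention that units with k or fewer voters are redacted; the names `Unit` and `redaction_tradeoff` are assumptions made for illustration.

```python
from typing import NamedTuple

class Unit(NamedTuple):
    n_voters: int      # ballots in the reporting unit
    n_vulnerable: int  # ballots whose vote would be revealed without redaction

# Hypothetical reporting units (illustrative values only).
UNITS = [
    Unit(1, 1),
    Unit(3, 3),
    Unit(4, 0),
    Unit(8, 8),
    Unit(20, 0),
    Unit(500, 0),
]

def redaction_tradeoff(units, k):
    """Redact every unit with k or fewer voters.

    Returns (vulnerable ballots redacted,
             non-vulnerable ballots redacted,
             vulnerable ballots still exposed).
    """
    hit = [u for u in units if u.n_voters <= k]
    protected = sum(u.n_vulnerable for u in hit)
    collateral = sum(u.n_voters - u.n_vulnerable for u in hit)
    exposed = sum(u.n_vulnerable for u in units) - protected
    return protected, collateral, exposed

for k in (0, 1, 5, 10):
    print(k, redaction_tradeoff(UNITS, k))
```

Raising k shrinks the third number (remaining revelation) while the second number (redaction of ballots that were never vulnerable) can only grow, which is the pattern the figure describes.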

More generally, the harm of redaction depends on both the information redacted and the other information about election results available in the jurisdiction. For example, many Colorado counties redact the vote choice from ballot records of voters in reporting units of less than 10 voters before they publicly release a cast vote record database (65, 66). In practice, New Mexico and Nevada appear to redact the number of votes that each candidate receives in small reporting units, rather than the vote method. Redacting the vote choice means the public cannot verify which candidate got the most votes, at least in that specific set of election results. However, all of these states also report unredacted vote counts at higher levels of geography (e.g., counties and statewide).

Extending redaction

While the policies above deploy redaction based on the size of reporting units, a similar approach could instead target the particular types of ballots that are disproportionately likely to generate revelation. Recall that 86% of the potential revelations from releasing individual ballots in Maricopa County’s 2020 general election came from provisional and federal-only ballots (Table 2).

The two types of ballots call for somewhat different treatments, but simple reporting adjustments could prevent most revelation. For one, election officials do not need to identify provisional ballots in election results. While there is a clear benefit to identifying voters whose provisional votes need to be cured, once those ballots are counted, the benefit to identifying which ballots were provisional ballots is less clear. Election officials could instead modify vote method for provisional ballots to simply identify that it was cast in person, as provisional ballots typically are. For another, election officials could report provisional (or federal-only) ballots in a separate nongeographic precinct. The upshot would be larger reporting units and thus less revelation.

Noising

Beyond redaction, North Carolina has charted a unique approach of adding noise to election results. In short, the state adds or subtracts an undisclosed random number to some reported election results. The policy—what we term noising—is consistent with the calls of many contemporary privacy scholars in the context of administrative data generally. However, the policy is a poor fit for election administration.

In theory, adding random noise can be a powerful privacy protection mechanism because it makes sensitive data more resistant to background knowledge attacks, while largely preserving the ability to draw correct inferences about the structure of the underlying data. Such privacy protection satisfies what scientists refer to as the differential privacy criterion (6770). However, valid inferences are only possible when the analyst has full knowledge of the noise injection mechanism and North Carolina has yet to publicly document its random noise algorithm.

More generally, differential privacy faces conceptual challenges when it comes to election reporting. Classic differential privacy requires that the reported result of every query to a confidential database be effectively equally likely regardless of whether any individual is removed from the database. The most important query for election results is simple: Which candidate won the most votes? The answer to this query must be reported with perfect fidelity in close elections, but as we explain in Materials and Methods, differential privacy requires that the answer not substantially depend on any one vote, or, put another way, that we pick election winners in a way that no single vote could ever matter. In North Carolina, the state board warns that a candidate who appears to have won the most votes in the noised results may not be the one who was the actual certified winner (71).

Many of these challenges might be solved by advances in privacy algorithms. Some recent variants of differential privacy do incorporate constraints so that, in our case, the certified election totals at the constituency level would not be altered (72, 73). However, these methods are complex, require expertise and documentation, and are yet to be implemented in election administration.

The importance of ex-ante electoral design

Ex-post approaches to eliminate vote revelations after an election are inherently second best. Instead, election officials should also consider ex-ante approaches to reduce revelation before conducting the election.

Ex-post approaches such as redaction, coarsening, and noising only protect voters from having their vote choices revealed to the public. They do not protect voters from having their choices revealed to election officials themselves. The secret ballot, though, was developed in part to protect the privacy of voters’ political preferences from the government (9, 10). For example, if vote choices in a given reporting unit turn out to be unanimous, that information could be revealed to election officials themselves before they perform any redaction or noising. In other words, if the reporting of election results violates the secret ballot, it could do so regardless of whether the results are made public or not. Further, the ex-post redaction and coarsening of quasi-identifiers not only is nontrivial but also can easily generate the sort of distrust that officials are seeking to reduce.

A more robust secret ballot requires designing election reporting systems that are less likely to produce vote revelations. One way for jurisdictions to reduce the likelihood of revelation is to redraw political districts so that there are more voters per reporting unit and fewer or no rare ballot styles. However, some district boundaries may intersect with many other district boundaries and yet not be readily manipulable. Beyond redistricting, revelations can also be avoided by “reprecincting” to reduce the number of precincts (and thus expand the number of voters per precinct). For example, jurisdictions that use vote centers or vote-by-mail rather than traditional in-person polling stations could define their precincts by ballot style and thus avoid revelations created when precincts are split by ballot style.

A jurisdiction might also modify the information available in the public voter file. Voters’ names and addresses are easily linked to unanimous election results in the United States because names and addresses are publicly available in voter files (Fig. 1), along with information on whether registrants turned out to vote. Limiting that information on the voter file would make vote revelation much more difficult. But we expect state legislators are unlikely to adopt the approach, since parties depend on voters’ public contact information and vote history to mobilize and target support (74). Further, it would not address the prospect of revelation to election officials. A similar but less drastic approach would limit the snapshots of the voter file made available, in effect using the churn of the electorate for privacy protection. In conjunction with these approaches, states might also consider centralizing, standardizing, and regulating the election reporting process. Some states, such as Washington and Georgia, have already moved in that direction (75).

Remaining concerns

Election officials faced with navigating the privacy-transparency trade-off in election results may have other concerns beyond those addressed in our case study. For example, election officials might worry about the possibility of revelation even if there is not certain revelation. In general, though, probabilistic revelation seems to us to overstate the privacy risks of election reporting. A probabilistic approach fits uneasily in the U.S. election setting, where campaigns are already flush with individually identifiable information about voters (74, 76). The public voter file typically lists voters’ party of registration, and campaign contributions are typically public too. Further, there is no clear threshold to distinguish probabilistic revelation. Regardless, our results show that the extent of probabilistic revelation at 95% is similar to the amount of certain revelation (Table 1).

Election officials may also be concerned about the public’s perception of privacy violations. Regardless of the extent of actual revelation, perceptions of revelation may discourage participation (77). A substantial number of Americans think their vote choice is not actually secret (78, 79). However, the vast majority of Americans also report freely sharing their vote choice with others (78). Nonetheless, it would be concerning if publication of ballot records increases perceptions of privacy risk. Ultimately, future work should consider how best to inform voters about the actual prevalence of revelation.

Separately, election officials might worry that our case study will not generalize to their jurisdiction or to various types of elections. Ultimately, we expect vote revelation will vary by election and jurisdiction. We showed that revelation is two to five times greater in Maricopa County’s 2022 primary election than in its 2020 general election. However, we also found that Maricopa County likely had more revelation from releasing ballot records than we would expect in other jurisdictions, primarily because of its particular policies for reporting provisional and federal-only ballots, both of which can be easily remedied.

Last, while election reporting involves balancing privacy and transparency, our study does not quantify the value of transparency. If transparency is of little value, the case for more granular election reporting is necessarily weaker. On the other hand, if transparency is of some value, then taking transparency seriously requires reducing potential revelation from election results.

The future of election reporting

In this study, we extensively considered the ways in which a voter’s vote choice can be revealed through the reporting of election results and measured the actual prevalence of what we call public, local, and probabilistic revelation. Conceptually, we have shown that vote revelations are possible in any election reporting system that promotes transparency. Empirically, we found that the release of individual ballot records can publicly reveal a vote choice of 0.17% of the voters in a large, diverse county. In comparison, the release of aggregate results by precinct and vote method would reveal a vote choice for 0.05% of voters. Further, while Republican and Democratic voters were equally likely to have their vote choices revealed, voters who cast provisional ballots and ballots limited to federal offices were disproportionately more likely to be affected.

In the coming years, we expect election officials, legislators, and the public alike to continue to debate how to best balance transparency and privacy in election reporting. Choosing one particular reporting regime over another depends on both the empirical realities we detail here as well as normative preferences about the value democracies should place on privacy versus transparency. In that sense, our case study is not designed to dictate any particular policy as much as it offers a framework to arrive at one.

MATERIALS AND METHODS

Materials

We use Maricopa County, Arizona, as a case study to quantify the extent of vote revelation in the reporting of election results.

Case selection

We used cast vote records from Maricopa County’s 2020 general election and 2022 primary election. For each contest on each ballot, Maricopa’s cast vote records provide vote choice, including undervotes or overvotes, vote method (in-person, early, and provisional), precinct, and ballot style. Maricopa’s cast vote records thus include the necessary information on both quasi-identifiers and vote choice to allow us to reconstruct election results at any common level of aggregation. We inspected the dataset extensively to verify that it reproduced the result from the official canvass of results.

In the Results, we analyze contested single-member contests with nontransferable votes. We define a contested contest as a contest having at least two candidates listed on the ballot, not counting write-ins, or having at least two options on a referendum. On the basis of this definition, more than 99.5% of the ballots cast in the 2020 general election included at least 64 contested contests. Figure S6 shows a summary of the contests we examine.

Reporting units

Table 4 summarizes our collection of cast vote records for the November 2020 general election. It reports both the number of unique reporting units and the proportion of voters in reporting units with few voters, which corresponds to the empirical cumulative density function weighted by the number of voters in each unit. For example, the top row shows that Maricopa County used 743 unique precincts for the election. All in all, only 0.003% of voters (3 out of 100,000) voted in precincts with 30 or fewer voters. The most granular reporting unit is precinct × ballot style × vote method, which we have established is equivalent for purposes of vote revelation to releasing individual ballots. There are 4397 such reporting units, with 0.7% of voters in units with 30 or fewer voters and 0.028% in singleton units. The three rows with “…” further subset style-method reporting units by each of the three vote methods. Reporting units for provisional ballots have by far the fewest number of voters. Nearly 98% of voters who cast a provisional ballot are in ballot-level reporting units of 30 or fewer voters (last row; 97,661 out of 100,000 voters).

Table 4. Summary statistics.

Counts and coverage of the Maricopa County data we focus on in this paper. Each row indicates a reporting unit. The column “Unique units” indicates the number of unique units that exist. Cells show the proportion of voters in small reporting units (displayed as voters per 100,000), which correspond to the empirical cumulative density function weighted by the number of voters in each unit.

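Voters per 100,000 in reporting units of size N (columns N = 1 through N ≤ 30)

| Reporting unit | Unique units | N = 1 | N ≤ 2 | N ≤ 5 | N ≤ 10 | N ≤ 30 |
| --- | --- | --- | --- | --- | --- | --- |
| Precinct | 743 | 0 | 0 | 0 | 1 | 3 |
| Ballot style | 381 | 0 | 0 | 2 | 5 | 13 |
| Precinct × ballot style | 1741 | 1 | 4 | 34 | 124 | 358 |
| Ballot equivalent | 4397 | 28 | 65 | 182 | 381 | 705 |
| … Only mail method | 1722 | 2 | 8 | 47 | 143 | 319 |
| … Only in-person method | 1369 | 126 | 256 | 469 | 657 | 1530 |
| … Only provisional method | 1306 | 5357 | 12,649 | 34,124 | 66,731 | 97,661 |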
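The "voters per 100,000" cells are a voter-weighted cumulative share: the number of voters who sit in units of at most a given size, divided by all voters. A minimal Python sketch follows, using invented unit sizes rather than the Maricopa data; the function name `voters_per_100k` is an assumption for illustration.

```python
# Hypothetical reporting-unit sizes (number of voters in each unit).
SIZES = [1, 2, 7, 30, 960, 9000]

def voters_per_100k(unit_sizes, max_size):
    """Voters in units with at most `max_size` voters, per 100,000 voters overall."""
    total = sum(unit_sizes)
    in_small_units = sum(n for n in unit_sizes if n <= max_size)
    return 100_000 * in_small_units / total

for max_size in (1, 2, 5, 10, 30):
    print(max_size, voters_per_100k(SIZES, max_size))
```

Weighting by voters rather than by units matters: a jurisdiction can have many tiny reporting units while only a small share of its voters are in them.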

Cast vote records versus ballot images

Cast vote records are one of two different types of individual ballot records, each with subtly different implications for vote revelation. However, the two records have often been confused in the public debate.

A cast vote record is a machine-readable enumeration of a voter’s choice across some or all contests on their ballot (80, 81). Cast vote records are generally stored in familiar data formats such as XML, JSON, or spreadsheets. A ballot image is a digitally recorded scanned image of each ballot or ballot page. Vote scanners can produce both types of individual records.

From a privacy perspective, the relevant difference is that cast vote records only contain a voter’s contest choices, while a ballot image would capture all marks or notations made on the ballot, including those that do not indicate a choice for a particular candidate. For this reason, voters are often prohibited from adding identifying marks to their ballot to identify themselves [see, e.g., (82)], and election officials attempt to redact ballot images with identifying marks [(33), see also fig. S3 for a redacted mark].

Nonetheless, ballot images are more vulnerable to vote revelation than cast vote records because deterrence from criminal sanctions and redaction by election officials may be ineffective or incomplete. Further, our interest is not in self-revelation, but rather the revelation of vote choices that were not intended to be revealed to anyone. For this reason, we study what the choices themselves can reveal. Those choices are contained in the cast vote records.

Voter file data

The possibility of vote revelation, from either individual cast vote records or aggregate election results, depends on the completeness of the voter file. In practice, the public voter file is incomplete. For one, there is well-known churn in publicly available voter files, which would incidentally prevent revelation (83, 84). Further, Maricopa’s Address Confidentiality program removes the records of about 4000 voters from their public voter file. These voters are typically victims of domestic abuse and stalking who are allowed to shield their addresses from public records (85). Last, Maricopa’s voter file lumps together in-person votes and provisional votes under the same vote method.

Nonetheless, we assume that such a complete dataset is available and measure the amount of potential revelation that would be possible with a complete list of voters and their quasi-identifiers. The main reason is that election officials, such as those in Maricopa County, retain a list they refer to as the “voted file” that is not affected by voter churn and does not coarsen vote method. Anyone can obtain a voted file through an open records request, and even if voted files were not available to the public, the state has access to such a file. Section B.2 of the Supplementary Materials provides more detail on our comparison between this voted file and a commercial voter file.

Methods

Our methodology builds on work in the formal privacy literature and extends it to the unique context of election reporting.

Vote revelation in a privacy framework

The building blocks of an election database include a table of individual ballot records and a table of individual voter registration records. As Fig. 1 illustrates, the ballot records indicate each choice made in each contest by each voter in any election, while the voter registration records enumerate which registrants cast a ballot in the same election. The ballot records are anonymous, i.e., no names are attached to the actual ballot, but contain quasi-identifiers (e.g., precinct, ballot style, and vote method).

A formal statement of what we call public revelation is based on a measure called ℓ-diversity (39). A dataset is considered ℓ-diverse if there are at least ℓ distinct “well-represented” values of a sensitive attribute (vote choice) in a block of data with a specific quasi-identifier (precinct, style, or method). The concept of ℓ-diversity contrasts with the older notion of k-anonymity (86), which holds that a dataset is k-anonymous if there are at least k − 1 individuals with the same quasi-identifier. Even if a dataset is k-anonymous, a sensitive attribute can be revealed if all k − 1 voters with a specific quasi-identifier vote unanimously (in other words, ℓ = 1 with only one value). However, datasets that are ℓ-diverse are still not necessarily immune from vote revelation (60). Instead, a person who voted themselves can use their own private knowledge to back out how others voted. For this reason, the mechanism for what we call local vote revelation is known as a background knowledge attack (39).
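A toy Python sketch may help separate the two notions. It groups hypothetical ballot records by quasi-identifier and reports each block's size and number of distinct vote choices. As a simplifying assumption, it counts any distinct value rather than applying a "well-represented" criterion, and the records and function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical ballot records: (precinct, ballot style, vote method, presidential vote).
BALLOTS = [
    ("P1", "S1", "mail", "Biden"),
    ("P1", "S1", "mail", "Trump"),
    ("P1", "S1", "mail", "Biden"),
    ("P1", "S2", "provisional", "Trump"),
    ("P2", "S1", "in-person", "Biden"),
    ("P2", "S1", "in-person", "Biden"),
]

def block_profile(ballots):
    """Map each quasi-identifier block to (block size, number of distinct choices)."""
    choices = defaultdict(list)
    for precinct, style, method, vote in ballots:
        choices[(precinct, style, method)].append(vote)
    return {block: (len(votes), len(set(votes))) for block, votes in choices.items()}

def locally_revealed(votes, own_vote):
    """Background knowledge attack: a voter removes their own vote from the block
    and learns the others' choices if the remaining votes are unanimous."""
    rest = list(votes)
    rest.remove(own_vote)
    return len(rest) > 0 and len(set(rest)) == 1

profile = block_profile(BALLOTS)
# Public revelation: blocks with a single distinct choice, whatever their size.
revealed = {block for block, (size, ell) in profile.items() if ell == 1}
print(revealed)
print(locally_revealed(["Biden", "Trump", "Biden"], "Trump"))
```

The in-person block in precinct P2 has two voters yet a single distinct choice, so both votes are publicly revealed. The three-voter mail block has two distinct choices, but the lone Trump voter can still infer that both other voters chose Biden.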

Not vote revelation

We set aside various related privacy concerns in our definition of vote revelation.

Other quasi-identifiers. First, we do not consider the possibility of vote revelation from quasi-identifiers other than precinct, ballot style, and vote method. For example, election officials have voiced concern that including time stamps on ballot records indicating exactly when each ballot was cast (or in what order the ballots were cast) could be used to reveal votes. At least one voting system is known to have inadvertently disclosed the order in which in-person ballots were cast (87). Relatedly, some states allow for so-called “retrievable” ballots. For example, North Carolina marks ballots cast during early voting with an identifier such that it is retrievable (88).

We set aside such concerns arising from other quasi-identifiers because the solution to these concerns is about election administration, not the reporting of election results. To continue the examples above, election officials need not provide a time stamp of a voter’s interaction with a ballot in a publicly available cast vote record. Further, the ballot identifier is not a public record and can only be linked to a voter if a vote is later found to be fraudulent. Otherwise, it is a criminal offense for anyone “who has access to an official voted ballot or record and knowingly discloses … how an individual has voted that ballot” (89).

In contrast, if election officials are required to report precinct-level results, they must necessarily associate the voter’s precinct with each ballot to administer the election. Similarly, information about the ballot style is inherent to each ballot and cannot typically be redacted. {However, the cast vote records produced by some voting systems do not connect vote choices made by voters across a multiple-page ballot [see, e.g., (90)]. In this case, ballot style may not be entirely revealed by the cast vote records and, thus, ballot style information is not inherent to cast vote records in those cases.} Election officials have also voiced concern about ballots in different languages, but we are not presently aware of ballot language being recorded in a voter file.

Authentication. Our definition of vote revelation also does not consider the possibility of voters using cast vote records to voluntarily reveal their vote. Prior work has warned that releasing ballot images could allow voters to authenticate their votes by adding identifiable stray marks, which could be captured in ballot images (32). That concern is also reflected in state law. For example, Minnesota requires that a ballot not be intentionally marked with “distinguishing characteristics” that render it identifiable (82). More recent work focuses on how releasing cast vote records in ranked-choice elections could similarly allow each voter to use their complete rank ordering of the candidates as a verifiable public signal that would nearly always be unique to them (91, 92).

Nonetheless, we see vote revelation as a more pressing concern than vote authentication in considering the public disclosure of individual ballots. For one, vote-buying and vote-selling already violate federal law (93). Further, vote authentication may be permissible as protected First Amendment speech, at least in some jurisdictions [see, e.g., (94)]. For example, many states permit voters to share a picture of their marked ballot, known as a “ballot-selfie” (95).

Negative revelation. Our definition of vote revelation does not include so-called negative revelation: the ruling out of choices that a voter may have made. Negative revelation is often found in granular election results because zero votes might be cast for particular minor-party candidates in a reporting unit. Cast vote records can allow more negative revelation than aggregate election results. To use the setup of two, suppose a vote buyer asks one voter to vote for Trump and vote Yes on a referendum, and another to vote for Biden and vote No on the same referendum. However, if the first voter voted {Trump; No}, and the second voted {Biden; Yes}, the cast vote records would reveal that both voters had reneged, while aggregate results would not.

Vote verification. Last, our definition of vote revelation pertains to revelation to others, not self-verification. Cryptographers have designed encryption systems that, when properly implemented, allow voters to verify to themselves that their vote was properly tallied as cast (34).

Prevalence and patterns of vote revelation

Our main results calculate whether a voter had at least one of their votes revealed in a given election. In other words, if we define $r_{ij} = 1$ when voter $i$’s vote for contest $j \in \{1, \dots, K_i\}$ is revealed, and $r_{ij} = 0$ otherwise, we count the number of voters for whom $r_{ij} = 1$ in at least one contest $j$. To formalize our quantities of interest, index voters by $i \in \{1, \dots, N\}$. The number of voters revealed in contest $j$ is given as the sum across voters $\sum_{i=1}^{N} r_{ij}$. In Table 1, we compute $N - \sum_{i=1}^{N} \prod_{j=1}^{K_i} (1 - r_{ij})$, i.e., everyone except voters for whom none of their $K_i$ contests are revealed.
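The any-contest count can be checked on a toy example. The Python sketch below uses an invented matrix of revelation indicators, with rows of different lengths because each voter may face a different number of contests; the function name is an assumption for illustration.

```python
from math import prod

# R[i][j] = 1 if voter i's vote in the j-th contest on their ballot is revealed.
# Rows differ in length because voter i has K_i contests (hypothetical values).
R = [
    [0, 0, 0],
    [1, 0],
    [0, 1, 1, 0],
    [0],
]

def voters_with_any_revelation(r):
    """N minus the number of voters for whom no contest is revealed."""
    n = len(r)
    none_revealed = sum(prod(1 - r_ij for r_ij in row) for row in r)
    return n - none_revealed

print(voters_with_any_revelation(R))
```

The product over a voter's contests equals 1 only when every indicator is 0, so subtracting the sum of products from the number of voters counts exactly those with at least one revealed contest.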

We also analyze contest-specific revelation. Figure 3 studies the mean of rij for each contest j. For generalizability, we omit federal-only ballots, because those few ballots are both substantially more likely to lead to vote revelation, and by definition, such revelation would reveal a vote for federal office. We also drop three school district contests with few eligible voters for comparability across types of contests. In Fig. 4, we limit our attention to the presidential contest.

Differential privacy

We lastly formalize our claim in Discussion that it is impossible to implement a differential privacy algorithm that will preserve the fidelity of the final vote count in an election determined by one vote.

Consider the statistic $s$ (e.g., the winner of the election) produced from a private dataset of voters $D$. The (usually random noise-inducing) processing is denoted $M$, and the value of the statistic is $m$. The privacy literature considers $M$ to be differentially private if the ratio of $\Pr[M(s,D)=m]$ to $\Pr[M(s,D')=m]$ is below a user-specified constant $\exp(\epsilon)$, where $D'$ is a dataset that differs from $D$ by the presence or absence of a single row. Now, consider an election that is decided by one vote, where $D$ is the dataset that includes every voter and $D'$ is the data where one voter is excluded. Then, if $M$ has no noise, $M(s,D)$ will report the same winner as $M(s,D')$ with probability 0. Therefore, the ratio of $\Pr[M(s,D)=m]$ to $\Pr[M(s,D')=m]$ will be infinity, and no value of $\epsilon$ would satisfy the differential privacy criterion. For $\epsilon$ to approach 0, the result of the election would have to be noised enough that the outcomes of the closest elections would be determined by lot and the presence or absence of any one ballot could never matter. Therefore, differential privacy seems incompatible with the integrity of a single vote.
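To make the "determined by lot" point concrete, the Python sketch below assumes one textbook mechanism that is not described in this paper: Laplace noise with scale 1/ε added to the two-candidate vote margin, whose sensitivity to adding or removing one ballot is 1. It is an illustration under that assumption only and says nothing about North Carolina's undocumented algorithm.

```python
import math

def prob_correct_winner(true_margin, epsilon):
    """Probability that the noised margin keeps the sign of a positive true margin.

    Assumes noised margin = true_margin + X, with X ~ Laplace(0, 1/epsilon).
    P(true_margin + X > 0) = 1 - 0.5 * exp(-true_margin * epsilon).
    """
    scale = 1.0 / epsilon
    return 1.0 - 0.5 * math.exp(-true_margin / scale)

# An election decided by one vote, under weaker to stronger privacy (smaller epsilon).
for epsilon in (5.0, 1.0, 0.1, 0.01):
    print(epsilon, round(prob_correct_winner(1, epsilon), 4))
```

Under this assumed mechanism, the probability of reporting the true winner of a one-vote election falls toward one half as ε approaches 0, so strong privacy and a faithful winner cannot both hold in the closest contests.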

Another conceptual challenge is that classic differential privacy is defined on multiple hypothetical versions of the same dataset, comparing whether one person is in the dataset or not. This idea may work for a survey sample, where there may be multiple samples from the same population, or for a policy of continuously releasing statistics from a changing population (13). However, an election only happens once. Elections are defined by their finality. It is therefore difficult to rely on a notion of multiple versions of the same election.

Acknowledgments

For valuable comments and suggestions, we thank participants at the Midwest Political Science Association conference (2023), the Election Science, Reform, and Administration conference (2023), the Conference on Empirical Legal Studies (2024), the University of Pennsylvania Law School’s Ad Hoc workshop (2024), the Washington University School of Law Election Law Conference (2024), the Harvard Law School Law and Politics Workshop (2024), and the American Political Science Association conference (2024), as well as members of the State Audit Working Group, members of the North Carolina Board of Elections, and S. Baltz, H. Branscomb, P. Burke, G. Evans, A. Friedman, D. Hoffman, S. Jarrett, C. Jerdonek, C. T. Kenny, R. Lutz, Y. Li, C. Warshaw, B. White, A. Wood, and Q. Yang. This research was deemed not human subjects research by the IRBs at Yale University and the University of California Los Angeles. A previous draft of the paper was titled “The Still Secret Ballot: The Limited Privacy Cost of Transparent Election Results.”

Funding: We received no funding for this research.

Author contributions: S.K., J.B.L., and M.M. all contributed to the design of the study, data analysis, visualization, writing, and editing.

Competing interests: The authors declare that they have no competing interests.

Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper, the Supplementary Materials, or supplementary data repository at https://doi.org/10.7910/DVN/V83DN5 (Harvard Dataverse).

Supplementary Materials

This PDF file includes:

Supplementary Text

Figs. S1 to S7

Tables S1 to S4

sciadv.adt1512_sm.pdf (1.4MB, pdf)

REFERENCES AND NOTES

  • 1. Stewart C. III, Trust in elections. Daedalus 151, 234–253 (2022).
  • 2. Pildes R. H., Election law in an age of distrust. Stanford Law Rev. 74, 100–114 (2021).
  • 3. Brennan Center, Voting Laws Roundup (2022). https://perma.cc/4PW2-XFLX.
  • 4. Wand J. N., Shotts K. W., Sekhon J. S., Mebane W. R., Herron M. C., Brady H. E., The butterfly did it: The aberrant vote for Buchanan in Palm Beach county, Florida. Am. Polit. Sci. Rev. 95, 793–810 (2001).
  • 5. Herron M. C., Mail-in absentee ballot anomalies in North Carolina’s 9th congressional district. Elect. Law J. 18, 191–213 (2019).
  • 6. Goel S., Meredith M., Morse M., Rothschild D., Shirani-Mehr H., One person, one vote: Estimating the prevalence of double voting in US presidential elections. Am. Polit. Sci. Rev. 114, 456–469 (2020).
  • 7. Cantú F., The fingerprints of fraud: Evidence from Mexico’s 1988 presidential election. Am. Polit. Sci. Rev. 113, 710–726 (2019).
  • 8. J. Corrales, D. Kronick. “How Maduro Stole Venezuela’s Vote.” Journal of Democracy 36, 36–49 (2025).
  • 9. A. Keyssar. The Right to Vote: The Contested History of Democracy in the United States (Basic Books, 2009).
  • 10. I. Mares. From Open Secrets to Secret Voting: Democratic Electoral Reforms and Voter Autonomy (Cambridge Univ. Press, 2015).
  • 11. Ohm P., Broken promises of privacy: Responding to the surprising failure of anonymization. UCLA Law Rev. 57, 1701–1777 (2010).
  • 12. Sweeney L., Yoo J. S., Perovich L., Boronow K. E., Brown P., Brody J. G., Re-identification risks in HIPAA safe harbor data: A study of data from one environmental health study. Technol. Sci. 2017, 2017082801 (2017).
  • 13. Wood A., Altman M., Bembenek A., Bun M., Gaboardi M., Honaker J., Nissim K., O’Brien D. R., Steinke T., Vadhan S., Differential privacy: A primer for a non-technical audience. Vanderbilt J. Entertain. Technol. Law 21, 209–276 (2018).
  • 14. Division of Elections, State of Alaska. 2024 General Election Cast Vote Records. https://perma.cc/BSR4-AL6N.
  • 15. Georgia Secretary of State. Ballot Image Library Search. https://perma.cc/9VZ2-HNW2.
  • 16. Maryland State Board of Elections. Cast Vote Records for 2022 General Election. https://perma.cc/HXF4-QDDR.
  • 17. SF Elections. Ranked Choice Voting and Detailed Reports. https://perma.cc/LE87-KH4S.
  • 18. El Paso County Clerk and Recorder. 2023 through 2021 Records and Results. https://perma.cc/2SLK-YX5V.
  • 19. L. Atkeson, L. Bryant, W. Hansen, Y. Li, A. Cutler, K. Mackie, E. McKown-Dawson, M. S. Earley, 100% Independent Retabulation Election Audit, Leon County, Florida (Tech. Rep., 2023). https://perma.cc/2TEG-PQWM.
  • 20. M. Baker, “An Idaho county will publish everyone’s ballots to combat mistrust,” New York Times, 2024.
  • 21. Tarrant County Elections. Past Election Information. https://perma.cc/9Y7H-ETLE.
  • 22. County of Dane, Wisconsin. Election Audit Central. https://perma.cc/L7BJ-4YFB.
  • 23. R. Rivas, Election deniers ramp up public records requests in Missouri, across the country. Missouri Independent (2022). https://perma.cc/22LW-A9LA.
  • 24. York County. Ballots, Ballot Images and CVRs Are Not Subject to FOIA (2022). https://perma.cc/Y8MV-S7B3.
  • 25. M. Houck. Letter to South Carolina Election Commission, Office of the Attorney General, State of South Carolina. 2020 WL 5985610. 2020.
  • 26. R. Jacobs, Republican Lawmakers in North Carolina Tap Brakes on Confidential Voting Records Bill (WUNC, 2023). https://perma.cc/G2GL-M8NY.
  • 27. J. Fifield, A Bill to Enable Do-It-Yourself Election Audits Sparks Rare Bipartisan Interest in Arizona (Votebeat, 2023). https://perma.cc/2LXF-TMZQ.
  • 28. C. Beecher. Legal status of cast vote records. Response to Michael Morse on October 30, 2023.
  • 29. S. McDonnell. Legal status of cast vote records. Response to Michael Morse on October 10, 2023.
  • 30. M. Houck. Opinion Letter to South Carolina Representatives. 2022 WL 4229451. 2022.
  • 31. K. Hobbs, Veto of HB2560 (2023). https://perma.cc/974J-DMCL.
  • 32. Adler E. S., Hall T. E., Ballots, transparency, and democracy. Elect. Law J. 12, 146–161 (2013).
  • 33. J. McCarthy, N. McBurnett, H. Branscomb, R. Rivest, P. Stark. Preserving Anonymity of Cast Vote Record (Tech. Rep., 2018). https://perma.cc/KBV5-AASX.
  • 34. M. Bernhard, J. Benaloh, J. Alex Halderman, R. L. Rivest, P. Y. Ryan, P. B. Stark, V. Teague, P. L. Vora, D. S. Wallach, Public evidence from secret ballots. Electronic Voting. E-Vote-ID 2017. Lecture Notes in Computer Science (2017).
  • 35. Congressional Research Service. Election Policy Fundamentals: The Secret Ballot (Tech. Rep. IN12389, 2024).
  • 36. D. Rubinstein, “Glitch reveals ballot choices of N.Y.C. voters, including mayor’s son,” New York Times, 2021.
  • 37. Maricopa County Elections Department. November 3, 2020 General Election Canvass (Tech. Rep., 2020). https://perma.cc/9BLU-KC6T.
  • 38. United States Election Assistance Commission. Availability of State Voter File and Confidential Information (Tech. Rep., 2020). https://perma.cc/6AE8-ACSM.
  • 39. A. Machanavajjhala, D. Kifer, J. Gehrke, M. Venkitasubramaniam, ℓ-diversity: Privacy beyond k-anonymity, in 22nd International Conference on Data Engineering (ICDE’06) (IEEE, 2006).
  • 40. Hotz V. J., Bollinger C. R., Komarova T., Manski C. F., Moffitt R. A., Nekipelov D., Sojourner A., Spencer B. D., Balancing data privacy and usability in the federal statistical system. Proc. Natl. Acad. Sci. U.S.A. 119, e2104906119 (2022).
  • 41. Duncan G., Lambert D., The risk of disclosure for microdata. J. Bus. Econ. Stat. 7, 207–217 (1989).
  • 42. C. T. Kenny, S. Kuriwaki, C. McCartan, E. Rosenman, T. Simko, K. Imai, Comment: The Essential Role of Policy Evaluation for the 2020 Census Disclosure Avoidance System (Harvard Data Science Review Special Issue 2, 2023), pp. 1–16.
  • 43. Brown J. R., Enos R. D., The measurement of partisan sorting for 180 million voters. Nat. Hum. Behav. 5, 998–1008 (2021).
  • 44. 52 U.S.C. §§20302, 20310.
  • 45. 52 U.S.C. §21082.
  • 46. Ariz. Rev. Stat. Ann. §16-121.01(C).
  • 47. Arizona v. Inter Tribal Council of Arizona Inc., 570 U.S. 1 (2013).
  • 48. A. Conevska, S. Hirano, S. Kuriwaki, J. B. Lewis, C. Mutlu, J. M. Snyder Jr., How partisan are US local elections? Evidence from 2020 cast vote records. [Preprint] (2024). 10.31219/osf.io/db3mj.
  • 49. Baltz S., Agadjanian A., Chin D., Curiel J., DeLuca K., Dunham J., Miranda J., Phillips C. H., Uhlman A., Wimpy C., Zárate M., Stewart C. III, American election results at the precinct level. Sci. Data 9, 651 (2022).
  • 50. Morse M., Herron M. C., Meredith M., Smith D. A., Martinez M. D., Election administration harms and ballot design: A study of Florida’s 2018 United States senate race. Am. J. Polit. Sci. 10.1111/ajps.12919 (2024).
  • 51. Eggers A. C., Garro H., Grimmer J., No evidence for systematic voter fraud: A guide to statistical claims about the 2020 election. Proc. Natl. Acad. Sci. U.S.A. 118, e2103619118 (2021).
  • 52. Bafumi J., Herron M. C., Hill S. J., Lewis J. B., Alvin Greene? Who? How did he win the United States senate nomination in South Carolina? Elect. Law J. 11, 358–379 (2012).
  • 53. Callen M., Long J. D., Institutional corruption and election fraud: Evidence from a field experiment in Afghanistan. Am. Econ. Rev. 105, 354–381 (2015).
  • 54. M. Rueda, G. Grossman, S. Ge, Do more disaggregated electoral results deter aggregation fraud? [Preprint] (2024). 10.31219/osf.io/dqt6c.
  • 55. Jaffe J., Loffredo J. R., Baltz S., Flores A., Stewart C. III, Trust in the count: Improving voter confidence with post-election audits. Public Opin. Q. 88, 585–607 (2024).
  • 56. Greiner D. J., Ecological inference in voting rights act disputes: Where are we now, and where do we want to be? Jurimetrics 47, 115–167 (2007).
  • 57. Kuriwaki S., Ticket splitting in a nationalized era. J. Polit. [Preprint] (2025). 10.1086/734263.
  • 58. Alvarez R. M., Hall T. E., Levin I., Low-information voting: Evidence from instant-runoff elections. Am. Politics Res. 46, 1012–1038 (2018).
  • 59. S. Pettigrew, D. Radley, “Ballot marking errors in ranked-choice voting,” Working paper (2023). 10.2139/ssrn.4670677.
  • 60. A. Cohen, Attacks on Deidentification’s defenses. 31st USENIX Security Symposium (USENIX Security 22, 2022), pp. 1469–1486.
  • 61. N. M. Admin. Code 1.10.33.8.
  • 62. Nev. Admin. Code 293.357.
  • 63. Fla. Stat. §98.0981(3)(a)1.
  • 64. §68 ¶2, Bundeswahlordnung (Federal Election Regulation).
  • 65. Kuriwaki S., Reece M., Baltz S., Conevska A., Loffredo J. R., Mutlu C., Samarth T., Acevedo Jetter K. E., Djanogly Garai Z., Murray K., Hirano S., Lewis J. B., Snyder J. M. Jr., Stewart C. III, Cast vote records: A database of ballots from the 2020 US Election. Sci. Data 11, 1304 (2024).
  • 66. Colorado Secretary of State. SOS Guidance: Reviewing and Redacting Cast Vote Records and Ballot Images (Tech. Rep., 2017). https://perma.cc/MK39-EBZK.
  • 67. C. Dwork, Differential privacy, in International Colloquium on Automata, Languages, and Programming, ICALP 2006 (Springer, 2006).
  • 68. Evans G., King G., Schwenzfeier M., Thakurta A., Statistically valid inferences from privacy protected data. Am. Polit. Sci. Rev. 117, 1275–1290 (2023).
  • 69. Abowd J. M., Hawes M. B., Confidentiality protection in the 2020 US Census of population and housing. Annu. Rev. Stat. Appl. 10, 119–144 (2023).
  • 70. Kenny C. T., McCartan C., Kuriwaki S., Simko T., Imai K., Evaluating bias and noise induced by the US Census Bureau’s privacy protection methods. Sci. Adv. 10, eadl2524 (2024).
  • 71. Documentation for North Carolina Noised Election Results. https://perma.cc/F646-EG4Y.
  • 72. J. Gao, R. Gong, F.-Y. Yu, Subspace differential privacy. Proc. AAAI ACM Conf. AI. Int. (2022). vol. 36, pp. 3986–3995.
  • 73. P. Dharangutte, J. Gao, R. Gong, F.-Y. Yu, Integer subspace differential privacy. Proc. AAAI ACM Conf. AI. Int. (2023). vol. 36, pp. 7349–7357.
  • 74. E. D. Hersh. Hacking the Electorate: How Campaigns Perceive Voters (Cambridge Univ. Press, 2015).
  • 75. Green R., FOIA-flooded elections. Ohio State Law J. 85, 255–306 (2023).
  • 76. Rentsch A., Schaffner B. F., Gross J. H., The elusive likely voter: Improving electoral predictions with more informed vote-propensity models. Public Opin. Q. 83, 782–804 (2019).
  • 77. Gerber A. S., Huber G. A., Doherty D., Dowling C. M., Hill S. J., Do perceptions of ballot secrecy influence turnout? Results from a field experiment. Am. J. Polit. Sci. 57, 537–551 (2013).
  • 78. Gerber A. S., Huber G. A., Doherty D., Dowling C. M., Is there a secret ballot? Ballot secrecy perceptions and their implications for voting behaviour. Br. J. Polit. Sci. 43, 77–102 (2013).
  • 79. Atkeson L. R., McKown-Dawson E., Hood M. III, Stein R., Voter perceptions of secrecy in the 2020 election. Elect. Law J. 22, 268–277 (2023).
  • 80. J. P. Wack, Cast Vote Records Common Data Format Specification Version 1.0 (Tech. Rep. National Institute of Standards and Technology, 2019).
  • 81. R. Leingang, Election Activists Are Seeking the “Cast Vote Record” from 2020. Here’s What It Is and Why They Want It (Votebeat, 2022). https://perma.cc/ZBA3-DRYG.
  • 82. Minn. Stat. §204C.22(13).
  • 83. Nyhan B., Skovron C., Titiunik R., Differential registration bias in voter file data: A sensitivity analysis approach. Am. J. Polit. Sci. 61, 744–760 (2017).
  • 84. S. Kim, B. Fraga, When do voter files accurately measure turnout? How transitory voter file snapshots impact research and representation. Cambridge Open Engage. [Preprint] (2022). 10.33774/apsa-2022-qr0gd.
  • 85. Maricopa County Elections Department, Correcting the Record: Maricopa County’s In-Depth Analysis of the Senate Inquiry (Tech. Rep., 2022). https://perma.cc/P5RQ-ERFZ.
  • 86. Sweeney L., k-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzziness Knowledge-Based Syst. 10, 557–570 (2002).
  • 87. DVSorder. 2022. https://perma.cc/N9Z3-XCWN.
  • 88. N. C. Gen. Stat. §163-166.45.
  • 89. N. C. Gen. Stat. §163-165.1(e).
  • 90. Morse M., The future of felon disenfranchisement reform: Evidence from the campaign to restore voting rights in Florida. Calif. Law Rev. 109, 1143–1197 (2021).
  • 91. Williams J. R., Baltz S., Stewart C. III, Votes can be confidently bought in some ranked ballot elections, and what to do about it. Polit. Anal. 32, 463–475 (2024).
  • 92. P. Y. Ryan, P. B. Roenne, D. Ostrev, F.-E. El Orche, N. Soroush, P. B. Stark, “Who was that masked voter? The tally won’t tell!”, in Electronic Voting: 6th International Joint Conference, E-Vote-ID 2021, Virtual Event, October 5–8, 2021, Proceedings 6 (Springer-Verlag, 2021); 10.1007/978-3-030-86942-7_8.
  • 93. Hasen R. L., Vote buying. Calif. Law Rev. 88, 1323–1371 (2000).
  • 94. Rideout v. Gardner, 838 F.3d 65 (1st Cir. 2016).
  • 95. National Conference of State Legislatures, Secrecy of the Ballot and Ballot Selfies (2022). https://perma.cc/W2RR-LVC5.


Articles from Science Advances are provided here courtesy of American Association for the Advancement of Science
