Journal of Applied Statistics. 2020 Aug 12;47(13-15):2711–2736. doi: 10.1080/02664763.2020.1804842

Assessing uncertainty of voter transitions estimated from aggregated data. Application to the 2017 French presidential election

Rafael Romero a, Jose M Pavía b, Jorge Martín a, Gerardo Romero c
PMCID: PMC9041978  PMID: 35707414

Abstract

Inferring electoral individual behaviour from aggregated data is a very active research area, with ramifications in sociology and political science. A new approach based on linear programming is proposed to estimate voter transitions among parties (or candidates) between two elections. Compared to other linear and quadratic programming models previously published, our approach presents two important innovations. Firstly, it explicitly deals with new entries and exits in the election census without assuming unrealistic hypotheses, enabling a reasonable estimation of vote behaviour of young electors voting for the first time. Secondly, by exploiting the information contained in the model residuals, we develop a procedure to assess the uncertainty in the estimates. This significantly distinguishes our model from other published mathematical programming methods. The method is illustrated estimating the vote transfer matrix between the first and second rounds of the 2017 French presidential election and measuring its level of uncertainty. Likewise, compared to the most current alternatives based on ecological regression, our approach is considerably simpler and faster, and has provided reasonable results in all the actual elections to which it has been applied. Interested scholars can easily use our procedure with the aid of the R-function provided in the Supplemental Material.

KEYWORDS: Ecological Inference, Linear Programming, Voter transitions, R × C contingency tables, French elections

1. Introduction

The analysis of voter transitions that occur between two elections from a set of parties (or candidates) to another is a relevant study topic of political sociology. The availability of appropriate estimates is relevant for many agents, including the media, political scientists and party teams [1]. Hence, for decades the issue has attracted the interest of many authors who have tried to exploit survey data and/or aggregate election results to produce accurate estimates (see, for instance: [26,36,35,61,5,49,14,15,18,63,39,62,55,10,53,34,37,42,51,33,2] or [40]).

In poll-based approaches, electoral mobility is estimated using vote recall of exit-polls or post-election surveys, or via panel surveys, as an aggregation of individual vote displacements. This strategy, however, raises serious concerns that call into question both its efficiency and its effectiveness. Firstly, issues of complexity and sample size emerge. Large, complex samples are required to reach reasonably accurate rate estimates given that, from a statistical point of view, it is not a single population that is being sampled but as many populations as there are election options in the first election. Secondly, even more disturbing is the challenge posed by nonresponse bias and measurement error. On the one hand, nonresponse is not randomly distributed among political options. The individual probability of nonresponse depends on the context, on the voter’s own vote, and even on the propensity to change it [41]. On the other hand, retrospective answers are not very reliable. When asked about their past electoral behaviour, electors frequently cannot recall their vote or are concerned with social desirability issues [11]. Hence, combined, both issues raise doubts about the variance and bias of poll-based vote transfer estimates.

Indeed, many authors have reported actual cases in which the bias induced by measurement error and nonresponse leads to results that are far from reality. As an example, we can compare actual data and raw answers collected in a survey conducted in November 2014 by the most prestigious Spanish survey organization (Centro de Investigaciones Sociológicas). In that survey, just 28% of the respondents claimed to have voted for the Conservative Party (PP) in the 2011 Spanish General election [9], when actually 45% of voters supported PP in that election. Thus, it is not surprising that Miller [36, p. 122] claims that: ‘Surveys dealing with voting change are especially unreliable’. As a consequence of the above limitations, authors who have studied this issue in depth conclude that, in these types of surveys, imprecision and bias can be large and represent obstacles that are difficult to overcome [34].

The existence of these flaws has motivated several authors to try to estimate voter transitions using either statistical or mathematical algorithms that exploit recorded aggregate official results, which are certainly more reliable. All these methods, examples of the so-called ecological inference procedures, can be grouped into two main sets: ecological regression methods and mathematical programming procedures. The ecological regression literature has been more prolific, producing a larger number of proposals undoubtedly fuelled by the US legal ramifications related to the electoral redistricting processes [31] and by their use in epidemiology [13]. Electoral studies and epidemiology are not, however, the only disciplines where these approaches can be used. They are useful in many situations where the goal is to infer individual-level behaviour from aggregate data (see, for instance, [7]). In this paper, we propose a new ecological inference approach to estimate voter transitions, but based on linear programming. Compared to other procedures, our method presents two important innovations. On the one hand, it explicitly considers new entries and exits in the election census. On the other hand, by exploiting the information contained in the model residuals, it proposes a procedure to assess the level of uncertainty in the estimates. This second innovation is the main contribution of our paper. No other previously published method based on linear or quadratic programming [35,60,63,55,10] measures this issue. Uncertainty is routinely estimated in ecological regression approaches.

The rest of the paper is organized as follows. Section 2 briefly reviews the ecological inference literature. In Section 3, we present the model, introduce the mathematical conditions that must be fulfilled and state conditions under which the hypothesis of electoral homogeneity on which the model rests can be applied. In this section, we also address the problem of census changes, suggesting a personal solution that allows the estimation of new electors’ votes. This is illustrated with an actual example in Section 4 (Aragonese regional elections). Section 5 is devoted to analyzing the uncertainty associated with the model estimates, which is case-dependent on both data and model structure. We propose an original procedure to estimate it. The procedure is based on quantifying, by simulation, the relationship between true error rates and the degree of non-compliance of the homogeneity hypothesis. In actual applications, we can estimate the latter from the model’s residuals. In Section 6, we illustrate our procedure by estimating uncertainty in voter transitions between the first and second rounds of the 2017 French presidential election. Section 7 summarizes the conclusions obtained and suggests directions for further research. An R function to apply the methodology is described in an Appendix and its code provided in the Online Supplemental Materials.

2. Ecological inference methods. A brief review of the literature

Polls are not always available (e.g. in local elections) and, when available, they are, as stated in the previous section, exposed to significant sources of bias. They also give rise to voter transition estimates with large variances. Hence, as an alternative, many methods rely only on recorded official outcomes. In this case, the basic strategy to reach estimates consists in applying a statistical or mathematical procedure to the results tallied in a set of territorial units. The drawback of this approach, which is exposed to the presence of the so-called ecological fallacy [54], lies in the fact that the underlying mathematical problem is indeterminate, since it depicts a system with more unknowns than equations. This forces the inclusion of additional hypotheses to obtain a solution [24]: usually the assumption that the voter transition matrices in the different units are, in some sense, ‘similar’. Two basic strategies have been followed in the literature: one based on ecological regression and the other grounded on mathematical programming.

The ecological regression literature has been very fertile since the seminal papers of Duncan and Davis [12] and Goodman [21,22] and has experienced a resurgence since King [29], despite the criticisms of Freedman et al. [17] and Cho [8]. Indeed, King [29] and notably King et al. [31] represent a tipping point in this literature, with some of the key references including King, Rosen, and Tanner [30], Rosen et al. [58], Wakefield [64], Greiner and Quinn [25], Glynn and Wakefield [20], Puig and Ginebra [53], Plescia and De Sio [51], Klima et al. [33] and Forcina and Pellegrino [16]. Klima et al. [34] discuss some of the main methods developed under the ecological regression framework and show some of the difficulties that these kinds of procedures entail. The principal problem with the most current approaches, apart from their high computational demand, lies in their complexity; mainly for those methods relying on the Bayesian framework. They require the intervention of highly skilled experts to properly specify the setting parameters and hypothesis for the distributions of the quantities to be estimated. Indeed, given specific election outcomes, the different methods can lead to different results and even a single method leads to quite different results as different values for certain operational parameters are set. The situation is worsened by the fact that the relevance of some hypotheses is sometimes difficult to establish and to gauge from the point of view of political science.

A different way followed by other authors is to approach the subject as a mathematical programming problem, looking for the values $p_{jk}$ of the vote transition matrix that, fulfilling certain restrictions, minimize, to a certain extent, the discrepancy with the outcomes recorded in the different territorial units. McCarthy and Ryan [35] propose a quadratic programme model to minimize the sum of squares of these discrepancies, Tziafetas [60] suggests minimizing the sum of their absolute values, which transforms the model into a linear programme, and Corominas et al. [10] explore four possible optimality criteria to estimate the $p_{jk}$. Although mathematical programming approaches share some similarities with the ecological regression methods when the sum of squares of the discrepancies is used as loss function, these have the advantage of not needing to assume any particular probability distribution to guarantee that the $p_{jk}$ estimates are logically consistent. In mathematical programming, constraints are introduced in a natural way, making it possible to reach distribution-free estimates.

The abovementioned proposals, however, present two main drawbacks. Firstly, the way they all handle census changes is questionable. Secondly, none of the mathematical programming methods incorporates a procedure to measure the levels of uncertainty of the estimates provided by the model. Although it is well known that the electoral behaviour of young electors newly entitled to vote differs from that of experienced voters (see, e.g. [27,59]), none of these methods takes this into account when estimating the electoral behaviour of young electors. Vázquez and Romero [63] propose an initial solution to this issue, with Romero [55,56,57] expanding and exemplifying its use in three actual election processes. In this paper, we deal with these two drawbacks. On the one hand, we build on and extend the solution proposed by Romero [55]. On the other hand, as our main contribution, we develop within the mathematical programming approach a procedure to estimate in actual studies the margins of uncertainty associated with the estimated transition rates.

3. The LPHOM model

3.1. The basic model

The application of the proposed methodology requires, as in all the models referred to in the previous section, the results of the two elections in a set of I territorial units (which we shall refer to hereinafter as units) in which the overall area of the study is partitioned.

Let $x_{ij}$ be the votes gained in unit $i$ by election option $j$ ($j = 1, \dots, J$) of election 1, and let $y_{ik}$ be the votes obtained in the same unit by election option $k$ ($k = 1, \dots, K$) of election 2. In both elections non-voters (abstentions), including perhaps null and blank votes, are considered as an additional election option. As is usual, we group the minor parties (or candidates) in a rest option. We discuss the issues related to census changes between the two elections later, in subsection 3.2, and in subsection 3.3 we introduce our whole model, referred to as LPHOM.

The objective is to estimate the $J \times K$ unknown values $p_{jk}$, defined as the proportion of voters in the overall analysed territory who, having chosen option $j$ in election 1, choose option $k$ in election 2. According to this definition, the $p_{jk}$ proportions must inevitably fulfil constraints (1), (2) and (3).

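$$p_{jk} \ge 0 \quad \text{for } j = 1, \dots, J;\ k = 1, \dots, K \qquad (1)$$
$$\sum_{k=1}^{K} p_{jk} = 1 \quad \text{for } j = 1, \dots, J \qquad (2)$$
$$\sum_{j=1}^{J} \left( \sum_{i=1}^{I} x_{ij} \right) p_{jk} = \sum_{i=1}^{I} y_{ik} \quad \text{for } k = 1, \dots, K \qquad (3)$$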

The mathematical programming models proposed in McCarthy and Ryan [35] and Tziafetas [60] include constraints (1) and (3). Constraints (2) are similar to those proposed in Johnston and Hay [28], Romero [55] and Corominas et al. [10]. The problem that arises is that the above system, having more unknowns than equations, is indeterminate, with infinite possible solutions.
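To illustrate the indeterminacy, the following minimal sketch checks constraints (1), (2) and (3) for a candidate matrix. The function name `check_constraints` and the toy figures are hypothetical and are not taken from the paper; the example only shows that two clearly different matrices can satisfy the three aggregate constraints at once.

```python
def check_constraints(X, Y, P, tol=1e-9):
    """Check constraints (1)-(3) for a candidate transition matrix P.

    X[i][j]: votes for option j of election 1 in unit i (I x J)
    Y[i][k]: votes for option k of election 2 in unit i (I x K)
    P[j][k]: candidate proportion of option-j voters choosing option k (J x K)
    Returns three booleans, one per constraint.
    """
    J, K = len(P), len(P[0])
    # Aggregate results of each election over all units.
    X_tot = [sum(row[j] for row in X) for j in range(J)]
    Y_tot = [sum(row[k] for row in Y) for k in range(K)]

    # (1) non-negativity
    nonneg = all(P[j][k] >= -tol for j in range(J) for k in range(K))
    # (2) each row of P sums to one
    rows_sum_to_one = all(abs(sum(P[j]) - 1.0) <= tol for j in range(J))
    # (3) P reproduces the aggregate results of election 2
    totals_match = all(
        abs(sum(X_tot[j] * P[j][k] for j in range(J)) - Y_tot[k])
        <= tol * max(1.0, Y_tot[k])
        for k in range(K)
    )
    return nonneg, rows_sum_to_one, totals_match


# Hypothetical toy data: 2 units, 2 options in each election.
X = [[60, 40], [40, 60]]
Y = [[62, 38], [48, 52]]

# Two different matrices, both compatible with (1)-(3).
P_a = [[0.9, 0.1], [0.2, 0.8]]
P_b = [[0.7, 0.3], [0.4, 0.6]]

print(check_constraints(X, Y, P_a))
print(check_constraints(X, Y, P_b))
```

Both candidates pass all three checks, which is why an additional hypothesis is needed to single out one solution.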

At this point, denoting $p_{jk}^{i}$ as the proportion of voters in unit $i$ that, having chosen option $j$ in election 1, choose option $k$ in election 2, we have that the $p_{jk}^{i}$ proportions must exactly fulfil (4).

$$\sum_{j=1}^{J} x_{ij}\, p_{jk}^{i} = y_{ik} \quad \text{for } i = 1, \dots, I;\ k = 1, \dots, K \qquad (4)$$

Including these additional unknowns, $p_{jk}^{i}$, and constraints (4), however, does not help: the system remains indeterminate. To resolve this indeterminacy it is necessary to include some hypothesis. As in all procedures referred to in the previous section, our hypothesis is that the voter transition matrices in the different territorial units are in some sense ‘similar’ to each other, and, therefore, similar to the global matrix.

It should be noted that the homogeneity hypothesis does not imply that the different units have voted in a similar way, but rather that the matrix of voter transitions between parties has been in all of them ‘similar’ to the global average matrix. For example, in the 2017 French elections it is obvious that there are regions that voted more for Macron and others that did so for Le Pen. What the hypothesis of homogeneity implies is that, for example, if at the national level the majority of those who voted for Macron in the first round also went on to do so in the second, this phenomenon of fidelity will have occurred in a similar way in all regions.

For the hypothesis of homogeneity to be reasonable, we need the study area to be electorally homogeneous in the considered elections. This means that: (i) the main options presented, on the one hand, in election 1 and, on the other hand, in election 2 have been basically the same in all the units; and, (ii) the motivations that may have influenced voters’ behaviour between elections 1 and 2 have been similar throughout the whole territory analysed, i.e. that voters’ motivations have not varied too much in the different units, with global trends weighting more than local trends [48]. In addition, for the homogeneity hypothesis to be adequate, it is advisable that the size of the units and also the size of the election options considered not be too small.

Thus, according to the hypothesis of homogeneity, equations (4) will be fulfilled approximately if the $p_{jk}^{i}$ proportions are replaced by the $p_{jk}$; an issue which is expressed through equation (5), where the error terms $e_{ik}$ should be ‘small’.

$$\sum_{j=1}^{J} x_{ij}\, p_{jk} = y_{ik} + e_{ik} \quad \text{for } i = 1, \dots, I;\ k = 1, \dots, K \qquad (5)$$
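As a small illustration of equation (5), the sketch below computes the error terms for a given global matrix and adds up their absolute values as one way of judging how ‘small’ they are. The function names and toy data are hypothetical; the unit-level results were generated from the first matrix, so its residuals vanish, while the second matrix reproduces the aggregate totals but not the unit-level results.

```python
def residuals(X, Y, P):
    """Return e[i][k] = sum_j X[i][j] * P[j][k] - Y[i][k], as in equation (5)."""
    J, K = len(P), len(P[0])
    return [
        [sum(x_row[j] * P[j][k] for j in range(J)) - y_row[k] for k in range(K)]
        for x_row, y_row in zip(X, Y)
    ]


def total_abs_error(E):
    """Sum of the absolute values of all error terms."""
    return sum(abs(e) for row in E for e in row)


# Hypothetical toy data: 2 units, 2 options in each election.
X = [[60, 40], [40, 60]]
Y = [[62, 38], [48, 52]]

P_a = [[0.9, 0.1], [0.2, 0.8]]   # matrix used to generate Y
P_b = [[0.7, 0.3], [0.4, 0.6]]   # matches the totals, not the units

E_a = residuals(X, Y, P_a)
E_b = residuals(X, Y, P_b)
print(total_abs_error(E_a), total_abs_error(E_b))
```

In this toy case the unit-level errors distinguish between two matrices that the aggregate constraints alone cannot separate.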

The basic model, therefore, consists of obtaining the values of $p_{jk}$ that, fulfilling constraints (1), (2), (3) and (5), verify (6).

$$\text{minimize } \sum_{i=1}^{I} \sum_{k=1}^{K} |e_{ik}| \qquad (6)$$
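The basic model can be written in the standard form expected by most linear programming solvers. The sketch below is one possible way of doing so and is not the authors' R implementation: it uses the usual device of splitting each error term into two non-negative parts, $e_{ik} = e_{ik}^{+} - e_{ik}^{-}$, so that $|e_{ik}|$ becomes $e_{ik}^{+} + e_{ik}^{-}$ at the optimum. The function name, the variable ordering and the toy data are assumptions made for illustration.

```python
def build_lphom_basic_lp(X, Y):
    """Assemble (c, A_eq, b_eq) for: minimise c.z  s.t.  A_eq z = b_eq, z >= 0.

    Variable order in z: the J*K values p_jk (row-major), then the I*K
    values e_plus_ik, then the I*K values e_minus_ik.
    Constraint (1) is covered by the bound z >= 0.
    """
    I, J, K = len(X), len(X[0]), len(Y[0])
    n_p, n_e = J * K, I * K
    n = n_p + 2 * n_e

    def p(j, k):
        return j * K + k

    def e_plus(i, k):
        return n_p + i * K + k

    def e_minus(i, k):
        return n_p + n_e + i * K + k

    # Objective (6): sum of e_plus + e_minus.
    c = [0.0] * n_p + [1.0] * (2 * n_e)
    A_eq, b_eq = [], []

    # Constraint (2): each row of the transition matrix sums to one.
    for j in range(J):
        row = [0.0] * n
        for k in range(K):
            row[p(j, k)] = 1.0
        A_eq.append(row)
        b_eq.append(1.0)

    # Constraint (3): aggregate results of election 2 are reproduced.
    for k in range(K):
        row = [0.0] * n
        for j in range(J):
            row[p(j, k)] = float(sum(X[i][j] for i in range(I)))
        A_eq.append(row)
        b_eq.append(float(sum(Y[i][k] for i in range(I))))

    # Constraint (5): sum_j x_ij p_jk - e_plus_ik + e_minus_ik = y_ik.
    for i in range(I):
        for k in range(K):
            row = [0.0] * n
            for j in range(J):
                row[p(j, k)] = float(X[i][j])
            row[e_plus(i, k)] = -1.0
            row[e_minus(i, k)] = 1.0
            A_eq.append(row)
            b_eq.append(float(Y[i][k]))

    return c, A_eq, b_eq


# Hypothetical toy data: 2 units, 2 options in each election.
X = [[60, 40], [40, 60]]
Y = [[62, 38], [48, 52]]
c, A_eq, b_eq = build_lphom_basic_lp(X, Y)
print(len(c), len(A_eq))  # number of variables and of equality constraints
```

The resulting triple can be passed to any LP solver that accepts equality constraints with non-negative variables; for instance, SciPy's `scipy.optimize.linprog` takes `c`, `A_eq` and `b_eq` in this layout.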

An advantage of this basic model for sociologists and political scientists is that it easily allows the inclusion of constraints to force the result to fulfil certain conditions that the expert considers appropriate. The problem of acting like this, however, is that the results can lose to a certain extent their objective character, depending on the validity of the subjective hypotheses imposed. As an example, additional restrictions are imposed on the $p_{jk}$’s in the model suggested in Romero [55]. In particular, after establishing a correspondence between some of the $J$ political options of election 1 and some of the $K$ political options of election 2, Romero [55] imposes two additional conditions. On the one hand, he imposes that those parties that improve their election results retain most, at least a minimum percentage $w_s$, of their previous voters. On the other hand, he assumes that the greater part of the votes for those parties which had a worse result in the second election comes from voters, at least in a minimum percentage $w_l$, who already voted for them in the first election. These hypotheses, in principle reasonable, can be included in the model by adding the corresponding constraints. We have not included them in our model because we have found in most actual studies that they are usually automatically fulfilled by the estimates.

3.2. The problem of changes in the election census

It is unrealistic to maintain the hypothesis of stationary electorates for any pair of elections separated by a period of time. There will almost certainly be changes in the composition of the election censuses of the different units as a consequence of entries and exits. On the one hand, there will be new electors included in the unit censuses of election 2 who did not appear in the lists of election 1. These are young people who reached voting age between the two elections, $n_i$, and new residents with the right to vote coming from other places, $m_i$. On the other hand, some voters included in the election censuses of election 1 will have exited from the lists of election 2. Exit voters can be divided into those who, between both elections, moved outside the given territorial unit $i$, $e_i$, and those who died, $d_i$. Unfortunately, this detailed information is almost never available to the average analyst. Even with access to the deanonymized, detailed census lists of both elections, and after linking them, it is impossible to separate exit voters into emigrants and deceased. To do this, deanonymized lists of deceased would also be required.

Demographic figures broken down into (single or five-year) age groups, nevertheless, are regularly published by official statistical agencies; therefore, accurate estimates of the number of new young voters (if they are not made available by the election authorities) can be easily obtained in each unit [46,47]. In a similar fashion, and depending on the size of the units, rough estimates of deceased voters could be computed applying age death probabilities to population figures. Finally, the balance of immigrants and emigrants, who cancel each other out, could be computed in each territorial unit as a residue.

With regard to new young voters, which generally represent a significant part of the new voters, their size depends on the time elapsed between the two elections and the age structure of the population pyramid. For instance, currently in Spain for each year elapsed between two elections, these new voters represent, on average, slightly less than 1.1% of the population over 18 years. On the other hand, regarding deceased voters, we see that again their size depends on the time elapsed between the two elections and the population age pyramid as well as on mortality rates. Currently, in Spain, this rate, expressed as a percentage of the population over the age of 18, is on average more than 1.5% for each year elapsed between the elections.

It is noteworthy that, although it could be assumed that mortality and migration flows would have a similar effect on the different options competing in election 1, i.e. proportionally to their relative weight, it seems questionable to assume that young voters will behave in election 2 similarly to the older electorate.

In the references consulted, however, it is always assumed, more or less implicitly, that new voters behave similarly to those who leave the census would have done or in a similar fashion to those remaining in the census. For example, McCarthy and Ryan [35] compute for each unit the difference between entries and exits and define, for each party k in the election 2, a new parameter, γk, to capture the behaviour of these differences. This makes it impossible to estimate the vote of the new electors separately. They define γk as the proportion of vote to option k of the net balances between entries and exits. Defined this way, it is easy to verify that the γk proportions are a complicated combination of the share of votes for party k of new and previous voters with weights that depend on the ratio between entries and exits in each unit, an issue that makes the hypothesis of territorial homogeneity for the γk’s strongly questionable. A similar criticism can be made of the approach of Corominas et al. [10], who assume that the number of voters in each unit is the same in both elections. This implies, as the authors themselves indicate, ‘that the behavior of the electors not belonging to the intersection of both censuses is not different from those that belong to it’, not permitting an estimation of the young electors’ vote and making the hypothesis of homogeneity implicit in their model more questionable.

3.3. Extending the basic model: the LPHOM model

With detailed election census lists available, it is theoretically possible to know in each unit, or at least roughly estimate and benchmark, the number of young voters ($n_i$), immigrant entries ($m_i$) and exits ($e_i + d_i$) between the two elections. Therefore, in this case, a more correct specification of the model would entail considering both kinds of entries as additional election options ($J-1$ and $J$) of the first election and exits as a possible destination ($K$) of the votes in the second election, and including as additional constraints $p_{J-1,K} = 0$ and $p_{JK} = 0$.

The above scenario, however, is quite data-demanding. A more typical scenario is one in which only accurate estimates of young voters are available in each unit. In this case, assuming that the counted votes in each election correspond to the first $J-1$ and $K-1$ categories of the respective elections, net exits ($b_i = d_i + e_i - m_i$) can be easily computed from the available data as $b_i = \sum_{j=1}^{J-1} x_{ij} + n_i - \sum_{k=1}^{K-1} y_{ik}$, and both the $n_i$ and $b_i$ figures introduced in the problem, respectively, as option $J$ of election 1 and option $K$ of election 2. It should be noted that in this situation, with units of sufficient size, net exits will always be positive and will coincide in great part with the number of exits due to mortality because of the compensating effect of immigration and emigration.
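A minimal sketch of this typical scenario follows. The function name and the toy figures are hypothetical; raising an error when a unit shows negative net exits is a design choice of the sketch, reflecting the expectation stated above for units of sufficient size, and is not something the paper prescribes.

```python
def add_entries_and_net_exits(X_counted, Y_counted, new_voters):
    """Append option J (new young voters) and option K (net exits) to each unit.

    X_counted[i]: votes of the first J-1 options of election 1 in unit i
    Y_counted[i]: votes of the first K-1 options of election 2 in unit i
    new_voters[i]: estimated number of new young voters n_i in unit i
    """
    X_ext, Y_ext = [], []
    for x_row, y_row, n_i in zip(X_counted, Y_counted, new_voters):
        # b_i = sum_j x_ij + n_i - sum_k y_ik
        b_i = sum(x_row) + n_i - sum(y_row)
        if b_i < 0:
            raise ValueError("negative net exits: revisit the data for this unit")
        X_ext.append(list(x_row) + [n_i])   # option J of election 1
        Y_ext.append(list(y_row) + [b_i])   # option K of election 2
    return X_ext, Y_ext


# Hypothetical unit: 1,000 electors in election 1 and 980 in election 2,
# of whom 40 are estimated to be new young voters.
X_ext, Y_ext = add_entries_and_net_exits([[500, 300, 200]], [[480, 310, 190]], [40])
print(X_ext, Y_ext)
```

After this step both extended rows of a unit add up to the same total, which is what constraints (4) and (5) implicitly require.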

Another scenario occurs with electoral processes very close in time, as is the case in, for example, the first and second rounds of the French presidential election, where the changes in the electorate are really very small. In these cases, we can compute for each unit the net exits between the two elections as $b_i = \sum_{j=1}^{J-1} x_{ij} - \sum_{k=1}^{K-1} y_{ik}$ (considering again the same notation as in the previous scenario) and define in each unit the quantities given by equation (7). These new $J$ and $K$ categories will be irrelevant and, therefore, they could be omitted for presentation purposes.

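$$\begin{cases} y_{iK} = b_i, \quad x_{iJ} = 0 & \text{if } b_i \ge 0 \\ y_{iK} = 0, \quad x_{iJ} = -b_i & \text{if } b_i < 0 \end{cases} \qquad (7)$$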
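Equation (7) can be sketched in a few lines. The sketch reads the second case as $x_{iJ} = -b_i$, so that the added count is non-negative; the function name and toy figures are hypothetical.

```python
def add_balance_categories(X_counted, Y_counted):
    """Apply equation (7): balance each unit with a category J or K.

    A positive balance b_i is stored as net exits (option K of election 2);
    a negative balance is stored, with its sign changed, as net entries
    (option J of election 1).
    """
    X_ext, Y_ext = [], []
    for x_row, y_row in zip(X_counted, Y_counted):
        b_i = sum(x_row) - sum(y_row)
        if b_i >= 0:
            x_iJ, y_iK = 0, b_i
        else:
            x_iJ, y_iK = -b_i, 0
        X_ext.append(list(x_row) + [x_iJ])
        Y_ext.append(list(y_row) + [y_iK])
    return X_ext, Y_ext


# Hypothetical units: the first loses 10 electors, the second gains 5.
X_ext, Y_ext = add_balance_categories(
    [[600, 400], [500, 300]],
    [[590, 400], [505, 300]],
)
print(X_ext, Y_ext)
```

Whatever the sign of the balance, the extended totals of the two elections coincide in every unit.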

When no information about new young voters is available and the time elapsed between both elections is significant, the values of equation (7) will not be irrelevant. Even so, they can still be computed and our method applied after introducing them into the system, although at the cost of a loss of interpretability in some of the $p_{jk}$ coefficients related to both categories $J$ and $K$.

In a typical scenario, the transfer rates $p_{jK}$ corresponding to net exits are less relevant than the other rates. What is more, as some simulation studies have shown us, their estimates are, as expected, quite volatile for the smallest election options. Hence, taking into account that they are mainly a consequence of mortality, we will force them, for the first $J-1$ election options considered in election 1, to be equal. This constraint might seem reasonable, as initially there is no reason to consider that mortality affects older voters differently in the different political options. In addition, we will also impose the obvious condition $p_{JK} = 0$. These constraints are included in the model using equations (8) and (9).

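$$p_{jK} = \frac{\sum_{i=1}^{I} y_{iK}}{\sum_{j'=1}^{J-1} \sum_{i=1}^{I} x_{ij'}} \quad \text{for } j = 1, \dots, J-1 \qquad (8)$$
$$p_{JK} = 0 \qquad (9)$$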
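A short derivation may help to see why requiring the exit rates of the first $J-1$ options to be equal, together with (9), fixes their common value. The shorthand $X_j$, $Y_K$ and $c$ below is introduced only for this sketch; the denominator is read as the total electorate of the first $J-1$ options of election 1.

```latex
% Shorthand (not in the original): X_j = \sum_i x_{ij}, Y_K = \sum_i y_{iK}.
% Constraint (3) for k = K:
\sum_{j=1}^{J} X_j \, p_{jK} = Y_K .
% Imposing p_{JK} = 0 and p_{jK} = c for j = 1, \dots, J-1:
c \sum_{j=1}^{J-1} X_j = Y_K
\quad \Longrightarrow \quad
c = \frac{Y_K}{\sum_{j=1}^{J-1} X_j}
  = \frac{\sum_{i=1}^{I} y_{iK}}{\sum_{j=1}^{J-1} \sum_{i=1}^{I} x_{ij}} .
```

In words, the common rate is the share of the election 1 electorate that has left the census by election 2.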

The default model we propose is to obtain the $J \times K$ values of the $p_{jk}$ that, fulfilling constraints (1), (2), (3), (5), (8) and (9), minimize the sum of absolute values of the $e_{ik}$. Given that this model is ultimately a linear programme, we propose naming it LPHOM (acronym for Linear Programme based on HOMogeneity hypothesis). In the Appendix we describe an R function (whose code is provided in the Online Supplemental Materials) to apply the LPHOM procedure to actual data in all the possible scenarios discussed.

3.4. Additional considerations

LPHOM offers a tool to estimate the transfer of votes between two elections separated by a certain period of time. It is also possible, however, to use LPHOM in formally analogous problems, but with no changes in the electoral census. This would be the case, for example, of a single election where voters are partitioned into $J$ ‘groups’ (based on criteria such as sex, race and/or social class), the $x_{ij}$ are the numbers of electors in unit $i$ belonging to group $j$ and the $y_{ik}$ are the votes gained by electoral option $k$ in unit $i$, the objective being to estimate the proportions $p_{jk}$ of voters of the different groups voting for the different options. This is a typical ecological inference problem. In these situations, assuming that the hypothesis of homogeneity in electoral behaviour by group is reasonable (i.e. that the values of $p_{jk}$ are ‘similar’ in different units), the LPHOM model can be applied directly, but without including restrictions (8) and (9).

Another situation in which there are no changes in electoral censuses arises in simultaneous elections. This could be the case, for example, in Spain when general elections and regional elections are held simultaneously in a given autonomous region. In this scenario, the LPHOM model can be applied directly, obviously not including constraints (8) and (9). The challenge here arises in deciding which election should be considered as ‘origin’ and which ‘destination’.

In situations where there are changes in the electoral census, with option $J$ in election 1 corresponding to the new young electors and option $K$ in election 2 to net exits, restriction (9) must be included in the model, whereas constraint (8) is reasonable but dispensable.

4. Assessing LPHOM with new voters

In this section, we exemplify the use of the method in a scenario where new voters are explicitly considered by applying LPHOM to the 2015 Aragonese regional election. The regional elections held in 2015 in Spain were of particular interest as it was the first time in which the then two new big emerging parties of Spanish politics, Podemos (POD) and Ciudadanos (C’s), presented candidatures. POD is a left populist party, which has its roots in the so-called ‘15M movement’ [19]. C’s is a centre-right party, born in Catalonia to oppose pro-independence nationalism, which in 2015 decided to expand throughout Spain. Both parties presented themselves as new, regenerative options opposed to the two traditional big parties, PP and PSOE, under fire due to their problems with corruption and the economic crisis [4]. In this scenario, almost all the analysts agreed that new voters were going to turn their backs on traditional mainstream parties. Despite new voters being a relatively small group (4% of the census in the 2015 Aragonese regional election), we want to see whether LPHOM is able to properly capture their behaviour.

Aragon is chosen as our case study because it is considered a swing territory that perfectly reflects the particular mood that Spanish politics is sensing at any given moment, like a Spanish electoral thermometer [50]. Aragon is one of seventeen autonomous regions in Spain. It is divided into three constituencies: Huesca, Teruel and Zaragoza; the latter holding the capital of the region where half of the total regional inhabitants live.

In the 2011 regional election, only six of the seventeen parties competing surpassed 1% of the total votes: PP (the conservative party), PSOE (the socialist party), IU (a left party with a heavy weight of communists), CHA (a left nationalist Aragonese party), PAR (a moderate nationalist conservative party) and UPyD (a party created just a few years earlier and that was the largest party with no representation in the regional parliament). Table 1 provides a summary of the results of the regional elections held in Aragon in 2011 and 2015. In the table, blank and null votes have been added to non-voters (ABST), with ‘REST’ grouping the remaining minority parties.

Table 1. Summary of election outcomes for the 2011 and 2015 Aragonese regional elections.

Year  ABST     PP       PSOE     PAR     CHA     IU      UPyD    POD(1)   C’s(1)  REST
2011  340,020  269,729  197,189  62,193  55,932  41,874  15,667  -        -       20,214
2015  332,911  181,757  141,528  45,577  30,334  27,936   5,637  135,554  62,188  17,357

(1) Podemos and Ciudadanos did not compete in the 2011 election.

To estimate the matrix of transfer votes between the 2011 and 2015 regional elections, we split Aragon into 15 territorial units: the provinces of Huesca and Teruel, the twelve election districts of the capital of the region and the rest of the province of Zaragoza. Although it is not a requirement of the approach to split the electoral space into a relatively small number of spatial units, this practice shows three real-world benefits. Firstly, it makes it easier for the homogeneous hypothesis to be verified [44]. Secondly, it avoids the problem of establishing the correspondence between small-area election units of different periods [45,43]. Thirdly, it significantly reduces the computational burden. The outcomes recorded in both elections in each of the 15 units considered are available in the Online Supplemental Materials (Tables S.1 and S.2).

The censuses of both the 2011 and 2015 elections and the numbers of new young voters incorporated into the 2015 election census of each province as a result of having reached the legal age to vote (18 years old) since the 2011 election, made public by the Spanish Official Statistical Agency (INE), were combined to estimate new entries and net exits between both elections. New electors were 6,836 in Huesca, 4,457 in Teruel and 29,224 in Zaragoza. New voters in the province of Zaragoza were divided among the thirteen territorial units into which we split this constituency, in proportion to their total election populations. Given that the total population of the region decreased between 2011 and 2015, net exits were positive. We assume that net exits (mainly due to mortality) affected in a similar way the different options competing in the 2011 election (constraint (8)). Net exits accounted for 6.2% of the 2011 census.
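The figures quoted above can be reproduced from Table 1 with simple arithmetic. The sketch below assumes that the row totals of Table 1, which include non-voters, coincide with the census of each election, and that the three provincial counts of new electors add up to the regional total; the variable names are hypothetical.

```python
# Table 1 (abstention includes blank and null votes).
votes_2011 = {
    "ABST": 340_020, "PP": 269_729, "PSOE": 197_189, "PAR": 62_193,
    "CHA": 55_932, "IU": 41_874, "UPyD": 15_667, "REST": 20_214,
}
votes_2015 = {
    "ABST": 332_911, "PP": 181_757, "PSOE": 141_528, "PAR": 45_577,
    "CHA": 30_334, "IU": 27_936, "UPyD": 5_637, "POD": 135_554,
    "C's": 62_188, "REST": 17_357,
}
new_electors = {"Huesca": 6_836, "Teruel": 4_457, "Zaragoza": 29_224}

census_2011 = sum(votes_2011.values())   # 1,002,818
census_2015 = sum(votes_2015.values())   # 980,779
entries = sum(new_electors.values())     # 40,517

# Net exits: electors of 2011 plus new young voters minus electors of 2015.
net_exits = census_2011 + entries - census_2015   # 62,556

exit_rate = net_exits / census_2011      # common value of p_jK under (8)
entry_share = entries / census_2015      # weight of new voters in 2015

print(f"net exits: {net_exits:,} ({100 * exit_rate:.1f}% of the 2011 census)")
print(f"new voters: {entries:,} ({100 * entry_share:.1f}% of the 2015 census)")
```

Under these assumptions the exit rate rounds to the 6.2% reported in the text, and new voters represent roughly 4% of the 2015 census.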

Table 2, where new young voters are referred to as ENTR and net exits as EXIT, shows the estimated transition probabilities between the options considered in the 2011 and 2015 elections obtained after applying the LPHOM procedure. From the data in Tables 1 and 2 it is easy to obtain Table S.3 in the Online Supplemental Materials, which shows the origin of the votes obtained by the different options competing in the Aragonese regional election in 2015.

Table 2. Estimated vote transfer matrix (in percentages) between the 2011 and 2015 Aragonese regional elections.

  ABST PP PSOE POD C’s PAR CHA IU UPyD REST EXIT
ABST 70.7 * * 19.5 0.9 * * 0.1 0.2 2.2 6.2
PP 13.6 65.2 * * 13.6 0.1 * * 0.2 0.7 6.2
PSOE 20.0 * 65.5 3.3 * * * * * * 6.2
PAR * 9.1 19.7 * * 64.8 * * * * 6.2
CHA * * * 56.7 * * 37.0 * * * 6.2
IU * * * 36.9 * * * 54.6 * 2.1 6.2
UPyD * * * * 75.2 * * * 18.5 * 6.2
REST 44.5 * * * * 23.1 * 2.9 * 23.1 6.2
ENTR 17.5 * * 37.9 26.0 * 8.0 2.5 3.0 4.8 0.0

Note: Since the solution of a linear programme is always a basic solution, LPHOM tends to set results that are very close to 1 or 0 to exactly those values. Therefore, we prefer to replace ones, if they exist, with 0.999 and zeros with an asterisk, which indicates a very low value.

Despite the purely mathematical nature of the LPHOM procedure, which does not include any consideration regarding the ideological proximity between the different options competing in both elections, outcomes in Tables 2 and S.3 are extremely clear and simple to interpret from the point of view of political sociology. For example, we can see that: (i) the most important source of the votes gained by any party in 2015 is the voters who voted for that same party in 2011, if the party competed at that election; (ii) the votes lost by the two main traditional parties went mostly to abstention and to the two new parties following an ideological alignment, C’s in the case of PP and POD in the case of PSOE; (iii) the new party POD received most of its votes from former abstainers, from previous left-wing party voters (CHA, IU and PSOE) and from new young voters; and (iv) the new centre-right party C’s gained its votes mainly from previous PP (conservative) and UPyD (a party very close ideologically to C’s) voters, from new young voters and from former abstainers. Readers interested in the subject can consult a more detailed analysis in the Online Supplemental Materials.

For the purpose of this paper, we focus on the transition probabilities estimated for new young voters, whose behaviour is clearly different from that of voters in the previous election. The new parties POD (37.9%) and C’s (26.0%), followed by abstention (17.5%), were the preferred choices of new young voters, whereas the two traditional parties, PP and PSOE, had almost no success among this electorate. It is important to emphasize that this differential estimate of the new voters’ voting pattern is not possible through the procedures proposed by other authors. Contrary to what is assumed in those procedures, the voting patterns of young electors are distinct from those found at a global level in the region, where PP and PSOE were the most-voted parties.

5. Estimating the uncertainty of model results

5.1. Introduction

The fundamental problem in scientifically establishing the validity of the methodology comes from the fact that it is (almost) impossible to compare LPHOM outcomes with actual transition probabilities. Aside from extraordinary circumstances, such as in simultaneous elections where the same ballot paper is used to vote for the different political contests and individual votes are available, actual voter transition probabilities are impossible to know. Likewise, due to the lack of reliability of retrospective answers and poll data for these kinds of studies, comparing LPHOM outcomes to survey approximations does not seem to be a valid alternative.

Faced with this impossibility, we can conceive, in principle, two possible approaches for judging the validity of the proposed method. One alternative is to assess the logic and rationality of the process followed to estimate the vote transfer matrices. The other alternative is to analyse whether the results provided by the method are ‘reasonable’ when applied to actual elections. With respect to the first alternative, that of the rationality of the process, we have already discussed in Section 3 the soundness and logic of the hypothesis of homogeneity of electoral mobility in the different units, provided that the conditions indicated therein are fulfilled in the definition of the territorial units. Regarding the second alternative, an earlier and simpler version of the LPHOM procedure has been used in a number of recent electoral processes held in several Spanish regions (see Table 3). There is no particular reason for choosing these elections beyond opportunity and the authors’ easy access to the data. In our opinion, which is also shared by many Spanish experts in political sociology, in all cases the results obtained (which can be consulted, in Spanish, in the Online Supplemental Materials and in the references indicated in Table 3) have been ‘reasonable’, in the sense of being logical and clearly interpretable in sociological terms. As an example, we have always found that the most important source of votes of any party competing in election 2 was the voters of the same party in the previous election (if the party contested that election). This finding may look surprising, given that the results obtained are based on a purely mathematical manipulation of the data that does not take into account the possible ideological proximity among the different options, nor even the link between a party in election 1 and the same party when competing in election 2. The fact that the model, despite its purely mathematical nature, has always provided reasonable results in actual studies seems to offer a certain guarantee of its validity, that is, of the validity of the homogeneity hypothesis on which it rests.

Table 3. Some studies performed using a former version of LPHOM procedure.

Region Number of units (I) Election 1 Options in Election 1* (J) Election 2 Options in Election 2* (K) Source**
Aragon 15 2011 regional election 8 2015 regional election 10 Online Suppl. Materials
Madrid 13 2011 regional election 5 2015 regional election 7 Online Suppl. Materials
Valencian Region 10 2011 regional election 7 2015 regional election 9 Online Suppl. Materials
Andalusia 8 2012 regional election 7 2015 regional election 9 Online Suppl. Materials
Catalonia 10 2012 regional election 9 2015 regional election 8 Romero [56]
Andalusia 8 2015 regional election 7 2015 General elections 7 Online Suppl. Materials
Valencian Region 14 2015 regional election 8 2015 General elections 7 Online Suppl. Materials
Andalusia 8 2015 General elections 7 2016 General elections 6 Romero [57]
Madrid 23 2015 General elections 7 2016 General elections 6 Romero [57]
Valencian Region 14 2015 General elections 7 2016 General elections 6 Romero [57]
Basque Country 8 2016 General elections 6 2016 regional election 6 Online Suppl. Materials

* Entries and exits in census were not included in these studies.

** Associated documents in Spanish.

At this point, therefore, the question is: what is the margin of uncertainty of the results obtained when applying the LPHOM procedure to a specific study? As we discuss in the following subsections, the model outcomes provide information to calculate a heterogeneity index that allows us to quantify both the adequacy of the homogeneity hypothesis in each actual study and the margin of uncertainty of the results achieved. This is the main contribution of our paper: a procedure to assess the level of uncertainty of the estimates. No other previously published method based on linear or quadratic programming provides such a measure.

5.2. Model residuals: estimating the heterogeneity

If the hypothesis of electoral homogeneity were fulfilled exactly in a given study, that is, if all $p_{jk}^{i}$ were exactly equal to their average value $p_{jk}$ in the whole territory, LPHOM would yield as output the unknown true values of $p_{jk}$, with all the $e_{ik}$ errors being zero. Departures from the homogeneity hypothesis in each unit are therefore captured in the residuals. In this and the following subsections we show how these can be used to estimate the uncertainty of LPHOM outputs.

To address the problem of quantifying the uncertainty associated with LPHOM outcomes, it is therefore important to estimate in each real instance the extent to which the homogeneity hypothesis is verified. If all the true vote transfer matrices in each unit were known, the degree of non-compliance with the homogeneity hypothesis could be easily quantified using, for instance, the heterogeneity index HET defined in equation (10), where $v_{jk}^{i}$ denotes the number of voters who, in unit $i$, choose option $j$ in election 1 and option $k$ in election 2.

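$$\mathrm{HET} = 100 \cdot \frac{0.5 \sum_{i}\sum_{j}\sum_{k} \left| v_{jk}^{i} - x_{ij}\,p_{jk} \right|}{\sum_{i}\sum_{j} x_{ij}} \tag{10}$$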

As can be clearly seen, HET accounts for the percentage of voters who would have to be shifted to match the homogeneity hypothesis perfectly. The problem with HET lies in the impossibility of computing it in real studies, given that actual values for $v_{jk}^{i}$, and also for $p_{jk}$, are unknown. Nevertheless, since all the $e_{ik}$ residuals would be null if HET were zero, it makes sense to define an estimated heterogeneity index HETe through equation (11).

$$\mathrm{HETe} = 100 \cdot \frac{\sum_{i}\sum_{k} \left| e_{ik} \right|}{\sum_{i}\sum_{j} x_{ij}} \tag{11}$$

Unlike HET, the HETe value can be calculated in any actual study from the results provided by LPHOM.
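Equations (10) and (11) translate directly into a few lines of code. The following is a minimal sketch in plain Python, not the authors’ implementation: the function names het_index and hete_index and the toy data are illustrative assumptions, and the residuals are taken as given (only their absolute values matter, so no sign convention is needed).

```python
def het_index(v, x, p):
    """HET of equation (10).

    v[i][j][k]: true transfers in unit i (unknown in real studies),
    x[i][j]: election-1 votes, p[j][k]: global transition probabilities.
    """
    shifted = sum(
        abs(v[i][j][k] - x[i][j] * p[j][k])
        for i in range(len(x))
        for j in range(len(p))
        for k in range(len(p[0]))
    )
    total = sum(sum(row) for row in x)
    return 100.0 * 0.5 * shifted / total


def hete_index(e, x):
    """HETe of equation (11), from the residuals e[i][k] of the model."""
    resid = sum(abs(val) for row in e for val in row)
    total = sum(sum(row) for row in x)
    return 100.0 * resid / total


# Toy data (hypothetical): 2 units, 2 options in each election.
x = [[100, 100], [100, 100]]
p = [[0.8, 0.2], [0.3, 0.7]]
v = [[[90, 10], [30, 70]],   # unit 0 departs from p in its first row
     [[70, 30], [30, 70]]]   # unit 1 departs in the opposite direction
e = [[10, -10], [-10, 10]]   # differences between y_ik and sum_j x_ij * p_jk

het = het_index(v, x, p)
hete = hete_index(e, x)
```

In this toy case the first index requires the unit-level matrices, which are never observed, whereas the second needs only quantities that LPHOM returns.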

As we show in the next subsection, we have carried out a set of simulation studies in five different scenarios to analyse, among other points, the relationship between HET and HETe. Considering together the results of the 6900 simulations performed, 1380 in each one of the five scenarios, we obtain an almost perfect linear relationship between log(HETe) and log(HET), with a Pearson correlation of 0.989 and $\mathrm{HET} = 1.626\,(\mathrm{HETe})^{0.921}$ as the fitted equation. These results reveal HETe to be a good predictor of the real heterogeneity index HET. In the following subsections, we exploit this relationship to quantify the uncertainty associated with the results provided by LPHOM in actual studies.
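A power-law relationship of this kind is what an ordinary least-squares fit of log(HET) on log(HETe) produces once it is transformed back to the original scale. The sketch below illustrates that step under stated assumptions: the function names are illustrative, the check uses synthetic points generated from the reported equation rather than the simulated data, and the default coefficients in predict_het are simply the pooled values quoted above.

```python
import math


def fit_power_law(hete, het):
    """OLS fit of log(het) on log(hete); returns (a, b) with het ~ a * hete**b."""
    lx = [math.log(val) for val in hete]
    ly = [math.log(val) for val in het]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (w - my) for u, w in zip(lx, ly))
    b = sxy / sxx                 # slope in log-log space = exponent
    a = math.exp(my - b * mx)     # intercept back-transformed = multiplier
    return a, b


def predict_het(hete, a=1.626, b=0.921):
    """Predicted HET from HETe using the pooled equation reported in the text."""
    return a * hete ** b


# Synthetic check: points lying exactly on the reported curve.
grid = [0.5, 1.0, 2.0, 4.0, 8.0]
a_hat, b_hat = fit_power_law(grid, [predict_het(h) for h in grid])
```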

5.3. Relationship between error index and heterogeneity index

In order to assess the relationship between the estimated heterogeneity index (HETe) and the uncertainty of the results provided by LPHOM, we have carried out a set of simulation studies. These studies have been implemented in five different scenarios characterized by two matrices X and Q.

  • The matrix $X=[x_{ij}]$ collects the results achieved in election 1 by the different options in the different territorial units. This matrix accounts for the impact of the numbers $I$ and $J$ (number of units and options considered in the first election) and for the degree of electoral diversity in the different units in election 1.

  • The basic matrix $Q=[q_{jk}]$ of global voting transitions between the options presented in both elections. This matrix accounts for the impact of the number $K$ (the number of options considered in election 2) as well as for the basic structure of the voter transitions.

To randomly generate a certain degree of heterogeneity in the transition matrices of the different units, the $p_{jk}^{i}$ values have been obtained by adding to the $q_{jk}$ values a uniform random variable between $-d$ and $+d$. These initial $p_{jk}^{i}$ values are subsequently readjusted to be non-negative and to verify $\sum_{k} p_{jk}^{i} = 1$ for all $i$ and $j$. The level of heterogeneity is regulated by $d$. In all the simulated scenarios, we attained a correlation coefficient between $d$ and HET higher than 0.99. More details of the simulations performed are available in the Online Supplemental Materials that accompany this paper.

From $p_{jk}^{i}$ and $x_{ij}$ we build the hypermatrix $W=[w_{jk}^{i}]$, whose generic element $w_{jk}^{i} = x_{ij}\,p_{jk}^{i}$ is the number of voters that swing from option $j$ in election 1 to option $k$ in election 2 in unit $i$. From $W$ we derive the matrix $V=[v_{jk}=\sum_{i} w_{jk}^{i}]$ of transition votes in the overall territory and the matrix $P=[p_{jk}]$ of voter transition probabilities at the global level. In general $P$ is close to $Q$.

From $W$ we also obtain the matrix $Y=[y_{ik}=\sum_{j} w_{jk}^{i}]$, whose generic element represents the number of votes gained by option $k$ of election 2 in unit $i$. The matrices $X$ and $Y$ are given as inputs to LPHOM, from which we obtain the estimated matrices of vote transitions $\hat{V}=[\hat{v}_{jk}]$ and of transition probabilities $\hat{P}=[\hat{p}_{jk}]$. These matrices are compared to $V$ and $P$ in order to assess the degree of proximity between estimated and ‘actual’ results. LPHOM also provides the $e_{ik}$ values of the residuals and the value of the estimated heterogeneity index HETe.
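The data-generating steps just described can be sketched as follows. This is a minimal illustration rather than the authors’ code: the text does not specify how the perturbed probabilities are readjusted, so the sketch assumes that negative values are clipped to zero and each row is then rescaled to sum to one; the function names and the toy matrices are likewise hypothetical.

```python
import random


def perturb_rows(q, d, rng):
    """Unit-level matrix: q[j][k] + U(-d, d), readjusted to a valid row."""
    p_unit = []
    for row in q:
        # Assumption: negative values are clipped to zero ...
        noisy = [max(val + rng.uniform(-d, d), 0.0) for val in row]
        # ... and the row is rescaled so that it sums to 1.
        row_sum = sum(noisy)
        p_unit.append([val / row_sum for val in noisy])
    return p_unit


def simulate(x, q, d, seed=0):
    """Build W, V, P and Y from election-1 votes x[i][j] and base matrix q."""
    rng = random.Random(seed)
    n_units, n_orig, n_dest = len(x), len(q), len(q[0])
    w = []
    for i in range(n_units):
        p_i = perturb_rows(q, d, rng)
        w.append([[x[i][j] * p_i[j][k] for k in range(n_dest)]
                  for j in range(n_orig)])
    # V: transfers summed over units; Y: election-2 votes per unit.
    v = [[sum(w[i][j][k] for i in range(n_units)) for k in range(n_dest)]
         for j in range(n_orig)]
    y = [[sum(w[i][j][k] for j in range(n_orig)) for k in range(n_dest)]
         for i in range(n_units)]
    # P: global transition probabilities (row-normalised V).
    p = [[v[j][k] / sum(v[j]) for k in range(n_dest)] for j in range(n_orig)]
    return w, v, p, y


# Toy scenario (hypothetical): 3 units, 2 origin options, 3 destination options.
x = [[100.0, 50.0], [80.0, 120.0], [60.0, 40.0]]
q = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
w, v, p, y = simulate(x, q, d=0.05, seed=1)
```

Only x and y would be passed to LPHOM; w, v and p play the role of the ‘actual’ values against which the estimates are compared.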

An overall measure of the discrepancy between $\hat{V}$ and $V$ is the error index, EI, defined by equation (12).

$$\mathrm{EI} = 100 \cdot \frac{0.5 \sum_{j}\sum_{k} \left| \hat{v}_{jk} - v_{jk} \right|}{\sum_{j}\sum_{k} v_{jk}} \tag{12}$$

It is easy to verify that EI is the percentage of votes whose destination has been erroneously estimated by the model.
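As a small worked illustration of equation (12), the sketch below computes EI for a hypothetical 2 × 2 case; the function name error_index and the numbers are illustrative only.

```python
def error_index(v_hat, v):
    """EI of equation (12): percentage of votes assigned to the wrong cell."""
    misplaced = sum(
        abs(est - true)
        for row_hat, row in zip(v_hat, v)
        for est, true in zip(row_hat, row)
    )
    total = sum(sum(row) for row in v)
    return 100.0 * 0.5 * misplaced / total


# Hypothetical 'actual' and estimated transfer matrices (200 voters in total).
v_true = [[80, 20], [30, 70]]
v_est = [[90, 10], [20, 80]]
ei = error_index(v_est, v_true)   # 20 of the 200 votes are misplaced: 10.0
```

The factor 0.5 avoids double counting: each misplaced vote appears once as an excess in one cell and once as a deficit in another.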

For each scenario, we have considered 46 possible values of d, chosen between 0.001 and 0.1, and performed 30 simulations for each value. The characteristics of the five scenarios and the results obtained for HETe and EI in each one of the 1380 simulations performed in each scenario can be consulted in the Online Supplemental Materials. In all scenarios analysed, we find a close relationship between the error indexes, EI, and the estimated heterogeneity indexes, HETe. In the five cases this relationship is satisfactorily modelled by a regression equation using log(EI) as dependent variable and log(HETe) and its square as predictor variables. The high values attained for the multiple correlation coefficient (0.931 in Scenario 1, 0.976 in Scenario 2, 0.965 in Scenario 3, 0.972 in Scenario 4 and 0.976 in Scenario 5) show that HETe is a good predictor of EI.
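The regression described above, with log(EI) as dependent variable and log(HETe) and its square as predictors, can be fitted by ordinary least squares. The sketch below solves the corresponding normal equations in plain Python; it is an illustration under stated assumptions, not the authors’ code. The function names are hypothetical, and the check uses synthetic points generated from arbitrary coefficients, since the fitted coefficients of the five scenarios are reported only in the Online Supplemental Materials.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = acc / m[r][r]
    return sol


def fit_log_quadratic(hete, ei):
    """OLS of log(EI) on 1, log(HETe), log(HETe)^2; returns [b0, b1, b2]."""
    u = [math.log(h) for h in hete]
    t = [math.log(e) for e in ei]
    power_sums = [sum(val ** k for val in u) for k in range(5)]
    rhs = [sum(tv * uv ** k for uv, tv in zip(u, t)) for k in range(3)]
    normal = [[power_sums[r + c] for c in range(3)] for r in range(3)]
    return solve3(normal, rhs)


def predict_ei(hete, coef):
    """Back-transformed prediction exp(b0 + b1*log(HETe) + b2*log(HETe)^2)."""
    u = math.log(hete)
    return math.exp(coef[0] + coef[1] * u + coef[2] * u * u)


# Synthetic check with arbitrary (hypothetical) coefficients.
true_coef = [0.5, 0.9, -0.05]
grid = [0.2, 0.5, 1.0, 2.0, 4.0, 8.0]
coef = fit_log_quadratic(grid, [predict_ei(h, true_coef) for h in grid])
```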

Figure 1 shows the mean values predicted for EI as a function of HETe in the five scenarios analysed. As can be seen in Figure 1, although the general shape of the relationship is very similar in all the scenarios, the particular values of the corresponding equations differ noticeably among scenarios.

Figure 1. Relationships between averages of EI and HETe in five simulated scenarios.

Preliminary simulation studies that we have undertaken seem to indicate that some of the factors that influence the relationship between EI and HETe are the ratio between the number of equations in the model and the number of unknowns pjk and the degree of diversity of the results in election 1 in the I territorial units.

5.4. A procedure to estimate outcomes’ uncertainty in actual studies

Outcomes from the previous subsection show that to evaluate the degree of validity of LPHOM results it is necessary to estimate in each specific study the particular function that relates EI and HETe. This can be performed by means of a simulation study similar to those carried out in this research, but using the particular scenario defined by the data corresponding to the study. That scenario will be characterized by the X matrix with the results obtained in election 1 in the I different territorial units and by the P matrix of voter transition estimated by LPHOM.

From the relationship estimated in this way and from the particular value for HETe computed by LPHOM, it is possible to estimate the predicted value for the error index EI in the instance under study, and also to establish confidence limits for this index. In the next section, the procedure is illustrated by applying it to the study of the voter transfers between the first and second rounds of the 2017 French presidential election.

6. Voter transitions between first and second rounds of the 2017 French presidential election

To illustrate the simplicity of the LPHOM procedure, we estimate and analyse voter transitions between the first and second rounds of the 2017 French presidential election, measuring their level of uncertainty. This is an interesting case study because of its political relevance and because, as can be deduced from Table 4, at least 20 million French people changed their vote between 23 April and 7 May 2017, the dates of the first and second rounds. Knowing how the votes of the first and second rounds relate is undoubtedly relevant to understanding the main drivers that operated during that election.

Table 4. National results of first and second rounds of the 2017 French presidential election.

Round Census NonVoters BlaNull Macron Le Pen Fillon Mélenchon Hamon Dupont Others
First 47,582,183 10,578,455 949,334 8,656,346 7,678,491 7,212,995 7,059,951 2,291,288 1,695,000 1,460,323
Second 47,568,693 12,101,366 4,085,724 20,743,128 10,638,475 - - - - -

Source: Official results from https://www.conseil-constitutionnel.fr/ Retrieved 3 March 2020.

Table 4 provides the votes gained at a national level for the main candidates (Emmanuel Macron, Marine Le Pen, François Fillon, Jean-Luc Mélenchon, Benoît Hamon and Nicolas Dupont-Aignan) in both rounds, with ‘Others’ grouping the remaining candidates; those who received less than half a million votes. Abstaining (NonVoters) and voting either blank or null (BlaNull) complete the electors’ alternatives.

To run LPHOM, we need a partition of the territory under study and the election outcomes recorded in such a set of territorial units. In this research, we consider the official results recorded in the 107 departments into which France was divided plus an artificial department that grouped the French electors living abroad. The election results of both rounds at department level can be consulted in the Online Supplemental Materials (Tables S.4 and S.5). As expected, given the temporal proximity of the two elections, the changes in the censuses between them were minimal. However, as LPHOM still requires that $\sum_{j} x_{ij}$ matches $\sum_{k} y_{ik}$ exactly in every unit $i$, a column with (net) census entries and another one with (net) census exits need to be added to the respective matrices X and Y to account for the differences. Before applying LPHOM, these columns can be calculated following any of the approaches pointed out in subsection 3.2. In our solution, we use the R-function described in the Appendix and provided in the Online Supplemental Materials with the option new_and_exit_voters="raw", which implicitly implements equation (7) in this circumstance. Hence, the LPHOM model we implemented to estimate the transitions of votes between the first and second rounds of the 2017 French presidential election is a linear programme with 1130 variables and 565 restrictions. Its solution, by means of our R function, takes less than 0.6 seconds on a standard PC.
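The reported number of variables can be cross-checked with a rough count. The decomposition used below is an assumption, not something stated in this section: it supposes that the variables are the J × K transition probabilities plus a positive and a negative part for each of the I × K residuals, with one extra origin option for net entries and one extra destination option for net exits. The function name is illustrative, and the count of 565 restrictions is not reconstructed here.

```python
def lphom_variable_count(n_units, n_origin, n_dest):
    """Assumed count: J*K probabilities plus 2*I*K split residuals."""
    return n_origin * n_dest + 2 * n_units * n_dest


n_units = 107 + 1    # departments plus the artificial unit for electors abroad
n_origin = 9 + 1     # first-round options of Table 4 plus net entries (assumed)
n_dest = 4 + 1       # second-round options of Table 4 plus net exits (assumed)

n_variables = lphom_variable_count(n_units, n_origin, n_dest)
```

Under these assumptions the count agrees with the 1130 variables reported in the text.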

Table 5 shows the results attained. According to our estimates, 86.6% of first-round non-voters also abstained in the second round, while 9.5% of them voted for Macron and about 3.8% for Le Pen. As expected, virtually all the electors who voted for either Macron or Le Pen in the first round again chose the same candidate in the second round. We want to emphasize that although this output seems very logical, at no time has LPHOM been informed that ‘Macron in the second round’ is the same candidate as ‘Macron in the first round’, nor given the equivalent information concerning Le Pen.

Table 5. Estimated swings between first and second rounds of the 2017 French presidential election.

  Non Voters Blank and Null Emmanuel Macron Marine Le Pen
Non Voters 86.6 * 9.5 3.8
Blank and Null * 61.3 * 38.6
Macron * * 99.9 *
Le Pen * * * 99.9
Fillon 16.4 5.8 74.5 3.3
Mélenchon 24.6 16.2 48.5 10.7
Hamon * * 99.9 *
Dupont-Aignan * 37.7 * 62.2
Others * 89.4 * 10.5

See note under Table 2.

Regarding the behaviour of the voters of the remaining candidates, it seems that LPHOM was able to capture the logical transfers according to the ideology of the candidates. Thus, Fillon’s centre-right supporters mostly voted in the second round for the centrist, independent candidate Emmanuel Macron, with the remainder of his voters split between abstention and blank or null votes. Just a few of these voters chose the far-right Front National leader Marine Le Pen. Likewise, practically all voters of the socialist Hamon decided to vote for Macron in the second round. On the other hand, the first-round voters of the populist Mélenchon spread their votes more widely in the second round: about 48.5% of them voted for Macron and about 10.7% for Le Pen, with the rest of Mélenchon’s voters split between abstention and blank or null votes. Equally logical is that the majority of the first-round voters of the ultranationalist Dupont-Aignan decided to vote for Le Pen in the second round. Indeed, there was an agreement between Le Pen and Dupont-Aignan after the first round and Dupont-Aignan’s voters were encouraged to vote for Le Pen. Finally, we find that about 90% of the nearly one and a half million voters who voted for other minority candidates in the first round voted blank or null in the second round, with the remaining 10% of them voting for Le Pen.

Our results can be assessed by comparing them with the outcomes obtained using ecological regression and with the estimates derived from several polls conducted between the first and second rounds of the election (see Tables S.7 to S.10 in the Online Supplemental Materials).

Regarding the first issue, we have compared our vote transfer estimates (see Table 5) to (i) the estimates published in Pons [52], who applies King’s method, and (ii) the transfers that can be reached after applying the function ei.MD.bayes to our data. The function ei.MD.bayes of the R-library eiPack [38] implements a version of the Bayesian hierarchical model suggested in Rosen et al. [58] and, according to Klima et al. [34], exhibits the best overall estimation performance among the most commonly used approaches.

On the one hand, we have found that our results are quite similar to the ones attained by Pons [52]. For example, [52] estimates that 9.0% of first-round non-voters voted for Macron in the second round and that 21% of first-round Fillon voters decided to abstain in the second round. It should be noted, nevertheless, that we only used 108 units and spent less than a second of computation, whereas Pons [52] used 69,241 units (bureaux de vote) and spent several hours of computation. On the other hand, after applying the function ei.MD.bayes to our data (with default options and with the help of the tuneMD function) we have obtained nonsensical estimates. For example, with default options, ei.MD.bayes estimates that about 43% of first-round Le Pen voters chose Macron in the second round. It seems that ei.MD.bayes needs many units, careful tuning of its key parameters and a lot of computational time to reach reasonable estimates. Even so, according to Plescia and De Sio [51] and Klein [32], the coverages of its resulting credible intervals are well below the target credible levels.

Regarding the second issue, comparing LPHOM outcomes (Table 5) and poll estimates (Tables S.7 to S.11), we see that both sets of estimates exhibit the same patterns, each with its own nuances. For example, comparing the LPHOM swings for Mélenchon with the raw swings derived from the post-electoral survey of the 2017 French Election Study [23], we find a quite similar distribution (see Table S.11), which becomes almost equal after permuting the numbers of non-voters and of blank and null votes. In our view, our results are superior to survey estimates because they are fully consistent with actual outcomes (fulfilling all the constraints) and they offer vote transfer estimates between all the relevant election options. Furthermore, they are not exposed to sources of error such as nonresponse bias, social desirability, measurement error or changes of opinion. One drawback of our solution is that it probably slightly underestimates electoral mobility. This drawback could be significantly reduced using more detailed data, for example, using outcomes at municipality or, even better, at precinct level.

In addition to an estimate of the vote transfer matrix, LPHOM also provides the estimated heterogeneity index, HETe, which reaches 4.21% in this study and indicates the degree of compliance with the homogeneity hypothesis. Indeed, this index can be viewed as the average of the discrepancies between the global transition matrix and the corresponding transition matrices in each territorial unit. In particular, computing heterogeneity indexes for each unit, we find that these range between a minimum of 0.46% for the department of Tarn and a maximum of 23.8% for French Polynesia.

As we stated in Section 5, once an estimate of the heterogeneity index is made available, it is possible to approximate the uncertainty linked to the estimated vote transfer matrix. To compute this, we carry out a simulation study similar to those described in subsection 5.4, but using the data corresponding to the current scenario, which is defined by the matrix X available in Table S.5 in the Online Supplemental Materials and the Q matrix of voter transitions of Table 5. From this, we simulate 46 possible values of d between 0.001 and 0.1 and run 30 simulations for each value.

Figure 2 shows the values attained for HETe and EI in the 1380 simulations completed. In the Online Supplemental Materials (simulations.csv), interested readers can consult the series of HETe and EI obtained. As can be seen in Figure 2, a close relationship links log(EI) and log(HETe) for this dataset as well. The black line in Figure 2 depicts the equation relating both statistics and Table S.6 in the Online Supplemental Materials offers the details of the model fitted. At this point, plugging the value 4.21 obtained for HETe into the estimated equation, we obtain 8.7% as the estimate for EI, with an upper confidence limit ($1-\alpha=0.90$) of 11.2%. Remember that EI can be interpreted as the percentage of votes whose destination has been erroneously estimated by the model.

Figure 2. Relationship between log(EI) and log(HETe) for the 2017 French presidential election.

At first glance, the estimated EI looks high. In order to contextualize it, we compare this to the numbers reached by other methods in similar problems. In this sense, Klima et al. [34] assess the performance of five different ecological inference procedures estimating the voter transfer matrix in five different scenarios by simulation. In that research, Klima and colleagues evaluate performance using as a measure of dissimilarity an index, AD, defined as two times our EI index. The average values they obtain for these AD indices depend on the scenario and the procedure considered, ranging from a minimum of 10% to a maximum of 60%, most of them being between 20% and 30%. Although any comparisons made could be debatable, given that results strongly depend on the scenario considered, it seems that the performance obtained in our study is better than that observed when using other much more complicated procedures.
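Since AD is defined as twice EI, the figures obtained for the French study can be put on the scale used by Klima et al. [34] by simple doubling. The following is only a rescaling of the values reported above, not an additional result, and it inherits the caveat that a single real study is being set against simulation averages:

```latex
\[
\mathrm{AD} = 2\,\mathrm{EI}
\quad\Longrightarrow\quad
\mathrm{AD}_{\text{estimate}} \approx 2 \times 8.7\% = 17.4\%,
\qquad
\mathrm{AD}_{\text{upper limit}} \approx 2 \times 11.2\% = 22.4\%.
\]
```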

7. Concluding remarks and further research

The deficient reliability of responses to retrospective questions and the challenge posed by nonresponse bias, together with the financial costs and large sample sizes required to reach suitable estimates of vote transition probabilities, have encouraged many authors to look for an alternative to polls to solve the problem of estimating voter transfer matrices. In this vein, several authors have taken the route of estimating vote transitions between two elections using exclusively the indisputable records available in the aggregate election outcomes. The determination of the vote transfer matrix based exclusively on these aggregate data is, however, a fundamentally indeterminate problem, whose resolution requires the imposition of additional hypotheses. Hence, the validity of the estimates, that is, their closeness to the unknown true values, depends on the extent to which these hypotheses are satisfied in the electoral processes under scrutiny.

Both in ecological regression procedures and in mathematical programming approaches, the basic idea behind such hypotheses is that the vote transfer matrices in the different territorial units into which the whole area is partitioned are, in some sense, ‘similar’ to each other, and therefore similar to the global matrix. Mathematical programming procedures, such as LPHOM, have the advantage of being much simpler to apply than equivalent ecological regression methods. This is especially true when we consider Bayesian ecological inference approaches, which require specific training as well as previous knowledge and expertise that many analysts lack. Furthermore, this greater simplicity of mathematical programming procedures is achieved without impairing the quality of the estimates. LPHOM has provided reasonable results in all the actual studies where it has been tested.

In addition, (Bayesian) ecological inference approaches are computationally very intensive, demanding very long processing times in models involving many variables. The hierarchical distributional structures that characterize Bayesian approaches mean that, on the one hand, the Markov Chain Monte Carlo (MCMC) procedures routinely employed require very long computation times and that, on the other hand, the analysts must face convergence problems even when parameter transformations are performed [34]. And, although the advent of the Stan language [6] is speeding up many Bayesian problems, its use still remains quite complex for the average analyst.

Compared to other procedures based on mathematical programming, we find some important advantages in our approach. Firstly, LPHOM considers explicitly new voters, making it possible to estimate their behaviour differentially. Secondly, LPHOM offers a way to estimate, from the model results, the degree of non-compliance of the basic hypothesis of homogeneity. And lastly and more importantly, as shown in subsections 5.2–5.4, we provide a procedure to assess the level of uncertainty of the estimates.

Regarding further research, we have started a new line in order to investigate, by simulation, the factors that influence the accuracy of the outcomes provided by LPHOM and to compare them with those obtained by ecological regression procedures. Our provisional results, which we expect to explain in detail in a future paper, point towards other factors, in addition to the value of the heterogeneity index, as features affecting the precision of the estimates. It seems that their accuracy also depends on the ratio between the number of equations in the model and the number of unknowns pjk and on the degree of diversity of the results of election 1 in the I territorial units under consideration. These results seem logical and some of them have been pointed out by other authors [29]. Since electoral results are generally available disaggregated for a large number of elementary units, a good knowledge of the factors influencing the quality of the estimates could indubitably help in the process of establishing a proper strategy for grouping elementary units into territorial units with the aim of reaching estimates which are as accurate as possible.

The main goal of LPHOM is the estimation of the overall voter transition matrix $P=[p_{jk}]$, and not the transfer matrices $P^{i}=[p_{jk}^{i}]$ in the different territorial units into which the whole territory has been partitioned. This does not mean that we have no local information. The residuals $e_{ik}$ carry useful information about what has happened in each of these units. For example, a high positive $e_{ik}$ would indicate that voter transitions to party $k$ in unit $i$ have been less intense than those found in the average unit. A possible approach to estimate the $P^{i}$ transition matrices, proposed by Corominas et al. [10] and consistent with the electoral homogeneity hypothesis, could be to obtain the $p_{jk}^{i}$ values that, while matching the electoral results in unit $i$ perfectly, are as similar as possible to the $p_{jk}$ obtained for the whole territory. The adequacy of this approach and the properties of the outcomes it provides could be the object of further research.

Lastly, the LPHOM approach could be generalized to a tri-electoral model to estimate the hypermatrix [pjkm], whose generic element would be the proportion of voters who, having chosen option j in election 1 and option k in election 2, vote for option m in a later election. For example, in Spain there is much interest in knowing the proportion of voters who, having swung from PP to C’s or Abstention in a second election, returned to PP in the following election. This problem seems difficult to deal with using ecological regression approaches, but looks simpler within a linear programming framework. It could be addressed by means of a generalization of the LPHOM procedure.

Supplementary Material

simulations.csv
ONLINE_SUPPLEMENTAL_MATERIALS.pdf

Acknowledgements

The authors wish to thank the special issue editor and a reviewer for their valuable comments and suggestions. Thanks are also due to M. Hodkinson for revising the English of the paper. This piece of research has been supported by the Spanish Ministry of Science, Innovation and Universities and the Spanish Agency of Research, co-funded with FEDER funds, grant ECO2017-87245-R, and by Consellería d’Innovació, Universitats, Ciència i Societat Digital, Generalitat Valenciana, grant AICO/2019/053.

Appendix: An R function to apply the LPHOM procedure.

This appendix describes the details of an R-function, called lphom, created by the authors to implement the LPHOM procedure described in the paper. The code of the function is available in the Online Supplemental Materials. The function estimates, given the results gained in a set of I spatial units by the J political options (parties or candidates) competing in election 1 and the K political options competing in election 2, the J×K matrix of vote transition probabilities between the two elections.

This function, which depends on the lpSolve package [3], has five arguments (votes_election1, votes_election2, new_and_exit_voters, structural_zeros and verbose) and returns a list with four objects (VTM, OTM, EHet and HTEe).

The arguments of the function are:

  • votes_election1: data.frame (or matrix) of order I×J with the votes gained by the J political options competing in election 1 (or origin) in the I territorial units considered.

  • votes_election2: data.frame (or matrix) of order I×K with the votes gained by the K political options competing in election 2 (or destination) in the I territorial units considered.

  • new_and_exit_voters: A character argument indicating the level of information available regarding new entries and exits of the election censuses between the two elections. This argument captures the different options discussed in Section 3. It admits five values: ‘regular’, ‘raw’, ‘simultaneous’, ‘full’ and ‘gold’.
    • regular: The default value. This value accounts for the most plausible scenario: two elections held at least some months apart. In this scenario, (i) the column J of votes_election1 corresponds to new young electors who have the right to vote for the first time, (ii) net exits (basically a consequence of mortality), and eventually net entries, are internally computed according to equation (7), and (iii) we assume net exits affect equally all the first J−1 options of election 1, hence constraints (8) and (9) are imposed.
    • raw: This value accounts for a scenario with two elections where only the raw election data recorded in the I territorial units, into which the area under study is divided, are available. In this scenario, net exits (basically deaths) and net entries (basically new young voters) are internally estimated according to equation (7). Constraints defined by equations (8) and (9) are imposed. In this scenario, when net exits and/or net entries are negligible (such as between the first and second rounds of the French presidential election), they are omitted in the outputs.
    • simultaneous: This value accounts for either a scenario with two simultaneous elections or a classical ecological inference problem. In this scenario, the sum by rows of votes_election1 and votes_election2 must coincide. Constraints defined by equations (8) and (9) are not included in the model.
    • full: This value accounts for a scenario with two elections held at least some months apart, where: (i) the column J−1 of votes_election1 totals new young electors that have the right to vote for the first time; (ii) the column J of votes_election1 measures new immigrants that have the right to vote; and (iii) the column K of votes_election2 corresponds to total exits of the census lists (due to death or emigration). In this scenario, the sum by rows of votes_election1 and votes_election2 must agree and constraints (8) and (9) are imposed.
    • gold: This value accounts for a scenario similar to full, where total exits are separated out between exits due to emigration (column K−1 of votes_election2) and death (column K of votes_election2). In this scenario, the sum by rows of votes_election1 and votes_election2 must agree. The same restrictions as in the above scenario apply but for both columns K−1 and K of the vote transition probability matrix.
  • structural_zeros: Default NULL. A list of vectors of length two, indicating the election options for which no transfer of votes is allowed between election 1 and election 2. For instance, when new_and_exit_voters is set to ‘regular’, lphom implicitly states structural_zeros = list(c(J, K)).

  • verbose: A TRUE/FALSE argument that indicates if the main outputs of the function should be printed on the screen. Default TRUE.
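The row-sum requirement of the ‘simultaneous’, ‘full’ and ‘gold’ scenarios can be checked before calling the function. The following is only a minimal sketch in base R: the helper check_row_sums and the toy data are hypothetical and are not part of the authors' code, and the commented call at the end merely reuses the argument names listed above.

```r
# Hypothetical helper (not part of lphom): returns the indices of the
# territorial units whose row totals differ between the two elections.
check_row_sums <- function(votes_election1, votes_election2) {
  x <- as.matrix(votes_election1)
  y <- as.matrix(votes_election2)
  stopifnot(nrow(x) == nrow(y))  # both inputs must cover the same I units
  which(rowSums(x) != rowSums(y))
}

# Toy data (invented): I = 3 units, J = 2 origin options, K = 3 destination options.
votes1 <- matrix(c(100, 50,
                    80, 70,
                    60, 40), nrow = 3, byrow = TRUE)
votes2 <- matrix(c(90, 40, 20,
                   70, 60, 20,
                   50, 30, 25), nrow = 3, byrow = TRUE)

bad_units <- check_row_sums(votes1, votes2)
print(bad_units)  # unit 3: 100 voters in election 1 versus 105 in election 2

# Once the totals agree, the function from the Supplemental Material
# could be called along these lines (not run here):
# lphom(votes1, votes2, new_and_exit_voters = "simultaneous")
```

An empty result means that every unit satisfies the requirement; any index returned points to a unit whose data should be reviewed, or suggests that one of the other scenarios may be more appropriate.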

The outputs of the function are:

  • VTM: A matrix of order J×K with the estimated percentages of vote transitions from election 1 to election 2. Tables 2 and 6 are examples of VTM matrices.

  • OTM: A matrix of order K×J with the estimated percentages of the origin of the votes obtained for the different options of election 2. Table S.3 is an example of an OTM matrix.

  • EHet: A matrix of order I×K measuring in each spatial unit the distance to the homogeneity hypothesis, that is, the differences under the homogeneity hypothesis between the actual recorded results and the expected results in each territorial unit for each option of election 2. The matrix [eik].

  • HTEe: The estimated heterogeneity index defined in equation (11).
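To make the content of EHet more concrete, the gap from the homogeneity hypothesis can be recomputed by hand for a given transition matrix. The sketch below, in base R, rests on several assumptions: the helper homogeneity_gap, the matrix P and the toy data are hypothetical, P holds proportions rather than percentages, and the sign (expected minus recorded) is chosen only to agree with the interpretation of a positive eik given in the discussion section; the convention used inside lphom may differ.

```r
# Hypothetical helper (not part of lphom): gap between the results expected
# under the homogeneity hypothesis and the recorded results of election 2.
homogeneity_gap <- function(votes_election1, votes_election2, P) {
  x <- as.matrix(votes_election1)   # I x J votes in election 1
  y <- as.matrix(votes_election2)   # I x K votes in election 2
  stopifnot(ncol(x) == nrow(P), ncol(y) == ncol(P), nrow(x) == nrow(y))
  expected <- x %*% P               # I x K results if every unit followed P
  expected - y                      # assumed sign: expected minus recorded
}

# Toy example (invented): J = 2 origin options, K = 2 destination options, I = 2 units.
P <- matrix(c(0.8, 0.2,
              0.3, 0.7), nrow = 2, byrow = TRUE)  # each row sums to 1
votes1 <- matrix(c(100, 100,
                   200,  50), nrow = 2, byrow = TRUE)
votes2 <- matrix(c(110,  90,
                   180,  70), nrow = 2, byrow = TRUE)

e <- homogeneity_gap(votes1, votes2, P)
print(e)  # unit 1 matches P exactly; unit 2 deviates by 5 votes in each column
```

When the rows of P sum to one and the row totals of both elections coincide, the gaps of each unit sum to zero across the options of election 2, so a deviation in one column is always offset in the others.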

Funding Statement

This work was supported by Ministerio de Ciencia, Innovación y Universidades: [grant number ECO2017-87245-R]; Consellería d’Innovació, Universitats, Ciència i Societat Digital, Generalitat Valenciana: [grant number AICO/2019/053].

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Abou-Chadi T., and Stoetzer L., How parties react to voter transitions. Am. Polit. Sci. Rev. 114(3) (2020), pp. 940–945.
  • 2. Baydoğan U., Vote transitions analysis and comparison of Turkish local elections in 2014 and 2019, Ph.D. diss., Mef University, 2019.
  • 3. Berkelaar M. and others, lpSolve: Interface to Lp_solve v. 5.5 to Solve Linear/Integer Programs. R package version 5.6.10, 2014. Available at https://CRAN.R-project.org/package=lpSolve
  • 4. Bosch A., and Durán I.M., How does economic crisis impel emerging parties on the road to elections? The case of the Spanish Podemos and Ciudadanos. Party Polit. 25(2) (2019), pp. 257–267.
  • 5. Brown P.J., and Payne C.D., Aggregate data, ecological regression and voting transitions. J. Am. Stat. Assoc. 81 (1986), pp. 453–460.
  • 6. Carpenter B., Gelman A., Hoffman M.D., Lee D., Goodrich B., Betancourt M., Brubaker M., Guo J., Li P., and Riddell A., Stan: A probabilistic programming language. J. Stat. Softw. 76(1) (2017), pp. 1–32.
  • 7. Caughey D., and Wang M., Dynamic ecological inference for time-varying population distributions based on sparse, irregular, and noisy marginal data. Polit. Anal. 27(3) (2019), pp. 388–396.
  • 8. Cho W.K.T., Iff the assumption fits … : A comment on the king ecological inference solution. Polit. Anal. 7 (1998), pp. 143–163.
  • 9. CIS, Estudio 3041. Barómetro octubre 2014, Centro de Investigaciones Sociológicas, Madrid, 2014.
  • 10. Corominas A., Lusa A., and Valvet M.D., Computing voter transitions: The elections for the Catalan parliament, from 2010 to 2012. J. Ind. Eng. Manage. 8(1) (2015), pp. 122–136.
  • 11. Dassonneville R., and Hooghe M., The noise of the vote recall question: The validity of the vote recall question in panel studies in Belgium, Germany, and the Netherlands. Int. J. Public. Opin. Res. 29(2) (2017), pp. 316–338.
  • 12. Duncan O., and Davis B., An alternative to ecological correlation. Am. Sociol. Rev. 18 (1953), pp. 665–666.
  • 13. Fisher L.H., and Wakefield J., Ecological inference for infectious disease data, with application to vaccination strategies. Stat. Med. 39(3) (2020), pp. 220–238.
  • 14. Forcina A., and Marchetti G.M., Modelling transition probabilities in the analysis of aggregate data, in Statistical Modelling, Decarli A., Francis B.J., Gilchrist R., Seber G.U.H., eds., Springer-Verlag, 1989.
  • 15. Forcina A., and Marchetti G.M., The Brown and Payne model of voter transition revisited, in New Perspectives in Statistical Modeling and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization, Ingrassia S., Rocci R., Vichi M., eds., Springer, Berlin, 2011.
  • 16. Forcina A., and Pellegrino D., Estimation of voter transitions and the ecological fallacy. Qual. Quant. 53 (2019), pp. 1859–1874.
  • 17. Freedman D.A., Klein S.P., Ostland M., and Roberts M.R., Review of ‘A solution to the ecological inference problem’. J. Am. Stat. Assoc. 93 (1998), pp. 1518–1522.
  • 18. Füle E., Estimating voter transitions by ecological regression. Elect. Stud. 13 (1994), pp. 313–330.
  • 19. Galais C., Don’t vote for them: The effects of the Spanish indignant movement on attitudes about voting. J. Elect Public Opin. Part. 24 (2014), pp. 334–350.
  • 20. Glynn A.N., and Wakefield J., Ecological inference in the social sciences. Stat. Methodol. 7(3) (2010), pp. 307–322.
  • 21. Goodman L.A., Ecological regressions and the behaviour of individuals. Am. Sociol. Rev. 18 (1953), pp. 663–666.
  • 22. Goodman L.A., Some alternatives to ecological correlation. AJS 64(6) (1959), pp. 610–625.
  • 23. Gougou F., and Sauger N., The 2017 French election study (FES 2017): A post-electoral cross-sectional survey. French Polit. 15 (2017), pp. 360–370.
  • 24. Greiner D.J., and Quinn K.M., R×C ecological inference: Bounds, correlations, flexibility, and transparency of assumptions. J. Roy. Stat. Soc. Ser. A 172 (2009), pp. 67–81.
  • 25. Greiner D., and Quinn K.M., Exit polling and racial bloc voting: Combining individual-level and RxC ecological data. Ann. Appl. Stat. 4 (2010), pp. 1774–1796.
  • 26. Hawkes A.G., An approach to the analysis of electoral swing. J. Roy. Stat. Soc. Ser. A 132 (1969), pp. 68–79.
  • 27. Henn M., and Foard N., Young People and Politics in Britain: How do Young People Participate in Politics and What Can Be Done to Strengthen their Political Connection?, Nottingham Trent/ESRC, London, 2012.
  • 28. Johnston R.J., and Hay A.M., Voter transition probability estimates: An entropy maximizing approach. Eur. J. Polit. Res. 11 (1983), pp. 93–98.
  • 29. King G., A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data, Princeton University Press, Princeton, NJ, 1997.
  • 30. King G., Rosen O., and Tanner M.A., Binomial-beta hierarchical models for ecological inference. Sociol. Methods. Res. 28 (1999), pp. 61–90.
  • 31. King G., Rosen O., and Tanner M.A. (eds.), Ecological Inference. New Methodological Strategies, Cambridge University Press, New York, 2004.
  • 32. Klein J.M., Estimation of Voter Transitions in Multi-Party Systems. Quality of Credible Intervals in (hybrid) Multinomial-Dirichlet Models, Master Thesis diss., Ludwig-Maximilians-Universität München, 2019.
  • 33. Klima A., Schlesinger T., Thurner P.W., and Küchenhoff H., Combining aggregate data and exit polls for the estimation of voter transitions. Sociol. Methods. Res. 48 (2019), pp. 296–325.
  • 34. Klima A., Thurner P.W., Molnar C., Schlesinger T., and Küchenhoff H., Estimation of voter transitions based on ecological inference: An empirical assessment of different approaches. AStA – Adv. Stat. Anal. 100 (2016), pp. 133–159.
  • 35. McCarthy C., and Terence M.R., Estimates of voter transition probabilities from the British general elections of 1974. J. Roy. Stat. Soc. Ser. A 140 (1977), pp. 78–85.
  • 36. Miller W.L., Measures of electoral change using aggregate data. J. Roy. Stat. Soc. Ser. A 135 (1972), pp. 122–142.
  • 37. Núñez L., Expressive and strategic behavior in legislative elections in Argentina. Polit. Behav. 38(4) (2016), pp. 899–920.
  • 38. Olivia L., Moore O.R.T., and Kellermann M., eiPack: Ecological Inference and Higher-Dimension Data Management. R package version 0.1-8, 2018. Available at https://CRAN.R-project.org/package=eiPack.
  • 39. Park W.-H., Ecological inference and aggregate analysis of elections, Ph.D. diss., The University of Michigan, 2008.
  • 40. Pavía J.M., and Aybar C., La Movilidad Electoral en las Elecciones 2019 en la Comunitat Valenciana. Debats 134(1) (2020), pp. 27–51.
  • 41. Pavía J.M., Badal E., and García-Cárceles B., Spanish exit polls: Sampling error or nonresponse bias? Rev. Int. de Sociol. 74(3) (2016), pp. e043.
  • 42. Pavía J.M., Bodoque A., and Martín J., The birth of a new party: Podemos, a hurricane in the Spanish crisis of trust. Open J. Soc. Sci. 4 (2016), pp. 67–86.
  • 43. Pavía J.M., and Cantarino I., Dasymetric distribution of votes in a dense city. Appl. Geogr. 86 (2017), pp. 22–31.
  • 44. Pavía J.M., Larraz B., and Montero J.M., Election forecasts using spatiotemporal models. J. Am. Stat. Assoc. 103 (2008), pp. 1050–1059.
  • 45. Pavía J.M., and López-Quilez A., Spatial vote redistribution in redrawn polling units. J. Roy. Stat. Soc. Ser. A 176 (2013), pp. 655–678.
  • 46. Pavía J.M., and Veres-Ferrer E., Un nuevo estimador para disgregar totales poblacionales. El caso de los nuevos electores. Anales de Economía Aplicada XXX (2016a), pp. 817–826.
  • 47. Pavía J.M., and Veres-Ferrer E., Desagregando Estadísticas de Población, in Investigaciones en Métodos Cuantitativos para la Economía y la Empresa, Herrerías J.M., Callejón J., eds., Editorial Universidad de Granada, Granada, 2016b. pp. 543–555.
  • 48. Pavía-Miralles J.M., Forecasts from non-random samples: The election night case. J. Am. Stat. Assoc. 100 (2005), pp. 1113–1122.
  • 49. Payne C., Brown P., and Hanna V., By-election exit polls. Elect. Stud. 5 (1986), pp. 277–287.
  • 50. Piedras de Papel, Aragón es Nuestro Ohio. Así Votan los Españoles, El Hombre del Tr3s, Madrid, 2015.
  • 51. Plescia C., and De Sio L., An evaluation of the performance and suitability of RxC methods for ecological inference with known true values. Qual. Quant. 52 (2018), pp. 669–683.
  • 52. Pons V., Comment expliquer les transferts de voix du premier au second tour? Le Figaro, mercredi 17 mai 2017, 13, 2017.
  • 53. Puig X., and Ginebra J., Ecological inference and spatial variation of individual behavior: National divide and elections in Catalonia. Geogr. Anal. 47(3) (2015), pp. 262–283.
  • 54. Robinson W.S., Ecological correlations and the behavior of individuals. Am. Sociol. Rev. 15(3) (1950), pp. 351–357.
  • 55. Romero R., Un modelo matemático para estimar el trasvase de votos entre partidos. Revista Digital de la Real Academia de Cultura Valenciana (2014), pp. 3–23.
  • 56. Romero R., Trasvase de votos entre partidos en las elecciones autonómicas catalanas del 27 de septiembre de 2015. Revista Digital de la Real Academia de Cultura Valenciana (2015), pp. 3–15.
  • 57. Romero R., Movilidad electoral entre las elecciones del 20D y del 26J en las comunidades autónomas valenciana, madrileña y andaluza. Revista Digital de la Real Academia de Cultura Valenciana. Segunda Época 1 (2016), pp. 1–25.
  • 58. Rosen O., Jiang W., King G., and Tanner M.A., Bayesian and frequentist inference for ecological inference: The RxC case. Stat. Neerl. 55 (2001), pp. 134–156.
  • 59. Snelling C.J., Young people and electoral registration in the UK: Examining local activities to maximise youth registration. Parliam. Aff. 69(3) (2016), pp. 663–685.
  • 60. Tziafetas G., Estimation of the voter transition matrix. Optimization 17 (1986), pp. 275–279.
  • 61. Upton G.J.G., A note on the estimation of voter transition probabilities. J. Roy. Stat. Soc. Ser. A 141 (1978), pp. 507–512.
  • 62. van der Ploeg C., A Comparison of Different Estimation Methods of Voting Transitions with an Application in the Dutch National Elections, Centraal Bureau voor de Statistiek, 2008.
  • 63. Vázquez E., and Romero R., Modelos para el estudio del cambio electoral, in Actas del XXVI Congreso Nacional de Estadística e Investigación Operativa, Jaen, Spain, 2001.
  • 64. Wakefield J., Ecological inference for 2 × 2 tables (with discussion). J. Roy. Stat. Soc. Ser. A 167 (2004), pp. 385–445.


Articles from Journal of Applied Statistics are provided here courtesy of Taylor & Francis
