PLOS One. 2020 Sep 1;15(9):e0238481. doi: 10.1371/journal.pone.0238481

A new metric for understanding hidden political influences from voting records

Corrado Possieri 1,*, Chiara Ravazzi 2, Fabrizio Dabbene 2, Giuseppe C Calafiore 2,3
Editor: Gabriele Oliva 4
PMCID: PMC7462714  PMID: 32871583

Abstract

Inspired by the increasing attention of the scientific community to the understanding of human relationships and actions in the social sciences, in this paper we address the problem of inferring, from voting data, the hidden influence exerted on individuals by competing ideological groups. As a case study, we present an analysis of the closeness of members of the Italian Senate to political parties during the XVII Legislature. The proposed approach aims at automatically extracting the relevant information by disentangling the actual influences from noise, via a two-step procedure. First, a sparse principal component projection is performed on the standardized voting data. Then, the projected data are combined with a generative mixture model, from which an information-theoretic measure, which we refer to as Political Data-aNalytic Affinity (Political DNA), is derived. We show that this new affinity measure, together with suitable visualization tools for displaying the results of the analysis, allows a better understanding and interpretation of the relationships among political groups.

Introduction

In the past decades, considerable effort has been devoted by the scientific community to the study of mathematical models of opinion formation in social and belief systems [1, 2]. Among these models, the Friedkin and Johnsen (F&J) opinion dynamics model [3] has been experimentally validated for deliberative groups of small and medium size [4]. According to this model, each agent’s opinion evolves as a convex combination of the other agents’ opinions and of its own initial condition. In this sense, agents are not completely open-minded: they are persistently driven by an individual attachment due, for example, to the influence of a specific ideology [5, 6]. The key ingredient for estimating this stubbornness and, consequently, for offering insights into efficient control strategies for steering social behaviors towards desired patterns, is the development of new, technically sound tools able to extract low-dimensional information from social data [7–10].
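For readers unfamiliar with the F&J model, the update just described can be sketched as follows. This is a minimal illustration of the standard F&J iteration $x(k+1) = \Lambda W x(k) + (I - \Lambda)u$, assuming a row-stochastic influence matrix $W$, diagonal susceptibilities $\lambda_i \in [0, 1]$ and prejudices $u = x(0)$. The function name and the toy numbers are hypothetical, and the model is background only, not part of the Political DNA pipeline.

```python
import numpy as np

def friedkin_johnsen(W, lam, u, steps=200):
    """Iterate x(k+1) = diag(lam) W x(k) + (I - diag(lam)) u, starting from x(0) = u."""
    W, lam, u = (np.asarray(a, dtype=float) for a in (W, lam, u))
    x = u.copy()
    for _ in range(steps):
        # Each agent mixes its neighbours' current opinions (weight lam_i)
        # with its own initial opinion, or prejudice (weight 1 - lam_i).
        x = lam * (W @ x) + (1.0 - lam) * u
    return x

# Toy example: agent 0 is fully stubborn (lam = 0) and keeps its initial opinion.
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])   # row-stochastic influence weights
lam = np.array([0.0, 0.8, 0.8])
u = np.array([1.0, -1.0, 0.0])
x_final = friedkin_johnsen(W, lam, u)
```

When the spectral radius of $\Lambda W$ is below one, as here, the iteration converges to $(I - \Lambda W)^{-1}(I - \Lambda)u$, so the prejudices keep the group from reaching a plain consensus.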

In this paper we address the problem of understanding, via machine learning techniques, the attachment of individuals to their own group and the underlying influence of competing ideological groups, using observations of public voting data in politics. Although other types of information, such as speeches and interviews, could in principle be used to explore relationships in political scenarios, voting patterns are particularly meaningful because of their impact on society, and they can reveal important behaviors and trends [11, 12]. Further, several voting datasets are already publicly available, and the trend in developed countries is to make these data ever more widely accessible; see, for instance, roll call data for the US Congress, https://www.govtrack.us, and for the European Parliament, http://www.europarl.europa.eu/plenary/en/minutes.html. However, although the data are publicly available, extracting relevant information from them can be challenging. Often, data are provided by governmental institutions in the form of minutes, that is, textual documents from which the actual quantitative voting data must be extracted via costly human intervention or ad-hoc text analytics software. Alternatively, data can be obtained, sometimes at a cost, from research, nonprofit, or commercial parties; see, e.g., https://voteview.com and https://www.votewatch.eu.

The present research focuses on the activity of the Italian Senate during the XVII Legislature. The Italian political scenario is an interesting case study because it is more complex than that of other deliberative institutions abroad, as it comprises several parties. Our data source is the nonprofit organization Openpolis, which tracks information about representatives and senators in Italy, including their votes, monitors daily government events, and provides statistics on politicians’ actions. In our analysis, we focus on the data classified by Openpolis as key votes, i.e., those votes that are publicly available and considered the most important, both for the relevance of the subject matter and for their political value. We also acquired the nominal membership of each senator in her/his political group, which is used as side information. Note that these votes constitute just a portion of all the votes cast by a senator, which also include, e.g., those cast in secret ballots.

The main contribution of this paper is the introduction of a new quantitative metric for measuring the affinity between each representative and the existing political parties. This affinity index, which we refer to as the Political Data-aNalytic Affinity (Political DNA), summarizes the degree of fidelity to a party, as well as the influence of other parties. Equivalently, it can be interpreted as a quantitative indicator of the degree of “rebellion” against the discipline of the party of nominal membership. Intuitively, representatives belonging to the same political party can be expected to cast approximately homogeneous votes on a given bill. However, when bills are analyzed in their totality, say over a few years, a structure emerges that links each representative not only to his/her nominal party but also to other parties, indicating that the political orientation of a representative may be influenced by many diverse political visions. In extreme cases, we may observe representatives who nominally belong to some party but whose votes indicate stronger political affinities with other parties; see, e.g., the examples in Fig 3.

Fig 3. Political affinity maps of the parliamentary leaders of each group obtained using the proposed DNA extraction technique with q = 2 and c = 20.


The proposed metric rests on information-theoretic grounds: it models the votes as outcomes of a mixture of random variables and reformulates the computation of the Political DNA as the estimation of class-posterior probabilities. Combining this new metric with sparse learning techniques also allows us to select the bills that are most relevant in determining the variance of political positions. Moreover, we develop ad-hoc visualization tools that make the affinity relationships easier and more immediate to understand.

Preliminary results on the Political DNA of the Italian Senate were presented in Longo et al. [13]. The main differences between the results given in this paper and those in Longo et al. [13] are that here we perform a deeper analysis considering all the political groups that were active during the XVII Legislature, we propose novel visualization tools that were not provided in Longo et al. [13], and we illustrate how the Political DNA can be employed to determine the reciprocal influence among political groups.

Background and related literature

Italian senate

In Italy, the Parliament holds legislative power, i.e., the faculty of making new laws. According to the principle of full bicameralism, it is composed of two houses with identical powers and functions: the Chamber of Deputies and the Senate of the Republic. In particular, the Senate consists of 315 elected members plus the former Presidents of the Republic and the life senators appointed by the President of the Republic “for outstanding merits in social, scientific, artistic and literary fields”. The Senate is chaired by the President of the Senate, elected by the senators in a secret ballot. Within the Senate, senators join political groups, but they are free to move from their original group to another during the Legislature.

Fig 1 lists the political groups active at the end of the XVII Legislature (from the 15th of March 2013 to the 22nd of March 2018). If a political group contains multiple parties, only the most significant one is indicated. After elections, political groups form coalitions in order to create a multi-party pro-government majority on one side and the opposition on the other. The same figure also reports the political orientation (Left, Right, Center, Center-Right, and Center-Left) and the role in the Legislature (Government or Opposition) of each political group.

Fig 1. Political groups active at the end of the XVII Legislature.


The acronyms stand for: ALA: Alleanza Liberalpopolare-Autonomie; AUT: Per le Autonomie; FdL: Federazione delle Libertà; GAL: Grandi Autonomie e Libertà; LeU: Liberi e Uguali; Lega: Lega Nord e Autonomie; M5S: Movimento 5 Stelle; NCD: Nuovo Centrodestra; NcL: Noi con l’Italia; PD: Partito Democratico; PdL: Il Popolo delle Libertà.

Analysis of voting records

The analysis of voting records has a long history in social and political science. Several approaches have been proposed in the literature for scoring political ideology from voting data [14, 15]. The most popular techniques belong to the family of multidimensional scaling methods, such as Nominate, W-Nominate, and DW-Nominate [15, 16]. These methods are mainly used to produce graphical interpretations of political positions by representing high-dimensional data in a lower-dimensional space. Jenkins [17] used these methods to extract ideal points, which served as features for estimating party influence and for determining the ideological rank order in the US Congress [8].

Although these techniques are based on empirically grounded models of opinion formation and can be interpreted in both a probabilistic and a geometric framework, the algorithms are highly computationally intensive. As an alternative to these methods, Principal Component Analysis (PCA) extracts the relevant information from highly correlated data by reducing the number of dimensions while retaining most of the variance of the data, thus providing a compact and meaningful representation of the original data. PCA is commonly used for the ideology analysis of legislators in the US Congress. In particular, several datasets are used for this analysis, including political orientation (Left-Right), co-sponsorship, committee seats, and campaign contributions, to mention just a few. The Italian Senate, however, presents some intrinsic differences with respect to the US Congress that make the analysis more challenging and call for new tools. In particular, we emphasize the higher number of legislators (although smaller than that of the Chamber of Deputies), their high fragmentation among several political groups, the multiplicity of ideologies, coalitions, and political orientations, and the high rate of migration between parties [18].

Methods and data

Notation

Natural and real numbers are denoted by $\mathbb{N}$ and $\mathbb{R}$, respectively. We denote column vectors with lower-case letters, and matrices with capital letters. Given a matrix $X \in \mathbb{R}^{m\times n}$, $X^\top$ denotes its transpose and $X_{ij}$ is the entry in the $i$-th row and $j$-th column; $\mathbf{1}$ denotes a (column) vector of all ones. We use the notation $x^{(i)} \in \mathbb{R}^n$ for the column vector corresponding to the $i$-th row of $X$, i.e.,

$$X = \begin{bmatrix} x^{(1)} & \cdots & x^{(m)} \end{bmatrix}^\top.$$

Given $z \in \mathbb{R}^n$, we denote its Euclidean norm and its $\ell_0$-pseudonorm (number of nonzero elements) by $\|z\|_2$ and $\|z\|_0$, respectively. We denote the Frobenius norm of a matrix $X \in \mathbb{R}^{m\times n}$ by

$$\|X\|_F \doteq \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} X_{ij}^2}.$$

Dataset

The dataset under consideration contains the final votes of members of the Italian Senate during the XVII Legislature. A field in the dataset specifies the membership of each senator in his/her most recent political group. The data consist of the vote of each senator on each proposed bill, expressed as “Favorevole” (for), “Contrario” (against), “Astenuto” (abstention), or “Assente” (not in the chamber). These data have been cleaned by removing the senators who never voted and the bills voted in a secret ballot. After this cleaning, the dataset contains the votes of m = 334 senators on n = 160 bills. The dataset is available at the following link: https://www.kaggle.com/cpossieri/italian-senate-xvii-legislature.

Encoding and preprocessing

The votes of senators have been encoded in a vote matrix $Z \in \{-1, 0, 1\}^{m\times n}$, where each row represents a senator and each column a bill. The $(i, j)$th entry $Z_{ij}$ of this matrix equals $+1$ if the $i$th senator voted in favor of the $j$th bill, $-1$ if he/she voted against it or abstained, and $0$ otherwise (namely, if he/she was not in the chamber). This encoding is consistent with the voting rule of the Italian Senate during the XVII Legislature, according to which an abstention is essentially equivalent to a vote against. This vote matrix has been standardized over the columns, as in [19]:

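$$X_{ij} = \frac{Z_{ij} - \frac{1}{m}\sum_{k=1}^{m} Z_{kj}}{\sqrt{\frac{1}{m}\sum_{l=1}^{m}\left(Z_{lj} - \frac{1}{m}\sum_{k=1}^{m} Z_{kj}\right)^2}}, \qquad (1)$$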

$i = 1, \ldots, m$, $j = 1, \ldots, n$, thus obtaining the centered and standardized matrix $X \in \mathbb{R}^{m\times n}$. Each vector $x^{(i)} \in \mathbb{R}^n$ of this matrix represents the centered and standardized votes of senator $i$, who is associated with a scalar index $\omega(i) \in \{1, \ldots, r\}$ representing his/her most recent political group. Note that if we define $\bar z$ as the average of the rows of $Z$, that is, $\bar z = \frac{1}{m}\sum_{i=1}^{m} z^{(i)\top}$, then $\bar z$ can be interpreted as the row $n$-vector of votes of the average senator on the $n$ bills. Also, if we define $\bar\sigma$ as the row $n$-vector containing the standard deviations along the columns of $Z$, that is, $\bar\sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m}(Z_{ij} - \bar z_j)^2$ for $j = 1, \ldots, n$, then $\bar\sigma_j$ measures the variability of the senators’ opinions on the $j$th bill. The standardized data matrix $X$ in (1) thus contains, in each column (bill) $j = 1, \ldots, n$, the votes of the senators, centered around the average vote $\bar z_j$ for that bill and scaled by the standard deviation $\bar\sigma_j$ of the votes on that same bill. In compact matrix notation, we can express $X$ as

$$X = (Z - \mathbf{1}\bar z)\, S^{-1}, \quad \text{where } S = \operatorname{diag}(\bar\sigma_1, \ldots, \bar\sigma_n).$$
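The encoding and standardization steps can be sketched in Python with NumPy as below. This is an illustration, not the authors’ code: the label-to-value map follows the encoding described above, the assumption that a senator who “never voted” corresponds to a row of all zeros is ours, and the function names are hypothetical.

```python
import numpy as np

# Encoding described in the text: for -> +1, against or abstention -> -1, absent -> 0.
VOTE_CODE = {"Favorevole": 1, "Contrario": -1, "Astenuto": -1, "Assente": 0}

def encode_votes(raw):
    """Map an m x n table of vote labels to Z in {-1, 0, 1}^(m x n)."""
    return np.array([[VOTE_CODE[v] for v in row] for row in raw], dtype=float)

def drop_nonvoters(Z):
    """Remove senators (rows) that never cast a vote; assumed here to be all-zero rows."""
    return Z[np.any(Z != 0, axis=1)]

def standardize(Z):
    """Column-wise X = (Z - 1 zbar) S^{-1}, using the 1/m standard deviation of Eq (1)."""
    zbar = Z.mean(axis=0)          # average senator's vote on each bill (row n-vector)
    sigma = Z.std(axis=0)          # ddof=0, i.e. the 1/m normalization used for sigma_j
    if np.any(sigma == 0):
        raise ValueError("a bill on which all senators voted identically cannot be standardized")
    return (Z - zbar) / sigma      # broadcasting subtracts zbar from every row

raw = [["Favorevole", "Contrario", "Assente"],
       ["Favorevole", "Astenuto", "Favorevole"],
       ["Assente", "Assente", "Assente"],
       ["Contrario", "Favorevole", "Contrario"]]
Z = drop_nonvoters(encode_votes(raw))
X = standardize(Z)
```

After standardization every column of `X` has zero mean and unit (1/m) standard deviation, which is what the column-wise formula (1) prescribes.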

Learning political DNA from raw data

The main objective of the proposed methodology is to infer, for each senator $i \in \{1, 2, \ldots, m\}$, his/her Political DNA, that is, a vector $\pi^{(i)} \in [0, 1]^r$ whose entries $\{\pi_y^{(i)}\}_{y \in \{1, \ldots, r\}}$ represent the influence exerted by group $y$ on senator $i$, with $\sum_{y=1}^{r} \pi_y^{(i)} = 1$. In this section, we briefly describe the technique used to achieve this objective. In particular, we assume a Gaussian mixture generative model for the voting patterns of senators, and then use Bayes’ rule to extract the Political DNA. The raw data are first encoded in a vote matrix, which is centered and standardized as in (1). We then apply a dimensionality reduction technique (namely, Sparse Principal Component Analysis, SPCA) to obtain lower-dimensional data. This prefiltering step reduces noise and extracts the relevant features from the data matrix. The filtered data are then used to learn a Gaussian Mixture Model (GMM), from which the Political DNA is finally extracted via posterior probability computation. Fig 2 summarizes this procedure.

Fig 2. Procedure for the extraction of Political DNA.


Dimensionality reduction

Vote data matrices are typically noisy and may contain spurious information. A classical computational approach to better expose the information contained in the data matrix is Principal Component Analysis (PCA), which is related to the singular value decomposition of $X$ in the dyadic form

$$X = \sum_{k=1}^{s} \sigma_k\, u_k v_k^\top, \qquad (2)$$

where $s$ is the rank of $X$, $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_s > 0$ are the (nonzero) singular values, and $u_k \in \mathbb{R}^m$, $v_k \in \mathbb{R}^n$, $k = 1, \ldots, s$, are the left and right singular vectors of $X$, respectively, which form orthonormal bases for the column space and the row space of $X$. The well-known Eckart-Young-Mirsky theorem states that the truncation of (2) at the $q$th term (with $q \le s$), namely $X_q = \sum_{k=1}^{q} \sigma_k u_k v_k^\top$, provides the best rank-$q$ approximation of $X$ in both the Frobenius and the spectral norm. The $q$ pairs of vectors $(u_k, v_k)$, $k = 1, \ldots, q$, constitute the first $q$ so-called principal components of $X$. The relative approximation error in the Frobenius norm, for $q < s$, is given by

$$e_q^2 = \frac{\|X - X_q\|_F^2}{\|X\|_F^2} = \frac{\sum_{k=q+1}^{s} \sigma_k^2}{\sigma_{\mathrm{tot}}^2}, \qquad (3)$$

where $\sigma_{\mathrm{tot}}^2 \doteq \|X\|_F^2 = \sum_{k=1}^{s} \sigma_k^2$ is the so-called total variance of the data matrix. A small value of $e_q$ means that the variability of the data matrix is well explained by its first $q$ principal components, which capture the most relevant directions along which the data vary. Equivalently, along the residual directions $(u_k, v_k)$, $k = q+1, \ldots, s$, there is little data variability, which is attributed to noise, so these directions can be discarded. This standard PCA approach achieves effective dimensionality reduction whenever $e_q$ is small for values of $q$ reasonably smaller than the rank $s$. However, one problem with standard PCA is that the principal directions, say the first pair $(u_1, v_1)$, are typically dense (as opposed to sparse) vectors. If we were to approximate the vote matrix by its first principal component only, we would have

$$X \simeq X_1 = \sigma_1\, u_1 v_1^\top,$$

where the entries of $u_1 \in \mathbb{R}^m$ represent the senators’ influence coefficients on the votes, and the entries of $v_1 \in \mathbb{R}^n$ represent the bills’ influence on the vote outcomes. A sparse $u_1$ would highlight which senators are the leading actors in influencing the votes, and a sparse $v_1$ which bills are the most relevant for capturing the main voting trends. To pursue both dimensionality reduction and interpretability, we thus employed a sparse version of PCA, namely the Sparse PCA (SPCA) method discussed in [20]. Computationally, given the centered and standardized matrix $X$ and letting $\Sigma = X^\top X$, SPCA aims at determining a vector $v_1 \in \mathbb{R}^n$ that maximizes $v_1^\top \Sigma v_1$, subject to the constraints $\|v_1\|_2 = 1$ and that the cardinality (i.e., the number of nonzero elements) of $v_1$ is no larger than a given positive integer $c \in \mathbb{N}$, that is,

$$\max_{v_1 \in \mathbb{R}^n} \; v_1^\top \Sigma v_1 \quad \text{s.t.} \quad \|v_1\|_2 = 1, \ \|v_1\|_0 \le c. \qquad (4)$$

Due to the cardinality constraint, problem (4) is NP-hard in the strong sense [21]. For this reason, several approaches have been proposed in the literature to solve (4) approximately, including a regression framework [20], a semidefinite programming relaxation [22], and inverse and truncated power methods [23, 24]. The latter approach was used in this paper to perform SPCA on the standardized vote matrix. Note that, once the vector $v_1$ has been computed, the SPCA algorithm can be applied again to a suitably “deflated” version of the original matrix, $\Sigma_1 = \Sigma - (v_1^\top \Sigma v_1)\, v_1 v_1^\top$, so as to extract the second principal component $v_2$, and so on [22, 25, 26]. By stacking the vectors $v_1, \ldots, v_q$ into the matrix $V_q = [v_1 \ \cdots \ v_q] \in \mathbb{R}^{n\times q}$, the vote matrix is projected onto the $q$-dimensional space $\mathbb{R}^q$, yielding the projected data matrix $P = X V_q \in \mathbb{R}^{m\times q}$. Each element $p_{ij}$ of $P$ is a linear combination of all the votes of the $i$th senator, with coefficients given by the $j$th principal direction $v_j$. Since $v_j$ is $c$-sparse, at most $c$ “most relevant” bills actually contribute to this linear combination.
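A compact sketch of the sparse projection step follows, assuming a basic truncated power iteration (after each multiplication by $\Sigma$, keep the $c$ largest-magnitude entries and renormalize) combined with the deflation written above. The initialization, stopping rule, iteration cap and diagonal shift are our assumptions, and the code is not the implementation of [23, 24]; all function names are hypothetical.

```python
import numpy as np

def truncate(x, c):
    """Keep the c largest-magnitude entries of x, zero the rest, and renormalize to unit length."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-c:]
    out[keep] = x[keep]
    return out / np.linalg.norm(out)

def truncated_power(Sigma, c, iters=500, tol=1e-10):
    """Approximately solve max v' Sigma v s.t. ||v||_2 = 1, ||v||_0 <= c, as in (4)."""
    evals, evecs = np.linalg.eigh(Sigma)
    # Shift by tau * I so the iterated matrix is PSD; for unit v this adds the constant
    # tau to v' Sigma v, so the maximizer of (4) does not change.
    A = Sigma + max(0.0, -evals[0]) * np.eye(Sigma.shape[0])
    v = truncate(evecs[:, -1], c)      # start from the truncated leading eigenvector
    for _ in range(iters):
        v_new = truncate(A @ v, c)
        if np.linalg.norm(v_new - v) < tol:
            return v_new
        v = v_new
    return v

def sparse_pca(X, q, c):
    """Return c-sparse loadings V_q (n x q) and the projected data P = X V_q (m x q)."""
    Sigma = X.T @ X
    V = np.zeros((X.shape[1], q))
    for k in range(q):
        v = truncated_power(Sigma, c)
        V[:, k] = v
        # Deflation: Sigma_k = Sigma_{k-1} - (v' Sigma_{k-1} v) v v'
        Sigma = Sigma - (v @ Sigma @ v) * np.outer(v, v)
    return V, X @ V
```

With $c = n$ the truncation keeps every entry and the routine reduces to the ordinary power method, so the first loading coincides (up to sign) with the dense principal direction $v_1$. The shift is there because deflating with a sparse vector can leave an indefinite matrix.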

Gaussian mixture model

We use a Gaussian Mixture Model (GMM) to represent the projected data $P$. Such a generative model assumes that there are $r$ classes (each class, in our context, corresponding to a political group), and that the projected vote vector $p$, conditional on the class being $y$, is a Gaussian random vector with mean $\mu_y$ and covariance matrix $\Sigma_y$, $y = 1, \ldots, r$, where $r$ denotes the number of classes. In our analysis we considered $r = 12$ classes (political groups). Letting $\alpha_y$ denote the latent class probabilities, $\alpha_y \ge 0$, $\sum_{y=1}^{r} \alpha_y = 1$, the probability density $f(p)$ of the projected vote vector $p$ is a Gaussian mixture

$$f(p) = \sum_{y=1}^{r} \alpha_y\, \mathcal{N}(p \mid \mu_y, \Sigma_y), \qquad (5)$$

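where $\mathcal{N}(p \mid \mu_y, \Sigma_y)$ is the Gaussian density on $\mathbb{R}^q$

$$\mathcal{N}(p \mid \mu_y, \Sigma_y) = \frac{\exp\!\left(-\tfrac{1}{2}(p - \mu_y)^\top \Sigma_y^{-1} (p - \mu_y)\right)}{\sqrt{(2\pi)^q \det(\Sigma_y)}}.$$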

For each class, the mean and the covariance matrix are estimated by using the maximum likelihood principle [27]. In particular, defining the sets $G_y = \{i \in \{1, \ldots, m\} : \omega(i) = y\}$, $y = 1, \ldots, r$, we obtain

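$$\alpha_y = \frac{|G_y|}{m}, \quad \mu_y = \frac{1}{|G_y|}\sum_{i \in G_y} p^{(i)}, \quad \Sigma_y = \frac{1}{|G_y| - 1}\sum_{i \in G_y} \left(p^{(i)} - \mu_y\right)\left(p^{(i)} - \mu_y\right)^\top,$$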

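where $|G_y|$ denotes the cardinality of the $y$th class. Strictly speaking, $\alpha_y$ and $\mu_y$ are the maximum likelihood estimates, whereas the factor $1/(|G_y| - 1)$ makes $\Sigma_y$ the unbiased sample covariance; the maximum likelihood covariance would use $1/|G_y|$.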
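Because the group labels $\omega(i)$ are known, the mixture parameters follow directly from per-group statistics, with no EM iterations. A minimal NumPy sketch is shown below, assuming 0-based group labels and at least two senators per group (both assumptions, as is the function name). Note that a covariance estimated from $|G_y| \le q$ points is singular, so small groups may need care in practice.

```python
import numpy as np

def fit_class_gaussians(P, labels, r):
    """Priors alpha_y, means mu_y and covariances Sigma_y of each group from P (m x q).

    labels[i] is the group index of senator i, assumed here to be 0-based (0, ..., r-1).
    """
    m, q = P.shape
    alphas, mus, covs = np.zeros(r), np.zeros((r, q)), np.zeros((r, q, q))
    for y in range(r):
        Py = P[labels == y]                           # rows p^(i) with omega(i) = y
        if len(Py) < 2:
            raise ValueError(f"group {y} needs at least two members to estimate a covariance")
        alphas[y] = len(Py) / m                       # |G_y| / m
        mus[y] = Py.mean(axis=0)                      # sample mean of the group
        covs[y] = np.cov(Py, rowvar=False, ddof=1)    # 1 / (|G_y| - 1) normalization, as in the text
    return alphas, mus, covs

# Toy data: two well-separated groups of three senators in q = 2 dimensions.
P = np.array([[0., 0.], [2., 0.], [0., 2.], [10., 10.], [12., 10.], [10., 12.]])
labels = np.array([0, 0, 0, 1, 1, 1])
alphas, mus, covs = fit_class_gaussians(P, labels, r=2)
```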

Extraction of the DNA from data

Once the parameters of the Gaussian mixture model (5) have been estimated as described above, we consider the latent class variable $z_y$, such that $z_y = 1$ if the datum $p$ comes from class $y$ and $z_y = 0$ otherwise, and define the conditional probability of the latent variable given the observed datum $p$ (the projected votes vector). Using Bayes’ rule, we have

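$$\pi_y \doteq \mathbb{P}(z_y = 1 \mid p) = \frac{\mathbb{P}(z_y = 1)\, f(p \mid z_y = 1)}{f(p)} = \frac{\alpha_y\, \mathcal{N}(p \mid \mu_y, \Sigma_y)}{f(p)} = \frac{\alpha_y\, \mathcal{N}(p \mid \mu_y, \Sigma_y)}{\sum_{\nu=1}^{r} \alpha_\nu\, \mathcal{N}(p \mid \mu_\nu, \Sigma_\nu)}.$$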

We view $\alpha_y$ as the prior probability of $z_y = 1$, and $\pi_y$ as the corresponding posterior probability once $p$ has been observed. Also, $\pi_y$ can be interpreted as the influence of group $y$ in explaining the observation $p$. The Political DNA of each senator $i = 1, \ldots, m$ is then defined as the vector

$$\pi^{(i)} = \begin{bmatrix} \pi_1^{(i)} & \cdots & \pi_r^{(i)} \end{bmatrix}^\top,$$

where

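$$\pi_y^{(i)} = \frac{\alpha_y\, \mathcal{N}(p^{(i)} \mid \mu_y, \Sigma_y)}{\sum_{\nu=1}^{r} \alpha_\nu\, \mathcal{N}(p^{(i)} \mid \mu_\nu, \Sigma_\nu)}, \quad y = 1, \ldots, r,$$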

containing the influences from all the r classes, upon evidence of the ith senator’s projected votes vector p(i).
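The posterior computation can be sketched as below. The sketch works in log space and normalizes with the log-sum-exp trick so that very small Gaussian densities do not underflow to zero; this is a numerical choice of ours, not something stated in the text. The normalizing constant uses the dimension $q$ of the projected vectors, and the function names are hypothetical.

```python
import numpy as np

def log_gaussian(P, mu, cov):
    """Row-wise log N(p | mu, cov) for each row p of P (m x q)."""
    q = P.shape[1]
    diff = P - mu
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)   # (p - mu)' cov^{-1} (p - mu)
    return -0.5 * (maha + logdet + q * np.log(2 * np.pi))

def political_dna(P, alphas, mus, covs):
    """Posterior class probabilities pi_y^(i) (m x r); each row sums to one."""
    log_w = np.column_stack([np.log(a) + log_gaussian(P, mu, S)
                             for a, mu, S in zip(alphas, mus, covs)])
    log_w -= log_w.max(axis=1, keepdims=True)   # log-sum-exp stabilization
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)

# Toy example in q = 1: two equally likely groups centered at 0 and 4.
alphas = np.array([0.5, 0.5])
mus = np.array([[0.0], [4.0]])
covs = np.array([[[1.0]], [[1.0]]])
P = np.array([[0.0], [2.0], [4.0]])
dna = political_dna(P, alphas, mus, covs)
```

A senator projected exactly halfway between two equally likely, equally spread groups (the middle row) receives a DNA of one half for each.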

Visualization of political data

In this section, we introduce three tools that we employed for producing easily interpretable visualizations of the political influence data obtained via the proposed extraction technique.

Political affinity map

The Political Affinity Map is a two-dimensional representation of the Political DNA of each senator. More precisely, we draw a regular polygon whose vertices $\{\gamma_y\}_{y=1,\ldots,r}$ represent the groups. The $i$th senator is then represented as a polygon with vertices $\{\beta_y^{(i)}\}_{y=1,\ldots,r}$, where $\beta_y^{(i)} = \pi_y^{(i)} \gamma_y$. The resulting plot is a spider chart (also known as a radar plot or Kiviat diagram) in which the length of each “spike” is proportional to the influence of the $y$th group on the $i$th senator, and adjacent “spikes” are connected by a segment. Depending on the ordering of the political groups, this type of plot allows one to uncover the shift of each senator towards political parties other than the nominal one. For instance, a large polygon area indicates that the ideology of the senator is influenced by several political parties, whereas a single spike of unit length in the direction of his/her nominal affiliation indicates that his/her ideology is consistent with that of the party. Further, the Political Affinity Map allows one to identify senators with similar ideologies, political clusters, and outliers; see, e.g., [28].
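The geometry of the map can be sketched as follows. Placing the group vertices on the unit circle, starting at the top and proceeding counterclockwise, is an arbitrary assumption, and the shoelace-area helper is a hypothetical scalar summary of the “polygon area” mentioned above, not a quantity defined in the text.

```python
import numpy as np

def group_vertices(r, start_angle=np.pi / 2):
    """Vertices gamma_y of a regular r-gon on the unit circle (orientation is arbitrary)."""
    theta = start_angle + 2 * np.pi * np.arange(r) / r
    return np.column_stack([np.cos(theta), np.sin(theta)])

def affinity_polygon(pi, gammas):
    """Vertices beta_y = pi_y * gamma_y of one senator's spider-chart polygon."""
    return np.asarray(pi)[:, None] * gammas

def polygon_area(vertices):
    """Shoelace formula; vertices must be listed in angular order."""
    x, y = vertices[:, 0], vertices[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
```

A senator whose DNA is concentrated on a single group yields a degenerate polygon of zero area, while a DNA spread uniformly over the 12 groups yields a small regular 12-gon of radius 1/12.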

Fig 3 depicts the Political Affinity Maps of the leaders of political groups in the Italian Senate, obtained using the proposed extraction technique, with q = 2 and c = 20.

Segmentation plot

The segmentation plot represents the Political DNA of each senator as a segmented bar. In particular, the Political DNA of the ith senator is represented as portions of the bar of different colors (each color corresponds to a political group) of length proportional to πy(i). Fig 4 depicts the Segmentation Plots of the parliamentary leaders of each group obtained using the proposed DNA extraction technique with q = 2 and c = 20.
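Drawing a segmentation plot only requires, for each senator, the start and the length of each colored segment. A hypothetical helper computing them from an $m \times r$ DNA matrix is sketched below; it assumes each row of the matrix sums to one, as the Political DNA does.

```python
import numpy as np

def segments(dna):
    """For an m x r DNA matrix, return (left, width) arrays for stacked horizontal bars."""
    width = np.asarray(dna, dtype=float)
    # Segment y of senator i starts where segments 1, ..., y-1 end.
    left = np.cumsum(width, axis=1) - width
    return left, width
```

The resulting arrays can be passed, for example, to Matplotlib’s `Axes.barh` through its `width` and `left` arguments, with one color per group.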

Fig 4. Segmentation plots of the parliamentary leaders of each group obtained using the proposed DNA extraction technique with q = 2 and c = 20.


It is worth noting that the Political Affinity Map and the Segmentation Plot are just two different graphical representations of the Political DNA, and they convey different information about the ideology of senators. For instance, the Political Affinity Map is useful for identifying clusters and detecting outliers, whereas the Segmentation Plot is useful for uncovering the reciprocal influence among different parties and for identifying the leading groups. Therefore, the combined use of these two representations gives a complete picture of the relationships among different parties.

Reciprocal influence map

The Political DNA can also be used to identify the mutual influence among parties. Indeed, the average influence of the $y$th group on the $\upsilon$th group is given by

$$\Pi_{y\upsilon} = \frac{1}{|G_\upsilon|}\sum_{i \in G_\upsilon} \pi_y^{(i)}.$$

In particular, the matrix $\Pi$, whose $(y, \upsilon)$th entry is $\Pi_{y\upsilon}$, represents the average probability that a senator belonging to the $\upsilon$th group voted in a manner similar to a senator belonging to the $y$th group. We call the graphical representation of this matrix the Reciprocal Influence Map.
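The averaging that defines $\Pi$ takes only a few lines. The sketch below assumes 0-based group labels and that every group has at least one member (both assumptions, as is the function name). Since each row of the DNA matrix sums to one, every column of $\Pi$ does too.

```python
import numpy as np

def reciprocal_influence(dna, labels, r):
    """Pi[y, v] = average of pi_y^(i) over the senators i belonging to group v."""
    dna = np.asarray(dna, dtype=float)
    Pi = np.zeros((r, r))
    for v in range(r):
        Pi[:, v] = dna[labels == v].mean(axis=0)   # column v: mean DNA of group v
    return Pi

# Toy example: two senators in group 0, one in group 1.
dna = np.array([[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]])
labels = np.array([0, 0, 1])
Pi = reciprocal_influence(dna, labels, r=2)
```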

Results and discussion

Case study: Votes in the Italian senate during the XVII legislature

In this section, we present the results obtained by using the proposed technique on the vote data of the Italian Senate during the XVII Legislature. Selected Political Affinity Maps and Reciprocal Influence Maps obtained using the proposed DNA extraction technique with different values of the parameters q and c are shown in Figs 5 and 6. The nominal membership of each senator has been represented using the color code in Fig 1.

Fig 5. Political affinity maps of all the senators for different values of q and c (c = 160 corresponds to full, non-sparse PCA).


The nominal membership of each senator is represented via the color code given in Fig 1.

Fig 6. Reciprocal influence maps for different values of q and c.


As shown by such plots, large values of q and c hide the reciprocal influence of each group on the others. Note that c = 160 corresponds to non-sparse PCA.

Table 1. E-Var (%) for different values of q and c.

Rows: cardinality c of each sparse principal component; columns: number q of principal components.

 c \ q      2        3        4        5        6        7        8        9
   20    14.247   18.889   20.485   23.433   26.663   27.735   28.57    29.315
   40    29.501   33.75    37.801   39.896   41.83    42.986   43.856   45.025
   60    40.111   44.78    47.963   50.166   52.214   53.461   54.716   55.628
   80    47.429   52.197   55.401   57.683   59.73    61.044   62.323   63.291
  100    52.405   56.945   60.238   62.534   64.61    65.967   67.265   68.287
  120    56.81    61.18    64.515   66.818   68.92    70.306   71.604   72.777
  140    58.999   63.367   66.72    69.024   71.137   72.532   73.842   75.010
  160    59.378   63.757   67.115   69.419   71.533   72.93    74.247   75.413

Although there is no obvious ground truth to compare against, some considerations are in order. For each experiment, we computed the expressed variance (E-Var), defined as the ratio between the variance of the projected data and that of the original data, reported as a percentage in Table 1.
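The text does not state exactly how the variance of the projected data is measured. One plausible reading, assumed in the sketch below, is $\text{E-Var} = 100\,\|X V_q\|_F^2 / \|X\|_F^2$, which is meaningful because $X$ is column-centered. For dense PCA loadings ($c = n$) this reduces to $100\,(1 - e_q^2)$ with $e_q$ as in (3), which the example checks on synthetic data; the function name is hypothetical.

```python
import numpy as np

def expressed_variance(X, V):
    """E-Var in percent, read here as ||X V||_F^2 / ||X||_F^2 (an assumption).

    X must be column-centered, so that P = X V is centered too. With non-orthogonal
    sparse loadings this simple ratio may overstate the variance actually explained.
    """
    P = X @ V
    return 100.0 * np.linalg.norm(P, "fro") ** 2 / np.linalg.norm(X, "fro") ** 2

# Check on synthetic data: for dense PCA loadings, E-Var = 100 * (1 - e_q^2), Eq (3).
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 15))
X -= X.mean(axis=0)
_, s, Vt = np.linalg.svd(X, full_matrices=False)
q = 3
evar_dense = expressed_variance(X, Vt[:q].T)        # first q right singular vectors
e_q_squared = np.sum(s[q:] ** 2) / np.sum(s ** 2)
```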

By looking at these numerical results, we may formulate the following observations:

  • as expected, the E-Var of the data increases as a function of the number q of principal components considered and of their cardinality level c;

  • large values of q and c shift the senators’ political positions towards their nominal affiliation, which appears in the Political Affinity Map as a single “spike” of unit length in the direction of that affiliation. Therefore, filtering noise by decreasing q and c reveals subtle structures in the political affinities among voters. It is worth noticing that this filtering property results from the use of SPCA for dimensionality reduction. In fact, subtle structures are usually confined to minor components [29–31], and filtering them out usually retains only the global trends. SPCA, on the other hand, allows one to focus on the most relevant votes alone, thereby uncovering hidden influences among individuals with different ideologies;

  • similarly, large values of q and c tend to make the reciprocal influence of each group on the others uniform;

  • SPCA is also useful for identifying the most significant bills influencing the projected vote vectors in the q-dimensional space. Table 2 reports a short description of the 10 most important bills identified by SPCA, performed with q = 10 and c = 1.

Table 2. Most significant bills identified by SPCA.

Description
European delegation 2013 - DDL n. 587 - Final vote
Resignation of senator Mangili
Jurisdiction on ethical issues - DDL n. 1429
Liability of magistrates - DDL n. 1070 - Final vote
Rosatellum bis - DDL n. 2941 - Final vote
Daily allowance for lifetime senators
Azzollini house arrest
Anti femicide - DDL n. 1079 - Final vote
Health decree - DDL n. 298
Public Debits - DDL n. 662 - Final vote

Regarding the political structure of the Italian Senate, Fig 7 depicts the Segmentation Plots for selected values of the parameters q and c, using the color coding given in Fig 1. By inspecting Fig 7 we observe the following facts:

Fig 7. Segmentation plots for different values of q and c.


  • the groups “Misto,” “GAL” and “AUT” are the first to spread out when projecting the data into a lower-dimensional space. This is to be expected, since these groups collect senators of very different ideologies, and many senators “migrated” to these groups after leaving their original groups. The Political DNA allows recovering their original membership or political orientation; see Fig 8.

  • the group “M5S” is one of the most compact groups in the Senate, with senators remaining strongly cohesive even for low values of q and high sparsity levels. This can be explained by considering that a code of conduct (available at Codice Etico Movimento 5 Stelle) binds the elected candidates of this party to pay a fine if the official party voting guidelines are not followed;

  • for small values of q the senators belonging to “LeU” start to shift towards the group “PD”: this is coherent since both groups share a common political orientation (they are both left parties) and the “LeU” group foundation is linked to an internal split of the “PD” group; see Fig 9.

Fig 8. Segmentation plot of the mean DNA of senators belonging to the groups “Misto”, “GAL” and “AUT” for different values of q and c = 160 (corresponding to non-sparse PCA).


Fig 9. Political affinity maps of the mean DNA of senators belonging to the “LeU” and “PD” groups for different values of q and c.


Concluding remarks

In this paper, we presented a numerical technique that, based on publicly available voting data, generates explanatory maps of hidden interconnections among voters nominally belonging to a given number of political or ideological groups. The proposed method is based on a Gaussian mixture generative model that we use as a prior to compute a voter’s posterior influences (the Political DNA), given evidence of his/her votes; see Fig 2. We applied this approach to a dataset of the votes of 334 members of the Italian Senate on 160 bills during the XVII Legislature. Although the analysis has been carried out by clustering senators according to their nominal membership, other approaches are possible. For instance, Fig 10 depicts the DNA obtained by clustering senators either by their general political orientation (Center-Left-Right) or by their role in the parliament (Government-Opposition), highlighting the flexibility of the proposed numerical technique. Further, while the DNA approach is presented here in a political context, we believe that the kind of interpretability it offers makes it suitable for broader applications, such as the analysis of behaviors, influence, and preferences in markets, advertising, or other social interaction contexts based on votes, preferences, purchases, etc.

Fig 10. Segmentation plot obtained by clustering senators on the basis of their orientation and their role (q = 6, c = 30).


Acknowledgments

The authors are indebted to Vincenzo Smaldore and to OpenPolis Foundation for providing data and insights on the interpretation of the results. The authors would also like to thank Antonio Longo for a first analysis of the data, and Antonio Santangelo, Francesco Ruggiero and the Staff of Nexa Center for interesting conversations on topics related to this paper.

Data Availability

All the data used in this paper are available from the Kaggle database at the following link https://www.kaggle.com/cpossieri/italian-senate-xvii-legislature.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. Friedkin NE. The Problem of Social Control and Coordination of Complex Systems in Sociology: A Look at the Community Cleavage Problem. IEEE Control Systems. 2015;35:40–51. doi: 10.1109/MCS.2015.2406655
  • 2. Moussaïd M, Kämmer JE, Analytis PP, Neth H. Social Influence and the Collective Dynamics of Opinion Formation. PLOS ONE. 2013;8(11):1–8.
  • 3. Friedkin NE, Johnsen EC. Social influence networks and opinion change. Advances in Group Processes. 1999;16:1–29.
  • 4. Friedkin NE, Johnsen EC. Social Influence Network Theory: A Sociological Examination of Small Group Dynamics. Cambridge, U.K.: Cambridge Univ. Press; 2011.
  • 5. Frasca P, Ravazzi C, Tempo R, Ishii H. Gossips and prejudices: ergodic randomized dynamics in social networks. In: Estimation and Control of Networked Systems—Proceedings of the 4th IFAC Workshop on Distributed Estimation and Control in Networked Systems. IFAC; 2013. p. 212–219.
  • 6. Xie J, Emenheiser J, Kirby M, Sreenivasan S, Szymanski BK, Korniss G. Evolution of Opinions on Social Networks in the Presence of Competing Committed Groups. PLOS ONE. 2012;7(3):1–9.
  • 7. Wu S, Wai H, Scaglione A. Data mining the underlying trust in the US Congress. In: 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016—Proceedings. United States: IEEE; 2017. p. 1202–1206.
  • 8. Wai HT, Scaglione A, Leshem A. Active Sensing of Social Networks. IEEE Transactions on Signal and Information Processing over Networks. 2016;2:406–419. doi: 10.1109/TSIPN.2016.2555785
  • 9. Ravazzi C, Tempo R, Dabbene F. Learning influence structure in sparse social networks. IEEE Transactions on Control of Network Systems. 2018. doi: 10.1109/TCNS.2017.2781367
  • 10. Burghardt K, Rand W, Girvan M. Inferring models of opinion dynamics from aggregated jury data. PLOS ONE. 2019;14(7):1–15.
  • 11. Braha D, de Aguiar MAM. Voting contagion: Modeling and analysis of a century of U.S. presidential elections. PLOS ONE. 2017;12(5):1–30.
  • 12. Sobkowicz P. Quantitative Agent Based Model of Opinion Dynamics: Polish Elections of 2015. PLOS ONE. 2016;11(5):1–32.
  • 13. Longo A, Ravazzi C, Dabbene F, Calafiore GC. Learning Political DNA in the Italian Senate. In: 18th European Control Conference; 2019. p. 3526–3531.
  • 14. Borg I, Groenen P. Modern Multidimensional Scaling: Theory and Applications. Journal of Educational Measurement. 2006;40:277–280.
  • 15. Poole KT, Rosenthal H. A spatial model for legislative roll call analysis. American Journal of Political Science. 1985. p. 357–384.
  • 16. Poole KT. Nonparametric unfolding of binary choice data. Political Analysis. 2000;8(3):211–237. doi: 10.1093/oxfordjournals.pan.a029814
  • 17. Jenkins S. The Impact of Party and Ideology on Roll-Call Voting in State Legislatures. Legislative Studies Quarterly. 2006;31(2):235–257. doi: 10.3162/036298006X201797
  • 18. Martirano D. Deputati e senatori “voltagabbana” “Cambio partito”: il record di 526 [Turncoat deputies and senators, “party switch”: the record of 526]. Corriere.it.
  • 19. Krzanowski W. Principles of Multivariate Analysis. OUP Oxford; 2000.
  • 20. Zou H, Hastie T, Tibshirani R. Sparse principal component analysis. Journal of Computational and Graphical Statistics. 2006;15(2):265–286. doi: 10.1198/106186006X113430
  • 21. Tillmann AM, Pfetsch ME. The Computational Complexity of the Restricted Isometry Property, the Nullspace Property, and Related Concepts in Compressed Sensing. IEEE Transactions on Information Theory. 2014;60(2):1248–1259. doi: 10.1109/TIT.2013.2290112
  • 22. d’Aspremont A, Ghaoui LE, Jordan MI, Lanckriet GR. A direct formulation for sparse PCA using semidefinite programming. In: Advances in Neural Information Processing Systems; 2005. p. 41–48.
  • 23. Hein M, Bühler T. An inverse power method for nonlinear eigenproblems with applications in 1-spectral clustering and sparse PCA. In: Advances in Neural Information Processing Systems; 2010. p. 847–855.
  • 24. Yuan XT, Zhang T. Truncated power method for sparse eigenvalue problems. Journal of Machine Learning Research. 2013;14(Apr):899–925.
  • 25. Mackey LW. Deflation methods for sparse PCA. In: Advances in Neural Information Processing Systems; 2009. p. 1017–1024.
  • 26. Shen H, Huang JZ. Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis. 2008;99(6):1015–1034. doi: 10.1016/j.jmva.2007.06.007
  • 27. Marin JM, Mengersen K, Robert CP. Bayesian modelling and inference on mixtures of distributions. In: Dey D, Rao C, editors. Handbook of Statistics. Elsevier; 2005. p. 459–507.
  • 28. Croarkin C, Tobias P, Filliben JJ, Hembree B, Guthrie W, et al. NIST/SEMATECH e-Handbook of Statistical Methods; 2006.
  • 29. Roden JC, King BW, Trout D, Mortazavi A, Wold BJ, Hart CE. Mining gene expression data by interpreting principal components. BMC Bioinformatics. 2006;7(1):194. doi: 10.1186/1471-2105-7-194
  • 30. Censi F, Calcagnini G, Bartolini P, Giuliani A. A systems biology strategy on differential gene expression data discloses some biological features of atrial fibrillation. PLoS One. 2010;5(10):e13668. doi: 10.1371/journal.pone.0013668
  • 31. Giuliani A, Colosimo A, Benigni R, Zbilut JP. On the constructive role of noise in spatial systems. Physics Letters A. 1998;247(1-2):47–52. doi: 10.1016/S0375-9601(98)00570-2

Decision Letter 0

Gabriele Oliva

27 Jul 2020

PONE-D-20-21006

A new metric for understanding hidden political influences from voting records

PLOS ONE

Dear Dr. Possieri,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Sep 10 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Gabriele Oliva, Ph.D

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Additional Editor Comments (if provided):

Two reviews were collected, both very positive, although both reviewers suggest minor improvements or request minor clarifications.

For this reason, I invite the authors to prepare a minor revision and to address such comments.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors must be complimented for a very interesting and novel application of a classical (but probably still the most straightforward and meaningful) technique such as PCA. The strategy adopted by the authors allows for a consistent analysis of both q and c space, giving a clear picture of both senator profiles and bill relevance, together with an appreciation of the 'internal homogeneity' of the different political parties.

I have only one (very minor) request for clarification that I think could be useful for readers to fully appreciate the proposed methodology. The authors affirm: ...Therefore, noise filtering through decreasing the values of q and c reveals subtle structures of the political affinities among voters...;

This statement depends strictly on the use of SPCA instead of 'plain' PCA, in which exactly the opposite happens ('subtle structures' are confined to minor components, and filtering them out only maintains the 'global trends'; see, for example: Roden, J. C., King, B. W., Trout, D., Mortazavi, A., Wold, B. J., & Hart, C. E. (2006). Mining gene expression data by interpreting principal components. BMC bioinformatics, 7(1), 194., Censi, F., Calcagnini, G., Bartolini, P., & Giuliani, A. (2010). A systems biology strategy on differential gene expression data discloses some biological features of atrial fibrillation. PLoS One, 5(10), e13668., Giuliani, A., Colosimo, A., Benigni, R., & Zbilut, J. P. (1998). On the constructive role of noise in spatial systems. Physics Letters A, 247(1-2), 47-52.).

In any case this is only a very minor remark and the manuscript can be even accepted with no modifications at all.

Reviewer #2: The manuscript is very interesting and sound, I have few minor comments for the authors.

Reading the introduction of the paper, I wasn't 100% sure that the bills used to conduct the analysis were those in which the vote is not secret. Even though it would be impossible to perform the analysis using secret votes, I think that the Introduction should clearly state that only "public" bills were used and that those bills are just a part of the total ensemble.

The explanation of the visualization proposed in Figure 3 (which, according to the authors, is part of the novelty of the paper) is very brief and somewhat unclear. I understand the coordinates of the party, and I also understand the presence of spikes proportional to π in the direction of the party. I don't understand why some shapes result from the Figure: are they the result of lines joining adjacent, non-zero β values? If this is the case, what is the interpretation of such shapes? Moreover, how is this plot different from a radar plot?

These drawbacks also affect Figure 5.

Finally, Figure 3 and Figure 4 are basically two visualizations of the same thing. Similarities, differences, and potential uses of both (separately and combined) could be discussed in more detail.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Alessandro Giuliani

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Sep 1;15(9):e0238481. doi: 10.1371/journal.pone.0238481.r002

Author response to Decision Letter 0


30 Jul 2020

Response to the Editor

We would like to thank the Editor for the constructive reviewing process that helped us improve the quality of the manuscript and for the supportive and motivating comments about our work. We have carefully considered all the comments from the Reviewers, and their suggestions have been incorporated into the revised version of the paper, as detailed below.

Response to Reviewer#1

We thank the Reviewer for his appreciation of our paper, for the motivating and supportive comments, and for the useful suggestions that helped us prepare this revised version of our paper.

The point raised by the Reviewer is very interesting, and therefore we added a detailed explanation following his suggestion. Namely, we added the suggested references and the following paragraph to the discussion on page 13 of the revised version of the manuscript:

“It is worth noting that this filtering property is the result of using SPCA for dimensionality reduction. In fact, subtle structures are usually confined to minor components [29]-[31], and filtering them out usually only maintains the global trends. On the other hand, SPCA allows one to focus just on the most relevant votes, hence uncovering hidden influences among individuals with different ideologies.”
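The filtering property described in this paragraph hinges on how sparse loadings are obtained. As a hedged sketch, the truncated power method of Yuan and Zhang [24] extracts one sparse leading component by alternating a power-iteration step with hard thresholding to the c largest-magnitude entries. Reading c as the cardinality (number of nonzero loadings, i.e., retained votes) of each component is our assumption, not a definition given in this section; the authors' SPCA solver may differ, and further components would need a deflation step such as those in [25], which is not shown. The function name is hypothetical.

```python
import math


def truncated_power_method(cov, c, iters=100, tol=1e-10):
    """Approximate leading sparse eigenvector of a symmetric PSD matrix.

    cov: n x n covariance matrix as a list of lists
    c:   number of nonzero loadings to keep (assumed cardinality parameter)
    """
    n = len(cov)
    x = [1.0 / math.sqrt(n)] * n  # uniform, unit-norm starting vector
    for _ in range(iters):
        # Power step: y = cov @ x
        y = [sum(cov[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Truncation step: keep only the c entries of largest magnitude
        keep = set(sorted(range(n), key=lambda i: abs(y[i]), reverse=True)[:c])
        y = [y[i] if i in keep else 0.0 for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            break  # degenerate case: nothing left to normalize
        new_x = [v / norm for v in y]
        converged = max(abs(a - b) for a, b in zip(new_x, x)) < tol
        x = new_x
        if converged:
            break
    return x


if __name__ == "__main__":
    # Toy covariance: variables 0 and 1 co-vary strongly; 2 and 3 are mostly noise.
    cov = [[2.0, 1.8, 0.1, 0.0],
           [1.8, 2.0, 0.0, 0.1],
           [0.1, 0.0, 1.0, 0.0],
           [0.0, 0.1, 0.0, 1.0]]
    print([round(v, 4) for v in truncated_power_method(cov, c=2)])  # [0.7071, 0.7071, 0.0, 0.0]
```

With plain PCA, every variable would receive a (possibly small) loading; forcing exactly c nonzeros discards the weakly informative variables outright, which is the mechanism the paragraph credits for exposing structure that plain PCA would leave in minor components.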

Response to Reviewer#2

We thank the Reviewer for his/her appreciation of the paper, for the supportive comment, and for the useful suggestions that helped us prepare this revised version.

We revised the introduction so as to clarify the class of votes that have been taken into account in the analysis of the Italian Senate during the XVII Legislature. Namely, we stressed that our focus is on voting data classified by OpenPolis as key votes, i.e., those votes that are publicly available (non-secret) and considered the most important, both for the relevance of the subject matter and for their political value. We also acquired the nominal membership of each senator in her/his political group, which is used as side information. We further stressed that these votes constitute just a portion of the total ensemble of votes cast by a senator, which also includes, e.g., those cast in secret ballots.

In the revised version of the paper, we better acknowledged that the Political Affinity Map is essentially a spider chart (also commonly referred to as a radar plot or Kiviat diagram) in which the length of each “spike” is proportional to the influence of the yth group on the ith senator, and adjacent “spikes” are connected by a segment. Depending on the ordering of the political groups, this type of plot allows one to uncover the shift of each senator toward political parties different from the nominal one. For instance, larger areas of the polytope indicate that the ideology of the senator is influenced by several political parties, whereas a unitary spike in the direction of his/her nominal affiliation indicates that his/her ideology is consistent with that of the party. Further, the Political Affinity Map allows one to identify senators with similar ideologies, political clusters, and the presence of outliers; see, e.g., [28].
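The spider-chart geometry described in this response can be made concrete with a short sketch. Assuming K groups placed at equally spaced angles 2πk/K and a spike of length β_k for the influence of group k (our notation, not necessarily the paper's), joining adjacent spikes yields a polygon whose area follows from the shoelace formula and reduces to ½ sin(2π/K) Σ_k β_k β_{k+1}. The function names are hypothetical.

```python
import math


def affinity_polygon(dna):
    """Spider-chart vertices: spike k points at angle 2*pi*k/K with length dna[k]."""
    K = len(dna)
    return [
        (r * math.cos(2.0 * math.pi * k / K), r * math.sin(2.0 * math.pi * k / K))
        for k, r in enumerate(dna)
    ]


def polygon_area(vertices):
    """Shoelace formula; adjacent spikes are joined, and the last closes back to the first."""
    n = len(vertices)
    s = 0.0
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


if __name__ == "__main__":
    print(polygon_area(affinity_polygon([1.0, 0.0, 0.0, 0.0])))  # 0.0 (single spike)
    print(round(polygon_area(affinity_polygon([0.25] * 4)), 6))  # 0.125 (evenly spread)
```

Because only adjacent spikes contribute to the area, the same DNA can produce different areas under different group orderings (for example, [0.5, 0.5, 0, 0] has positive area while [0.5, 0, 0.5, 0] has none), which is consistent with the remark that the reading of the plot depends on how the political groups are ordered.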

In order to better discuss the relation between Political Affinity Maps and Segmentation Plots, we added the following paragraph on page 13 of the revised version of our paper:

“It is worth noting that the Political Affinity Map and the Segmentation Plot are just two different graphical representations of the Political DNA. These two representations allow one to gather different information about the ideology of senators. For instance, as already pointed out above, the Political Affinity Map is very useful for identifying clusters and detecting outliers. On the other hand, the Segmentation Plot is very useful for uncovering the reciprocal influence among different parties and identifying the leading groups. Therefore, the combined use of these two representations allows one to obtain a complete picture of the relationships among different parties.”
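The construction of the Segmentation Plot is not detailed in this response. Purely as an illustrative assumption, the sketch below condenses per-senator DNA vectors into a group-level matrix whose row g, column h holds the mean influence of group h on the members of group g; off-diagonal entries then summarize reciprocal influence between parties. The "leading group" criterion used here (largest total influence on members of other groups) is likewise hypothetical, as are the function names.

```python
def group_influence_matrix(dna_rows, labels, groups):
    """Mean Political DNA of the members of each nominal group.

    dna_rows: one DNA vector per senator (entries ordered as in `groups`)
    labels:   nominal group of each senator
    Returns a K x K matrix: row g, column h = mean influence of group h on members of g.
    """
    index = {g: k for k, g in enumerate(groups)}
    K = len(groups)
    sums = [[0.0] * K for _ in range(K)]
    counts = [0] * K
    for dna, label in zip(dna_rows, labels):
        g = index[label]
        counts[g] += 1
        for h in range(K):
            sums[g][h] += dna[h]
    # Empty groups get an all-zero row instead of a division by zero
    return [[s / counts[g] if counts[g] else 0.0 for s in sums[g]] for g in range(K)]


def leading_group(matrix, groups):
    """Hypothetical criterion: group with the largest total influence on other groups' members."""
    K = len(groups)
    outward = [sum(matrix[g][h] for g in range(K) if g != h) for h in range(K)]
    return groups[max(range(K), key=lambda h: outward[h])]


if __name__ == "__main__":
    groups = ["A", "B", "C"]
    rows = [[0.9, 0.1, 0.0], [0.7, 0.3, 0.0], [0.4, 0.6, 0.0], [0.2, 0.0, 0.8]]
    labels = ["A", "A", "B", "C"]
    M = group_influence_matrix(rows, labels, groups)
    print(leading_group(M, groups))  # A
```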

Attachment

Submitted filename: reply_Rev1.pdf

Decision Letter 1

Gabriele Oliva

18 Aug 2020

A new metric for understanding hidden political influences from voting records

PONE-D-20-21006R1

Dear Dr. Possieri,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Gabriele Oliva, Ph.D

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Both reviewers are satisfied with the revision, and I concur with their evaluation.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: (No Response)

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: (No Response)

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: (No Response)

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors commented on the issue I raised. In any case, this is an excellent paper, and the authors must be complimented for their sensible use of data analysis tools.

Reviewer #2: The authors revised the manuscript, fixing all the minor issues. I recommend the paper for publication in PLOS ONE.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Alessandro Giuliani, Istituto Superiore di Sanità, Roma, Italy

Reviewer #2: No

Acceptance letter

Gabriele Oliva

20 Aug 2020

PONE-D-20-21006R1

A new metric for understanding hidden political influences from voting records

Dear Dr. Possieri:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Gabriele Oliva

Academic Editor

PLOS ONE


Articles from PLoS ONE are provided here courtesy of PLOS
