Proceedings of the Royal Society B: Biological Sciences. 2013 May 22;280(1759):20130247. doi: 10.1098/rspb.2013.0247

Individual heterogeneity and costly punishment: a volunteer's dilemma

Wojtek Przepiorka 1,2, Andreas Diekmann 3
PMCID: PMC3619514  PMID: 23536599

Abstract

Social control and the enforcement of social norms glue a society together. It has been shown theoretically and empirically that informal punishment of wrongdoers fosters cooperation in human groups. Most of this research has focused on voluntary and uncoordinated punishment carried out by individual group members. However, as punishment is costly, it is an open question as to why humans engage in the punishment of wrongdoers even in one-time-only encounters. While evolved punitive preferences have been advocated as proximate explanations for such behaviour, the strategic nature of the punishment situation has remained underexplored. It has been suggested to conceive of the punishment situation as a volunteer's dilemma (VOD), where only one individual's action is necessary and sufficient to punish the wrongdoer. Here, we show experimentally that implementing the punishment situation as a VOD sustains cooperation in an environment where punishers and non-punishers coexist. Moreover, we show that punishment-cost heterogeneity allows individuals to tacitly agree on only the strongest group member carrying out the punishment, thereby increasing the effectiveness and efficiency of social norm enforcement. Our results corroborate that costly peer punishment can be explained without assuming punitive preferences and show that centralized sanctioning institutions can emerge from arbitrary individual differences.

Keywords: evolution of cooperation, peer punishment, second-order free-rider problem, volunteer's dilemma, coordination, social norms

1. Introduction

Actions negatively affecting others can give rise to social norms that proscribe these actions and entitle the affected parties to enforce compliance by punishing wrongdoers [1]. Someone playing annoying music in the courtyard late at night, smoking in a waiting room at the station or free riding on others' contributions in a group project are scenarios where the conflict between individual self-interest and collective wellbeing, a so-called social dilemma, creates the conditions for the emergence of proscriptive social norms and the punishment of defectors.

In laboratory experiments with social dilemma games, where subjects gain from mutual cooperation but gain even more from one-sided defection, defection prevails to the detriment of all. However, if subjects are given the opportunity to punish defectors by decreasing their earnings at a cost to themselves, punishment is carried out and cooperation increases to significantly higher levels [2–5]. It has been pointed out that punishment of defectors in itself is a form of cooperation that is subject to a (second-order) free-rider problem [6–8]. Because punishing defectors is costly, everyone prefers everyone else to do it while benefiting from the cooperative environment punishment threats create. However, if everybody thinks alike, there will be no punishment and (first-order) defection will prevail.

The fact that in laboratory experiments subjects do punish defectors has been attributed to an evolved preference for punishment [9,10]. Whereas theoretical studies have made the evolution and stability of punitive preferences in humans plausible under certain conditions [11–17], empirical findings from laboratory experiments have cast doubt on these ultimate explanations for peer punishment. Only under relatively favourable conditions does peer punishment increase cooperation and group benefits [3,18–23]. Moreover, punishment can trigger retaliation [24–26] or be used to maintain a cooperative environment that can be exploited [27,28]. Historical [29–32] as well as experimental evidence [5,33–35] suggests that sanctioning institutions have evolved to overcome the disadvantages and detrimental effects of peer punishment [36,37].

Despite these recent advances, the answer to why humans voluntarily punish wrongdoers—especially in one-time-only encounters—remains inconclusive. It has been suggested, though, to conceive of the second-order free-rider problem as a volunteer's dilemma (VOD) [38] to better capture the strategic nature of the punishment situation [39,40]. The VOD (figure 1) represents a step-level public good game, where the public good is a nonlinear function of the number of cooperators. It can be shown theoretically that conceptualizing the free-rider problem as a public good that is a nonlinear function of the number of cooperators allows for the coexistence of cooperators and non-cooperators in large groups of unrelated individuals [41]. Based on this insight, a recent analysis shows that abandoning the key assumption of a linear production function for the second-order public good may have profound implications for the evolution of punishment and first-order cooperation [40].

Figure 1.

The volunteer's dilemma. In a VOD, a public good of value ∑Ui for a group of size n ≥ 2 is produced by a single person i choosing C at a cost Ki, where Ui > Ki > 0 ∀ i. The public good is not produced if all persons choose D and there is a loss in efficiency if more than one person chooses C. Here, C denotes punishment (i.e. second-order cooperation), D stands for second-order defection and the public good is the prevented negative effect of first-order defection. We distinguish between a symmetric VOD, where Ui = Uj and Ki = Kj ∀ i ≠ j, and an asymmetric VOD, where Ui ≠ Uj and/or Ki ≠ Kj.
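The payoff structure described in the caption can be written as a small function. The following is a minimal sketch; the function name `vod_payoffs` and the "C"/"D" string encoding are illustrative choices, not notation taken from the paper.

```python
from typing import List, Sequence


def vod_payoffs(
    choices: Sequence[str], U: Sequence[float], K: Sequence[float]
) -> List[float]:
    """Payoffs in a volunteer's dilemma.

    choices[i] is "C" (volunteer, i.e. punish) or "D" (do not volunteer).
    U[i] is person i's benefit from the public good, K[i] the cost of choosing C.
    """
    if not (len(choices) == len(U) == len(K)):
        raise ValueError("choices, U and K must have the same length")
    if any(c not in ("C", "D") for c in choices):
        raise ValueError('each choice must be "C" or "D"')

    # The public good is produced as soon as at least one person chooses C.
    produced = "C" in choices

    payoffs = []
    for choice, benefit, cost in zip(choices, U, K):
        gained = benefit if produced else 0.0
        paid = cost if choice == "C" else 0.0  # every volunteer pays, even redundant ones
        payoffs.append(gained - paid)
    return payoffs
```

With Ui = 70 and Ki = 30 for three persons, a single volunteer earns 40 while the two others earn 70 each. A second volunteer adds a cost without adding any benefit, which is the efficiency loss mentioned in the caption.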

We investigate the effectiveness and efficiency of peer punishment implemented as a VOD, where only one individual's action is necessary and sufficient to punish the wrongdoer; we advance the argument one step further by also investigating how the interplay of punishment-cost heterogeneity and penalty size affects first- and second-order cooperation. For the latter, we distinguish between a symmetric VOD, where all individuals have the same punishment costs, and an asymmetric VOD, where one group member can punish defectors at a slightly lower cost. In our laboratory experiment, we reduce the first-order cooperation problem to its essential properties by letting subjects decide whether to cooperate, by complying with the status quo of equal benefits for all, or to defect, by making a monetary gain while imposing costs on others. This allows us to focus on how the properties of the second-order problem affect cooperation on both levels.

2. Material and methods

(a). Experimental procedure

A total of 120 subjects participated in our computerized laboratory experiment comprising four sessions. Subjects were students from the University of Zurich and ETH Zurich; 52.5 per cent were female, and their mean age was 23.1 years (s.d. = 3.13). Upon arrival in the laboratory, subjects were randomly assigned to two of the three experimental conditions and received condition-specific instructions on paper. The instructions that were given to subjects in one of the experimental conditions are reproduced in figures S1 through S4 in the electronic supplementary material, section S1 (translated from German by the authors). Instructions explained the decision situations step by step and contained screenshots of the actual decision screens. Moreover, subjects learned that their decisions were anonymous, that their payment would correspond to the sum of their earnings across all rounds and that payments would be administered by a person not involved in the implementation of the experiment. After reading the instructions, subjects took a quiz with questions mainly about the decision situations. Questions for which at least one wrong answer was given were read out loud and the correct answer was explained to all subjects at the same time. Then, the experiment started. A session lasted for about 1 h and subjects earned CHF 40 on average (≈USD 41.2). After the experiment, subjects filled in a questionnaire and could leave the laboratory to get their payment in private. The experiment was programmed and conducted with the software z-Tree [42].

(b). Experimental design

Each experimental session lasted for 30 rounds. In every round, subjects were randomly matched in groups of four, were randomly assigned to be a person 1, person 2, person 3 or person X and were endowed with 140 monetary units (MU) each. In every round, a person X could decide whether or not to ‘steal’ part of the other three persons' endowments (we used the word ‘deduct’ in the instructions). Person X could refrain from stealing and gain 0 MU or steal 70 MU from each of the other persons and gain 3 × 70 MU = 210 MU, and this amount was added to person X's account. Only if person X decided to steal could the other three persons decide, independently of one another, whether to reclaim their money from person X. They could do this by clicking on the corresponding areas on their decision field labelled ‘up’ and ‘down’. If at least one of the three chose ‘up’, person X's account was reduced back to 140 MU, and each of the other three persons' accounts amounted to 140 MU again. However, each person that chose ‘up’ was charged a certain amount for this decision. There was no charge for choosing ‘down’, but if all three persons chose ‘down’, the amount that person X had stolen from their accounts was not reclaimed. The three persons thus faced a VOD.

In the first and the second experimental condition, the amount a person was charged for choosing ‘up’ was 30 MU (symmetric). In the third experimental condition, only person 2 was charged 30 MU for choosing ‘up’ and person 1 and person 3 were charged 40 MU (asymmetric). Moreover, the experiment had two parts. In the first part (rounds 1–15), there was no penalty for person X if one of the other three persons chose ‘up’ (no penalty). However, in the second part (rounds 16–30), the account of person X was reduced by an additional amount if at least one of the other three persons chose ‘up’. In the first condition, the penalty was 120 MU (high penalty). In the second and the third condition, the penalty was 40 MU (low penalty). Person X was penalized by these amounts, irrespective of whether one, two or all three of the other persons chose ‘up’. If all three persons chose ‘down’, the stolen amount was not reclaimed and person X was not penalized. Hence, only in the second part of the experiment does ‘reclaiming’ correspond to the standard notion of punishment, where both the punisher and the punished incur a cost from punishment. The first part of the experiment, with punishment being costly only for the punisher, mainly serves as a control condition that allows us to disentangle the effects the two types of punishment costs have on first- and second-order cooperation. Table 1 summarizes the experimental design.

Table 1.

Experimental design. 100 MU correspond to CHF 1. The benefit from reclaiming the money stolen by person X is Ui − Ki and corresponds to a net loss relative to the status quo, where person X does not steal. The benefit from not making a reclaim is either Ui or zero, depending on whether at least one other person makes a reclaim or not, respectively (see the electronic supplementary material for detailed predictions).

| | symmetric: Ui = 70 MU, Ki = 30 MU | symmetric: Ui = 70 MU, Ki = 30 MU | asymmetric: Ui = 70 MU, K1,3 = 40 MU, K2 = 30 MU |
|---|---|---|---|
| part 1 (rounds 1–15) | no penalty | no penalty | no penalty |
| part 2 (rounds 16–30) | high penalty (120 MU) | low penalty (40 MU) | low penalty (40 MU) |
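As an illustration of the design, the account balances at the end of a single round can be computed as follows. This is a sketch under the rules stated in this section; the function and argument names are hypothetical and are not taken from the z-Tree implementation used in the experiment.

```python
from typing import List, Sequence, Tuple

ENDOWMENT = 140         # MU per person and round
STOLEN_PER_PERSON = 70  # MU taken from each of persons 1-3 if person X steals


def round_accounts(
    steals: bool,
    chose_up: Sequence[bool],
    up_costs: Sequence[int],
    penalty: int = 0,
) -> Tuple[int, List[int]]:
    """Return (account of person X, accounts of persons 1-3) in MU.

    chose_up[i] is True if person i+1 chose 'up' (reclaim);
    up_costs[i] is that person's charge for choosing 'up';
    penalty is 0 in part 1 and 120 or 40 in part 2.
    """
    if len(chose_up) != len(up_costs):
        raise ValueError("chose_up and up_costs must have the same length")
    n_others = len(chose_up)

    if not steals:
        # Status quo: persons 1-3 make no decision and nobody gains or loses.
        return ENDOWMENT, [ENDOWMENT] * n_others

    if any(chose_up):
        # The stolen money is returned; person X additionally pays the penalty once,
        # irrespective of how many persons chose 'up'.
        x_account = ENDOWMENT - penalty
        base = ENDOWMENT
    else:
        # Nobody volunteered: person X keeps the loot and is not penalized.
        x_account = ENDOWMENT + STOLEN_PER_PERSON * n_others
        base = ENDOWMENT - STOLEN_PER_PERSON

    others = [base - (cost if up else 0) for up, cost in zip(chose_up, up_costs)]
    return x_account, others
```

For example, in the asymmetric condition with a low penalty, `round_accounts(True, [False, True, False], [40, 30, 40], penalty=40)` describes a round in which only person 2 reclaims: person X ends with 100 MU, person 2 with 110 MU and persons 1 and 3 with 140 MU each.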

(c). Hypotheses

Based on the payoffs specified in table 1 and the point predictions of our model devised in the electronic supplementary material, sections S2 and S3, we expect a high rate of group-level punishment (i.e. punishment carried out by at least one person per group) in the symmetric condition (hypothesis H1a). We expect an even higher punishment rate in the asymmetric condition (H1b) because we expect subjects there to tacitly agree on the strongest person (i.e. the person with the lowest punishment costs) carrying out the punishment (H2). H2 also implies a higher rate of efficient punishment (carried out by only one person per group) in the asymmetric condition than in the symmetric condition (H3). Finally, we expect the stealing rate to decrease if a penalty is introduced (H4a) and, as a consequence of H2 and the variation in penalty size, we expect this decrease to be smaller in the symmetric condition with a low penalty than in the asymmetric condition or the symmetric condition with a high penalty (H4b).
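For the symmetric condition, the textbook mixed-strategy Nash equilibrium of the VOD gives concrete numbers. In that equilibrium each person is indifferent between choosing C (payoff U − K for certain) and choosing D (payoff U if at least one other person chooses C, zero otherwise), which implies that each person chooses D with probability q = (K/U)^(1/(n−1)). The sketch below assumes that this equilibrium is the relevant benchmark; the authors' own model is specified in the electronic supplementary material and is not reproduced here. Function and key names are illustrative.

```python
from typing import Dict


def symmetric_vod_predictions(U: float, K: float, n: int) -> Dict[str, float]:
    """Symmetric mixed-strategy equilibrium of an n-person volunteer's dilemma.

    Indifference condition: U - K = U * (1 - q**(n - 1)),
    hence q = (K / U) ** (1 / (n - 1)) is each person's probability of choosing D.
    """
    if not (U > K > 0):
        raise ValueError("requires U > K > 0")
    if n < 2:
        raise ValueError("requires n >= 2")

    q = (K / U) ** (1.0 / (n - 1))  # individual probability of not volunteering
    p = 1.0 - q                     # individual probability of volunteering
    return {
        "p_volunteer": p,
        "at_least_one": 1.0 - q ** n,          # group-level punishment
        "exactly_one": n * p * q ** (n - 1),   # efficient punishment
    }


if __name__ == "__main__":
    for name, value in symmetric_vod_predictions(U=70, K=30, n=3).items():
        print(f"{name}: {value:.3f}")
```

For U = 70 MU, K = 30 MU and n = 3, this gives an individual punishment probability of about 0.35, a group-level punishment rate of about 0.72 and an efficient punishment rate of about 0.44, close to the model predictions quoted in the Results for the symmetric condition.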

3. Results

In line with previous findings, our results show first that penalties are effective in deterring stealing by person X (figure 2). Second, and more importantly, our results show that a small increase in punishment-cost heterogeneity can be as effective in deterring stealing as a threefold increase in penalty size. In other words, making clear that there is a person determined to use the stick is as effective as making a ‘diffuse stick’ hurt three times as much. In what follows, we will uncover the underlying mechanism of this strong effect of punishment-cost heterogeneity on the deterrence of stealing by person X. We will show that heterogeneity facilitates coordinated punishment and, therefore, increases the probability that punishment will be carried out, as compared with symmetric situations where coordination is difficult and punishment thus more uncertain (see the electronic supplementary material, section S4 for statistical details and regression analyses).

Figure 2.

Stealing rate across experimental conditions. Without a penalty (light grey bars), the overall rate is 87% and does not differ across treatments (p = 0.315). With a penalty (medium and dark grey bars), the overall rate drops by 50 percentage points to 37% (H4a: p < 0.001). The rate drops by 58 and 59 percentage points in the symmetric condition with a high penalty and the asymmetric condition with a low penalty, respectively, which is significantly more than in the symmetric condition with a low penalty, where it ‘only’ drops by 36 percentage points. The differences in differences are statistically significant (H4b: p = 0.018 and p = 0.004, respectively).

In line with our first hypothesis (H1a), the money stolen by person X is being reclaimed to a large extent (figure 3a). We observe 71 per cent of reclaims in the symmetric condition without penalty (predicted 72% by our model). More importantly, in line with our second hypothesis, we observe a significantly higher reclaim rate of 88 per cent in the asymmetric condition without penalty (H1b: p < 0.001). Although the 88 per cent falls short of the 100 per cent predicted by our model, based on previous evidence [43], we would expect it to approach 100 per cent after a longer period of time. In any case, subjects experiencing an average reclaim rate of 88 per cent in the first part, after the introduction of a low penalty in the second part, would expect a loss of 10 MU from stealing [210 MU × (1 − 0.88) − 40 MU × 0.88 = −10 MU]. This plausibly explains the larger decrease in the stealing rate as compared with the symmetric condition with a low penalty (figure 2), where the expected benefit from stealing would be 32.5 MU [210 MU × (1 − 0.71) − 40 MU × 0.71 = 32.5 MU]. With a penalty (i.e. in the second part of the experiment), the overall reclaim rate is 81 per cent and does not differ significantly across experimental conditions (p = 0.761).
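The expected-value arithmetic in the preceding paragraph can be reproduced directly. The sketch below treats the first-part reclaim rate as person X's belief about the probability of a reclaim, as the text does; the function names and the break-even calculation are illustrative additions rather than part of the authors' analysis.

```python
def expected_gain_from_stealing(
    reclaim_rate: float, loot: float = 210.0, penalty: float = 40.0
) -> float:
    """Expected change in person X's account (MU) relative to not stealing.

    With probability (1 - reclaim_rate) nobody reclaims and X keeps the loot;
    with probability reclaim_rate the loot is returned and X pays the penalty.
    """
    if not 0.0 <= reclaim_rate <= 1.0:
        raise ValueError("reclaim_rate must lie in [0, 1]")
    return loot * (1.0 - reclaim_rate) - penalty * reclaim_rate


def break_even_reclaim_rate(loot: float = 210.0, penalty: float = 40.0) -> float:
    """Reclaim rate at which the expected gain from stealing is exactly zero."""
    return loot / (loot + penalty)


if __name__ == "__main__":
    for rate in (0.88, 0.71):
        gain = expected_gain_from_stealing(rate)
        print(f"reclaim rate {rate:.2f}: expected gain {gain:.1f} MU")
    print(f"break-even reclaim rate: {break_even_reclaim_rate():.2f}")
```

With a 40 MU penalty the break-even reclaim rate is 210/(210 + 40) = 0.84, which lies between the observed 71 and 88 per cent. This is consistent with stealing having a positive expected value in the symmetric condition and a negative one in the asymmetric condition.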

Figure 3.

Reclaim rates across experimental conditions. The money stolen by person X is being reclaimed at a high rate across all experimental conditions (a). The rate of reclaims made by only one person is higher in the asymmetric condition than in the symmetric condition (b). This can be attributed to the fact that in the asymmetric condition it is mostly the strong person that makes the reclaims (c). The proportions in (a) and (b) are calculated at the group level, based on whether at least one or exactly one person made the reclaim, respectively. The proportions in (c) are calculated at the individual level, that is, across all groups within a treatment condition (see the electronic supplementary material, table S2 for details).

Figure 3b shows the rate of reclaims made efficiently, i.e. by one person only. In the symmetric condition, the rates are 46 per cent without a penalty and 50 per cent with a penalty (predicted 44% by our model). As hypothesized, these rates are significantly higher in the asymmetric condition, where they reach 67 per cent without a penalty (H3: p < 0.001) and 73 per cent with a penalty (H3: p = 0.008) (predicted 100% by our model). The introduction and the size of the penalty do not affect the rate of efficient reclaims in the symmetric condition (p = 0.679), nor does the introduction of the low penalty in the asymmetric condition (p = 0.529).

Finally, in line with our hypothesis, the higher rate of efficient reclaims in the asymmetric condition can be attributed to the fact that it is mostly the strong person that carries them out (figure 3c). The overall rate of individual-level reclaims in the symmetric condition is 35 per cent (predicted 34% by our model), and this is unaffected by the introduction and the size of the penalty (p = 0.207). In the asymmetric condition, the strong person reclaims the money stolen by person X in 83 per cent of cases (predicted 100% by our model), whereas a ‘weak’ person does it only in 12 per cent (H2: p < 0.001), irrespective of whether there is no or a low penalty for person X (p = 0.462).
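One way to see why cost heterogeneity can act as a coordination device is to enumerate the pure-strategy Nash equilibria of the one-shot VOD among persons 1 to 3. The sketch below is an illustration under the payoffs in table 1 and is not the authors' model; the function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def payoff(i: int, profile: Tuple[str, ...], U: Sequence[float], K: Sequence[float]) -> float:
    """Payoff of person i: benefit if anyone chose C, minus own cost if i chose C."""
    produced = "C" in profile
    return (U[i] if produced else 0.0) - (K[i] if profile[i] == "C" else 0.0)


def pure_nash_equilibria(U: Sequence[float], K: Sequence[float]) -> List[Tuple[str, ...]]:
    """All action profiles from which no person gains by unilaterally switching."""
    n = len(U)
    equilibria = []
    for profile in product("CD", repeat=n):
        stable = True
        for i in range(n):
            flipped = "D" if profile[i] == "C" else "C"
            deviation = profile[:i] + (flipped,) + profile[i + 1:]
            if payoff(i, deviation, U, K) > payoff(i, profile, U, K):
                stable = False
                break
        if stable:
            equilibria.append(profile)
    return equilibria


if __name__ == "__main__":
    # Asymmetric condition: person 2 (index 1) has the lowest punishment cost.
    print(pure_nash_equilibria(U=[70, 70, 70], K=[40, 30, 40]))
    # Symmetric condition for comparison.
    print(pure_nash_equilibria(U=[70, 70, 70], K=[30, 30, 30]))
```

In both conditions, each of the three profiles with exactly one volunteer is an equilibrium. The payoffs alone therefore do not determine who volunteers; the observation that the strong person reclaims in most cases is consistent with the lower cost serving as a focal point for tacit coordination (H2).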

4. Discussion

In many situations in which the behaviour of one person negatively affects a group of other people, only one volunteer is necessary to stop it. Someone playing annoying music in the courtyard, smoking in a waiting room, or free riding on a group project are examples most of us are familiar with. In such situations, if everyone affected expects somebody else to stop the wrongdoing, the harm may persist. However, there will be one person who either suffers most from the wrongdoing (and thus benefits most if it is stopped) or incurs the lowest cost in stopping it by punishing the transgressor. Individual heterogeneity in punishment costs can be based on observable and unobservable individual characteristics (e.g. body size and wit, respectively), but can also be situation specific (e.g. distance to the source of noise). In any case, such individual differences are important determinants of coordinated actions and facilitate effective and efficient punishment, apparently not only in humans [44]. Clearly, unobservable individual characteristics need to be credibly signalled in order to become coordination devices [16], but this necessity may have been a driver of the evolution of human communication in the first place.

Our experiment corroborates that conceiving of the second-order free-rider problem as a VOD can explain costly peer punishment in humans without having to resort to punitive preferences. From an evolutionary perspective, assuming nonlinear returns on punishment allows for cooperation to be sustained in an environment where punishers and non-punishers coexist without explicit assortment of any kind [40,41]. Moreover, our results establish the plausibility that punishment-cost heterogeneity is an important determinant of first- and second-order cooperation (see also [43]). Individual heterogeneity facilitates coordinated punishment because subjects can more easily find a tacit agreement that the strongest group member is to carry it out. This suggests that even an arbitrary assignment of an individual to a focal position in the social hierarchy allows for the endogenous emergence of a centralized and more effective sanctioning system [34,45,46].

The VOD offers a plausible alternative conception of the second-order free-rider problem, and many interesting insights remain to be gained from it. Future research should further explore the role individual heterogeneity plays in the evolution of social order. Moreover, the validity and generalizability of these first findings need to be established by means of additional laboratory and field experiments.

Acknowledgements

We thank Marco Archetti, Delia Baldassarri, Werner Güth, Manfred Milinski, Anders Poulsen and two anonymous reviewers for their perceptive comments. We are grateful to Stefan Wehrli and Silvana Jud from DeSciL, the experimental laboratory at ETH Zurich, for their support with the experiment. This research was supported by the Swiss National Science Foundation (grant no. 100017_124877).

References

