Abstract
The standard model for direct reciprocity is the repeated Prisoner’s Dilemma, where in each round players choose between cooperation and defection. Here we extend the standard framework to include costly punishment. Now players have a choice between cooperation, defection and costly punishment. We study the set of all reactive strategies, where the behavior depends on what the other player has done in the previous round. We find all cooperative strategies that are Nash equilibria. If the cost of cooperation is greater than the cost of punishment, then the only cooperative Nash equilibrium is generous-tit-for-tat (GTFT), which does not use costly punishment. If the cost of cooperation is less than the cost of punishment, then there are infinitely many cooperative Nash equilibria and the response to defection can include costly punishment. We also perform computer simulations of evolutionary dynamics in populations of finite size. These simulations show that in the context of direct reciprocity, (i) natural selection prefers generous tit-for-tat over strategies that use costly punishment, and (ii) costly punishment does not promote the evolution of cooperation. We find quantitative agreement between our simulation results and data from experimental observations.
1 Introduction
Two key mechanisms for the evolution of any cooperative (or ‘pro-social’ or ‘other-regarding’) behavior in humans are direct and indirect reciprocity. Direct reciprocity means there are repeated encounters between the same two individuals, and my behavior towards you depends on what you have done to me. Indirect reciprocity means there are repeated encounters in a group of individuals, and my behavior towards you also depends on what you have done to others. Our social instincts are shaped by situations of direct and indirect reciprocity. All of our interactions have possible consequences for the future. Either I might meet the same person again or others might find out what I have done and adjust their behavior towards me. Direct reciprocity has been studied by many authors (Trivers 1971, Axelrod & Hamilton 1981, Axelrod 1984, Selten & Hammerstein 1984, Nowak & Sigmund 1989, 1992, 1993, Kraines & Kraines 1989, Fudenberg & Maskin 1990, Imhof et al. 2005, 2007). For indirect reciprocity, see (Sugden 1986, Alexander 1987, Kandori 1992, Nowak & Sigmund 1998, 2005, Lotem et al. 1999, Ohtsuki & Iwasa 2004, Ohtsuki & Iwasa 2005, Panchanathan & Boyd 2005, Brandt & Sigmund 2006, Pacheco et al. 2006a).
Much of human history was spent in small groups, where people knew each other. In such a setting direct and indirect reciprocity must occur. Therefore, even if we think of group selection as a mechanism for the evolution of cooperation among humans (Wynne-Edwards 1962, Wilson 1975, Boyd & Richerson 1990, Wilson & Sober 1994, Bowles 2001, Nowak 2006, Traulsen & Nowak 2006) this could only occur in combination with direct and indirect reciprocity. Reciprocity is an unavoidable consequence of small group size, given the cognitive abilities of humans.
Yamagishi (1986, 1988) proposed that costly punishment can promote cooperation among humans. Axelrod (1986) suggested that costly punishment can stabilize social norms. Fehr & Gächter (2000, 2002) provided further support for these ideas and also suggested that costly punishment is an alternative mechanism for the evolution of human cooperation that can work independently of direct or indirect reciprocity. While it is interesting to study the effect of costly punishment on human (or animal) behavior (Ostrom et al. 1994, Clutton-Brock & Parker 1995, Burnham & Johnson 2005, Gürerk et al. 2006, Rockenbach & Milinski 2006, Herrmann et al. 2008, Sigmund 2008), it is not possible to consider costly punishment as an independent mechanism. If I punish you because you have defected with me, then I use direct reciprocity. If I punish you because you have defected with others, then indirect reciprocity is at work. Therefore most models of costly punishment that have been studied so far (Boyd & Richerson 1992, Sigmund et al. 2001, Boyd et al. 2003, Brandt et al. 2003, Fowler 2005, Nakamaru & Iwasa 2006, Hauert et al. 2007) tacitly use direct or indirect reciprocity.
Costly punishment is sometimes called ‘altruistic punishment’, because some people use it in the second and last round of a game where they cannot directly benefit from this action in the context of the experiment (Fehr & Gächter 2002, Boyd et al. 2003). We find the term ‘altruistic punishment’ misleading, because typically the motives of the punishers are not ‘altruistic’ and the strategic instincts of people are mostly formed by situations of repeated games, where they could benefit from their action. It is more likely that punishers are motivated by internal anger (Kahneman et al. 1998, Carlsmith et al. 2002, Sanfey et al. 2003) rather than by the noble incentive to do what is best for the community. Thus, ‘costly punishment’ is a more precise term than ‘altruistic punishment’. Costly punishment makes no assumptions about the motive behind the action.
Since costly punishment is a form of direct or indirect reciprocity, the suggestion that costly punishment might promote human cooperation must be studied in the framework of direct or indirect reciprocity. Here we attempt to do this for direct reciprocity.
One form of direct reciprocity is described by the repeated Prisoner’s Dilemma. In each round of the game, two players can choose between cooperation, C, and defection, D. The payoff matrix is given by
$$\begin{array}{c|cc} & C & D \\ \hline C & a_2 & a_4 \\ D & a_1 & a_3 \end{array} \qquad (1)$$
The game is a Prisoner’s Dilemma if a1 > a2 > a3 > a4.
We can also say that cooperation means paying a cost, c, for the other person to receive a benefit b. Defection means either ‘doing nothing’ or gaining payoff d at the cost e for the other person. In this formulation, the payoff matrix is given by
$$\begin{array}{c|cc} & C & D \\ \hline C & b-c & -c-e \\ D & b+d & d-e \end{array} \qquad (2)$$
We have b > c > 0 and d, e ≥ 0. This payoff matrix describes a subset of all possible Prisoner’s Dilemmas: not every Prisoner’s Dilemma can be written in this form, only those that have the property of ‘equal gains from switching’, a1 − a2 = a3 − a4 (Nowak & Sigmund 1990).
Including costly punishment means that we have to consider a third strategy, P, which has a cost α for the actor and a cost β for the recipient. The 3 × 3 payoff matrix is of the form
$$\begin{array}{c|ccc} & C & D & P \\ \hline C & b-c & -c-e & -c-\beta \\ D & b+d & d-e & d-\beta \\ P & b-\alpha & -\alpha-e & -\alpha-\beta \end{array} \qquad (3)$$
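The matrix (3) has a simple additive structure: the row action determines my own cost or gain, and the column action determines what the opponent’s move does to me. A minimal sketch of this construction (function name and parameter values are our own illustrative choices):

```python
import numpy as np

def payoff_matrix(b, c, d, e, alpha, beta):
    """Row player's payoff for actions C, D, P against a column opponent.

    Cooperation pays cost c and delivers benefit b to the other player;
    defection gains d and inflicts cost e; punishment costs the actor
    alpha and the recipient beta.
    """
    own = np.array([-c, d, -alpha])    # effect of my own action on my payoff
    other = np.array([b, -e, -beta])   # effect of the opponent's action on my payoff
    return own[:, None] + other[None, :]

A = payoff_matrix(b=3, c=1, d=1, e=1, alpha=1, beta=4)
# A[0, 0] = b - c = 2 (mutual cooperation); A[1, 1] = d - e = 0 (mutual defection)
```

The same function reproduces matrix (2) by restricting attention to the upper-left 2 × 2 block.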
Note that the idea of ‘punishment’ is not new in the world of the Prisoner’s Dilemma. The classical ‘punishment’ for defection is defection. Tit-for-tat punishes defection with defection. The new proposal, however, is that there is another form of punishment which is costly for the punisher. Two questions then present themselves. Is it advantageous to use costly punishment, P, instead of defection, D, in response to a co-player’s defection? And furthermore, does costly punishment allow cooperation to succeed in situations where tit-for-tat does not? We will explore these questions in the present paper.
Section 2 contains an analysis of Nash equilibria among reactive strategies. Section 3 presents the results of computer simulations. Section 4 compares our theoretical findings with experimental data. Section 5 concludes.
2 Nash-equilibrium analysis
We are interested in Nash equilibria of the repeated game given by the payoff matrix (3). We assume b, c, α, β > 0 and d, e ≥ 0 throughout the paper. We refer to one game interaction as a ‘round’. With probability w (0 < w < 1), the game continues for another round. With probability 1 − w, the game terminates. The number of rounds follows a geometric distribution with mean 1/(1 − w). The parameter w can also be interpreted as discounting future payoffs.
A ‘strategy’ of a player is a behavioral rule that prescribes an action in each round. We assume that each player has a probabilistic strategy as follows. In the first round, a player chooses C, D, or P with probabilities p0, q0, and r0, respectively. From the second round on, a player chooses an action depending on the opponent’s action in the previous round. The probabilities that a player chooses C, D, or P are given by pi, qi, and ri, for each possible previous action (i = 1, 2, 3 for C, D, P) of the opponent. Thus a strategy is described by twelve values as
$$s = (p_0, q_0, r_0;\; p_1, q_1, r_1;\; p_2, q_2, r_2;\; p_3, q_3, r_3) \qquad (4)$$
Since pi, qi, ri are probabilities, our strategy space is
$$S = \{\, s : p_i, q_i, r_i \ge 0,\; p_i + q_i + r_i = 1 \text{ for } i = 0, 1, 2, 3 \,\} \qquad (5)$$
This is the product of four simplices. Note that we are considering the set of reactive strategies (Nowak 1990): a player’s move only depends on the co-player’s move in the previous round. This strategy space includes not only reciprocal strategies, but also non-reciprocal unconditional strategies (pi = p, qi = q, ri = r), and paradoxical ones that cooperate less with cooperators than with defectors or punishers (Herrmann et al. 2008). For example, the strategy ‘always defect’ (ALLD) is given by pi = 0, qi = 1, ri = 0. Its action does not depend on the opponent’s behavior.
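For concreteness, a strategy of the form (4) can be stored as a 4 × 3 stochastic array, one row per context (initial move; response to C, D, P) and one column per action. The names below, and the three-action extension of tit-for-tat that treats an opponent’s P like D, are our own illustrative choices:

```python
import numpy as np

# Rows: initial move, response to C, response to D, response to P.
# Columns: probabilities of playing C, D, P.
ALLD = np.array([[0., 1., 0.]] * 4)     # 'always defect': same row in every context
TFT3 = np.array([[1., 0., 0.],          # start with C
                 [1., 0., 0.],          # C after the opponent's C
                 [0., 1., 0.],          # D after the opponent's D
                 [0., 1., 0.]])         # D after the opponent's P (our assumption)

def is_reactive_strategy(s):
    """Check membership in the strategy space (5): four probability triples."""
    s = np.asarray(s, dtype=float)
    return (s.shape == (4, 3)
            and bool(np.all(s >= 0))
            and bool(np.allclose(s.sum(axis=1), 1.0)))
```

Any interior point of this space is a stochastic strategy; the corners are the pure reactive strategies.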
We introduce errors in execution. It is well-known that errors play an important role in the analysis of repeated games (Molander 1985, May 1987, Nowak & Sigmund 1989, 1992, 1993, Fudenberg & Maskin 1990, Fudenberg & Tirole 1991, Lindgren 1991, Lindgren & Nordahl 1994, Boerlijst et al. 1997, Wahl & Nowak 1999a,b). In our model, a player fails to execute his intended action with probability 2ε. When this occurs, he does one of the other two unintended actions randomly, each with probability ε. We assume 0 < ε < 1/3.
We will now calculate all Nash equilibria of the repeated game. Let u(s1, s2) represent the expected total payoff of an s1-strategist against an s2-strategist. Strategy s is a Nash equilibrium of the repeated game if the following inequality holds for any s′ ∈ S:
$$u(s', s) \le u(s, s) \qquad (6)$$
This condition implies that no strategy s′ can do better than strategy s against s. In Appendix A, we show a complete list of Nash equilibria.
Since we are interested in the evolution of cooperation, we will restrict our attention to ‘cooperative’ Nash equilibria. We define a Nash-equilibrium strategy, s, as cooperative if and only if two s-strategists always cooperate in the absence of errors (i.e. ε → 0).
It is easy to show that the criterion above means that we search for Nash equilibria of the form
$$s = (1, 0, 0;\; 1, 0, 0;\; p_2, q_2, r_2;\; p_3, q_3, r_3) \qquad (7)$$
According to the results in Appendix A, there are three types of cooperative Nash equilibria as follows.
2.1 Cooperative Nash equilibria
2.1.1 Cooperative Nash equilibria without defection
Let w′ = w(1 − 3ε). If
(8)
then there exist cooperative Nash equilibria where defection is never used. These strategies are of the form
$$s = (1, 0, 0;\; 1, 0, 0;\; 1 - r_2, 0, r_2;\; 1 - r_3, 0, r_3) \qquad (9)$$
The probabilities r2 and r3 must satisfy
(10)
and
(11)
2.1.2 Cooperative Nash equilibria without punishment
If
(12)
then there exist cooperative Nash equilibria where punishment is never used. These strategies are of the form
$$s = (1, 0, 0;\; 1, 0, 0;\; 1 - q_2, q_2, 0;\; 1 - q_3, q_3, 0) \qquad (13)$$
The probabilities q2 and q3 must satisfy
(14)
and
(15)
Note that while q2 must be a specific value, q3 must only be greater than a certain threshold.
2.1.3 Mixed cooperative Nash equilibria
If
(16)
then there exist cooperative Nash equilibria where a mixture of defection and punishment can be used. These strategies are of the form
$$s = (1, 0, 0;\; 1, 0, 0;\; p_2, q_2, r_2;\; p_3, q_3, r_3) \qquad (17)$$
The probabilities pi, qi, and ri (i = 2, 3) must satisfy
(18)
and
(19)
2.1.4 Does punishment promote cooperation?
We can now ask if costly punishment allows cooperation to succeed when classical direct reciprocity alone does not. From eqs.(8,12,16), we see that if the conditions
(20)
and
(21)
hold, there exist cooperative Nash equilibria that use punishment, but none that use only cooperation and defection. Therefore costly punishment can create cooperative Nash equilibria in parameter regions where there would have been none with only tit-for-tat style strategies.
In a classical setting where d = e = 0 and ε → 0, there exist Nash equilibria that use only cooperation and defection if w ≥ c/b. However, the condition is loosened to w ≥ c/(b + β) by introducing costly punishment if α ≤ c. In this case, costly punishment can allow cooperative Nash equilibria even when cooperation is not beneficial (b < c). When α > c, on the other hand, there are no such cooperative Nash equilibria and punishment does not promote cooperation.
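The arithmetic behind this comparison is easy to make concrete. With the assumed numbers below, cooperation is not even beneficial (b < c), so no continuation probability w can sustain classical cooperation, yet the punishing threshold is easily met:

```python
# Illustrative (assumed) parameter values: cooperation not beneficial, b < c.
b, c, beta = 0.8, 1.0, 4.0

classical_threshold = c / b          # w must exceed c/b without punishment
punishing_threshold = c / (b + beta) # w must exceed c/(b + beta) when alpha <= c

print(classical_threshold)   # 1.25: no w in (0, 1) supports classical cooperation
print(punishing_threshold)   # ~0.208: any larger w supports a punishing equilibrium
```

The gap between the two thresholds widens as the punishment technology β grows, which is why sufficiently harsh punishment can, in principle, stabilize cooperation even for b < c.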
2.2 Equilibrium selection and the best cooperative Nash equilibrium
We will now characterize the strategy that has the highest payoff against itself and is a cooperative Nash equilibrium.
2.2.1 Punishment is cheaper than cooperation: α ≤ c
For α ≤ c, there is at least one cooperative Nash equilibrium if
(22)
We now ask at which cooperative Nash equilibrium the payoff, u(s, s), is maximized, given that eq. (22) is satisfied. Since each cooperative Nash equilibrium achieves mutual cooperation in the absence of errors, the payoff u(s, s) is always of the form
$$u(s, s) = \frac{b - c}{1 - w} + O(\varepsilon) \qquad (23)$$
Here O(ε) represents a term of order ε or higher. We now compare the magnitude of this error term among our cooperative Nash equilibria.
Our calculation shows that all strategies given by eqs. (17, 18, 19) are co-maximizers of the payoff. The maximum payoff is given by
(24)
Figure 1A and 1B are graphical representations of these ‘best’ cooperative Nash equilibria. Therefore, if α ≤ c holds, then both defection and punishment can be used as a response to non-cooperative behavior.
Figure 1.
The best (highest payoff) cooperative Nash equilibria can use costly punishment only if punishment is not more expensive than cooperation, α ≤ c. The strategy space of our model is shown. From left to right, each simplex represents the initial action, reaction to cooperation, reaction to defection and reaction to punishment, respectively. Each corner (labeled C, D, P) represents a pure reaction. A point in the simplex represents a probabilistic reaction. (A) The best cooperative Nash equilibria when punishment is cheaper than cooperation can use punishment in response to defection and/or punishment. Any pair of points from the line in the D-simplex and the line in the P-simplex is a payoff-maximizing cooperative Nash equilibrium. (B) The best cooperative Nash equilibria when punishment is equal in cost to cooperation can use punishment in response to defection, but always cooperate in response to punishment. (C) The best cooperative Nash equilibrium when punishment is more expensive than cooperation is generous tit-for-tat. Only defection and cooperation are used in reaction to defection. Punishment is never used.
2.2.2 Punishment is more expensive than cooperation: α > c
If α > c, then punishment-free strategies are the only candidates for cooperative Nash equilibria. The condition under which there is at least one cooperative Nash equilibrium is
(25)
When eq. (25) is satisfied, we obtain cooperative Nash equilibria of the form of eq. (13). Calculation shows that the payoff, u(s, s), is maximized for
$$s = (1, 0, 0;\; 1, 0, 0;\; 1 - q_2^*, q_2^*, 0;\; 1, 0, 0) \qquad (26)$$
where
(27)
It is noteworthy that the strategy (26) corresponds to ‘generous tit-for-tat’ (Nowak 1990, Nowak & Sigmund 1992). Figure 1C shows this ‘best’ cooperative Nash equilibrium. The maximum payoff is given by
(28)
Therefore, if α > c holds, then the ‘best’ strategy is to defect against a defector with probability (27) and to always cooperate otherwise. Using punishment either reduces the payoff or destabilizes this strategy.
2.2.3 Summary of equilibrium selection
Tables 1 and 2 summarize our search for the best cooperative Nash equilibria.
Table 1.
The ‘best’ cooperative Nash equilibria derived in Sec.2.2. Such strategies are optimal in that each receives the highest payoff against itself. If α ≤ c, optimal cooperative Nash equilibria exist which use costly punishment in response to defection and/or punishment. If α > c, the unique optimal cooperative Nash equilibrium is generous tit-for-tat, which never uses costly punishment. It is interesting that as punishment becomes more expensive, the optimal response to an opponent’s punishment becomes more generous. If α ≥ c, punishment use is always responded to with full cooperation.
| | Punishment is less costly than cooperation, α ≤ c | Punishment is more costly than cooperation, α > c |
|---|---|---|
| Initial move | C | C |
| Response to C | C | C |
| Response to D | C or D or P: any (p2, q2, r2) that satisfies eqs. (18, 19) | C or D: defect with the probability given by eq. (27), otherwise cooperate |
| Response to P | C or D or P: any (p3, q3, r3) that satisfies eqs. (18, 19) | C |
Table 2.
The best (highest payoff) cooperative Nash equilibria derived in Sec.2.2 when ε → 0 and d = e = 0. Again, costly punishment is only used if α ≤ c, and otherwise the optimal strategy is generous tit-for-tat. You can also see that if α ≤ c, strategies which respond to D using only C and P (q2 = 0) are more forgiving than strategies which respond to D using only C and D (r2 = 0).
| | Punishment is less costly than cooperation, α ≤ c | Punishment is more costly than cooperation, α > c |
|---|---|---|
| Initial move | C | C |
| Response to C | C | C |
| Response to D | C or D or P: any (p2, q2, r2) that satisfies eqs. (18, 19) in the limit ε → 0, d = e = 0 | C or D: eq. (27) in the limit ε → 0, d = e = 0 |
| Response to P | C or D or P: any (p3, q3, r3) that satisfies eqs. (18, 19) in the limit ε → 0, d = e = 0 | C |
3 Individual-based simulations
Our analysis in the previous section indicates that there are infinitely many Nash equilibria in this system, some of which are cooperative, and some of which may or may not use costly punishment. We would like to know which strategies are selected by evolutionary dynamics in finite populations. In order to do this, we turn to computer simulations.
3.1 Simulation methods
We consider a well-mixed population of fixed size, N. Again we consider the set of all ‘reactive strategies’. For mathematical simplicity, we begin by assuming that games are infinitely repeated, w = 1. We later investigate the case w < 1. We only examine stochastic strategies, where 0 < pi, qi, ri < 1 for all i. Therefore, it is not necessary to specify the probabilities p0, q0, and r0 for the initial move if w = 1. For w < 1, we will assume that a strategy’s initial move is the same as its response to cooperation, p0 = p1, q0 = q1, r0 = r1.
A game between two players s1 and s2 can be described by a Markov process. For w = 1, the average payoff per round, u(s1, s2), is calculated from the stationary distribution of actions (Nowak & Sigmund 1990). For w < 1, the total payoff is approximated by truncating the series after the first 50 terms.
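Because each player reacts only to the co-player’s previous move, the pair of actions follows a 9-state Markov chain. A sketch of the stationary-distribution payoff calculation for w = 1 (our own implementation; strategy layout as in eq. (4), error model as in Section 2, payoff matrix as in (3)):

```python
import numpy as np

def effective(responses, eps):
    """Intended response probabilities -> executed ones under execution errors."""
    return (1.0 - 3.0 * eps) * responses + eps

def avg_payoff(s1, s2, A, eps=1e-3):
    """Average payoff per round of s1 against s2 when w = 1.

    s1, s2: 4x3 arrays as in eq. (4); only rows 1-3 (the responses) matter here.
    A: 3x3 payoff matrix (3).
    """
    r1 = effective(np.asarray(s1, float)[1:], eps)  # my response to the opponent's last move
    r2 = effective(np.asarray(s2, float)[1:], eps)
    # State (x, y): my last move x and the opponent's last move y, each in {C, D, P}.
    T = np.zeros((9, 9))
    for x in range(3):
        for y in range(3):
            for xn in range(3):
                for yn in range(3):
                    T[3 * x + y, 3 * xn + yn] = r1[y, xn] * r2[x, yn]
    # Stationary distribution: left eigenvector of T for eigenvalue 1.
    vals, vecs = np.linalg.eig(T.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    v = np.abs(v) / np.abs(v).sum()
    return float(sum(v[3 * x + y] * A[x, y] for x in range(3) for y in range(3)))
```

For two unconditional cooperators this returns b − c minus an O(ε) correction, consistent with eq. (23).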
In our simulations, each player si plays a repeated Prisoner’s Dilemma with punishment against all other players. The average payoff of player si is given by
$$\pi(s_i) = \frac{1}{N - 1} \sum_{j \ne i} u(s_i, s_j) \qquad (29)$$
We randomly sample two distinct players s(T) (Teacher) and s(L) (Learner) from the population, and then calculate the average payoffs for each. The learner then switches to the teacher’s strategy with probability
$$f = \frac{1}{1 + \exp\!\left[-\left(\pi^{(T)} - \pi^{(L)}\right)/\tau\right]} \qquad (30)$$
This is a monotonically increasing function of the payoff difference, π(T) − π(L), taking values between 0 and 1. This update rule is called the ‘pairwise comparison’ (Pacheco et al. 2006b, Traulsen et al. 2006, 2007). The parameter τ > 0 in (30) is called the ‘temperature of selection’. It is a measure of the intensity of selection. For very large τ we have weak selection (Nowak et al. 2004).
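Assuming the pairwise-comparison rule takes the standard Fermi (logistic) form, which matches the description above (monotone in the payoff difference, range 0 to 1, with temperature τ), a sketch:

```python
import math

def adoption_probability(pi_teacher, pi_learner, tau=0.8):
    """Probability that the learner copies the teacher's strategy.

    Fermi function of the payoff difference; our assumed form of the
    'pairwise comparison' rule with temperature of selection tau.
    """
    return 1.0 / (1.0 + math.exp(-(pi_teacher - pi_learner) / tau))
```

For equal payoffs the probability is 1/2, and as τ grows the probability approaches 1/2 for any payoff difference, which is the weak-selection limit mentioned in the text.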
In learning, we introduce a chance of ‘mutation’ (or ‘exploration’). When the learner switches his strategy, then with probability μ he adopts a completely new strategy. In this case, the probabilities pi, qi, and ri for each i are randomly generated using a U-shaped probability density distribution as in (Nowak & Sigmund 1992, 1993). We use this distribution to increase the chance of generating mutant strategies that are close to the boundary of the strategy space. This makes it easier to overcome ALLD populations. See Appendix B for details.
The population is initialized with N players using a strategy close to ALLD, pi = 0.0001, qi = 0.9998 and ri = 0.0001. Each simulation result discussed below is the average of four simulations each lasting 5 × 10^6 generations, for a total of 2 × 10^7 generations. Unless otherwise indicated, all simulations use the following parameter values: N = 50, μ = 0.1, and τ = 0.8.
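The mutation step above can be sketched as follows. The exact U-shaped distribution of Nowak & Sigmund (1992) is not reproduced in the text; a symmetric Dirichlet with concentration below 1, which piles probability mass near the simplex boundary, is our stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)

def mutant_strategy(concentration=0.1):
    """Draw a random reactive strategy, one triple per context of eq. (4).

    A symmetric Dirichlet with concentration < 1 is U-shaped on each simplex
    (our assumption for the unspecified density), so mutants tend to lie
    near the boundary of the strategy space.
    """
    return rng.dirichlet([concentration] * 3, size=4)  # shape (4, 3)
```

Boundary-heavy mutants matter here because, as noted in the text, nearly deterministic strategies are what is needed to overcome ALLD populations.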
Figure 2 shows representative simulation dynamics. At any given time, most players in the population use the same strategy. New mutants arise frequently. Often mutants gain a foothold in the population, occasionally become more numerous, and then die out. Sometimes, a new mutant takes over the population. The process then repeats itself with novel mutations arising in a population where most players use this new resident strategy. Due to the stochastic nature of finite populations, less fit mutants sometimes go to fixation, and fitter mutants sometimes become extinct.
Figure 2.
The dynamics of finite populations. Representative simulation dynamics using payoff values b = 3, c = 1, d = e = 1, α = 1, and β = 4 are shown, with time series of p1 (A), q2 (B, blue), r2 (B, red), and average payoff (C). The most time is spent in strategies near tit-for-tat (p1 ≈ 1, q2 ≈ 1, r2 ≈ 0). Sometimes a cooperative strategy using punishment arises (p1 ≈ 1, q2 < r2), or cooperation breaks down and a strategy near ALLD (p1 ≈ 0, q2 ≈ 1, r2 ≈ 0) becomes most common. The average payoff of tit-for-tat strategies is approximately equal to that of cooperative strategies that use punishment, because in both cases there is very little defection to respond to. Thus, here the response to defection does not greatly affect the average payoffs of cooperative strategies.
3.2 Comparison with Nash equilibrium analysis
3.2.1 Punishment use and relative cost of cooperation versus punishment
Our analysis in Section 2 found that for certain parameter values cooperative Nash equilibria exist, and that the best cooperative Nash equilibria may use punishment only if α ≤ c. If α > c, punishment is never used and the response to (an occasional) P is C. We now examine whether similar results are found in our finite population simulations. We use the payoff values b = 3, c = 1, d = e = 1, and β = 4, and compare the dynamics of α = 0.1 with α = 10.
As shown in Figure 3, the time average frequency of C, D, and P use are similar for both values of α. Most moves are cooperation (α = 0.1 : C = 78.7%; α = 10 : C = 87.6%). Defection occurs less frequently (α = 0.1 : D = 17.0%; α = 10 : D = 10.5%). Punishment is rare (α = 0.1 : P = 4.3%; α = 10 : P = 1.9%).
Figure 3.
The relative size of α and c has little effect on move frequencies. The time-averaged frequencies of cooperation, defection, and punishment are shown for α = 0.1 (red) and α = 10 (blue), with b = 3, c = 1, d = e = 1, and β = 4. Simulation parameters μ = 0.1 and τ = 0.8 are used. Move use is time averaged over N = 50 players, playing for a total of 2 × 10^7 generations. Consistent with the Nash equilibrium analysis, there is a high level of cooperation in both cases, and the α > c simulation contains slightly more C, less D, and less P than the α < c simulation.
These simulation results are consistent with the Nash equilibrium analysis. The high level of cooperation observed in the simulations agrees with the presence of cooperative Nash equilibria for the chosen payoff values. The α > c simulation contained more C, less D, and less P than the α < c simulation. This also agrees with the Nash equilibrium analysis. If α > c, the Nash equilibrium response to punishment is full cooperation, and punishment is never used in response to defection. Therefore, you would expect to find more C, less D, and less P if α > c than if α < c. The magnitude of these differences in move frequencies is small, which is also in agreement with the Nash equilibrium analysis: most of the time is spent in mutual cooperation, which is unaffected by the cost of punishment.
The time average strategies for both simulations are shown in Figure 4. In Figure 4A, we see little difference in the response to cooperation between α values. Cooperation is the overwhelming response to cooperation (α = 0.1 : p1 = 0.83; α = 10 : p1 = 0.91). This is consistent with the Nash equilibrium analysis for both values of α. The background level of P in response to C that exists when α < c decreases when α > c (α = 0.1 : r1 = 0.06; α = 10 : r1 = 0.02). It is intuitive that such illogical punishment is less favorable when punishment is very costly.
Figure 4.
As predicted by Nash equilibrium analysis, we see less punishment and more cooperation if α > c than if α < c. Shown are strategy time averages for α = 0.1 (red) and α = 10 (blue), with b = 3, c = 1, d = e = 1, and β = 4. Simulation parameters μ = 0.1 and τ = 0.8 are used. Strategies are time averaged over N = 50 players, playing for a total of 2 × 10^7 generations. There is agreement between the Nash equilibrium analysis and the computer simulations on the high level of mutual cooperation regardless of the value of α, and the low level of punishment when α > c. However, the computer simulations find that even when α < c, the response to non-cooperation is much more likely to be defection than punishment. This suggests that of the infinitely many possible cooperative Nash equilibria, those that use tit-for-tat style defection in response to defection are favored by evolution over those that use costly punishment in response to defection.
In Figure 4B, we see some differences between the two simulations in their response to defection. When α < c, D is most often responded to with D (α = 0.1 : q2 = 0.63), but D is also sometimes responded to with P (α = 0.1 : r2 = 0.24). When α > c, D is responded to with P much less often (α = 10 : r2 = 0.04), and D in response to D is much more common (α = 10 : q2 = 0.85). The lack of P in response to D when α > c is consistent with the Nash equilibrium analysis. The significant preference for D over P when α < c, however, is somewhat surprising. The Nash equilibrium results suggest that D and P can both be used when α < c, and so we would not expect that q2 is necessarily much greater than r2. Additionally, we see much less forgiveness (C in response to D) in both simulations than predicted by the Nash equilibrium analysis (α = 0.1 : p2 = 0.13; α = 10 : p2 = 0.11). Presumably this is a consequence of increased randomness in small (finite) populations. A very small increase in generosity, p2, destabilizes GTFT by allowing the invasion of ALLD. Thus in finite populations, stable cooperative strategies must stay far from the GTFT generosity threshold, and therefore forgive defection less often than GTFT.
In Figure 4C, we see major differences between the two simulations in their response to punishment. In both simulations, the most frequent response to P is D (α = 0.1 : q3 = 0.52; α = 10 : q3 = 0.61). However, the use of C and P are starkly different depending on the value of α. When α < c, C is much less common than P (α = 0.1 : p3 = 0.18, r3 = 0.30). When α > c, the opposite is true: C is common (α = 10 : p3 = 0.32) whilst P is rare (α = 10 : r3 = 0.06). This preference for C in response to P instead of P in response to P when α > c is consistent with the Nash equilibrium analysis. The mix of D and P in response to P when α < c is also consistent with the Nash equilibrium analysis. Again, however, we see less forgiveness in both simulations than predicted by the Nash equilibrium analysis.
In summary, we find agreement between the Nash equilibrium analysis and the computer simulations on the high level of mutual cooperation regardless of the value of α, and the low level of punishment when α > c. However, the computer simulations find that even when α < c, the response to defection is much more likely to be defection than punishment. This suggests that of the infinitely many possible cooperative Nash equilibria, those that use tit-for-tat style defection in response to defection are favored by evolutionary dynamics in finite populations over those that use costly punishment in response to defection.
3.2.2 Does punishment promote the evolution of cooperation?
Our analysis in Section 2 found that if eqs. (20,21) are satisfied, then costly punishment allows for cooperative Nash equilibria even if none would exist using only cooperation and defection. We now ask what strategies are selected in our finite population simulations as cooperation becomes less beneficial. We use the payoff values c = 1, d = e = 1, α = 1, and β = 4, and examine the level of cooperation as b varies. Our computer simulations use w = 1 and ε → 0. Therefore, eqs. (20,21) are satisfied when b < 1.
As shown in Figure 5, cooperation is the prevailing outcome when b > 1, whereas cooperation is rare (< 5%) when b ≤ 1. In the case that b > 1, cooperative Nash equilibria exist regardless of whether costly punishment is used. Thus it is not surprising to find a high level of cooperation in this parameter region. However, our Nash equilibrium analysis suggests that costly punishment could stabilize cooperation if b < 1. This does not seem to be the case in the finite population simulations. Contrary to the predictions of the Nash equilibrium analysis, costly punishment does not allow cooperation to evolve unless direct reciprocity alone would also be sufficient. Thus, in the context of direct reciprocity, punishment does not promote the evolution of cooperation.
Figure 5.
Costly punishment does not promote the evolution of cooperation. Frequency of cooperation is shown as b is varied, with c = 1, α = 1, d = e = 1, and β = 4. Simulation parameters μ = 0.1 and τ = 0.8 are used. Cooperation frequency is time averaged over N = 50 players, playing for a total of 2 × 10^7 generations. Cooperation is high when b > 1, and very low (< 5%) when b ≤ 1. This is the same pattern as would be seen in finite population simulations of classical direct reciprocity. Cooperation only succeeds where it would have in classical direct reciprocity. So we see that contrary to the Nash equilibrium analysis, costly punishment does not promote the evolution of cooperation in the framework of direct reciprocity.
3.3 Robustness to parameter variation
Our simulations suggest that for both α < c and α > c, costly punishment is used less often than defection as a response to defection. Now we test how robust this finding is to variation in the payoff and simulation parameters. We choose the following baseline parameter values: b = 3, c = 1, d = e = 1, α = 1, β = 4, τ = 0.8, and μ = 0.1. Each parameter is then varied, and for each value four simulations are run, each lasting 5 × 10^6 generations (for a total of 2 × 10^7 generations). The values of all strategy parameters pi, qi, ri for each player are time averaged over the entire simulation time. As we are interested in the response to defection of essentially cooperative strategies (i.e. strategies that usually cooperate when playing against themselves), we only examine players with time average p1 > 0.75. Among these cooperative players, we then examine the time-averaged probabilities to defect in response to defection, q2, and to punish in response to defection, r2. Figure 6 shows the results of varying payoffs β, α, d, and b, as well as simulation parameters τ, μ, and w, on the time-averaged values of q2 and r2 among cooperative players.
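The filtering and averaging step described above can be sketched as follows (array layout as in eq. (4); the function name is our own):

```python
import numpy as np

def defection_response_stats(time_avg_strategies, p1_threshold=0.75):
    """Mean q2 and r2 among cooperative players (time-average p1 > threshold).

    time_avg_strategies: shape (n_players, 4, 3); row 1 is the response to C,
    row 2 the response to D; columns are the probabilities of C, D, P.
    """
    s = np.asarray(time_avg_strategies, dtype=float)
    cooperative = s[s[:, 1, 0] > p1_threshold]  # p1: probability of C after C
    q2 = float(cooperative[:, 2, 1].mean())     # D in response to D
    r2 = float(cooperative[:, 2, 2].mean())     # P in response to D
    return q2, r2
```

Applied to the simulation output, the robustness claim of this section is simply that the first returned value exceeds the second across the explored parameter range.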
Figure 6.
Evolution disfavors the use of costly punishment across a wide range of payoff and simulation parameter values. Probabilities of D in response to D (q2, blue) and P in response to D (r2, red) among cooperative strategies (p1 > 0.75) are shown, time averaged over a total of 2 × 10^7 generations. Parameters β (A), d (B), α (C), b (D), τ (E), μ (F), and w (G) are varied. Parameters other than the one being varied are set to b = 3, c = 1, d = e = 1, α = 1, β = 4, τ = 0.8, and μ = 0.1. For all parameter sets explored, q2 > r2 holds: strategies that respond to D with D more often than P are selected by evolution. (A) As punishment becomes more costly for the player who is punished, it becomes more effective to use P. Yet even with a 25 : 1 punishment technology, we see q2 > r2. (B) As defection becomes more effective, it makes even less sense to punish. (C) As punishment gets more expensive for the punisher, it becomes less effective to punish. Yet even if α = 0, defection is still used more than punishment. (D) Increasing b decreases the total number of D moves, and therefore decreases the selection pressure acting on q2 and r2. This has the effect of moving both values towards 1/3, thus decreasing q2 and increasing r2. (E) Increasing the temperature of selection τ also reduces selection pressure, and all move probabilities approach 1/3. Yet for all values of τ, we find q2 > r2. (F) As the mutation rate μ increases, mutation dominates selection and all move probabilities approach 1/3. Yet for all values μ < 1, we find q2 > r2. (G) Even in finitely repeated games, w < 1, defection is favored over punishment, q2 > r2.
In Figure 6A, we see that increasing β decreases defection use relative to punishment use. As punishment becomes more costly for the player who is punished, it becomes more effective to use P. Yet even with a 25 : 1 punishment technology, we find that q2 > r2.
In Figure 6B, we see that increasing d (and e, as we assume that d = e) increases defection use relative to punishment use. As defection becomes more effective, it makes even less sense to punish. For d > 4, defection is more damaging to the recipient than punishment, while at the same time defection is not costly to use. Therefore, the probability to use defection in response to defection, q2, approaches 1. At the other extreme, even when defection is passive, d = e = 0, it is still true that q2 > r2.
In Figure 6C, we see that increasing α increases defection use relative to punishment use. As punishment gets more expensive for the punisher, it becomes less effective to punish. Yet even if α = 0, defection is still used more than punishment.
In Figure 6D, we see that increasing b increases punishment use relative to defection use. However, the probability to punish in response to defection, r2, never rises above 1/3. As b increases, cooperation becomes increasingly prevalent and so there is less defection to respond to. When b = 2, we find that 28% of moves are D, as opposed to 4% when b = 25. This reduces the selection pressure on players’ responses to defection, and both q2 and r2 approach chance, 1/3. This has the effect of decreasing q2 and increasing r2, but still it is always true that q2 > r2.
In Figure 6E, we see that increasing the temperature of selection τ reduces selection intensity, and all move probabilities approach 1/3. However, for all values of τ, it is never true that q2 < r2. Punishment is never favored over defection as a response to an opponent’s defection.
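The role of the selection temperature can be illustrated with the pairwise comparison rule of Traulsen et al. (2007). The sketch below assumes a Fermi-type imitation probability, which may differ in detail from the update rule used in the actual simulations: at low τ the higher-scoring strategy is almost always imitated, while at high τ imitation approaches a coin flip and selection pressure vanishes.

```python
import math

def adoption_probability(payoff_teacher, payoff_learner, tau):
    # Fermi rule: probability that the learner adopts the teacher's
    # strategy, given the payoff difference and selection temperature tau.
    return 1.0 / (1.0 + math.exp(-(payoff_teacher - payoff_learner) / tau))

# Low temperature: selection is almost deterministic.
print(adoption_probability(3.0, 2.0, 0.1))    # close to 1
# High temperature: adoption is nearly a coin flip, so strategy
# frequencies drift toward uniformity (all move probabilities -> 1/3).
print(adoption_probability(3.0, 2.0, 100.0))  # close to 0.5
```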
In Figure 6F, we see a similar pattern for increasing the mutation rate μ. As μ approaches 1, mutation dominates selection and all move probabilities approach 1/3. But for all values μ < 1, it is true that q2 > r2. Again, an opponent’s D is always more often responded to with defection than with punishment.
In Figure 6G, we relax the assumption that games are infinitely repeated. To make direct comparisons between w = 1 and w < 1 simulations possible, τ is increased by a factor of 1/(1 − w) when w < 1. This compensates for w = 1 payoffs reflecting average payoff per round, whereas w < 1 payoffs reflect total payoff. We see that for values w < 1, it is still true that q2 > r2. Thus the observation that defection is favored over punishment as a response to the opponent’s defection applies to finite as well as infinitely repeated games.
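The rescaling can be checked directly: with continuation probability w, the expected number of rounds is the geometric series 1 + w + w^2 + … = 1/(1 − w), so total payoffs exceed per-round payoffs by exactly that factor. A small numerical sketch:

```python
def expected_rounds(w, horizon=10_000):
    # Truncated geometric series sum_{t=0}^{horizon-1} w^t,
    # which converges to 1/(1 - w) for w < 1.
    return sum(w ** t for t in range(horizon))

for w in (0.5, 0.75, 0.9):
    assert abs(expected_rounds(w) - 1.0 / (1.0 - w)) < 1e-9
print(expected_rounds(0.75))  # ≈ 4
```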
The most striking aspect of Figure 6 is that in all cases, q2 > r2 holds. The evolutionary preference for ‘D in response to D’ over ‘P in response to D’ is stable against variation in all parameter values. Hence we conclude that for reasonable parameter values, defection is always used more often than costly punishment as a response to defection. Evolution in finite populations disfavors strategies that use costly punishment in repeated games.
4 Comparison with experimental results
Many behavioral experiments have investigated the effect of costly punishment on human cooperation (Yamagishi 1986, Ostrom et al. 1992, Fehr & Gächter 2000, 2002, Page et al. 2005, Bochet et al. 2006, Gürerk et al. 2006, Rockenbach & Milinski 2006, Denant-Boemont et al. 2007, Dreber et al. 2008, Herrmann et al. 2008). We would like to compare our model predictions with the behavior observed in such experiments. However, the experimental setup used in most previous punishment studies differs from the situation described in this paper: the option to punish is offered as a separate stage following the decision to cooperate or defect. Only the design used by Dreber et al. (2008) is directly comparable with our model. There, subjects played a repeated 3-option Prisoner’s Dilemma with payoff structure as in Eq. (3). The payoff values b = 3, c = 1, d = e = 1, α = 1, and β = 4 were used, and interactions had a continuation probability of w = 0.75. Given the similarity between this experimental setup and the analysis presented here, we focus on the data from this experiment and make comparisons with predictions from (i) the Nash equilibrium analysis and (ii) individual based simulations.
4.1 Nash equilibrium analysis
Following Dreber et al. (2008), we use the payoff values b = 3, c = 1, d = e = 1, α = 1, and β = 4. It seems likely that human strategic instincts evolved in small societies where the probability of future interactions was always high. Thus, we study the limit w → 1. According to our Nash equilibrium analysis in section 2, we obtain
(31) p1 = 1, q1 = r1 = 0, (b + e)q2 + (b + β)r2 = c + d

as the condition characterizing the best cooperative strategies in the limit of w → 1 and ε → 0 (see Figure 1B); with the above payoff values, the response to defection must satisfy 4q2 + 7r2 = 2. Against a defector, such a strategy uses defection with probability 0.5 and no punishment, or uses punishment with probability 0.286 and no defection, or a linear combination of these values. Therefore, the possibility of using costly punishment is consistent with a cooperative Nash equilibrium. However, the experimental data suggest that the use of costly punishment is disadvantageous. More specifically, Dreber et al. (2008) found a strong negative correlation between total payoff and the probability to use P in response to an opponent’s D. In this experimental situation, winners use a tit-for-tat like strategy while losers use costly punishment. Hence, Nash equilibrium analysis does not provide a good explanation of the observed experimental data.
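These two responses can be verified numerically. The sketch below is our own code, using the value functions of Appendix A without errors; it checks that a fully cooperative strategy (p1 = 1) playing against itself in the limit w → 1 makes C and D equally valuable under either response to defection.

```python
# Baseline payoff values from Dreber et al. (2008): b = 3, c = 1,
# d = e = 1, beta = 4; limit of infinitely repeated game, w = 1.
b, c, d, e, beta, w = 3, 1, 1, 1, 4, 1.0

def value_C(p1, q1, r1):
    # value of cooperating against a reactive opponent (no errors)
    return -c + w * (p1 * b - q1 * e - r1 * beta)

def value_D(p2, q2, r2):
    # value of defecting against a reactive opponent (no errors)
    return d + w * (p2 * b - q2 * e - r2 * beta)

vC = value_C(1, 0, 0)
# Respond to D by defecting with probability 0.5, never punishing:
assert abs(value_D(0.5, 0.5, 0.0) - vC) < 1e-12
# Respond to D by punishing with probability 2/7 (~0.286), never defecting:
assert abs(value_D(5 / 7, 0.0, 2 / 7) - vC) < 1e-12
```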
4.2 Computer simulations
Consistent with the idea that humans evolved in settings where future encounters were likely, we find quantitative agreement between the experimental data and finite population simulations using w = 1 (Figure 7). The optimal value of τ = 0.8 was determined by minimizing the sum of squared differences between model predictions and observed data. As shown in Figure 7, this fit is robust to variation in the mutation rate μ.
Figure 7.
We see quantitative agreement between the finite population simulations and human behavior in an experimental setting. Moreover, this agreement is robust to variation in the mutation rate μ. Time-averaged strategies from the Dreber et al. (2008) experimental data (red) and finite population simulations using μ = 0.01 (green), μ = 0.05 (yellow), and μ = 0.1 (blue) are shown. The simulations use the Dreber et al. (2008) payoff values b = 3, c = 1, d = e = 1, α = 1, and β = 4, and temperature τ = 0.8. Such quantitative agreement demonstrates the power of finite population size analysis for describing human behavior.
In both the computer simulations and the Nash equilibrium analysis, defection is used much more often than punishment after the opponent defects or punishes. Cooperation is used similarly in both: cooperation is reciprocated. Unlike in the Nash equilibrium analysis, however, the computer simulations find that cooperating after the opponent has defected or punished is uncommon. The agreement between computer simulations and experimental data demonstrates the power of finite population size analysis (Nowak et al. 2004, Imhof & Nowak 2006) for characterizing human behavior, as does the robustness of the fit to variation in the mutation rate μ. With this method, ad hoc assumptions about other-regarding preferences are not needed to recover human behavior in the repeated Prisoner’s Dilemma with costly punishment.
5 Discussion
It is important to study the effect of punishment on human behavior. It has been suggested that costly punishment can lead to cooperation independent of other mechanisms such as direct or indirect reciprocity (Fehr & Gächter 2002). We do not think this is the case, because costly punishment is always a form of direct or indirect reciprocity. If I punish you because you have defected with me, then it is direct reciprocity. If I punish you because you have defected with others, then it is indirect reciprocity. Therefore, the proper approach for studying costly punishment is to extend cooperation games from two possible moves, C and D, to three possible moves, C, D, and P, and then study the consequences. In order to understand whether costly punishment can really promote cooperation, we must examine the interaction between costly punishment and direct or indirect reciprocity.
There are two essential questions to ask about such extended cooperation games. Should costly punishment be the response to a co-player’s defection, instead of defection for defection as in classical direct reciprocity? And does the addition of costly punishment allow cooperation to succeed in situations where direct or indirect reciprocity alone do not? Here we have explored these questions for the set of reactive strategies in the framework of direct reciprocity.
We have calculated all Nash equilibria among these reactive strategies. A subset of those Nash equilibria are cooperative, which means that these strategies would always cooperate with each other in the absence of errors. We find that if the cost of cooperation, c, is less than the cost of punishment, α, then the only cooperative Nash equilibrium is generous-tit-for-tat, which does not use costly punishment. However if the cost of cooperation, c, is greater than the cost of punishment, α, then there are infinitely many cooperative Nash equilibria and the response to a co-player’s defection can be a mixture of C, D and P. We also find that the option for costly punishment allows such cooperative Nash equilibria to exist in parameter regions where there would have been no cooperation in classical direct reciprocity (including the case where the cost of cooperation, c, is greater than the benefit of cooperation, b, making cooperation unprofitable). Therefore if α < c, it is possible for a cooperative Nash equilibrium to use costly punishment.
We have also performed computer simulations to study how evolutionary dynamics in populations of finite size choose among all strategies in our strategy space. We find that for all parameter choices that we have investigated, costly punishment, P, is used less often than defection, D, in response to a co-player’s defection. We also find that costly punishment fails to stabilize cooperation when the cost of cooperation, c, is greater than the benefit of cooperation, b. Therefore, in the context of repeated interactions, (1) natural selection opposes the use of costly punishment, and (2) costly punishment does not promote the evolution of cooperation. Winning strategies tend to stick with generous-tit-for-tat and ignore costly punishment, even if the cost of punishment, α, is less than the cost of cooperation, c.
Moreover, the results of our computer simulations are in quantitative agreement with data from an experimental study (Dreber et al. 2008). In this game, people behave as predicted by the calculations of evolutionary dynamics, rather than by the Nash equilibrium analysis. This agreement supports the idea that analyzing evolutionary dynamics in populations of finite size helps us to understand human behavior.
Perhaps the existence of cooperative Nash equilibria that use costly punishment lies behind some people’s intuition about costly punishment promoting cooperation. But our evolutionary simulations indicate that these strategies are in fact not selected in repeated games.
In summary, we conclude that in the framework of direct reciprocity, selection does not favor strategies that use costly punishment, and costly punishment does not promote the evolution of cooperation.
Acknowledgments
Support from the John Templeton Foundation, the NSF/NIH joint program in mathematical biology (NIH grant R01GM078986), and J. Epstein is gratefully acknowledged.
Appendix A
In this appendix we will analytically derive all Nash equilibria of the repeated game.
We notice that saying strategy s is a Nash equilibrium is equivalent to saying that s is a best response to s. In other words, s is a Nash equilibrium if it is a strategy that maximizes the payoff function
(A.1) π(s′, s)

over all choices of s′, where π(s′, s) denotes the expected total payoff to a player using strategy s′ against an opponent using strategy s.
For each strategy s, we ask what strategy s* is the best response to s. If s* happens to be the same as s, then s is a Nash equilibrium.
For that purpose, we shall use a technique from dynamic optimization (Bellman 1957). Let us define the ‘value’ of cooperation, defection, and punishment when the opponent uses strategy s. The value of each action is defined as the sum of its immediate effect and its future effect. For example, if you cooperate with an s-opponent, you immediately pay the cost of cooperation, c. In the next round (which exists with probability w), however, the s-opponent reacts to your cooperation with cooperation, defection, or punishment, with probabilities p1, q1 and r1, respectively. Because we consider reactive strategies, your cooperation in round t has no effect on your payoff in round t + 2 or later. Thus the value of cooperation, υC, is given by
(A.2) υC = −c + w(p1 b − q1 e − r1 β)
in the absence of errors. When the effect of errors is incorporated, the value of each action is given by
(A.3)
υC = −c + w(1 − 3ε)(p1 b − q1 e − r1 β) + wε(b − e − β)
υD = d + w(1 − 3ε)(p2 b − q2 e − r2 β) + wε(b − e − β)
υP = −α + w(1 − 3ε)(p3 b − q3 e − r3 β) + wε(b − e − β)
Given eq. (A.3), the best response to strategy s is surprisingly simple: always take the action whose value is the largest. If there is more than one best action, then any combination of such best actions is a best response.
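This best-response test is easy to mechanize. The sketch below is our own code, using the error model as we read it from eq. (A.3): it computes the three values against a reactive strategy s and checks that every action s itself uses with positive probability attains the maximal value. It confirms, for example, that ‘always defect’ is a Nash equilibrium while ‘always cooperate’ is not.

```python
# Baseline payoffs from the text; w and eps chosen for illustration.
b, c, d, e, alpha, beta = 3, 1, 1, 1, 1, 4
w, eps = 0.8, 0.001

def values(s):
    # s = (p1, q1, r1, p2, q2, r2, p3, q3, r3): responses to C, D, P.
    def future(p, q, r):
        # Opponent's reply next round; each action occurs at baseline
        # rate eps, the intended one with the remaining probability.
        return (w * (1 - 3 * eps) * (p * b - q * e - r * beta)
                + w * eps * (b - e - beta))
    vC = -c + future(*s[0:3])
    vD = d + future(*s[3:6])
    vP = -alpha + future(*s[6:9])
    return vC, vD, vP

def is_nash(s, tol=1e-9):
    v = values(s)
    best = max(v)
    # Total weight that s puts on C, D, and P across the three states.
    use = (s[0] + s[3] + s[6], s[1] + s[4] + s[7], s[2] + s[5] + s[8])
    # Every action used with positive probability must be value-maximal.
    return all(u < tol or vi > best - tol for u, vi in zip(use, v))

ALLD = (0, 1, 0, 0, 1, 0, 0, 1, 0)
ALLC = (1, 0, 0, 1, 0, 0, 1, 0, 0)
print(is_nash(ALLD), is_nash(ALLC))  # True False
```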
Depending on the relative magnitudes of υC, υD and υP, we obtain seven different cases.
A.1 When υC > υD, υP
The best response to s is ‘always cooperate’;
(A.4) s* = (1, 0, 0; 1, 0, 0; 1, 0, 0)
Let us assume that s* = s holds. Substituting s* for s in eq. (A.3) gives us
(A.5) υC = −c + w(1 − 3ε)b, υD = d + w(1 − 3ε)b, υP = −α + w(1 − 3ε)b
(note that we will neglect the common term wε(b − e − β) in the values below). We see that υC can never be the largest of the three. Contradiction. Therefore, there are no Nash equilibria in this case.
A.2 When υD > υP, υC
The best response to s is ‘always defect’;
(A.6) s* = (0, 1, 0; 0, 1, 0; 0, 1, 0)
Let us assume that s* = s holds. Substituting s* for s in eq. (A.3) gives us
(A.7) υC = −c − w(1 − 3ε)e, υD = d − w(1 − 3ε)e, υP = −α − w(1 − 3ε)e
Hence υD is always the largest, which is consistent with our previous assumption. Therefore, eq. (A.6) is always a Nash-equilibrium strategy.
A.3 When υP > υC, υD
The best response to s is ‘always punish’;
(A.8) s* = (0, 0, 1; 0, 0, 1; 0, 0, 1)
Let us assume that s* = s holds. Substituting s* for s in eq. (A.3) gives us
(A.9) υC = −c − w(1 − 3ε)β, υD = d − w(1 − 3ε)β, υP = −α − w(1 − 3ε)β
Obviously, υD is the largest of the three. Contradiction. Therefore, there are no Nash equilibria in this case.
A.4 When υD = υP > υC
The best response to s is ‘never cooperate’;
(A.10) s* = (0, q1, r1; 0, q2, r2; 0, q3, r3), with qi + ri = 1 for i = 1, 2, 3
Let us assume that s* = s holds. Substituting s* for s in eq. (A.3) gives us
(A.11) υC = −c − w(1 − 3ε)(q1 e + r1 β), υD = d − w(1 − 3ε)(q2 e + r2 β), υP = −α − w(1 − 3ε)(q3 e + r3 β)
The strategies described by eq. (A.10) are Nash equilibria if the condition υD = υP > υC holds, which is equivalent to
(A.12) d + α = w(1 − 3ε)[(q2 − q3)e + (r2 − r3)β] and c − α > w(1 − 3ε)[(q3 − q1)e + (r3 − r1)β]
A.5 When υP = υC > υD
The best response to s is ‘never defect’;
(A.13) s* = (p1, 0, r1; p2, 0, r2; p3, 0, r3), with pi + ri = 1 for i = 1, 2, 3
Let us assume that s* = s holds. Substituting s* for s in eq. (A.3) gives us
(A.14) υC = −c + w(1 − 3ε)(p1 b − r1 β), υD = d + w(1 − 3ε)(p2 b − r2 β), υP = −α + w(1 − 3ε)(p3 b − r3 β)
The strategies described by eq. (A.13) are Nash equilibria if the condition υP = υC > υD holds, which is equivalent to
(A.15) c − α = w(1 − 3ε)[(p1 − p3)b − (r1 − r3)β] and w(1 − 3ε)[(p1 − p2)b − (r1 − r2)β] > c + d
A.6 When υC = υD > υP
The best response to s is ‘never punish’;
(A.16) s* = (p1, q1, 0; p2, q2, 0; p3, q3, 0), with pi + qi = 1 for i = 1, 2, 3
Let us assume that s* = s holds. Substituting s* for s in eq. (A.3) gives us
(A.17) υC = −c + w(1 − 3ε)(p1 b − q1 e), υD = d + w(1 − 3ε)(p2 b − q2 e), υP = −α + w(1 − 3ε)(p3 b − q3 e)
The strategies described by eq. (A.16) are Nash equilibria if the condition υC = υD > υP holds, which is equivalent to
(A.18) c + d = w(1 − 3ε)[(p1 − p2)b − (q1 − q2)e] and w(1 − 3ε)[(p1 − p3)b − (q1 − q3)e] > c − α
A.7 When υC = υD = υP
Any strategy is a best response to s;
(A.19) s* = (p1, q1, r1; p2, q2, r2; p3, q3, r3)
Let us assume that s* = s holds. For the three values of action, eq. (A.3), to be the same, we need
(A.20) c + d = w(1 − 3ε)[(p1 − p2)b − (q1 − q2)e − (r1 − r2)β] and c − α = w(1 − 3ε)[(p1 − p3)b − (q1 − q3)e − (r1 − r3)β]
The strategies described by eq. (A.19) are Nash equilibria if the condition eq. (A.20) is satisfied.
Appendix B
B.1 Mutation kernel
When a learner switches strategies, with probability μ he experiments with a new strategy as opposed to adopting the strategy of the teacher. In this case, the probabilities pi, qi, and ri for each i (i = 1, 2, 3 for C, D, P) are randomly assigned as follows. Two random numbers X1 and X2 are drawn from a U-shaped beta distribution described by the probability density function
(B.1) f(x) = x^(γ−1)(1 − x)^(γ−1) / B(γ, γ), 0 < x < 1,

where B(·, ·) is the beta function; the distribution is U-shaped for γ < 1.
The simulations presented here use γ = 0.1, with the exception of the experimental data fit using w = 0.75 (Figure 7B), where γ = 0.01 is used. Varying γ affects the total level of cooperation (p1), but it remains the case that q2 > r2. Without loss of generality, let X1 < X2. Three random numbers between 0 and 1, Y1, Y2, and Y3, are then generated from X1 and X2:
(B.2) Y1 = X1
(B.3) Y2 = X2 − X1
(B.4) Y3 = 1 − X2
Finally, pi, qi, and ri are randomly matched with Y1, Y2, and Y3. This order randomization is necessary to preserve pi + qi + ri = 1 while still maintaining the same probability distribution for each variable. The resulting distribution is U-shaped, with the probability density near 0 twice as large as that near 1.
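The whole kernel can be sketched as follows. This is our reconstruction of the stick-breaking step; the variable names Y1, Y2, Y3 correspond to the three pieces described above.

```python
import random

def mutate_response(gamma=0.1, rng=random):
    # Two draws from a U-shaped Beta(gamma, gamma) distribution cut the
    # interval [0, 1] into three pieces that sum to 1.
    x1 = rng.betavariate(gamma, gamma)
    x2 = rng.betavariate(gamma, gamma)
    lo, hi = min(x1, x2), max(x1, x2)
    pieces = [lo, hi - lo, 1.0 - hi]   # Y1, Y2, Y3
    rng.shuffle(pieces)                # order randomization from the text
    p, q, r = pieces
    return p, q, r

random.seed(0)
p, q, r = mutate_response()
print(abs(p + q + r - 1.0) < 1e-9)  # True: the pieces always sum to one
```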
References
- Alexander RD. The biology of moral systems. Aldine de Gruyter; New York: 1987.
- Axelrod R, Hamilton WD. The evolution of cooperation. Science. 1981;211:1390–1396. doi: 10.1126/science.7466396.
- Axelrod R. The evolution of cooperation. Basic Books; New York: 1984.
- Axelrod R. An evolutionary approach to norms. Amer Polit Sci Rev. 1986;80:1095–1111.
- Bellman R. Dynamic programming. Princeton University Press; Princeton: 1957.
- Bochet O, Page T, Putterman L. Communication and punishment in voluntary contribution experiments. J Econ Behav Org. 2006;60:11–26.
- Boerlijst MC, Nowak MA, Sigmund K. The logic of contrition. J Theor Biol. 1997;185:281–293. doi: 10.1006/jtbi.1996.0326.
- Bowles S. Individual interactions, group conflicts, and the evolution of preferences. In: Durlauf SN, Young HP, editors. Social Dynamics. MIT Press; Cambridge, MA: 2001. pp. 155–190.
- Boyd R, Richerson PJ. Group selection among alternative evolutionarily stable strategies. J Theor Biol. 1990;145:331–342. doi: 10.1016/s0022-5193(05)80113-4.
- Boyd R, Richerson PJ. Punishment allows the evolution of cooperation (or anything else) in sizable groups. Ethol Sociobiol. 1992;13:171–195.
- Boyd R, Gintis H, Bowles S, Richerson PJ. Evolution of altruistic punishment. Proc Natl Acad Sci USA. 2003;100:3531–3535. doi: 10.1073/pnas.0630443100.
- Brandt H, Hauert C, Sigmund K. Punishment and reputation in spatial public goods games. Proc R Soc Lond B. 2003;270:1099–1104. doi: 10.1098/rspb.2003.2336.
- Brandt H, Sigmund K. The good, the bad and the discriminator – Errors in direct and indirect reciprocity. J Theor Biol. 2006;239:183–194. doi: 10.1016/j.jtbi.2005.08.045.
- Burnham TC, Johnson DP. The biological and evolutionary logic of human cooperation. Analyse & Kritik. 2005;27:113–135.
- Carlsmith K, Darley JM, Robinson PH. Why do we punish? Deterrence and just deserts as motives for punishment. J Pers Soc Psychol. 2002;83:284–299. doi: 10.1037/0022-3514.83.2.284.
- Clutton-Brock TH, Parker GA. Punishment in animal societies. Nature. 1995;373:209–216. doi: 10.1038/373209a0.
- Denant-Boemont L, Masclet D, Noussair CN. Punishment, counterpunishment and sanction enforcement in a social dilemma experiment. Econ Theory. 2007;33:145–167.
- Dreber A, Rand DG, Fudenberg D, Nowak MA. Winners don’t punish. Forthcoming in Nature. 2008. doi: 10.1038/nature06723.
- Fehr E, Gächter S. Cooperation and punishment in public goods experiments. Am Econ Rev. 2000;90:980–994.
- Fehr E, Gächter S. Altruistic punishment in humans. Nature. 2002;415:137–140. doi: 10.1038/415137a.
- Fehr E, Fischbacher U, Gächter S. Strong reciprocity, human cooperation and the enforcement of social norms. Hum Nature. 2002;13:1–25. doi: 10.1007/s12110-002-1012-7.
- Fowler JH. Altruistic punishment and the origin of cooperation. Proc Natl Acad Sci USA. 2005;102:7047–7049. doi: 10.1073/pnas.0500938102.
- Fudenberg D, Maskin E. Evolution and cooperation in noisy repeated games. Am Econ Rev. 1990;80:274–279.
- Fudenberg D, Tirole J. Game theory. MIT Press; Cambridge, MA: 1991.
- Gürerk O, Irlenbusch B, Rockenbach B. The competitive advantage of sanctioning institutions. Science. 2006;312:108–111. doi: 10.1126/science.1123633.
- Gintis H. Strong reciprocity and human sociality. J Theor Biol. 2000;206:169–179. doi: 10.1006/jtbi.2000.2111.
- Hauert C, Traulsen A, Brandt H, Nowak MA, Sigmund K. Via freedom to coercion: the emergence of costly punishment. Science. 2007;316:1905–1907. doi: 10.1126/science.1141588.
- Herrmann B, Thöni C, Gächter S. Antisocial punishment across societies. Science. 2008;319:1362–1367. doi: 10.1126/science.1153808.
- Imhof LA, Fudenberg D, Nowak MA. Evolutionary cycles of cooperation and defection. Proc Natl Acad Sci USA. 2005;102:10797–10800. doi: 10.1073/pnas.0502589102.
- Imhof LA, Nowak MA. Evolutionary game dynamics in a Wright–Fisher process. J Math Biol. 2006;52:667–681. doi: 10.1007/s00285-005-0369-8.
- Imhof LA, Fudenberg D, Nowak MA. Tit-for-tat or win-stay, lose-shift? J Theor Biol. 2007;247:574–580. doi: 10.1016/j.jtbi.2007.03.027.
- Kahneman D, Schkade D, Sunstein CR. Shared outrage and erratic awards: the psychology of punitive damages. J Risk Uncertainty. 1998;16:49–86.
- Kandori M. Social norms and community enforcement. Rev Econ Stud. 1992;59:63–80.
- Kraines D, Kraines V. Pavlov and the prisoner’s dilemma. Theory and Decision. 1989;26:47–79.
- Lindgren K. Evolutionary phenomena in simple dynamics. In: Langton C, et al., editors. Artificial Life II. Addison-Wesley; Redwood City, CA: 1991. pp. 295–312.
- Lindgren K, Nordahl MG. Evolutionary dynamics of spatial games. Physica D. 1994;75:292–309.
- Lotem A, Fishman MA, Stone L. Evolution of cooperation between individuals. Nature. 1999;400:226–227. doi: 10.1038/22247.
- May RM. More evolution of cooperation. Nature. 1987;327:15–17.
- Molander P. The optimal level of generosity in a selfish, uncertain environment. J Conflict Resolution. 1985;29:611–618.
- Nakamaru M, Iwasa Y. The coevolution of altruism and punishment: role of the selfish punisher. J Theor Biol. 2006;240:475–488. doi: 10.1016/j.jtbi.2005.10.011.
- Nowak MA. Stochastic strategies in the prisoner’s dilemma. Theor Popul Biol. 1990;38:93–112.
- Nowak MA. Five rules for the evolution of cooperation. Science. 2006;314:1560–1563. doi: 10.1126/science.1133755.
- Nowak MA, Sasaki A, Taylor C, Fudenberg D. Emergence of cooperation and evolutionary stability in finite populations. Nature. 2004;428:646–650. doi: 10.1038/nature02414.
- Nowak MA, Sigmund K. Oscillations in the evolution of reciprocity. J Theor Biol. 1989;137:21–26. doi: 10.1016/s0022-5193(89)80146-8.
- Nowak MA, Sigmund K. The evolution of stochastic strategies in the Prisoner’s Dilemma. Acta Appl Math. 1990;20:247–265.
- Nowak MA, Sigmund K. Tit for tat in heterogeneous populations. Nature. 1992;355:250–253.
- Nowak MA, Sigmund K. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner’s Dilemma. Nature. 1993;364:56–58. doi: 10.1038/364056a0.
- Nowak MA, Sigmund K. Evolution of indirect reciprocity by image scoring. Nature. 1998;393:573–577. doi: 10.1038/31225.
- Nowak MA, Sigmund K. Evolution of indirect reciprocity. Nature. 2005;437:1291–1298. doi: 10.1038/nature04131.
- Ohtsuki H, Iwasa Y. How should we define goodness? Reputation dynamics in indirect reciprocity. J Theor Biol. 2004;231:107–120. doi: 10.1016/j.jtbi.2004.06.005.
- Ohtsuki H, Iwasa Y. The leading eight: social norms that can maintain cooperation by indirect reciprocity. J Theor Biol. 2005;239:435–444. doi: 10.1016/j.jtbi.2005.08.008.
- Ostrom E, Walker J, Gardner R. Covenants with and without a sword: self-governance is possible. Am Pol Sci Rev. 1992;86:404–417.
- Ostrom E, Gardner R, Walker J. Rules, games, and common-pool resources. Univ. of Michigan Press; Ann Arbor: 1994.
- Pacheco JM, Santos FC, Chalub FACC. Stern-judging: a simple, successful norm which promotes cooperation under indirect reciprocity. PLoS Comput Biol. 2006;2:1634–1638. doi: 10.1371/journal.pcbi.0020178.
- Pacheco JM, Traulsen A, Nowak MA. Active linking in evolutionary games. J Theor Biol. 2006;243:437–443. doi: 10.1016/j.jtbi.2006.06.027.
- Page T, Putterman L, Unel B. Voluntary association in public goods experiments: reciprocity, mimicry and efficiency. Econ J. 2005;115:1032–1053.
- Panchanathan K, Boyd R. A tale of two defectors: the importance of standing for evolution of indirect reciprocity. J Theor Biol. 2005;224:115–126. doi: 10.1016/s0022-5193(03)00154-1.
- Rockenbach B, Milinski M. The efficient interaction of indirect reciprocity and costly punishment. Nature. 2006;444:718–723. doi: 10.1038/nature05229.
- Sanfey AG, Rilling JK, Aronson JA, Nystrom LE, Cohen JD. The neural basis of economic decision-making in the Ultimatum Game. Science. 2003;300:1755–1758. doi: 10.1126/science.1082976.
- Selten R, Hammerstein P. Gaps in Harley’s argument on evolutionarily stable learning rules and in the logic of tit for tat. Behav Brain Sci. 1984;7:115–116.
- Sigmund K, Hauert C, Nowak MA. Reward and punishment. Proc Natl Acad Sci USA. 2001;98:10757–10762. doi: 10.1073/pnas.161155698.
- Sigmund K. Punish or perish: enforcement and reciprocation in human collaboration. Trends Ecol Evol. 2008;22:593–600. doi: 10.1016/j.tree.2007.06.012.
- Sugden R. The economics of rights, cooperation and welfare. Blackwell; Oxford: 1986.
- Traulsen A, Nowak MA. Evolution of cooperation by multilevel selection. Proc Natl Acad Sci USA. 2006;103:10952–10955. doi: 10.1073/pnas.0602530103.
- Traulsen A, Nowak MA, Pacheco JM. Stochastic dynamics of invasion and fixation. Phys Rev E. 2006;74:011909. doi: 10.1103/PhysRevE.74.011909.
- Traulsen A, Pacheco JM, Nowak MA. Pairwise comparison and selection temperature in evolutionary game dynamics. J Theor Biol. 2007;246:522–529. doi: 10.1016/j.jtbi.2007.01.002.
- Trivers RL. The evolution of reciprocal altruism. Q Rev Biol. 1971;46:35–57.
- Wahl LM, Nowak MA. The continuous prisoner’s dilemma: I. Linear reactive strategies. J Theor Biol. 1999;200:307–321. doi: 10.1006/jtbi.1999.0996.
- Wahl LM, Nowak MA. The continuous prisoner’s dilemma: II. Linear reactive strategies with noise. J Theor Biol. 1999;200:323–338.
- Wilson DS. A theory of group selection. Proc Natl Acad Sci USA. 1975;72:143–146. doi: 10.1073/pnas.72.1.143.
- Wilson DS, Sober E. Reintroducing group selection to the human behavioral sciences. Behav Brain Sci. 1994;17:585–654.
- Wynne-Edwards VC. Animal dispersion in relation to social behaviour. Oliver and Boyd; London: 1962.
- Yamagishi T. The provision of a sanctioning system as a public good. J Pers Soc Psych. 1986;51:110–116.
- Yamagishi T. Seriousness of social dilemmas and the provision of a sanctioning system. Soc Psychol Quart. 1988;51:32–42.