PLOS One. 2014 Oct 29;9(10):e111278. doi: 10.1371/journal.pone.0111278

Optimal Cooperation-Trap Strategies for the Iterated Rock-Paper-Scissors Game

Zedong Bi 1, Hai-Jun Zhou 1,*
Editor: Luo-Luo Jiang2
PMCID: PMC4213018  PMID: 25354212

Abstract

In an iterated non-cooperative game, if all the players act to maximize their individual accumulated payoff, the system as a whole usually converges to a Nash equilibrium that benefits all players poorly. Here we show that such an undesirable destiny is avoidable in an iterated Rock-Paper-Scissors (RPS) game involving two rational players, X and Y. Player X has the option of proactively adopting a cooperation-trap strategy, which enforces complete cooperation from the rational player Y and leads to a highly beneficial and maximally fair situation for both players. The fact that a maximal degree of cooperation is achievable in such a competitive system with cyclic dominance of actions may stimulate further theoretical and empirical studies on how to resolve conflicts and enhance cooperation in human societies.

Introduction

The solution concept of Nash equilibrium (NE) plays a fundamental role both in classic game theory and in evolutionary game theory [1]–[4]. This concept is developed under the assumption that the players of a game system are sufficiently rational, so that they are able to learn accurately the strategies of the competing players and to optimize their own strategy accordingly. A Nash equilibrium is then a point in the strategy space of the game system such that no single player is able to achieve better performance by changing her/his own strategy in any way.

Many non-cooperative games have a unique NE. When such a game is played by highly rational players who act to maximize their individual accumulated payoff, the system will unavoidably converge to this unique equilibrium sooner or later. Unfortunately, the NE of a non-cooperative game is usually an unfavorable or even miserable destiny for all the players. Consider the two-player Prisoner's Dilemma (PD) game as a simple example. The cooperative situation of both players choosing not to confess is much better than the defection situation of both players choosing to confess, but the latter is the unique NE of this game while the former is not [5]. The Nash equilibrium theory therefore predicts that cooperation is unlikely to be sustained when rational players face a conflict between self-interest and group benefit. Yet cooperation is a ubiquitous phenomenon in human society at all levels, and it is also widely observed in various biological systems. Researchers have long been puzzled by these facts, and they have proposed a long list of microscopic mechanisms to explain the promotion and maintenance of cooperation [6]–[8].

In this paper we study the issue of cooperation in the iterated two-player Rock-Paper-Scissors (RPS) game, which is a fundamental non-cooperative game with cyclic dominance among its action choices (namely Rock beats Scissors, Scissors beats Paper, and Paper in turn beats Rock), see Fig. 1. This game has been widely used to study competition phenomena in society and biology, especially species diversity and pattern formation (see, for example, references [9]–[12]). While the NE theoretical framework assumes that the rational players of such a game behave passively, in the sense that they try to maximize individual gains by making best responses to the inferred/experienced strategy of the opponent, we assume that one of the players might act more proactively. An intelligent and rational player may ask the following question: how should I design my own strategy so that my rational opponent(s), in best response to me, will for sure adopt a certain strategy that is most beneficial to me? In later discussions we refer to such a strategy as a cooperation-trap (CT) strategy, as it has the effect of trapping an opponent in a cooperation state. When optimized, such a CT strategy offers high and maximally fair accumulated payoffs to both players.

Figure 1. The Rock-Paper-Scissors game.


(A) The payoff matrix. Each matrix element is the payoff of the row player X's action in competition with the column player Y's action. (B) The cyclic (non-transitive) dominance relationship among the three candidate actions: Rock ($R$) beats Scissors ($S$), $S$ beats Paper ($P$), and $P$ in turn beats $R$.

In a literature search for related studies, we found that an early paper of Grofman and Pool [13] investigated cooperation in the PD game from the same angle of intelligent design. In this pioneering but largely forgotten paper, the authors proved that a partial Tit-for-Tat strategy [14] has the potential of enforcing cooperation in the two-player iterated PD game. The Win-stay, Lose-shift strategy [15], [16] can also be analyzed in a similar way.

The present effort can be regarded as an extension of the Grofman-Pool theory to the iterated RPS game, which has the additional difficulty of having more than two action choices that are related by a rotation symmetry (see Fig. 1B). This same theoretical framework may also be applicable to many other two-player iterated non-cooperative games, and it may serve as a guiding principle of designing fair solutions or strategies for the purpose of resolving conflicts and enhancing cooperation in human societies.

Results

The Rock-Paper-Scissors game

Consider two players X and Y playing the RPS game for an indefinite number of rounds. At every game round each player can choose one action among three candidate actions Inline graphic (rock), Inline graphic (paper) and Inline graphic (scissors). This game has only a single parameter, the payoff Inline graphic (Inline graphic) of the winning action (see Fig. 1A). For example, if the player X chooses action Inline graphic in one game round and her opponent Y chooses action Inline graphic, then X wins with payoff Inline graphic and Y loses and gets zero payoff; if the competition is a tie with both players choosing the same action, each player gets unit payoff.

When Inline graphic the system has only a unique NE and it is mixed-strategy in nature, namely each player chooses the three actions with equal probability Inline graphic at every game round independently of each other and of the prior action choices [2]. In this mixed-strategy NE the expected payoff per round (EPR) for each player is then simply Inline graphic. We refer to Inline graphic as the NE payoff. For Inline graphic the NE payoff is less than the unit payoff value each player would get if both players choose the same action in every game round, and consequently the NE is evolutionarily unstable in this parameter range. When Inline graphic the NE mixed strategy outperforms the pure strategy of both players choosing the same action, and the NE is then evolutionarily stable [3], [4], [17] and it is the converging point of various dynamical learning processes [18].
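The NE payoff described above can be checked directly from the payoff rules (a win pays the winner's payoff, a tie pays one unit, a loss pays zero). The following is a minimal sketch, not the authors' code: the symbol `a` for the winner's payoff and the names `payoff`, `expected_payoff` and `ne_payoff` are assumptions introduced here, because the inline symbols of this copy were lost. Under these rules the uniform mix yields (1 + a)/3 per round, which falls below the unit tie payoff exactly when a < 2.

```python
from fractions import Fraction

ACTIONS = ("R", "P", "S")
# Cyclic dominance: each key beats its value.
BEATS = {"R": "S", "S": "P", "P": "R"}


def payoff(own, other, a):
    """Payoff of `own` against `other`; `a` (assumed name) is the winner's payoff."""
    if own == other:
        return Fraction(1)          # tie: unit payoff for each player
    return Fraction(a) if BEATS[own] == other else Fraction(0)


def expected_payoff(own_mix, other_mix, a):
    """Expected payoff per round of a player mixing with `own_mix`."""
    return sum(own_mix[x] * other_mix[y] * payoff(x, y, a)
               for x in ACTIONS for y in ACTIONS)


UNIFORM = {x: Fraction(1, 3) for x in ACTIONS}


def ne_payoff(a):
    """Expected payoff per round when both players mix uniformly."""
    return expected_payoff(UNIFORM, UNIFORM, a)


if __name__ == "__main__":
    for a in (Fraction(3, 2), 2, 5):
        # Every pure action earns the same against a uniform opponent,
        # which is why no unilateral deviation helps.
        pure = {x: expected_payoff({y: Fraction(int(y == x)) for y in ACTIONS},
                                   UNIFORM, a) for x in ACTIONS}
        print(a, ne_payoff(a), pure)
```

Exact rational arithmetic is used so that the comparison with the unit tie payoff at the boundary value is not blurred by rounding.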

Memoryless cooperation-trap strategies

We now develop CT strategies for player X, and begin with the simplest case of memoryless strategies: at every game round player X considers neither her and her opponent's prior actions nor the outcomes of prior plays, but chooses actions Inline graphic, Inline graphic and Inline graphic according to the corresponding probabilities Inline graphic, Inline graphic, and Inline graphic (Inline graphic), which are fixed by player X at the beginning of the whole game. Without loss of generality we assume that Inline graphic and Inline graphic, i.e., action Inline graphic is a favorite choice of X.

As player Y is sufficiently intelligent, he will figure out the strategy of X after a small number of game repeats. Alternatively, with the aim of promoting cooperation from player Y, player X may also explicitly inform Y about her strategy parameters, which are Inline graphic and Inline graphic in the present case. And since Y is sufficiently rational, he then for sure will adopt the optimized probabilities Inline graphic, Inline graphic, and Inline graphic (Inline graphic) of choosing the three actions Inline graphic, Inline graphic and Inline graphic. The EPR Inline graphic of player X and the optimized EPR Inline graphic of player Y are

graphic file with name pone.0111278.e042.jpg (1)
graphic file with name pone.0111278.e043.jpg (2)
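Equations (1) and (2) are not recoverable from this copy, but the quantities they describe follow from the payoff rules. The sketch below computes player Y's expected payoff for each pure action against a fixed memoryless mix of player X, picks Y's best response, and evaluates X's resulting expected payoff per round. The names `a`, `p_x`, `y_pure_payoffs`, `best_responses` and `x_payoff`, as well as the example numbers, are illustrative assumptions and are not taken from the paper.

```python
from fractions import Fraction

ACTIONS = ("R", "P", "S")
BEATS = {"R": "S", "S": "P", "P": "R"}   # each key beats its value


def payoff(own, other, a):
    """Payoff of `own` against `other`; `a` (assumed name) is the winner's payoff."""
    if own == other:
        return Fraction(1)
    return Fraction(a) if BEATS[own] == other else Fraction(0)


def y_pure_payoffs(p_x, a):
    """Y's expected payoff per round for each pure action against X's mix p_x."""
    return {y: sum(p_x[x] * payoff(y, x, a) for x in ACTIONS) for y in ACTIONS}


def best_responses(p_x, a):
    """All pure actions that maximize Y's expected payoff, and that payoff."""
    g = y_pure_payoffs(p_x, a)
    best = max(g.values())
    return [y for y in ACTIONS if g[y] == best], best


def x_payoff(p_x, y_action, a):
    """X's expected payoff per round when Y sticks to the pure action y_action."""
    return sum(p_x[x] * payoff(x, y_action, a) for x in ACTIONS)


# Hypothetical example: X favors R, never plays P, and a = 4.
p_x = {"R": Fraction(3, 5), "P": Fraction(0), "S": Fraction(2, 5)}
actions, g_y = best_responses(p_x, 4)
g_x = x_payoff(p_x, actions[0], 4)
print(actions, g_y, g_x)   # ['P'] 12/5 8/5
```

In this example Y has a single best response, so X can predict his action exactly; Y earns more than X, in line with the free-riding concern discussed for the memoryless case.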

If the strategy of player X has the property that Inline graphic, then, because action Inline graphic is strictly the least favored choice of player X, player Y realizes that it is in his best interest to choose action Inline graphic in every game round (Inline graphic) if Inline graphic, but to choose action Inline graphic in every game round (Inline graphic) if Inline graphic. In other words, player X traps player Y in a pure strategy, which has the maximal degree of predictability. Player X should of course choose the strategy parameters Inline graphic and Inline graphic to maximize her EPR Inline graphic under the constraint of not destroying the trapping effect of her strategy. It is not difficult to verify the following conclusions:

  1. If the payoff parameter Inline graphic, the optimal CT strategy is
    graphic file with name pone.0111278.e056.jpg (3)
    (Here and in later discussions, Inline graphic is an arbitrarily small positive value.) The associated maximal EPR for player X is Inline graphic, while player Y is very satisfied with sticking to action Inline graphic and getting a larger EPR of Inline graphic. To give a concrete example, at Inline graphic we have Inline graphic and Inline graphic, which are considerably larger than the NE payoff Inline graphic.
  2. If Inline graphic, the optimal CT strategy is
    graphic file with name pone.0111278.e066.jpg (4)
    The associated optimal EPR of player X is Inline graphic, while player Y receives a larger EPR value of Inline graphic by sticking to action Inline graphic. Notice that when Inline graphic is sufficiently large, Inline graphic and Inline graphic, which are almost Inline graphic times that of the NE payoff Inline graphic.

Figure 2 shows how the optimal EPRs of both players and the optimal CT strategy of player X change with Inline graphic. This optimal memoryless CT strategy indeed offers both players higher accumulated payoffs than the NE mixed strategy does. However, the passive player Y benefits more than the proactive player X. It is then natural for player X to feel that she has sacrificed too much for enforcing cooperation and to declare that such a CT strategy, although better than the NE mixed strategy, is unfair as her opponent earns more by free riding. Furthermore, this CT strategy is worse than the NE mixed strategy in the parameter range of Inline graphic.

Figure 2. Optimal memoryless CT strategy.


The optimal values of both players' expected payoff per round Inline graphic and Inline graphic are shown in the upper panel (in units of NE payoff Inline graphic) for each fixed value of Inline graphic, while the optimal values of the CT strategy's choice probabilities Inline graphic, Inline graphic and Inline graphic are shown in the lower panel. When Inline graphic the NE mixed strategy is better for player X than the CT strategy.

These shortcomings of the memoryless CT strategy can be eliminated by increasing the memory length of the CT strategy.

Cooperation-trap strategies with finite memory length

Recent laboratory experiments carried out at Zhejiang University [19] revealed that the decision-making of human subjects has a strong memory effect, namely the payoffs of the previous game rounds considerably influence a player's action choices in the following game rounds. For the RPS game, the implications of such conditional response strategies have not yet been fully explored. Here we suggest that the proactive player X can adopt an optimized version of such a strategy to enforce fair cooperation.

When the payoff parameter Inline graphic, a play output of win-lose brings payoff Inline graphic to the group, while a tie output only brings lower payoff Inline graphic. Therefore it is desirable for player X to discourage the occurrence of tie output. For the simplest case of unit memory length, the CT strategy then goes as follows: If player X wins over or loses to player Y in the previous game round, then in the next round she chooses action Inline graphic with probability Inline graphic and action Inline graphic with probability Inline graphic (she avoids choosing action Inline graphic, i.e., Inline graphic); but if X ties with Y in the previous game round, then in the next round she chooses the three candidate actions with equal probability Inline graphic. This strategy has only a single parameter Inline graphic. The motivation for player X to adopt the NE mixed strategy after experiencing a tie output is to discourage player Y from choosing action Inline graphic: although Y might get a higher expected payoff in one game round by choosing action Inline graphic rather than action Inline graphic, the former choice has a high probability of leading to a tie, which will then reduce player Y's expected payoff to Inline graphic in the following one or even more game rounds.
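The unit-memory rule described above can be written as a small state machine and simulated. This is a hedged sketch under several assumptions that are not confirmed by this copy of the text: the action symbols were lost, so the code assumes one assignment consistent with the cyclic dominance (X mixes R and S and avoids P, the cooperating Y plays P, the tempted Y plays R); the first round is assumed to use the default mix; and `a`, `rho`, `x_action` and `simulate` are names introduced here.

```python
import random

BEATS = {"R": "S", "S": "P", "P": "R"}   # each key beats its value


def round_payoffs(x, y, a):
    """Payoffs (to X, to Y) of one round; `a` (assumed name) is the winner's payoff."""
    if x == y:
        return 1.0, 1.0
    return (a, 0.0) if BEATS[x] == y else (0.0, a)


def x_action(prev_tie, rho, rng):
    """Unit-memory rule for X (assumed action assignment, see lead-in)."""
    if prev_tie:
        return rng.choice("RPS")                 # NE mix after a tie
    return "R" if rng.random() < rho else "S"    # default mix, never P


def simulate(y_action, a, rho=0.5, rounds=10_000, seed=0):
    """Average payoffs of X and Y, and the tie count, when Y repeats y_action."""
    rng = random.Random(seed)
    prev_tie = False            # assumption: the first round uses the default mix
    total_x = total_y = 0.0
    ties = 0
    for _ in range(rounds):
        x = x_action(prev_tie, rho, rng)
        g_x, g_y = round_payoffs(x, y_action, a)
        total_x += g_x
        total_y += g_y
        prev_tie = (x == y_action)
        ties += prev_tie
    return total_x / rounds, total_y / rounds, ties


if __name__ == "__main__":
    for y in "PRS":
        print(y, simulate(y, a=9.0))
```

When Y repeats P against this rule, no tie can occur, so every round is a win-lose round and the two average payoffs always add up to `a`. Repeating R instead produces ties, each of which is followed by a round in which X plays the NE mix.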

On the other hand, when Inline graphic, a tie output is better for the group than a win-lose output. Then player X has the option of implementing a CT strategy to discourage player Y from either winning over or losing to her. Again for the simplest case of unit memory length, the recipe of the CT strategy is: If player X ties with player Y in the previous game round, then in the next round she chooses action Inline graphic with probability Inline graphic and action Inline graphic with probability Inline graphic; but if X either wins over or loses to Y in the previous round, then in the next round she chooses the three candidate actions with equal probability Inline graphic.

It turns out that the optimal CT strategy of unit memory length has the following quantitative properties:

  1. If Inline graphic, then the optimal value Inline graphic for the choice probability Inline graphic is Inline graphic, and the optimal EPRs of player X and player Y are equal, Inline graphic.

  2. If Inline graphic, then Inline graphic, and the optimal EPRs for X and Y are, respectively, Inline graphic and Inline graphic.

  3. If Inline graphic, then Inline graphic, and the optimal EPRs for X and Y are, respectively, Inline graphic and Inline graphic.

  4. If Inline graphic, then Inline graphic, and the optimal EPRs for X and Y are equal, Inline graphic.

Figure 3 gives a direct view of these properties. Compared with the optimal memoryless CT strategy of Fig. 2, a major qualitative improvement is that this new optimal CT strategy achieves fair outcomes for player X and player Y when Inline graphic or Inline graphic. However, this optimal CT strategy of unit memory length is still not perfect, as it is not applicable for Inline graphic, and it is not completely fair to the proactive player X for Inline graphic.

Figure 3. Optimal CT strategy of unit memory length.


The optimal values of both players' expected payoff per round Inline graphic and Inline graphic are shown in the upper panel (in units of NE payoff Inline graphic) for each fixed value of Inline graphic, while the optimal values of the CT strategy's choice probabilities Inline graphic, Inline graphic and Inline graphic are shown in the lower panel. When Inline graphic the NE mixed strategy is better for player X than the CT strategy.

To completely eliminate these undesirable features, player X can increase the memory length of her CT strategy and therefore be less tolerant of defection. There are many ways of implementing such an idea. When Inline graphic, arguably the simplest CT strategy of memory length Inline graphic goes as follows: By default player X adopts the mixed strategy Inline graphic in every game round, namely she chooses action Inline graphic with probability Inline graphic and action Inline graphic with the remaining probability Inline graphic; however, if a tie occurs in one game round, then player X shifts to the NE mixed strategy Inline graphic in the next Inline graphic game rounds and then shifts back to the default strategy Inline graphic in the (m+1)-th game round. It is a simple exercise to check that, if

graphic file with name pone.0111278.e144.jpg (5)

where Inline graphic, then player Y will be satisfied with sticking to action Inline graphic in every game round. If player X sets the memory length to the smallest positive integer Inline graphic which reduces Eq. (5) to the trivial requirement of Inline graphic, then it is optimal for player X to set Inline graphic to the value Inline graphic, and the optimal EPRs for player X and player Y are equal, Inline graphic. Notice that as Inline graphic approaches Inline graphic from above with Inline graphic, the required minimal memory length diverges as Inline graphic. In other words, it is most difficult to enforce fair cooperation when Inline graphic, see Fig. 4.
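Because Eq. (5) is not recoverable from this copy, the following sketch re-derives Y's incentive from the verbal description with a renewal argument rather than reproducing the authors' formula. It rests on assumptions stated here and in the comments: `a` is the winner's payoff (win-lose regime, a > 2), X's default mix plays R with probability `rho` and S otherwise, the cooperating Y plays P, a tie triggers `m` rounds of the NE mix, ties during those `m` rounds do not extend the punishment, and Y stays put when he is indifferent. The function names are hypothetical.

```python
from fractions import Fraction


def ne_payoff(a):
    """Expected payoff per round when X mixes uniformly (any Y action earns this)."""
    return (1 + Fraction(a)) / 3


def y_long_run_payoffs(a, rho, m):
    """Y's long-run payoff per round for each pure action used in default rounds.

    Renewal argument: one default round, followed with the tie probability
    by m punishment rounds that each pay Y the NE payoff.
    """
    a, rho = Fraction(a), Fraction(rho)
    g0 = ne_payoff(a)
    # Y plays P: beats R (prob rho), loses to S; a tie never occurs.
    comply = a * rho
    # Y plays R: ties with R (prob rho, payoff 1), beats S (payoff a).
    dev_r = (rho + a * (1 - rho) + rho * m * g0) / (1 + rho * m)
    # Y plays S: loses to R, ties with S (prob 1 - rho, payoff 1).
    tie = 1 - rho
    dev_s = (tie + tie * m * g0) / (1 + tie * m)
    return {"P": comply, "R": dev_r, "S": dev_s}


def minimal_memory(a, rho=Fraction(1, 2), m_max=10_000):
    """Smallest m for which sticking to P is at least as good as deviating."""
    for m in range(1, m_max + 1):
        g = y_long_run_payoffs(a, rho, m)
        if g["P"] >= max(g["R"], g["S"]):
            return m
    return None   # no memory length up to m_max is sufficient


if __name__ == "__main__":
    for a in (9, 5, 3, Fraction(5, 2)):
        print(a, minimal_memory(a))
```

For the equal split `rho = 1/2`, the condition in this sketch works out to m(a - 2) >= 6, so the required memory length grows without bound as `a` approaches 2 from above, which agrees qualitatively with the divergence noted in the text.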

Figure 4. Optimal CT strategy of finite memory length.


The optimal values of both players' expected payoff per round Inline graphic and Inline graphic are shown in the upper panel (in units of NE payoff Inline graphic) for each fixed value of Inline graphic, while the minimal memory length Inline graphic of the CT strategy is shown in the lower panel.

If the payoff parameter Inline graphic, an optimal CT strategy with memory length Inline graphic can be constructed following the same line of reasoning as above, namely that player X adopts action Inline graphic at every game round, but if she loses to player Y in one game round, then she shifts to the NE mixed strategy in the next Inline graphic game rounds and then shifts back to the default strategy Inline graphic in the Inline graphic-th game round. We can easily verify that if player X sets the memory length to be Inline graphic, then it is optimal for player Y to stick to action Inline graphic in every game round, and the optimal EPRs for both players are equal, Inline graphic.
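The same line of reasoning can be sketched for the tie-favoring regime. The code below is an illustration under assumptions, not the authors' derivation: `a` (assumed name) is the winner's payoff with a < 2, X's default action is taken to be R, Y cooperates by also playing R so that every round is a tie, only a loss of X triggers the `m` rounds of NE mix, further losses during those rounds do not extend the punishment, and Y stays put when he is indifferent.

```python
from fractions import Fraction


def y_payoffs_low_a(a, m):
    """Y's long-run payoff per round for each pure action used in default rounds."""
    a = Fraction(a)
    g0 = (1 + a) / 3   # payoff per round while X plays the NE mix
    return {
        "R": Fraction(1),                # tie in every round
        "P": (a + m * g0) / (1 + m),     # one win, then m rounds at the NE payoff
        "S": Fraction(0),                # loses to R; X winning is not a trigger
    }


def minimal_memory_low_a(a, m_max=10_000):
    """Smallest positive m for which sticking to the tie is at least as good."""
    for m in range(1, m_max + 1):
        g = y_payoffs_low_a(a, m)
        if g["R"] >= max(g["P"], g["S"]):
            return m
    return None   # no memory length up to m_max is sufficient


if __name__ == "__main__":
    for a in (Fraction(6, 5), Fraction(3, 2), Fraction(19, 10)):
        print(a, minimal_memory_low_a(a))
```

In this sketch the condition reduces to m(2 - a) >= 3(a - 1), so the required memory length again grows as `a` approaches 2, this time from below, and both players earn the unit payoff per round once Y sticks to the tie.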

As clearly demonstrated in Fig. 4, for each payoff parameter Inline graphic, an optimal CT strategy with a finite memory length Inline graphic can be implemented to achieve maximal and fair accumulated payoff for both players. At Inline graphic, there is no need to adopt a CT strategy, as the NE mixed strategy is itself optimal.

Discussion

We have demonstrated in this paper that fair cooperation can be achieved in the two-player iterated RPS game. Such a highly cooperative state brings maximal accumulated payoff to the group, and it is not enforced by external authorities but by the proactive decision of one player to adopt an optimal cooperation-trap strategy. The basic design principle of such optimal CT strategies should be generally applicable to other two-player iterated non-cooperative games.

For the optimal CT strategies to work, the passive player Y is assumed to be considerably rational so that he adopts a best response strategy to that of his opponent X to maximize his accumulated payoff, while the proactive player X is assumed in addition to be wise enough so that she does not exploit the cooperation state of her opponent too much but is satisfied with a fair share of the total accumulated group payoff. This latter assumption might be a little bit too strong, but maybe it is not strictly necessary as player Y will punish X for defection behaviors.

For the iterated RPS game, it appears to be impossible for the proactive player X to design a CT strategy which brings higher expected payoff per game round to herself than to her opponent. However, this is not a general conclusion. For some other game systems (notably the iterated PD game [13], [20]), the proactive player X has the option of optimizing her CT strategy to extort her opponent Y. We do not recommend the adoption of such greedy strategies, as the opponent player Y will very likely be frustrated by the defection behaviors of player X and he may then choose not to cooperate even if such a choice also hurts himself [21].

When strategic interactions occur in biological systems [9], [10], the individuals involved (animals, insects, bacteria, cells, and so on) are of course far from being rational or sufficiently intelligent. However, the collective decision-making of such agents at the population level, aided by the evolutionary mechanism of mutation and selection, may appear to be very rational. By trial and error, such systems may develop certain CT-like strategies even without the need for intelligent design. It would be very interesting to investigate empirically whether CT strategies are actually implemented in some biological systems, such as in the formation of a symbiotic relationship between two species.

Cooperation in a finite-population RPS game system with more than two players may be much more difficult to achieve than in the case of two players. A recent theoretical investigation by one of the present authors [19] suggested that optimized conditional response strategies might offer higher accumulated payoffs to individual players than the NE mixed strategy does. But it is still an open question whether a high degree of cooperation can also be enforced in a multiple-player iterated RPS game by a number of proactive players. The case of multiple players interacting through a ring topology might serve as the simplest model system to study. We leave such a challenging issue to future investigations.

The iterated two-player RPS game might also serve as a simple system to quantitatively measure the degree of rationality of single human subjects. For example, an experiment can be arranged as follows. A human subject Y plays repeatedly with a fixed opponent X, which is actually a computer implementing an optimal CT strategy. But Y does not know that he is playing with a computer and assumes he is playing with another human subject. By analyzing the evolution trajectory of player Y's action choices, we may quantitatively measure the learning behavior of player Y and his tendency to make rational decisions. We are discussing with colleagues the possibility of carrying out such an experimental study.

Acknowledgments

HJZ thanks Zhijian Wang and Bin Xu for a recent fruitful collaboration on the finite-population Rock-Paper-Scissors game, which inspired the present work greatly, and thanks Jiping Huang for comments on the manuscript.

Data Availability

The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are within the paper.

Funding Statement

Funding provided by The National Basic Research Program of China (grant number 2013CB932804) to HJZ and The National Science Foundation of China (grant numbers 11121403 and 11225526) to HJZ. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Nash JF (1950) Equilibrium points in n-person games. Proc Natl Acad Sci USA 36: 48–49.
  • 2. Osborne MJ, Rubinstein A (1994) A Course in Game Theory. New York: MIT Press.
  • 3. Maynard Smith J, Price GR (1973) The logic of animal conflict. Nature 246: 15–18.
  • 4. Weibull JM (1995) Evolutionary Game Theory. Cambridge, MA: MIT Press.
  • 5. Fisher L (2008) Rock, Paper, Scissors: Game Theory in Everyday Life. New York: Basic Books.
  • 6. Axelrod R (1984) The Evolution of Cooperation. New York: Basic Books.
  • 7. Kollock P (1998) Social dilemmas: The anatomy of cooperation. Annu Rev Sociol 24: 183–214.
  • 8. Nowak MA, Highfield R (2011) SuperCooperators: Altruism, Evolution, and Why We Need Each Other to Succeed. New York: Free Press.
  • 9. Sinervo B, Lively C (1996) The rock-paper-scissors game and the evolution of alternative male strategies. Nature 380: 240–243.
  • 10. Kerr B, Riley MA, Feldman MW, Bohannan BJM (2002) Local dispersal promotes biodiversity in a real-life game of rock-paper-scissors. Nature 418: 171–174.
  • 11. Reichenbach T, Mobilia M, Frey E (2007) Mobility promotes and jeopardizes biodiversity in rock-paper-scissors games. Nature 448: 1046–1049.
  • 12. Jiang LL, Zhou T, Perc M, Wang BH (2011) Effects of competition on pattern formation in the rock-paper-scissors game. Phys Rev E 84: 021912.
  • 13. Grofman B, Pool J (1977) How to make cooperation the optimizing strategy in a two-person game. J Math Sociol 5: 173–186.
  • 14. Rapoport A, Chammah AM (1965) Prisoner's Dilemma: A Study in Conflict and Cooperation. Ann Arbor, Michigan: University of Michigan Press.
  • 15. Kraines D, Kraines V (1993) Learning to cooperate with pavlov: an adaptive strategy for the iterated prisoner's dilemma with noise. Theory and Decision 35: 107–150.
  • 16. Nowak M, Sigmund K (1993) A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner's dilemma game. Nature 364: 56–58.
  • 17. Taylor PD, Jonker LB (1978) Evolutionarily stable strategies and game dynamics. Mathematical Biosciences 40: 145–156.
  • 18. Sandholm WM (2010) Population Games and Evolutionary Dynamics. New York: MIT Press.
  • 19. Wang Z, Xu B, Zhou HJ (2014) Social cycling and conditional responses in the rock-paper-scissors game. Sci Rep 4: 5830.
  • 20. Press WH, Dyson FJ (2012) Iterated prisoner's dilemma contains strategies that dominate any evolutionary opponent. Proc Natl Acad Sci USA 109: 10409–10413.
  • 21. Hilbe C, Röhl T, Milinski M (2014) Extortion subdues human players but is finally punished in the prisoner's dilemma. Nature Commun 5: 3976.
