Proceedings of the National Academy of Sciences of the United States of America. 1998 Nov 10;95(23):13755–13758. doi: 10.1073/pnas.95.23.13755

Working memory constrains human cooperation in the Prisoner’s Dilemma

Manfred Milinski 1,*, Claus Wedekind 1
PMCID: PMC24892  PMID: 9811873

Abstract

Many problems in human society reflect the inability of selfish parties to cooperate. The “Iterated Prisoner’s Dilemma” has been used widely as a model for the evolution of cooperation in societies. Axelrod’s computer tournaments and the extensive simulations of evolution by Nowak and Sigmund and others have shown that natural selection can favor cooperative strategies in the Prisoner’s Dilemma. Rigorous empirical tests, however, lag behind the progress made by theorists. Clear predictions differ depending on the players’ capacity to remember previous rounds of the game. To test whether humans use the kind of cooperative strategies predicted, we asked students to play the iterated Prisoner’s Dilemma game either continuously or interrupted after each round by a secondary memory task (i.e., playing the game “Memory”) that constrained the students’ working-memory capacity. When playing without interruption, most students used “Pavlovian” strategies, as predicted for greater memory capacity, and the rest used “generous tit-for-tat” strategies. The proportion of generous tit-for-tat strategies increased when games of Memory interfered with the subjects’ working memory, as predicted. Students who continued to use complex Pavlovian strategies were less successful in the Memory game, but more successful in the Prisoner’s Dilemma, which indicates a trade-off in memory capacity for the two tasks. Our results suggest that the set of strategies predicted by game theorists approximates human reality.


The “iterated Prisoner’s Dilemma” has become the paradigm for the evolution of the cooperation of egoists (1–4). Players in this game can either cooperate or defect (not cooperate). If they cooperate, both do better than if both defect. If one player defects while the other cooperates, the defector gets the highest reward, and the cooperator gets the lowest. A rational player should defect no matter what his opponent does if they play only one round; thus, both players will end up with a much lower reward than they would have received if they had cooperated; hence, the dilemma. If the game is played repeatedly by the same players, cooperation by reciprocation (5) is possible (1, 6–8).

In computer simulations of evolution with randomly generated mixtures of stochastic strategies that respond only to the opponent’s last move (memory-1 strategies), the “generous tit-for-tat” (GTFT) strategy was the evolutionary end product (7). GTFT players usually copy their partner’s last choice but are sometimes cooperative after their partner’s defection. When strategies were allowed that also react to the player’s own previous move (memory-2 strategies), a new winning strategy emerged (8), the “win-stay, lose-shift” or “Pavlov” strategy. With this strategy, one cooperates after both players have cooperated with probability Pcc = 1, after one has cooperated and one’s partner has defected with Pcd = 0, after one has defected and one’s partner has cooperated with Pdc = 0, and after both defected with Pdd = (almost) 1. The greatest difference between Pavlovian and GTFT strategies is in Pdc, which is 1 for GTFT. Pavlovian players cooperate with GTFT players and other Pavlovian players, exploit unconditional cooperators, and are more heavily exploited by unconditional defectors than GTFT players. When longer memories were allowed (9–11), memory-4 strategies evolved (taking into account the previous two choices of both players) that were Pavlovian but answered a single defection by defecting twice; their Pdd would be lower than that of strict Pavlovian memory-2 strategies.
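The strategies described above can be written as tables of cooperation probabilities. The following is a minimal sketch, not the simulation code of refs. 7 and 8: the names `STRICT_PAVLOV`, `gtft`, `next_move`, and `play` are hypothetical, the `generosity` parameter is an assumed free value, and Pdd for Pavlov is set to exactly 1 rather than "almost" 1.

```python
import random

# Probability of cooperating, keyed by (own last move, partner's last move).
# Strict Pavlov ("win-stay, lose-shift"): Pcc = 1, Pcd = 0, Pdc = 0, Pdd = 1.
STRICT_PAVLOV = {("C", "C"): 1.0, ("C", "D"): 0.0, ("D", "C"): 0.0, ("D", "D"): 1.0}


def gtft(generosity):
    """GTFT: copy the partner's last move, but cooperate after a defection
    with probability `generosity` (an assumed, illustrative parameter)."""
    return {("C", "C"): 1.0, ("C", "D"): generosity,
            ("D", "C"): 1.0, ("D", "D"): generosity}


def next_move(strategy, own_last, partner_last, rng):
    """Draw the next move from the strategy's cooperation probability."""
    return "C" if rng.random() < strategy[(own_last, partner_last)] else "D"


def play(strategy_a, strategy_b, rounds, rng, first=("C", "C")):
    """Return the list of (move of A, move of B) pairs, starting from `first`."""
    a, b = first
    history = [(a, b)]
    for _ in range(rounds - 1):
        # Both players choose simultaneously from the previous round.
        a, b = next_move(strategy_a, a, b, rng), next_move(strategy_b, b, a, rng)
        history.append((a, b))
    return history
```

Starting two strict Pavlov players from a single mistaken defection, `play(STRICT_PAVLOV, STRICT_PAVLOV, 4, random.Random(0), first=("D", "C"))` passes through one round of mutual defection and then returns to mutual cooperation, whereas two `gtft(0.0)` players (i.e., strict TFT) alternate defections until one of them is generous.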

Humans have been and still are under Prisoner’s Dilemma selection (12) and thus could have evolved or learned suitable strategies. They are able to cooperate in the Prisoner’s Dilemma (13, 14). A rigorous test of the results of simulations of evolution (7–11) would be to constrain the subjects’ working-memory capacity experimentally; compared with controls, these subjects should use the predicted memory-1 strategy GTFT more often. In a study with unconstrained subjects, only a minority used GTFT; the rest used Pavlovian strategies (14). We further predict that memory-constrained subjects who adopt a GTFT strategy will profit from a trade-off of their memory capacity.

In our previous study (14), we found that the Pavlovian strategy actually used was both more complex (i.e., its Pdd was much smaller than 1) and significantly more successful than strict Pavlov. Because complex strategies are harder to remember and implement, there are costs for complexity (15). Theorists have dealt with complexity in various plausible ways (16–18). A native efficient strategy can only be invaded by strategies that are as efficient but simpler (15–18). A strategy will be displaced by another one of greater complexity if the resulting improvement in game payoffs is sufficiently large (18). Thus, if complex Pavlovian strategies become more costly when the subjects’ memory capacity is constrained, we predict, by using this theory, that either simpler Pavlovian or even less complex memory-1 strategies (i.e., the strategies that survived simulations of evolution with short strategic memories) will emerge (6–8).

Humans have multiple memory systems that include a working (short-term) memory and various long-term systems (19–25). Whereas long-term memory has an enormous capacity for storage, coupled with relatively slow input and retrieval, working memory has a limited capacity for storage with rapid input and retrieval (19, 20). Often, a single training trial produces only short-term memory (22). Unrehearsed information is forgotten within seconds or sometimes minutes (19, 22); the most recent information is forgotten last (19).

By supplying subjects with an additional memory task (i.e., playing the game “Memory”) after each choice in the Prisoner’s Dilemma, many factors will work to reduce the number of choices that will be remembered: (i) the recency effect is dissipated by the intervening task; (ii) the additional memory task uses up storage capacity; (iii) the additional memory task prevents rehearsal of previous choices and thus their transfer into long-term memory; and (iv) because the second task takes time, stored information is more likely to decay. If playing Memory reduces the subjects’ working-memory capacity for the Prisoner’s Dilemma, we predict that they will either use simpler Pavlovian strategies or switch to memory-1 strategies (i.e., any GTFT).

METHODS AND RESULTS

First-year biology students of Bern University were assigned either to a constrained-memory group (1996 experiment, n = 16; 1997 experiment, n = 16) or to an unconstrained-memory group (1996 experiment, n = 14; 1997 experiment, n = 16). All groups played the simultaneous Prisoner’s Dilemma (T = 4, R = 3, P = 1, S = 0; ref. 14). A cooperating player receives R points from a cooperating partner but only S points from a defecting one. A defecting player gets T points from a cooperating partner but only P points from a defecting one. In the first session, to learn the game in a social situation, each subject played four games against randomly assigned members of the group. Each player chose to cooperate or defect by lifting a card labeled either “C” or “D”. The most recent pair of choices and payoffs were displayed simultaneously on a large screen opposite the players and the group after both players had chosen. An on-line program randomly terminated each game (14); up to 24 choices occurred.
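The payoff scheme can be sketched in a few lines. The values T = 4, R = 3, P = 1, S = 0 are those reported above; the table follows the standard convention that mutual defection pays P to each player, and the names `PAYOFF`, `payoff`, and `mean_payoff` are hypothetical helpers introduced only for illustration.

```python
# Payoff values reported for the experiment.
T, R, P, S = 4, 3, 1, 0

# Payoff to the focal player, keyed by (own choice, partner's choice).
PAYOFF = {
    ("C", "C"): R,  # mutual cooperation
    ("C", "D"): S,  # cooperator facing a defector
    ("D", "C"): T,  # defector facing a cooperator
    ("D", "D"): P,  # mutual defection
}

# The usual Prisoner's Dilemma conditions hold for these values.
assert T > R > P > S
assert 2 * R > T + S


def payoff(own, partner):
    """Points earned by the focal player in one round."""
    return PAYOFF[(own, partner)]


def mean_payoff(history):
    """Mean payoff per choice over a list of (own, partner) rounds."""
    return sum(payoff(own, partner) for own, partner in history) / len(history)
```

For example, `mean_payoff([("C", "C"), ("D", "C")])` averages R and T, giving 3.5 points per choice.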

In the constrained-memory groups, each player uncovered two cards of her Memory game (on the table next to an opaque partition between the players) after each choice in the Prisoner’s Dilemma game. The game consisted of 32 cards showing either Swiss money or blanks (8 cards). If the cards showed the same picture, they were removed; otherwise, they were returned. For Memory players, the mean payoff per choice in the Prisoner’s Dilemma was multiplied by the mean payoff per round of Memory, so that nobody could win by concentrating on only one task. The subjects knew that, in each group, the players with the first, second, and third highest mean payoffs after the second session would receive 60, 30, and 10 Swiss Francs, respectively.
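A short worked example shows why the multiplicative score discourages neglecting either task. This is a sketch only: the function name `combined_score` is hypothetical, and the Memory payoffs used below are invented illustrative numbers, because the text does not specify how a round of Memory was scored.

```python
def combined_score(pd_payoffs, memory_payoffs):
    """Mean Prisoner's Dilemma payoff per choice multiplied by the
    mean Memory payoff per round (payoff units are assumed)."""
    mean_pd = sum(pd_payoffs) / len(pd_payoffs)
    mean_memory = sum(memory_payoffs) / len(memory_payoffs)
    return mean_pd * mean_memory


# A player who always earns the maximum T = 4 but never finds a pair in
# Memory scores zero overall; a balanced player does better.
only_pd = combined_score([4, 4, 4, 4], [0, 0, 0, 0])
balanced = combined_score([3, 3, 3, 3], [1, 0, 1, 0])
```

With these illustrative numbers, `only_pd` is 0 and `balanced` is 1.5, so a product of means rewards attention to both games in a way that a sum would not.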

In the second session, each student played two games of 20 choices (number not known by the student), each against a different “pseudoplayer” who was not a member of the group. Games in the second session were played without an audience. Otherwise, the procedure was analogous to the first session and included playing Memory in the constrained-memory groups. The pseudoplayers used predetermined strategies to test for Pavlov or GTFT (i.e., to determine the Pcc, Pcd, Pdc, and Pdd values of a subject’s strategy). Either the pseudoplayers always chose C with the exception of a single D for their 16th choice (“allC”), or they made their first 5 choices according to strict tit-for-tat (TFT; i.e., starting with C and thereafter copying the partner’s previous choice exactly) and subsequently played D (“allD”).
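The two pseudoplayer schedules can be sketched as follows. The function names and the default arguments are hypothetical; only the 20 choices, the single D on the 16th choice, and the five initial TFT choices come from the description above.

```python
def allc_pseudoplayer(n_choices=20, defect_at=16):
    """'allC': always C, except a single D on choice number `defect_at`
    (choices are numbered from 1)."""
    return ["D" if i == defect_at else "C" for i in range(1, n_choices + 1)]


def alld_pseudoplayer_move(choice_number, subject_history, tft_choices=5):
    """'allD': strict TFT for the first `tft_choices` choices, then always D.

    `choice_number` is 1-based; `subject_history` lists the subject's
    earlier choices, most recent last.
    """
    if choice_number > tft_choices:
        return "D"
    if choice_number == 1:
        return "C"  # TFT starts with cooperation
    return subject_history[-1]  # copy the subject's previous choice
```

Because the allC schedule is fixed in advance, it can be generated as a list, whereas the allD pseudoplayer depends on the subject's choices during its TFT phase and is therefore computed move by move.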

Do constrained-memory players change their strategy in the Prisoner’s Dilemma from Pavlovian strategies in the direction of the predicted memory-1 strategy GTFT? The greatest difference between Pavlov and GTFT is predicted for Pdc (probability to play C after one plays D and one’s opponent plays C). Pdc increased significantly (t = 2.38, P = 0.013, directed) in the constrained-memory players’ strategies. This result suggests that more constrained- than unconstrained-memory players used a GTFT strategy. Compatible with this hypothesis is the finding that the payoff against the allC pseudoplayer was higher for unconstrained-memory players (average payoff = 3.31) than for constrained-memory players (average payoff = 3.12; t = 2.19, P = 0.02, directed). Pavlovian strategies exploit allC, whereas GTFT does not.

To test the hypothesis further, we classified the response of each subject to our allC pseudoplayer into either Pavlovian or TFT-like depending on which type of strategy (Pavlov or TFT) corresponded to the player’s response with fewer mistakes (as in ref. 14). If this did not allow for a decision, we used also the Pdc value from the first session (six cases). The proportion of TFT-like players was significantly higher in the constrained- than in the unconstrained-memory groups (Fig. 1 A and B). Furthermore, our Pavlovian and TFT-like players used their strategies consistently; the Pdc values of all players correlated positively between the second and the first session (r = 0.32, n = 52, P = 0.015, directed). The TFT-like strategies looked similar in constrained- and unconstrained-memory players (Fig. 1D); because their Pcc and Pdc values were almost 1 and their Pcd and Pdd values were much smaller than 1 but above 0, this strategy can be identified as GTFT (7).
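One way to picture the classification by "fewer mistakes" is to count, for each deterministic strategy, the rounds in which a subject's choice differs from what that strategy would have prescribed given the previous round. The sketch below is an assumption about how such a count could be made, not the procedure of ref. 14; the function names are hypothetical, and ties are simply reported as undecided.

```python
def predicted(strategy, own_last, partner_last):
    """Choice prescribed by strict TFT or strict Pavlov after the last round."""
    if strategy == "TFT":
        return partner_last
    # Strict Pavlov cooperates only if both made the same choice last round.
    return "C" if own_last == partner_last else "D"


def mistakes(strategy, subject, partner):
    """Number of rounds in which the subject deviates from `strategy`."""
    return sum(
        predicted(strategy, subject[i - 1], partner[i - 1]) != subject[i]
        for i in range(1, len(subject))
    )


def classify(subject, partner):
    """Label a sequence by whichever strategy explains it with fewer mistakes."""
    pavlov, tft = mistakes("Pavlov", subject, partner), mistakes("TFT", subject, partner)
    if pavlov < tft:
        return "Pavlovian"
    if tft < pavlov:
        return "TFT-like"
    return "undecided"


def cooperation_rates(subject, partner):
    """Observed cooperation frequency after each (own, partner) outcome,
    i.e., empirical estimates of Pcc, Pcd, Pdc, and Pdd where available."""
    counts = {}
    for i in range(1, len(subject)):
        key = (subject[i - 1], partner[i - 1])
        n, c = counts.get(key, (0, 0))
        counts[key] = (n + 1, c + (subject[i] == "C"))
    return {key: c / n for key, (n, c) in counts.items()}
```

Against a partner that plays C throughout with a single D on the 16th choice, a subject who defects once and then resumes cooperation is labeled TFT-like with an estimated Pdc of 1, whereas a subject who keeps defecting after the partner's D is labeled Pavlovian with an estimated Pdc of 0.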

Figure 1.


Pavlovian and TFT-like players in unconstrained-memory situation (A) and constrained-memory situation (B). The difference between A and B is χ² = 8.89, Yates corrected, directed. (C and D) Comparison of the strategy played by Pavlovian players and TFT-like players in unconstrained- (gray) and constrained-memory (black) situations.

Do Pavlovian constrained-memory players change their strategy in the Prisoner’s Dilemma in the direction of strict Pavlov (i.e., the predicted memory-2 strategy)? If so, then Pcd should decrease, and Pdd should increase in Pavlovian constrained-memory players. Only Pcd (probability to play C after one chooses C and one’s partner chooses D) changed significantly (also after Bonferroni adjustment, t = 2.80, P = 0.005, directed), as expected (Fig. 1C). The Pdd remained low, which could indicate either a Pavlovian memory-4 strategy or the memory-2 strategy GRIM that appeared in simulations of evolution but was not an end product (8).

The change in the Pdc strategy from the unconstrained- to the constrained-memory situation allowed for more memory in the Memory game; the payoff gained in the Memory game increased with Pdc (Fig. 2A). Similarly, if the change from Pavlovian strategies to GTFT allows for more memory in the Memory game, GTFT players should gain more in the Memory game than Pavlovian players among the constrained-memory players, which was the case (Fig. 2B). The analogous comparisons between the payoff gained in the Prisoner’s Dilemma game and Pdc (Fig. 2C) or the strategies played (Fig. 2D) indicate a trade-off in using memory capacity for the two tasks.

Figure 2.


The payoff achieved in the Memory game while playing against the allC pseudoplayer compared to the Pdc-value (A; Spearman correlation test) and to the classification into Pavlovian and TFT-like strategies (B; medians and quartiles; Mann–Whitney U test, u = 47.5, directed). The payoff that the constrained-memory players achieved in the Prisoner’s Dilemma game while playing against the allC pseudoplayer compared to the Pdc-value (C; Spearman correlation test) and compared to the classification into Pavlovian and TFT-like strategies (D; medians and quartiles; Mann–Whitney U test, u = 178.5, directed).

Our test relies on the assumption that playing Memory constrained our subjects’ memory capacity so much so that many of them could remember at most the last round of choices in the Prisoner’s Dilemma game. To test this assumption, two other groups of students, after a comparable first session, played the Prisoner’s Dilemma game against a pseudoplayer (C, C, C, C, D, D, D, D, thereafter random). Every second subject played Memory after each choice. After the 20th choice, each subject was asked to write down both players’ 20th, 19th, 18th, etc. choices from memory. Unconstrained-memory players had about 75% correct recall of both the 20th and the 19th round, whereas constrained-memory players achieved this percentage of recall for the 20th round only (Fig. 3). This result suggests that playing Memory constrained the subjects to memory-1 or memory-2 strategies in the Prisoner’s Dilemma. Surprisingly, the pseudoplayer’s first three choices of C were retained in some subjects’ long-term memory (Fig. 3). Again, among the constrained-memory players, GTFT players gained more in the Memory game than Pavlovian players; the payoff achieved in the Memory game correlated significantly with the Pdc value (rs = 0.752, n = 13, P < 0.003, directed), whereas the payoff achieved in the Prisoner’s Dilemma game tended to decrease with increasing Pdc value (rs = −0.411, n = 13, P = 0.10, directed).

Figure 3.


Percentage of correct recall of rounds 20 to 1 of the Prisoner’s Dilemma game by unconstrained-memory players (gray, n = 14) and constrained-memory players (black, n = 13). Correct recall = 1; incorrect = 0; undecided = 0.5. White area shows significant deviations from 50%, P < 0.05, one-tailed (no negative memory assumed). Correct recall on round 19 is significantly different between unconstrained- and constrained-memory players (z = 1.871, P = 0.038, directed).

DISCUSSION

We regard human cooperation as suitable for testing the predictions of evolutionary models. We have shown that humans adopt strategies in the iterated Prisoner’s Dilemma that are conditional on their memory capacity. Memory capacity thus seems to be a constraint on social behavior. The present test situation is certainly unnatural in many respects. However, the fact that we found a significant change of strategy in the direction of the predicted (7) memory-1 strategy GTFT when we constrained the subjects’ working-memory capacity by a secondary memory task is a strong indication not only of the existence of human cooperation strategies that are conditional on memory capacity but also of the unexpected accuracy of the theorists’ findings. Humans seem to use the simple rule of GTFT, as predicted, when they trade off working-memory capacity. The simple TFT strategy that won Axelrod’s tournaments (1, 6) was not far from human reality. Conditionally cooperative strategies such as TFT have the highest evolutionary robustness, as formal models have shown (15, 26). We found either complex Pavlovian strategies or simple GTFT; no memory-constrained subject used strict Pavlovian strategies as a “stepping stone” of intermediate complexity, as might be expected. We are simpletons but only when it pays off.

Acknowledgments

We thank the students for their participation, P. Aeschlimann, P. Boltshauser, M. Christen, M. Frischknecht, R. Künzler, D. Mazzi, N. Steck, K. Turi Nagy, and M. Tognola for assisting as pseudoplayers, M. Nowak, K. Sigmund, and the anonymous reviewers for comments on the manuscript, and the Swiss National Science Foundation for support.

ABBREVIATIONS

GTFT, generous tit-for-tat

TFT, tit-for-tat

Footnotes

This paper was submitted directly (Track II) to the Proceedings Office.

References

  • 1. Axelrod R, Hamilton W D. Science. 1981;211:1390–1396. doi: 10.1126/science.7466396.
  • 2. May R M. Nature (London) 1981;292:291–292. doi: 10.1038/292291a0.
  • 3. May R M. Nature (London) 1987;327:15–17.
  • 4. Dugatkin L A. Cooperation Among Animals: An Evolutionary Perspective. Oxford: Oxford Univ. Press; 1997.
  • 5. Trivers R L. Q Rev Biol. 1971;46:35–57.
  • 6. Axelrod R. The Evolution of Cooperation. New York: Basic Books; 1984.
  • 7. Nowak M A, Sigmund K. Nature (London) 1992;355:250–253.
  • 8. Nowak M, Sigmund K. Nature (London) 1993;364:56–58. doi: 10.1038/364056a0.
  • 9. Axelrod R. In: Genetic Algorithms and Simulated Annealing. Lawrence D, editor. London: Pitman; 1987. pp. 32–41.
  • 10. Lindgren K. In: Artificial Life II. Langton C G, Farmer J D, Rasmussen S, Taylor C, editors. Reading, MA: Addison–Wesley; 1991. pp. 295–312.
  • 11. Hauert C, Schuster H G. Proc R Soc London Ser B. 1997;264:513–519.
  • 12. Milinski M. Nature (London) 1993;364:12–13. doi: 10.1038/364012a0.
  • 13. Dawes R M. Ann Rev Psychol. 1980;31:169–193.
  • 14. Wedekind C, Milinski M. Proc Natl Acad Sci USA. 1996;93:2686–2689. doi: 10.1073/pnas.93.7.2686.
  • 15. Bendor J, Swistak P. American Political Science Review. 1997;91:290–307.
  • 16. Rubinstein A. J Econ Theory. 1986;39:83–96.
  • 17. Banks J S, Sundaram R K. Games Econ Behav. 1990;2:97–117.
  • 18. Binmore K G, Samuelson L. J Econ Theory. 1992;57:278–305.
  • 19. Baddeley A. Working Memory. Oxford: Clarendon; 1987.
  • 20. Baddeley A. In: Memory Systems 1994. Schacter D L, Tulving E, editors. Cambridge, MA: MIT Press; 1994. pp. 351–367.
  • 21. Craik F I M. Annu Rev Psychol. 1979;30:63–102.
  • 22. Goelet P, Castelucci V F, Schacher S, Kandel E R. Nature (London) 1986;322:419–422. doi: 10.1038/322419a0.
  • 23. Squire L R. In: Memory Systems 1994. Schacter D L, Tulving E, editors. Cambridge, MA: MIT Press; 1994. pp. 203–231.
  • 24. Squire L R. Science. 1986;232:1612–1619. doi: 10.1126/science.3086978.
  • 25. Schacter D L, Tulving E. In: Memory Systems 1994. Schacter D L, Tulving E, editors. Cambridge, MA: MIT Press; 1994. pp. 1–38.
  • 26. Bendor J, Swistak P. Proc Natl Acad Sci USA. 1995;92:3596–3600. doi: 10.1073/pnas.92.8.3596.
