Simulating The Prisoner's Dilemma

To betray, or not to betray?

Shri Khalpada


The Prisoner's Dilemma is a fundamental concept in game theory that models the conflict between individual rationality and collective benefit. It comes up everywhere from economics to evolutionary biology to social science.

Whether it's two nations deciding whether to disarm, two companies considering a price war, or even two vampire bats deciding whether to share food, the Prisoner's Dilemma captures a fundamental tension in strategic decision-making: should I cooperate for mutual benefit, or defect for personal gain?

Rules of the Game

The concept is simple: Two players are arrested for a crime and interrogated separately. Each has two choices - either cooperate with their partner by staying silent, or defect by betraying them and testifying against them.

Cooperate: Work together for mutual benefit
Defect: Betray your partner for personal gain

| P1 Action | P2 Action | P1 Points | P2 Points |
| --- | --- | --- | --- |
| Cooperate | Cooperate | 3 | 3 |
| Cooperate | Defect | 0 | 5 |
| Defect | Cooperate | 5 | 0 |
| Defect | Defect | 1 | 1 |

Mathematical Details

For this to be a true Prisoner's Dilemma, the payoffs have to satisfy a specific mathematical relationship. Let $T$ represent the Temptation payoff (when you defect and your partner cooperates), $R$ the Reward for mutual cooperation, $P$ the Punishment for mutual defection, and $S$ the Sucker's payoff (when you cooperate but your partner defects).

The key inequality is $T > R > P > S$, which in our case is $5 > 3 > 1 > 0$. Additionally, we require $2R > T + S$ (here, $6 > 5$) so that mutual cooperation beats taking turns exploiting each other: alternating between cooperation and defection would average $(T + S)/2 = 2.5$ points per round, less than the 3 points from cooperating every round.

The dilemma arises because while mutual cooperation yields the best collective outcome (3 points each), each player has an individual incentive to defect regardless of what the other does.
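To make the two conditions and the "defect regardless" claim concrete, here is a minimal Python sketch that checks them against the payoff table above. The names (`PAYOFF`, `is_prisoners_dilemma`, `defect_dominates`) are made up for illustration and are not taken from the simulators on this page.

```python
# Payoffs from the table above, keyed by (my_move, their_move) -> my points.
# "C" = cooperate, "D" = defect. All names here are illustrative assumptions.
PAYOFF = {
    ("C", "C"): 3,  # R: reward for mutual cooperation
    ("C", "D"): 0,  # S: sucker's payoff
    ("D", "C"): 5,  # T: temptation to defect
    ("D", "D"): 1,  # P: punishment for mutual defection
}

T = PAYOFF[("D", "C")]
R = PAYOFF[("C", "C")]
P = PAYOFF[("D", "D")]
S = PAYOFF[("C", "D")]


def is_prisoners_dilemma(t, r, p, s):
    """Check both conditions from the text: T > R > P > S and 2R > T + S."""
    return t > r > p > s and 2 * r > t + s


def defect_dominates(payoff):
    """True if defecting pays strictly more than cooperating
    against each possible opponent move."""
    return all(payoff[("D", opp)] > payoff[("C", opp)] for opp in ("C", "D"))


print(is_prisoners_dilemma(T, R, P, S))  # True
print(defect_dominates(PAYOFF))          # True
```

Raising the temptation payoff to 6 would keep the ordering intact but break the second condition, since 2 x 3 is not greater than 6 + 0.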

The simulator below lets you play a single round of the Prisoner's Dilemma against an opponent who picks their action randomly every time. Try it out and see what happens!

Try A Few Rounds For Yourself

Your opponent will make a random choice.

[Interactive simulator: choose Cooperate or Defect each round against a random opponent. The widget counts how often each of the four outcomes occurs and compares your running score with what you would have earned if both players had cooperated every round.]

You may have noticed that cooperation is not the best strategy for you in a one-off game. This makes sense: if you don't know what your opponent will do, defecting guarantees you at least 1 point, while cooperating risks getting 0 if they defect. Put another way, defecting eliminates the worst-case scenario of being exploited, while also giving you a chance to exploit your opponent!
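The same reasoning can be written as an expected-value calculation. This is a rough sketch that assumes the random opponent cooperates with probability 0.5; the page only says the choice is random, so that figure and the function name `expected_payoff` are assumptions.

```python
# Payoffs keyed by (my_move, their_move) -> my points, from the table above.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def expected_payoff(my_move, p_opp_cooperates=0.5):
    """Average points for my_move when the opponent cooperates
    with probability p_opp_cooperates (assumed 0.5 by default)."""
    return (p_opp_cooperates * PAYOFF[(my_move, "C")]
            + (1 - p_opp_cooperates) * PAYOFF[(my_move, "D")])


print(expected_payoff("C"))  # 1.5
print(expected_payoff("D"))  # 3.0
```

With these payoffs the gap between defecting and cooperating works out to 1 + p, where p is the opponent's chance of cooperating, so defecting comes out ahead however the random opponent is weighted.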

The Power of Memory

When playing against a random opponent, there's little incentive for sustained cooperation. But what happens when your opponent remembers your actions? This is where things get interesting and more realistic. In the real world, businesses interact repeatedly, neighbors see each other daily, and nations engage in ongoing relationships. Let's explore how the game changes when you face a Tit-for-Tat strategy that cooperates on the first move, then mirrors whatever you did previously.

This Time, The Opponent Is Smarter

Your opponent will use a Tit-for-Tat strategy (they'll cooperate first, then copy your last move).

[Interactive simulator: choose Cooperate or Defect each round against a Tit-for-Tat opponent. The widget counts how often each of the four outcomes occurs and compares your running score with what you would have earned if both players had cooperated every round.]

What Did You Notice?

Against Tit-for-Tat, sustained cooperation becomes more rewarding. If you cooperate, your opponent cooperates back, leading to a steady stream of 3 points per round for both players. But if you defect, your opponent retaliates on the next round, and if you keep defecting you lock into a cycle of mutual defection where both players earn only 1 point per round. The presence of memory fundamentally changes the strategic landscape.
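A short sketch of the repeated game shows the difference in numbers. It assumes each strategy is a function that sees only the opponent's past moves; `play` and the strategy names are illustrative and are not the code behind the simulators on this page.

```python
# Payoffs keyed by (my_move, their_move) -> my points, from the table above.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def always_cooperate(opponent_history):
    return "C"


def always_defect(opponent_history):
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play a repeated game and return (score_a, score_b)."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)  # each side sees only the other's history
        b = strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
    return score_a, score_b


print(play(always_cooperate, tit_for_tat))  # (30, 30)
print(play(always_defect, tit_for_tat))     # (14, 9)
```

Over ten rounds, the constant defector collects 5 points once and then 1 point per round, ending with 14 against the 30 it would have earned by cooperating throughout.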

The simulator below lets you try out the different strategies we've explored so far rapidly over multiple rounds. It also introduces the idea of noise, where random errors can occur in the decision-making process. This can simulate real-world scenarios where communication or decision-making is imperfect, such as a nation that mistakenly interprets another nation's actions as hostile.
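One simple way to model noise is to flip each intended move with some probability. The sketch below does that for two Tit-for-Tat players; the flipping rule, the function names, and the fixed seed are assumptions for illustration, and the simulator on this page may implement noise differently.

```python
import random

# Payoffs keyed by (my_move, their_move) -> my points, from the table above.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def flip(move):
    return "D" if move == "C" else "C"


def play_noisy(strategy_a, strategy_b, rounds, noise, rng):
    """Repeated game where each intended move is flipped with
    probability `noise` (an assumed error model)."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        if rng.random() < noise:
            a = flip(a)  # player A makes a mistake
        if rng.random() < noise:
            b = flip(b)  # player B makes a mistake
        # Histories record what was actually played, mistakes included.
        moves_a.append(a)
        moves_b.append(b)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
    return score_a, score_b


rng = random.Random(0)  # fixed seed so runs are repeatable
for noise in (0.0, 0.05, 0.2):
    a, b = play_noisy(tit_for_tat, tit_for_tat, 1000, noise, rng)
    print(f"noise={noise:.2f}  avg per round: {a / 1000:.2f} vs {b / 1000:.2f}")
```

With no noise, both players earn 3 points every round. Once mistakes are possible, a single accidental defection gets copied back and forth, which is why two Tit-for-Tat players can drift away from mutual cooperation.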

[Interactive simulator: pick a strategy and a noise level (0% by default) for yourself and for your opponent, then run many rounds. The widget counts how often each of the four outcomes occurs, compares your score with what you would have earned if both players had cooperated every round, and charts the point differential over time.]

Population Dynamics

Now let's see how these strategies play out in a dynamic population with three different types of agents: Cooperators, Defectors, and Tit-for-Tat Reciprocators.

Each agent is represented as a particle in the simulation. When agents collide, they play the Prisoner's Dilemma. This simulation models how cooperation might evolve in biological populations, social networks, or even markets. See how cooperation emerges or collapses based on the mix of strategies and the level of noise in the system.
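The core loop of such a simulation can be sketched without any physics. The version below is only an approximation under stated assumptions: random pairing stands in for particle collisions, reciprocators keep a separate memory per opponent, there is no noise, and agents only accumulate scores. The `informed` flag, which lets reciprocators recognise defectors from the start, and all other names are hypothetical.

```python
import random
from collections import defaultdict

# Payoffs keyed by (my_move, their_move) -> my points, from the table above.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def choose(agent, opponent, memory, informed):
    """Pick a move for `agent` when it meets `opponent`."""
    if agent["kind"] == "cooperator":
        return "C"
    if agent["kind"] == "defector":
        return "D"
    # Reciprocator: repeat what this opponent did to us last time.
    # On a first meeting, cooperate unless we are informed about defectors.
    first = "D" if informed and opponent["kind"] == "defector" else "C"
    return memory[agent["id"]].get(opponent["id"], first)


def simulate(counts, collisions=5000, informed=False, seed=0):
    """Run random pairwise encounters and return the list of agents."""
    rng = random.Random(seed)
    agents = []
    for kind, n in counts.items():
        for _ in range(n):
            agents.append({"id": len(agents), "kind": kind, "score": 0})
    memory = defaultdict(dict)  # memory[me][them] = their last move against me
    for _ in range(collisions):
        a, b = rng.sample(agents, 2)  # a random pair "collides"
        move_a = choose(a, b, memory, informed)
        move_b = choose(b, a, memory, informed)
        a["score"] += PAYOFF[(move_a, move_b)]
        b["score"] += PAYOFF[(move_b, move_a)]
        memory[a["id"]][b["id"]] = move_b
        memory[b["id"]][a["id"]] = move_a
    return agents


def average_by_kind(agents):
    """Average score for each agent type."""
    totals, sizes = defaultdict(int), defaultdict(int)
    for agent in agents:
        totals[agent["kind"]] += agent["score"]
        sizes[agent["kind"]] += 1
    return {kind: totals[kind] / sizes[kind] for kind in totals}


def percent_of_max(agents, collisions):
    """Total score relative to every encounter being mutual cooperation."""
    return 100 * sum(a["score"] for a in agents) / (6 * collisions)


mix = {"cooperator": 15, "defector": 15, "reciprocator": 15}  # assumed split
agents = simulate(mix, collisions=5000)
print(average_by_kind(agents))
print(f"{percent_of_max(agents, 5000):.1f}% of max possible score")
```

Changing the mix or switching on `informed` gives a rough feel for how much defectors depend on cooperators, and on first encounters with reciprocators, for their points.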

Simulating A Population

Watch how cooperation and defection evolve in a dynamic population of agents.

[Interactive simulation: 45 agents by default. The widget reports the number of collisions, the average score for Cooperators, Defectors, and Reciprocators, the total population score (the sum of all agents' scores), and that total as a percentage of the maximum possible population score, which is what the population would earn if every interaction were a cooperation.]

Population

One option controls what reciprocators know at the start. When enabled, reciprocators start with knowledge of defectors' tendencies. When disabled, defectors will be able to exploit their first interaction with a reciprocator. Enabling it shows the longer-term equilibrium.

Settings

One setting controls the chance that agents make mistakes. Another, when enabled, makes each agent move at a different random speed.

So, What's The Best Strategy?

Perhaps unsatisfyingly, the answer is "it depends." If you're playing a one-off game with someone you'll never interact with again, defecting is the rational choice, since it maximizes your individual payoff regardless of what the other player does. But in repeated interactions, especially with memory, cooperation can yield much better long-term results.

In practice, a mix of strategies often works best. Starting with cooperation and being willing to forgive occasional defections can foster trust and lead to more stable cooperation over time. The key is to adapt your strategy based on the context and the behavior of your opponent.
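One way to express "willing to forgive" in code is a variant often called generous Tit-for-Tat, which answers a defection with cooperation some fraction of the time. The sketch below pits two such players against each other under noise. The `forgiveness` parameter, the noise model (each move flipped with a fixed probability), and the function names are assumptions for illustration, not something the simulators above are known to use.

```python
import random

# Payoffs keyed by (my_move, their_move) -> my points, from the table above.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def flip(move):
    return "D" if move == "C" else "C"


def generous_tit_for_tat(opponent_history, forgiveness, rng):
    """Like Tit-for-Tat, but forgive a defection with probability
    `forgiveness` (0.0 behaves exactly like plain Tit-for-Tat)."""
    if not opponent_history or opponent_history[-1] == "C":
        return "C"
    return "C" if rng.random() < forgiveness else "D"


def average_score(forgiveness, noise, rounds=2000, seed=0):
    """Average points per round for one of two identical players
    whose moves are each flipped with probability `noise`."""
    rng = random.Random(seed)
    moves_a, moves_b = [], []
    total = 0
    for _ in range(rounds):
        a = generous_tit_for_tat(moves_b, forgiveness, rng)
        b = generous_tit_for_tat(moves_a, forgiveness, rng)
        if rng.random() < noise:
            a = flip(a)
        if rng.random() < noise:
            b = flip(b)
        moves_a.append(a)
        moves_b.append(b)
        total += PAYOFF[(a, b)]
    return total / rounds


for forgiveness in (0.0, 0.1, 0.3):
    score = average_score(forgiveness, noise=0.05)
    print(f"forgiveness={forgiveness:.1f}  avg per round: {score:.2f}")
```

Comparing the printed averages across forgiveness levels is a quick way to see whether a little forgiveness helps a pair recover from accidental defections. The trade-off, which this sketch does not test, is that a more forgiving player is also easier for a constant defector to exploit.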

If you want to dive deeper into this topic, Veritasium has a fascinating video exploring the Prisoner's Dilemma and its implications.

If you like this type of content, you can follow me on Bluesky. If you want to support me further, buying me a coffee would be much appreciated. It helps keep the lights on and the servers running! ☕
