Simulating The Prisoner's Dilemma

To betray, or not to betray?

Shri Khalpada

September 15, 2025

The Prisoner's Dilemma is a fundamental concept in game theory that models the conflict between individual rationality and collective benefit. It comes up everywhere from economics to evolutionary biology to social science.

Whether it's two nations deciding whether to disarm, two companies considering a price war, or even two vampire bats deciding whether to share food, the Prisoner's Dilemma captures a fundamental tension in strategic decision-making: should I cooperate for mutual benefit, or defect for personal gain?

Rules of the Game

The concept is simple: two players are arrested for a crime and interrogated separately. Each has two choices: cooperate with their partner by staying silent, or defect by betraying them and testifying against them.

Cooperate: Work together for mutual benefit.

Defect: Betray your partner for personal gain.

P1 Action	P2 Action	P1 Points	P2 Points
Cooperate	Cooperate	3	3
Cooperate	Defect	0	5
Defect	Cooperate	5	0
Defect	Defect	1	1
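The payoff table above can be encoded as a small lookup. This is a minimal Python sketch rather than the code behind the simulators on this page; the names `PAYOFFS` and `play_round` are illustrative assumptions.

```python
# Payoffs from the table above, keyed by (P1 action, P2 action).
# "C" = cooperate, "D" = defect. All names here are illustrative assumptions.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def play_round(p1_action: str, p2_action: str) -> tuple[int, int]:
    """Return (P1 points, P2 points) for a single round."""
    return PAYOFFS[(p1_action, p2_action)]


if __name__ == "__main__":
    for actions, points in PAYOFFS.items():
        print(actions, "->", points)
```

Keying the dictionary by the pair of actions keeps the lookup symmetric: swapping the two arguments swaps the two scores.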

Mathematical Details

For this to be a true Prisoner's Dilemma, the payoffs have to satisfy a specific mathematical relationship. Let $T$ represent the Temptation payoff (when you defect and your partner cooperates), $R$ the Reward for mutual cooperation, $P$ the Punishment for mutual defection, and $S$ the Sucker's payoff (when you cooperate but your partner defects).

The key inequality is $T > R > P > S$, which in our case is $5 > 3 > 1 > 0$. Additionally, we require $2R > T + S$ (or $6 > 5$) to ensure that mutual cooperation is better than alternating between cooperation and defection.
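Both conditions are easy to check mechanically. The sketch below uses a hypothetical helper, `is_prisoners_dilemma`, to test any set of payoffs. The second condition compares the reward $R$ with the average of $T$ and $S$, which is what each player would earn per round by taking turns exploiting the other (here $2.5$, versus $3$ for mutual cooperation).

```python
def is_prisoners_dilemma(T: float, R: float, P: float, S: float) -> bool:
    """Check both conditions: T > R > P > S and 2R > T + S."""
    ordering = T > R > P > S
    cooperation_beats_alternating = 2 * R > T + S
    return ordering and cooperation_beats_alternating


print(is_prisoners_dilemma(T=5, R=3, P=1, S=0))  # True
print(is_prisoners_dilemma(T=7, R=3, P=1, S=0))  # False: 2R = 6 is not greater than T + S = 7
```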

The dilemma arises because while mutual cooperation yields the best collective outcome (3 points each), each player has an individual incentive to defect regardless of what the other does.
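To see the dilemma in numbers, the sketch below finds the best individual response to each opponent move and then totals the points for each outcome. The function name `best_response` is an assumption made for illustration.

```python
# Payoffs as (my points, opponent's points), from the table above.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def best_response(opponent_action: str) -> str:
    """Return the action that maximizes my own points against a fixed opponent action."""
    return max(("C", "D"), key=lambda mine: PAYOFFS[(mine, opponent_action)][0])


for opp in ("C", "D"):
    print(f"Opponent plays {opp}: best response is {best_response(opp)}")

# Combined points for each outcome: mutual cooperation has the highest total.
totals = {actions: sum(points) for actions, points in PAYOFFS.items()}
print(totals)
```

Defecting is the best response in both cases, yet the combined total is highest (6 points) when both players cooperate and lowest (2 points) when both defect.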

The simulator below lets you play a single round of the Prisoner's Dilemma against an opponent who picks their action randomly every time. Try it out and see what happens!

Try A Few Rounds For Yourself

Your opponent will make a random choice.

[Interactive simulator: choose Cooperate or Defect each round. A payoff grid counts how often each of the four outcomes occurs, and a scoreboard tracks the round number, your score, your opponent's score, and the score you would have if both players had cooperated every round.]

You may have noticed that cooperation is not the best strategy in a one-off game. This makes sense: if you don't know what your opponent will do, defecting guarantees you at least 1 point, while cooperating risks getting 0 if they defect. In fact, defecting earns you more whichever move they make (5 instead of 3 if they cooperate, 1 instead of 0 if they defect). Put another way, by defecting you eliminate the worst-case scenario of being exploited, while also giving yourself a chance to exploit them!
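The same point can be made with expected values. The sketch below assumes the random opponent cooperates with probability 0.5; the article only says the choice is random, so that probability and the helper name `expected_points` are assumptions.

```python
# Payoffs as (my points, opponent's points), from the table above.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def expected_points(my_action: str, p_opp_cooperates: float = 0.5) -> float:
    """Expected points for my action against an opponent who cooperates with the given probability."""
    vs_cooperate = PAYOFFS[(my_action, "C")][0]
    vs_defect = PAYOFFS[(my_action, "D")][0]
    return p_opp_cooperates * vs_cooperate + (1 - p_opp_cooperates) * vs_defect


print(expected_points("C"))  # 1.5
print(expected_points("D"))  # 3.0
```

Changing `p_opp_cooperates` shifts both numbers, but defecting stays ahead for every probability, because it wins against each of the opponent's moves individually.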

The Power of Memory

When playing against a random opponent, there's little incentive for sustained cooperation. But what happens when your opponent remembers your actions? This is where things get interesting and more realistic. In the real world, businesses interact repeatedly, neighbors see each other daily, and nations engage in ongoing relationships. Let's explore how the game changes when you face a Tit-for-Tat strategy that cooperates on the first move, then mirrors whatever you did previously.
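Tit-for-Tat is short enough to write down directly. Here is a minimal sketch, assuming the strategy only sees the list of moves its opponent has made so far; the function name and the `"C"`/`"D"` encoding are illustrative choices.

```python
from typing import List


def tit_for_tat(opponent_history: List[str]) -> str:
    """Cooperate on the first move, then copy the opponent's previous move."""
    if not opponent_history:
        return "C"
    return opponent_history[-1]


# Replay a fixed sequence of your moves and record how Tit-for-Tat replies.
your_moves = ["C", "D", "D", "C", "C"]
replies = [tit_for_tat(your_moves[:i]) for i in range(len(your_moves))]
print(replies)  # ['C', 'C', 'D', 'D', 'C']
```

The replies are your own moves shifted one round later, with a cooperative opening move in front.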

This Time, The Opponent Is Smarter

Your opponent will use a Tit-for-Tat strategy (they'll cooperate first, then copy your last move).

[Interactive simulator: choose Cooperate or Defect each round against a Tit-for-Tat opponent. A payoff grid counts how often each of the four outcomes occurs, and a scoreboard tracks the round number, your score, your opponent's score, and the score you would have if both players had cooperated every round.]

What Did You Notice?

Against Tit-for-Tat, sustained cooperation becomes more rewarding. If you cooperate, your opponent cooperates back, leading to a steady stream of 3 points per round for both players. But if you defect, your opponent retaliates on the next round, and if you keep defecting you settle into mutual defection where both players earn only 1 point per round. The presence of memory fundamentally changes the strategic landscape.
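A quick way to compare these outcomes is to score a fixed sequence of your own moves against Tit-for-Tat. The helper below, `score_against_tit_for_tat`, is a hypothetical name used for this sketch, and the 10-round length is an arbitrary choice.

```python
# Payoffs as (my points, opponent's points), from the table above.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def score_against_tit_for_tat(my_moves):
    """Return (my total, opponent total) when the opponent plays Tit-for-Tat."""
    my_total, opp_total = 0, 0
    for i, mine in enumerate(my_moves):
        # Tit-for-Tat cooperates first, then copies my previous move.
        theirs = "C" if i == 0 else my_moves[i - 1]
        my_points, opp_points = PAYOFFS[(mine, theirs)]
        my_total += my_points
        opp_total += opp_points
    return my_total, opp_total


rounds = 10
print(score_against_tit_for_tat(["C"] * rounds))                # (30, 30)
print(score_against_tit_for_tat(["D"] * rounds))                # (14, 9)
print(score_against_tit_for_tat(["D"] + ["C"] * (rounds - 1)))  # (29, 29)
```

Always defecting beats the opponent head to head (14 to 9) but earns less than half of what steady cooperation does. A single defection followed by cooperation costs you one point overall, because the 5 points you gain are offset by the 0 you earn while being punished.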

The simulator below lets you quickly run the strategies we've explored so far over many rounds. It also introduces noise: random errors in the decision-making process. This simulates real-world situations where communication or decision-making is imperfect, such as a nation mistakenly interpreting another nation's actions as hostile.
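One simple way to model noise is to flip a player's intended move with some small probability. The article does not spell out how its simulator implements noise, so the flipping rule, the function names, and the fixed seed below are all assumptions. The sketch pits two Tit-for-Tat players against each other and reports the share of rounds in which both cooperate.

```python
import random


def apply_noise(intended: str, noise: float, rng: random.Random) -> str:
    """With probability `noise`, return the opposite of the intended action."""
    if rng.random() < noise:
        return "D" if intended == "C" else "C"
    return intended


def tft_vs_tft_cooperation_rate(rounds: int, noise: float, seed: int = 0) -> float:
    """Fraction of rounds with mutual cooperation between two noisy Tit-for-Tat players."""
    rng = random.Random(seed)
    a_prev, b_prev = None, None  # each player's last actual move
    mutual = 0
    for _ in range(rounds):
        # Each player intends to copy the other's last actual move (cooperate first).
        a_intended = "C" if b_prev is None else b_prev
        b_intended = "C" if a_prev is None else a_prev
        a = apply_noise(a_intended, noise, rng)
        b = apply_noise(b_intended, noise, rng)
        if a == "C" and b == "C":
            mutual += 1
        a_prev, b_prev = a, b
    return mutual / rounds


print(tft_vs_tft_cooperation_rate(1000, noise=0.0))   # 1.0
print(tft_vs_tft_cooperation_rate(1000, noise=0.05))  # depends on the seed
```

Without noise, two Tit-for-Tat players cooperate forever. With noise, a single mistaken defection gets copied back and forth, which is why strict Tit-for-Tat can struggle when errors are possible.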

Try Out Different Strategies

Now let's see how different strategies perform over many rounds.

[Interactive simulator: pick a strategy and a noise level (shown at 0%) for yourself and for your opponent, then run many rounds at an adjustable simulation speed. A payoff grid counts how often each of the four outcomes occurs, a scoreboard tracks the round number and both scores alongside the score you would have if both players had cooperated every round, and a chart plots the point differential over time.]

Population Dynamics

Now let's see how these strategies play out in a dynamic population with three different types of agents: Cooperators, Defectors, and Tit-for-Tat Reciprocators.

Each agent is represented as a particle in the simulation. When agents collide, they play the Prisoner's Dilemma. This simulation models how cooperation might evolve in biological populations, social networks, or even markets. See how cooperation emerges or collapses based on the mix of strategies and the level of noise in the system.
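The core of such a simulation can be sketched without any graphics. The version below makes several simplifying assumptions that are not taken from the simulator on this page: random pairings stand in for particle collisions, noise flips each intended move independently, and reciprocators play Tit-for-Tat separately against each partner they have met. The class and function names are illustrative.

```python
import random

# Payoffs from the table earlier in the article: (my points, their points).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


class Agent:
    def __init__(self, ident: int, kind: str):
        self.ident = ident
        self.kind = kind   # "cooperator", "defector", or "reciprocator"
        self.score = 0
        self.memory = {}   # partner ident -> that partner's last move against me

    def choose(self, partner: "Agent") -> str:
        if self.kind == "cooperator":
            return "C"
        if self.kind == "defector":
            return "D"
        # Reciprocator: Tit-for-Tat per partner, cooperating on first contact.
        return self.memory.get(partner.ident, "C")


def flip(move: str) -> str:
    return "D" if move == "C" else "C"


def simulate(counts: dict, interactions: int, noise: float, seed: int = 0) -> dict:
    """Return the average score per agent type after random pairwise meetings."""
    rng = random.Random(seed)
    kinds = [kind for kind, n in counts.items() for _ in range(n)]
    agents = [Agent(i, kind) for i, kind in enumerate(kinds)]
    for _ in range(interactions):
        a, b = rng.sample(agents, 2)  # assumption: random pairing replaces collisions
        move_a, move_b = a.choose(b), b.choose(a)
        # Assumption: noise flips each intended move independently.
        if rng.random() < noise:
            move_a = flip(move_a)
        if rng.random() < noise:
            move_b = flip(move_b)
        pts_a, pts_b = PAYOFFS[(move_a, move_b)]
        a.score += pts_a
        b.score += pts_b
        # Every agent records the move, but only reciprocators act on it.
        a.memory[b.ident] = move_b
        b.memory[a.ident] = move_a
    averages = {}
    for kind in counts:
        scores = [agent.score for agent in agents if agent.kind == kind]
        averages[kind] = sum(scores) / len(scores) if scores else 0.0
    return averages


if __name__ == "__main__":
    result = simulate(
        {"cooperator": 15, "defector": 15, "reciprocator": 15},
        interactions=5000,
        noise=0.05,
    )
    for kind, avg in result.items():
        print(f"{kind}: {avg:.1f}")
```

In this sketch reciprocators start with no knowledge of anyone, so a defector can exploit each reciprocator once before being punished. Try varying the mix in `counts` and the `noise` value to see how the averages shift.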

Simulating A Population

Watch how cooperation and defection evolve in a dynamic population of agents.

Population: Cooperators, Defectors, Reciprocators

Reciprocators know about defectors: When enabled, reciprocators start with knowledge of defectors' tendencies. When disabled, defectors will be able to exploit their first interaction with a reciprocator. Checking this box shows the longer-term equilibrium.

Settings

Noise: 5% (the chance that agents make mistakes)

Animation Speed: 1x

Randomize agent speeds: When enabled, each agent will move at a different random speed.

[Simulation readouts: total agents (45) and collision count; the average score and number of agents for Cooperators, Defectors, and Reciprocators; the total population score (the sum of all agents' scores); and the percentage of the maximum possible population score, meaning the score if every interaction were a mutual cooperation.]
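The percentage readout can be reproduced with one line of arithmetic. This sketch assumes the maximum is 6 points per interaction (3 for each agent under mutual cooperation); the exact formula used by the simulator is not given, and `percent_of_max` is a hypothetical name.

```python
R = 3  # reward for mutual cooperation, from the payoff table


def percent_of_max(total_population_score: int, interactions: int) -> float:
    """Share of the score the population would have if every interaction were mutual cooperation."""
    if interactions == 0:
        return 0.0
    max_score = 2 * R * interactions  # both agents earn R in each interaction
    return 100 * total_population_score / max_score


print(percent_of_max(total_population_score=450, interactions=100))  # 75.0
```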

So, What's The Best Strategy?

Perhaps unsatisfyingly, the answer is "it depends." If you're playing a one-off game with someone you'll never interact with again, defecting is the rational choice, since it maximizes your individual payoff regardless of what the other player does. But in repeated interactions, especially with memory, cooperation can yield much better long-term results.

In practice, a mix of strategies often works best. Starting with cooperation and being willing to forgive occasional defections can foster trust and lead to more stable cooperation over time. The key is to adapt your strategy based on the context and the behavior of your opponent.
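One concrete way to "forgive occasional defections" is a forgiving variant of Tit-for-Tat, often called Generous Tit-for-Tat, which sometimes cooperates even after the opponent has defected. The sketch below is an illustration of that idea, not a strategy taken from the simulators above; the `forgiveness` parameter and its value are assumptions.

```python
import random
from typing import List


def forgiving_tit_for_tat(
    opponent_history: List[str], forgiveness: float, rng: random.Random
) -> str:
    """Cooperate first; after an opponent defection, forgive with probability `forgiveness`."""
    if not opponent_history or opponent_history[-1] == "C":
        return "C"
    return "C" if rng.random() < forgiveness else "D"


rng = random.Random(0)
history = ["C", "D"]  # the opponent just defected
replies = [forgiving_tit_for_tat(history, forgiveness=0.3, rng=rng) for _ in range(10)]
print(replies)
```

With `forgiveness=0.0` this behaves exactly like Tit-for-Tat. Raising it makes it easier to escape a run of retaliation started by a noisy mistake, at the cost of being easier to exploit.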

If you want to dive deeper into this topic, Veritasium has a fascinating video exploring the Prisoner's Dilemma and its implications.

If you like this type of content, you can follow me on Bluesky. If you want to support me further, buying me a coffee would be much appreciated. It helps us keep the lights on and the servers running! ☕
