Which AI is the Best Liar?
Performance stats from 31,479 AI Werewolf games on the Kaggle Arena
Overall Leaderboard
Win rate = fraction of games where the model's team won, across all roles.
| # | Model | Games | Overall Win% | As Werewolf | As Villager | Survival% |
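As a minimal sketch of how the win-rate columns could be computed from per-player game records, assuming a hypothetical record layout with `model`, `role`, and `team_won` fields (these names and the sample rows are illustrative, not taken from the Arena data):

```python
from collections import defaultdict

# Hypothetical per-player game records; field names and values are assumptions.
records = [
    {"model": "model-a", "role": "Werewolf", "team_won": True},
    {"model": "model-a", "role": "Villager", "team_won": False},
    {"model": "model-a", "role": "Seer", "team_won": True},
    {"model": "model-b", "role": "Werewolf", "team_won": False},
]

def win_rates(records):
    """Return {model: fraction of that model's games in which its team won}."""
    games = defaultdict(int)
    wins = defaultdict(int)
    for r in records:
        games[r["model"]] += 1
        wins[r["model"]] += int(r["team_won"])
    return {m: wins[m] / games[m] for m in games}

# Overall win rate counts every role; the per-role columns filter first.
overall = win_rates(records)
as_werewolf = win_rates([r for r in records if r["role"] == "Werewolf"])
```

Filtering the records before calling `win_rates` gives the "As Werewolf" and "As Villager" style columns with the same function.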
Werewolf Win Rate
How often each model wins when assigned the Werewolf role. This is where deception matters most.
Win Rate as Werewolf
Higher = better at lying and manipulating the village
Werewolf Duo Win Rates
Same-model pairs vs mixed pairs. Coordination matters.
Key Finding: Gemini 3 Pro is the only model whose same-model duo wins more than 50% of its games (52.1%).
GPT-5 mini duos win just 3.4% of games, worse than random chance. The gap between best and worst deceiver is enormous.
Deception Metrics
Behavioral signals that reveal (or conceal) a werewolf's true identity.
Bus Rate
How often werewolves vote to exile their own partner. Higher = more ruthless deception.
Verbosity Shift
Extra words per message when playing Werewolf vs Villager. A tell?
Key Finding: Every model talks more as Werewolf. GPT-5.2 writes 17 extra words per message when lying.
Claude Opus 4.5 is the most poker-faced, with a difference of only 1 word. If you're playing against AIs, message length is a weak but real signal.
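The verbosity shift can be sketched as a difference of two averages. The snippet below assumes messages are available as hypothetical `(role, text)` pairs for a single model and counts words by splitting on whitespace; the dashboard's actual tokenization is not stated, so treat both as assumptions.

```python
from statistics import mean

# Hypothetical (role, message) pairs for one model; contents are illustrative.
messages = [
    ("Werewolf", "I think player three is acting strangely today"),
    ("Werewolf", "We should vote carefully and not rush"),
    ("Villager", "I trust player two"),
    ("Villager", "No strong read yet"),
]

def verbosity_shift(messages):
    """Mean words per message as Werewolf minus mean words per message as Villager."""
    def avg_words(role):
        return mean(len(text.split()) for r, text in messages if r == role)
    return avg_words("Werewolf") - avg_words("Villager")

shift = verbosity_shift(messages)  # positive means wordier as Werewolf
```

A positive value corresponds to the "extra words when playing Werewolf" reading used above.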
Detection & Investigation
How well models detect werewolves, and how the Seer role changes everything.
Villager-Side Vote Accuracy
How often non-werewolf players correctly vote for an actual werewolf
Seer Investigation Success
How often the Seer finds a werewolf, and the resulting team win rate
Key Finding: When the Seer finds a Werewolf, villagers win 79.4% of the time. When the Seer doesn't, the villager win rate drops to 41.7%.
The Seer is the single most important role in the game.
Night Kill Strategy
Who do werewolves target at night, and what happens when they get it right?
Night Kill Target Distribution
Werewolves don't know roles, but they can guess from behavior
First Night Kill → Win Rate
Werewolf win rate by the role of the player killed on Night 0
Key Finding: Killing the Seer on Night 0 gives werewolves a 48.2% win rate, about 1.7 times the 27.7% when they kill a villager.
Werewolves target the Seer 25% of the time (vs 12.5% base rate), showing active hunting behavior.
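The two comparisons in this finding are simple ratios. As a quick arithmetic check using only the percentages quoted above (the variable names are illustrative):

```python
# Figures quoted in the text, expressed as fractions.
seer_target_rate = 0.25      # share of night kills that hit the Seer
base_rate = 0.125            # stated base rate for hitting the Seer
win_if_seer_killed = 0.482   # werewolf win rate after a Night 0 Seer kill
win_if_villager_killed = 0.277

# How much more often the Seer is targeted than the base rate predicts.
targeting_lift = seer_target_rate / base_rate

# How much a Night 0 Seer kill improves the werewolf win rate.
win_rate_ratio = win_if_seer_killed / win_if_villager_killed

print(f"targeting lift: {targeting_lift:.2f}x, win-rate ratio: {win_rate_ratio:.2f}x")
```

The Seer is targeted at twice the base rate, and a Night 0 Seer kill corresponds to a werewolf win rate roughly 1.7 times that of a villager kill.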
Voting Patterns
Who votes for whom? Cross-role voting patterns reveal strategic behavior.
Vote Target Matrix (row = voter's role, column = target's role)
Seers vote for werewolves 73% of the time. Werewolves spread votes to avoid detection.
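A row-normalized matrix like this can be sketched from a list of votes. The snippet assumes hypothetical `(voter_role, target_role)` pairs and only the three roles named on this page; the sample votes are made up, and counting per individual vote is an assumption about how the dashboard aggregates.

```python
from collections import Counter

# Roles named on this page; the real game may include others (assumption).
ROLES = ["Werewolf", "Villager", "Seer"]

# Hypothetical (voter_role, target_role) pairs; values are illustrative.
votes = [
    ("Seer", "Werewolf"), ("Seer", "Werewolf"),
    ("Seer", "Villager"), ("Seer", "Werewolf"),
    ("Werewolf", "Villager"), ("Werewolf", "Werewolf"),
    ("Werewolf", "Seer"), ("Werewolf", "Villager"),
    ("Villager", "Werewolf"), ("Villager", "Villager"),
]

def vote_matrix(votes):
    """Return {voter_role: {target_role: share of that voter role's votes}}."""
    counts = Counter(votes)
    matrix = {}
    for voter in ROLES:
        total = sum(counts[(voter, target)] for target in ROLES)
        matrix[voter] = {
            target: (counts[(voter, target)] / total if total else 0.0)
            for target in ROLES
        }
    return matrix

matrix = vote_matrix(votes)
seer_hits = matrix["Seer"]["Werewolf"]        # Seer votes landing on werewolves
wolf_on_wolf = matrix["Werewolf"]["Werewolf"]  # a per-vote reading of "bus rate"
```

Each row sums to 1, so a cell reads as "of all votes cast by this role, the share aimed at that role". Under this per-vote reading, the Werewolf-to-Werewolf cell plays the part of the bus rate described earlier.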
Token Economics
How much does each model cost to run, and does a higher cost mean a better win rate?
Cost per Action
Some models cost 38× as much as others
Cost vs Win Rate
Is paying more worth it?
Key Finding: Claude Opus 4.5 costs $0.034/action, 38× as much as Grok 4.1 Fast ($0.0009).
But Gemini 3 Pro at $0.010/action has the highest win rate. The sweet spot isn't the most expensive model.
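The cost multiples follow directly from the per-action prices. A short check using only the three prices quoted in this section (the dictionary layout is illustrative):

```python
# Cost per action in USD, as quoted in the text.
cost_per_action = {
    "Claude Opus 4.5": 0.034,
    "Gemini 3 Pro": 0.010,
    "Grok 4.1 Fast": 0.0009,
}

# Express every model's cost as a multiple of the cheapest one listed here.
cheapest = min(cost_per_action.values())
cost_multiple = {model: cost / cheapest for model, cost in cost_per_action.items()}

for model, multiple in sorted(cost_multiple.items(), key=lambda kv: kv[1]):
    print(f"{model}: {multiple:.1f}x the cheapest listed cost")
```

On these figures Claude Opus 4.5 comes out at roughly 38 times the cost of Grok 4.1 Fast, and Gemini 3 Pro at roughly 11 times.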
Game Length
Most games end in 2-3 days. Longer games are rare, and they favor villagers.
Game Length Distribution & Win Rates
63% of games end in 2 days. Werewolves need to win fast.