Which AI is the Best Liar?

Performance stats from 31,479 AI Werewolf games on the Kaggle Arena

31,479 Games Analyzed
8 LLM Models
882K Chat Messages
$20,093 API Cost

๐Ÿ† Overall Leaderboard

Win rate = fraction of games where the model's team won, across all roles.
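The definition above can be sketched in a few lines. This is a minimal illustration, not the dashboard's actual pipeline: the record schema (`model`, `role`, `team_won`) and the sample rows are hypothetical, and counting the Seer on the village side is an assumption.

```python
from collections import defaultdict

# Hypothetical schema: one row per (game, player seat).
# Field names and sample values are illustrative, not from the Kaggle data.
records = [
    {"model": "model-a", "role": "Werewolf", "team_won": True},
    {"model": "model-a", "role": "Villager", "team_won": False},
    {"model": "model-a", "role": "Seer", "team_won": True},
    {"model": "model-b", "role": "Werewolf", "team_won": False},
    {"model": "model-b", "role": "Villager", "team_won": True},
]

def win_rates(rows):
    """Return {model: {"overall": rate, "werewolf": rate, "village": rate}}."""
    # tally[model][bucket] = [wins, games]
    tally = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for row in rows:
        # Assumption: every non-werewolf role counts toward the village side.
        side = "werewolf" if row["role"] == "Werewolf" else "village"
        for bucket in ("overall", side):
            tally[row["model"]][bucket][0] += int(row["team_won"])
            tally[row["model"]][bucket][1] += 1
    return {
        model: {bucket: wins / games for bucket, (wins, games) in buckets.items()}
        for model, buckets in tally.items()
    }

rates = win_rates(records)
```

With the sample rows, `rates["model-a"]["overall"]` is 2/3 because that model's team won two of its three games.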

# Model Games Overall Win% As Werewolf As Villager Survival%

๐Ÿบ Werewolf Win Rate

How often each model wins when assigned the Werewolf role. This is where deception matters most.

Win Rate as Werewolf

Higher = better at lying and manipulating the village

Werewolf Duo Win Rates

Same-model pairs vs mixed pairs. Coordination matters.
💡 Key Finding: Gemini 3 Pro is the only model whose same-model duo breaks a 50% win rate (52.1%). GPT-5 mini duos win just 3.4% of games, worse than random chance. The gap between the best and worst deceivers is enormous.

🎭 Deception Metrics

Behavioral signals that reveal (or conceal) a werewolf's true identity.

🚌 Bus Rate

How often werewolves vote to exile their own partner. Higher = more ruthless deception.
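One plausible way to compute this metric is shown below. The vote-log fields (`voter`, `target`, `partner`) are hypothetical, and using "all votes cast by werewolves" as the denominator is an assumption; the dashboard may count per game or only while the partner is alive.

```python
# Hypothetical vote log: each entry is one daytime vote cast by a werewolf.
# "partner" is the id of that werewolf's teammate.
werewolf_votes = [
    {"voter": "p1", "target": "p2", "partner": "p2"},  # votes out own partner
    {"voter": "p1", "target": "p5", "partner": "p2"},
    {"voter": "p2", "target": "p4", "partner": "p1"},
    {"voter": "p2", "target": "p1", "partner": "p1"},  # votes out own partner
]

def bus_rate(votes):
    """Fraction of werewolf votes aimed at the voter's own partner."""
    if not votes:
        return 0.0
    buses = sum(v["target"] == v["partner"] for v in votes)
    return buses / len(votes)

rate = bus_rate(werewolf_votes)
```

Here two of the four sample votes target the partner, so `rate` is 0.5.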

💬 Verbosity Shift

Extra words per message when playing Werewolf vs Villager. A tell?
💡 Key Finding: Every model talks more as Werewolf. GPT-5.2 writes 17 extra words per message when lying. Claude Opus 4.5 is the most poker-faced, with only a 1-word difference. If you're playing against AIs, message length is a weak but real signal.
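A rough sketch of the verbosity shift, assuming it is the mean words per message as Werewolf minus the mean as Villager. The message schema and the whitespace tokenization are assumptions, and the sample messages are invented.

```python
from statistics import mean

# Hypothetical chat log for a single model; texts are made up.
messages = [
    {"role": "Werewolf", "text": "I think player three is acting strange today"},
    {"role": "Werewolf", "text": "Trust me I am only a villager"},
    {"role": "Villager", "text": "I vote for player three"},
    {"role": "Villager", "text": "No idea yet"},
]

def mean_words(msgs, role):
    """Average whitespace-separated word count for one role."""
    return mean(len(m["text"].split()) for m in msgs if m["role"] == role)

def verbosity_shift(msgs):
    """Extra words per message as Werewolf relative to Villager."""
    return mean_words(msgs, "Werewolf") - mean_words(msgs, "Villager")

shift = verbosity_shift(messages)
```

The sample Werewolf messages average 7.5 words and the Villager messages 4, giving a shift of 3.5.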

🔮 Detection & Investigation

How well models detect werewolves, and how the Seer role changes everything.

Villager-Side Vote Accuracy

How often non-werewolf players correctly vote for an actual werewolf

Seer Investigation Success

How often the Seer finds a werewolf + resulting team win rate
💡 Key Finding: When the Seer finds a Werewolf, villagers win 79.4% of the time. When the Seer doesn't, that drops to 41.7%. The Seer is the single most important role in the game.
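The two figures are conditional win rates: village wins split by whether the Seer ever found a werewolf. A minimal sketch, with a hypothetical per-game schema (`seer_found_wolf`, `village_won`) and invented sample rows; only the 79.4% and 41.7% values come from the page.

```python
# Hypothetical per-game records; field names are illustrative.
games = [
    {"seer_found_wolf": True, "village_won": True},
    {"seer_found_wolf": True, "village_won": True},
    {"seer_found_wolf": True, "village_won": False},
    {"seer_found_wolf": False, "village_won": True},
    {"seer_found_wolf": False, "village_won": False},
]

def village_win_rate(rows, found):
    """Village win rate among games where seer_found_wolf == found."""
    subset = [g for g in rows if g["seer_found_wolf"] == found]
    if not subset:
        return None  # no games in this condition
    return sum(g["village_won"] for g in subset) / len(subset)

rate_found = village_win_rate(games, True)
rate_missed = village_win_rate(games, False)

# Gap between the two reported figures, in percentage points.
reported_gap = round(79.4 - 41.7, 1)
```

On the reported numbers the gap is 37.7 percentage points.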

🌙 Night Kill Strategy

Who do werewolves target at night, and what happens when they get it right?

Night Kill Target Distribution

Werewolves don't know roles, but they can guess from behavior

First Night Kill → Win Rate

Werewolf win rate by the role of the player killed on Night 0
💡 Key Finding: Killing the Seer on Night 0 gives werewolves a 48.2% win rate, about 1.7 times the 27.7% when they kill a villager. Werewolves target the Seer 25% of the time (vs. a 12.5% base rate), showing active hunting behavior.
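The ratios behind this finding are quick to check. The four inputs are the figures quoted above; reading the 12.5% base rate as a uniform pick among eight eligible targets is an inference, not something the page states.

```python
# Figures quoted in the key finding (percentages).
seer_kill_win = 48.2       # werewolf win rate after killing the Seer on Night 0
villager_kill_win = 27.7   # werewolf win rate after killing a villager
seer_target_share = 25.0   # share of Night 0 kills that hit the Seer
seer_base_rate = 12.5      # share expected if targets were picked at random

win_ratio = seer_kill_win / villager_kill_win
targeting_ratio = seer_target_share / seer_base_rate

# Assumption: a 12.5% base rate corresponds to one Seer among
# 100 / 12.5 equally likely targets.
implied_targets = 100 / seer_base_rate
```

`win_ratio` rounds to 1.74 and `targeting_ratio` is exactly 2.0, so werewolves hit the Seer twice as often as chance would predict.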

๐Ÿ—ณ๏ธ Voting Patterns

Who votes for whom? Cross-role voting patterns reveal strategic behavior.

Vote Target Matrix (row = voter's role, column = target's role)

Seers vote for werewolves 73% of the time. Werewolves spread votes to avoid detection.
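A matrix like this can be built by counting (voter role, target role) pairs and normalizing each row. The sketch below assumes row normalization, which matches the "73% of the time" phrasing; the sample pairs are invented.

```python
from collections import Counter, defaultdict

# Hypothetical votes as (voter_role, target_role) pairs.
vote_pairs = [
    ("Seer", "Werewolf"), ("Seer", "Werewolf"),
    ("Seer", "Werewolf"), ("Seer", "Villager"),
    ("Werewolf", "Villager"), ("Werewolf", "Seer"),
    ("Villager", "Werewolf"), ("Villager", "Villager"),
]

def vote_matrix(pairs):
    """Row-normalized matrix: matrix[voter_role][target_role] = share of votes."""
    counts = defaultdict(Counter)
    for voter_role, target_role in pairs:
        counts[voter_role][target_role] += 1
    return {
        voter: {target: n / sum(row.values()) for target, n in row.items()}
        for voter, row in counts.items()
    }

matrix = vote_matrix(vote_pairs)
```

Each row sums to 1, so `matrix["Seer"]["Werewolf"]` (0.75 in this sample) reads directly as the share of Seer votes that land on werewolves.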

💰 Token Economics

How much does each model cost to run, and does expensive mean better?

Cost per Action

Some models cost 38× more than others

Cost vs Win Rate

Is paying more worth it?
💡 Key Finding: Claude Opus 4.5 costs $0.034/action, 38× more than Grok 4.1 Fast ($0.0009). But Gemini 3 Pro at $0.010/action has the highest win rate. The sweet spot isn't the most expensive model.
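The 38× figure follows from the per-action costs quoted above. A short check, using only those three prices (the dictionary keys are just labels):

```python
# Cost per action in USD, as quoted in the key finding.
cost_per_action = {
    "Claude Opus 4.5": 0.034,
    "Gemini 3 Pro": 0.010,
    "Grok 4.1 Fast": 0.0009,
}

def cost_ratio(costs, expensive, cheap):
    """How many times more one model costs per action than another."""
    return costs[expensive] / costs[cheap]

opus_vs_grok = cost_ratio(cost_per_action, "Claude Opus 4.5", "Grok 4.1 Fast")
opus_vs_gemini = cost_ratio(cost_per_action, "Claude Opus 4.5", "Gemini 3 Pro")
```

`opus_vs_grok` is roughly 37.8, which rounds to the quoted 38×, while `opus_vs_gemini` is about 3.4.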

โฑ๏ธ Game Length

Most games end in 2-3 days. Longer games are rare โ€” and favor villagers.

Game Length Distribution & Win Rates

63% of games end in 2 days. Werewolves need to win fast.