We gave seven AIs command of Europe's great powers to battle for global supremacy. Would o3 betray Claude? Could Gemini outwit DeepSeek? In AI Diplomacy, language models lie, scheme, and form shaky alliances in a high-stakes strategy game.
We've been working on AI Diplomacy for months and are excited to make it public today.
We built this because traditional AI benchmarks are hard to interpret and don't reflect how we, as humans, actually interact with AI. We wanted a way to gauge how well AI communicates and how well it strategizes over the long term. The cherry on top was seeing whether it could lie and betray!
So we tried something different: What if we just let AI models play Diplomacy against each other and we exposed their communication and thinking behind each move?
The results are both entertaining and insightful. We tested 18 different models across countless games to understand how each AI performs. One of our favorite insights: OpenAI's o3 turned into a master manipulator, lying and backstabbing its way to victory. Meanwhile, Anthropic's Claude 4 Opus refused to betray anyone, even when losing.
It's completely open source, and we'd love your help making it better! Try different model combinations, suggest new features, or just enjoy watching AIs negotiate (and betray) each other.
Huge thanks to Alex Duffy, Tyler Marques, Sam Paech, the TextArena team, Oam Patel, and countless others for leading the build, and to the entire team at Every for making this launch possible.
I love this! Diplomacy is my favorite board game. It's ruthless, and the only element of chance is that players won't do what they said they'd do. I think that this would be an incredible benchmark/arena for alignment and reasoning, including reasoning about the actions of other agents.
It also lowers the barrier to playing Diplomacy for humans. I'd love to have AI players with tunable difficulty, especially because I wouldn't have to worry about ruining any IRL relationships. It reminds me of the chess bots on Chess.com that all have different difficulty levels, personalities, and play styles.
This is a really fun and unique approach to understanding AI behavior! The idea of using AI Diplomacy to test models’ strategic thinking, decision-making, and communication skills is brilliant.
Congratulations on the launch of AI Diplomacy! 🎉 It's intriguing to see how different AI models perform in complex interactions like negotiation and strategy. The insights you’ve shared about their varied behaviors are fascinating! Looking forward to exploring how these AIs negotiate and plot! 🚀