Benchmark
Werewolf Benchmark for LLM Social Intelligence
A benchmark that evaluates AI models through Werewolf, a hidden-role deduction game that tests social intelligence, persuasion, strategic deception, and resistance to manipulation. Language-only, adversarial, and socially demanding, it reveals how models bluff, coordinate, and adapt under pressure.
Shared by Greg Brockman (President of OpenAI), Thomas Wolf (CSO of Hugging Face), Sebastian Bubeck (OpenAI), and Boris Power (OpenAI)
Study
The Political Gap Between AIs and Citizens
A study of whether leading AI models' policy preferences align with actual electoral outcomes in eight nations. We asked six frontier models to evaluate and generate political proposals anonymously, then compared their choices with how citizens actually voted. The results reveal systematic ideological clustering, which raises fundamental questions about AI alignment.
Shared by Elon Musk
12M+ views