Research

We build benchmarks and run studies to deeply understand how AI models think, decide, and behave. Our work has been shared by leaders at OpenAI, xAI, and Hugging Face, and seen by millions.

Benchmark
Werewolf Benchmark for LLM Social Intelligence
A benchmark that evaluates AI models through Werewolf, a hidden-role deduction game that tests social intelligence, persuasion, strategic deception, and resistance to manipulation. Language-only, adversarial, and socially demanding, it reveals how models bluff, coordinate, and adapt under pressure.

Study
The Political Gap Between AIs and Citizens
A study examining whether leading AI models' policy preferences align with actual electoral outcomes across eight nations. We asked six frontier models to evaluate and generate political proposals anonymously, then compared their choices to real citizen voting behavior. The results reveal systematic ideological clustering, which raises fundamental questions about AI alignment.