The roster
Models
The roster has 40 models from 19 vendors, all accessed through OpenRouter. It spans current flagships, mid-tiers and small models, plus the legacy and oddball entries listed in the table. Every model predicted its own complete tournament (group scores, bracket, champion) before kickoff. The roster was frozen before kickoff; models added later would appear as unranked exhibition entries.
| Model | OpenRouter ID | Tier |
|---|---|---|
| Jamba Large 1.7 | ai21/jamba-large-1.7 | flagship |
| Qwen 2.5 72B | qwen/qwen-2.5-72b-instruct | legacy |
| Qwen3.6 Flash | qwen/qwen3.6-flash | small |
| Qwen3.7 Max | qwen/qwen3.7-max | flagship |
| Qwen3.7 Plus | qwen/qwen3.7-plus | mid |
| Nova 2 Lite | amazon/nova-2-lite-v1 | mid |
| Claude 3 Haiku | anthropic/claude-3-haiku | legacy |
| Claude Fable 5 | anthropic/claude-fable-5 | flagship |
| Claude Haiku 4.5 | anthropic/claude-haiku-4.5 | small |
| Claude Opus 4.8 | anthropic/claude-opus-4.8 | mid |
| Command A | cohere/command-a | flagship |
| DeepSeek V4 Flash | deepseek/deepseek-v4-flash | small |
| DeepSeek V4 Pro | deepseek/deepseek-v4-pro | flagship |
| Gemini 3.1 Flash Lite | google/gemini-3.1-flash-lite | small |
| Gemini 3.1 Pro Preview | google/gemini-3.1-pro-preview | flagship |
| Gemini 3.5 Flash | google/gemini-3.5-flash | mid |
| Gemma 2 27B | google/gemma-2-27b-it | legacy |
| Mercury 2 | inception/mercury-2 | oddball |
| Llama 3 70B | meta-llama/llama-3-70b-instruct | legacy |
| Llama 4 Maverick | meta-llama/llama-4-maverick | flagship |
| Llama 4 Scout | meta-llama/llama-4-scout | small |
| WizardLM-2 8x22B | microsoft/wizardlm-2-8x22b | oddball |
| MiniMax M3 | minimax/minimax-m3 | flagship |
| Mistral Medium 3.5 | mistralai/mistral-medium-3-5 | flagship |
| Mistral Small 4 | mistralai/mistral-small-2603 | small |
| Kimi K2.6 | moonshotai/kimi-k2.6 | flagship |
| Hermes 3 405B | nousresearch/hermes-3-llama-3.1-405b | oddball |
| Nemotron 3 Ultra | nvidia/nemotron-3-ultra-550b-a55b | flagship |
| GPT-3.5 Turbo | openai/gpt-3.5-turbo | legacy |
| GPT-4 | openai/gpt-4 | legacy |
| GPT-4o | openai/gpt-4o | legacy |
| GPT-5.4 Mini | openai/gpt-5.4-mini | small |
| GPT-5.4 Nano | openai/gpt-5.4-nano | small |
| GPT-5.5 | openai/gpt-5.5 | flagship |
| GPT-5.5 Pro | openai/gpt-5.5-pro | flagship |
| Hunyuan A13B | tencent/hunyuan-a13b-instruct | oddball |
| Grok 4.20 | x-ai/grok-4.20 | mid |
| Grok 4.3 | x-ai/grok-4.3 | flagship |
| GLM 4.7 Flash | z-ai/glm-4.7-flash | small |
| GLM 5.1 | z-ai/glm-5.1 | flagship |
Knowledge cutoffs differ between models; that asymmetry is part of what the benchmark measures and is shown rather than corrected for. Full snapshot details are in data/roster.json.
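
The headline counts can be re-derived from the roster table itself. The following is a minimal sketch, not the benchmark's own tooling: it hard-codes the slug and tier pairs from the table rather than reading data/roster.json (whose schema is not shown here), and it assumes that a model's vendor is the part of its OpenRouter slug before the first `/`. The names `ROSTER_TEXT`, `parse_roster` and `vendor_of` are hypothetical.

```python
from collections import Counter

# "slug tier" pairs copied from the roster table (one model per line).
ROSTER_TEXT = """
ai21/jamba-large-1.7 flagship
qwen/qwen-2.5-72b-instruct legacy
qwen/qwen3.6-flash small
qwen/qwen3.7-max flagship
qwen/qwen3.7-plus mid
amazon/nova-2-lite-v1 mid
anthropic/claude-3-haiku legacy
anthropic/claude-fable-5 flagship
anthropic/claude-haiku-4.5 small
anthropic/claude-opus-4.8 mid
cohere/command-a flagship
deepseek/deepseek-v4-flash small
deepseek/deepseek-v4-pro flagship
google/gemini-3.1-flash-lite small
google/gemini-3.1-pro-preview flagship
google/gemini-3.5-flash mid
google/gemma-2-27b-it legacy
inception/mercury-2 oddball
meta-llama/llama-3-70b-instruct legacy
meta-llama/llama-4-maverick flagship
meta-llama/llama-4-scout small
microsoft/wizardlm-2-8x22b oddball
minimax/minimax-m3 flagship
mistralai/mistral-medium-3-5 flagship
mistralai/mistral-small-2603 small
moonshotai/kimi-k2.6 flagship
nousresearch/hermes-3-llama-3.1-405b oddball
nvidia/nemotron-3-ultra-550b-a55b flagship
openai/gpt-3.5-turbo legacy
openai/gpt-4 legacy
openai/gpt-4o legacy
openai/gpt-5.4-mini small
openai/gpt-5.4-nano small
openai/gpt-5.5 flagship
openai/gpt-5.5-pro flagship
tencent/hunyuan-a13b-instruct oddball
x-ai/grok-4.20 mid
x-ai/grok-4.3 flagship
z-ai/glm-4.7-flash small
z-ai/glm-5.1 flagship
"""


def parse_roster(text):
    """Return a list of (slug, tier) tuples, one per non-empty line."""
    rows = []
    for line in text.strip().splitlines():
        slug, tier = line.split()
        rows.append((slug, tier))
    return rows


def vendor_of(slug):
    # Assumption: the vendor is the slug prefix before the first "/".
    return slug.split("/", 1)[0]


roster = parse_roster(ROSTER_TEXT)
vendors = {vendor_of(slug) for slug, _ in roster}
tier_counts = Counter(tier for _, tier in roster)

print(f"{len(roster)} models across {len(vendors)} vendors")
for tier, count in tier_counts.most_common():
    print(f"{tier}: {count}")
```

Under that vendor assumption the tally gives 40 models and 19 vendors, matching the figures stated at the top of the section, with the tiers splitting into 15 flagship, 9 small, 7 legacy, 5 mid and 4 oddball.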