The roster

Models

40 models across 19 vendors, accessed through OpenRouter: current flagships, mid-tiers and small models, plus a handful of legacy and oddball entries. Every one predicted its own complete tournament (group scores, bracket, champion) before kickoff. The roster was frozen pre-kickoff; models added later would appear as unranked exhibition entries.

| Model | OpenRouter ID | Tier |
| --- | --- | --- |
| Jamba Large 1.7 | ai21/jamba-large-1.7 | flagship |
| Qwen 2.5 72B | qwen/qwen-2.5-72b-instruct | legacy |
| Qwen3.6 Flash | qwen/qwen3.6-flash | small |
| Qwen3.7 Max | qwen/qwen3.7-max | flagship |
| Qwen3.7 Plus | qwen/qwen3.7-plus | mid |
| Nova 2 Lite | amazon/nova-2-lite-v1 | mid |
| Claude 3 Haiku | anthropic/claude-3-haiku | legacy |
| Claude Fable 5 | anthropic/claude-fable-5 | flagship |
| Claude Haiku 4.5 | anthropic/claude-haiku-4.5 | small |
| Claude Opus 4.8 | anthropic/claude-opus-4.8 | mid |
| Command A | cohere/command-a | flagship |
| DeepSeek V4 Flash | deepseek/deepseek-v4-flash | small |
| DeepSeek V4 Pro | deepseek/deepseek-v4-pro | flagship |
| Gemini 3.1 Flash Lite | google/gemini-3.1-flash-lite | small |
| Gemini 3.1 Pro Preview | google/gemini-3.1-pro-preview | flagship |
| Gemini 3.5 Flash | google/gemini-3.5-flash | mid |
| Gemma 2 27B | google/gemma-2-27b-it | legacy |
| Mercury 2 | inception/mercury-2 | oddball |
| Llama 3 70B | meta-llama/llama-3-70b-instruct | legacy |
| Llama 4 Maverick | meta-llama/llama-4-maverick | flagship |
| Llama 4 Scout | meta-llama/llama-4-scout | small |
| WizardLM-2 8x22B | microsoft/wizardlm-2-8x22b | oddball |
| MiniMax M3 | minimax/minimax-m3 | flagship |
| Mistral Medium 3.5 | mistralai/mistral-medium-3-5 | flagship |
| Mistral Small 4 | mistralai/mistral-small-2603 | small |
| Kimi K2.6 | moonshotai/kimi-k2.6 | flagship |
| Hermes 3 405B | nousresearch/hermes-3-llama-3.1-405b | oddball |
| Nemotron 3 Ultra | nvidia/nemotron-3-ultra-550b-a55b | flagship |
| GPT-3.5 Turbo | openai/gpt-3.5-turbo | legacy |
| GPT-4 | openai/gpt-4 | legacy |
| GPT-4o | openai/gpt-4o | legacy |
| GPT-5.4 Mini | openai/gpt-5.4-mini | small |
| GPT-5.4 Nano | openai/gpt-5.4-nano | small |
| GPT-5.5 | openai/gpt-5.5 | flagship |
| GPT-5.5 Pro | openai/gpt-5.5-pro | flagship |
| Hunyuan A13B | tencent/hunyuan-a13b-instruct | oddball |
| Grok 4.20 | x-ai/grok-4.20 | mid |
| Grok 4.3 | x-ai/grok-4.3 | flagship |
| GLM 4.7 Flash | z-ai/glm-4.7-flash | small |
| GLM 5.1 | z-ai/glm-5.1 | flagship |

Knowledge cutoffs differ between models; that asymmetry is part of what the benchmark measures and is shown rather than corrected for. Full snapshot details are in data/roster.json.
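
The headline counts can be checked directly against the roster listing. The following is a minimal sketch, not the site's own tooling: it assumes the vendor is the part of each OpenRouter ID before the first slash, and it hard-codes the (ID, tier) pairs from the listing rather than reading data/roster.json, whose schema is not described here. The names `ROSTER` and `summarize` are hypothetical.

```python
from collections import Counter

# (OpenRouter ID, tier) pairs copied from the roster listing.
ROSTER = [
    ("ai21/jamba-large-1.7", "flagship"),
    ("qwen/qwen-2.5-72b-instruct", "legacy"),
    ("qwen/qwen3.6-flash", "small"),
    ("qwen/qwen3.7-max", "flagship"),
    ("qwen/qwen3.7-plus", "mid"),
    ("amazon/nova-2-lite-v1", "mid"),
    ("anthropic/claude-3-haiku", "legacy"),
    ("anthropic/claude-fable-5", "flagship"),
    ("anthropic/claude-haiku-4.5", "small"),
    ("anthropic/claude-opus-4.8", "mid"),
    ("cohere/command-a", "flagship"),
    ("deepseek/deepseek-v4-flash", "small"),
    ("deepseek/deepseek-v4-pro", "flagship"),
    ("google/gemini-3.1-flash-lite", "small"),
    ("google/gemini-3.1-pro-preview", "flagship"),
    ("google/gemini-3.5-flash", "mid"),
    ("google/gemma-2-27b-it", "legacy"),
    ("inception/mercury-2", "oddball"),
    ("meta-llama/llama-3-70b-instruct", "legacy"),
    ("meta-llama/llama-4-maverick", "flagship"),
    ("meta-llama/llama-4-scout", "small"),
    ("microsoft/wizardlm-2-8x22b", "oddball"),
    ("minimax/minimax-m3", "flagship"),
    ("mistralai/mistral-medium-3-5", "flagship"),
    ("mistralai/mistral-small-2603", "small"),
    ("moonshotai/kimi-k2.6", "flagship"),
    ("nousresearch/hermes-3-llama-3.1-405b", "oddball"),
    ("nvidia/nemotron-3-ultra-550b-a55b", "flagship"),
    ("openai/gpt-3.5-turbo", "legacy"),
    ("openai/gpt-4", "legacy"),
    ("openai/gpt-4o", "legacy"),
    ("openai/gpt-5.4-mini", "small"),
    ("openai/gpt-5.4-nano", "small"),
    ("openai/gpt-5.5", "flagship"),
    ("openai/gpt-5.5-pro", "flagship"),
    ("tencent/hunyuan-a13b-instruct", "oddball"),
    ("x-ai/grok-4.20", "mid"),
    ("x-ai/grok-4.3", "flagship"),
    ("z-ai/glm-4.7-flash", "small"),
    ("z-ai/glm-5.1", "flagship"),
]


def summarize(roster):
    """Return (model count, per-vendor counts, per-tier counts)."""
    # Assumption: the vendor is the ID prefix before the first slash.
    vendors = Counter(model_id.split("/", 1)[0] for model_id, _ in roster)
    tiers = Counter(tier for _, tier in roster)
    return len(roster), vendors, tiers


total, vendors, tiers = summarize(ROSTER)
print(f"{total} models across {len(vendors)} vendors")
for tier, count in tiers.most_common():
    print(f"{tier:>9}: {count}")
```

Run as written, the first line printed is `40 models across 19 vendors`, matching the summary at the top of the section. The per-tier tally also shows that legacy and oddball entries sit alongside the flagship, mid and small tiers.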