About
What is PunditBench?
PunditBench is a public experiment: 40 large language models (flagships, mid-tiers and small models from 19 vendors) each predict the entire 2026 World Cup before the opening kickoff. The models get no live data, no odds and no squad news, only their training knowledge and identical prompts. Reality then grades every claim, match by match and round by round.
It works as a self-consistent simulation. Every model first predicts all 72 group matches in one prompt. From its own scorelines we compute its own group tables and best third-placed teams (FIFA tiebreakers, official Annexe C slotting), which yields the model's own Round of 32 — and it then predicts each knockout round of its own simulated tournament, through to its own champion. Nothing anywhere depends on a real result; the complete set, raw API traffic included, is locked and SHA-256 pre-registered in the public repository before kickoff, so nothing can be quietly edited after the fact.
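To make the "own scorelines to own group table" step concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the project's code: the function name, the tuple layout and the team labels are hypothetical, and the ordering is simplified to points, goal difference and goals scored. The full FIFA tiebreakers, the ranking of third-placed teams and the Annexe C slotting are not reproduced here.

```python
from collections import defaultdict


def group_table(matches):
    """Rank the teams of one group from predicted scorelines.

    matches: iterable of (home, away, home_goals, away_goals) tuples.
    Simplified ordering: points, then goal difference, then goals scored.
    The final alphabetical fallback is a placeholder, NOT a FIFA rule.
    """
    stats = defaultdict(lambda: {"pts": 0, "gd": 0, "gf": 0})
    for home, away, hg, ag in matches:
        stats[home]["gf"] += hg
        stats[away]["gf"] += ag
        stats[home]["gd"] += hg - ag
        stats[away]["gd"] += ag - hg
        if hg > ag:
            stats[home]["pts"] += 3
        elif hg < ag:
            stats[away]["pts"] += 3
        else:
            stats[home]["pts"] += 1
            stats[away]["pts"] += 1
    return sorted(
        stats,
        key=lambda t: (-stats[t]["pts"], -stats[t]["gd"], -stats[t]["gf"], t),
    )


# Hypothetical group of four placeholder teams (six matches).
example = [
    ("A", "B", 2, 0),
    ("C", "D", 1, 1),
    ("A", "C", 1, 1),
    ("B", "D", 0, 3),
    ("A", "D", 0, 0),
    ("B", "C", 1, 2),
]
table = group_table(example)
print(table)  # ['D', 'A', 'C', 'B']
```

In this example three teams finish level on 5 points and are separated by goal difference alone, which is the kind of case where the real tiebreaker order matters.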
Scoring is identical for everyone. Group matches: 3 points for the exact score, 2 for the right goal difference, 1 for the right outcome. The bracket is scored against the real tournament as it unfolds: points for every team that a model predicted to reach a stage and that really reaches it (13 for the champion), a bonus for every simulated pairing that actually occurs, and, for each matched pairing, points for the scoreline as in a normal match. The complete rules, integrity checks and caveats are in the methodology.
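The group-match rule can be sketched in a few lines of Python. This is one reading of the rule, not the official implementation: it assumes the three tiers are exclusive (only the highest one that applies is awarded) and that a predicted draw with a different scoreline counts as the right goal difference. The function names are hypothetical, and the methodology remains the authority on edge cases.

```python
def sign(n):
    """Return 1, 0 or -1 for a positive, zero or negative number."""
    return (n > 0) - (n < 0)


def score_group_match(predicted, actual):
    """Score one prediction; both arguments are (home_goals, away_goals)."""
    if predicted == actual:
        return 3  # exact score
    predicted_diff = predicted[0] - predicted[1]
    actual_diff = actual[0] - actual[1]
    if predicted_diff == actual_diff:
        return 2  # right goal difference (includes draws, by assumption)
    if sign(predicted_diff) == sign(actual_diff):
        return 1  # right outcome only
    return 0


print(score_group_match((2, 1), (2, 1)))  # 3
print(score_group_match((1, 0), (2, 1)))  # 2
print(score_group_match((3, 0), (1, 0)))  # 1
print(score_group_match((0, 1), (1, 0)))  # 0
```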
Honest caveats
- One run at temperature 0 samples a single trajectory, not a model's full predictive distribution.
- Football is high-variance and bracket scoring is top-heavy by design — a lucky champion call moves the table. Treat small leaderboard gaps as noise.
- Knowledge cutoffs differ between models; some predate squad announcements or even qualification. That asymmetry is part of what is being measured, not corrected for.
- This is a benchmark of language models, not a forecasting product.
Legal
Not betting advice. Everything on this site is statistics and entertainment only. Do not use it to gamble.
AI-generated content. All predictions shown on this site are AI-generated content, produced by the listed language models.
Trademarks. PunditBench is an independent project, not affiliated with, endorsed by, or connected to FIFA or any football federation. Tournament and team names are used editorially to describe real sporting events.
Privacy. By default this site sets no cookies. If you accept in the consent banner, Google Analytics 4 counts visits — pseudonymous usage statistics with anonymized IP addresses; no ads, no cross-site tracking. Your choice is stored only on your device, and you can change it at any time via “Analytics settings” in the footer. Analytics data is processed by Google — see Google's privacy policy. Independently of that choice, we count page views with a simple anonymous counter: plain aggregate numbers with no cookies, no identifiers, and no personal data of any kind (the total is shown in the footer).
Imprint. Publisher: to be announced. Contact: GitHub issues.
Data
Everything is published as plain JSON, copied into this site at build time and versioned in the public repository:
- /data/roster.json — the frozen 40-model roster
- /data/teams.json — the 48 qualified teams
- /data/fixtures/<stage>.json — fixtures per stage
- /data/results.json — real results as they are entered
- /data/predictions/<stage>/<model>.json — every model's predictions per stage
Points are never stored — they are recomputed from predictions and results on every build, so anything on this site can be re-derived from the files above.
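Because the locked prediction set is SHA-256 pre-registered, a downloaded copy of these files can in principle be checked against the published digests. The Python sketch below shows one way to do that with the standard library. The manifest shape (a mapping from relative path to hex digest) and both function names are assumptions made for illustration; the repository may publish its hashes in a different form.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(root, expected):
    """List the files that are missing or whose digest differs.

    root: directory holding the downloaded data files.
    expected: {relative_path: hex_digest}, a hypothetical manifest shape.
    """
    root = Path(root)
    mismatches = []
    for rel_path, wanted in sorted(expected.items()):
        target = root / rel_path
        if not target.is_file() or sha256_file(target) != wanted.lower():
            mismatches.append(rel_path)
    return mismatches
```

An empty list from `find_mismatches` means every listed file is present and matches its digest byte for byte.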