Research Paper Cockpit

Daily Digest - 2026-05-10

Papers first seen in this daily snapshot.

Research Workflow

Latest digest: 2026-05-13.

Papers

1 visible entry

arXiv · Score 7.8

Why Global LLM Leaderboards Are Misleading: Small Portfolios for Heterogeneous Supervised ML

2026-05-07 · Jai Moondra, Ayela Chughtai, Bhargavi Lanka, Swati Gupta

General AI

Ranking LLMs via pairwise human feedback underpins current leaderboards for open-ended tasks, such as creative writing and problem-solving. We analyze ~89K Arena comparisons of 52 LLMs across 116 languages, and show that the best-fit global Bradley-Terry (BT) ranking is misleading. Nearly 2/3 of the decisive votes c…
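For context on what a "global BT ranking" is, here is a minimal sketch of fitting a single Bradley-Terry model to pairwise votes with the standard minorization-maximization (Zermelo/Hunter) update. Under BT, model i beats model j with probability s_i / (s_i + s_j). This is an illustration only, not the authors' pipeline. The function name `fit_bradley_terry` and the (winner, loser) input format are hypothetical. The sketch assumes ties are dropped, so only decisive votes are used.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, iters=200, tol=1e-10):
    """Fit one global Bradley-Terry strength per model (hypothetical helper).

    comparisons: list of (winner, loser) pairs; assumes ties were dropped.
    Returns a dict of strengths normalized to sum to the number of models.
    """
    models = sorted({m for pair in comparisons for m in pair})
    wins = defaultdict(float)   # total decisive wins per model
    games = defaultdict(float)  # games[(a, b)] with a < b: matches played
    for winner, loser in comparisons:
        wins[winner] += 1
        key = (winner, loser) if winner < loser else (loser, winner)
        games[key] += 1

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        new = {}
        for i in models:
            # MM update: s_i <- W_i / sum_j n_ij / (s_i + s_j)
            denom = sum(
                n / (strength[a] + strength[b])
                for (a, b), n in games.items()
                if i in (a, b)
            )
            new[i] = wins[i] / denom if denom > 0 else strength[i]
        # Strengths are only identified up to scale, so renormalize.
        total = sum(new.values())
        new = {m: s * len(models) / total for m, s in new.items()}
        delta = max(abs(new[m] - strength[m]) for m in models)
        strength = new
        if delta < tol:
            break
    return strength
```

For example, `fit_bradley_terry([("A", "B")] * 3 + [("B", "A")])` returns strengths of 1.5 for A and 0.5 for B, so A is predicted to win 3/4 of head-to-heads. A single strength per model is what makes the ranking "global": it cannot express a model that is stronger on some slice of prompts (say, one language) and weaker on another.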

Review: pending · Role: unreviewed · Read: soon