Many Voices, One Reward: Multi-Role Rubric Generation for LLM Judging and Reward Modeling

Dazhi Fu, Jiuding Yang, Yiwen Guo, Jicong Fan

arxiv Score 15.6

Published 2026-07-02 · First seen 2026-07-03

General AI

Abstract

Reliable reward and preference signals are critical for evaluating and optimizing large language models on open-ended tasks. Rubric-based judges offer a transparent way to decompose such judgments into explicit evaluation criteria, but existing annotation-free rubric generators typically rely on a single generic evaluator. As a result, they may overlook important dimensions of human preference, a failure mode we term dimensional blind spots. To address this limitation, we propose Multi-Role Rubric Generation (MRRG), a training-free and reference-free framework that elicits evaluation criteria from multiple complementary roles and consolidates them into an auditable rubric-based scorer. This scorer can be used both to validate pairwise preferences and to provide rewards for GRPO-style Reinforcement Learning with Verifiable Rewards (RLVR). Experiments on preference validation benchmarks show that MRRG consistently outperforms single-role rubric generation baselines across multiple backbone models. Further RLVR experiments demonstrate that MRRG yields a stronger reward signal for improving open-ended generation.
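The abstract describes the pipeline only at a high level, so the following is a toy sketch of its overall shape rather than the authors' method. The role names, the name-based deduplication, the binary per-criterion checks, and the unweighted averaging are all assumptions made for illustration. In practice each `check` would be an LLM judge call, replaced here by a simple string heuristic. The group-normalized advantage at the end is the standard GRPO-style normalization (reward minus group mean, divided by group standard deviation).

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Criterion:
    name: str
    # Hypothetical stand-in for an LLM judge call: True if the response satisfies the criterion.
    check: Callable[[str], bool]


def consolidate(role_rubrics: Dict[str, List[Criterion]]) -> List[Criterion]:
    """Merge criteria proposed by several roles, dropping duplicates by name (assumed rule)."""
    merged: Dict[str, Criterion] = {}
    for criteria in role_rubrics.values():
        for criterion in criteria:
            merged.setdefault(criterion.name, criterion)
    return list(merged.values())


def rubric_score(rubric: List[Criterion], response: str) -> float:
    """Fraction of criteria satisfied (unweighted average, an assumption)."""
    return sum(c.check(response) for c in rubric) / len(rubric)


def prefer(rubric: List[Criterion], a: str, b: str) -> str:
    """Pairwise preference derived from rubric scores."""
    sa, sb = rubric_score(rubric, a), rubric_score(rubric, b)
    return "A" if sa > sb else "B" if sb > sa else "tie"


def group_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantage: normalize each reward within its sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two hypothetical roles; "cites_evidence" is proposed by both and kept once.
cites = Criterion("cites_evidence", lambda r: "because" in r.lower())
concise = Criterion("concise", lambda r: len(r.split()) <= 20)
rubric = consolidate({"domain_expert": [cites], "editor": [concise, cites]})

answer_a = "Use a cache because repeated lookups dominate latency."
answer_b = "Use a cache."
print(prefer(rubric, answer_a, answer_b))
print(group_advantages([rubric_score(rubric, r) for r in (answer_a, answer_b)]))
```

Keeping the consolidated rubric as an explicit list of named criteria is what makes such a scorer auditable: each reward can be traced back to the criteria that passed or failed.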

BibTeX

@article{fu2026many,
  title = {Many Voices, One Reward: Multi-Role Rubric Generation for LLM Judging and Reward Modeling},
  author = {Dazhi Fu and Jiuding Yang and Yiwen Guo and Jicong Fan},
  year = {2026},
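  abstract = {Reliable reward and preference signals are critical for evaluating and optimizing large language models on open-ended tasks. Rubric-based judges offer a transparent way to decompose such judgments into explicit evaluation criteria, but existing annotation-free rubric generators typically rely on a single generic evaluator. As a result, they may overlook important dimensions of human preference, a failure mode we term dimensional blind spots. To address this limitation, we propose Multi-Role Rubric Generation (MRRG), a training-free and reference-free framework that elicits evaluation criteria from multiple complementary roles and consolidates them into an auditable rubric-based scorer. This scorer can be used both to validate pairwise preferences and to provide rewards for GRPO-style Reinforcement Learning with Verifiable Rewards (RLVR). Experiments on preference validation benchmarks show that MRRG consistently outperforms single-role rubric generation baselines across multiple backbone models. Further RLVR experiments demonstrate that MRRG yields a stronger reward signal for improving open-ended generation.},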
  url = {https://arxiv.org/abs/2607.01830},
  keywords = {cs.LG},
  eprint = {2607.01830},
  archiveprefix = {arXiv},
}
