Paper Detail

Many Voices, One Reward: Multi-Role Rubric Generation for LLM Judging and Reward Modeling

Dazhi Fu, Jiuding Yang, Yiwen Guo, Jicong Fan

arXiv score: 15.6

Published 2026-07-02 · First seen 2026-07-03

General AI

Abstract

Reliable reward and preference signals are critical for evaluating and optimizing large language models on open-ended tasks. Rubric-based judges offer a transparent way to decompose such judgments into explicit evaluation criteria, but existing annotation-free rubric generators typically rely on a single generic evaluator. As a result, they may overlook important dimensions of human preference, a failure mode we term dimensional blind spots. To address this limitation, we propose Multi-Role Rubric Generation (MRRG), a training-free and reference-free framework that elicits evaluation criteria from multiple complementary roles and consolidates them into an auditable rubric-based scorer. This scorer can be used both to validate pairwise preferences and to provide rewards for GRPO-style Reinforcement Learning with Verifiable Rewards (RLVR). Experiments on preference validation benchmarks show that MRRG consistently outperforms single-role rubric generation baselines across multiple backbone models. Further RLVR experiments demonstrate that MRRG yields a stronger reward signal for improving open-ended generation.
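The pipeline the abstract describes has four parts. Criteria are gathered from several roles, consolidated into one rubric, used to score responses, and then turned into either a pairwise preference or a GRPO-style reward. The sketch below illustrates that flow under explicit assumptions and is not the paper's implementation. The role names, exact-match deduplication, equal criterion weights, and every function name (`consolidate`, `rubric_score`, `prefer`, `group_advantages`) are hypothetical. Per-criterion pass/fail judgments are supplied by hand here, whereas a real rubric-based judge would obtain them from an LLM. The last step is the standard GRPO group-relative normalization, shown only to illustrate how a rubric score could serve as an RLVR reward.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


# Hypothetical rubric item: criterion text, the role that proposed it, and a weight.
@dataclass(frozen=True)
class Criterion:
    text: str
    role: str
    weight: float = 1.0


def consolidate(role_criteria: dict[str, list[str]]) -> list[Criterion]:
    """Merge criteria from several roles, dropping case-insensitive exact duplicates.

    Assumption: real consolidation would likely merge paraphrases too; this only
    removes identical strings, keeping the first role that proposed each one.
    """
    seen: set[str] = set()
    rubric: list[Criterion] = []
    for role, items in role_criteria.items():
        for text in items:
            key = text.strip().lower()
            if key not in seen:
                seen.add(key)
                rubric.append(Criterion(text.strip(), role))
    return rubric


def rubric_score(rubric: list[Criterion], satisfied: set[str]) -> float:
    """Weighted fraction of rubric criteria a response satisfies, in [0, 1]."""
    total = sum(c.weight for c in rubric)
    hit = sum(c.weight for c in rubric if c.text in satisfied)
    return hit / total if total else 0.0


def prefer(rubric: list[Criterion], sat_a: set[str], sat_b: set[str]) -> str:
    """Pairwise preference: the response with the higher rubric score wins."""
    sa, sb = rubric_score(rubric, sat_a), rubric_score(rubric, sat_b)
    if sa > sb:
        return "A"
    if sb > sa:
        return "B"
    return "tie"


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style group-relative advantage: (r - group mean) / (group std + eps)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage with hypothetical roles; "states correct facts" is proposed twice and merged.
roles = {
    "domain_expert": ["States correct facts", "Cites relevant evidence"],
    "end_user": ["Answers the actual question", "states correct facts"],
    "safety_reviewer": ["Avoids harmful advice"],
}
rubric = consolidate(roles)

# Hand-supplied judgments standing in for an LLM judge's per-criterion verdicts.
sat_a = {"States correct facts", "Answers the actual question", "Avoids harmful advice"}
sat_b = {"States correct facts", "Cites relevant evidence"}
print(rubric_score(rubric, sat_a), rubric_score(rubric, sat_b), prefer(rubric, sat_a, sat_b))
print(group_advantages([0.75, 0.5, 0.25, 0.5]))
```

Giving each role its own list before merging is what lets criteria that a single generic evaluator might omit, such as the safety reviewer's item here, enter the final rubric. That corresponds to the "dimensional blind spots" the abstract targets.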

Workflow Status

Review status
pending
Role
unreviewed
Read priority
now
Vote
Not set.
Saved
no
Collections
Not filed yet.
Next action
Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

Tags

No tags.

BibTeX

@article{fu2026many,
  title = {Many Voices, One Reward: Multi-Role Rubric Generation for LLM Judging and Reward Modeling},
  author = {Dazhi Fu and Jiuding Yang and Yiwen Guo and Jicong Fan},
  year = {2026},
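  abstract = {Reliable reward and preference signals are critical for evaluating and optimizing large language models on open-ended tasks. Rubric-based judges offer a transparent way to decompose such judgments into explicit evaluation criteria, but existing annotation-free rubric generators typically rely on a single generic evaluator. As a result, they may overlook important dimensions of human preference, a failure mode we term dimensional blind spots. To address this limitation, we propose Multi-Role Rubric Generation (MRRG), a training-free and reference-free framework that elicits evaluation criteria from multiple complementary roles and consolidates them into an auditable rubric-based scorer. This scorer can be used both to validate pairwise preferences and to provide rewards for GRPO-style Reinforcement Learning with Verifiable Rewards (RLVR). Experiments on preference validation benchmarks show that MRRG consistently outperforms single-role rubric generation baselines across multiple backbone models. Further RLVR experiments demonstrate that MRRG yields a stronger reward signal for improving open-ended generation.},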
  url = {https://arxiv.org/abs/2607.01830},
  keywords = {cs.LG},
  eprint = {2607.01830},
  archiveprefix = {arXiv},
}
