Paper Detail

Verifier-Backed Hard Problem Generation for Mathematical Reasoning

Yuhang Lai, Jiazhan Feng, Yee Whye Teh, Ning Miao

arXiv score: 14.8

Published 2026-05-07 · First seen 2026-05-09

General AI

Abstract

Large Language Models (LLMs) demonstrate strong capabilities for solving scientific and mathematical problems, yet they struggle to produce valid, challenging, and novel problems, a capability essential for advancing LLM training and enabling autonomous scientific research. Existing problem generation approaches either depend on expensive human expert involvement or adopt naive self-play paradigms, which frequently yield invalid problems due to reward hacking. This work introduces VHG, a verifier-enhanced hard problem generation framework built upon three-party self-play. By integrating an independent verifier into the conventional setter-solver duality, our design constrains the setter's reward to be jointly determined by problem validity (evaluated by the verifier) and difficulty (assessed by the solver). We instantiate two verifier variants, a Hard symbolic verifier and a Soft LLM-based verifier, with evaluations conducted on indefinite integral tasks and general mathematical reasoning tasks. Experimental results show that VHG outperforms all baseline methods by a clear margin.
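The abstract says only that the setter's reward is jointly determined by validity (from the verifier) and difficulty (from the solver); it does not give the formula. The Python sketch below assumes one simple combination: a problem the verifier rejects earns zero, and a valid problem earns the solver's failure rate over several sampled attempts. Everything in it (the `Problem` class, `symbolic_verifier`, `flaky_solver`, the toy "a+b" task, and `num_attempts=8`) is a hypothetical illustration, not the authors' implementation. It also computes a naive two-party reward to show the reward-hacking failure the abstract mentions: a problem with a wrong reference answer can never be matched by the solver, so without a verifier it looks maximally hard.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Problem:
    statement: str         # e.g. "17+25"
    reference_answer: str  # the answer the setter claims is correct


# Hypothetical interfaces (assumptions, not from the paper):
# a verifier judges whether a problem is valid; a solver maps
# (problem, sampling seed) to a candidate answer.
Verifier = Callable[[Problem], bool]
Solver = Callable[[Problem, int], str]


def solver_pass_rate(problem: Problem, solver: Solver, num_attempts: int) -> float:
    """Fraction of sampled solver attempts that match the setter's reference answer."""
    correct = sum(
        solver(problem, seed) == problem.reference_answer for seed in range(num_attempts)
    )
    return correct / num_attempts


def naive_setter_reward(problem: Problem, solver: Solver, num_attempts: int = 8) -> float:
    """Two-party self-play: reward is the solver's failure rate, with no validity check."""
    return 1.0 - solver_pass_rate(problem, solver, num_attempts)


def verified_setter_reward(
    problem: Problem, verifier: Verifier, solver: Solver, num_attempts: int = 8
) -> float:
    """Assumed three-party reward: invalid problems earn zero, valid ones earn the failure rate."""
    if not verifier(problem):
        return 0.0
    return 1.0 - solver_pass_rate(problem, solver, num_attempts)


# --- Toy instantiation on "a+b" problems (illustrative only) ---

def symbolic_verifier(problem: Problem) -> bool:
    """Hard-style check: recompute the sum exactly and compare it with the reference answer."""
    try:
        left, right = problem.statement.split("+")
        return int(left) + int(right) == int(problem.reference_answer)
    except ValueError:
        return False  # malformed problems are treated as invalid


def flaky_solver(problem: Problem, seed: int) -> str:
    """Toy solver: correct on even seeds, off by one on odd seeds."""
    left, right = problem.statement.split("+")
    total = int(left) + int(right)
    return str(total if seed % 2 == 0 else total + 1)


valid = Problem("17+25", "42")
invalid = Problem("17+25", "44")  # wrong reference answer: no solver attempt can match it

print(naive_setter_reward(valid, flaky_solver))                        # 0.5
print(verified_setter_reward(valid, symbolic_verifier, flaky_solver))  # 0.5
print(naive_setter_reward(invalid, flaky_solver))                      # 1.0 (hacked)
print(verified_setter_reward(invalid, symbolic_verifier, flaky_solver))  # 0.0
```

In this sketch, gating on the verifier removes the incentive the naive reward creates for ill-posed problems, while leaving the difficulty signal unchanged for valid ones. A Soft, LLM-based verifier would correspond to swapping `symbolic_verifier` for a model judgment with the same signature.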

Workflow Status

Review status: pending
Role: unreviewed
Read priority: now
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

BibTeX

@article{lai2026verifier,
  title = {Verifier-Backed Hard Problem Generation for Mathematical Reasoning},
  author = {Yuhang Lai and Jiazhan Feng and Yee Whye Teh and Ning Miao},
  year = {2026},
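  abstract = {Large Language Models (LLMs) demonstrate strong capabilities for solving scientific and mathematical problems, yet they struggle to produce valid, challenging, and novel problems, a capability essential for advancing LLM training and enabling autonomous scientific research. Existing problem generation approaches either depend on expensive human expert involvement or adopt naive self-play paradigms, which frequently yield invalid problems due to reward hacking. This work introduces VHG, a verifier-enhanced hard problem generation framework built upon three-party self-play. By integrating an independent verifier into the conventional setter-solver duality, our design constrains the setter's reward to be jointly determined by problem validity (evaluated by the verifier) and difficulty (assessed by the solver). We instantiate two verifier variants, a Hard symbolic verifier and a Soft LLM-based verifier, with evaluations conducted on indefinite integral tasks and general mathematical reasoning tasks. Experimental results show that VHG outperforms all baseline methods by a clear margin.},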
  url = {https://arxiv.org/abs/2605.06660},
  keywords = {cs.LG, cs.AI, cs.CL, Computer science, Artificial intelligence, Theoretical computer science},
  eprint = {2605.06660},
  archiveprefix = {arXiv},
}
