Paper Detail

Verifier-Backed Hard Problem Generation for Mathematical Reasoning

Yuhang Lai, Jiazhan Feng, Yee Whye Teh, Ning Miao

arxiv Score 14.8

Published 2026-05-07 · First seen 2026-05-09

General AI

Abstract

Large Language Models (LLMs) demonstrate strong capabilities for solving scientific and mathematical problems, yet they struggle to produce valid, challenging, and novel problems - an essential component for advancing LLM training and enabling autonomous scientific research. Existing problem generation approaches either depend on expensive human expert involvement or adopt naive self-play paradigms, which frequently yield invalid problems due to reward hacking. This work introduces VHG, a verifier-enhanced hard problem generation framework built upon three-party self-play. By integrating an independent verifier into the conventional setter-solver duality, our design constrains the setter's reward to be jointly determined by problem validity (evaluated by the verifier) and difficulty (assessed by the solver). We instantiate two verifier variants: a Hard symbolic verifier and a Soft LLM-based verifier, with evaluations conducted on indefinite integral tasks and general mathematical reasoning tasks. Experimental results show that VHG substantially outperforms all baseline methods by a clear margin.

Workflow Status

Review status
pending
Role
unreviewed
Read priority
now
Vote
Not set.
Saved
no
Collections
Not filed yet.
Next action
Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

Tags

No tags.

BibTeX

@article{lai2026verifier,
  title = {Verifier-Backed Hard Problem Generation for Mathematical Reasoning},
  author = {Yuhang Lai and Jiazhan Feng and Yee Whye Teh and Ning Miao},
  year = {2026},
  abstract = {Large Language Models (LLMs) demonstrate strong capabilities for solving scientific and mathematical problems, yet they struggle to produce valid, challenging, and novel problems - an essential component for advancing LLM training and enabling autonomous scientific research. Existing problem generation approaches either depend on expensive human expert involvement or adopt naive self-play paradigms, which frequently yield invalid problems due to reward hacking. This work introduces VHG, a verifi},
  url = {https://arxiv.org/abs/2605.06660},
  keywords = {cs.LG, cs.AI, cs.CL, Computer science, Component (thermodynamics), Artificial intelligence, Baseline (sea), Theoretical computer science},
  eprint = {2605.06660},
  archiveprefix = {arXiv},
}

Metadata

{}