Paper Detail

Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration

Qifan Zhang, Dongyang Ma, Tianqing Fang, Jia Li, Jing Tang, Nuo Chen, Haitao Mi, Yan Wang

huggingface Score 5.5

Published 2026-04-20 · First seen 2026-04-21

General AI

Abstract

Most agents today "self-evolve" by following rewards and rules defined by humans. However, this process remains fundamentally dependent on external supervision; without human guidance, the evolution stops. In this work, we train agents to possess an intrinsic meta-evolution capability to spontaneously learn about unseen environments prior to task execution. To instill this ability, we design an outcome-based reward mechanism that measures how much an agent's self-generated world knowledge improves its success rate on downstream tasks. This reward signal is used exclusively during the training phase to teach the model how to explore and summarize effectively. At inference time, the agent requires no external rewards or human instructions. It spontaneously performs native self-evolution to adapt to unknown environments using its internal parameters. When applied to Qwen3-30B and Seed-OSS-36B, this shift to native evolution yields a 20% performance increase on WebVoyager and WebWalker. Most strikingly, the generated world knowledge even enables a compact 14B Qwen3 model to outperform the unassisted Gemini-2.5-Flash, establishing a new paradigm for truly evolving agents.
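The abstract describes the training reward only as "how much an agent's self-generated world knowledge improves its success rate on downstream tasks." One plausible reading is a success-rate difference: the rate with the knowledge in context minus the rate without it. The sketch below is written under that assumption and is not the authors' implementation. The names `run_task`, `success_rate`, `knowledge_reward`, and `toy_runner` are hypothetical, and so is the choice of empty knowledge as the baseline.

```python
from typing import Callable, Sequence

# Hypothetical runner signature: (task, world_knowledge) -> True if the agent solved the task.
TaskRunner = Callable[[str, str], bool]


def success_rate(tasks: Sequence[str], knowledge: str, run_task: TaskRunner) -> float:
    """Fraction of downstream tasks solved when the agent is given `knowledge`."""
    if not tasks:
        raise ValueError("need at least one downstream task")
    solved = sum(1 for task in tasks if run_task(task, knowledge))
    return solved / len(tasks)


def knowledge_reward(tasks: Sequence[str], knowledge: str, run_task: TaskRunner) -> float:
    """Outcome-based reward as a success-rate gain (assumed form, not from the paper).

    Assumption: the baseline is the same agent run with empty world knowledge.
    """
    with_knowledge = success_rate(tasks, knowledge, run_task)
    without_knowledge = success_rate(tasks, "", run_task)
    return with_knowledge - without_knowledge


def toy_runner(task: str, knowledge: str) -> bool:
    # Toy stand-in for an agent rollout: a task "succeeds" if its keyword
    # appears in the explored world-knowledge summary.
    return task in knowledge


tasks = ["login", "search", "checkout", "cart"]
print(knowledge_reward(tasks, "login page; search bar", toy_runner))  # 0.5
```

In this toy run, the summary covers two of the four tasks and the empty baseline covers none. The reward is therefore 2/4 - 0/4 = 0.5. Because the reward is a difference against a baseline, knowledge that does not change downstream outcomes earns nothing, however long or detailed it is.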

Workflow Status

Review status: pending
Role: unreviewed
Read priority: soon
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

BibTeX

@misc{zhang2026training,
  title = {Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration},
  author = {Qifan Zhang and Dongyang Ma and Tianqing Fang and Jia Li and Jing Tang and Nuo Chen and Haitao Mi and Yan Wang},
  year = {2026},
  abstract = {Most agents today ``self-evolve'' by following rewards and rules defined by humans. However, this process remains fundamentally dependent on external supervision; without human guidance, the evolution stops. In this work, we train agents to possess an intrinsic meta-evolution capability to spontaneously learn about unseen environments prior to task execution. To instill this ability, we design an outcome-based reward mechanism that measures how much an agent's self-generated world knowledge impr},
  url = {https://huggingface.co/papers/2604.18131},
  keywords = {meta-evolution, self-evolution, outcome-based reward mechanism, downstream tasks, native self-evolution, world knowledge, Qwen3-30B, Seed-OSS-36B, WebVoyager, WebWalker, Gemini-2.5-Flash, code available, huggingface daily},
  eprint = {2604.18131},
  archiveprefix = {arXiv},
}

Metadata

{}