Paper Detail

OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation

Guibin Zhang, Xun Xu, Yanwei Yue, Zikun Su, Wangchunshu Zhou, Xiaobin Hu, Shuicheng Yan

huggingface Score 15.5

Published 2026-06-16 · First seen 2026-06-17

Research Track A · General AI

Abstract

Memory has become a standard substrate for self-evolving agents, yet retaining experience is not the same as learning how to evolve through it. Existing memory agents can store trajectories, retrieve reflections, or accumulate skills, but often lack the holistic competence to select useful experience, act on it, write reusable knowledge, and maintain a growing repository. We introduce OPD-Evolver, a slow-fast co-evolution framework that cultivates such an agent evolver through on-policy self-distillation. In the fast loop, OPD-Evolver interacts with a four-level memory hierarchy to read, use, write, and maintain experience for rapid test-time evolution. In the slow loop, outcome-calibrated memory attribution and privileged hindsight distill these four abilities into the deployable policy. Across multi-domain benchmarks, OPD-Evolver surpasses memory systems such as ReasoningBank by up to 11.5%, and training-based methods such as Skill0 by ~5.8%. Further analysis shows that OPD-Evolver internalizes high-value experience and memory management, enabling OPD-Evolver-9B to challenge giant counterparts such as Qwen3.5-397B-A17B and Step-3.5-Flash, pointing beyond memory-augmented agents toward genuinely qualified agent evolvers.
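
The abstract names four memory abilities (read, use, write, maintain) but gives no implementation details. The block below is only a toy sketch of what such an interface could look like, not the authors' method. The class names `MemoryItem` and `ToyMemory`, the word-overlap retrieval, the +1/-1 utility update, and the capacity-based pruning are all assumptions made for illustration. The four-level hierarchy and the slow-loop distillation are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    """One stored piece of experience (hypothetical schema, not from the paper)."""
    text: str
    utility: float = 0.0  # running score updated from task outcomes


@dataclass
class ToyMemory:
    """Flat toy store; the paper's four-level hierarchy is NOT modeled here."""
    capacity: int = 3
    items: list[MemoryItem] = field(default_factory=list)

    def read(self, query: str, k: int = 1) -> list[MemoryItem]:
        # Rank items by word overlap with the query (a stand-in for retrieval).
        words = set(query.lower().split())
        ranked = sorted(
            self.items,
            key=lambda m: len(words & set(m.text.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def use(self, used: list[MemoryItem], success: bool) -> None:
        # Credit or penalize the items that were acted on, based on the outcome.
        for m in used:
            m.utility += 1.0 if success else -1.0

    def write(self, text: str) -> None:
        # Store a new piece of experience with neutral utility.
        self.items.append(MemoryItem(text))

    def maintain(self) -> None:
        # Once capacity is exceeded, keep only the highest-utility items.
        if len(self.items) > self.capacity:
            self.items.sort(key=lambda m: m.utility, reverse=True)
            del self.items[self.capacity:]


if __name__ == "__main__":
    mem = ToyMemory(capacity=2)
    mem.write("check file exists before open")
    mem.write("retry network call on timeout")
    hits = mem.read("network timeout error", k=1)
    mem.use(hits, success=True)   # the retrieved item helped, so credit it
    mem.write("unrelated note")
    mem.maintain()                # prunes back down to two items
    print([m.text for m in mem.items])
```

In this sketch the outcome signal passed to `use` plays a role loosely analogous to the outcome-calibrated attribution the abstract mentions, but how the paper actually computes or distills that signal is not stated on this page.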

BibTeX

@misc{zhang2026opd,
  title = {OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation},
  author = {Guibin Zhang and Xun Xu and Yanwei Yue and Zikun Su and Wangchunshu Zhou and Xiaobin Hu and Shuicheng Yan},
  year = {2026},
  url = {https://huggingface.co/papers/2606.17628},
  keywords = {self-evolving agents, memory hierarchy, on-policy self-distillation, slow-fast co-evolution, policy learning, memory management, experience retention, agent evolver, code available, huggingface daily},
  eprint = {2606.17628},
  archiveprefix = {arXiv},
}