Paper Detail

Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

Wenbo Pan, Shujie Liu, Chin-Yew Lin, Jingying Zeng, Xianfeng Tang, Xiangyang Zhou, Yan Lu, Xiaohua Jia

huggingface Score 7.5

Published 2026-06-04 · First seen 2026-06-10

General AI

Abstract

AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods typically require ground-truth validation sets, yet such labeled data is difficult to acquire in practical deployment settings. To address this problem, we introduce Retrospective Harness Optimization (RHO), a self-supervised method that optimizes the agent harness using only past trajectories. Specifically, RHO selects a diverse coreset of challenging tasks from past trajectories and re-solves them in parallel. The agent analyzes these rollouts using self-validation and self-consistency, then generates candidate harness updates and selects the most effective one by its own pairwise self-preference. We evaluate RHO across three diverse domains, spanning software engineering, technical work, and knowledge work. Notably, a single optimization round improves the pass rate on SWE-Bench Pro from 59% to 78% without any external grading. Furthermore, our analysis demonstrates that RHO effectively targets prior failure modes. As a result, the optimized harness alters the agent's behavior patterns and sustains higher accuracy during long-horizon sessions.
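The abstract says the agent picks among candidate harness updates by its own pairwise self-preference, but it does not say how the individual comparisons are combined into a final choice. The following is a minimal sketch of one plausible aggregation, assuming a round-robin tournament in which every pair is judged in both presentation orders and the candidate with the most wins is kept. The function name `select_by_pairwise_preference`, the `prefer` callable, and the toy judge are all hypothetical illustrations and are not taken from the paper.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def select_by_pairwise_preference(
    candidates: Sequence[T],
    prefer: Callable[[T, T], bool],
) -> Tuple[T, List[int]]:
    """Pick one candidate via a round-robin of pairwise judgments.

    `prefer(first, second)` is a hypothetical judge that returns True when
    it prefers `first`. In an RHO-like setting it would wrap a call to the
    agent itself; here it is left abstract.
    """
    if not candidates:
        raise ValueError("need at least one candidate")

    wins = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        # Judge each pair in both orders (assumption: this dampens any
        # bias the judge has toward the first or second position).
        for a, b in ((i, j), (j, i)):
            winner = a if prefer(candidates[a], candidates[b]) else b
            wins[winner] += 1

    # Ties resolve to the earliest candidate in the input order.
    best = max(range(len(candidates)), key=lambda k: wins[k])
    return candidates[best], wins


if __name__ == "__main__":
    # Toy stand-in judge: prefers the longer description. A real judge
    # would compare rollouts produced under each candidate harness.
    updates = ["a", "abc", "ab"]
    chosen, tally = select_by_pairwise_preference(
        updates, lambda x, y: len(x) > len(y)
    )
    print(chosen, tally)
```

With n candidates this costs n(n-1) judge calls, which is reasonable only for a small candidate pool; a larger pool would likely need a cheaper scheme such as single elimination.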

BibTeX

@misc{pan2026retrospective,
  title = {Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts},
  author = {Wenbo Pan and Shujie Liu and Chin-Yew Lin and Jingying Zeng and Xianfeng Tang and Xiangyang Zhou and Yan Lu and Xiaohua Jia},
  year = {2026},
  url = {https://huggingface.co/papers/2606.05922},
  keywords = {Retrospective Harness Optimization, self-supervised method, agent harness, past trajectories, coreset, parallel re-solving, self-validation, self-consistency, pairwise self-preference, SWE-Bench Pro, code available, huggingface daily},
  eprint = {2606.05922},
  archiveprefix = {arXiv},
}
