Paper Detail

Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

Wenbo Pan, Shujie Liu, Chin-Yew Lin, Jingying Zeng, Xianfeng Tang, Xiangyang Zhou, Yan Lu, Xiaohua Jia

huggingface Score 7.5

Published 2026-06-04 · First seen 2026-06-10

General AI

Abstract

AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods typically require ground-truth validation sets, yet such labeled data is difficult to acquire in practical deployment settings. To address this problem, we introduce Retrospective Harness Optimization (RHO), a self-supervised method that optimizes the agent harness using only past trajectories. Specifically, RHO selects a diverse coreset of challenging tasks from past trajectories and re-solves them in parallel. The agent analyzes these rollouts using self-validation and self-consistency, then generates candidate harness updates and selects the most effective one by its own pairwise self-preference. We evaluate RHO across three diverse domains, spanning software engineering, technical work, and knowledge work. Notably, a single optimization round improves the pass rate on SWE-Bench Pro from 59% to 78% without any external grading. Furthermore, our analysis demonstrates that RHO effectively targets prior failure modes. As a result, the optimized harness alters the agent's behavior patterns and sustains higher accuracy during long-horizon sessions.

Workflow Status

Review status
pending
Role
unreviewed
Read priority
soon
Vote
Not set.
Saved
no
Collections
Not filed yet.
Next action
Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

Tags

No tags.

BibTeX

@misc{pan2026retrospective,
  title = {Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts},
  author = {Wenbo Pan and Shujie Liu and Chin-Yew Lin and Jingying Zeng and Xianfeng Tang and Xiangyang Zhou and Yan Lu and Xiaohua Jia},
  year = {2026},
  abstract = {AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods typically require ground-truth validation sets, yet such labeled data is difficult to acquire in practical deployment settings. To address this problem, we introduce Retrospective Harness Optimization (RHO), a self-supervised method that optimizes the agent harness using only past trajectories. Spe},
  url = {https://huggingface.co/papers/2606.05922},
  keywords = {Retrospective Harness Optimization, self-supervised method, agent harness, past trajectories, coreset, parallel re-solving, self-validation, self-consistency, pairwise self-preference, SWE-Bench Pro, code available, huggingface daily},
  eprint = {2606.05922},
  archiveprefix = {arXiv},
}

Metadata

{}