Paper Detail

SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

Zequn Xie, Junjie Wang, Dan Yang, Jie Feng, Yue Shen, Jian Wang, Jinjie Gu

arXiv · Score 18.3

Published 2026-06-05 · First seen 2026-06-09

Research Track B · General AI

Abstract

Deep research agents have demonstrated remarkable capabilities in complex information-seeking tasks, yet this power comes at a steep computational cost. Driven by accuracy-focused training paradigms, current models adopt brute-force strategies characterized by blind tool dependency and performative reasoning, generating long, redundant trajectories that far exceed what these tasks require and leading to wasteful tool calls and excessive token consumption. To overcome this efficiency trap, we propose SlimSearcher, a principled framework that pushes the Pareto frontier between accuracy and computational cost across both Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). In the SFT stage, SlimSearcher employs Pareto-efficient filtration to distill trajectories that are both successful and economical, guiding the model toward inherently efficiency-aware search behaviors. During RL, we introduce Adaptive Reward Gating, a dynamic reward-shaping mechanism that evaluates relative tool and token efficiency within a sampled cohort. By cascading these adaptive efficiency metrics with a strict correctness gate, our approach avoids the brevity bias associated with absolute penalties and mitigates reward hacking. Extensive experiments on long-horizon benchmarks, including GAIA, BrowseComp, and XBenchDeepSearch, demonstrate that SlimSearcher reduces average tool-call rounds by 17% to 58% while maintaining or improving accuracy.
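The abstract names two mechanisms but gives no formulas for them. The Python sketch below illustrates one plausible reading under assumptions that are not taken from the paper: the `Trajectory` record, the `pareto_filter` and `gated_rewards` helpers, the weights `w_tool` and `w_token`, and the min-max normalization within a cohort are all hypothetical. Pareto-efficient filtration is modeled as keeping correct trajectories that no other correct trajectory beats on both tool calls and tokens. Adaptive Reward Gating is modeled as a hard correctness gate followed by a bonus for tool and token efficiency measured relative to the correct members of the same sampled cohort.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    # Hypothetical record of one agent rollout (not the paper's data format).
    correct: bool
    tool_calls: int
    tokens: int


def dominates(a: Trajectory, b: Trajectory) -> bool:
    """True if a costs no more than b on both axes and strictly less on at least one."""
    return (a.tool_calls <= b.tool_calls and a.tokens <= b.tokens
            and (a.tool_calls < b.tool_calls or a.tokens < b.tokens))


def pareto_filter(trajs: List[Trajectory]) -> List[Trajectory]:
    """Assumed SFT filter: keep correct trajectories not dominated on cost by another correct one."""
    correct = [t for t in trajs if t.correct]
    return [t for t in correct if not any(dominates(o, t) for o in correct)]


def relative_efficiency(value: int, values: List[int]) -> float:
    """Min-max scale a cost within the cohort: cheapest -> 1.0, most expensive -> 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 1.0  # every correct rollout costs the same, so no relative penalty
    return (hi - value) / (hi - lo)


def gated_rewards(cohort: List[Trajectory],
                  w_tool: float = 0.2, w_token: float = 0.1) -> List[float]:
    """Hypothetical gated reward: correctness gate first, then a cohort-relative efficiency bonus."""
    correct = [t for t in cohort if t.correct]
    if not correct:
        return [0.0] * len(cohort)
    tools = [t.tool_calls for t in correct]
    toks = [t.tokens for t in correct]
    rewards = []
    for t in cohort:
        if not t.correct:
            rewards.append(0.0)  # strict gate: a wrong answer earns nothing, however short
            continue
        bonus = (w_tool * relative_efficiency(t.tool_calls, tools)
                 + w_token * relative_efficiency(t.tokens, toks))
        rewards.append(1.0 + bonus)
    return rewards


if __name__ == "__main__":
    cohort = [
        Trajectory(True, 4, 3000),
        Trajectory(True, 10, 9000),
        Trajectory(False, 1, 500),
        Trajectory(True, 6, 2000),
    ]
    print(pareto_filter(cohort))
    print(gated_rewards(cohort))
```

Because an incorrect rollout receives zero reward regardless of its length, and the efficiency bonus is normalized only against correct peers, a short wrong answer can never outscore a long correct one in this sketch. That is the kind of brevity bias the abstract attributes to absolute length penalties.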

Workflow Status

Review status: pending
Role: unreviewed
Read priority: now
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

BibTeX

@article{xie2026slimsearcher,
  title = {SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating},
  author = {Zequn Xie and Junjie Wang and Dan Yang and Jie Feng and Yue Shen and Jian Wang and Jinjie Gu},
  year = {2026},
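  abstract = {Deep research agents have demonstrated remarkable capabilities in complex information-seeking tasks, yet this power comes at a steep computational cost. Driven by accuracy-focused training paradigms, current models adopt brute-force strategies characterized by blind tool dependency and performative reasoning, generating long, redundant trajectories that far exceed what these tasks require and leading to wasteful tool calls and excessive token consumption. To overcome this efficiency trap, we propose SlimSearcher, a principled framework that pushes the Pareto frontier between accuracy and computational cost across both Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). In the SFT stage, SlimSearcher employs Pareto-efficient filtration to distill trajectories that are both successful and economical, guiding the model toward inherently efficiency-aware search behaviors. During RL, we introduce Adaptive Reward Gating, a dynamic reward-shaping mechanism that evaluates relative tool and token efficiency within a sampled cohort. By cascading these adaptive efficiency metrics with a strict correctness gate, our approach avoids the brevity bias associated with absolute penalties and mitigates reward hacking. Extensive experiments on long-horizon benchmarks, including GAIA, BrowseComp, and XBenchDeepSearch, demonstrate that SlimSearcher reduces average tool-call rounds by 17\%--58\% while maintaining or improving accuracy.},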
  url = {https://arxiv.org/abs/2606.07074},
  keywords = {cs.LG, cs.AI, Supervised Fine-Tuning, Reinforcement Learning, Pareto-efficient filtration, adaptive reward gating, reward-shaping mechanism, tool-call rounds, accuracy-focused training paradigms, brute-force strategies, performative reasoning, trajectory optimization, huggingface daily},
  eprint = {2606.07074},
  archiveprefix = {arXiv},
}