Paper Detail

SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

Zequn Xie, Junjie Wang, Dan Yang, Jie Feng, Yue Shen, Jian Wang, Jinjie Gu

arxiv Score 18.3

Published 2026-06-05 · First seen 2026-06-09

Research Track B · General AI

Abstract

Deep research agents have demonstrated remarkable capabilities in complex information-seeking tasks, yet this power comes at a steep computational cost. Driven by accuracy-focused training paradigms, current models adopt brute-force strategies characterized by blind tool dependency and performative reasoning, generating long, redundant trajectories that go far beyond what these tasks require and leading to wasteful tool calls and excessive token consumption. To overcome this efficiency trap, we propose SlimSearcher, a principled framework that pushes the Pareto frontier between accuracy and computational cost across both Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). In the SFT stage, SlimSearcher employs Pareto-efficient filtration to distill trajectories that are both successful and economical, guiding the model toward inherently efficiency-aware search behaviors. During RL, we introduce Adaptive Reward Gating, a dynamic reward-shaping mechanism that evaluates relative tool and token efficiency within a sampled cohort. By cascading these adaptive efficiency metrics with a strict correctness gate, our approach effectively avoids the brevity bias associated with absolute penalties and mitigates reward hacking. Extensive experiments on long-horizon benchmarks, including GAIA, BrowseComp, and XBenchDeepSearch, demonstrate that SlimSearcher reduces average tool-call rounds by 17%-58% while maintaining or improving accuracy.
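The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no formulas, so the `Trajectory` fields, the min-max normalization, the weights `w_tool` and `w_token`, the base reward of 1.0, and the choice to measure efficiency only among correct rollouts are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    correct: bool    # did the final answer pass the correctness check?
    tool_calls: int  # number of tool-call rounds used
    tokens: int      # total tokens generated


def pareto_efficient(trajs: List[Trajectory]) -> List[Trajectory]:
    """Keep correct trajectories that no other correct one dominates on cost."""
    correct = [t for t in trajs if t.correct]

    def dominates(a: Trajectory, b: Trajectory) -> bool:
        no_worse = a.tool_calls <= b.tool_calls and a.tokens <= b.tokens
        strictly_better = a.tool_calls < b.tool_calls or a.tokens < b.tokens
        return no_worse and strictly_better

    return [t for t in correct if not any(dominates(o, t) for o in correct)]


def relative_efficiency(value: float, cohort: List[float]) -> float:
    """Score in [0, 1]: 1 for the cheapest cohort member, 0 for the costliest."""
    lo, hi = min(cohort), max(cohort)
    if hi == lo:
        return 1.0  # identical costs: nobody is penalized
    return (hi - value) / (hi - lo)


def gated_rewards(
    cohort: List[Trajectory], w_tool: float = 0.25, w_token: float = 0.25
) -> List[float]:
    """Correctness gate first; efficiency bonus only for correct rollouts."""
    # Assumption: efficiency is compared among the correct rollouts only.
    correct = [t for t in cohort if t.correct]
    tool_costs = [float(t.tool_calls) for t in correct]
    token_costs = [float(t.tokens) for t in correct]

    rewards = []
    for t in cohort:
        if not t.correct:
            rewards.append(0.0)  # gate closed: being short earns nothing
            continue
        bonus = w_tool * relative_efficiency(t.tool_calls, tool_costs)
        bonus += w_token * relative_efficiency(t.tokens, token_costs)
        rewards.append(1.0 + bonus)
    return rewards


cohort = [
    Trajectory(correct=True, tool_calls=4, tokens=1000),
    Trajectory(correct=True, tool_calls=8, tokens=3000),
    Trajectory(correct=False, tool_calls=1, tokens=100),
    Trajectory(correct=True, tool_calls=3, tokens=2000),
]
kept = pareto_efficient(cohort)
rewards = gated_rewards(cohort)
```

In this toy cohort the incorrect rollout is the cheapest one but still receives zero reward, which is the intuition behind gating efficiency on correctness. Because the bonus is relative to the cohort rather than an absolute length penalty, a correct rollout is only ever compared with other correct attempts at the same task.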

Workflow Status

Review status: pending
Role: unreviewed
Read priority: now
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

Tags

No tags.

BibTeX

@article{xie2026slimsearcher,
  title = {SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating},
  author = {Zequn Xie and Junjie Wang and Dan Yang and Jie Feng and Yue Shen and Jian Wang and Jinjie Gu},
  year = {2026},
  abstract = {Deep research agents have demonstrated remarkable capabilities in complex information-seeking tasks, yet this power comes at a steep computational cost. Driven by accuracy-focused training paradigms, current models adopt brute-force strategies characterized by blind tool dependency and performative reasoning, generating long, redundant trajectories that are far from necessary for resolving these tasks, leading to wasteful tool calls and excessive token consumption. To overcome this efficiency tra},
  url = {https://arxiv.org/abs/2606.07074},
  keywords = {cs.LG, cs.AI, Supervised Fine-Tuning, Reinforcement Learning, Pareto-efficient filtration, adaptive reward gating, reward-shaping mechanism, tool-call rounds, accuracy-focused training paradigms, brute-force strategies, performative reasoning, trajectory optimization, huggingface daily},
  eprint = {2606.07074},
  archiveprefix = {arXiv},
}
