Paper Detail

MuSEAgent: A Multimodal Reasoning Agent with Stateful Experiences

Shijian Wang, Jiarui Jin, Runhao Fu, Zexuan Yan, Xingjian Wang, Mengkang Hu, Eric Wang, Xiaoxi Li, Kangning Zhang, Li Yao, Wenxiang Jiao, Xuelian Cheng, Yuan Lu, Zongyuan Ge

huggingface Score 17.5

Published 2026-03-29 · First seen 2026-03-31

General AI

Abstract

Research agents have recently achieved significant progress in information seeking and synthesis across heterogeneous textual and visual sources. In this paper, we introduce MuSEAgent, a multimodal reasoning agent that enhances decision-making by extending the capabilities of research agents to discover and leverage stateful experiences. Rather than relying on trajectory-level retrieval, we propose a stateful experience learning paradigm that abstracts interaction data into atomic decision experiences through hindsight reasoning. These experiences are organized into a quality-filtered experience bank that supports policy-driven experience retrieval at inference time. Specifically, MuSEAgent enables adaptive experience exploitation through complementary wide- and deep-search strategies, allowing the agent to dynamically retrieve multimodal guidance across diverse compositional semantic viewpoints. Extensive experiments demonstrate that MuSEAgent consistently outperforms strong trajectory-level experience retrieval baselines on both fine-grained visual perception and complex multimodal reasoning tasks. These results validate the effectiveness of stateful experience modeling in improving multimodal agent reasoning.
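The abstract describes the method only at a high level. The toy below contrasts state-level experience retrieval with trajectory-level retrieval. It stores single decision experiences, drops low-quality ones before they enter the bank, and retrieves in two ways: a wide search that queries several semantic views and merges the hits, and a deep search that chains retrievals by re-querying from each hit. It is a hypothetical sketch built on our own assumptions, not MuSEAgent's implementation, which the abstract does not specify. The assumptions include the `Experience` fields, the `min_quality` threshold, cosine similarity over hand-written state vectors, and the specific wide/deep heuristics.

```python
import math
from dataclasses import dataclass


@dataclass
class Experience:
    """One atomic decision experience (hypothetical schema, not the paper's)."""
    state: list[float]  # assumed embedding of the decision state
    action: str         # action taken at that state
    insight: str        # lesson distilled in hindsight
    quality: float      # assumed quality score in [0, 1]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ExperienceBank:
    """Toy quality-filtered bank of state-level experiences (illustrative only)."""

    def __init__(self, min_quality: float = 0.5):
        self.min_quality = min_quality  # hypothetical filtering threshold
        self.items: list[Experience] = []

    def add(self, exp: Experience) -> bool:
        # Quality filter: only experiences at or above the threshold are stored.
        if exp.quality >= self.min_quality:
            self.items.append(exp)
            return True
        return False

    def top_k(self, query: list[float], k: int) -> list[Experience]:
        # Rank stored experiences by similarity of their state to the query.
        ranked = sorted(self.items, key=lambda e: cosine(query, e.state), reverse=True)
        return ranked[:k]

    def wide_search(self, views: list[list[float]], k: int) -> list[Experience]:
        # Query once per semantic view, then merge hits without duplicates.
        seen: set[int] = set()
        merged: list[Experience] = []
        for view in views:
            for exp in self.top_k(view, k):
                if id(exp) not in seen:
                    seen.add(id(exp))
                    merged.append(exp)
        return merged

    def deep_search(self, query: list[float], depth: int) -> list[Experience]:
        # Chain retrievals: take the best unseen hit, then re-query from a blend
        # of the current query and that hit's state.
        chain: list[Experience] = []
        seen: set[int] = set()
        q = list(query)
        for _ in range(depth):
            candidates = [e for e in self.items if id(e) not in seen]
            if not candidates:
                break
            best = max(candidates, key=lambda e: cosine(q, e.state))
            seen.add(id(best))
            chain.append(best)
            q = [(a + b) / 2 for a, b in zip(q, best.state)]
        return chain


if __name__ == "__main__":
    bank = ExperienceBank(min_quality=0.5)
    bank.add(Experience([1.0, 0.0], "zoom", "crop before reading small text", 0.9))
    bank.add(Experience([0.0, 1.0], "search", "look up unfamiliar entities", 0.8))
    bank.add(Experience([1.0, 1.0], "ocr", "noisy lesson", 0.2))  # filtered out
    print([e.action for e in bank.wide_search([[1.0, 0.0], [0.0, 1.0]], k=1)])
```

Running it prints the merged actions from the two views. Identity (`id`) is used for deduplication because dataclasses with the default `eq=True` are unhashable.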

Workflow Status

Review status: pending
Role: unreviewed
Read priority: now
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

BibTeX

@misc{wang2026museagent,
  title = {MuSEAgent: A Multimodal Reasoning Agent with Stateful Experiences},
  author = {Shijian Wang and Jiarui Jin and Runhao Fu and Zexuan Yan and Xingjian Wang and Mengkang Hu and Eric Wang and Xiaoxi Li and Kangning Zhang and Li Yao and Wenxiang Jiao and Xuelian Cheng and Yuan Lu and Zongyuan Ge},
  year = {2026},
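  abstract = {Research agents have recently achieved significant progress in information seeking and synthesis across heterogeneous textual and visual sources. In this paper, we introduce MuSEAgent, a multimodal reasoning agent that enhances decision-making by extending the capabilities of research agents to discover and leverage stateful experiences. Rather than relying on trajectory-level retrieval, we propose a stateful experience learning paradigm that abstracts interaction data into atomic decision experiences through hindsight reasoning. These experiences are organized into a quality-filtered experience bank that supports policy-driven experience retrieval at inference time. Specifically, MuSEAgent enables adaptive experience exploitation through complementary wide- and deep-search strategies, allowing the agent to dynamically retrieve multimodal guidance across diverse compositional semantic viewpoints. Extensive experiments demonstrate that MuSEAgent consistently outperforms strong trajectory-level experience retrieval baselines on both fine-grained visual perception and complex multimodal reasoning tasks. These results validate the effectiveness of stateful experience modeling in improving multimodal agent reasoning.},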
  url = {https://huggingface.co/papers/2603.27813},
  keywords = {multimodal reasoning agent, stateful experience learning, hindsight reasoning, experience bank, policy-driven experience retrieval, wide-search strategy, deep-search strategy, multimodal guidance, compositional semantic viewpoints, code available, huggingface daily},
  eprint = {2603.27813},
  archiveprefix = {arXiv},
}
