Paper Detail

MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management

Guangyi Liu, Gao Wu, Congxiao Liu, Pengxiang Zhao, Liang Liu, Mading Li, Qi Zhang, Mengyan Wang, Liang Guo, Yong Liu

huggingface Score 10.7

Published 2026-06-18 · First seen 2026-06-24

Research Track B · General AI

Abstract

MLLM-based mobile GUI agents have made substantial progress on short-horizon tasks, yet remain unreliable on long-horizon tasks that require retaining intermediate facts across many steps and app transitions. We attribute this limitation to ReAct-style prompting, which passively accumulates per-step records, leading to prompt explosion and dilution of critical cross-app facts. To address this, we introduce MemGUI-Agent, an end-to-end long-horizon mobile GUI agent with proactive context management. MemGUI-Agent is built on Context-as-Action (ConAct), which casts context management as first-class actions emitted by the same policy that selects UI actions. Instead of passively appending history, ConAct maintains three structured context fields: folded action history, folded UI state, and recent step record, preserving critical UI facts while keeping context compact. To make proactive context management learnable across model scales, we construct MemGUI-3K, a 2,956-trajectory dataset with full ConAct annotations for supervised training and offline analysis. Training an 8B model on MemGUI-3K produces MemGUI-8B-SFT, an 8B MemGUI-Agent that achieves the best open-data 8B performance on MemGUI-Bench and generalizes to the out-of-distribution MobileWorld benchmark. Code, data, and trained models will be released at https://memgui-agent.github.io/.
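
To make the ConAct idea concrete, the following is a minimal Python sketch of the three context fields named in the abstract and of a single step in which the policy emits both a UI action and context updates. The abstract does not give the actual action schema, prompt format, or folding rules, so every name here (`ConActContext`, `StepOutput`, `apply_step`, the action strings) is a hypothetical illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConActContext:
    """Hypothetical container for the three context fields from the abstract."""
    folded_action_history: str = ""  # compact summary of earlier actions
    folded_ui_state: str = ""        # UI facts worth keeping across apps
    recent_step_record: str = ""     # only the most recent step

    def render(self) -> str:
        # The prompt holds three fixed fields instead of an ever-growing log.
        return (
            f"[history] {self.folded_action_history}\n"
            f"[ui_state] {self.folded_ui_state}\n"
            f"[last_step] {self.recent_step_record}"
        )


@dataclass
class StepOutput:
    """Hypothetical policy output: one UI action plus context updates."""
    ui_action: str
    fold_history: str             # rewritten summary of the action history
    fold_ui_state: Optional[str]  # None means: keep the stored UI facts


def apply_step(ctx: ConActContext, out: StepOutput, observation: str) -> ConActContext:
    """Apply the context updates emitted alongside the UI action."""
    return ConActContext(
        folded_action_history=out.fold_history,
        folded_ui_state=(
            ctx.folded_ui_state if out.fold_ui_state is None else out.fold_ui_state
        ),
        recent_step_record=f"{out.ui_action} -> {observation}",
    )


# Two illustrative steps with made-up actions and observations.
ctx = ConActContext()
ctx = apply_step(
    ctx,
    StepOutput("tap(Search)", "Opened Maps and tapped Search.", "Cafe address: 12 Elm St"),
    "search box focused",
)
ctx = apply_step(
    ctx,
    StepOutput("open_app(Notes)", "Found cafe address in Maps; switched to Notes.", None),
    "Notes home screen",
)
print(ctx.render())
```

In this sketch the fact recorded in the first app survives the switch to the second app in `folded_ui_state`, while the raw record of the first step is replaced instead of appended, which is the contrast with ReAct-style history accumulation that the abstract draws.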

BibTeX

@misc{liu2026memgui,
  title = {MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management},
  author = {Guangyi Liu and Gao Wu and Congxiao Liu and Pengxiang Zhao and Liang Liu and Mading Li and Qi Zhang and Mengyan Wang and Liang Guo and Yong Liu},
  year = {2026},
  abstract = {MLLM-based mobile GUI agents have made substantial progress on short-horizon tasks, yet remain unreliable on long-horizon tasks that require retaining intermediate facts across many steps and app transitions. We attribute this limitation to ReAct-style prompting, which passively accumulates per-step records, leading to prompt explosion and dilution of critical cross-app facts. To address this, we introduce MemGUI-Agent, an end-to-end long-horizon mobile GUI agent with proactive context management.},
  url = {https://huggingface.co/papers/2606.19926},
  keywords = {MLLM-based mobile GUI agents, ReAct-style prompting, context management, Context-as-Action (ConAct), structured context fields, folded action history, folded UI state, recent step record, end-to-end long-horizon mobile GUI agent, MemGUI-3K, supervised training, offline analysis, MemGUI-Bench, MobileWorld benchmark, code available, huggingface daily},
  eprint = {2606.19926},
  archiveprefix = {arXiv},
}
