Paper Detail

Seek to Segment: Active Perception for Panoramic Referring Segmentation

Song Tang, Shuming Hu, Xincheng Shuai, Henghui Ding, Yu-Gang Jiang

arxiv Score 16.6

Published 2026-07-02 · First seen 2026-07-03

General AI

Abstract

Existing referring segmentation models passively process static images captured from fixed perspectives, limiting their applicability in Embodied AI, where agents must perform active perception in continuous 360$^\circ$ environments. To bridge this gap, we introduce a novel task: Active Panoramic Referring Segmentation (APRS). In this setting, an agent is required to adjust its viewing direction ($\Delta\theta, \Delta\phi$) to explore the 360$^\circ$ environment, seeking the object specified by a user instruction for segmentation. To tackle this challenging task, we propose PanoSeeker, a memory-augmented agent for efficient APRS. Rather than relying on heuristic scanning, PanoSeeker integrates a Vision-Language Model (VLM) with EgoSphere, an explicit spatial visual memory. By progressively integrating sequential local observations into a unified 360$^\circ$ representation, EgoSphere enables the agent to plan efficient and non-redundant search trajectories. Once the target is found, the agent performs active viewpoint alignment and outputs the segmentation mask. Furthermore, we curate an expert-annotated search trajectory dataset with memory timelines for Supervised Fine-Tuning, followed by Reinforcement Learning post-training to explicitly optimize PanoSeeker's exploration efficiency. Extensive experiments on our newly established APRS benchmark demonstrate that PanoSeeker achieves superior search efficiency and segmentation accuracy, significantly outperforming adapted state-of-the-art baselines.
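
To make the action space in the abstract concrete, here is a minimal Python sketch of a relative view update ($\Delta\theta, \Delta\phi$) together with a toy coverage map over the sphere. It is not PanoSeeker's or EgoSphere's implementation: the names `update_view` and `CoverageMemory`, the degree convention (yaw in [-180, 180), pitch in [-90, 90]), the 30-degree grid, and the rectangular field-of-view test are all assumptions made for illustration.

```python
def update_view(theta, phi, d_theta, d_phi):
    """Apply a relative move (d_theta, d_phi), all angles in degrees.

    Yaw (theta) wraps around the full circle into [-180, 180);
    pitch (phi) is clamped to [-90, 90]. Both conventions are assumptions.
    """
    new_theta = (theta + d_theta + 180.0) % 360.0 - 180.0
    new_phi = max(-90.0, min(90.0, phi + d_phi))
    return new_theta, new_phi


class CoverageMemory:
    """Toy stand-in for a spherical memory: records which yaw/pitch cells were seen."""

    def __init__(self, cell_deg=30):
        self.cell_deg = cell_deg
        self.n_yaw = 360 // cell_deg
        self.n_pitch = 180 // cell_deg
        self.seen = set()

    def observe(self, theta, phi, fov_deg=90.0):
        """Mark every cell whose center lies inside a square yaw/pitch window.

        A square window is a crude approximation of a real camera frustum.
        """
        half = fov_deg / 2.0
        for i in range(self.n_yaw):
            for j in range(self.n_pitch):
                c_theta = -180.0 + (i + 0.5) * self.cell_deg
                c_phi = -90.0 + (j + 0.5) * self.cell_deg
                # Shortest angular distance in yaw, accounting for wrap-around.
                d_yaw = abs((c_theta - theta + 180.0) % 360.0 - 180.0)
                if d_yaw <= half and abs(c_phi - phi) <= half:
                    self.seen.add((i, j))

    def coverage(self):
        """Fraction of grid cells observed so far."""
        return len(self.seen) / (self.n_yaw * self.n_pitch)
```

A planner that consults such a map can prefer moves toward unseen cells, which is one simple way to read the abstract's "non-redundant search trajectories"; the paper's agent instead uses a VLM over an explicit visual memory.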

Workflow Status

Review status: pending
Role: unreviewed
Read priority: now
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

BibTeX

@article{tang2026seek,
  title = {Seek to Segment: Active Perception for Panoramic Referring Segmentation},
  author = {Song Tang and Shuming Hu and Xincheng Shuai and Henghui Ding and Yu-Gang Jiang},
  year = {2026},
  abstract = {Existing referring segmentation models passively process static images captured from fixed perspectives, limiting their applicability in Embodied AI, where agents must perform active perception in the continuous 360$^\circ$ environments. To bridge this gap, we introduce a novel task: Active Panoramic Referring Segmentation (APRS). In this setting, an agent is required to adjust its viewing direction ($\Delta\theta, \Delta\phi$) to explore the 360$^\circ$ environment, seeking the object specified by a user instruc},
  url = {https://arxiv.org/abs/2607.02497},
  keywords = {cs.CV},
  eprint = {2607.02497},
  archiveprefix = {arXiv},
}
