Paper Detail

Vision-Language Model Reasoning for Contextual Semantic Mapping in Intralogistics

Marvin Rüdt, Hao Pang, Constantin Enke, Zäzilia Seibold, Kai Furmans

arXiv Score 8.2

Published 2026-06-23 · First seen 2026-06-24

General AI

Abstract

Autonomous mobile robots operating in intralogistics environments rely on geometric maps for localization and navigation, but lack semantic understanding of objects and their contextual properties. We present a contextual semantic mapping pipeline that combines SLAM-based geometric mapping, SAM-based instance segmentation, instance clustering, and VLM multi-view reasoning to produce a contextual semantic map representation encoding geometric structure, object class, and object movability. By aggregating observations across multiple viewpoints and querying a VLM in a zero-shot, open-vocabulary setting, the pipeline infers contextual object properties (here demonstrated through movability) without requiring task-specific training or predefined object categories. We evaluate three VLMs under two prompting strategies and conduct a component-wise analysis of the pipeline. The proposed pipeline achieves 98.93 % mIoU for semantic classification and 89.17 % mAcc for object movability estimation. Component analysis identifies VLM reasoning as the primary bottleneck for contextual understanding and instance clustering as the main limitation for panoptic performance. The resulting semantic map supports context-aware filtering and robust navigation in dynamic intralogistics environments.
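The abstract names three technical ingredients: fusing per-view VLM answers into one class and one movability flag per clustered instance, using those flags for context-aware map filtering, and scoring with mIoU and mAcc. The sketch below illustrates these ideas under stated assumptions and is not the authors' implementation. The `MapInstance` schema, the majority-vote fusion rule, the 2D occupancy-grid representation, and the per-class metric definitions are all hypothetical. The paper may instead prompt the VLM with several views jointly and may evaluate at a different granularity.

```python
from collections import Counter
from dataclasses import dataclass, field

Cell = tuple[int, int]  # (row, col) index into a 2D occupancy grid (assumed map format)


@dataclass
class MapInstance:
    """One clustered object instance in the map (hypothetical schema)."""
    instance_id: int
    cells: list[Cell]                                       # grid cells covered by the instance
    view_labels: list[str] = field(default_factory=list)    # per-view VLM class answers
    view_movable: list[bool] = field(default_factory=list)  # per-view VLM movability answers
    label: str = "unknown"
    movable: bool = False


def aggregate_views(inst: MapInstance) -> MapInstance:
    """Fuse per-view answers by majority vote (an assumed fusion rule, not the paper's)."""
    if inst.view_labels:
        # Most frequent class; Counter breaks ties in favor of the label seen first.
        inst.label = Counter(inst.view_labels).most_common(1)[0][0]
    # Strict majority required; a tie (or no answers) keeps the object static.
    inst.movable = 2 * sum(inst.view_movable) > len(inst.view_movable)
    return inst


def static_grid(grid: list[list[int]], instances: list[MapInstance]) -> list[list[int]]:
    """Context-aware filtering: clear cells of movable instances (1 = occupied, 0 = free)."""
    out = [row[:] for row in grid]  # copy so the input map is not modified
    for inst in instances:
        if inst.movable:
            for r, c in inst.cells:
                out[r][c] = 0
    return out


def miou(pred: list, gt: list, classes: list) -> float:
    """Mean intersection-over-union over classes that appear in pred or gt."""
    ious = []
    for k in classes:
        inter = sum(p == k and g == k for p, g in zip(pred, gt))
        union = sum(p == k or g == k for p, g in zip(pred, gt))
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def macc(pred: list, gt: list, classes: list) -> float:
    """Mean per-class accuracy: fraction of each ground-truth class predicted correctly."""
    accs = []
    for k in classes:
        total = sum(g == k for g in gt)
        if total:
            hits = sum(p == k and g == k for p, g in zip(pred, gt))
            accs.append(hits / total)
    return sum(accs) / len(accs) if accs else 0.0


if __name__ == "__main__":
    pallet = aggregate_views(
        MapInstance(1, [(0, 1), (0, 2)], ["pallet", "pallet", "box"], [True, True, False]))
    shelf = aggregate_views(MapInstance(2, [(1, 0)], ["shelf", "shelf"], [False, False]))
    print(pallet.label, pallet.movable)                          # pallet True
    print(static_grid([[0, 1, 1], [1, 0, 0]], [pallet, shelf]))  # [[0, 0, 0], [1, 0, 0]]
    pred = ["pallet", "pallet", "shelf", "floor"]
    gt = ["pallet", "shelf", "shelf", "floor"]
    print(round(miou(pred, gt, ["pallet", "shelf", "floor"]), 4))  # 0.6667
    print(round(macc(pred, gt, ["pallet", "shelf", "floor"]), 4))  # 0.8333
```

Majority voting is used here only because it is easy to state. It discards per-view confidence, and a pipeline that reasons over several views in a single VLM query would replace `aggregate_views` entirely. Treating ties as static is a conservative choice: ambiguous objects stay in the map used for localization rather than being filtered out.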

Workflow Status

Review status: pending
Role: unreviewed
Read priority: soon
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

Tags

No tags.

BibTeX

@article{rdt2026vision,
  title = {Vision-Language Model Reasoning for Contextual Semantic Mapping in Intralogistics},
  author = {Marvin Rüdt and Hao Pang and Constantin Enke and Zäzilia Seibold and Kai Furmans},
  year = {2026},
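  abstract = {Autonomous mobile robots operating in intralogistics environments rely on geometric maps for localization and navigation, but lack semantic understanding of objects and their contextual properties. We present a contextual semantic mapping pipeline that combines SLAM-based geometric mapping, SAM-based instance segmentation, instance clustering, and VLM multi-view reasoning to produce a contextual semantic map representation encoding geometric structure, object class, and object movability. By aggregating observations across multiple viewpoints and querying a VLM in a zero-shot, open-vocabulary setting, the pipeline infers contextual object properties (here demonstrated through movability) without requiring task-specific training or predefined object categories. We evaluate three VLMs under two prompting strategies and conduct a component-wise analysis of the pipeline. The proposed pipeline achieves 98.93 \% mIoU for semantic classification and 89.17 \% mAcc for object movability estimation. Component analysis identifies VLM reasoning as the primary bottleneck for contextual understanding and instance clustering as the main limitation for panoptic performance. The resulting semantic map supports context-aware filtering and robust navigation in dynamic intralogistics environments.},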
  url = {https://arxiv.org/abs/2606.24814},
  keywords = {cs.RO},
  eprint = {2606.24814},
  archiveprefix = {arXiv},
}

Metadata

{}