Paper Detail

All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation

Xudong Wang, Gan Li, Zhiyu Liu, Yao Wang, Lianqing Liu, Zhi Han

arXiv Score 19.4

Published 2026-03-15 · First seen 2026-03-27

Research Track A · General AI

Abstract

Deploying vision-and-language navigation (VLN) agents requires adaptation across diverse scenes and environments, but fine-tuning on a specific scenario often causes catastrophic forgetting in others, which severely limits flexible long-term deployment. We formalize this challenge as the all-day multi-scenes lifelong VLN (AML-VLN) problem. Existing parameter-efficient adapters (e.g., LoRA and its variants) are limited by their two-dimensional matrix form, which fails to capture the multi-hierarchical navigation knowledge spanning multiple scenes and environments. To address this, we propose Tucker Adaptation (TuKA), which represents the multi-hierarchical navigation knowledge as a high-order tensor and leverages Tucker decomposition to decouple the knowledge into shared subspaces and scenario-specific experts. We further introduce a decoupled knowledge incremental learning strategy to consolidate shared subspaces while constraining specific experts for decoupled lifelong learning. Building on TuKA, we also develop a VLN agent named AlldayWalker, which continually learns across multiple navigation scenarios, achieving all-day multi-scenes navigation. Extensive experiments show that AlldayWalker consistently outperforms state-of-the-art baselines.
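The abstract's core idea, representing navigation knowledge as a high-order tensor and using Tucker decomposition to split it into shared subspaces and scenario-specific experts, can be illustrated with a small NumPy sketch. This is not the paper's implementation; the tensor shapes, rank choices, and the `scenario_delta` helper are illustrative assumptions. It shows how a 3-way weight-update tensor factors into two shared factor matrices (analogous to the shared subspaces) and one factor whose rows act as per-scenario experts, in contrast to a single 2D LoRA matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, n_scen = 8, 6, 3   # layer dims and number of scenarios (toy sizes)
r1, r2, r3 = 4, 3, 2            # assumed Tucker ranks per mode

# Tucker factors: U and V span subspaces shared across scenarios;
# each row of S is a scenario-specific expert (names are illustrative).
G = rng.normal(size=(r1, r2, r3))   # core tensor
U = rng.normal(size=(d_out, r1))    # shared output subspace
V = rng.normal(size=(d_in, r2))     # shared input subspace
S = rng.normal(size=(n_scen, r3))   # one expert row per scenario

def scenario_delta(k):
    """Weight update for scenario k: contract the core with expert row S[k]."""
    core_k = np.einsum('abc,c->ab', G, S[k])  # (r1, r2) core slice for scenario k
    return U @ core_k @ V.T                   # (d_out, d_in) adapter update

# Full 3-way reconstruction: W[i, j, k] = sum_{a,b,c} G[a,b,c] U[i,a] V[j,b] S[k,c]
W = np.einsum('abc,ia,jb,kc->ijk', G, U, V, S)
assert np.allclose(W[:, :, 1], scenario_delta(1))
```

Under this toy factorization, lifelong learning in the decoupled-knowledge sense would amount to consolidating `U` and `V` across scenarios while only the matching row of `S` (and the core) adapts to a new scenario, which is what the 2D matrix form of LoRA cannot express.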

Workflow Status

Review status
pending
Role
unreviewed
Read priority
now
Vote
Not set.
Saved
no
Collections
Not filed yet.
Next action
Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

Tags

No tags.

BibTeX

@article{wang2026all,
  title = {All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation},
  author = {Xudong Wang and Gan Li and Zhiyu Liu and Yao Wang and Lianqing Liu and Zhi Han},
  year = {2026},
  abstract = {Deploying vision-and-language navigation (VLN) agents requires adaptation across diverse scenes and environments, but fine-tuning on a specific scenario often causes catastrophic forgetting in others, which severely limits flexible long-term deployment. We formalize this challenge as the all-day multi-scenes lifelong VLN (AML-VLN) problem. Existing parameter-efficient adapters (e.g., LoRA and its variants) are limited by their two-dimensional matrix form, which fails to capture the multi-hierarchical navigation knowledge spanning multiple scenes and environments. To address this, we propose Tucker Adaptation (TuKA), which represents the multi-hierarchical navigation knowledge as a high-order tensor and leverages Tucker decomposition to decouple the knowledge into shared subspaces and scenario-specific experts. We further introduce a decoupled knowledge incremental learning strategy to consolidate shared subspaces while constraining specific experts for decoupled lifelong learning. Building on TuKA, we also develop a VLN agent named AlldayWalker, which continually learns across multiple navigation scenarios, achieving all-day multi-scenes navigation. Extensive experiments show that AlldayWalker consistently outperforms state-of-the-art baselines.},
  url = {https://arxiv.org/abs/2603.14276},
  keywords = {cs.CV, cs.AI},
  eprint = {2603.14276},
  archiveprefix = {arXiv},
}

Metadata

{}