Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning
Jiaheng Hu, Jay Shim, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, Roberto Martin-Martin
Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophic forgetting, necessitating complex CRL strategies. In this work, we take a step back and conduct a systematic study of CRL for large pretrained VLAs across three models and five challenging lifelong RL benchmarks. We find that, contrary to established belief, simple Seq. FT with low-rank adaptation (LoRA) is remarkably strong: it achieves high plasticity, exhibits little to no forgetting, and retains strong zero-shot generalization, frequently outperforming more sophisticated CRL methods. Through detailed analysis, we show that this robustness arises from a synergy between the large pretrained model, parameter-efficient adaptation, and on-policy RL. Together, these components reshape the stability-plasticity trade-off, making continual adaptation both stable and scalable. Our results position Sequential Fine-Tuning as a powerful method for continual RL with VLAs and provide new insights into lifelong learning in the large model era. Code is available at github.com/UT-Austin-RobIn/continual-vla-rl.
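For reference, the low-rank adaptation the abstract relies on is the standard LoRA parameterization (Hu et al., 2021), not anything introduced by this paper: each adapted layer keeps its frozen pretrained weight and adds a trainable low-rank residual,

h = W_0 x + \frac{\alpha}{r} B A x, \qquad W_0 \in \mathbb{R}^{d \times k} \text{ (frozen)}, \quad B \in \mathbb{R}^{d \times r}, \quad A \in \mathbb{R}^{r \times k}, \quad r \ll \min(d, k).

Only B and A, a small fraction of the total parameters, are updated during fine-tuning, which is what makes the recipe parameter-efficient.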
The study reveals that simple Sequential Fine-Tuning with low-rank adaptation (LoRA) is a surprisingly effective method for continual reinforcement learning in large Vision-Language-Action models, contrary to established belief. This simple recipe achieves high plasticity and minimal catastrophic forgetting, often outperforming more sophisticated methods due to a synergy between large pretrained models, parameter-efficient adaptation, and on-policy RL.
The abstract does not explicitly discuss limitations or directions for future work, focusing instead on the positive findings.
The authors conducted a systematic study by applying Sequential Fine-Tuning with low-rank adaptation (LoRA) to three large pretrained Vision-Language-Action models across five challenging lifelong reinforcement learning benchmarks.
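As a rough illustration of that recipe, here is a minimal sketch of sequential fine-tuning with LoRA over a task sequence. It is not the authors' code: the VLA backbone, the task environments, and the on-policy update (rl_update, e.g., one PPO step) are hypothetical placeholders, while the adapter wiring uses the real Hugging Face peft API; the target_modules names are a common choice for transformer attention layers and are assumed here.

import torch
from peft import LoraConfig, get_peft_model

def seq_ft_lora(backbone, tasks, rl_update, steps_per_task=10_000, rank=16):
    # Wrap the frozen pretrained backbone with one shared LoRA adapter.
    config = LoraConfig(r=rank, lora_alpha=2 * rank,
                        target_modules=["q_proj", "v_proj"])  # assumed module names
    policy = get_peft_model(backbone, config)  # base weights frozen; B, A trainable

    # Only the low-rank adapter parameters receive gradients.
    optimizer = torch.optim.AdamW(
        [p for p in policy.parameters() if p.requires_grad], lr=1e-4)

    for task in tasks:                    # tasks arrive one after another;
        for _ in range(steps_per_task):   # no replay buffer, no regularizer
            rl_update(policy, task, optimizer)  # one on-policy RL update (placeholder)
    return policy

The point of the sketch is what is absent: no replay, no per-task adapters, no regularization toward earlier tasks. According to the paper, the combination of a large pretrained backbone, LoRA, and on-policy RL already keeps forgetting low.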
This research challenges the conventional wisdom that complex methods are necessary for continual learning, positioning simple fine-tuning as a powerful and scalable approach for developing adaptive AI agents.
@article{hu2026simple,
title = {Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning},
author = {Jiaheng Hu and Jay Shim and Chen Tang and Yoonchang Sung and Bo Liu and Peter Stone and Roberto Martin-Martin},
year = {2026},
abstract = {Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophic forgetting, necessitating complex CRL strategies. In this work, we take a step back and conduct a systematic study of CRL for large pretrained VLAs across three models and five challenging lifelong RL benchmarks. We find that, contrary to established belief, simple Seq. FT with low-rank adaptation (LoRA) is remarkably strong: it achieves high plasticity, exhibits little to no forgetting, and retains strong zero-shot generalization, frequently outperforming more sophisticated CRL methods. Through detailed analysis, we show that this robustness arises from a synergy between the large pretrained model, parameter-efficient adaptation, and on-policy RL. Together, these components reshape the stability-plasticity trade-off, making continual adaptation both stable and scalable. Our results position Sequential Fine-Tuning as a powerful method for continual RL with VLAs and provide new insights into lifelong learning in the large model era.},
url = {https://arxiv.org/abs/2603.11653},
keywords = {cs.LG, cs.RO},
eprint = {2603.11653},
archiveprefix = {arXiv},
}