Paper Detail

Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning

Jiaheng Hu, Jay Shim, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, Roberto Martin-Martin

arxiv Score 31.5

Published 2026-03-12 · First seen 2026-03-27

Research Track A · General AI

Abstract

Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophic forgetting, necessitating complex CRL strategies. In this work, we take a step back and conduct a systematic study of CRL for large pretrained VLAs across three models and five challenging lifelong RL benchmarks. We find that, contrary to established belief, simple Seq. FT with low-rank adaptation (LoRA) is remarkably strong: it achieves high plasticity, exhibits little to no forgetting, and retains strong zero-shot generalization, frequently outperforming more sophisticated CRL methods. Through detailed analysis, we show that this robustness arises from a synergy between the large pretrained model, parameter-efficient adaptation, and on-policy RL. Together, these components reshape the stability-plasticity trade-off, making continual adaptation both stable and scalable. Our results position Sequential Fine-Tuning as a powerful method for continual RL with VLAs and provide new insights into lifelong learning in the large model era. Code is available at github.com/UT-Austin-RobIn/continual-vla-rl.

Workflow Status

Review status
pending
Role
unreviewed
Read priority
now
Vote
Not set.
Saved
no
Collections
Not filed yet.
Next action
Not filled in yet.

Reading Brief

Key Findings

The study reveals that simple Sequential Fine-Tuning with low-rank adaptation (LoRA) is a surprisingly effective method for continual reinforcement learning in large Vision-Language-Action models, contrary to established belief. This simple recipe achieves high plasticity and minimal catastrophic forgetting, often outperforming more sophisticated methods due to a synergy between large pretrained models, parameter-efficient adaptation, and on-policy RL.
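As a concrete anchor for the parameter-efficient-adaptation ingredient, below is a minimal LoRA sketch in PyTorch. It is illustrative only: the rank, scaling factor, and wrapped layer are assumed defaults, not the configuration used in the paper, and this is not the authors' code.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank residual."""
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # freeze pretrained weights
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)       # adapter starts as a zero update
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + (alpha / rank) * B(A(x)); only A and B receive gradients
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

Freezing the pretrained weights and training only the low-rank pair is what the brief means by parameter-efficient adaptation: the adapter adds a small number of trainable parameters while the backbone stays fixed.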

Limitations

The provided abstract does not explicitly detail limitations or specific directions for future work, focusing instead on the successful results.

Methodology

The authors conducted a systematic study of Sequential Fine-Tuning with low-rank adaptation (LoRA), applied to three large pretrained Vision-Language-Action models across five challenging lifelong reinforcement learning benchmarks, and compared it against more sophisticated continual RL methods; a structural sketch of the recipe follows.
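The sketch below builds on the LoRALinear wrapper above. It is an assumption-laden toy, not the authors' pipeline: collect_rollouts fakes on-policy data with a reward of 1 when the sampled action matches the task id, and a REINFORCE-style loss stands in for whichever on-policy RL algorithm the paper uses. The defining property of Seq. FT is visible in the loop: one set of LoRA weights, updated task after task, with no replay buffer, regularizer, or per-task adapter bank.

import torch
import torch.nn as nn

IN_DIM, N_ACTIONS = 8, 4                          # toy sizes, not a real VLA

def collect_rollouts(policy: nn.Module, task_id: int, batch: int = 32):
    """Toy stand-in for collecting on-policy rollouts in one task."""
    obs = torch.randn(batch, IN_DIM)              # fake observations
    dist = torch.distributions.Categorical(logits=policy(obs))
    actions = dist.sample()
    returns = (actions == task_id).float()        # reward 1 iff action == task id
    return dist.log_prob(actions), returns

def sequential_finetune(policy: nn.Module, task_ids, steps_per_task: int = 200):
    # Only the parameters left trainable (the LoRA adapters) are optimized.
    trainable = [p for p in policy.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=1e-3)
    for task_id in task_ids:                      # tasks arrive strictly in sequence
        for _ in range(steps_per_task):
            logps, returns = collect_rollouts(policy, task_id)
            loss = -(logps * returns).mean()      # REINFORCE-style on-policy loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return policy

policy = LoRALinear(nn.Linear(IN_DIM, N_ACTIONS))
sequential_finetune(policy, task_ids=[0, 1, 2])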

Significance

This research challenges the conventional wisdom that complex methods are necessary for continual learning, positioning simple fine-tuning as a powerful and scalable approach for developing adaptive AI agents.

Why It Surfaced

No ranking explanation is available yet.

Tags

No tags.

BibTeX

@article{hu2026simple,
  title = {Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning},
  author = {Jiaheng Hu and Jay Shim and Chen Tang and Yoonchang Sung and Bo Liu and Peter Stone and Roberto Martin-Martin},
  year = {2026},
  abstract = {Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophic forgetting, necessitating complex CRL strategies. In this work, we take a step back and conduct a systematic study of CRL for large pretrained VLAs across three models and five challenging lifelong RL benchmarks. We find that, contrary to established belief, simple Seq. FT with low-rank adaptation (LoRA) is remarkably strong: it achieves high plasticity, exhibits little to no forgetting, and retains strong zero-shot generalization, frequently outperforming more sophisticated CRL methods. Through detailed analysis, we show that this robustness arises from a synergy between the large pretrained model, parameter-efficient adaptation, and on-policy RL. Together, these components reshape the stability-plasticity trade-off, making continual adaptation both stable and scalable. Our results position Sequential Fine-Tuning as a powerful method for continual RL with VLAs and provide new insights into lifelong learning in the large model era. Code is available at github.com/UT-Austin-RobIn/continual-vla-rl.},
  url = {https://arxiv.org/abs/2603.11653},
  keywords = {cs.LG, cs.RO},
  eprint = {2603.11653},
  archiveprefix = {arXiv},
}
