Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning
Jiaheng Hu, Jay Shim, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, Roberto Martin-Martin
Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophic forgetting, necessitating complex CRL strategies. In this work, we take a step back and conduct a systematic study of CRL for large pretrained VLAs across three models and five challenging lifelong RL benchmarks. We find that, contrary to established belief, simple Seq. FT with low-rank adaptation (LoRA) is remarkably strong: it achieves high plasticity, exhibits little to no forgetting, and retains strong zero-shot generalization, frequently outperforming more sophisticated CRL methods. Through detailed analysis, we show that this robustness arises from a synergy between the large pretrained model, parameter-efficient adaptation, and on-policy RL. Together, these components reshape the stability-plasticity trade-off, making continual adaptation both stable and scalable. Our results position Sequential Fine-Tuning as a powerful method for continual RL with VLAs and provide new insights into lifelong learning in the large model era. Code is available at github.com/UT-Austin-RobIn/continual-vla-rl.
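For reference, the low-rank adaptation the abstract relies on is the standard LoRA parameterization (Hu et al., 2021), not anything introduced by this paper: each adapted layer keeps its frozen pretrained weight and adds a trainable low-rank residual,

h = W_0 x + \frac{\alpha}{r} B A x, \qquad W_0 \in \mathbb{R}^{d \times k} \text{ (frozen)}, \quad B \in \mathbb{R}^{d \times r}, \quad A \in \mathbb{R}^{r \times k}, \quad r \ll \min(d, k).

Only B and A, a small fraction of the total parameters, are updated during fine-tuning, which is what makes the recipe parameter-efficient.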
The study reveals that simple Sequential Fine-Tuning with low-rank adaptation (LoRA) is a surprisingly effective method for continual reinforcement learning in large Vision-Language-Action models, contrary to established belief. This simple recipe achieves high plasticity and minimal catastrophic forgetting, often outperforming more sophisticated methods due to a synergy between large pretrained models, parameter-efficient adaptation, and on-policy RL.
The abstract does not explicitly discuss limitations or directions for future work, focusing instead on the positive findings.
The authors conducted a systematic study by applying Sequential Fine-Tuning with low-rank adaptation (LoRA) to three large pretrained Vision-Language-Action models across five challenging lifelong reinforcement learning benchmarks.
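As a rough illustration of that recipe, here is a minimal sketch of sequential fine-tuning with LoRA over a task sequence. It is not the authors' code: the VLA backbone, the task environments, and the on-policy update (rl_update, e.g., one PPO step) are hypothetical placeholders, while the adapter wiring uses the real Hugging Face peft API; the target_modules names are a common choice for transformer attention layers and are assumed here.

import torch
from peft import LoraConfig, get_peft_model

def seq_ft_lora(backbone, tasks, rl_update, steps_per_task=10_000, rank=16):
    # Wrap the frozen pretrained backbone with one shared LoRA adapter.
    config = LoraConfig(r=rank, lora_alpha=2 * rank,
                        target_modules=["q_proj", "v_proj"])  # assumed module names
    policy = get_peft_model(backbone, config)  # base weights frozen; B, A trainable

    # Only the low-rank adapter parameters receive gradients.
    optimizer = torch.optim.AdamW(
        [p for p in policy.parameters() if p.requires_grad], lr=1e-4)

    for task in tasks:                    # tasks arrive one after another;
        for _ in range(steps_per_task):   # no replay buffer, no regularizer
            rl_update(policy, task, optimizer)  # one on-policy RL update (placeholder)
    return policy

The point of the sketch is what is absent: no replay, no per-task adapters, no regularization toward earlier tasks. According to the paper, the combination of a large pretrained backbone, LoRA, and on-policy RL already keeps forgetting low.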
This research challenges the conventional wisdom that complex methods are necessary for continual learning, positioning simple fine-tuning as a powerful and scalable approach for developing adaptive AI agents.
@article{hu2026simple,
title = {Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning},
author = {Jiaheng Hu and Jay Shim and Chen Tang and Yoonchang Sung and Bo Liu and Peter Stone and Roberto Martin-Martin},
year = {2026},
abstract = {Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophic forgetting, necessitating complex CRL strategies. In this work, we take a step back and conduct a systematic study of CRL for large pretrained VLAs across three models and five challenging lifelong RL benchmarks. We find that, contrary to established belief, simple Seq. FT with low-rank adaptation (LoRA) is remarkably strong: it achieves high plasticity, exhibits little to no forgetting, and retains strong zero-shot generalization, frequently outperforming more sophisticated CRL methods. Through detailed analysis, we show that this robustness arises from a synergy between the large pretrained model, parameter-efficient adaptation, and on-policy RL. Together, these components reshape the stability-plasticity trade-off, making continual adaptation both stable and scalable. Our results position Sequential Fine-Tuning as a powerful method for continual RL with VLAs and provide new insights into lifelong learning in the large model era.},
url = {https://arxiv.org/abs/2603.11653},
keywords = {cs.LG, cs.RO},
eprint = {2603.11653},
archiveprefix = {arXiv},
}