Paper Detail

Learning Transferable Dynamics Priors from Action to World Modeling

Ze Huang, Jiahui Zhang, Hairuo Liu, Chenxi Zhang, Ran Cheng, Li Zhang

huggingface Score 11.0

Published 2026-06-28 · First seen 2026-06-30

General AI

Abstract

We study action-conditioned world modeling as a scalable way to learn transferable dynamics priors for robot learning. By pretraining a model to predict how actions drive visual scene evolution, the resulting world model captures reusable interaction dynamics beyond appearance-level video generation. Concretely, we pretrain a multi-view interactive base diffusion world model, A2World, on large-scale robot manipulation data with real action annotations. We validate the learned dynamics priors from two complementary perspectives. First, we adapt A2World into a task- or scene-specialized real-world simulator, A2World-sim, whose long-horizon rollouts support simulator-based policy evaluation and scalable what-if analysis by replacing real-robot rollouts with world model rollouts. Second, starting from the same pretrained weights, we adapt A2World into a video-action joint prediction model, A2World-policy, that predicts actions under visual and instruction conditioning. Experiments across simulation benchmarks and real-robot settings demonstrate that action-conditioned world model pretraining yields transferable dynamics priors that benefit both simulator-centric and policy-centric robot learning.
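The simulator-based policy evaluation mentioned in the abstract, where world model rollouts replace real-robot rollouts, can be illustrated with a toy sketch. This is not the A2World implementation: `toy_world_model`, `proportional_policy`, `rollout`, and `success_rate` are hypothetical names, and the "scene" is a 2-D position rather than multi-view video. The sketch only shows the evaluation loop, assuming a world model that maps a state and an action to the next state.

```python
from typing import Callable, List, Sequence

State = List[float]
Action = List[float]


def toy_world_model(state: State, action: Action) -> State:
    """Hypothetical stand-in for a learned action-conditioned world model.

    A real model would predict future camera frames; here the state is a
    2-D position that simply moves by the commanded action.
    """
    return [s + a for s, a in zip(state, action)]


def proportional_policy(state: State, goal: State, gain: float = 0.5) -> Action:
    """Toy policy (an assumption): step a fixed fraction of the way to the goal."""
    return [gain * (g - s) for s, g in zip(state, goal)]


def rollout(
    world_model: Callable[[State, Action], State],
    policy: Callable[[State, State], Action],
    start: State,
    goal: State,
    horizon: int,
) -> List[State]:
    """Roll the policy forward inside the world model for `horizon` steps."""
    states = [list(start)]
    for _ in range(horizon):
        action = policy(states[-1], goal)
        states.append(world_model(states[-1], action))
    return states


def success_rate(
    world_model: Callable[[State, Action], State],
    policy: Callable[[State, State], Action],
    starts: Sequence[State],
    goal: State,
    horizon: int,
    tol: float = 0.05,
) -> float:
    """Fraction of imagined rollouts whose final state ends within `tol` of the goal."""
    successes = 0
    for start in starts:
        final = rollout(world_model, policy, start, goal, horizon)[-1]
        dist = sum((f - g) ** 2 for f, g in zip(final, goal)) ** 0.5
        successes += dist <= tol
    return successes / len(starts)


if __name__ == "__main__":
    starts = [[0.0, 0.0], [1.0, 1.0]]
    print(success_rate(toy_world_model, proportional_policy, starts, [1.0, 1.0], horizon=10))
```

Changing `horizon`, the start states, or the policy and re-running `success_rate` is the toy analogue of the what-if analysis the abstract describes: no real-robot rollout is needed to compare the variants.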

BibTeX

@misc{huang2026learning,
  title = {Learning Transferable Dynamics Priors from Action to World Modeling},
  author = {Ze Huang and Jiahui Zhang and Hairuo Liu and Chenxi Zhang and Ran Cheng and Li Zhang},
  year = {2026},
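  abstract = {We study action-conditioned world modeling as a scalable way to learn transferable dynamics priors for robot learning. By pretraining a model to predict how actions drive visual scene evolution, the resulting world model captures reusable interaction dynamics beyond appearance-level video generation. Concretely, we pretrain a multi-view interactive base diffusion world model, A2World, on large-scale robot manipulation data with real action annotations. We validate the learned dynamics priors from two complementary perspectives. First, we adapt A2World into a task- or scene-specialized real-world simulator, A2World-sim, whose long-horizon rollouts support simulator-based policy evaluation and scalable what-if analysis by replacing real-robot rollouts with world model rollouts. Second, starting from the same pretrained weights, we adapt A2World into a video-action joint prediction model, A2World-policy, that predicts actions under visual and instruction conditioning. Experiments across simulation benchmarks and real-robot settings demonstrate that action-conditioned world model pretraining yields transferable dynamics priors that benefit both simulator-centric and policy-centric robot learning.},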
  url = {https://huggingface.co/papers/2606.29501},
  keywords = {world modeling, diffusion world model, action-conditioned, multi-view interactive, pretraining, robot manipulation, simulator-centric learning, policy-centric learning, video-action joint prediction, dynamics priors, huggingface daily},
  eprint = {2606.29501},
  archiveprefix = {arXiv},
}
