Paper Detail

WorldDirector: Building Controllable World Simulators with Persistent Dynamic Memory

Hanlin Wang, Hao Ouyang, Qiuyu Wang, Wen Wang, Qingyan Bai, Ka Leong Cheng, Yue Yu, Yixuan Li, Yihao Meng, Zichen Liu, Yanhong Zeng, Yujun Shen, Qifeng Chen

arXiv · Score 6.6

Published 2026-07-02 · First seen 2026-07-03

General AI

Abstract

We present WorldDirector, a highly controllable video world model framework designed for persistent dynamic object memory and unrestricted viewpoint exploration. Unlike existing world models that entangle physical dynamics with pixel rendering and rely on continuous visual observation to sustain motion, our framework explicitly decouples semantic motion orchestration from visual generation. By leveraging an LLM to coordinate 3D trajectories with camera movements and subsequently employing these orchestrated trajectories as control signals for video generation, our approach ensures strict physical logic and appearance stability, successfully preserving the exact visual identities of dynamic entities even when they re-enter the scene after prolonged periods out of view. Experimental results demonstrate that our method supports the synthesis of complex and extended events with unprecedented controllability and persistent dynamic object memory. Project Page: https://worlddirector.github.io/

Workflow Status

Review status: pending
Role: unreviewed
Read priority: later
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.
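As a rough illustration of the enrichment step described above, the following sketch merges note fields into one record of `papers.jsonl`. It assumes the file holds one JSON object per line and that records can be matched on an `eprint` field; both are assumptions, since the actual schema is not shown on this page. The helper name `enrich_record` and the value types used for the note fields are likewise hypothetical.

```python
import json
from pathlib import Path


def enrich_record(path, paper_key, notes, key_field="eprint"):
    """Merge `notes` into the JSONL record whose `key_field` equals `paper_key`.

    Assumption: one JSON object per line, matched on `key_field`
    (hypothetical; adjust to the real schema of papers.jsonl).
    Returns True if a record was updated, False otherwise.
    """
    path = Path(path)
    updated = False
    out = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if record.get(key_field) == paper_key:
            record.update(notes)  # add or overwrite the note fields
            updated = True
        out.append(json.dumps(record, ensure_ascii=False))
    # Rewrite the whole file, one record per line.
    path.write_text("".join(row + "\n" for row in out), encoding="utf-8")
    return updated
```

For this paper, a call might look like `enrich_record("papers.jsonl", "2607.02517", {"why_relevant": "...", "next_action": "..."})`, with the placeholder strings replaced by actual notes. The sketch rewrites the whole file in place, so it is worth backing the file up before trying it.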

Why It Surfaced

No ranking explanation is available yet.

Tags

No tags.

BibTeX

@article{wang2026worlddirector,
  title = {WorldDirector: Building Controllable World Simulators with Persistent Dynamic Memory},
  author = {Hanlin Wang and Hao Ouyang and Qiuyu Wang and Wen Wang and Qingyan Bai and Ka Leong Cheng and Yue Yu and Yixuan Li and Yihao Meng and Zichen Liu and Yanhong Zeng and Yujun Shen and Qifeng Chen},
  year = {2026},
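  abstract = {We present WorldDirector, a highly controllable video world model framework designed for persistent dynamic object memory and unrestricted viewpoint exploration. Unlike existing world models that entangle physical dynamics with pixel rendering and rely on continuous visual observation to sustain motion, our framework explicitly decouples semantic motion orchestration from visual generation. By leveraging an LLM to coordinate 3D trajectories with camera movements and subsequently employing these orchestrated trajectories as control signals for video generation, our approach ensures strict physical logic and appearance stability, successfully preserving the exact visual identities of dynamic entities even when they re-enter the scene after prolonged periods out of view. Experimental results demonstrate that our method supports the synthesis of complex and extended events with unprecedented controllability and persistent dynamic object memory.},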
  url = {https://arxiv.org/abs/2607.02517},
  keywords = {cs.CV, video world model, persistent dynamic object memory, viewpoint exploration, semantic motion orchestration, visual generation, LLM, 3D trajectories, camera movements, physical logic, appearance stability, huggingface daily},
  eprint = {2607.02517},
  archiveprefix = {arXiv},
}

Metadata

{}