Paper Detail

DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory

Zhenhao Yang, Xiaoshi Wu, Zhengyao Lv, Xiaoyu Shi, Xintao Wang, Pengfei Wan, Kun Gai, Kwan-Yee K. Wong

huggingface Score 8.0

Published 2026-05-29 · First seen 2026-06-01

General AI

Abstract

Recent advances in video generative models have promoted rapid progress in controllable world models. However, maintaining fine-grained spatio-temporal consistency under long-horizon reasoning remains a key challenge. In this work, we move beyond explicit 3D memory and coarse frame-level implicit modeling, and propose a fine-grained, learnable, and scalable memory for consistent world generation. We first identify two fundamental limitations of naïve learnable memory architectures in long-horizon extrapolation, namely computational inefficiency and attention dispersion. Through a systematic analysis of attention dispersion, we propose DecMem, a decoupled memory architecture that employs Sparse Global Memory for efficient fine-grained access to global history and Anchored Local Memory for stable and high-quality extrapolation. Extensive experiments demonstrate that DecMem significantly outperforms current state-of-the-art methods. By ensuring precise and efficient long-term memory and achieving superior extrapolation capabilities, DecMem enables minute-level controllable long video generation with high fidelity and consistency.
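The abstract names attention dispersion as a limitation of naïve learnable memory but does not say how Sparse Global Memory selects entries. As a rough illustration only, the sketch below uses plain top-k selection, a common sparsification technique swapped in here as an assumption and not the authors' method. It shows how the softmax weight on one relevant memory entry shrinks as weakly matching entries accumulate, and how restricting attention to the k best-scoring entries keeps that weight stable. The toy keys, values, and the helper names `build_memory`, `dense_attention`, and `topk_sparse_attention` are all hypothetical. Anchored Local Memory is not modelled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_attention(query, keys, values):
    """Attend over every memory entry; returns (output, weights)."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return out, weights


def topk_sparse_attention(query, keys, values, k):
    """Attend only over the k highest-scoring entries (assumed selection rule).

    Returns (output, {memory_index: weight}).
    """
    scores = [dot(query, key) for key in keys]
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, top)) for d in range(dim)]
    return out, dict(zip(top, weights))


def build_memory(num_distractors):
    """Toy memory: entry 0 matches the query well, the rest match weakly."""
    keys = [[2.0, 0.0]] + [[0.5, 0.1 * (i % 3)] for i in range(num_distractors)]
    values = [[1.0]] + [[0.0] for _ in range(num_distractors)]
    return keys, values


query = [1.0, 0.0]
for n in (10, 1000):
    keys, values = build_memory(n)
    _, dense_w = dense_attention(query, keys, values)
    _, sparse_w = topk_sparse_attention(query, keys, values, k=4)
    # Dense weight on entry 0 falls as n grows; the top-k weight does not.
    print(n, round(dense_w[0], 4), round(sparse_w[0], 4))
```

In this toy setting the dense weight on the relevant entry depends on how much history has accumulated, while the top-k weight depends only on k. That is one plausible reading of why a sparse access pattern helps at long horizons.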

Workflow Status

Review status: pending
Role: unreviewed
Read priority: soon
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.
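A minimal sketch of what such a record might look like, built in Python and serialized as one JSON line. Only the four field names come from the note above. The `paper_id` key, the value types (a list of heading/text pairs for `summary_sections`, plain strings for the rest), and the helper `make_notes_record` are assumptions, so check them against the actual `papers.jsonl` schema before use. The placeholder strings are meant to be replaced with real notes.

```python
import json


def make_notes_record(paper_id, summary_sections, why_relevant, claim_impact, next_action):
    """Build one notes record. 'paper_id' is a hypothetical join key."""
    return {
        "paper_id": paper_id,  # assumption: how the tool matches notes to a paper
        "summary_sections": summary_sections,
        "why_relevant": why_relevant,
        "claim_impact": claim_impact,
        "next_action": next_action,
    }


record = make_notes_record(
    paper_id="2605.31336",
    summary_sections=[
        {
            "heading": "Method",
            "text": "Decoupled memory: Sparse Global Memory plus Anchored Local Memory.",
        },
    ],
    why_relevant="(fill in)",
    claim_impact="(fill in)",
    next_action="(fill in)",
)

# JSONL stores one JSON object per line, so serialize without indentation.
line = json.dumps(record, ensure_ascii=False)
print(line)
```

To persist the record, the line would be appended to `papers.jsonl` followed by a newline, assuming the tool reads notes as separate records and does not expect them merged into the paper's existing entry.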

Why It Surfaced

No ranking explanation is available yet.

Tags

No tags.

BibTeX

@misc{yang2026decmem,
  title = {DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory},
  author = {Zhenhao Yang and Xiaoshi Wu and Zhengyao Lv and Xiaoyu Shi and Xintao Wang and Pengfei Wan and Kun Gai and Kwan-Yee K. Wong},
  year = {2026},
  abstract = {Recent advances in video generative models have promoted rapid progress in controllable world models. However, maintaining fine-grained spatio-temporal consistency under long-horizon reasoning remains a key challenge. In this work, we move beyond explicit 3D memory and coarse frame-level implicit modeling, and propose a fine-grained, learnable, and scalable memory for consistent world generation. We first identify two fundamental limitations of naïve learnable memory architectures in long-horizo},
  url = {https://huggingface.co/papers/2605.31336},
  keywords = {video generative models, world models, spatio-temporal consistency, long-horizon reasoning, learnable memory, attention dispersion, sparse global memory, anchored local memory, video generation, extrapolation, code available, huggingface daily},
  eprint = {2605.31336},
  archiveprefix = {arXiv},
}
