Paper Detail

Walking in the Implicit: Interactive World Exploration via Neural Scene Representation

Zhiqi Li, Chengrui Dong, Zhenhua Du, Hangning Zhou, Cong Qiu, Hailong Qin, Mu Yang, Dongxu Wei, Peidong Liu

huggingface Score 11.0

Published 2026-06-29 · First seen 2026-06-30

General AI

Abstract

Interactive video generation systems for camera-controlled world exploration roll out growing sequences of latent video frames, entangling state transition with high-frequency observation synthesis. We propose Walking in the Implicit, a scene-centric paradigm that changes the rollout variable from frame latents to a fixed-length, renderable implicit state, termed Neural Implicit Scene (NIS). This factorizes interactive generation into stochastic transition of a compact scene state and deterministic pose-conditioned rendering given the sampled state. We instantiate this paradigm as NeuWorld: a transformer VAE learns locally anchored NIS from sparse posed frames, and a diffusion transformer evolves NIS conditioned on future camera trajectories and geometry-aware retrieved history. By reusing the VAE encoder as a unified conditioner, NeuWorld maps camera, reference-image, and history cues into the same NIS modality, avoiding external heterogeneous encoders. Trained from scratch on public posed-view data without pretrained video backbones or auxiliary 3D reconstructors, NeuWorld achieves strong long-horizon consistency with favorable inference efficiency.
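The factorization the abstract describes is a stochastic transition of a fixed-length scene state plus deterministic pose-conditioned rendering. The toy sketch below illustrates it under stated assumptions and is not NeuWorld's implementation. The names `transition`, `render`, `rollout`, and `STATE_LEN` are hypothetical. The "transition" is a seeded random perturbation standing in for the diffusion transformer. The "renderer" is a toy function standing in for decoding a Neural Implicit Scene at a camera pose. Geometry-aware history retrieval and the VAE conditioner are not modeled. The point of the sketch is structural: the rolled-out state stays the same size however long the exploration runs, and rendering is a pure function of (state, pose).

```python
import random
from typing import List, Tuple

# Hypothetical camera pose: just a 3D position, no rotation, for illustration.
Pose = Tuple[float, float, float]

# Hypothetical fixed number of scalars in the implicit scene state.
STATE_LEN = 8


def transition(state: List[float], trajectory: List[Pose], rng: random.Random) -> List[float]:
    """Stochastic step (toy stand-in for the diffusion transformer).

    Conditions on the upcoming camera trajectory and returns a new state of the
    SAME length, so memory does not grow as the rollout gets longer.
    """
    drift = sum(p[0] + p[1] + p[2] for p in trajectory) / max(len(trajectory), 1)
    return [s + 0.1 * drift + rng.gauss(0.0, 0.01) for s in state]


def render(state: List[float], pose: Pose) -> float:
    """Deterministic step (toy stand-in for pose-conditioned rendering).

    The same state and pose always yield the same observation.
    """
    return sum(s * (1.0 + pose[0]) for s in state) / len(state)


def rollout(num_chunks: int, poses_per_chunk: int, seed: int = 0):
    """Alternate state transition and rendering over a straight-line camera path."""
    rng = random.Random(seed)
    state = [0.0] * STATE_LEN
    frames: List[float] = []
    for chunk in range(num_chunks):
        # Hypothetical trajectory: the camera moves along x in fixed steps.
        trajectory = [
            (0.1 * (chunk * poses_per_chunk + i), 0.0, 0.0)
            for i in range(poses_per_chunk)
        ]
        state = transition(state, trajectory, rng)             # sample the next scene state
        frames.extend(render(state, pose) for pose in trajectory)  # render it deterministically
    return state, frames


if __name__ == "__main__":
    final_state, observations = rollout(num_chunks=5, poses_per_chunk=4)
    print(len(final_state), len(observations))
```

Doubling `num_chunks` doubles the number of rendered observations but leaves `len(state)` at `STATE_LEN`. This contrasts with rolling out an ever-growing sequence of frame latents.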

Workflow Status

Review status
pending
Role
unreviewed
Read priority
now
Vote
Not set.
Saved
no
Collections
Not filed yet.
Next action
Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

Tags

No tags.

BibTeX

@misc{li2026walking,
  title = {Walking in the Implicit: Interactive World Exploration via Neural Scene Representation},
  author = {Zhiqi Li and Chengrui Dong and Zhenhua Du and Hangning Zhou and Cong Qiu and Hailong Qin and Mu Yang and Dongxu Wei and Peidong Liu},
  year = {2026},
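  abstract = {Interactive video generation systems for camera-controlled world exploration roll out growing sequences of latent video frames, entangling state transition with high-frequency observation synthesis. We propose Walking in the Implicit, a scene-centric paradigm that changes the rollout variable from frame latents to a fixed-length, renderable implicit state, termed Neural Implicit Scene (NIS). This factorizes interactive generation into stochastic transition of a compact scene state and deterministic pose-conditioned rendering given the sampled state. We instantiate this paradigm as NeuWorld: a transformer VAE learns locally anchored NIS from sparse posed frames, and a diffusion transformer evolves NIS conditioned on future camera trajectories and geometry-aware retrieved history. By reusing the VAE encoder as a unified conditioner, NeuWorld maps camera, reference-image, and history cues into the same NIS modality, avoiding external heterogeneous encoders. Trained from scratch on public posed-view data without pretrained video backbones or auxiliary 3D reconstructors, NeuWorld achieves strong long-horizon consistency with favorable inference efficiency.},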
  url = {https://huggingface.co/papers/2606.30045},
  keywords = {latent video frames, implicit state, Neural Implicit Scene, transformer VAE, diffusion transformer, pose-conditioned rendering, camera trajectories, geometry-aware retrieval, VAE encoder, unified conditioner, long-horizon consistency, code available, huggingface daily},
  eprint = {2606.30045},
  archiveprefix = {arXiv},
}

Metadata

{}