Walking in the Implicit: Interactive World Exploration via Neural Scene Representation

Zhiqi Li, Chengrui Dong, Zhenhua Du, Hangning Zhou, Cong Qiu, Hailong Qin, Mu Yang, Dongxu Wei, Peidong Liu

huggingface Score 11.0

Published 2026-06-29 · First seen 2026-06-30

General AI

Abstract

Interactive video generation systems for camera-controlled world exploration roll out growing sequences of latent video frames, entangling state transition with high-frequency observation synthesis. We propose Walking in the Implicit, a scene-centric paradigm that changes the rollout variable from frame latents to a fixed-length, renderable implicit state, termed Neural Implicit Scene (NIS). This factorizes interactive generation into stochastic transition of a compact scene state and deterministic pose-conditioned rendering given the sampled state. We instantiate this paradigm as NeuWorld: a transformer VAE learns locally anchored NIS from sparse posed frames, and a diffusion transformer evolves NIS conditioned on future camera trajectories and geometry-aware retrieved history. By reusing the VAE encoder as a unified conditioner, NeuWorld maps camera, reference-image, and history cues into the same NIS modality, avoiding external heterogeneous encoders. Trained from scratch on public posed-view data without pretrained video backbones or auxiliary 3D reconstructors, NeuWorld achieves strong long-horizon consistency with favorable inference efficiency.
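The factorization described in the abstract (a stochastic transition over a fixed-length scene state, followed by deterministic pose-conditioned rendering) can be illustrated with a toy rollout loop. This is only a sketch under stated assumptions, not NeuWorld's implementation: `STATE_DIM`, `transition`, `render`, and `rollout` are hypothetical names, the state is a plain list of floats, and the arithmetic inside each function is a placeholder for the learned diffusion transformer and VAE decoder that the paper describes.

```python
import random
from typing import List, Sequence, Tuple

STATE_DIM = 8  # hypothetical fixed length of the implicit scene state

Pose = Sequence[float]
State = List[float]


def transition(state: State, trajectory: Sequence[Pose], rng: random.Random) -> State:
    """Stochastic step: sample the next fixed-length state given future poses.

    Toy stand-in for a learned generative transition (assumption, not the paper's model).
    """
    drift = sum(sum(p) for p in trajectory) / max(len(trajectory), 1)
    return [0.9 * s + 0.1 * drift + rng.gauss(0.0, 0.05) for s in state]


def render(state: State, pose: Pose) -> List[float]:
    """Deterministic rendering: the same (state, pose) pair always gives the same output."""
    return [s * (1.0 + 0.1 * i) + sum(pose) for i, s in enumerate(state)]


def rollout(
    initial: State, chunks: Sequence[Sequence[Pose]], seed: int = 0
) -> Tuple[State, List[List[float]]]:
    """Roll the scene state forward chunk by chunk and render each requested pose."""
    rng = random.Random(seed)
    state = list(initial)
    frames: List[List[float]] = []
    for trajectory in chunks:
        # Randomness enters only here, in the state transition.
        state = transition(state, trajectory, rng)
        # Rendering adds no randomness once the state is sampled.
        frames.extend(render(state, pose) for pose in trajectory)
    return state, frames


if __name__ == "__main__":
    chunks = [[(0.0, 0.0, 1.0), (0.0, 0.0, 2.0)], [(1.0, 0.0, 2.0)]]
    state, frames = rollout([0.0] * STATE_DIM, chunks, seed=1)
    print(len(state), len(frames))
```

In this sketch the rollout variable keeps the same length no matter how many chunks are processed, while the list of rendered frames grows; that is the contrast with rolling out frame latents directly that the abstract draws.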

BibTeX

@misc{li2026walking,
  title = {Walking in the Implicit: Interactive World Exploration via Neural Scene Representation},
  author = {Zhiqi Li and Chengrui Dong and Zhenhua Du and Hangning Zhou and Cong Qiu and Hailong Qin and Mu Yang and Dongxu Wei and Peidong Liu},
  year = {2026},
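  abstract = {Interactive video generation systems for camera-controlled world exploration roll out growing sequences of latent video frames, entangling state transition with high-frequency observation synthesis. We propose Walking in the Implicit, a scene-centric paradigm that changes the rollout variable from frame latents to a fixed-length, renderable implicit state, termed Neural Implicit Scene (NIS). This factorizes interactive generation into stochastic transition of a compact scene state and deterministic pose-conditioned rendering given the sampled state. We instantiate this paradigm as NeuWorld: a transformer VAE learns locally anchored NIS from sparse posed frames, and a diffusion transformer evolves NIS conditioned on future camera trajectories and geometry-aware retrieved history. By reusing the VAE encoder as a unified conditioner, NeuWorld maps camera, reference-image, and history cues into the same NIS modality, avoiding external heterogeneous encoders. Trained from scratch on public posed-view data without pretrained video backbones or auxiliary 3D reconstructors, NeuWorld achieves strong long-horizon consistency with favorable inference efficiency.},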
  url = {https://huggingface.co/papers/2606.30045},
  keywords = {latent video frames, implicit state, Neural Implicit Scene, transformer VAE, diffusion transformer, pose-conditioned rendering, camera trajectories, geometry-aware retrieval, VAE encoder, unified conditioner, long-horizon consistency, code available, huggingface daily},
  eprint = {2606.30045},
  archiveprefix = {arXiv},
}