LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing

Abstract

Streaming video editing has made rapid progress, yet practical deployment is still limited by two core issues: keeping backgrounds and non-edited regions stable over time, and achieving the low latency that real-time interactive scenarios require. Meanwhile, recent streaming video generation methods are mostly developed for synthesis and cannot be applied directly to editing, because editing demands strict content preservation and region-specific control. In this work, we present LiveEdit, a streaming video editing framework that performs causal, frame-by-frame editing with strong content preservation and real-time responsiveness. Our key design is a three-stage distillation pipeline that progressively transfers editing capability from a powerful bidirectional foundation model to an efficient unidirectional streaming editor, enabling stable long-horizon edits without sacrificing visual fidelity. To further support real-time deployment, we introduce an AR-oriented mask cache that reuses region-related computation across frames, substantially reducing redundant processing and accelerating inference. Finally, we establish a dedicated benchmark for streaming video editing. Extensive evaluations show that our method achieves state-of-the-art visual quality among streaming baselines while raising inference speed to 12.66 FPS, making it suitable for interactive and augmented reality applications.
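
For a sense of scale, the reported throughput can be converted into an average per-frame time budget. The sketch below is only that arithmetic; the helper name `frame_budget_ms` is hypothetical and does not come from the paper, and it assumes frames are processed one at a time with no pipelining or batching, which the abstract does not specify.

```python
def frame_budget_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds, at a given throughput."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# 12.66 FPS is the figure stated in the abstract; everything else is illustrative.
reported_ms = frame_budget_ms(12.66)
print(f"{reported_ms:.1f} ms per frame")  # prints "79.0 ms per frame"
```

Under that assumption, each frame has roughly 79 ms for the whole editing step, compared with about 33 ms per frame at a 30 FPS playback rate.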

BibTeX

@misc{wang2026liveedit,
  title = {LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing},
  author = {Xinyu Wang and Chongbo Zhao and Fangneng Zhan and Yue Ma},
  year = {2026},
  abstract = {Streaming video editing has made rapid progress, yet practical deployment is still limited by two core issues: maintaining stable backgrounds and non-edited regions over time, and achieving the low latency required for real-time interactive scenarios. Meanwhile, recent streaming video generation methods are mostly developed for synthesis and cannot be directly applied to editing due to the strict preservation requirement and region-specific control. In this work, we present a novel streaming vid},
  url = {https://huggingface.co/papers/2606.26740},
  keywords = {streaming video editing, causal editing, frame-by-frame editing, content preservation, real-time responsiveness, three-stage distillation pipeline, bidirectional foundation model, unidirectional streaming editor, long-horizon edits, AR-oriented mask cache, inference speed, interactive applications, augmented reality, code available, huggingface daily},
  eprint = {2606.26740},
  archiveprefix = {arXiv},
}