Paper Detail
Xinyu Wang, Chongbo Zhao, Fangneng Zhan, Yue Ma
Streaming video editing has made rapid progress, yet practical deployment is still limited by two core issues: maintaining stable backgrounds and non-edited regions over time, and achieving the low latency required for real-time interactive scenarios. Meanwhile, recent streaming video generation methods are mostly developed for synthesis and cannot be directly applied to editing due to the strict preservation requirement and region-specific control. In this work, we present a novel streaming video editing framework that performs causal, frame-by-frame editing with strong content preservation and real-time responsiveness. Our key design is a three-stage distillation pipeline that progressively transfers editing capability from a powerful bidirectional foundation model to an efficient unidirectional streaming editor, enabling stable long-horizon edits without sacrificing visual fidelity. To further support real-time deployment, we introduce an AR-oriented mask cache that reuses region-related computation across frames, substantially reducing redundant processing and accelerating inference. Finally, we establish a dedicated benchmark for streaming video editing. Extensive evaluations demonstrate that our method achieves state-of-the-art visual quality among streaming baselines while drastically boosting inference speed to 12.66 FPS, making it suitable for interactive and augmented reality applications.
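The abstract does not describe how the mask cache is implemented, so the following is only a toy sketch of the general idea it names: edit frames causally, one at a time, pass non-edited regions through untouched, and reuse earlier results for edited regions whose input has not changed. The class `MaskedTileCache`, the integer "tiles", the `edit_fn` callback, and `frame_budget_ms` are all hypothetical names introduced for illustration; none of this should be read as the authors' method.

```python
from typing import Callable, Dict, List, Tuple

Frame = List[int]  # toy frame: one integer stands in for each tile
Mask = List[bool]  # True marks a tile inside the edited region


class MaskedTileCache:
    """Hypothetical per-tile cache that reuses edit results across frames."""

    def __init__(self, edit_fn: Callable[[int], int]) -> None:
        self.edit_fn = edit_fn
        self.cache: Dict[Tuple[int, int], int] = {}
        self.computed = 0  # number of times edit_fn actually ran

    def edit_frame(self, frame: Frame, mask: Mask) -> Frame:
        out: Frame = []
        for idx, (tile, edited) in enumerate(zip(frame, mask)):
            if not edited:
                # Outside the mask: keep the original tile unchanged.
                out.append(tile)
                continue
            key = (idx, tile)
            if key not in self.cache:
                # First time this tile content appears at this position.
                self.cache[key] = self.edit_fn(tile)
                self.computed += 1
            out.append(self.cache[key])
        return out


def frame_budget_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds, at a given rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    editor = MaskedTileCache(edit_fn=lambda t: t * 10)  # stand-in for a costly edit
    mask = [False, True, True, False]
    stream = [[1, 2, 3, 4], [1, 2, 5, 4], [9, 2, 5, 4]]
    for frame in stream:  # causal: frames are processed strictly in order
        print(editor.edit_frame(frame, mask))
    print("edit_fn calls:", editor.computed)
    print("budget at 12.66 FPS (ms):", round(frame_budget_ms(12.66), 1))
```

In this toy run, six masked tiles are requested across the three frames but `edit_fn` runs only three times, and unmasked tiles are never modified. The last line converts the reported 12.66 FPS into an average per-frame budget of roughly 79 ms.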
@misc{wang2026liveedit,
title = {LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing},
author = {Xinyu Wang and Chongbo Zhao and Fangneng Zhan and Yue Ma},
year = {2026},
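abstract = {Streaming video editing has made rapid progress, yet practical deployment is still limited by two core issues: maintaining stable backgrounds and non-edited regions over time, and achieving the low latency required for real-time interactive scenarios. Meanwhile, recent streaming video generation methods are mostly developed for synthesis and cannot be directly applied to editing due to the strict preservation requirement and region-specific control. In this work, we present a novel streaming video editing framework that performs causal, frame-by-frame editing with strong content preservation and real-time responsiveness. Our key design is a three-stage distillation pipeline that progressively transfers editing capability from a powerful bidirectional foundation model to an efficient unidirectional streaming editor, enabling stable long-horizon edits without sacrificing visual fidelity. To further support real-time deployment, we introduce an AR-oriented mask cache that reuses region-related computation across frames, substantially reducing redundant processing and accelerating inference. Finally, we establish a dedicated benchmark for streaming video editing. Extensive evaluations demonstrate that our method achieves state-of-the-art visual quality among streaming baselines while drastically boosting inference speed to 12.66 FPS, making it suitable for interactive and augmented reality applications.},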
url = {https://huggingface.co/papers/2606.26740},
keywords = {streaming video editing, causal editing, frame-by-frame editing, content preservation, real-time responsiveness, three-stage distillation pipeline, bidirectional foundation model, unidirectional streaming editor, long-horizon edits, AR-oriented mask cache, inference speed, interactive applications, augmented reality, code available, huggingface daily},
eprint = {2606.26740},
archiveprefix = {arXiv},
}