GEditBench v2: A Human-Aligned Benchmark for General Image Editing

Zhangqi Jiang, Zheng Sun, Xianfang Zeng, Yufeng Yang, Xuanyang Zhang, Yongliang Wu, Wei Cheng, Gang Yu, Xu Yang, Bihan Wen

huggingface Score 5.5

Published 2026-03-30 · First seen 2026-03-31

General AI

Abstract

Recent advances in image editing have enabled models to handle complex instructions with impressive realism. However, existing evaluation frameworks lag behind: current benchmarks suffer from narrow task coverage, while standard metrics fail to adequately capture visual consistency, i.e., the preservation of identity, structure and semantic coherence between edited and original images. To address these limitations, we introduce GEditBench v2, a comprehensive benchmark with 1,200 real-world user queries spanning 23 tasks, including a dedicated open-set category for unconstrained, out-of-distribution editing instructions beyond predefined tasks. Furthermore, we propose PVC-Judge, an open-source pairwise assessment model for visual consistency, trained via two novel region-decoupled preference data synthesis pipelines. Besides, we construct VCReward-Bench using expert-annotated preference pairs to assess the alignment of PVC-Judge with human judgments on visual consistency evaluation. Experiments show that our PVC-Judge achieves state-of-the-art evaluation performance among open-source models and even surpasses GPT-5.1 on average. Finally, by benchmarking 16 frontier editing models, we show that GEditBench v2 enables more human-aligned evaluation, revealing critical limitations of current models, and providing a reliable foundation for advancing precise image editing.

BibTeX

@misc{jiang2026geditbench,
  title = {GEditBench v2: A Human-Aligned Benchmark for General Image Editing},
  author = {Zhangqi Jiang and Zheng Sun and Xianfang Zeng and Yufeng Yang and Xuanyang Zhang and Yongliang Wu and Wei Cheng and Gang Yu and Xu Yang and Bihan Wen},
  year = {2026},
  abstract = {Recent advances in image editing have enabled models to handle complex instructions with impressive realism. However, existing evaluation frameworks lag behind: current benchmarks suffer from narrow task coverage, while standard metrics fail to adequately capture visual consistency, i.e., the preservation of identity, structure and semantic coherence between edited and original images. To address these limitations, we introduce GEditBench v2, a comprehensive benchmark with 1,200 real-world user },
  url = {https://huggingface.co/papers/2603.28547},
  keywords = {GEditBench v2, PVC-Judge, visual consistency, region-decoupled preference data synthesis, VCReward-Bench, image editing, human alignment, evaluation performance, huggingface daily},
  eprint = {2603.28547},
  archiveprefix = {arXiv},
}
