RewardFlow: Generate Images by Optimizing What You Reward

Onkar Susladkar, Dong-Hwan Jang, Tushar Prakash, Adheesh Juvekar, Vedant Shah, Ayush Barik, Nabeel Bashir, Muntasir Wahed, Ritish Shrirao, Ismini Lourentzou

arxiv Score 11.8

Published 2026-04-09 · First seen 2026-04-10

General AI

Abstract

We introduce RewardFlow, an inversion-free framework that steers pretrained diffusion and flow-matching models at inference time through multi-reward Langevin dynamics. RewardFlow unifies complementary differentiable rewards for semantic alignment, perceptual fidelity, localized grounding, object consistency, and human preference, and further introduces a differentiable VQA-based reward that provides fine-grained semantic supervision through language-vision reasoning. To coordinate these heterogeneous objectives, we design a prompt-aware adaptive policy that extracts semantic primitives from the instruction, infers edit intent, and dynamically modulates reward weights and step sizes throughout sampling. Across several image editing and compositional generation benchmarks, RewardFlow delivers state-of-the-art edit fidelity and compositional alignment.
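The abstract's "multi-reward Langevin dynamics" can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the update takes the generic form x ← x + η ∇(Σᵢ wᵢ rᵢ(x)) + √(2η) ε, uses two toy quadratic rewards in place of the semantic, perceptual, and VQA-based rewards, and keeps the weights and step size fixed rather than modulating them with a prompt-aware policy. The names `langevin_steer`, `reward_grads`, `weights`, and `noise_scale` are hypothetical.

```python
import numpy as np

def langevin_steer(x0, reward_grads, weights, step_size=0.05,
                   n_steps=200, noise_scale=1.0, seed=0):
    """Toy Langevin ascent on a weighted sum of differentiable rewards.

    reward_grads: list of callables, each returning the gradient of one reward at x.
    weights:      one scalar weight per reward (fixed here; an adaptive policy
                  would change them during sampling).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        # Gradient of the combined reward sum_i w_i * r_i(x).
        grad = sum(w * g(x) for w, g in zip(weights, reward_grads))
        # Gaussian noise term of the Langevin update.
        noise = rng.standard_normal(x.shape)
        x = x + step_size * grad + noise_scale * np.sqrt(2.0 * step_size) * noise
    return x

# Two toy rewards r_i(x) = -0.5 * ||x - target_i||^2, with gradient (target_i - x).
target_a = np.array([1.0, 0.0])
target_b = np.array([0.0, 1.0])
reward_grads = [lambda x: target_a - x, lambda x: target_b - x]
weights = [0.75, 0.25]

# With the noise switched off, the iterate settles at the maximizer of the
# weighted reward, i.e. the weighted mean of the two targets.
x_final = langevin_steer(np.zeros(2), reward_grads, weights, noise_scale=0.0)
expected = weights[0] * target_a + weights[1] * target_b
```

With `noise_scale=0.0` the loop reduces to plain gradient ascent, which makes the effect of the reward weights easy to check. With the default `noise_scale=1.0` the iterates fluctuate around the same point instead of converging to it.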

BibTeX

@article{susladkar2026rewardflow,
  title = {RewardFlow: Generate Images by Optimizing What You Reward},
  author = {Onkar Susladkar and Dong-Hwan Jang and Tushar Prakash and Adheesh Juvekar and Vedant Shah and Ayush Barik and Nabeel Bashir and Muntasir Wahed and Ritish Shrirao and Ismini Lourentzou},
  year = {2026},
  abstract = {We introduce RewardFlow, an inversion-free framework that steers pretrained diffusion and flow-matching models at inference time through multi-reward Langevin dynamics. RewardFlow unifies complementary differentiable rewards for semantic alignment, perceptual fidelity, localized grounding, object consistency, and human preference, and further introduces a differentiable VQA-based reward that provides fine-grained semantic supervision through language-vision reasoning. To coordinate these heterog},
  url = {https://arxiv.org/abs/2604.08536},
  keywords = {cs.CV, cs.AI, Computer science, Inference, Artificial intelligence, Differentiable function, Object (grammar)},
  eprint = {2604.08536},
  archiveprefix = {arXiv},
}