Paper Detail

InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?

Qiyao Wang, Haoran Hu, Longze Chen, Hongbo Wang, Hamid Alinejad-Rokny, Yuan Lin, Min Yang

Browse

Workflow Queues

huggingface Score 15.5

Published 2026-04-30 · First seen 2026-05-01

General AI

Open paper source

Abstract

With the advancement of multimodal large language models (MLLMs) and coding agents, the website development has shifted from manual programming to agent-based project-level code synthesis. Existing benchmarks rely on idealized assumptions, especially for well-structured, information-rich inputs and static execution settings. In contrast, real-world development is constrained by a critical bottleneck: the semantic misalignment between ambiguous, low-quality instructions from non-expert users and model understanding, which results in a failure mode that we term blind execution. To address this gap, we introduce InteractWeb-Bench, the first multimodal interactive benchmark for website generation under non-expert low-code user conditions. InteractWeb-Bench introduces four types of user agents and persona-driven instruction perturbations to systematically simulate diverse user behaviors, including ambiguity, redundancy, and contradiction, grounded in requirement engineering defect taxonomies. We develop an interactive execution environment for agents, featuring a unified action space comprising Clarify, Implement, Verify, and Submit, enabling iterative intent refinement, code synthesis, and visual feedback-based validation. Extensive experiments and analysis reveal that frontier MLLM-based agents remain trapped in blind execution, exposing limitations in intent recognition and adaptive interaction.

Workflow Status

Review status: pending
Role: unreviewed
Read priority: now
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

BibTeX

@misc{wang2026interactweb,
  title = {InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?},
  author = {Qiyao Wang and Haoran Hu and Longze Chen and Hongbo Wang and Hamid Alinejad-Rokny and Yuan Lin and Min Yang},
  year = {2026},
  abstract = {With the advancement of multimodal large language models (MLLMs) and coding agents, the website development has shifted from manual programming to agent-based project-level code synthesis. Existing benchmarks rely on idealized assumptions, especially for well-structured, information-rich inputs and static execution settings. In contrast, real-world development is constrained by a critical bottleneck: the semantic misalignment between ambiguous, low-quality instructions from non-expert users and },
  url = {https://huggingface.co/papers/2604.27419},
  keywords = {multimodal large language models, coding agents, website generation, interactive benchmark, user agents, persona-driven instruction perturbations, requirement engineering defect taxonomies, interactive execution environment, unified action space, blind execution, intent recognition, adaptive interaction, code available, huggingface daily},
  eprint = {2604.27419},
  archiveprefix = {arXiv},
}

Metadata

{}