Paper Detail

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

Yan Li, Zezi Zeng, Yifan Yang, Yuqing Yang, Ning Liao, Weiwei Guo, Lili Qiu, Mingxi Cheng, Qi Dai, Zhendong Wang, Zhengyuan Yang, Xue Yang, Ji Li, Lijuan Wang, Chong Luo

arxiv Score 19.3

Published 2026-04-16 · First seen 2026-04-17

Research Track B · General AI

Abstract

The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often leads to style inconsistency and poor global coherence, as elements are generated in isolation. We propose MM-WebAgent, a hierarchical agentic framework for multimodal webpage generation that coordinates AIGC-based element generation through hierarchical planning and iterative self-reflection. MM-WebAgent jointly optimizes global layout, local multimodal content, and their integration, producing coherent and visually consistent webpages. We further introduce a benchmark for multimodal webpage generation and a multi-level evaluation protocol for systematic assessment. Experiments demonstrate that MM-WebAgent outperforms code-generation and agent-based baselines, especially on multimodal element generation and integration. Code & Data: https://aka.ms/mm-webagent.

Workflow Status

Review status
pending
Role
unreviewed
Read priority
now
Vote
Not set.
Saved
no
Collections
Not filed yet.
Next action
Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

Tags

No tags.

BibTeX

@article{li2026mm,
  title = {MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation},
  author = {Yan Li and Zezi Zeng and Yifan Yang and Yuqing Yang and Ning Liao and Weiwei Guo and Lili Qiu and Mingxi Cheng and Qi Dai and Zhendong Wang and Zhengyuan Yang and Xue Yang and Ji Li and Lijuan Wang and Chong Luo},
  year = {2026},
  abstract = {The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often leads to style inconsistency and poor global coherence, as elements are generated in isolation. We propose MM-WebAgent, a hierarchical agentic framework for multimodal webpage generat},
  url = {https://arxiv.org/abs/2604.15309},
  keywords = {cs.CV, cs.AI, cs.CL, Web page, Computer science, Benchmark (surveying), World Wide Web, Web modeling},
  eprint = {2604.15309},
  archiveprefix = {arXiv},
}

Metadata

{}