Weblica: Scalable and Reproducible Training Environments for Visual Web Agents

Oğuzhan Fatih Kar, Roman Bachmann, Yuanzheng Gong, Anders Boesen Lindbo Larsen, Afshin Dehghan

arxiv Score 15.5

Published 2026-05-07 · First seen 2026-05-13

Research Track B · General AI

Abstract

The web is complex, open-ended, and constantly changing, making it challenging to scale training data for visual web agents. Existing data collection attempts remain limited to offline trajectories for supervised fine-tuning or a handful of simulated environments for RL training, thus failing to capture web diversity. We propose Weblica (Web Replica), a framework for constructing reproducible and scalable web environments. Our framework leverages 1) HTTP-level caching to capture and replay stable visual states while preserving interactive behavior and 2) LLM-based environment synthesis grounded in real-world websites and core web navigation skills. Using this framework, we scale RL training to thousands of diverse environments and tasks. Our best model, Weblica-8B, outperforms open-weight baselines of similar size across multiple web navigation benchmarks while using fewer inference steps, scales favorably with additional test-time compute, and is competitive with API models.
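The abstract names HTTP-level caching for capture and replay but gives no implementation detail. The following is a minimal sketch of the general record-and-replay idea, not Weblica's actual code: `Response`, `request_key`, `ReplayCache`, and `fake_fetch` are hypothetical names, the in-memory dictionary stands in for whatever storage a real system would use, and returning status 504 on a cache miss in replay-only mode is an assumption made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Response:
    """Hypothetical minimal HTTP response: status code and raw body."""
    status: int
    body: bytes


def request_key(method: str, url: str, body: bytes = b"") -> str:
    """Hash the request fields that identify it for replay.

    Each part is length-prefixed so that different splits of the same
    bytes cannot produce the same key.
    """
    digest = hashlib.sha256()
    for part in (method.upper().encode(), url.encode(), body):
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


@dataclass
class ReplayCache:
    """Record responses on first sight, then serve them from the store."""
    fetch: Callable[[str, str, bytes], Response]
    replay_only: bool = False
    store: Dict[str, Response] = field(default_factory=dict)

    def request(self, method: str, url: str, body: bytes = b"") -> Response:
        key = request_key(method, url, body)
        if key in self.store:
            # Replay: identical requests always see the recorded response.
            return self.store[key]
        if self.replay_only:
            # Assumption: signal a miss instead of touching the live web.
            return Response(504, b"not in cache")
        response = self.fetch(method, url, body)
        self.store[key] = response
        return response


# Stand-in for a live fetch; its body changes on every call, which makes
# it easy to see whether a response came from the cache.
calls: List[str] = []


def fake_fetch(method: str, url: str, body: bytes) -> Response:
    calls.append(url)
    return Response(200, f"page {len(calls)}".encode())
```

With `ReplayCache(fetch=fake_fetch)`, repeating a request returns the first recorded body even though `fake_fetch` would produce a different one, which is the property that makes an episode reproducible. Setting `replay_only = True` after recording keeps later runs from reaching the live site.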

BibTeX

@article{kar2026weblica,
  title = {Weblica: Scalable and Reproducible Training Environments for Visual Web Agents},
  author = {Oğuzhan Fatih Kar and Roman Bachmann and Yuanzheng Gong and Anders Boesen Lindbo Larsen and Afshin Dehghan},
  year = {2026},
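  abstract = {The web is complex, open-ended, and constantly changing, making it challenging to scale training data for visual web agents. Existing data collection attempts remain limited to offline trajectories for supervised fine-tuning or a handful of simulated environments for RL training, thus failing to capture web diversity. We propose Weblica (Web Replica), a framework for constructing reproducible and scalable web environments. Our framework leverages 1) HTTP-level caching to capture and replay stable visual states while preserving interactive behavior and 2) LLM-based environment synthesis grounded in real-world websites and core web navigation skills. Using this framework, we scale RL training to thousands of diverse environments and tasks. Our best model, Weblica-8B, outperforms open-weight baselines of similar size across multiple web navigation benchmarks while using fewer inference steps, scales favorably with additional test-time compute, and is competitive with API models.},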
  url = {https://arxiv.org/abs/2605.06761},
  keywords = {cs.AI, cs.CV, cs.LG},
  eprint = {2605.06761},
  archiveprefix = {arXiv},
}
