Paper Detail

Weblica: Scalable and Reproducible Training Environments for Visual Web Agents

Oğuzhan Fatih Kar, Roman Bachmann, Yuanzheng Gong, Anders Boesen Lindbo Larsen, Afshin Dehghan

arXiv · Score 17.5

Published 2026-05-07 · First seen 2026-05-13

Research Track B · General AI

Abstract

The web is complex, open-ended, and constantly changing, making it challenging to scale training data for visual web agents. Existing data collection attempts remain limited to offline trajectories for supervised fine-tuning or a handful of simulated environments for RL training, thus failing to capture web diversity. We propose Weblica (Web Replica), a framework for constructing reproducible and scalable web environments. Our framework leverages 1) HTTP-level caching to capture and replay stable visual states while preserving interactive behavior and 2) LLM-based environment synthesis grounded in real-world websites and core web navigation skills. Using this framework, we scale RL training to thousands of diverse environments and tasks. Our best model, Weblica-8B, outperforms open-weight baselines of similar size across multiple web navigation benchmarks while using fewer inference steps, scales favorably with additional test-time compute, and is competitive with API models.
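The first ingredient named in the abstract is HTTP-level caching that captures and replays page states. The minimal record/replay sketch below illustrates that idea. It is not the authors' implementation. `Response`, `request_key`, `ReplayCache`, and the `live_fetch` callback are hypothetical names, and keying a request on its method, URL, and body is an assumption. The sketch shows only the core mechanism: once responses are recorded, every later episode is served the same bytes for the same request.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Response:
    """A recorded HTTP response (hypothetical type for this sketch)."""
    status: int
    headers: Tuple[Tuple[str, str], ...]
    body: bytes


# Signature of the live fetcher, used only while recording (assumption).
LiveFetch = Callable[[str, str, bytes], Response]


def request_key(method: str, url: str, body: bytes = b"") -> str:
    """Hash the method, URL, and request body into a deterministic key.

    Assumption: these three fields identify a request. A real cache would
    likely also need to normalize volatile query parameters, cookies, etc.
    """
    h = hashlib.sha256()
    h.update(method.upper().encode("utf-8"))
    h.update(b"\0")  # separator keeps field boundaries unambiguous
    h.update(url.encode("utf-8"))
    h.update(b"\0")
    h.update(body)
    return h.hexdigest()


@dataclass
class ReplayCache:
    """Record/replay store for HTTP responses (hypothetical).

    In "record" mode, cache misses go to the live site and are stored.
    In "replay" mode, only stored responses are served, so every episode
    receives identical network responses.
    """
    mode: str = "record"
    store: Dict[str, Response] = field(default_factory=dict)

    def fetch(self, method: str, url: str, body: bytes = b"",
              live_fetch: Optional[LiveFetch] = None) -> Response:
        key = request_key(method, url, body)
        if key in self.store:
            return self.store[key]
        if self.mode == "replay":
            # Unrecorded request: fail closed rather than touch the live web.
            return Response(404, (("content-type", "text/plain"),), b"not recorded")
        if live_fetch is None:
            raise ValueError("record mode requires a live_fetch callable")
        response = live_fetch(method, url, body)
        self.store[key] = response
        return response
```

Failing closed on unrecorded requests is one design choice that keeps replay deterministic. Falling back to the live site instead would cover more interactions at the cost of reproducibility.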

Workflow Status

Review status: pending
Role: unreviewed
Read priority: now
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

BibTeX

@misc{kar2026weblica,
  title = {{Weblica}: Scalable and Reproducible Training Environments for Visual Web Agents},
  author = {Oğuzhan Fatih Kar and Roman Bachmann and Yuanzheng Gong and Anders Boesen Lindbo Larsen and Afshin Dehghan},
  year = {2026},
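  abstract = {The web is complex, open-ended, and constantly changing, making it challenging to scale training data for visual web agents. Existing data collection attempts remain limited to offline trajectories for supervised fine-tuning or a handful of simulated environments for RL training, thus failing to capture web diversity. We propose Weblica (Web Replica), a framework for constructing reproducible and scalable web environments. Our framework leverages 1) HTTP-level caching to capture and replay stable visual states while preserving interactive behavior and 2) LLM-based environment synthesis grounded in real-world websites and core web navigation skills. Using this framework, we scale RL training to thousands of diverse environments and tasks. Our best model, Weblica-8B, outperforms open-weight baselines of similar size across multiple web navigation benchmarks while using fewer inference steps, scales favorably with additional test-time compute, and is competitive with API models.},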
  url = {https://arxiv.org/abs/2605.06761},
  keywords = {cs.AI, cs.CV, cs.LG},
  eprint = {2605.06761},
  archiveprefix = {arXiv},
}