Paper Detail

SimGym: Traffic-Grounded Browser Agents for Offline A/B Testing in E-Commerce

Alberto Castelo, Zahra Zanjani Foumani, Ailin Fan, Keat Yang Koay, Vibhor Malik, Yuanzheng Zhu, Han Li, Meysam Feghhi, Ronie Uliana, Shuang Xie, Zhaoyu Zhang, Angelo Ocana Martins, Mingyu Zhao, Francis Pelland, Jonathan Faerman, Nikolas LeBlanc, Aaron Glazer, Andrew McNamara, Lingyun Wang, Zhong Wu

arXiv · Score 11.8

Published 2026-02-01 · First seen 2026-03-27

Research Track B · General AI

Abstract

A/B testing remains the gold standard for evaluating e-commerce UI changes, yet it diverts traffic, takes weeks to reach significance, and risks harming user experience. We introduce SimGym, a scalable system for rapid offline A/B testing using traffic-grounded synthetic buyers powered by Large Language Model agents operating in a live browser. SimGym extracts per-shop buyer profiles and intents from production interaction data, identifies distinct behavioral archetypes, and simulates cohort-weighted sessions across control and treatment storefronts. We validate SimGym against real human outcomes from real UI changes on a major e-commerce platform under confounder control. Even without alignment post-training, SimGym agents achieve state-of-the-art alignment with observed outcome shifts and reduce experiment cycles from weeks to under an hour, enabling rapid experimentation without exposure to real buyers.

Workflow Status

Review status: pending
Role: unreviewed
Read priority: soon
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

BibTeX

@article{castelo2026simgym,
  title = {SimGym: Traffic-Grounded Browser Agents for Offline A/B Testing in E-Commerce},
  author = {Alberto Castelo and Zahra Zanjani Foumani and Ailin Fan and Keat Yang Koay and Vibhor Malik and Yuanzheng Zhu and Han Li and Meysam Feghhi and Ronie Uliana and Shuang Xie and Zhaoyu Zhang and Angelo Ocana Martins and Mingyu Zhao and Francis Pelland and Jonathan Faerman and Nikolas LeBlanc and Aaron Glazer and Andrew McNamara and Lingyun Wang and Zhong Wu},
  year = {2026},
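  abstract = {A/B testing remains the gold standard for evaluating e-commerce UI changes, yet it diverts traffic, takes weeks to reach significance, and risks harming user experience. We introduce SimGym, a scalable system for rapid offline A/B testing using traffic-grounded synthetic buyers powered by Large Language Model agents operating in a live browser. SimGym extracts per-shop buyer profiles and intents from production interaction data, identifies distinct behavioral archetypes, and simulates cohort-weighted sessions across control and treatment storefronts. We validate SimGym against real human outcomes from real UI changes on a major e-commerce platform under confounder control. Even without alignment post-training, SimGym agents achieve state-of-the-art alignment with observed outcome shifts and reduce experiment cycles from weeks to under an hour, enabling rapid experimentation without exposure to real buyers.},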
  url = {https://arxiv.org/abs/2602.01443},
  keywords = {cs.AI},
  eprint = {2602.01443},
  archiveprefix = {arXiv},
}
