Paper Detail

SimGym: A Framework for A/B Test Simulation in E-Commerce with Traffic-Grounded VLM Agents

Han Li, Vibhor Malik, Zahra Zanjani Foumani, Alberto Castelo, Shuang Xie, Ailin Fan, Keat Yang Koay, Yuanzheng Zhu, Meysam Feghhi, Ronie Uliana, Zhaoyu Zhang, Angelo Ocana Martins, Mingyu Zhao, Francis Pelland, Jonathan Faerman, Nikolas LeBlanc, Aaron Glazer, Andrew McNamara, Zhong Wu, Lingyun Wang

arXiv score: 13.3

Published 2026-05-19 · First seen 2026-05-25

Research Track B · General AI

Abstract

A/B testing remains the gold standard for evaluating modifications to e-commerce storefronts, yet it diverts traffic, requires weeks to reach statistical significance, and risks degrading user experience. We present SimGym, a framework for simulating A/B tests on e-commerce storefronts using vision-language model (VLM) agents operating in a live browser. The framework comprises three key components: (a) a traffic-grounded persona generation pipeline that derives per-shop buyer archetypes and intents from production clickstream data; (b) a live-browser agent architecture that combines multimodal perception over visual and browser-structured observations with episodic memory and guardrails to conduct coherent shopping sessions across control and treatment storefronts; and (c) an evaluation protocol that compares simulated outcome shifts with observed shifts in real buyer behavior. We validate SimGym on A/B tests of visually driven UI theme changes from a major e-commerce platform across diverse storefronts and product categories. Empirical results show that SimGym agents achieve strong agreement with observed outcome shifts, attaining 77% directional alignment with the add-to-cart shifts observed across interface variants in real-buyer traffic. SimGym reduces experimental cycles from weeks to under an hour, enabling rapid experimentation without exposing real buyers to candidate variants.
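To make the headline metric concrete, here is a minimal sketch of one plausible way to compute directional alignment. It assumes the metric is the fraction of A/B tests in which the simulated treatment-minus-control add-to-cart shift has the same sign as the shift observed in real-buyer traffic. The paper's exact definition (for instance, how near-zero or non-significant shifts are handled) may differ. The `ABTestOutcome` schema, the `tol` parameter, and the example numbers are hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ABTestOutcome:
    """Add-to-cart rates for one A/B test (hypothetical schema, not from the paper)."""
    test_id: str
    sim_control: float     # simulated add-to-cart rate on the control storefront
    sim_treatment: float   # simulated add-to-cart rate on the treatment storefront
    real_control: float    # observed add-to-cart rate, control arm (real buyers)
    real_treatment: float  # observed add-to-cart rate, treatment arm (real buyers)


def sign(x: float, tol: float = 0.0) -> int:
    """Return +1 or -1 for a shift beyond tol, and 0 for a shift within tol."""
    if x > tol:
        return 1
    if x < -tol:
        return -1
    return 0


def directional_alignment(tests: list[ABTestOutcome], tol: float = 0.0) -> float:
    """Fraction of tests whose simulated shift has the same sign as the observed shift.

    With tol > 0, small shifts are treated as "no change" (sign 0); a test where
    both shifts fall within tol therefore counts as a match under this sketch.
    """
    if not tests:
        raise ValueError("need at least one test")
    matches = 0
    for t in tests:
        sim_shift = t.sim_treatment - t.sim_control
        real_shift = t.real_treatment - t.real_control
        if sign(sim_shift, tol) == sign(real_shift, tol):
            matches += 1
    return matches / len(tests)


# Illustrative, made-up numbers: three of the four simulated shifts match the observed direction.
tests = [
    ABTestOutcome("t1", 0.10, 0.12, 0.050, 0.055),  # sim up,   real up   -> match
    ABTestOutcome("t2", 0.10, 0.09, 0.050, 0.048),  # sim down, real down -> match
    ABTestOutcome("t3", 0.10, 0.11, 0.050, 0.047),  # sim up,   real down -> mismatch
    ABTestOutcome("t4", 0.08, 0.07, 0.040, 0.039),  # sim down, real down -> match
]
print(directional_alignment(tests))  # 0.75
```

Because this measures agreement in direction only, it says nothing about whether the simulated shift has the right magnitude; an evaluation protocol like the one described above could pair it with a magnitude-sensitive comparison.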

Workflow Status

Review status: pending
Role: unreviewed
Read priority: now
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

BibTeX

@article{li2026simgym,
  title = {SimGym: A Framework for A/B Test Simulation in E-Commerce with Traffic-Grounded VLM Agents},
  author = {Han Li and Vibhor Malik and Zahra Zanjani Foumani and Alberto Castelo and Shuang Xie and Ailin Fan and Keat Yang Koay and Yuanzheng Zhu and Meysam Feghhi and Ronie Uliana and Zhaoyu Zhang and Angelo Ocana Martins and Mingyu Zhao and Francis Pelland and Jonathan Faerman and Nikolas LeBlanc and Aaron Glazer and Andrew McNamara and Zhong Wu and Lingyun Wang},
  year = {2026},
  abstract = {A/B testing remains the gold standard for evaluating modifications to e-commerce storefronts, yet it diverts traffic, requires weeks to reach statistical significance, and risks degrading user experience. We present SimGym, a framework for simulating A/B tests on e-commerce storefronts using vision-language model (VLM) agents operating in a live browser. The framework comprises three key components: (a) a traffic-grounded persona generation pipeline that derives per-shop buyer archetypes and int},
  url = {https://arxiv.org/abs/2605.19219},
  keywords = {cs.AI},
  eprint = {2605.19219},
  archiveprefix = {arXiv},
}