Paper Detail

SimGym: A Framework for A/B Test Simulation in E-Commerce with Traffic-Grounded VLM Agents

Han Li, Vibhor Malik, Zahra Zanjani Foumani, Alberto Castelo, Shuang Xie, Ailin Fan, Keat Yang Koay, Yuanzheng Zhu, Meysam Feghhi, Ronie Uliana, Zhaoyu Zhang, Angelo Ocana Martins, Mingyu Zhao, Francis Pelland, Jonathan Faerman, Nikolas LeBlanc, Aaron Glazer, Andrew McNamara, Zhong Wu, Lingyun Wang

arXiv · Score 15.8

Published 2026-05-19 · First seen 2026-05-25

Research Track B · General AI

Abstract

A/B testing remains the gold standard for evaluating modifications to e-commerce storefronts, yet it diverts traffic, requires weeks to reach statistical significance, and risks degrading user experience. We present SimGym, a framework for simulating A/B tests on e-commerce storefronts using vision-language model (VLM) agents operating in a live browser. The framework comprises three key components: (a) a traffic-grounded persona generation pipeline that derives per-shop buyer archetypes and intents from production clickstream data; (b) a live-browser agent architecture that combines multimodal perception over visual and browser-structured observations with episodic memory and guardrails to conduct coherent shopping sessions across control and treatment storefronts; and (c) an evaluation protocol that compares simulated outcome shifts with observed shifts in real buyer behavior. We validate SimGym on A/B tests of visually driven UI theme changes from a major e-commerce platform across diverse storefronts and product categories. Empirical results show that SimGym agents achieve strong agreement with observed outcome shifts, attaining 77% directional alignment with add-to-cart shifts observed across interface variants in real-buyer traffic. It reduces experimental cycles from weeks to under an hour, enabling rapid experimentation without exposing real buyers to candidate variants.
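The abstract reports directional alignment between simulated and observed add-to-cart shifts but does not define the metric. A minimal sketch, assuming it is the fraction of A/B tests in which the simulated shift (treatment minus control) has the same sign as the observed shift; the function name, the handling of zero shifts, and the sample numbers are all hypothetical and not taken from the paper:

```python
from typing import Sequence


def _sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def directional_alignment(
    simulated: Sequence[float], observed: Sequence[float]
) -> float:
    """Fraction of tests where simulated and observed shifts share a sign.

    Assumption: a zero shift counts as its own direction, so it only
    matches another zero shift.
    """
    if len(simulated) != len(observed):
        raise ValueError("simulated and observed must have the same length")
    if len(simulated) == 0:
        raise ValueError("at least one test is required")
    matches = sum(1 for s, o in zip(simulated, observed) if _sign(s) == _sign(o))
    return matches / len(simulated)


# Hypothetical add-to-cart rate shifts for four A/B tests.
simulated_shifts = [0.02, -0.01, 0.03, -0.04]
observed_shifts = [0.01, -0.02, -0.01, -0.03]
print(directional_alignment(simulated_shifts, observed_shifts))  # 0.75
```

Under this reading, a score of 0.5 is what random sign guesses would produce on balanced data, which gives the reported 77% a baseline to compare against.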

Workflow Status

Review status: pending
Role: unreviewed
Read priority: now
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

BibTeX

@article{li2026simgym,
  title = {SimGym: A Framework for A/B Test Simulation in E-Commerce with Traffic-Grounded VLM Agents},
  author = {Han Li and Vibhor Malik and Zahra Zanjani Foumani and Alberto Castelo and Shuang Xie and Ailin Fan and Keat Yang Koay and Yuanzheng Zhu and Meysam Feghhi and Ronie Uliana and Zhaoyu Zhang and Angelo Ocana Martins and Mingyu Zhao and Francis Pelland and Jonathan Faerman and Nikolas LeBlanc and Aaron Glazer and Andrew McNamara and Zhong Wu and Lingyun Wang},
  year = {2026},
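  abstract = {A/B testing remains the gold standard for evaluating modifications to e-commerce storefronts, yet it diverts traffic, requires weeks to reach statistical significance, and risks degrading user experience. We present SimGym, a framework for simulating A/B tests on e-commerce storefronts using vision-language model (VLM) agents operating in a live browser. The framework comprises three key components: (a) a traffic-grounded persona generation pipeline that derives per-shop buyer archetypes and intents from production clickstream data; (b) a live-browser agent architecture that combines multimodal perception over visual and browser-structured observations with episodic memory and guardrails to conduct coherent shopping sessions across control and treatment storefronts; and (c) an evaluation protocol that compares simulated outcome shifts with observed shifts in real buyer behavior. We validate SimGym on A/B tests of visually driven UI theme changes from a major e-commerce platform across diverse storefronts and product categories. Empirical results show that SimGym agents achieve strong agreement with observed outcome shifts, attaining 77\% directional alignment with add-to-cart shifts observed across interface variants in real-buyer traffic. It reduces experimental cycles from weeks to under an hour, enabling rapid experimentation without exposing real buyers to candidate variants.},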
  url = {https://arxiv.org/abs/2605.19219},
  keywords = {cs.AI},
  eprint = {2605.19219},
  archiveprefix = {arXiv},
}