Research Paper Cockpit

Daily Digest - 2026-05-25

Papers first seen in this daily snapshot.

Papers

78 visible entries

arxiv Score 36.4

Lifelong Learning in Vision-Language Models: Enhanced EWC with Cross-Modal Knowledge Retention

2026-05-12 · Hamza Ahmed Durrani, Rafay Suleman Durrani

Research Track A · General AI

Large language-vision models (LVLMs) such as CLIP, Flamingo, and BLIP have revolutionized AI by enabling understanding across textual and visual modalities. These models excel at tasks like image captioning, visual question answering, and cross-modal retrieval. However, they face catastrophic forgetting when learning n…

Review: pending · Role: unreviewed · Read: now

arxiv Score 29.0

Weasel: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection

2026-05-19 · Fatemeh Pesaran zadeh, Seyeon Choi, Xing Han Lù, Siva Reddy, Gunhee Kim

Research Track B · General AI

Large language models (LLMs) have enabled web agents that follow natural language goals through multi-step browser interactions. However, agents fine-tuned on specific trajectories and domains often struggle to generalize out of domain, and offline training can be compute-inefficient due to noisy, redundant trajectories…

Review: pending · Role: unreviewed · Read: now

arxiv Score 27.8

Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

2026-05-21 · Jinyang Wu, Guocheng Zhai, Ruihan Jin, Yuhao Shen, Zhengxi Lu, Fan Zhang, Haoran Luo, Zheng Lian, Zhengqi Wen, Jianhua Tao

General AI

The proliferation of large language models (LLMs) and modular skills has endowed autonomous agents with increasingly powerful capabilities. Existing frameworks typically rely on monolithic LLMs and fixed logic to interface with these skills. This gives rise to a critical bottleneck: different LLMs offer distinct advant…

Review: pending · Role: unreviewed · Read: now

huggingface Score 24.0

VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis

2026-05-21 · Jinho Park, Youbin Kim, Hogun Park, Eunbyung Park

General AI

Spatio-temporal reasoning is a core capability for Multimodal Large Language Models (MLLMs) operating in the real world. As such, evaluating it precisely has become an essential challenge. However, existing spatio-temporal reasoning benchmark datasets primarily rely on static image sets or passively curated video data,…

Review: pending · Role: unreviewed · Read: now

arxiv Score 23.5

SeqLoRA: Bilevel Orthogonal Adaptation for Continual Multi-Concept Generation

2026-05-21 · Javad Parsa, Enis Simsar, Amir Joudaki, Thomas Hofmann, André M. H. Teixeira

Research Track A · General AI

Parameter-efficient fine-tuning enables fast personalization of text-to-image diffusion models, but composing multiple custom concepts remains challenging due to representation interference. Existing modular methods either rely on expensive post-hoc fusion or freeze adaptation subspaces, which limit expressiveness and …

Review: pending · Role: unreviewed · Read: now

arxiv Score 21.9

Web Agents Should Adopt the Plan-Then-Execute Paradigm

2026-05-14 · Julien Piet, Annabella Chow, Yiwei Hou, Muxi Lyu, Sylvie Venuto, Jinhao Zhu, Raluca Ada Popa, David Wagner

Research Track B · General AI

ReAct has become the default architecture across LLM agents, and many existing web agents follow this paradigm. We argue that it is the wrong default for web agents. Instead, web agents should default to plan-then-execute: commit to a task-specific program before observing runtime web content, then execute it. The reas…

Review: pending · Role: unreviewed · Read: now

arxiv Score 21.8

MemGym: a Long-Horizon Memory Environment for LLM Agents

2026-05-20 · Wujiang Xu, Yu Wang, Kai Mei, Kaiqu Liang, Zhenting Wang, Mingyu Jin, Han Zhang, Shi-Xiong Zhang, Wenyue Hua, Sambit Sahu, Dimitris N. Metaxas

Research Track A · Research Track B · General AI

Memory is a central capability for LLM agents operating across long-horizon tasks. Existing memory benchmarks predominantly evaluate retention of personalized information in multi-turn chat scenarios, overlooking the dynamic memory formation that occurs during extended agent execution. Consequently, the memory systems …

Review: pending · Role: unreviewed · Read: now

huggingface Score 21.8

StepAudio 2.5 Technical Report

2026-05-22 · Bin Lin, Bo Zhao, Boyong Wu, Chao Yan, Chen Wu, Cheng Yi, Chengyuan Yao, Daijiao Liu, Fei Tian, Feng Tian, Haiyang Sun, Haoyang Zhang, Jiangjie Zhen, Jinglan Gong, Jun Chen, Li Xie, Peilin Li, Peng Yang, Pengfei Tan, Qingjian Lin, Runze Li, Shenghua Hu, Siyi Zhou, Wenwen Qu, Xiangyu Li, Xiangyu Tony Zhang, Xuerui Yang, Yang Yang, Yechang Huang, Yu Fu, Yuchu Luo, Yuxin Li, Yuxin Zhang, Zhengyan Sheng, Brian Li, Chang Zeng, Changlin Zhang, Chen Geng, Chenghao Dong, Chengli Feng, Dan Zhou, Danni Wan, Di Chen, Die Zhang, Dongqing Pang, Guanglong Yang, Guoqiang Hu, Huangxi Zhu, Jianzheng Gao, Jinghua Liang, Jinmei Wan, Junjie Yuan, Kang An, Lei Lei, Limin Zhong, Lun Cai, Mengqiang Ren, Min Xu, Mingliang Li, Mingxiao Li, Na Wang, Qiang Tong, Qiaoling Huang, Qingfu Du, Rui Wang, Shengchen Zhou, Shi Qiu, Shihao Peng, Shiliang Yang, Siqi Tu, Tianjiao Deng, Ting Xu, Tong Wang, WeiMing Niu, Wuxun Xie, Xianwei Zhang, Xianyu Feng, Xiaojia Liu, Xing Chen, Xiongbin Wu, Yan Wu, Yang Li, Yi Liu, Yifan Zhang, Yile Liu, Yongshen Long, Yu Luo, Yuanhao Ding, Yuhao Wang, Yuhe Yin, Yunfang Xu, Yuxiang Yang, Zhiguo Huang, Zhiyue Wu, Zichao Li, Zichao Zhou, Daxin Jiang, Future Li, Gang Yu, Xiangyu Zhang, Yibo Zhu

General AI

Unified audio-language modeling has emerged as a prominent trend in modern speech systems, promising to bring the reasoning capabilities of large language models to auditory tasks. However, existing unified foundations often struggle to match the depth of specialized systems across automatic speech recognition (ASR), t…

Review: pending · Role: unreviewed · Read: now

arxiv Score 20.6

PhotoFlow: Agentic 3D Virtual Photography Missions

2026-05-22 · Jiarui Guo, Haojia Wei, Yiming Zhang, Yifei Liu, Yuning Gong, Hongjie Zhang, Xue Yang, Zhihang Zhong

General AI

Virtual photography asks an agent to enter a prepared 3D scene with no preselected camera pose or reference image, infer a suitable shot from scene information and a language intent, choose executable camera parameters, and render the final photograph. Recent progress in vision-language models makes this kind of spatia…

Review: pending · Role: unreviewed · Read: now

arxiv Score 20.5

Learning When to Adapt

2026-05-18 · Ali Zindari, Xiaowen Jiang, Rotem Mulayoff, Sebastian U. Stich

Research Track A · General AI

Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method, yet its learned correction is static: the same low-rank update is applied to every input. This input-agnostic approach creates an inevitable compromise between adapting to the fine-tuning distribution and preserving pre-trained behavior…

Review: pending · Role: unreviewed · Read: now

arxiv Score 19.9

Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack

2026-05-12 · Hao Wang, Hanchen Li, Qiuyang Mang, Alvin Cheung, Koushik Sen, Dawn Song

Research Track B · General AI

Agent benchmarks have become the de facto measure of frontier AI competence, guiding model selection, investment, and deployment. However, reward hacking, where agents maximize a score without performing the intended task, emerges spontaneously in frontier models without overfitting. We argue that benchmarks must be se…

Review: pending · Role: unreviewed · Read: now

arxiv Score 19.6

ChartFI: Benchmarking Faithfulness and Insightfulness of Chart Descriptions from Multimodal Large Language Models

2026-05-22 · Fen Wang, Zekai Shao, Qiman Kang, Chunran Hu, Zhixuan Zhang, Lexu Xie, Chao Liu, Siming Chen

General AI

Chart descriptions are essential for accessibility, cross-modal retrieval, and assisting readers in extracting insights from complex visualizations. As multimodal large language models (MLLMs) are increasingly adopted for automated chart description generation, a critical question arises: how faithfully and insightfull…

Review: pending · Role: unreviewed · Read: now

arxiv Score 19.6

DDX-TRACE: A Benchmark for Medical Diagnostic Trajectories in VLMs

2026-05-22 · Jiazhen Pan, Weixiang Shen, Jun Li, Julian Canisius, Felix Bitzer, Paula Roßmüller, Jiancheng Yang, Virginie Kreutzinger, Daniel Rueckert, Benedikt Wiestler

General AI

Medical diagnosis is not a single prediction from a fully specified vignette. It is a sequential workup: clinicians decide what evidence to obtain, revise a differential diagnosis, and stop when the diagnosis is sufficiently supported. Most medical AI benchmarks instead reveal the relevant context upfront and score onl…

Review: pending · Role: unreviewed · Read: now

arxiv Score 19.6

PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs

2026-05-22 · Rim Assouel, Amir Bar, Michal Drozdzal, Adriana Romero-Soriano

General AI

Despite remarkable progress in Multimodal Large Language Models (MLLMs), these models still struggle with fine-grained understanding tasks. In this work, we propose Procedurally Generated Tasks (PGT), a simple data-driven framework that serves a dual purpose: inducing fine-grained visual understanding and acting as a l…

Review: pending · Role: unreviewed · Read: now

arxiv Score 18.8

APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents

2026-05-20 · Yibo Li, Jiashuo Yang, Zhi Zheng, Zhiyuan Hu, Yuan Sui, Shizun Wang, Yufei He, Bryan Hooi

Research Track B · General AI

LLM agents have shown strong performance across a wide range of complex tasks, including interactive environments that require long-horizon decision making. But these agents cannot learn on the fly at test time. Self-evolving agents address this by accumulating memory and reflection across episodes rather than requirin…

Review: pending · Role: unreviewed · Read: now

arxiv Score 18.8

FashionLens: Toward Versatile Fashion Image Retrieval via Task-Adaptive Learning

2026-05-21 · Haokun Wen, Xuemeng Song, Xinghao Xie, Xiaolin Chen, Xiangyu Zhao, Weili Guan

General AI

Fashion image retrieval is a cornerstone of modern e-commerce systems. A unified framework that supports diverse query formats and search intentions is highly desired in practice. However, existing approaches focus on narrow retrieval tasks and do not fully capture such diversity. Therefore, in this work, we aim to dev…

Review: pending · Role: unreviewed · Read: now

arxiv Score 18.5

Dynamic Mixture of Latent Memories for Self-Evolving Agents

2026-05-21 · Dianzhi Yu, Vireo Zhang, Hongru Wang, Yanyu Chen, Minda Hu, Wanghan Xu, Siki Chen, Philip Torr, Zhenfei Yin, Irwin King

Research Track A · General AI

Achieving self-evolution in intelligent agents requires the continual accumulation of new knowledge across changing task sequences without forgetting previously acquired abilities. Existing approaches either internalize knowledge by updating model parameters, which induces catastrophic forgetting, or rely on external m…

Review: pending · Role: unreviewed · Read: now

arxiv Score 18.1

Using Large Language Models in Physics Education

2026-05-22 · Jonah R. Donaldson, Aliya Navaz, Konstantinos Doran, Alysta Lim, Mario Campanelli

General AI

The rapid advancement of Large Language Models (LLMs) has introduced new possibilities and challenges in physics education, necessitating rigorous evaluation of their capabilities as both problem solvers and automated assessors. This paper presents the results of three complementary studies that evaluated frontier mode…

Review: pending · Role: unreviewed · Read: now

arxiv Score 18.0

Tunable MAGMAX: Preference-Aware Model Merging for Continual Learning

2026-05-20 · Kei Hiroshima, Kento Uchida, Shinichi Shirakawa

Research Track A · General AI

Continual learning (CL) aims to train models sequentially on multiple tasks while mitigating catastrophic forgetting of previously learned knowledge. Recent advances in large pre-trained models (LPMs) and model merging techniques, such as MAGMAX, have demonstrated effective CL performance by combining task-specific par…

Review: pending · Role: unreviewed · Read: now

arxiv Score 17.6

ETCHR: Editing To Clarify and Harness Reasoning

2026-05-22 · Beichen Zhang, Yuhong Liu, Jinsong Li, Yuhang Zang, Jiaqi Wang, Dahua Lin

General AI

Multimodal Large Language Models have advanced visual reasoning, yet a purely textual chain of thought remains a bottleneck for questions that require fine-grained focus or view transformations. The "think with images" paradigm narrows this gap, but existing approaches are either constrained by fixed predefined toolk…

Review: pending · Role: unreviewed · Read: now

arxiv Score 16.8

Agentic-VLA: Efficient Online Adaptation for Vision-Language-Action Models

2026-05-21 · Ruofan Jin, Zaixi Zhang

General AI

Vision-Language-Action (VLA) models have emerged as a promising paradigm for robotic manipulation by leveraging pre-trained vision-language representations. However, current VLA training methods suffer from two critical limitations: poor generalization to novel environments and low training efficiency requiring extensi…

Review: pending · Role: unreviewed · Read: now

arxiv Score 16.6

CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolving Data Marketplaces

2026-05-22 · Joydeep Chandra

General AI

Temporal knowledge-graph data marketplaces face three coupled failures in static designs: stale hybrid index shortcuts reduce recall as edges evolve, stationary Shapley pricing misattributes value after distribution shifts, and uncoordinated agents over-consume a shared differential-privacy budget. We present CHRONOS, …

Review: pending · Role: unreviewed · Read: now

arxiv Score 16.6

CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception

2026-05-22 · Liupeng Li, Haoqian Kang, Zhenyu Lu, Jinpeng Wang, Bin Chen, Ke Chen, Yaowei Wang

General AI

High-resolution (HR) image perception presents a key bottleneck for multimodal large language models (MLLMs). While visual search offers a promising solution, existing methods struggle with the trade-off between coverage and efficiency. Visual expert-assisted search is efficient but prone to blind spots when proposals …

Review: pending · Role: unreviewed · Read: now

arxiv Score 16.6

Decomposing Queries into Tool Calls for Long-Video Keyframe Retrieval

2026-05-22 · Michal Shlapentokh-Rothman, Prachi Garg, Yu-Xiong Wang, Derek Hoiem

General AI

Keyframe selection is a direct way to provide verifiable visual evidence for long-video question answering (QA). Queries differ in what they require, and finding the right frames depends on knowing what to look for. Existing keyframe selectors either score every frame against a single query, or decompose the query into…

Review: pending · Role: unreviewed · Read: now

huggingface Score 16.0

SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research

2026-05-20 · Shuofei Qiao, Yunxiang Wei, Jiazheng Fan, Bin Wu, Busheng Zhang, Mengru Wang, Yuqi Zhu, Ningyu Zhang, Keyan Ding, Qiang Zhang, Huajun Chen

General AI

The exponential growth of global academic output has confronted researchers and AI agents with an unprecedented "information explosion," where fragmented and unstructured knowledge organization impedes deep interdisciplinary integration. Current academic retrieval tools predominantly rely on superficial keyword match…

Review: pending · Role: unreviewed · Read: now

arxiv Score 15.8

SimGym: A Framework for A/B Test Simulation in E-Commerce with Traffic-Grounded VLM Agents

2026-05-19 · Han Li, Vibhor Malik, Zahra Zanjani Foumani, Alberto Castelo, Shuang Xie, Ailin Fan, Keat Yang Koay, Yuanzheng Zhu, Meysam Feghhi, Ronie Uliana, Zhaoyu Zhang, Angelo Ocana Martins, Mingyu Zhao, Francis Pelland, Jonathan Faerman, Nikolas LeBlanc, Aaron Glazer, Andrew McNamara, Zhong Wu, Lingyun Wang

Research Track B · General AI

A/B testing remains the gold standard for evaluating modifications to e-commerce storefronts, yet it diverts traffic, requires weeks to reach statistical significance, and risks degrading user experience. We present SimGym, a framework for simulating A/B tests on e-commerce storefronts using vision-language model (VLM)…

Review: pending · Role: unreviewed · Read: now

arxiv Score 14.8

Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents

2026-05-20 · Chongrui Ye, Yuxiang Liu, Yu Wang, Haofei Yu, Yining Zhao, Ge Liu, Julian McAuley, Jiaxuan You

Research Track A · Research Track B · General AI

Language agents increasingly operate over streams of related tasks, yet existing memory systems struggle to convert accumulated experience into reusable knowledge. Retrieval-augmented and structured memory methods record per-session observations effectively, but often couple acquisition and consolidation into a single …

Review: pending · Role: unreviewed · Read: now

arxiv Score 14.8

One-Way Policy Optimization for Self-Evolving LLMs

2026-05-21 · Shuo Yang, Jinda Lu, Kexin Huang, Chiyu Ma, Shaohang Wei, Yuyang Liu, Guoyin Wang, Jingren Zhou, Li Yuan

General AI

Reinforcement Learning with Verifiable Rewards (RLVR) has become a promising paradigm for scaling reasoning capabilities of Large Language Models (LLMs). However, the sparsity of binary verifier rewards often leads to low efficiency and optimization instability. To stabilize training, existing methods typically impose …

Review: pending · Role: unreviewed · Read: now

arxiv Score 14.5

Anytime Training with Schedule-Free Spectral Optimization

2026-05-21 · Anuj Apte, Pranav Deshpande, Niraj Kumar, Shouvanik Chakrabarti, Junhyung Lyle Kim

Research Track A

Standard neural network training relies on learning-rate schedules tied to a fixed horizon, leading to strong path dependence and costly re-tuning as data availability changes. Schedule-Free (SF) methods address this by removing explicit schedules, yet SF-AdamW, the current state-of-the-art anytime optimizer, consisten…

Review: pending · Role: unreviewed · Read: now

arxiv Score 14.4

Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces

2026-05-14 · William Lugoloobi, Samuelle Marro, Jabez Magomere, Joss Wright, Chris Russell

Research Track B · General AI

As LLM-based agents increasingly browse the web on users' behalf, a natural question arises: can websites passively identify which underlying model powers an agent? Doing so would represent a significant security risk, enabling targeted attacks tailored to known model vulnerabilities. Across 14 frontier LLMs and four w…

Review: pending · Role: unreviewed · Read: now

huggingface Score 14.0

See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding

2026-05-18 · Boyuan Sun, Bowen Yin, Yuanming Li, Xihan Wei, Qibin Hou

General AI

We present SWIM (See What I Mean), a novel training strategy that aligns vision and language representations to enable fine-grained object understanding solely from textual prompts. Unlike existing approaches that require explicit visual prompts, such as masks or points, SWIM leverages mask supervision only during trai…

Review: pending · Role: unreviewed · Read: now

arxiv Score 13.8

Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling

2026-05-20 · Caleb Winston, Ron Yifeng Wang, Azalia Mirhoseini, Christos Kozyrakis

Research Track B · General AI

Computer-use agents (CUA) automate tasks specified with natural language such as "order the cheapest item from Taco Bell" by generating sequences of calls to tools such as click, type, and scroll on a browser. Current implementations follow a sequential fetch-screenshot-execute loop where each iteration requires an LLM…

Review: pending · Role: unreviewed · Read: now

arxiv Score 13.6

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

2026-05-22 · Yifan Yang, Ziyang Gong, Weiquan Huang, Qihao Yang, Ziwei Zhou, Zisu Huang, Yan Li, Xuemei Gao, Qi Dai, Bei Liu, Kai Qiu, Yuqing Yang, Dongdong Chen, Xue Yang, Chong Luo

General AI

Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision, none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained as the external state…

Review: pending · Role: unreviewed · Read: now

arxiv Score 13.4

WARD: Adversarially Robust Defense of Web Agents Against Prompt Injections

2026-05-14 · Tri Cao, Yulin Chen, Hieu Cao, Yibo Li, Khoi Le, Thong Nguyen, Yuexin Li, Yufei He, Yue Liu, Shuicheng Yan, Bryan Hooi

Research Track B · General AI

Web agents can autonomously complete online tasks by interacting with websites, but their exposure to open web environments makes them vulnerable to prompt injection attacks embedded in HTML content or visual interfaces. Existing guard models still suffer from limited generalization to unseen domains and attack pattern…

Review: pending · Role: unreviewed · Read: now

arxiv Score 13.4

ShopGym: An Integrated Framework for Realistic Simulation and Scalable Benchmarking of E-Commerce Web Agents

2026-05-15 · Chinmay Savadikar, Mingyu Zhao, Yuanzheng Zhu, Han Li, Shuang Xie, Alberto Castelo, Tianfu Wu, Lingyun Wang

Research Track B · General AI

Developing and evaluating e-commerce web agents requires environments that preserve meaningful task structure while enabling controllable, reproducible, and scalable scientific comparison. Existing methodologies force a tradeoff: live storefronts provide realism but are non-stationary, difficult to inspect, and irrepro…

Review: pending · Role: unreviewed · Read: now

arxiv Score 13.4

Skim: Speculative Execution for Fast and Efficient Web Agents

2026-05-15 · Mike Wong, Kevin Hsieh, Suman Nath, Ravi Netravali

Research Track B · General AI

Skim is a speculative execution framework for web agents that exploits the predictable structure of purpose-built websites. Today's web-agent expense is not intrinsic to the tasks but a property of how agents are composed: frontier-model inference, browser rendering, and ReAct-style planning are applied to every step o…

Review: pending · Role: unreviewed · Read: now

huggingface Score 13.0

The Expense of Seeing: Attaining Trustworthy Multimodal Reasoning Within the Monolithic Paradigm

2026-05-21 · Karan Goyal

General AI

The rapid proliferation of Vision-Language Models (VLMs) is often framed as enabling unified multimodal knowledge discovery but rests on an under-examined assumption: that current VLMs faithfully synthesise multimodal data. We argue they often do not, and this gap reflects a trustworthiness problem in the dominant Visi…

Review: pending · Role: unreviewed · Read: now

arxiv Score 13.0

Understanding Data Temporality Impact on Large Language Models Pre-training

2026-05-21 · Pilchen Hippolyte, Fabre Romain, Signe Talla Franck, Perez Patrick, Grave Edouard

Research Track A · General AI

Large language models (LLMs) are typically trained on shuffled corpora, yielding models whose knowledge is frozen at train time and whose temporal grounding remains poorly understood. In this work, we study the impact of pre-training dynamics on the acquisition of time-sensitive factual knowledge, focusing specifically…

Review: pending · Role: unreviewed · Read: now

arxiv Score 12.8

LLM Code Smells: A Taxonomy and Detection Approach

2026-05-21 · Zacharie Chenail-Larcher, Brahim Mahmoudi, Naouel Moha, Quentin Stiévenart, Florent Avellaneda

General AI

Large Language Models (LLMs) are increasingly integrated into software systems for diverse purposes, due to their versatility, flexibility, and ability to simulate human reasoning to some extent. However, poor integration of LLM inference in source code can undermine software system quality. Therefore, inadequate LLM i…

Review: pending · Role: unreviewed · Read: now

arxiv Score 12.6

Agentic Proving for Program Verification

2026-05-22 · Alessandro Sosso, Akhil Arora, Bas Spitters

General AI

Agentic systems have recently emerged as state-of-the-art approaches for automated theorem proving in formal mathematics. To assess how far these capabilities extend to program verification, we evaluate Claude Code in an agentic proving framework on CLEVER, a Lean 4 benchmark for verifiable code generation. Our results…

Review: pending · Role: unreviewed · Read: now

arxiv Score 12.6

Leveraging Foundation Models for Causal Generative Modeling

2026-05-22 · Aneesh Komanduri, Xintao Wu

General AI

Causal generative modeling is essential for developing reliable and transparent AI systems capable of counterfactual reasoning. While existing approaches focus on integrating causal constraints during the training of generative models, they often lack a unified framework to leverage the zero-shot reasoning capabilities…

Review: pending · Role: unreviewed · Read: now

arxiv Score 12.6

SPACENUM: Revisiting Spatial Numerical Understanding in VLMs

2026-05-22 · Jianshu Zhang, Yijiang Li, Huifeixin Chen, Haoran Lu, Letian Xue, Bingyang Wang, Han Liu

General AI

Vision-Language Models (VLMs) are increasingly deployed in embodied environments, where they need to produce numerical outputs such as action magnitudes and spatial coordinates. Although these numbers appear meaningful, it remains unclear whether these numerical outputs are genuinely grounded in spatial perception. Theref…

Review: pending · Role: unreviewed · Read: now

arxiv Score 11.8

Self-Evolving Multi-Agent Systems via Decentralized Memory

2026-05-21 · Guangya Hao, Yunbo Long, Zhuokai Zhao

General AI

Self-evolving multi-agent systems (MAS) have emerged as a promising route to LLM agents that continually improve from experience, with persistent memory at their foundation. However, existing designs almost exclusively adopt a centralized repository shared across agents, incurring communication and coordination overhea…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 11.6

Beyond Binary Edits: Robust Multimodal Knowledge Editing with Adversarial Subspace Alignment

2026-05-22 · Haoyuan Wang, Xiaohao Liu, Jiajie Su, Jianmao Xiao, Chaochao Chen

General AI

Multimodal large language models (MLLMs) need efficient mechanisms to update knowledge without degrading existing capabilities. While intrinsic multimodal knowledge editing achieves strong reliability and locality, it often exhibits limited generality, failing to propagate edits across semantically equivalent visual an…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 11.4

SimPersona: Learning Discrete Buyer Personas from Raw Clickstreams for Grounded E-Commerce Agents

2026-05-14 · Zahra Zanjani Foumani, Alberto Castelo, Shuang Xie, Ted Chaiwachirasak, Han Li, Lingyun Wang

Research Track B · General AI

LLM-based web agents can navigate live storefronts, yet they often collapse to a single "average buyer" policy, failing to capture the heterogeneous and distributional nature of real buyer populations. Existing personalization methods rely on hand-crafted prompt-based personas that are brittle, difficult to scale, cont…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 10.6

Inferential Privacy Leakage in Anonymized Conversational AI Logs

2026-05-22 · S M Mehedi Zaman, Kiran Garimella

General AI

Hundreds of millions of users now hold detailed, multi-turn conversations with ChatGPT and similar LLM assistants. We measure two privacy-relevant features of these conversations on a corpus of complete ChatGPT histories donated by over 1,000 users in four Global South countries (Brazil, India, Nigeria, Pakistan). Firs…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 10.5

Towards Explainability of SLMs by investigating Token Level Activation

2026-05-21 · Sayantani Ghosh, Rajashik Datta, Amit Kumar Das, Amlan Chakrabarti

Research Track A

Transformer-based language models such as BERT, with 110M+ parameters, have revolutionized natural language understanding, yet their internal mechanisms remain largely opaque to researchers and practitioners. Traditional attention-based interpretability methods often emphasize structurally important but semantically we…

Review: pending · Role: unreviewed · Read: soon

huggingface Score 10.0

HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents

2026-05-18 · Woongyeng Yeo, Yumin Choi, Taekyung Ki, Sung Ju Hwang

General AI

Training long-horizon LLM agents with reinforcement learning is challenging because sparse outcome rewards reveal whether a task succeeds, but not which intermediate actions caused the outcome or how they should be corrected. Recent methods alleviate this issue by generating rewards or textual hints from turn-level act…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 10.0

Continual Segmentation under Joint Nonstationarity

2026-05-19 · Prashant Pandey, Himanshu Kumar, Devineni Sri Venkatraya Chowdary, Brejesh Lall

Research Track A

Evolving data streams induce joint nonstationarity in continual semantic segmentation, where semantic classes, input distributions, and supervision availability change simultaneously over time. This setting reflects practical structured prediction systems, yet remains largely unexplored in prior continual learning work…

Review: pending · Role: unreviewed · Read: soon

huggingface Score 10.0

From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models

2026-05-19 · Juncheng Wu, Hardy Chen, Haoqin Tu, Xianfeng Tang, Freda Shi, Hui Liu, Hanqing Lu, Cihang Xie, Yuyin Zhou

General AI

Recent advances in vision-language models (VLMs) emphasize long chain-of-thought reasoning; yet, we find that their performance on visual tasks is primarily limited by a lack of visual perception as opposed to reasoning itself. In this work, we systematically study the interplay between perception and reasoning in VLM …

Review: pending · Role: unreviewed · Read: soon

arxiv Score 9.9

Self-Regulated Learning in Essay Writing: Consistency of Strategies and Impact on Outcomes

2026-05-14 · Gloria Fernández-Nieto, Kiyoshige Garcés, Mladen Raković, Tongguang Li, Xinyu Li, Linxuan Zhao, Dragan Gašević

Research Track A

Background: Abilities for effective self-regulated learning (SRL) are critical for lifelong learning, particularly during adolescence when these skills consolidate and strongly influence future learning. Their importance has grown with the rise of online and blended education. Yet, little is known about how secondary s…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 9.6

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

2026-05-22 · Zisu Huang, Jingwen Xu, Yifan Yang, Ziyang Gong, Qihao Yang, Muzhao Tian, Xiaohua Wang, Changze Lv, Xuemei Gao, Qi Dai, Bei Liu, Kai Qiu, Xue Yang, Dongdong Chen, Xiaoqing Zheng, Chong Luo

General AI

Language agents increasingly improve by reusing skills, structured procedural artifacts distilled from past experience. In particular, domain-level and model-generated skills are especially promising. They offer fast adaptation within a domain by encoding domain-specific recurring procedures, and…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 9.6

Human Decision-Making with Persuasive and Narrative LLM Explanations

2026-05-22 · Laura R. Marusich, Mary Grace Kozuch Dhooghe, Jonathan Z. Bakdash, Murat Kantarcioglu

General AI

Large language models (LLMs) have the potential to aid and improve human decision-making in classification tasks, not only by providing fairly accurate predictions, but also in their ability to generate cogent narrative explanations of those predictions. Prior work has demonstrated that people generally find AI narrati…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 9.6

Multilingual Knowledge Transfer under Data Constraints via Lexical Interventions

2026-05-22 · Anastasiia Sedova, Natalie Schluter, Skyler Seto, Maartje ter Hoeve

General AI

Cross-lingual knowledge transfer is critical for building high-performing multilingual language models for languages with insufficient training data. When target language data is scarce, the knowledge required for many downstream tasks involving scientific reasoning, commonsense inference, and world knowledge must be a…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.6

PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

2026-05-22 · Yifan Lu, Qi Wu, Jay Zhangjie Wu, Zian Wang, Huan Ling, Sanja Fidler, Xuanchi Ren

General AI

Most practical high-resolution text-to-image systems, including latent diffusion and autoregressive models, perform generation in a compact latent space, and a decoder maps the generated latents back to pixels. Yet the latent-to-pixel decoder is reconstruction-oriented, optimized to invert the encoder rather than synth…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.6

SFG-ROS: A Resource-Aware Framework for Dense Multi-Agent Perception

2026-05-22 · Constantin Blessing, Elias Geiger, Jakob Häringer, Dennis Grewe, Markus Enzweiler

General AI

Deploying heterogeneous multi-agent robot fleets for collaborative perception requires robust data exchange and scalable software architectures. However, standard ROS 2 implementations often suffer from network saturation, namespace collisions, and severe computational overhead when distributing dense sensor streams ac…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.6

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

2026-05-22 · Xu Ouyang, Deyi Liu, Yuhang Cai, Jing Liu, Yuan Yang, Chen Zheng, Thomas Hartvigsen, Yiyuan Ma

General AI

Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance deteriorates despite increased compute. We propose the Shannon Scaling Law, a unified the…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.6

Natural Yet Challenging to Detect: Robust In-the-Wild TTS through EMA and Dual-Scoring Prompt Selection -- Submission for WildSpoof 2026 TTS Track

2026-05-22 · Renhe Sun, Jiayi Zhou, Haolin He, Yueying Feng, Jian Liu

General AI

In this technical report, we describe our submission for the WildSpoof Challenge TTS Track: Text-to-Speech with In-the-Wild Data. We introduce F5-TTS-DPS, a model built upon the F5-TTS architecture. Our approach integrates Exponential Moving Average (EMA) into supervised fine-tuning to stabilize training and improve ge…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.6

Strong Teacher Not Needed? On Distillation in LLM Pretraining

2026-05-22 · Taiming Lu, Zhuang Liu

General AI

Knowledge distillation generally assumes a strong-to-weak relationship where stronger teachers yield better students. In this work, we examine this assumption about distillation in large language model pretraining. By varying architecture sizes and training token budgets, we create strong-to-weak, same-level, and weak-…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.6

When Youth Enter the Algorithmic Wild: Discovering and Understanding Potentially Harmful Teen Videos on Douyin and Kwai

2026-05-22 · Shaoxuan Zhou, Yafei Sun, Jing Zhang, Xianghang Mi

General AI

Short-video platforms like Douyin and Kwai have become central to adolescent digital life, but they also risk exposing teens to algorithmically amplified harmful content. Despite its societal importance, the scale, mechanisms, and real-world impact of this exposure remain poorly understood. Measuring it is challenging:…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

SMoA: Spectrum Modulation Adapter for Parameter-Efficient Fine-Tuning

2026-05-20 · Yongkang Liu, Xing Li, Mengjie Zhao, Shanru Zhang, Zijing Wang, Qian Li, Shi Feng, Feiliang Ren, Daling Wang, Hinrich Schütze

General AI

As the number of model parameters increases, parameter-efficient fine-tuning (PEFT) has become the go-to choice for tailoring pre-trained large language models. Low-Rank Adaptation (LoRA), which approximates full-parameter fine-tuning with a low-rank update, is widely used to reduce resource requirements. However,…

Review
pending
Role
unreviewed
Read
later
arxiv Score 7.8

Echo: Learning from Experience Data via User-Driven Refinement

2026-05-21 · Hande Dong, Xiaoyun Liang, Jiarui Yu, Jiayi Lin, Changqing Ai, Feng Liu, Wenjun Zhang, Rongbi Wei, Chaofan Zhu, Linjie Che, Feng Wu, Xin Shen, Dexu Kong, Xiaotian Wang, Qiuyuan Chen, Bingxu An, Yueting Lei, Qiang Lin

General AI

Static "human data" faces inherent limitations: it is expensive to scale and bounded by the knowledge of its creators. Continuous learning from "experience data" - interactions between agents and their environments - promises to transcend these barriers. Today, the widespread deployment of AI agents grants us low-cost …

Review
pending
Role
unreviewed
Read
later
arxiv Score 7.6

Smart-Insertion-V: Photorealistic Video Insertion via a Closed-Loop Feedback Dual-Stream Framework

2026-05-22 · Xiao Cao, Yansong Qu, Xiangzhen Chang, Wen Xiao, Jiakui Hu, Heyuan Li, Jialun Liu, Zhiyong Huang, Xuelong Li

General AI

Mask-free video object insertion has emerged as a challenging task, requiring harmonious integration of reference objects into source videos. However, existing methods struggle when references exhibit severe stylistic domain gaps with the source scene. To overcome this, we propose Smart-Insertion-V, a…

Review
pending
Role
unreviewed
Read
later
huggingface Score 7.0

LatentUMM: Dual Latent Alignment for Unified Multimodal Models

2026-05-18 · Yinyi Luo, Wenwen Wang, Hayes Bai, Marios Savvides, Jindong Wang

General AI

Unified multimodal models (UMMs) achieve strong performance in both understanding and generation by learning a shared latent space, yet they often exhibit functional inconsistency between these two capabilities. We observe that this issue does not stem from a lack of shared representations, but from the absence of expl…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.8

Federated LoRA Fine-Tuning for LLMs via Collaborative Alignment

2026-05-20 · Shuaida He, Liwen Chen, Long Feng

General AI

Low-rank adaptation (LoRA) has emerged as a powerful tool for parameter-efficient fine-tuning of large language models (LLMs). This paper studies LoRA under a federated learning setting, enabling collaborative fine-tuning across clients while preserving parameter efficiency. We focus on a highly heterogeneous regime in…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.8

HyLoVQA: Dynamic Hypernetwork-Generated Low-Rank Adaptation for Continual Visual Question Answering

2026-05-21 · Yiran Wang, Chenyi Xiong, Ziyue Qin, Miao Zhang, Kui Xiao, Zhifei Li

General AI

Continual Visual Question Answering (VQA) requires learning from non-stationary streams of visual inputs and questions while preserving past knowledge. Most prior methods adapt by updating a largely shared parameter set. This often leads to cross-level task interference, hindering accurate adaptation to the current tas…

Review
pending
Role
unreviewed
Read
later
huggingface Score 6.8

SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models

2026-05-22 · Zizhao Tong, Hongfeng Lai, Zeqing Wang, Zhaohu Xing, Kexu Cheng, Haoran Xu, Zhao Pu, Shangwen Zhu, Ruili Feng, Jian Zhao, Yan Zhang, Hao Tang, Yeying Jin, Ling Shao

General AI

Interactive world models for first-person shooter (FPS) games must resolve high-frequency overlapping control signals at every frame without disrupting unaffected regions. Existing methods inject actions globally and train on single titles, failing under dense FPS inputs. We observe that FPS actions are spatially selec…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.6

Complete-muE: Optimal Hyperparameter Transfer and Scaling for MoE Models

2026-05-22 · Hongwu Peng, Ohiremen Dibua, Yuanjun Xiong, Yifan Gong, Jianming Zhang, Yan Kang

General AI

We propose Complete-muE, a framework that targets hyperparameter transfer across dense FFN and any Mixture-of-Experts (MoE) setups in transformer blocks. Existing tools such as μP (requires fixed architecture) or SDE (requires fixed per-step token count) cannot directly solve the hyperparameter transfer problem in Mo…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.6

Geo-Align: Video Generation Alignment via Metric Geometry Reward

2026-05-22 · Zizun Li, Haoyu Guo, Runzhe Teng, Chunhua Shen, Tong He

General AI

Camera-controlled video generation has achieved remarkable progress in recent years. However, existing video-to-video re-rendering methods primarily rely on Supervised Fine-Tuning using synthetic datasets. At present, there is an extreme scarcity of synchronized, multi-view real-world video data. Consequently, the prev…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.6

Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers

2026-05-22 · Shuhong Zheng, Michael Oechsle, Erik Sandström, Marie-Julie Rakotosaona, Federico Tombari, Igor Gilitschenski

General AI

Visual geometry transformers have become powerful architectures for multi-view 3D reconstruction, enabling joint prediction of multiple 3D attributes in a feed-forward manner. However, their computational cost grows quadratically with the input sequence length due to the global attention layers inside these models. Thi…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.6

Harnessing Individual Motivation for Collective Efficiency: A Mechanism-Driven Distributed Optimization Method

2026-05-22 · Dongwei Xie, Xuhao Wang, Yujie Tang, Jie Song

General AI

In industrial scenarios involving multi-agent collective decision-making, centralized decision-making may not be admissible due to restrictive access to individual local information, while the conflicts between participants' self-interest and global performance may also impede collaborative distributed decision-making.…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.6

Routing Equilibrium in Mixed-Autonomy Traffic Networks with Altruistic Autonomous Agents

2026-05-22 · Lihui Yi, Ermin Wei

General AI

Recent advancements in vehicle autonomy have drawn interest in understanding the impact of autonomous vehicles on traffic systems. In this paper, we study a traffic assignment problem in a mixed-autonomy setting where both human-driven and autonomous vehicles coexist. We model the interaction as a simultaneous routing …

Review
pending
Role
unreviewed
Read
later
huggingface Score 6.0

Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

2026-05-19 · Chongyu Fan, Gaowen Liu, Mingyi Hong, Ramana Rao Kompella, Sijia Liu

General AI

Muon is a matrix-aware optimizer that leverages Newton-Schulz (NS) iterations to enforce spectral gradient orthogonalization by driving all singular values of the momentum matrix toward 1. While this uniform spectral whitening enhances exploration and outperforms AdamW in LLM pretraining, we show it could lead to funda…

Review
pending
Role
unreviewed
Read
later
huggingface Score 6.0

Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

2026-05-20 · Dong Chen, Fangyun Wei, Ziyu Wan, Dongdong Chen, Jiawei Zhang, Jinjing Zhao, Sirui Zhang, Yang Yue, Zhiyang Liang, Baining Guo, Chong Luo, Jianmin Bao, Ji Li, Lei Shi, Qinhong Yang, Xiuyu Wu, Xuelu Feng, Yan Lu, Yanchen Dong, Yitong Wang, Yunuo Chen

General AI

We introduce Lens, a 3.8B-parameter T2I model that achieves performance competitive with, and in several cases surpassing, state-of-the-art models with more than 6B parameters across various benchmarks, while requiring significantly less training compute. For example, Lens requires only about 19.3% of the training comp…

Review
pending
Role
unreviewed
Read
later
huggingface Score 6.0

RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution

2026-05-20 · Siyong Jian, Siyuan Li, Luyuan Zhang, Zedong Wang, Xin Jin, Ying Li, Cheng Tan, Huan Wang

General AI

Discrete autoregressive (AR) text-to-image (T2I) models pair a VQ tokenizer with an AR policy, and current post-training pipelines optimize only the policy while keeping the VQ decoder frozen. Recent diffusion T2I work, exemplified by REPA-E, has shown that the VAE itself constitutes a key alignment bottleneck, yet no …

Review
pending
Role
unreviewed
Read
later
huggingface Score 6.0

Rethinking Cross-Layer Information Routing in Diffusion Transformers

2026-05-20 · Chao Xu, Maohua Li, Qirui Li, Yixuan Xu, Yanke Zhou, Yunhe Li, Cuifeng Shen, Hanlin Tang, Kan Liu, Tao Lan, Lin Qu, Shao-Qun Zhang

General AI

Diffusion Transformers (DiTs) have become a de facto backbone of modern visual generation, and nearly every major axis of their design -- tokenization, attention, conditioning, objectives, and latent autoencoders -- has been extensively revisited. The residual stream that governs how information accumulates across laye…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.8

LoRA vs. Full Fine-Tuning: A Theoretical Perspective

2026-05-18 · Ali Zindari, Rotem Mulayoff, Sebastian U. Stich

General AI

Fine-tuning adapts a pre-trained model to downstream tasks using a small amount of labeled data. Low-Rank Adaptation (LoRA) is an efficient fine-tuning method that reduces memory and computation costs while often achieving performance close to full fine-tuning. Despite its widespread use, the theoretical behavior of Lo…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.8

CoMoGen: COntrollable MOtion Dynamics and Interactions with Mask-Guided Video GENeration

2026-05-21 · Adil Meric, Lin Geng Foo, Mert Kiray, Benjamin Busam, Rishabh Dabral, Christian Theobalt

General AI

We present CoMoGen, a controllable video generation framework that generates realistic interactive dynamics from a single binary mask sequence conditioned on an input image. CoMoGen introduces a lightweight MaskAdapter that encodes binary mask sequences into a latent residual signal, injected into the Multi Modal Diffu…

Review
pending
Role
unreviewed
Read
later