arxiv
Score 36.4
2026-05-12 · Hamza Ahmed Durrani, Rafay Suleman Durrani
Research Track A · General AI
Large language-vision models (LVLMs) such as CLIP, Flamingo, and BLIP have revolutionized AI by enabling understanding across textual and visual modalities. These models excel at tasks like image captioning, visual question answering, and cross-modal retrieval. However, they face catastrophic forgetting when learning n…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 29.0
2026-05-19 · Fatemeh Pesaran zadeh, Seyeon Choi, Xing Han Lù, Siva Reddy, Gunhee Kim
Research Track B · General AI
Large language models (LLMs) have enabled web agents that follow natural language goals through multi-step browser interactions. However, agents fine-tuned on specific trajectories and domains often struggle to generalize out of domain, and offline training can be compute-inefficient due to noisy, redundant trajectories…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 27.8
2026-05-21 · Jinyang Wu, Guocheng Zhai, Ruihan Jin, Yuhao Shen, Zhengxi Lu, Fan Zhang, Haoran Luo, Zheng Lian, Zhengqi Wen, Jianhua Tao
General AI
The proliferation of large language models (LLMs) and modular skills has endowed autonomous agents with increasingly powerful capabilities. Existing frameworks typically rely on monolithic LLMs and fixed logic to interface with these skills. This gives rise to a critical bottleneck: different LLMs offer distinct advant…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 24.0
2026-05-21 · Jinho Park, Youbin Kim, Hogun Park, Eunbyung Park
General AI
Spatio-temporal reasoning is a core capability for Multimodal Large Language Models (MLLMs) operating in the real world. As such, evaluating it precisely has become an essential challenge. However, existing spatio-temporal reasoning benchmark datasets primarily rely on static image sets or passively curated video data,…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 23.5
2026-05-21 · Javad Parsa, Enis Simsar, Amir Joudaki, Thomas Hofmann, André M. H. Teixeira
Research Track A · General AI
Parameter-efficient fine-tuning enables fast personalization of text-to-image diffusion models, but composing multiple custom concepts remains challenging due to representation interference. Existing modular methods either rely on expensive post-hoc fusion or freeze adaptation subspaces, which limit expressiveness and …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 21.9
2026-05-14 · Julien Piet, Annabella Chow, Yiwei Hou, Muxi Lyu, Sylvie Venuto, Jinhao Zhu, Raluca Ada Popa, David Wagner
Research Track B · General AI
ReAct has become the default architecture across LLM agents, and many existing web agents follow this paradigm. We argue that it is the wrong default for web agents. Instead, web agents should default to plan-then-execute: commit to a task-specific program before observing runtime web content, then execute it. The reas…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 21.8
2026-05-20 · Wujiang Xu, Yu Wang, Kai Mei, Kaiqu Liang, Zhenting Wang, Mingyu Jin, Han Zhang, Shi-Xiong Zhang, Wenyue Hua, Sambit Sahu, Dimitris N. Metaxas
Research Track A · Research Track B · General AI
Memory is a central capability for LLM agents operating across long-horizon tasks. Existing memory benchmarks predominantly evaluate retention of personalized information in multi-turn chat scenarios, overlooking the dynamic memory formation that occurs during extended agent execution. Consequently, the memory systems …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 21.8
2026-05-22 · Bin Lin, Bo Zhao, Boyong Wu, Chao Yan, Chen Wu, Cheng Yi, Chengyuan Yao, Daijiao Liu, Fei Tian, Feng Tian, Haiyang Sun, Haoyang Zhang, Jiangjie Zhen, Jinglan Gong, Jun Chen, Li Xie, Peilin Li, Peng Yang, Pengfei Tan, Qingjian Lin, Runze Li, Shenghua Hu, Siyi Zhou, Wenwen Qu, Xiangyu Li, Xiangyu Tony Zhang, Xuerui Yang, Yang Yang, Yechang Huang, Yu Fu, Yuchu Luo, Yuxin Li, Yuxin Zhang, Zhengyan Sheng, Brian Li, Chang Zeng, Changlin Zhang, Chen Geng, Chenghao Dong, Chengli Feng, Dan Zhou, Danni Wan, Di Chen, Die Zhang, Dongqing Pang, Guanglong Yang, Guoqiang Hu, Huangxi Zhu, Jianzheng Gao, Jinghua Liang, Jinmei Wan, Junjie Yuan, Kang An, Lei Lei, Limin Zhong, Lun Cai, Mengqiang Ren, Min Xu, Mingliang Li, Mingxiao Li, Na Wang, Qiang Tong, Qiaoling Huang, Qingfu Du, Rui Wang, Shengchen Zhou, Shi Qiu, Shihao Peng, Shiliang Yang, Siqi Tu, Tianjiao Deng, Ting Xu, Tong Wang, WeiMing Niu, Wuxun Xie, Xianwei Zhang, Xianyu Feng, Xiaojia Liu, Xing Chen, Xiongbin Wu, Yan Wu, Yang Li, Yi Liu, Yifan Zhang, Yile Liu, Yongshen Long, Yu Luo, Yuanhao Ding, Yuhao Wang, Yuhe Yin, Yunfang Xu, Yuxiang Yang, Zhiguo Huang, Zhiyue Wu, Zichao Li, Zichao Zhou, Daxin Jiang, Future Li, Gang Yu, Xiangyu Zhang, Yibo Zhu
General AI
Unified audio-language modeling has emerged as a prominent trend in modern speech systems, promising to bring the reasoning capabilities of large language models to auditory tasks. However, existing unified foundations often struggle to match the depth of specialized systems across automatic speech recognition (ASR), t…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 20.6
2026-05-22 · Jiarui Guo, Haojia Wei, Yiming Zhang, Yifei Liu, Yuning Gong, Hongjie Zhang, Xue Yang, Zhihang Zhong
General AI
Virtual photography asks an agent to enter a prepared 3D scene with no preselected camera pose or reference image, infer a suitable shot from scene information and a language intent, choose executable camera parameters, and render the final photograph. Recent progress in vision-language models makes this kind of spatia…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 20.5
2026-05-18 · Ali Zindari, Xiaowen Jiang, Rotem Mulayoff, Sebastian U. Stich
Research Track A · General AI
Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method, yet its learned correction is static: the same low-rank update is applied to every input. This input-agnostic approach creates an inevitable compromise between adapting to the fine-tuning distribution and preserving pre-trained behavior…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 19.9
2026-05-12 · Hao Wang, Hanchen Li, Qiuyang Mang, Alvin Cheung, Koushik Sen, Dawn Song
Research Track B · General AI
Agent benchmarks have become the de facto measure of frontier AI competence, guiding model selection, investment, and deployment. However, reward hacking, where agents maximize a score without performing the intended task, emerges spontaneously in frontier models without overfitting. We argue that benchmarks must be se…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 19.6
2026-05-22 · Fen Wang, Zekai Shao, Qiman Kang, Chunran Hu, Zhixuan Zhang, Lexu Xie, Chao Liu, Siming Chen
General AI
Chart descriptions are essential for accessibility, cross-modal retrieval, and assisting readers in extracting insights from complex visualizations. As multimodal large language models (MLLMs) are increasingly adopted for automated chart description generation, a critical question arises: how faithfully and insightfull…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 19.6
2026-05-22 · Jiazhen Pan, Weixiang Shen, Jun Li, Julian Canisius, Felix Bitzer, Paula Roßmüller, Jiancheng Yang, Virginie Kreutzinger, Daniel Rueckert, Benedikt Wiestler
General AI
Medical diagnosis is not a single prediction from a fully specified vignette. It is a sequential workup: clinicians decide what evidence to obtain, revise a differential diagnosis, and stop when the diagnosis is sufficiently supported. Most medical AI benchmarks instead reveal the relevant context upfront and score onl…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 19.6
2026-05-22 · Rim Assouel, Amir Bar, Michal Drozdzal, Adriana Romero-Soriano
General AI
Despite remarkable progress in Multimodal Large Language Models (MLLMs), these models still struggle with fine-grained understanding tasks. In this work, we propose Procedurally Generated Tasks (PGT), a simple data-driven framework that serves a dual purpose: inducing fine-grained visual understanding and acting as a l…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.8
2026-05-20 · Yibo Li, Jiashuo Yang, Zhi Zheng, Zhiyuan Hu, Yuan Sui, Shizun Wang, Yufei He, Bryan Hooi
Research Track B · General AI
LLM agents have shown strong performance across a wide range of complex tasks, including interactive environments that require long-horizon decision making. But these agents cannot learn on the fly at test time. Self-evolving agents address this by accumulating memory and reflection across episodes rather than requirin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.8
2026-05-21 · Haokun Wen, Xuemeng Song, Xinghao Xie, Xiaolin Chen, Xiangyu Zhao, Weili Guan
General AI
Fashion image retrieval is a cornerstone of modern e-commerce systems. A unified framework that supports diverse query formats and search intentions is highly desired in practice. However, existing approaches focus on narrow retrieval tasks and do not fully capture such diversity. Therefore, in this work, we aim to dev…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.5
2026-05-21 · Dianzhi Yu, Vireo Zhang, Hongru Wang, Yanyu Chen, Minda Hu, Wanghan Xu, Siki Chen, Philip Torr, Zhenfei Yin, Irwin King
Research Track A · General AI
Achieving self-evolution in intelligent agents requires the continual accumulation of new knowledge across changing task sequences without forgetting previously acquired abilities. Existing approaches either internalize knowledge by updating model parameters, which induces catastrophic forgetting, or rely on external m…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.1
2026-05-22 · Jonah R. Donaldson, Aliya Navaz, Konstantinos Doran, Alysta Lim, Mario Campanelli
General AI
The rapid advancement of Large Language Models (LLMs) has introduced new possibilities and challenges in physics education, necessitating rigorous evaluation of their capabilities as both problem solvers and automated assessors. This paper presents the results of three complementary studies that evaluated frontier mode…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.0
2026-05-20 · Kei Hiroshima, Kento Uchida, Shinichi Shirakawa
Research Track A · General AI
Continual learning (CL) aims to train models sequentially on multiple tasks while mitigating catastrophic forgetting of previously learned knowledge. Recent advances in large pre-trained models (LPMs) and model merging techniques, such as MAGMAX, have demonstrated effective CL performance by combining task-specific par…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.6
2026-05-22 · Beichen Zhang, Yuhong Liu, Jinsong Li, Yuhang Zang, Jiaqi Wang, Dahua Lin
General AI
Multimodal Large Language Models have advanced visual reasoning, yet a purely textual chain of thought remains a bottleneck for questions that require fine-grained focus or view transformations. The "think with images" paradigm narrows this gap, but existing approaches are either constrained by fixed predefined toolk…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-05-21 · Ruofan Jin, Zaixi Zhang
General AI
Vision-Language-Action (VLA) models have emerged as a promising paradigm for robotic manipulation by leveraging pre-trained vision-language representations. However, current VLA training methods suffer from two critical limitations: poor generalization to novel environments and low training efficiency requiring extensi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.6
2026-05-22 · Joydeep Chandra
General AI
Temporal knowledge-graph data marketplaces face three coupled failures in static designs: stale hybrid index shortcuts reduce recall as edges evolve, stationary Shapley pricing misattributes value after distribution shifts, and uncoordinated agents over-consume a shared differential-privacy budget. We present CHRONOS, …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.6
2026-05-22 · Liupeng Li, Haoqian Kang, Zhenyu Lu, Jinpeng Wang, Bin Chen, Ke Chen, Yaowei Wang
General AI
High-resolution (HR) image perception presents a key bottleneck for multimodal large language models (MLLMs). While visual search offers a promising solution, existing methods struggle with the trade-off between coverage and efficiency. Visual expert-assisted search is efficient but prone to blind spots when proposals …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.6
2026-05-22 · Michal Shlapentokh-Rothman, Prachi Garg, Yu-Xiong Wang, Derek Hoiem
General AI
Keyframe selection is a direct way to provide verifiable visual evidence for long-video question answering (QA). Queries differ in what they require, and finding the right frames depends on knowing what to look for. Existing keyframe selectors either score every frame against a single query, or decompose the query into…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.0
2026-05-20 · Shuofei Qiao, Yunxiang Wei, Jiazheng Fan, Bin Wu, Busheng Zhang, Mengru Wang, Yuqi Zhu, Ningyu Zhang, Keyan Ding, Qiang Zhang, Huajun Chen
General AI
The exponential growth of global academic output has confronted researchers and AI agents with an unprecedented "information explosion," where fragmented and unstructured knowledge organization impedes deep interdisciplinary integration. Current academic retrieval tools predominantly rely on superficial keyword match…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.8
2026-05-19 · Han Li, Vibhor Malik, Zahra Zanjani Foumani, Alberto Castelo, Shuang Xie, Ailin Fan, Keat Yang Koay, Yuanzheng Zhu, Meysam Feghhi, Ronie Uliana, Zhaoyu Zhang, Angelo Ocana Martins, Mingyu Zhao, Francis Pelland, Jonathan Faerman, Nikolas LeBlanc, Aaron Glazer, Andrew McNamara, Zhong Wu, Lingyun Wang
Research Track B · General AI
A/B testing remains the gold standard for evaluating modifications to e-commerce storefronts, yet it diverts traffic, requires weeks to reach statistical significance, and risks degrading user experience. We present SimGym, a framework for simulating A/B tests on e-commerce storefronts using vision-language model (VLM)…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.8
2026-05-20 · Chongrui Ye, Yuxiang Liu, Yu Wang, Haofei Yu, Yining Zhao, Ge Liu, Julian McAuley, Jiaxuan You
Research Track A · Research Track B · General AI
Language agents increasingly operate over streams of related tasks, yet existing memory systems struggle to convert accumulated experience into reusable knowledge. Retrieval-augmented and structured memory methods record per-session observations effectively, but often couple acquisition and consolidation into a single …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.8
2026-05-21 · Shuo Yang, Jinda Lu, Kexin Huang, Chiyu Ma, Shaohang Wei, Yuyang Liu, Guoyin Wang, Jingren Zhou, Li Yuan
General AI
Reinforcement Learning with Verifiable Rewards (RLVR) has become a promising paradigm for scaling reasoning capabilities of Large Language Models (LLMs). However, the sparsity of binary verifier rewards often leads to low efficiency and optimization instability. To stabilize training, existing methods typically impose …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.5
2026-05-21 · Anuj Apte, Pranav Deshpande, Niraj Kumar, Shouvanik Chakrabarti, Junhyung Lyle Kim
Research Track A
Standard neural network training relies on learning-rate schedules tied to a fixed horizon, leading to strong path dependence and costly re-tuning as data availability changes. Schedule-Free (SF) methods address this by removing explicit schedules, yet SF-AdamW, the current state-of-the-art anytime optimizer, consisten…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.4
2026-05-14 · William Lugoloobi, Samuelle Marro, Jabez Magomere, Joss Wright, Chris Russell
Research Track B · General AI
As LLM-based agents increasingly browse the web on users' behalf, a natural question arises: can websites passively identify which underlying model powers an agent? Doing so would represent a significant security risk, enabling targeted attacks tailored to known model vulnerabilities. Across 14 frontier LLMs and four w…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.0
2026-05-18 · Boyuan Sun, Bowen Yin, Yuanming Li, Xihan Wei, Qibin Hou
General AI
We present SWIM (See What I Mean), a novel training strategy that aligns vision and language representations to enable fine-grained object understanding solely from textual prompts. Unlike existing approaches that require explicit visual prompts, such as masks or points, SWIM leverages mask supervision only during trai…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-20 · Caleb Winston, Ron Yifeng Wang, Azalia Mirhoseini, Christos Kozyrakis
Research Track B · General AI
Computer-use agents (CUA) automate tasks specified with natural language such as "order the cheapest item from Taco Bell" by generating sequences of calls to tools such as click, type, and scroll on a browser. Current implementations follow a sequential fetch-screenshot-execute loop where each iteration requires an LLM…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.6
2026-05-22 · Yifan Yang, Ziyang Gong, Weiquan Huang, Qihao Yang, Ziwei Zhou, Zisu Huang, Yan Li, Xuemei Gao, Qi Dai, Bei Liu, Kai Qiu, Yuqing Yang, Dongdong Chen, Xue Yang, Chong Luo
General AI
Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision, none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained as the external state…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.4
2026-05-14 · Tri Cao, Yulin Chen, Hieu Cao, Yibo Li, Khoi Le, Thong Nguyen, Yuexin Li, Yufei He, Yue Liu, Shuicheng Yan, Bryan Hooi
Research Track B · General AI
Web agents can autonomously complete online tasks by interacting with websites, but their exposure to open web environments makes them vulnerable to prompt injection attacks embedded in HTML content or visual interfaces. Existing guard models still suffer from limited generalization to unseen domains and attack pattern…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.4
2026-05-15 · Chinmay Savadikar, Mingyu Zhao, Yuanzheng Zhu, Han Li, Shuang Xie, Alberto Castelo, Tianfu Wu, Lingyun Wang
Research Track B · General AI
Developing and evaluating e-commerce web agents requires environments that preserve meaningful task structure while enabling controllable, reproducible, and scalable scientific comparison. Existing methodologies force a tradeoff: live storefronts provide realism but are non-stationary, difficult to inspect, and irrepro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.4
2026-05-15 · Mike Wong, Kevin Hsieh, Suman Nath, Ravi Netravali
Research Track B · General AI
Skim is a speculative execution framework for web agents that exploits the predictable structure of purpose-built websites. Today's web-agent expense is not intrinsic to the tasks but a property of how agents are composed: frontier-model inference, browser rendering, and ReAct-style planning are applied to every step o…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.0
2026-05-21 · Karan Goyal
General AI
The rapid proliferation of Vision-Language Models (VLMs) is often framed as enabling unified multimodal knowledge discovery but rests on an under-examined assumption: that current VLMs faithfully synthesise multimodal data. We argue they often do not, and this gap reflects a trustworthiness problem in the dominant Visi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.0
2026-05-21 · Pilchen Hippolyte, Fabre Romain, Signe Talla Franck, Perez Patrick, Grave Edouard
Research Track A · General AI
Large language models (LLMs) are typically trained on shuffled corpora, yielding models whose knowledge is frozen at train time and whose temporal grounding remains poorly understood. In this work, we study the impact of pre-training dynamics on the acquisition of time-sensitive factual knowledge, focusing specifically…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-21 · Zacharie Chenail-Larcher, Brahim Mahmoudi, Naouel Moha, Quentin Stiévenart, Florent Avellaneda
General AI
Large Language Models (LLMs) are increasingly integrated into software systems for diverse purposes, due to their versatility, flexibility, and ability to simulate human reasoning to some extent. However, poor integration of LLM inference in source code can undermine software system quality. Therefore, inadequate LLM i…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.6
2026-05-22 · Alessandro Sosso, Akhil Arora, Bas Spitters
General AI
Agentic systems have recently emerged as state-of-the-art approaches for automated theorem proving in formal mathematics. To assess how far these capabilities extend to program verification, we evaluate Claude Code in an agentic proving framework on CLEVER, a Lean 4 benchmark for verifiable code generation. Our results…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.6
2026-05-22 · Aneesh Komanduri, Xintao Wu
General AI
Causal generative modeling is essential for developing reliable and transparent AI systems capable of counterfactual reasoning. While existing approaches focus on integrating causal constraints during the training of generative models, they often lack a unified framework to leverage the zero-shot reasoning capabilities…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.6
2026-05-22 · Jianshu Zhang, Yijiang Li, Huifeixin Chen, Haoran Lu, Letian Xue, Bingyang Wang, Han Liu
General AI
Vision-Language Models (VLMs) are increasingly deployed in embodied environments, where they need to produce numerical outputs such as action magnitudes and spatial coordinates. Although these numbers appear meaningful, it remains unclear whether these numerical outputs are genuinely grounded in spatial perception. Theref…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-05-21 · Guangya Hao, Yunbo Long, Zhuokai Zhao
General AI
Self-evolving multi-agent systems (MAS) have emerged as a promising route to LLM agents that continually improve from experience, with persistent memory at their foundation. However, existing designs almost exclusively adopt a centralized repository shared across agents, incurring communication and coordination overhea…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 11.6
2026-05-22 · Haoyuan Wang, Xiaohao Liu, Jiajie Su, Jianmao Xiao, Chaochao Chen
General AI
Multimodal large language models (MLLMs) need efficient mechanisms to update knowledge without degrading existing capabilities. While intrinsic multimodal knowledge editing achieves strong reliability and locality, it often exhibits limited generality, failing to propagate edits across semantically equivalent visual an…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 11.4
2026-05-14 · Zahra Zanjani Foumani, Alberto Castelo, Shuang Xie, Ted Chaiwachirasak, Han Li, Lingyun Wang
Research Track B · General AI
LLM-based web agents can navigate live storefronts, yet they often collapse to a single "average buyer" policy, failing to capture the heterogeneous and distributional nature of real buyer populations. Existing personalization methods rely on hand-crafted prompt-based personas that are brittle, difficult to scale, cont…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.6
2026-05-22 · S M Mehedi Zaman, Kiran Garimella
General AI
Hundreds of millions of users now hold detailed, multi-turn conversations with ChatGPT and similar LLM assistants. We measure two privacy-relevant features of these conversations on a corpus of complete ChatGPT histories donated by over 1,000 users in four Global South countries (Brazil, India, Nigeria, Pakistan). Firs…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.5
2026-05-21 · Sayantani Ghosh, Rajashik Datta, Amit Kumar Das, Amlan Chakrabarti
Research Track A
Transformer-based language models such as BERT, with 110M+ parameters, have revolutionized natural language understanding, yet their internal mechanisms remain largely opaque to researchers and practitioners. Traditional attention-based interpretability methods often emphasize structurally important but semantically we…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.0
2026-05-18 · Woongyeng Yeo, Yumin Choi, Taekyung Ki, Sung Ju Hwang
General AI
Training long-horizon LLM agents with reinforcement learning is challenging because sparse outcome rewards reveal whether a task succeeds, but not which intermediate actions caused the outcome or how they should be corrected. Recent methods alleviate this issue by generating rewards or textual hints from turn-level act…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.0
2026-05-19 · Prashant Pandey, Himanshu Kumar, Devineni Sri Venkatraya Chowdary, Brejesh Lall
Research Track A
Evolving data streams induce joint nonstationarity in continual semantic segmentation, where semantic classes, input distributions, and supervision availability change simultaneously over time. This setting reflects practical structured prediction systems, yet remains largely unexplored in prior continual learning work…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.0
2026-05-19 · Juncheng Wu, Hardy Chen, Haoqin Tu, Xianfeng Tang, Freda Shi, Hui Liu, Hanqing Lu, Cihang Xie, Yuyin Zhou
General AI
Recent advances in vision-language models (VLMs) emphasize long chain-of-thought reasoning; yet, we find that their performance on visual tasks is primarily limited by a lack of visual perception as opposed to reasoning itself. In this work, we systematically study the interplay between perception and reasoning in VLM …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.9
2026-05-14 · Gloria Fernández-Nieto, Kiyoshige Garcés, Mladen Raković, Tongguang Li, Xinyu Li, Linxuan Zhao, Dragan Gašević
Research Track A
Background: Abilities for effective self-regulated learning (SRL) are critical for lifelong learning, particularly during adolescence when these skills consolidate and strongly influence future learning. Their importance has grown with the rise of online and blended education. Yet, little is known about how secondary s…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.6
2026-05-22 · Zisu Huang, Jingwen Xu, Yifan Yang, Ziyang Gong, Qihao Yang, Muzhao Tian, Xiaohua Wang, Changze Lv, Xuemei Gao, Qi Dai, Bei Liu, Kai Qiu, Xue Yang, Dongdong Chen, Xiaoqing Zheng, Chong Luo
General AI
Language agents increasingly improve by reusing skills: structured procedural artifacts distilled from past experience. In particular, domain-level and model-generated skills are especially promising. They offer fast adaptation within a domain by encoding domain-specific recurring procedures, and…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.6
2026-05-22 · Laura R. Marusich, Mary Grace Kozuch Dhooghe, Jonathan Z. Bakdash, Murat Kantarcioglu
General AI
Large language models (LLMs) have the potential to aid and improve human decision-making in classification tasks, not only by providing fairly accurate predictions, but also in their ability to generate cogent narrative explanations of those predictions. Prior work has demonstrated that people generally find AI narrati…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.6
2026-05-22 · Anastasiia Sedova, Natalie Schluter, Skyler Seto, Maartje ter Hoeve
General AI
Cross-lingual knowledge transfer is critical for building high-performing multilingual language models for languages with insufficient training data. When target language data is scarce, the knowledge required for many downstream tasks involving scientific reasoning, commonsense inference, and world knowledge must be a…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.6
2026-05-22 · Yifan Lu, Qi Wu, Jay Zhangjie Wu, Zian Wang, Huan Ling, Sanja Fidler, Xuanchi Ren
General AI
Most practical high-resolution text-to-image systems, including latent diffusion and autoregressive models, perform generation in a compact latent space, and a decoder maps the generated latents back to pixels. Yet the latent-to-pixel decoder is reconstruction-oriented, optimized to invert the encoder rather than synth…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.6
2026-05-22 · Constantin Blessing, Elias Geiger, Jakob Häringer, Dennis Grewe, Markus Enzweiler
General AI
Deploying heterogeneous multi-agent robot fleets for collaborative perception requires robust data exchange and scalable software architectures. However, standard ROS 2 implementations often suffer from network saturation, namespace collisions, and severe computational overhead when distributing dense sensor streams ac…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.6
2026-05-22 · Xu Ouyang, Deyi Liu, Yuhang Cai, Jing Liu, Yuan Yang, Chen Zheng, Thomas Hartvigsen, Yiyuan Ma
General AI
Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance deteriorates despite increased compute. We propose the Shannon Scaling Law, a unified the…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.6
2026-05-22 · Renhe Sun, Jiayi Zhou, Haolin He, Yueying Feng, Jian Liu
General AI
In this technical report, we describe our submission for the WildSpoof Challenge TTS Track: Text-to-Speech with In-the-Wild Data. We introduce F5-TTS-DPS, a model built upon the F5-TTS architecture. Our approach integrates Exponential Moving Average (EMA) into supervised fine-tuning to stabilize training and improve ge…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.6
2026-05-22 · Taiming Lu, Zhuang Liu
General AI
Knowledge distillation generally assumes a strong-to-weak relationship where stronger teachers yield better students. In this work, we examine this assumption about distillation in large language model pretraining. By varying architecture sizes and training token budgets, we create strong-to-weak, same-level, and weak-…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.6
2026-05-22 · Shaoxuan Zhou, Yafei Sun, Jing Zhang, Xianghang Mi
General AI
Short-video platforms like Douyin and Kwai have become central to adolescent digital life, but they also risk exposing teens to algorithmically amplified harmful content. Despite its societal importance, the scale, mechanisms, and real-world impact of this exposure remain poorly understood. Measuring it is challenging:…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-20 · Yongkang Liu, Xing Li, Mengjie Zhao, Shanru Zhang, Zijing Wang, Qian Li, Shi Feng, Feiliang Ren, Daling Wang, Hinrich Schütze
General AI
As the number of model parameters increases, parameter-efficient fine-tuning (PEFT) has become the go-to choice for tailoring pre-trained large language models. Low-rank Adaptation (LoRA) uses a low-rank update method to simulate full parameter fine-tuning, which is widely used to reduce resource requirements. However,…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 7.8
2026-05-21 · Hande Dong, Xiaoyun Liang, Jiarui Yu, Jiayi Lin, Changqing Ai, Feng Liu, Wenjun Zhang, Rongbi Wei, Chaofan Zhu, Linjie Che, Feng Wu, Xin Shen, Dexu Kong, Xiaotian Wang, Qiuyuan Chen, Bingxu An, Yueting Lei, Qiang Lin
General AI
Static "human data" faces inherent limitations: it is expensive to scale and bounded by the knowledge of its creators. Continuous learning from "experience data" (interactions between agents and their environments) promises to transcend these barriers. Today, the widespread deployment of AI agents grants us low-cost …
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 7.6
2026-05-22 · Xiao Cao, Yansong Qu, Xiangzhen Chang, Wen Xiao, Jiakui Hu, Heyuan Li, Jialun Liu, Zhiyong Huang, Xuelong Li
General AI
Mask-free video object insertion has emerged as a challenging task, requiring harmonious integration of reference objects into source videos. However, existing methods struggle when references exhibit severe stylistic domain gaps with the source scene. To overcome this, we propose Smart-Insertion-V, a…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 7.0
2026-05-18 · Yinyi Luo, Wenwen Wang, Hayes Bai, Marios Savvides, Jindong Wang
General AI
Unified multimodal models (UMMs) achieve strong performance in both understanding and generation by learning a shared latent space, yet they often exhibit functional inconsistency between these two capabilities. We observe that this issue does not stem from a lack of shared representations, but from the absence of expl…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.8
2026-05-20 · Shuaida He, Liwen Chen, Long Feng
General AI
Low-rank adaptation (LoRA) has emerged as a powerful tool for parameter-efficient fine-tuning of large language models (LLMs). This paper studies LoRA under a federated learning setting, enabling collaborative fine-tuning across clients while preserving parameter efficiency. We focus on a highly heterogeneous regime in…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.8
2026-05-21 · Yiran Wang, Chenyi Xiong, Ziyue Qin, Miao Zhang, Kui Xiao, Zhifei Li
General AI
Continual Visual Question Answering (VQA) requires learning from non-stationary streams of visual inputs and questions while preserving past knowledge. Most prior methods adapt by updating a largely shared parameter set. This often leads to cross-level task interference, hindering accurate adaptation to the current tas…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 6.8
2026-05-22 · Zizhao Tong, Hongfeng Lai, Zeqing Wang, Zhaohu Xing, Kexu Cheng, Haoran Xu, Zhao Pu, Shangwen Zhu, Ruili Feng, Jian Zhao, Yan Zhang, Hao Tang, Yeying Jin, Ling Shao
General AI
Interactive world models for first-person shooter (FPS) games must resolve high-frequency overlapping control signals at every frame without disrupting unaffected regions. Existing methods inject actions globally and train on single titles, failing under dense FPS inputs. We observe that FPS actions are spatially selec…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.6
2026-05-22 · Hongwu Peng, Ohiremen Dibua, Yuanjun Xiong, Yifan Gong, Jianming Zhang, Yan Kang
General AI
We propose Complete-muE, a framework that targets hyperparameter transfer across dense FFN and any Mixture-of-Experts (MoE) setups in transformer blocks. Existing tools such as μP (requires a fixed architecture) or SDE (requires a fixed per-step token count) cannot directly solve the hyperparameter transfer problem in Mo…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.6
2026-05-22 · Zizun Li, Haoyu Guo, Runzhe Teng, Chunhua Shen, Tong He
General AI
Camera-controlled video generation has achieved remarkable progress in recent years. However, existing video-to-video re-rendering methods primarily rely on Supervised Fine-Tuning using synthetic datasets. At present, there is an extreme scarcity of synchronized, multi-view real-world video data. Consequently, the prev…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.6
2026-05-22 · Shuhong Zheng, Michael Oechsle, Erik Sandström, Marie-Julie Rakotosaona, Federico Tombari, Igor Gilitschenski
General AI
Visual geometry transformers have become powerful architectures for multi-view 3D reconstruction, enabling joint prediction of multiple 3D attributes in a feed-forward manner. However, their computational cost grows quadratically with the input sequence length due to the global attention layers inside these models. Thi…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.6
2026-05-22 · Dongwei Xie, Xuhao Wang, Yujie Tang, Jie Song
General AI
In industrial scenarios involving multi-agent collective decision-making, centralized decision-making may not be admissible due to restricted access to individual local information, while conflicts between participants' self-interest and global performance may also impede collaborative distributed decision-making.…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.6
2026-05-22 · Lihui Yi, Ermin Wei
General AI
Recent advancements in vehicle autonomy have drawn interest in understanding the impact of autonomous vehicles on traffic systems. In this paper, we study a traffic assignment problem in a mixed-autonomy setting where both human-driven and autonomous vehicles coexist. We model the interaction as a simultaneous routing …
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 6.0
2026-05-19 · Chongyu Fan, Gaowen Liu, Mingyi Hong, Ramana Rao Kompella, Sijia Liu
General AI
Muon is a matrix-aware optimizer that leverages Newton-Schulz (NS) iterations to enforce spectral gradient orthogonalization by driving all singular values of the momentum matrix toward 1. While this uniform spectral whitening enhances exploration and outperforms AdamW in LLM pretraining, we show it could lead to funda…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 6.0
2026-05-20 · Dong Chen, Fangyun Wei, Ziyu Wan, Dongdong Chen, Jiawei Zhang, Jinjing Zhao, Sirui Zhang, Yang Yue, Zhiyang Liang, Baining Guo, Chong Luo, Jianmin Bao, Ji Li, Lei Shi, Qinhong Yang, Xiuyu Wu, Xuelu Feng, Yan Lu, Yanchen Dong, Yitong Wang, Yunuo Chen
General AI
We introduce Lens, a 3.8B-parameter T2I model that achieves performance competitive with, and in several cases surpassing, state-of-the-art models with more than 6B parameters across various benchmarks, while requiring significantly less training compute. For example, Lens requires only about 19.3% of the training comp…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 6.0
2026-05-20 · Siyong Jian, Siyuan Li, Luyuan Zhang, Zedong Wang, Xin Jin, Ying Li, Cheng Tan, Huan Wang
General AI
Discrete autoregressive (AR) text-to-image (T2I) models pair a VQ tokenizer with an AR policy, and current post-training pipelines optimize only the policy while keeping the VQ decoder frozen. Recent diffusion T2I work, exemplified by REPA-E, has shown that the VAE itself constitutes a key alignment bottleneck, yet no …
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 6.0
2026-05-20 · Chao Xu, Maohua Li, Qirui Li, Yixuan Xu, Yanke Zhou, Yunhe Li, Cuifeng Shen, Hanlin Tang, Kan Liu, Tao Lan, Lin Qu, Shao-Qun Zhang
General AI
Diffusion Transformers (DiTs) have become a de facto backbone of modern visual generation, and nearly every major axis of their design -- tokenization, attention, conditioning, objectives, and latent autoencoders -- has been extensively revisited. The residual stream that governs how information accumulates across laye…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.8
2026-05-18 · Ali Zindari, Rotem Mulayoff, Sebastian U. Stich
General AI
Fine-tuning adapts a pre-trained model to downstream tasks using a small amount of labeled data. Low-Rank Adaptation (LoRA) is an efficient fine-tuning method that reduces memory and computation costs while often achieving performance close to full fine-tuning. Despite its widespread use, the theoretical behavior of Lo…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.8
2026-05-21 · Adil Meric, Lin Geng Foo, Mert Kiray, Benjamin Busam, Rishabh Dabral, Christian Theobalt
General AI
We present CoMoGen, a controllable video generation framework that generates realistic interactive dynamics from a single binary mask sequence conditioned on an input image. CoMoGen introduces a lightweight MaskAdapter that encodes binary mask sequences into a latent residual signal, injected into the Multi Modal Diffu…
- Review
- pending
- Role
- unreviewed
- Read
- later